Computational linguistics

The Association for Computational Linguistics (ACL) is the international scientific and professional society for people working on problems involving natural language and computation. An annual meeting is held each summer in locations where significant computational linguistics research is carried out. ...more on Wikipedia about "Association for Computational Linguistics"

Automatic summarization is the creation of a shortened version of a text by a computer program. The product of this procedure still contains the most important points of the original text. ...more on Wikipedia about "Automatic summarization"

Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit 1 occurs much more often than the others (namely about 30% of the time). Furthermore, the larger the digit, the less likely it is to occur as the leading digit of a number. This applies to figures related to the natural world or of social significance, whether they are taken from electricity bills, newspaper articles, street addresses, stock prices, population numbers, death rates, areas or lengths of rivers, or physical and mathematical constants, and to processes described by power laws (which are very common in nature). ...more on Wikipedia about "Benford's law"
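
The "about 30%" figure can be checked with a short sketch. It assumes the usual statement of the law, which the excerpt above does not spell out: the expected share of leading digit d is log10(1 + 1/d). The function names are illustrative only.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    # Assumed standard form of the law: P(d) = log10(1 + 1/d).
    return math.log10(1 + 1 / d)

def leading_digit(x: float) -> int:
    """First significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit at index 0.
    return int(f"{abs(x):.15e}"[0])

# Expected distribution over the nine possible leading digits.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(d, f"{p:.3f}")
```

To test a real data set against the law, one could tally `leading_digit` over its values and compare the observed shares with `expected`.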

Bradford's law is a pattern first described by Samuel C. Bradford in 1934 that estimates the exponentially diminishing returns of extending a library search. ...more on Wikipedia about "Bradford's law"

Complex Text Layout languages (frequently referred to as CTL languages) are languages whose writing systems require complex transformations between text input and text display for proper rendering on the screen or the printed page. In other words, for these languages there may be a difference between the way text is stored and the way it is displayed. The term is used in the field of software internationalization. ...more on Wikipedia about "Complex Text Layout languages"

Computational linguistics is an interdisciplinary field dealing with the statistical and logical modeling of natural language from a computational perspective. This modeling is not limited to any particular field of linguistics. In the past, computational linguists were usually computer scientists who had specialized in the application of computers to the processing of a natural language. Recent research has shown that language is much more complex than previously thought, so computational linguistics teams are now sometimes interdisciplinary and include linguists (people specifically trained in linguistics). Computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, cognitive psychologists and logicians, amongst others. ...more on Wikipedia about "Computational linguistics"

Computational Linguistics, published by The MIT Press for the Association for Computational Linguistics (ACL), is the journal in the field of computational linguistics. The quarterly journal was established in 1974, and includes articles, squibs and book reviews. Robert Dale is the Editor-in-Chief. ...more on Wikipedia about "Computational Linguistics (journal)"

EuroWordNet is a system of semantic networks for European languages. Each language develops its own wordnet; these are interconnected with interlingual links (ILI). ...more on Wikipedia about "EuroWordNet"

A factored language model (FLM) is an extension of a conventional language model. In an FLM, each word is viewed as a vector of k factors: w_i = \{f_i^1, ..., f_i^k\}. An FLM provides the probabilistic model P(f | f_1, ..., f_N), where the prediction of factor f is based on N parents \{f_1, ..., f_N\}. For example, if w represents a word token and t represents a part-of-speech tag for English, the model P(w_i | w_{i-2}, w_{i-1}, t_{i-1}) predicts the current word token from the traditional N-gram history as well as the part-of-speech tag of the previous word. ...more on Wikipedia about "Factored language model"
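
As a rough illustration of the example model P(w_i | w_{i-2}, w_{i-1}, t_{i-1}), the sketch below estimates it by plain relative frequency from a tiny hand-tagged corpus. The corpus, the tag labels and the function names are all made up for illustration, and the smoothing and backoff that a practical FLM would need are left out.

```python
from collections import Counter

# Hypothetical toy corpus: each token is a (word, part-of-speech tag) pair,
# i.e. a bundle of two factors.
tagged_corpus = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("ran", "VBD")],
]

def train_flm(corpus):
    """Count parent contexts (w_{i-2}, w_{i-1}, t_{i-1}) and the words following them."""
    context_counts, event_counts = Counter(), Counter()
    for sentence in corpus:
        for i in range(2, len(sentence)):
            parents = (sentence[i - 2][0], sentence[i - 1][0], sentence[i - 1][1])
            context_counts[parents] += 1
            event_counts[(parents, sentence[i][0])] += 1
    return context_counts, event_counts

def flm_probability(word, parents, context_counts, event_counts):
    """Unsmoothed relative-frequency estimate of P(w_i | w_{i-2}, w_{i-1}, t_{i-1})."""
    if context_counts[parents] == 0:
        return 0.0  # unseen context; a real model would back off instead
    return event_counts[(parents, word)] / context_counts[parents]

context_counts, event_counts = train_flm(tagged_corpus)
p_sat = flm_probability("sat", ("the", "dog", "NN"), context_counts, event_counts)
```

Here the context ("the", "dog", "NN") is followed once by "sat" and once by "ran", so each gets half of the probability mass.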

In computational linguistics, a frequency list is a sorted list of words (word types) together with their frequency, where frequency here usually means the number of occurrences in a given corpus. A short example could be: ...more on Wikipedia about "Frequency list"
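
A frequency list of this kind can be built in a few lines. The sketch below is one possible approach, not a reference implementation: the toy corpus is invented, and the tokenizer (lowercased runs of letters and apostrophes) is a simplifying assumption.

```python
import re
from collections import Counter

def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word type, count) pairs, most frequent first."""
    # Naive tokenizer, assumed for illustration: lowercase letter runs.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

corpus = "the cat sat on the mat and the dog sat too"
freq = frequency_list(corpus)

for word, count in freq:
    print(word, count)
```

Sorting ties alphabetically keeps the output deterministic, which makes lists from different corpora easier to compare.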

The Georgetown-IBM experiment was an influential demonstration of machine translation, which took place on January 7, 1954. Developed jointly by Georgetown University and IBM, the experiment involved fully automatic translation of more than sixty Russian sentences into English. ...more on Wikipedia about "Georgetown-IBM experiment"

In linguistics, Heaps' law is an empirical law which describes the portion of a vocabulary which is represented by an instance document (or set of instance documents) consisting of words chosen from the vocabulary. This can be formulated as ...more on Wikipedia about "Heaps' law"

Statistical language models are probability distributions defined on sequences of words, P(w_1, ..., w_n). Language modeling has been used in many NLP applications such as part-of-speech tagging, parsing, speech recognition, machine translation and information retrieval. Estimating the probability of a sequence can become expensive in corpora where phrases or sentences can be arbitrarily long (the data sparseness problem), and so these models are most often approximated using smoothed N-gram models based on unigrams, bigrams and/or trigrams. ...more on Wikipedia about "Language model"
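
The smoothed N-gram approximation can be sketched with a bigram model. The version below uses add-one (Laplace) smoothing, chosen here only because it is the simplest option; the training sentences, the boundary markers "<s>" and "</s>", and the function names are assumptions for illustration.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Collect unigram and bigram counts, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_probability(sentence, unigrams, bigrams):
    """Approximate P(w_1, ..., w_n) as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_probability(prev, word, unigrams, bigrams)
    return prob

training = ["the cat sat", "the dog sat", "the cat ran"]
unigrams, bigrams = train_bigram_model(training)

seen = sentence_probability("the cat sat", unigrams, bigrams)
unseen = sentence_probability("the dog ran", unigrams, bigrams)
```

The sentence "the dog ran" never occurs in the training data, yet smoothing gives it a small non-zero probability instead of zero, which is the point of the approximation.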

In computing, lemmatisation is the process of determining the lemma for a given word. Since the process involves determining the part of speech of a word in a sentence, it requires knowledge of the grammar of a language, and it can therefore be a great deal of work to implement a lemmatiser for a new language. ...more on Wikipedia about "Lemmatisation"

Logic forms are simple, first-order logic knowledge representations of natural language sentences formed by the conjunction of concept predicates related through shared arguments. Each noun, verb, adjective, adverb, pronoun, preposition and conjunction generates a predicate. Logic forms can be decorated with word senses to disambiguate the semantics of the word. There are two types of predicates: events are marked with e, and entities are marked with x. The shared arguments connect the subjects and objects of verbs and prepositions together. Example input/output might look like this: ...more on Wikipedia about "Logic form"

In linguistics and computer science, machine translation (MT) is the use of computer software to perform ...more on Wikipedia about "Machine translation"

An N-gram is a sub-sequence of n items from a given sequence. This idea can be traced to an experiment in Claude Shannon's work on information theory. His question was: given a sequence of letters (for example, the sequence "for ex"), what is the likelihood of the next letter? From training data, you would derive a probability distribution for the next letter given a history of size n: a=0.4, b=0.00001, c=0....; where the probabilities of all possible "next-letters" sum to 1.0. ...more on Wikipedia about "N-gram"
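
The next-letter question can be made concrete with a small sketch. It extracts letter N-grams from a made-up training string and derives the distribution of the letter that follows each history; the string and the function names are illustrative assumptions, and the resulting numbers are not those quoted above.

```python
from collections import Counter, defaultdict

def ngrams(sequence, n):
    """All contiguous sub-sequences of n items, as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

def next_letter_distribution(text, history_size):
    """Map each history of `history_size` letters to P(next letter | history)."""
    counts = defaultdict(Counter)
    for gram in ngrams(text, history_size + 1):
        history, next_letter = "".join(gram[:-1]), gram[-1]
        counts[history][next_letter] += 1
    return {
        history: {
            letter: count / sum(followers.values())
            for letter, count in followers.items()
        }
        for history, followers in counts.items()
    }

# Hypothetical training text in which "for ex" occurs three times.
model = next_letter_distribution("for example, for exams, for extras", history_size=6)
print(model["for ex"])
```

In this toy text the history "for ex" is followed twice by "a" and once by "t", so the estimated probabilities are 2/3 and 1/3, and they sum to 1.0 as the paragraph requires.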

Named entity recognition (NER) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. ...more on Wikipedia about "Named entity recognition"

Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. ...more on Wikipedia about "Natural language generation"

Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems inherent in the processing and manipulation of natural language, as well as natural language understanding, which is devoted to making computers "understand" statements written in human languages. ...more on Wikipedia about "Natural language processing"

The Q-systems are a method of directed graph transformations according to given grammar rules, developed at the University of Montreal by Alain Colmerauer in the late 1960s for use in natural language processing. The University of Montreal's machine translation system, TAUM, uses the Q-systems as its language formalism. ...more on Wikipedia about "Q-systems"

Question answering (QA) is a type of information retrieval. ...more on Wikipedia about "Question answering"

Sentence extraction is a technique used for automatic summarization. ...more on Wikipedia about "Sentence extraction"

Speech recognition technologies allow computers equipped with a source of sound input, such as a microphone, to interpret human speech, for example, for transcription or as an alternative method of interacting with a computer. ...more on Wikipedia about "Speech recognition"

Speech synthesis is the artificial production of human speech. A system used for this purpose is termed a speech synthesizer, and can be implemented in software or hardware. Speech synthesis systems are often called text-to-speech (TTS) systems in reference to their ability to convert text into speech. However, there exist systems that instead render symbolic linguistic representations like phonetic transcriptions into speech. ...more on Wikipedia about "Speech synthesis"


This article is licensed under the GNU Free Documentation License.
It uses material from Wikipedia. Direct links to the original articles are in the text.
If you use an exact or modified copy of this article, you should preserve the above paragraph and also add: It uses material from the Shortopedia article about "Computational linguistics".