Corpus linguistics

The Brown Corpus of Standard American English (or simply the Brown Corpus) was compiled by Henry Kučera and W. Nelson Francis at Brown University, Providence, RI, as a general corpus (text collection) in the field of corpus linguistics. ...more on Wikipedia about "Brown Corpus"
Corpus linguistics is the study of language as expressed in samples (corpora) of "real-world" text. The approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors and therefore requires careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky's competence/performance split; its adherents believe that reliable language analysis is best carried out on field-collected samples, in natural contexts and with minimal experimental interference. ...more on Wikipedia about "Corpus linguistics"
In corpus linguistics, a keyword is a word that occurs in a text more often than would be expected by chance alone. Keywords are calculated by carrying out a statistical test (e.g. loglinear) that compares the word frequencies in a text against their expected frequencies derived from a much larger corpus, which acts as a reference for general language use. Mike Scott's corpus analysis tool WordSmith can calculate keywords automatically. ...more on Wikipedia about "Keyword (linguistics)"
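The paragraph above names the idea but not the arithmetic. As one possible illustration, the sketch below scores candidate keywords with the log-likelihood (G2) statistic in the form commonly used for comparing a text with a reference corpus: expected frequencies come from the pooled counts, and each observed count is compared with its expectation. The choice of this statistic, the 3.84 cutoff (the chi-squared critical value for p < 0.05 at one degree of freedom), the whitespace tokenization, the toy data, and the helper names `log_likelihood` and `keywords` are all assumptions for illustration; this is not WordSmith's implementation.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness score for a single word.

    a: frequency of the word in the text under study
    b: frequency of the word in the reference corpus
    c: total number of tokens in the text
    d: total number of tokens in the reference corpus
    """
    pooled = a + b
    e1 = c * pooled / (c + d)  # expected frequency in the text
    e2 = d * pooled / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    # Convention: an observed count of 0 contributes nothing (0 * ln 0 -> 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(text_tokens, reference_tokens, threshold=3.84):
    """Return (word, score) pairs for words over-represented in the text.

    Assumes both token lists are non-empty. 3.84 is the chi-squared critical
    value for p < 0.05 with one degree of freedom (an assumed cutoff).
    """
    text_freq = Counter(text_tokens)
    ref_freq = Counter(reference_tokens)
    c, d = len(text_tokens), len(reference_tokens)
    results = []
    for word, a in text_freq.items():
        b = ref_freq[word]  # Counter returns 0 for unseen words
        # Keep only positive keywords: relatively more frequent in the text.
        if a / c <= b / d:
            continue
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            results.append((word, score))
    # Highest keyness first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data; whitespace splitting stands in for real tokenization.
    text = "the corpus corpus corpus of the treebank".split()
    reference = ("the cat sat on the mat and the dog ran " * 50).split()
    for word, score in keywords(text, reference):
        print(f"{word}\t{score:.2f}")
```

With this toy data, "corpus" ranks first, while "the" is filtered out because it is relatively no more frequent in the text than in the reference corpus. Other tests (such as chi-squared) slot in by replacing `log_likelihood`.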
In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (now usually electronically stored and processed). A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora. ...more on Wikipedia about "Text corpus"
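To make "aligned" concrete, here is a minimal sketch of a sentence-aligned bilingual corpus held in memory as a list of source/target pairs, with a simple search that prints matches side by side. The English/French sentence pairs, the strictly one-to-one sentence alignment, and the names `AlignedPair`, `parallel_concordance` and `show_side_by_side` are invented for illustration; real aligned corpora may also contain one-to-many or many-to-one sentence alignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One sentence-level alignment unit in a bilingual parallel corpus."""
    source: str  # sentence in the source language (here: English)
    target: str  # its translation in the target language (here: French)


# Hypothetical toy corpus: invented sentence pairs, aligned one-to-one.
CORPUS = [
    AlignedPair("The corpus is large.", "Le corpus est grand."),
    AlignedPair("We study language.", "Nous étudions la langue."),
    AlignedPair("The language changes.", "La langue change."),
]


def parallel_concordance(corpus, word):
    """Return the aligned pairs whose source side contains `word` (case-insensitive)."""
    needle = word.lower()
    hits = []
    for pair in corpus:
        # Naive tokenization: split on whitespace, strip surrounding punctuation.
        tokens = [t.strip(".,;:!?").lower() for t in pair.source.split()]
        if needle in tokens:
            hits.append(pair)
    return hits


def show_side_by_side(pairs, width=30):
    """Print each source sentence next to its aligned translation."""
    for pair in pairs:
        print(f"{pair.source:<{width}} | {pair.target}")


if __name__ == "__main__":
    show_side_by_side(parallel_concordance(CORPUS, "language"))
```

Keeping each pair as a single record is what makes side-by-side comparison trivial: any query on one language automatically carries its aligned translation along.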
A treebank is a text corpus in which each sentence has been annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name treebank. Treebanks can be used in corpus linguistics for studying syntactic phenomena or in computational linguistics for training or testing parsers. ...more on Wikipedia about "Treebank"
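As a hedged illustration of how a treebank's tree structure can be read and queried, the sketch below parses one sentence written in Penn Treebank-style bracket notation and then extracts its words and counts its constituent labels. The example sentence, the label set (S, NP, VP, DT, NN, VBD), and the helper names `Node`, `parse_bracketed`, `leaves` and `label_counts` are assumptions for illustration; in practice one would normally use an existing reader such as NLTK's `Tree.fromstring`.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # constituent or part-of-speech label, e.g. "NP" or "NN"
    children: list = field(default_factory=list)  # Node objects or word strings


def parse_bracketed(s: str) -> Node:
    """Parse one Penn Treebank-style bracketed tree (assumes well-formed input)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""  # some treebank files wrap each sentence in an unlabeled root
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())  # nested constituent
            else:
                node.children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return parse()


def leaves(tree: Node) -> list:
    """Return the words of the sentence in left-to-right order."""
    words = []
    for child in tree.children:
        if isinstance(child, Node):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def label_counts(tree: Node) -> Counter:
    """Count how often each label occurs in the tree."""
    counts = Counter([tree.label])
    for child in tree.children:
        if isinstance(child, Node):
            counts.update(label_counts(child))
    return counts


if __name__ == "__main__":
    # Hypothetical example sentence in bracket notation.
    tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))")
    print(leaves(tree))
    print(label_counts(tree))
```

Summing `label_counts` over every sentence in a treebank gives a simple frequency profile of syntactic categories, the kind of query corpus linguists run when studying syntactic phenomena.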
This article is licensed under the GNU Free Documentation License.
It uses material from Wikipedia. Direct links to the original articles are in the text.
If you use an exact or modified copy of this article, you should preserve the above paragraph and also add: It uses material from
the Shortopedia article about "Corpus linguistics".