1. Corpus Linguistics
Modern trends in English lexicography are connected with the appearance and rapid development of such branches of linguistics as Corpus-based linguistics (Corpus linguistics) and Computational linguistics. The term “corpus” (pl. “corpora”) means a large collection of texts stored in a computer, which can be analyzed in many different ways. Typically, a corpus contains written material taken from books, newspapers, magazines, journals, advertising leaflets and so on, the content of which has been transferred into a database. Spoken material recorded from interviews, phone calls, radio programs, public meetings, academic lectures and so on is, in many cases, also transcribed, or transferred from digital sources, and included in the database.
Why is a corpus used in dictionary making?
• Machine-readable corpora allow dictionary makers to extract authentic, typical examples of the usage of a lexical item from a large body of text in a few seconds.
• Corpora allow dictionary makers to select entries based on frequency information.
• Corpora can readily provide frequency information and collocation information for readers.
• Textual (e.g. register, genre and domain) and sociolinguistic (e.g. user gender and age) information encoded in corpora allows lexicographers to give a more accurate description of the usage of a lexical item.
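The first two uses listed above, extracting examples of a lexical item and obtaining frequency information, can be illustrated with a minimal sketch. The three sentences in CORPUS and the function names tokenize, frequency_list and concordance are invented for illustration; they are assumptions of this sketch, not part of any real corpus or dictionary-making tool.

```python
import re
from collections import Counter

# Toy corpus invented for illustration; a real corpus holds millions of words.
CORPUS = [
    "The bank raised interest rates again this year.",
    "They walked along the river bank at sunset.",
    "Interest in the new dictionary grew quickly.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of a hit."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{token}] {right}".strip())
    return lines

freq = frequency_list(CORPUS)
kwic = concordance(CORPUS, "bank")
for line in kwic:
    print(line)
```

Even on this tiny sample, the context lines separate the financial sense of "bank" from the river sense, which is the kind of evidence a lexicographer would look for in a much larger corpus.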
Corpus linguistics extends our knowledge of language by combining three different approaches: the (procedural) identification of language data by categorial analysis, the correlation of language data by statistical methods, and the (intellectual) interpretation of the results. Corpus-based linguistics deals with compiling various electronic corpora for conducting investigations in different linguistic fields such as phonetics, phonology, grammar, stylistics, graphology, discourse, lexicon and many others. Corpus linguistics also aims to reveal the conventions of a certain language community on the basis of a relevant corpus. In a corpus, words are embedded in their context. Corpus linguistics is especially suited to describing gradual changes in meaning: it is the context which determines the concrete meaning in most areas of the vocabulary. Corpus linguistics aims to analyze the meaning of words within texts, or rather, within their individual context. First and foremost, words are text elements, not lexicon or dictionary entries.
In Computational linguistics, the techniques of computer science are applied to the analysis and synthesis of language and speech. The use of language corpora and the application of modern computational techniques in various kinds of lexicographical research, and in dictionary making in particular, have given rise to Corpus-based lexicography (Corpus lexicography) and Computational lexicography. Corpus-based lexicography is the major branch of corpus linguistics in which not only can new methods be introduced, but the entire scope of research can also be extended. In the last forty years, great progress has been made in the application of new technologies to language analysis. In particular, the ability to collect and store large collections of texts in electronic form, and then interrogate the data with sophisticated software tools, has revolutionized the way we can study language behavior and evolution.
Corpus-based lexicography is the widespread use of corpus information to assist with the compilation and revision of dictionaries. The British compiler John Sinclair led the way as initiator of the first strictly corpus-based dictionary of general language (COBUILD 1987).
The COBUILD English dictionary used the Bank of English, which contained a corpus of 20 million words. Britain was also the site of the first corpus-based collocation dictionaries (such as Kjellmer 1994). Bilingual lexicography may also benefit from a corpus-oriented approach, a fact that is evident when comparing the traditional Le Robert & Collins English-French Dictionary edited by B.T.S. Atkins with Valerie Grundy and Marie-Hélène Corréard’s Oxford-Hachette Dictionary, which covers the same language pair.
2. Different types of Corpora
Corpora can have a technical bias if the sources are mostly of a scientific or technical nature, or they might have a literary slant if the sources consist mostly of works of literature. While there is no such thing as a perfectly balanced corpus, it is still possible to achieve a relatively representative mix of sources.
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century. The latest edition is the BNC XML Edition, released in 2007. One of the ways in which the BNC was differentiated from existing corpora at that time was that its data were opened up not just for academic research, but for commercial and educational uses as well. The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text.
The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different ages, regions and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins. Today, most publishers of the best-selling English learner dictionaries have access to corpora. Some corpora are publicly available collections of data that are the result of collaboration between various parties.
The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. The corpus was created by Mark Davies of Brigham Young University, and it is used by tens of thousands of users every month (linguists, teachers, translators, and other researchers). COCA is also related to other large corpora created by Davies. The corpus contains more than 450 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words for each year from 1990 to 2012, and the corpus is also updated regularly (the most recent texts are from Summer 2012). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language (see the 2011 article in Literary and Linguistic Computing).
The Brown Corpus of Standard American English, created in the early sixties, was the first modern, computer-readable, general language corpus. Since this pioneering work, a lot of effort has gone into building bigger language data banks. The Longman Corpus Network contains over 300 million words; the Cambridge International Corpus, around 200 million words; the Oxford English Corpus today includes over a billion words.
Summing up, we can say that the corpus also allows you to easily limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions in at least two main ways:
• By genre: comparisons between spoken, fiction, popular magazines, newspapers, and academic texts, or even between sub-genres (or domains), such as movie scripts, sports magazines, newspaper editorials, or scientific journals.
• Over time: comparisons between different years from 1990 to the present time.
You can also easily carry out semantically-based queries of the corpus. For example, you can contrast and compare the collocates of two related words (little/small, democrats/republicans, men/women) to determine the difference in meaning or use between these words. You can find the frequency and distribution of synonyms for nearly 60,000 words, compare their frequency in different genres, and use these word lists as part of other queries. Finally, you can easily create your own lists of semantically-related words and then use them directly as part of a query.
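The genre comparison and the little/small collocate comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two "genre" samples in TEXTS are invented, the helper names per_million and right_collocates are hypothetical, and the code does not reproduce the query interface of COCA or any other corpus.

```python
import re
from collections import Counter

# Invented mini-corpus grouped by genre label (hypothetical data).
TEXTS = {
    "spoken": "a little bit of time and a little help from a small group",
    "academic": "a small sample and a small number of cases with little effect",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def per_million(word, tokens):
    """Normalized frequency: occurrences of `word` per million tokens."""
    return tokens.count(word) / len(tokens) * 1_000_000

def right_collocates(word, tokens):
    """Count the words that immediately follow `word`."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)

tokens_by_genre = {genre: tokenize(text) for genre, text in TEXTS.items()}

# Collocates are counted inside each genre and then summed, so that no
# word pair is formed across the boundary between two samples.
little = Counter()
small = Counter()
for tokens in tokens_by_genre.values():
    little += right_collocates("little", tokens)
    small += right_collocates("small", tokens)

for genre, tokens in tokens_by_genre.items():
    print(genre, round(per_million("little", tokens)), round(per_million("small", tokens)))
print("little +", sorted(little))
print("small +", sorted(small))
```

Normalizing to occurrences per million tokens is what makes samples of different sizes comparable; raw counts alone would favor whichever genre happens to have more text.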
3. Computational Lexicography. Electronic dictionaries
Computational Lexicography (CL) deals with the design, compilation, use and evaluation of electronic dictionaries. CL involves not only the creation of machine-readable dictionaries (both directly created in electronic format and derived from published dictionaries), but also the building of lexicons for machine use, as well as the development of dictionaries (in databases) for human use. With the rapid development of technology, electronic equipment has sold well. In the past decade, electronic dictionaries have become very popular among students and other consumers. Compared with a paper dictionary, an electronic dictionary is much easier to use, more portable and can contain more than one million words. Generally, users need only type in a word and its definitions appear on the small screen, whereas a traditional dictionary requires leafing through several pages.
An electronic dictionary is a dictionary whose data exists in digital form and can be accessed through a number of different media. Electronic dictionaries can be found in several forms, including: as dedicated handheld devices; as apps on smartphones and tablet computers or as computer software; as a function built into an e-reader; as CD-ROMs and DVD-ROMs, typically packaged with a printed dictionary, to be installed on the user’s own computer; as free or paid-for online products. Most types of dictionary are available in electronic form. These include general-purpose monolingual and bilingual dictionaries, historical dictionaries such as the Oxford English Dictionary, monolingual learner’s dictionaries, and specialized dictionaries of every type, such as medical or legal dictionaries, thesauruses, travel dictionaries, dictionaries of idioms, and pronunciation guides.
Electronic dictionary databases, especially those included with software dictionaries, are often extensive and can contain up to 500,000 headwords and definitions, verb conjugation tables, and a grammar reference section. Bilingual electronic dictionaries and monolingual dictionaries of inflected languages often include an interactive verb conjugator and are capable of word stemming and lemmatization. Publishers and developers of electronic dictionaries may offer native content from their own lexicographers, license data from print publications, or both, as in the case of Babylon offering premium content from Merriam-Webster, Ultralingua offering additional premium content from Collins, Masson and Simon & Schuster, and Paragon Software offering original content from Duden, Britannica, Harrap, Merriam-Webster and Oxford.
Popular electronic dictionaries nowadays include CD-ROM dictionaries, which supply the British and American pronunciation of each word; a SMART thesaurus, which gives synonyms arranged in thematic groups together with interactive exercises; and QUICK find, which allows the user to find the meaning of any word. These lightweight portable dictionaries are convenient and user-friendly. Two types of electronic dictionaries are distinguished: online dictionaries and CD-ROM dictionaries, which are usually electronic versions of printed reference books supplemented by more visual information, pronunciation, interactive exercises and games.
An online dictionary is a dictionary that is accessible via the Internet through a web browser. Online dictionaries can be made available in a number of ways: free, free with a paid subscription for extended or more professional content, or as a paid-only service. Some online dictionaries are organized as lists of words, similar to a glossary, while others offer search features and additional language tools and content such as verb conjugations, grammar references, and discussion forums.
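The type-a-word lookup and the lemmatization capability mentioned above for electronic dictionaries can be sketched as follows. Everything here is an assumption made for illustration: the three entries, the IRREGULAR table and the suffix rules are invented, and real electronic dictionaries rely on far richer morphological data than this deliberately naive rule set.

```python
# Tiny hypothetical headword list; real dictionary databases hold far more entries.
ENTRIES = {
    "corpus": "a large collection of texts stored in a computer",
    "compile": "to put together from collected material",
    "dictionary": "a reference work listing words and their meanings",
}

# Irregular forms that suffix rules cannot handle (illustrative only).
IRREGULAR = {"corpora": "corpus"}

# (suffix, replacement) pairs tried in order; a deliberately naive rule set.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", "e"),
    ("ing", ""),
    ("ed", "e"),
    ("ed", ""),
    ("s", ""),
]

def lemmatize(word):
    """Map a typed word form to a headword in ENTRIES, or return None."""
    word = word.lower()
    if word in ENTRIES:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in ENTRIES:
                return candidate
    return None

def look_up(word):
    """Return (headword, definition) for a typed word, or None if not found."""
    headword = lemmatize(word)
    if headword is None:
        return None
    return headword, ENTRIES[headword]

print(look_up("dictionaries"))
print(look_up("Corpora"))
```

Accepting a candidate only when it is an existing headword keeps the naive rules from inventing lemmas, at the cost of returning nothing for words outside the list.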
Online dictionaries such as the Oxford Online Dictionary and the Longman Online Dictionary make it convenient for the user to search for words, and they also offer a comfortable and easy way to observe word lemmatization. There are also online translation programs such as PROMT, ABBYY Lingvo and Slovoed Compact Dictionary, as well as many other pocket dictionaries. These electronic dictionaries allow users to find the word they need, its translation, synonyms and compounds. As you can see, the advantages of electronic dictionaries are tremendous and obvious. In addition, some offer other added functionality, such as a calendar, alarm clock, calculator and address book.

