Lexicography in India


What is Lexicography?

The term ‘lexicography’ is used in two distinct senses: first, it refers to the compilation of dictionaries; and second, it refers to the study of dictionaries. The editors of the Dictionary of Lexicography rightly point out, ‘Lexicography, often misconceived as a branch of linguistics, is sui generis, a field whose endeavours are informed by the theories and practices of information science, literature, publishing, philosophy, and historical, comparative, and applied linguistics. Sister disciplines, such as terminology, lexicology, encyclopaedia work, bibliography, terminography, indexing, information technology, librarianship, media studies, translation, and teaching, as well as the neighbouring disciplines of history, education, and anthropology, provide the wider setting within which lexicographers have defined and developed their field’ (Hartmann and James 1998:iv). The core material of lexicography is the ‘dictionary’, the commonest variety of reference work, at once the subject of lexicographical theory (dictionary research) and the product of lexicographical practice. Since dictionaries of different types and kinds occupy the ontological core of any given historical time in any society, lexicographical studies often reveal the construction of knowledge systems, their changes, and the sociological reasons for the epistemological shifts. Modern theoreticians discuss four major epistemological shifts in the development of human communication that are revealed through the study of dictionaries: (i) the consolidation of speech and gesture; (ii) the development of writing; (iii) the advent of print technology; and (iv) electronic computation. While these four shifts can be traced in the linear history of any society, in India’s case all four communicative systems coexist, because its various languages are at different stages of development.
For instance, while classical languages such as Sanskrit and Tamil have long histories of writing, with grammar and commentary traditions that have passed through the successive sieves of orality, print, and electronic computation, many indigenous languages of India such as Chakma and Nagamese acquired writing systems and alphabets as recently as the twentieth century[1]. Many indigenous Indian languages such as Vaagri Boli (see Srinivasa Varma and Mubarak Ali 2010) continue to be Creoles and function without any writing system.


Making sense of India’s linguistic diversity[2]

India is home to a staggering number of languages. It is important to grasp the magnitude of the complexity presented by the Indian languages in order to comprehend the central role played by lexicography in understanding India’s knowledge traditions. The Eighth Schedule to the Indian Constitution consists of the following 22 languages: (1) Assamese, (2) Bengali, (3) Gujarati, (4) Hindi, (5) Kannada, (6) Kashmiri, (7) Konkani, (8) Malayalam, (9) Manipuri, (10) Marathi, (11) Nepali, (12) Oriya, (13) Punjabi, (14) Sanskrit, (15) Sindhi, (16) Tamil, (17) Telugu, (18) Urdu, (19) Bodo, (20) Santali, (21) Maithili and (22) Dogri. Of these languages, 14 were initially included in the Constitution. Sindhi was added in 1967. Thereafter three more languages, viz. Konkani, Manipuri and Nepali, were included in 1992. Subsequently Bodo, Dogri, Maithili and Santali were added in 2004. At present, there are demands for the inclusion of 38 more languages in the Eighth Schedule to the Constitution. These are: (1) Angika, (2) Banjara, (3) Bazika, (4) Bhojpuri, (5) Bhoti, (6) Bhotia, (7) Bundelkhandi, (8) Chhattisgarhi, (9) Dhatki, (10) English, (11) Garhwali (Pahari), (12) Gondi, (13) Gujjar/Gujjari, (14) Ho, (15) Kachachhi, (16) Kamtapuri, (17) Karbi, (18) Khasi, (19) Kodava (Coorg), (20) Kok Barak, (21) Kumaoni (Pahari), (22) Kurak, (23) Kurmali, (24) Lepcha, (25) Limbu, (26) Mizo (Lushai), (27) Magahi, (28) Mundari, (29) Nagpuri, (30) Nicobarese, (31) Pahari (Himachali), (32) Pali, (33) Rajasthani, (34) Sambalpuri/Kosali, (35) Shaurseni (Prakrit), (36) Siraiki, (37) Tenyidi and (38) Tulu. (See ‘Indian Languages, Languages of India - Official, Ancient and Tribal Languages of India’ 2016; B. D. Jayaram 2000).


As for the tribal languages, the numbers are astounding, although tribes constitute only about 9 per cent of India’s total population and are unevenly distributed (1 per cent of Tamilnadu but 95 per cent of Mizoram is tribal). Nearly 90 million tribal people are concentrated in two regions: the central and eastern districts of Chhattisgarh, Orissa, and Madhya Pradesh; and the states in the north-east. Considerable tribal populations also live in Rajasthan, Gujarat and Jharkhand (see ‘List of Languages by Number of Native Speakers in India’ 2016)[3]. The tribal languages belong to the Tibeto-Burman, Dravidian, Austro-Asiatic, Indo-Aryan, and Daic families of languages. Their geographical distribution, especially that of the Dravidian family, indicates internal migration of people, adaptation of neighbouring languages, and Creolization of cultural traditions across India. Of the 22 scheduled languages only Santali and Bodo are tribal languages. The significant variation in the size and location of tribes is the reason for the different histories of growth, survival, and extinction of tribal languages. Most of the tribal languages are endangered, several are near extinction, and many, such as Tolcha, Paite, Sengami, and Rangkas, are no longer spoken. While the Santals and Bhils, for example, have done well and benefited from the neighbourhood of mainstream language groups, the Gonds, despite numbering in the millions, have to struggle to keep Gondi alive and spoken. Behind the state recognition of tribal languages one can surmise centuries of political struggles, negotiations, reconciliations and violence. Part of the problem is also the changing nature of the political struggles. In the 1950s, for instance, languages determined the reorganization of states in India, but since then all the new states (Meghalaya, Arunachal Pradesh, Tripura, Mizoram, Nagaland, Manipur, Jharkhand, Chhattisgarh, Uttarakhand, and Telangana) have been based on ethnicity.
That means one majority ethnic group and its language will dominate over other tribal languages, and different states have adopted different strategies to tackle the problem. Arunachal Pradesh, where over 80 languages are spoken, has adopted English as the official state language, whereas Sikkim keeps on adopting languages and its current number of official languages stands at eleven. Many Indian linguists have argued that multilingualism of the kind that is prevalent across India could be a resource and an enabling factor for individuals growing up in our culture, provided the hierarchy of status between the languages is eliminated through the production of multilingual dictionaries (Pattanayak 1990). To implement this policy, the Central Institute of Indian Languages has spent considerable resources on producing monolingual and multilingual dictionaries for the Indian languages (‘Central Institute of Indian Languages’ 2016)[4].


While modern-day dictionaries are educational and hence utilitarian in scope, the ancient dictionaries were philosophical in nature and encyclopaedic in their organization of materials.


Histories of dictionaries in Indian languages

Even a cursory appraisal of the scholarly literature available on the histories of dictionaries in Indian languages would indicate that the documentation, except in Sanskrit, Hindi, Tamil, Kannada, Telugu and Gujarati, is scant and irregular[5]. Even in languages where documentation exists, not all the primary sources are available to Indian researchers and scholars. For instance, Gregory James’s mammoth Colporuḷ: A History of Tamil Dictionaries presents an indicative chronology of Tamil dictionaries that runs to several pages and catalogues thousands of Tamil lexicographical manuscripts held in the major European libraries (James 2000a:597-716). For other languages too, if not such a bewildering volume of dictionaries, a considerable number must have existed and been irretrievably lost.


Vogel points out that lexicographic work must have started at a very early date in India, especially in Sanskrit, with the compilation of word-lists known as nighaṇṭu. While a nighaṇṭu cannot be considered a proper dictionary in the contemporary sense of the term and its application, the abhidhānaśāstra (science of words) or abhidhānakośa (treasury of words), or simply koṣa, of a later date is the prototype of traditional dictionaries in Indian languages. Vogel further writes, ‘These glossaries, of which that handed down and commented upon in Yāska’s Nirukta is the best known and probably oldest specimen, did not, however, constitute the prototype of the dictionaries (koṣa) of later times. There are instead a number of marked dissimilarities between them. For one thing, the Vedic word-lists deal with all parts of speech, when in fact the classical dictionaries are generally limited to nouns and indeclinables. For another, the Vedic glossaries are based on one or several individual texts, whereas the classical lexica hardly show any traces of literary influence. For a third, the Nighaṇṭus served as teaching aids in the interpretation of scripture, while the Koṣas were meant primarily to help poets in composition, being an integral part of their education.’ (Claus Vogel 1979:303) In a number of North Indian languages, including Bengali, Hindi and Gujarati, we see the influence of the Sanskrit Koṣa tradition on the dictionary making of the seventeenth and eighteenth centuries. In the South Indian languages of Tamil, Telugu, Kannada, and Malayalam we see the persistence of the Nighaṇṭu tradition over several centuries. The Nighaṇṭu tradition is so important in Tamil that Gregory James dedicates one chapter to the nighaṇṭus in his history of Tamil dictionaries (James 2000b:55-88).


Dictionary making in ancient India did not follow one single principle: it followed the philosophies of different religious schools and persuasions, and many a time the dictionaries were not ordered alphabetically. Those Indian dictionaries could be synonymic or homonymic. The synonymic dictionaries are systematic catalogues of words with one and the same meaning; they are grouped subject-wise and often have the character of encyclopaedias. The homonymic dictionaries register words with more than one meaning. A hard and fast line could not be drawn between the categories of synonyms and homonyms, and so several dictionaries criss-crossed their word-lists. Most of the lexicographers in many Indian languages were Sanskrit pundits, and so their dictionaries followed the pattern of the Sanskrit dictionaries familiar to them. For instance, the 16th century Rajasthani Dingala Nam-Mala by Kusalalabha Hararaja is a synonymic dictionary, whereas the Rajasthani Anekarthi kosh by Kavi Udayarama, of unidentified age, is a homonymic dictionary (Amaresh Datta 1988:1018-1043)[6]. The first Oriya dictionary was prepared on the model of Sanskrit lexicons in the form of poetry. Upendra Bhanja, the foremost poet of Oriya in the early 18th century, called his dictionary Gitabhidhan, a dictionary in verse. He arranged the words not alphabetically but by consonants and their rhyming words. Similar lexicographic attempts can be discerned in the Bengali and Nepali histories of dictionaries.
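The distinction between the two types, and the reason the boundary between them blurs, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the English placeholder words, the subject headings, and the names `synonymic` and `build_homonymic` are hypothetical and are not drawn from any historical lexicon. The sketch shows only that a subject-wise synonymic word-list already contains the material for a homonymic one, since any word listed under more than one meaning is a candidate homonymic entry.

```python
from collections import defaultdict

# Hypothetical synonymic dictionary: subject -> meaning -> words sharing that meaning.
# English placeholder words only; not taken from any actual kosha or nighantu.
synonymic = {
    "nature": {
        "body of water": ["sea", "ocean", "main"],
        "edge of a river": ["bank", "shore"],
    },
    "commerce": {
        "money house": ["bank", "treasury"],
        "chief item": ["main", "staple"],
    },
}


def build_homonymic(syn):
    """Invert a synonymic dictionary into word -> sorted list of its meanings,
    keeping only words that carry more than one meaning."""
    index = defaultdict(set)
    for meanings in syn.values():
        for meaning, words in meanings.items():
            for word in words:
                index[word].add(meaning)
    return {word: sorted(ms) for word, ms in index.items() if len(ms) > 1}


homonymic = build_homonymic(synonymic)
```

In this toy data only "bank" and "main" appear under two meanings, so only they surface in the derived homonymic list; the same word-list thus serves both purposes, which is one way to read the criss-crossing described above.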


Of all the ancient Sanskrit dictionaries, Amarasiṃha’s Nāmaliṅgānuśāsana (instructions concerning nouns and gender), or Amarakoṣa (immortal treasure)[7], is the most influential in determining the growth of lexicons in Sanskrit and other Indian languages. The Amarakoṣa’s popularity and influence can be discerned from the nearly 80 commentaries it attracted in the history of Sanskrit literature. However, the date of the Amarakoṣa is still a matter of dispute among Sanskritists and Indologists. Amarasiṃha is said to have been one of the nine gems in the court of Chandragupta II, a Gupta king who reigned around 400 AD. Some other sources suggest that he belonged to the period of Vikramāditya in the seventh century. Vogel records that Amarasiṃha could have been a contemporary of the astronomer Varāhamihira, who lived in the sixth century, and that, since his name ends with –siṃha, he could have been of Kṣatriya origin. He further notes that the Amarakoṣa was translated into Tibetan, Chinese, Mongolian, Burmese and Sinhalese (Claus Vogel 1979c:312-313).


Part of the fascination with the Amarakoṣa lies in its structure. The Amarakoṣa consists of verses that can be easily memorized. It is divided into three kāṇḍas or chapters. The first, svargādi-kāṇḍa (‘heaven and others’), has words pertaining to gods and heavens. The second, bhūvargādi-kāṇḍa (‘earth and others’), deals with words about earth, towns, animals and humans. The third, sāmānyādi-kāṇḍa (‘common’), has words related to grammar and other miscellaneous words. The sub-classification system of the Amarakoṣa has been found to be analogous to that of WordNet in computational linguistics, and significant research has been carried out to harness the text’s potential for contemporary uses (Sivaja S Nair 2011). In recognition of the importance of the Amarakoṣa’s core structure for several Indian languages, a multilingual online searchable Amarakoṣa is available in Sanskrit, Hindi, Kannada, Punjabi, Bangla, Oriya, Assamese, Maithili, and English[8].
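The analogy with WordNet can be made concrete with a small sketch, offered under clear assumptions: the class names `Section` and `Synset`, the function `lookup`, and all the English placeholder entries are hypothetical and do not reproduce the actual text of the Amarakoṣa or the data model of any existing project. Only the three section glosses are taken from the description above. The sketch shows a three-part thesaurus whose sections hold groups of synonyms, and a lookup that returns the section, the sense, and the other words of each group containing a given word.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of words sharing one sense (placeholder data only)."""
    gloss: str
    words: list


@dataclass
class Section:
    """One of the three top-level divisions described in the text."""
    title: str
    synsets: list = field(default_factory=list)


# Section titles follow the glosses given above; the entries are invented
# English placeholders, not verses or words from the Amarakosa itself.
thesaurus = [
    Section("heaven and others", [Synset("sky", ["sky", "heavens", "firmament"])]),
    Section("earth and others", [
        Synset("town", ["town", "city"]),
        Synset("earth", ["earth", "ground", "soil"]),
    ]),
    Section("common", [Synset("large quantity", ["much", "many"])]),
]


def lookup(word):
    """Return (section title, gloss, other synonyms) for every synset containing word."""
    hits = []
    for section in thesaurus:
        for synset in section.synsets:
            if word in synset.words:
                others = [w for w in synset.words if w != word]
                hits.append((section.title, synset.gloss, others))
    return hits
```

Calling `lookup("city")` on this toy data returns the section "earth and others", the sense "town", and the remaining synonym "town". A searchable multilingual version would, on this reading, attach equivalents in each language to the same sense groups, though how existing projects actually do so is not described in the sources cited here.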


Despite the influence of Sanskrit on many Indian languages, lexicographers displayed various attitudes towards Sanskrit words in the dictionaries they compiled. Some wanted to avoid Sanskrit words altogether and opted for translations of English dictionaries, as in the case of the Assamese Hemakosha compiled by Hemchandra Barua (1835-96). Other lexicographers opted to include Sanskrit words but chose to explain how the words had acquired different meanings in their native tongues. Apabhrashta-shabda-prakash, a Gujarati dictionary published by Prabhakar Ramachandra Pandit in 1880, is one such example. An example of purifying the language of all Sanskrit words can be found in the Tamil dictionary compiled by Devaneya Pavanar (1902-1981).


Local words, dialects, and words used in folk speech had to wait for the grace of appropriately minded lexicographers to find their way into the dictionaries. Early lexicographers of bilingual dictionaries, like Forster and Carey, considered ‘deshaja’ (indigenous) words and foreign Persian words used in Bengali to be corruptions of the language and discarded them from their dictionaries. In Gujarati, Tamil, and Punjabi, special dictionaries appeared at the turn of the 19th century to include words from their regional dialects.


Dictionaries in Indian languages also appeared in response to social and historical needs. In 1838, when Bengali replaced Persian in the courts of justice of Calcutta, the change called for the urgent production of Persian-Bengali dictionaries. The Alma-Nahar Arabi-Malayalam dictionary by Muhammad Abdu Sayeed and V. Mohammad was a product of the long trade relations Kerala enjoyed with the Arab world.


With the advent of the Christian missionaries in India, the production of dictionaries became a missionary activity and, in many instances, a collaboration between the state administration and the missionaries. Almost all Indian languages have had dictionaries created by Christian missionaries. Hitherto dictionary making in India had been the calling of the motivated scholar, poet, king, grammarian or philosopher, but with the advent of the Christian missionaries it changed into a bilingual arena of administration and communication. While the missionaries’ purpose in engaging in lexicography was to translate the Bible into local languages in order to spread the gospel, they were also confronted with the arduous tasks of writing grammars for many Indian languages, inventing writing systems for many oral languages, and grouping languages into families. The publication of A Comparative Grammar of the Dravidian or South Indian Family of Languages by Robert Caldwell in 1875 (Caldwell 2009) acquired an unintended political life of its own in South India: it brought forth the idea that the South Indian languages are different from the Indo-Aryan languages of North India and propelled a language-based sub-nationalist movement in South India. In Central and North-Eastern India the production of minute editions of tribal language dictionaries by the Christian missionaries helped to Christianize the entire tribal populations of the regions. Even today Christian missionaries are the ones who produce dictionaries for tribal peoples living in remote areas in small numbers. While examples abound, the Lisu-English dictionary published in 2007 could serve as an instance of the illustriousness of the missionaries (Avia Ngwazah 2007).


Analyzing the contributions of the Christian missionaries to lexicographical efforts in Tamil, Gregory James mentions the alphabetization of dictionaries, an understanding of the sociological meaning of mass literacy, and the pedagogical necessity of dictionaries as some of the benefits (James 2000b:117). Criticizing the earliest missionary dictionary of Tamil, Antão de Provença’s Vocabvlario Tamvlico (1679), in another article, he cites the giving of Portuguese glosses for Tamil lexis and the omission of encyclopedic entries dealing with Hinduism as the major mistakes of de Provença (James 2007).


In independent India dictionary making is largely the domain of universities, state governments, and the Central Institute of Indian Languages, Mysore. Many legible and authoritative dictionaries have been produced by government agencies. Much of the governmental effort has gone into spreading and establishing Hindi as a national language.


In post-Independence India dictionaries in native tongues have become markers of identity and cultural heritage. One of the major areas of neglect in contemporary Indian lexicography is the lack of documentation and analysis of the encyclopedic dictionaries of past eras in fields such as medicine, music, astronomy, and flora and fauna. Similarly, many of the tribal language dictionaries need to be harnessed for their knowledge potential.


Indian Philosophy of Language and Grammar

An overview essay on the subject of lexicography in India cannot conclude without a discussion of the Indian philosophy of language and grammar, since the dictionaries were only by-products of philosophical discourse and practices. Language has been one of the fundamental concerns of Indian philosophy, and all schools of thought premised their discussions on the basic problem of communication. Among the six[9] disciplines prescribed for the study of the Vedas, the Vedāṅga, four, namely Śikṣā (phonetics, phonology, and pronunciation), Chandas (prosody), Vyākaraṇa (grammar and linguistic analysis) and Nirukta (etymology and explanation of words), are directly concerned with language and demand lexicographic engagement. Vyākaraṇa, commonly translated as grammar, means ‘separation, distinction, discrimination, analysis and explanation of words’ (Monier-Williams), the very processes connected with dictionary making. Pāṇini’s Aṣṭādhyāyī and Yāska’s Nirukta are the most important texts of the Vyākaraṇa tradition of Indian philosophy. While Pāṇini’s Aṣṭādhyāyī remains the most accomplished example of generative grammar, Yāska’s Nirukta provides a semantic analysis of words and their components in the contexts of their occurrence. The original intent of these grammatical texts was to help preserve the integrity of the words of the Vedas, their pronunciation, and their meanings, so that the pristine purity of the visions of the Vedas was transmitted, mostly orally, without any loss. However, despite serving an auxiliary purpose, the grammatical texts and their lexicographical aids contained philosophies of their own. It is clear that for centuries the various schools of thought in India have carried out studies that have produced insights into the working of language. The grammarians’ interest was not confined to the description and analysis of a particular language, but extended to the true nature and potentialities of language.
The importance of language and grammar for Indian philosophy is so accentuated that Karl H. Potter devoted one volume of his celebrated Encyclopaedia of Indian Philosophies to the philosophy of the grammarians (Coward and Raja 2015).


Bimal Krishna Matilal presented an entirely new way to think about the Indian philosophy of grammar and language in his three books, The Central Philosophy of Jainism (Anekānta-vāda), The Word and the World: India’s Contribution to the Study of Language, and Epistemology, Logic, and Grammar in Indian Philosophical Analysis (Bimal Krishna Matilal 1981; Matilal 1990; Matilal and Ganeri 2005a). As Ganeri succinctly puts it, Matilal was able to present the history of Indian philosophy no longer as a history of competing metaphysical frameworks, but as an extended investigation into the question of knowledge, its grounds, possibility, and domain. Epistemology, Matilal argues, is the key philosophical discipline in the Indian debate, not metaphysics, a claim that does not preclude the discussion of metaphysical questions, but sees their resolution as lying in an analysis of the structures of knowledge and language (Matilal and Ganeri 2005b:x).


Matilal’s writing on the doctrine of sevenfold predication (saptabhaṅgī) (see Bimal Krishna Matilal 1990:41), for instance, helps us understand why Jain monks were engaged in the preparation of the Tamil dictionary Tivākaram in the eighth century. While any common history of Tamil literature would explain that with the advent of Jain dictionaries (or thesauri) the vocabulary of the Tamil language increased manifold, it would be impossible to learn from it what motivated the Jain monks to produce them and what philosophy lay behind their endeavours. Matilal’s analysis makes it clear that the Jain anekāntavāda, literally the ‘doctrine of non-one-sidedness’, a belief in the non-one-sided, pluralistic nature of reality, meant that Jains viewed conflicting philosophical systems as equally valid, each system being a correct description of just one aspect of this manifold reality. It is possible to conjecture that since Jains espoused anekāntavāda they produced thesauri containing words that would bring in many layers of reality. Language designates things in an incomplete manner; it can choose only one of the many activities associated with an object. As such there is some sort of permanent relation between a word and its meaning. It is accepted that even the primary meaning of a word is not definitely circumscribed and that the boundaries of the meaning often change on the basis of contextual factors, not only in the case of ambiguous words but even in that of ordinary words.


If Matilal placed epistemology and logic at the centre of Indian philosophy, Potter effectively made a case for Vyākaraṇa as a darśana in its own right (Coward and Raja 2015:18). Crediting the fifth-century Sanskrit grammarian Bhartṛhari with leading grammar into philosophy proper, Potter showed that all the schools of Indian philosophy contributed to the philosophy of language.


The foregoing analysis should make it clear that lexicography holds the key to unlocking India’s knowledge systems. Unfortunately, we do not subject our traditional and modern dictionaries and lexicographical practices to rigorous epistemological analyses. If we were to subject dictionaries, from tribal language dictionaries to classical ones, to such philosophical analyses, we would be able to access an array of Indian knowledge systems. Works of philological scholarship demand knowledge of a whole field of Indological research extending over centuries, but their fruits are too sweet to ignore.


From the discussions of the histories of dictionaries in Indian languages one learns that contact with different languages helps languages to grow their vocabularies. It is interesting to note that both the purist movements in different languages and the mix-and-borrow tendencies in languages have helped to increase the productivity of languages. As people modernize, dictionaries also modernize, but only when they accommodate everyday spoken language. The further call is to subject lexicographical practices to rigorous philosophical analyses. Matilal wrote, ‘‘Indian philosophy’ has unfortunately come to denote a group of occult religious cults, a system of dogmas, and an odd assortment of spirituality, mysticism, and imprecise thinking, concerned almost exclusively with ‘spiritual liberation’. Books, pamphlets, and other materials dealing with this theme are quite considerable in number and unfortunately too easily available’ (Matilal 2005:xii). Philosophies of lexicography in India hold the key to changing this perception.






