The vocabulary contains 2131 meaning-word pairs ("entries") corresponding to core LWT meanings from the recipient language Japanese. The corresponding text chapter was published in the book Loanwords in the World's Languages. The language page Japanese contains a list of all loanwords arranged by donor languoid.
In most cases, the aim is for the most basic and readily understood Japanese words to be entered into the database. Overly formal or colloquial words are used when only circumlocutions would otherwise be available; circumlocutions themselves are included only when they constitute set phrases. Somewhat formal words that are prevalent in writing are also used, usually alongside words more common in spoken language.
The word forms are cited in the standard Hepburn transcription, with vowel lengths distinguished by way of the macron, with the exception of /i:/ which is doubled as
The word forms are cited in the Japanese script. Japanese employs a mixture of three scripts, two syllabaries and a logographic script:
(2) Hiragana: this script was originally developed from the cursive writing style of Chinese calligraphy and was first used in diaries and novels written in the Japanese vernacular, as opposed to more formal writing, which was done in Chinese. The script is currently used for most function words (the most important exception being personal pronouns) and for verb and adjective desinences. In the latter case, there are sometimes several possible variants, of which only one is included in the database. For instance, azukaru ‘keep’ can be written both 預かる and 預る. Hiragana is also used where a word originally written in kanji has grammaticalized into a function word (for instance, miru as a full verb means ‘see’ and is written 見る, while as an auxiliary expressing the conative aspect it is written entirely in hiragana as みる). Hiragana is also used for any word of either the Native Japanese or the Sino-Japanese stratum that is written with characters outside the set prescribed by the Ministry of Education. However, I have usually tried to give the kanji spelling for lexical words where applicable.
(3) Katakana: this script was originally developed from shorthand renderings of Chinese characters and used in conjunction with Chinese characters as a reading aid. The script is now used to write any foreign names that are not part of the Sinosphere and foreign loanwords (these include all words from languages other than Chinese and Chinese loanwords that were borrowed after the 19th century). As mentioned above, some Foreign words used to be written in kanji as well, but this has ceased, and where applicable the kanji spelling has been included alongside the primary katakana one. For instance, buriki ‘tinplate’ used to have two kanji spellings: 錻力, which is a case of ateji, and 鉄葉, which is a case of semantically based kanji assignment (characters meaning ‘steel’ and ‘leaf’); both have been discarded today in favour of the katakana spelling ブリキ.
The so-called mimetic words are usually written in hiragana, but sometimes in katakana for emphasis. For more discussion of the different scripts and the development of their modern usage, cf. Hayashi (1982) and Gottlieb (1995:ch.1).
Sometimes a word is written with different characters, yet it can be argued that these constitute cases of polysemy (since Japanese also employs Chinese characters to write Native Japanese words, semantic distinctions made in Chinese are sometimes superimposed on Native Japanese items; for instance, the verb miru “see” can be written with 4 or 5 different Chinese characters). Thus we have tobu 跳ぶ “jump” and tobu 飛ぶ “fly”, and fune 舟 “boat” and fune 船 “ship”. In such cases, I have included all the relevant variants in this field, as I regard the differences in writing to be secondary.
A meaning is entered here whenever necessary, that is, in cases where the Japanese word has a significantly different meaning, or is a hyperonym or hyponym of the LWT meaning. When a given word was first attested with a meaning different from the LWT meaning, this is noted in W6 and, if possible, the first attestation for the LWT meaning is given (however, the word is still categorised according to when the form itself was attested, not the particular meaning). Commonly used compounds or phrasal expressions involving the word form are also mentioned in this field.
This field is for any remarks on grammatical specifics related to the entry word, but most common are:
1. in case of nouns that are used as predicates by means of the light verb suru, or used attributively by means of the attributive particle no.
Comment on word form
This field is used for the following purposes:
1. Whenever a word underwent a noteworthy phonological or morphological change, this is noted here. Regular sound changes leading to forms in Modern Japanese that differ from those of older stages of Japanese are usually not recorded in the database.
The following criteria have been followed in assigning the analyzability values in the database:
This is chosen when the word cannot be further analyzed in modern Japanese, including verbs and adjectives, which are inflected but whose stem cannot be further broken down. However, reduplicated forms and words that are historically analyzable are viewed as ‘semi-analyzable’.
A. Reduplicated forms whose base does not occur on its own. These are usually adverbs, and the process is not overly productive (cf. Hamano 1998). There is also another type of reduplication, which is regarded as a derived form and is discussed below.
A. Reduplicated forms that form the collective of the base. These are usually from nouns and are regarded as derived forms here. One example is hito-bito ‘people’ from hito ‘person’.
Compounds are clearly recognisable as such, both from the accent (cf. Akamatsu 2000:268-270) and also from certain morphophonological phenomena that occur with them. For the NJ stratum and some SJ words, rendaku is a common occurrence (however, it should be noted that this is a phenomenon with many exceptions, cf. Shibatani 1990:173-175 and Vance 1987:146-148), and for the SJ stratum, an assimilation of a bisyllabic first element to certain following consonants is common, as in gakkō 'school', where gaku becomes gak- in front of kō (Vance 1995: 155-164). For the Sino-Japanese words, it can be argued in some cases whether these are really compound words since some elements nowadays never occur outside of compounds, but due to the high degree of transparency of the Chinese characters to native speakers I have opted to analyse these as compounds.
The entering of phrases into the database was usually avoided and only undertaken when it was the only choice available (sometimes, commonly used phrases were also entered into the W6 field). This was only deviated from when the only nonphrasal expression available was nonstandard or extremely rare.
Except for words marked as unanalyzable, and some marked as semi-analyzable, morpheme-by-morpheme glosses have been provided for all entries. The rules and abbreviations laid out in the Leipzig Glossing Rules were followed, with the following exceptions:
The genealogical classification of the Japanese language is a famously controversial question. Except for the really far-fetched theories such as those linking Japanese to Indo-European, Basque or Sumerian, the majority of the scholars working on the question seem to prefer a relation to either Altaic or Austronesian. In the case of Altaic theories, some scholars restrict themselves to positing a closer relationship between Japanese and Korean (Martin 1966, Lewin 1976, Whitman 1995, with Beckwith 2005 arguing for a connection between Koguryoic and Japanese rather than Korean and Japanese), while others then relate Japanese and Korean (and usually Ainu as well) to the Altaic family as a whole (Miller 1971, Miller 1996, and Vovin 1994). For the Austronesian theories, usually an Altaic-Austronesian superstrate-substrate mix is proposed (Polivanov 1918), although Benedict 1990 has proposed a genetic connection to Austronesian, Miao-Yao and Tai-Kadai, resulting in a super-family called “Japanese/Austro-Tai”. However, Vovin 1994 argues against Benedict 1990 on a number of methodological grounds. I will follow Vovin 1994 in its criticism and assume that Austronesian does not have a genetic link to Japanese, but may very well have a substrate relationship. Finally, it should be noted that even for the most convincing theory, the Altaic/Korean hypothesis, the number of cognates does not exceed 320. For this reason I decided to include the information on possible Korean-Japanese cognates in a separate custom field 8 “Korean”, rather than including this in the Age field, which I have restricted to periods for which Japanese written records are available. This would set the earliest period at the 8th century AD. The field follows a periodisation based on a superset of historical periods that seems to be agreed upon by most authors, even though there are slight differences:
— Old Japanese (Jōko Nihongo 上古日本語, abbreviated OJ): this is usually equated with the historical Nara Period (710-794). It represents the period with the earliest written records of Japanese, even though some ritual texts (Norito, see Philippi 1959:1-4 and Bentley 2001:6-36) that were recorded in that period might originally have been devised a century or so prior to their publication date.
As far as the period prior to the first written records is concerned, cf. the remarks under W9 in the section on the Native Japanese stratum, regarding the putative early loans from Classical Chinese into Old Japanese and the alleged substrate items from Austronesian.
This field was only used very sparingly. The default value was set to “regular” for all entries, and only words that were specifically of a highly colloquial or of an exclusively formal nature were marked as such.
The frequency figures are based on the data in Kokuritsu Kokugo Kenkyūsho 2005. 70 magazines from the year 1994 were used in the study, with a total of 1,074,617 morphemes (the study usually counted inflectional verb desinences as separate words, but for derivational morphology this depended on productivity). Due to the relatively low number of tokens, only approximately the 500 most frequent terms from the database were entered into this field.
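Since the corpus size is stated, a raw count from the study can be put on a per-million scale for comparison with other corpora. The following is only a minimal sketch: the function name `per_million` and the sample count of 120 are illustrative assumptions and do not come from the study or the database.

```python
# Corpus size reported for the 1994 magazine sample (figure taken from the text).
CORPUS_SIZE = 1_074_617


def per_million(raw_count: int, corpus_size: int = CORPUS_SIZE) -> float:
    """Convert a raw token count into occurrences per million morphemes."""
    if raw_count < 0:
        raise ValueError("raw_count must be non-negative")
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * 1_000_000


# Hypothetical raw count, chosen only for illustration.
normalized = per_million(120)
print(f"{normalized:.1f} occurrences per million morphemes")
```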
The point of departure here is the stratification of the Japanese lexicon. The values are as follows:
0. no evidence for borrowing
Native Japanese (NJ) stratum
NJ words are by default presumed to be 0. Dictionaries provide excellent resources for almost all words that entered the language after writing was adopted in Japan. The only problem lies with words that have been present in the language since the earliest records. There have been several attempts to link some of the NJ stratum to various languages:
— Chinese: Karlgren 1926 offered a list of 23 Japanese words that might be borrowings from Chinese. He suggested that these might be comparable to "Lehnwörter", a term that designates words that have been borrowed into German but integrated phonologically in such a way that they are no longer recognizable as loans. Likewise, these early loans in Japanese would be integrated into the NJ lexicon to such an extent that they would not be recognizable as Chinese loans, unlike the much larger SJ vocabulary. In Karlgren’s view, these early loans could provide additional hints for the phonology of Archaic Chinese, as they were borrowed well before the Middle Chinese period. Altogether, there is a list of about 29 forms that are said to be borrowed from Archaic Chinese. Except for three or four on which the academic community seems to widely agree, the validity of most of these remains somewhat controversial (see Kamei 1954, which is largely a rebuttal of Karlgren 1926), but since they represent probable loan scenarios in terms of phonology, semantics and cultural context, they are assigned a 3. Miyake X has reviewed the list of loans proposed by Karlgren. If Miyake finds a particular word to be an “invalid” example of a loan (grade F in the study), then a 2 is assigned. If Miyake argues against the word being a loan (grade C in the study), a 2 is only assigned if there are compounding factors such as semantic likelihood of the term being borrowed.
— Austronesian: said to be a substratum of Japanese, and as such it could be a source of borrowings. While Shinmura 1908 is probably the earliest proponent of an Austronesian substrate theory, Polivanov 1918 offers a first systematic proposal. He notes a number of phonological and morphological characteristics that set Japanese apart from Korean and other languages usually said to be genetically related to Japanese, and which he ascribes to an Austronesian influence on Japanese, for instance the presence of some prefixes and the fact that open syllables are typical. These proposals found a supporter in Izui 1953, who proposed a list of sound correspondences extending to about 55 Proto-Malayo-Polynesian (PMP) forms, based on the reconstruction in Dempwolff (1934-8). A more recent account of potential loans from Old Javanese, in Kumar and Rose 2000, has not been considered for this paper, as they do not employ the available proto-forms but Old Javanese, which would reflect a time depth of roughly 2000 years, making the case for contact even less probable. Most sound correspondences are quite plausible, even though the fact that both Japanese and PMP have simple phonotactics compounds the problem. Similarly, most of the semantic relations are also plausible. However, the biggest remaining problem is that, first, no particular cultural domain can be associated with the allegedly Austronesian words and, second, even though there might be genetic evidence (Kumar and Rose 2000), there is no clear archaeological record that points to contact between the Malayo-Polynesian world and Japan (Peter Bellwood, p.c.). Thus, I assign these words a 1.
— Korean: Martin 1966 links 320 words of NJ to Korean. The vast majority, 243 out of 256, occur in either Old Japanese or Late Old Japanese. Thus, if the Japanese-Korean hypothesis holds, there would be a good chance that these words would be part of the inherited lexicon, so that it would not affect the borrowing scale here. However, I have annotated the entries where applicable. There are also a number of items claimed to be borrowings from Korean, which are usually doubted by Japanese scholars. They are assigned either a 2 or 3, depending on how much support a theory has.
Sino-Japanese (SJ) stratum
The SJ stratum mainly consists of borrowings from Chinese, but there are some exceptions:
— so-called ateji, where Chinese characters have only been used for their phonetic value. The most important example of this in the database is kega “wound” written with the characters for “blame (v), suspicious” and “I”, ultimately a NJ word written in a SJ manner.
As for the Foreign stratum, the individual word etymologies are usually quite well-documented so that the assignment of either 0 (Japanese coinage) or 4 (borrowing) should be relatively straightforward.
As for a working definition, the following guidelines were used:
Both loan translations and loan renditions were classified as calques, but it was mentioned in “comments on borrowed” W10 what class they fall into.
As for Meiji modernization era vocabulary,
Some potentially problematic cases were classified as follows:
For words that are not loans but contain a borrowed element, this field is used to indicate this. It is also used to indicate whether a word is suspected to be a loan formation. Since loan formations are not easily recognisable and cannot be determined without more detailed philological work, these can only be conjectures. Typically, when a given word was representative of scientific or technological knowledge imported from the West in the course of the Meiji modernization, it has been coded as a loan formation, all the more so if it was marked as such by a dictionary. For a short discussion of loan formations and the somewhat similar category of loan translations or calques, see the annotation to the Calqued field.
Comment on borrowed
This field has been used for the following purposes:
— For SJ terms that were attested in annotated Chinese text. From the beginning of record keeping until the dawn of the Modern Period, the majority of administrative, religious and academic works were written in Chinese (kanbun). However, concomitant with the decline of linguistic competence in Chinese, texts were increasingly read in a peculiar mixed style called hentai-kanbun, which made it possible to read out a Chinese text in Japanese by adding Japanese desinences and particles as diacritic signs and transforming the Chinese word order into a Japanese one by means of reversal and return marks. For an English language introduction, see Crawcour 1965. If the Nippon Kokugo Daijiten gave such a text as the earliest reference, it was used for the Age field, since such a text was intended to be read as Japanese and hence it could be argued that the word had entered the Japanese language at this point. Furthermore, the earliest reference for a “pure” Japanese text (wabun) was not given in all cases. Where one was given, the date of the wabun reference was entered here.
If the source of a loanword was itself a loanword from another language, the earliest form for which written records were available was entered. Thus, in a case like oriibu, which came from the English olive, the Greek word elaion was entered into this field, while the "Loan history" field reflects the fact that the word was borrowed into Latin from Greek, and was then borrowed from Old French into English.
Some loanwords from Chinese were in turn calques from Sanskrit. In such a case, the information concerning the original Sanskrit term which was the source of the calque was entered into the "earlier source word" fields, with the "Loan history" field reflecting the status as a calque.
If a word is considered to be a loan translation of some sort, then the information concerning the assumed source word for the translation was entered into the "earlier source word" fields, with the "Loan history" field indicating the loan translation assumption.
Every entry has been checked against the Nippon Kokugo Daijiten, which accordingly was usually not given as a source. Other references were usually given in author-year format, with the following exceptions:
Daigenkai: Daigenkai, Ōtsuki (ed.)
Daigenkai 大言海. 1932. Ōtsuki (ed.).
Daikanwa Jiten 大漢和辞典. 1955. Morohashi (ed.). 50,000 characters and 530,000 words.
Esenseu hanyeong sajeon 엣센스 한영 사전. 2000. 105,000 words.
Merriam Webster Online Dictionary. http://www.merriam-webster.com/ (accessed on various dates)
Akamatsu, Tsutomu. 2000. Japanese phonology: A functional approach. München: LINCOM
Irwin, Mark. 2005. Rendaku-based lexical hierarchies in Japanese: the behaviour of Sino-Japanese
When a new word is borrowed into a language that already has a word for the concept, the new word rarely simply ‘replaces’ the old one. The two words coexist until one of them falls into disuse. I have assumed a replacement to have taken place when an older term has fallen into disuse or has become exceedingly rare. The exact details will only become clear after a thorough philological analysis. If it was unclear whether an older term existed and the concept was unlikely to have been newly introduced to Japan, the value was set to “No Information”. A number of examples for this, taken from the database:
• shinseki ‘relatives’ replaced ukara, yakara (obsolete)
A variety of sources was used in the attempt to find words used in older periods of Japanese that might have fallen out of use by now: the Nippon Kokugo Daijiten, which was used for any entry, and in some cases Daigenkai and Serifu 2007.
If it can be assumed that the concept associated with the term was newly introduced to Japan along with the word, this value was used.
This was used when the old term continues to be used alongside the newly borrowed term. Loveday 1996:80-82 points out that there are a number of cases where a NJ or SJ term coexists with an English loanword, with the former referring to a more traditional version of the concept or object and the English term to a more modern, Western one. For instance, tatami is the traditional Japanese matting, while kāpetto is a Western-style carpet. Instances like these do not appear often in the database, but where they do, they have been treated as cases of co-existence, as for instance in the case of tobira and doa “door”.
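The criteria for this field can be read as a small decision procedure. The sketch below is one possible reading, not the author's actual coding routine. The function and parameter names are hypothetical, the label "insertion" for newly introduced concepts is an assumption (the value name does not appear in the text), and so is the fallback to "no information" when no older term is known.

```python
from typing import Optional


def effect_value(concept_is_new: bool,
                 older_term_known: Optional[bool],
                 older_term_still_used: bool = False) -> str:
    """Pick an effect value for a loanword from the criteria described above.

    older_term_known is None when it is unclear whether an older term existed.
    """
    if concept_is_new:
        # Concept introduced to Japan together with the word
        # (label "insertion" is an assumption).
        return "insertion"
    if older_term_known is None:
        # Unclear whether an older term existed, concept unlikely to be new.
        return "no information"
    if older_term_known and older_term_still_used:
        # Older term continues alongside the loan, e.g. tobira and doa.
        return "coexistence"
    if older_term_known:
        # Older term obsolete or exceedingly rare, e.g. ukara vs. shinseki.
        return "replacement"
    # Case not covered by the text; treated as unknown here (assumption).
    return "no information"


print(effect_value(False, True, older_term_still_used=True))   # tobira / doa
print(effect_value(False, True, older_term_still_used=False))  # ukara / shinseki
```

Checking `concept_is_new` first reflects the idea that a newly introduced concept cannot have displaced an older term; a different ordering would be equally defensible where the historical record is unclear.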
This has been seen in direct correlation to what stratum a word belongs to. Some loanwords are no longer perceived as such, and felt to be part of the NJ stratum, and were thus marked “highly integrated”. SJ words have been part of the lexicon for so long that they can be analysed as “highly integrated”, which leaves borrowings from other foreign languages that are “unintegrated”.
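Because the integration value is tied directly to the stratum, the assignment amounts to a lookup. A minimal sketch follows, assuming the stratum labels "NJ", "SJ" and "Foreign"; the names `INTEGRATION_BY_STRATUM` and `integration_value` are illustrative and not part of the database.

```python
from typing import Optional

# Mapping from the stratum a loanword is perceived to belong to
# to the integration value described above.
INTEGRATION_BY_STRATUM = {
    "NJ": "highly integrated",   # loans no longer perceived as loans
    "SJ": "highly integrated",   # long-established Sino-Japanese vocabulary
    "Foreign": "unintegrated",   # borrowings from other foreign languages
}


def integration_value(stratum: str) -> Optional[str]:
    """Return the integration value for a stratum, or None if it is unknown."""
    return INTEGRATION_BY_STRATUM.get(stratum)


for stratum in ("NJ", "SJ", "Foreign"):
    print(stratum, "->", integration_value(stratum))
```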
I have usually assumed that if a term was available in Old Japanese, the concept was present prior to contact. The exceptions are those words presented in Karlgren 1926 that do not refer to clear cases of innovations transmitted from China to Japan; these are marked as “no information”.
Exotic animals and objects that might now be present in zoos or museums have been marked as “not present”.