Vocabulary Japanese

The vocabulary contains 2131 meaning-word pairs ("entries") corresponding to core LWT meanings from the recipient language Japanese. The corresponding text chapter was published in the book Loanwords in the World's Languages. The language page Japanese contains a list of all loanwords arranged by donor languoid.


Field descriptions


In most cases, the aim is for the most basic and readily understood Japanese words to be entered into the database. Overly formal or colloquial words are used only when the alternative would be a circumlocution; circumlocutions themselves are included only when they constitute set phrases. Somewhat formal words that are prevalent in writing are also used, usually alongside words more common in spoken language.

The word forms are cited in the standard Hepburn transcription, with vowel length distinguished by means of the macron, with the exception of /i:/, which is written doubled as ii. The pitch accent is usually not marked, except where it is relevant for instances of apparent homophony.

Original script

The word forms are cited in the Japanese script. Japanese employs a mixture of three scripts, two syllabaries and a logographic script:

(1) Kanji: this is the logographic script borrowed from Chinese. It is used for most full words: nouns, personal pronouns and the unchanging part of verbs and adjectives, of both the Native Japanese and Sino-Japanese strata. Whereas with words from the Sino-Japanese stratum each character reflects a monosyllabic Chinese pronunciation (called on’yomi ‘sound reading’ in Japanese), Native words are usually assigned as a whole to a single Chinese character on the basis of that character’s meaning, and only seldom to multi-character combinations (such as kyō ‘today’, which is assigned to the multi-character combination 今日, meaning ‘today’ in Chinese). This way of pronouncing the kanji as Native words is called kun’yomi ‘meaning reading’. This was also done for words of Foreign origin. Also, some words of Native Japanese origin are distinguished in writing but can reasonably be classified as instances of polysemy rather than homophony. In these cases, each instance is entered into W3 together with a meaning description.
There is the phenomenon of ateji (‘directed characters’, cf. Tajima 1998:452-461), where Chinese characters are applied to words that are not Sino-Japanese, not according to their meaning but according to their sound. For instance, kega 怪我 ‘wound’ is a Native word, yet the Chinese characters, meaning “suspicious” and “self” respectively, do not reflect the meaning of the word, but rather were chosen only for their pronunciation. There are a few instances of this throughout the database, which have all been marked as such in W3. Earlier, ateji were also used for words of Foreign origin; these spellings have also been included where applicable.
The Ministry of Education prescribes a set of 1,945 characters to be taught in schools. After the Second World War, the characters underwent a series of mild reforms, which applied to the prescribed set only and were of much smaller scale than the efforts undertaken in China. I have followed these reforms and have not noted the traditional forms in cases where the characters have been reformed.

(2) Hiragana: this script was originally developed from the cursive writing style of Chinese calligraphy and was first used in diaries and novels that were written in the Japanese vernacular, as opposed to the more formal writing, which was done in Chinese. The script is currently used for most function words (with the most important exception being personal pronouns) and verb and adjective desinences. In the latter case, there are sometimes several possible variants, of which only one is included in the database. For instance, azukaru ‘keep’ can be written both as 預かる and as 預る. Hiragana is also used in cases where a word originally written in kanji has grammaticalized into a function word (for instance miru as a full verb means ‘see’ and is written 見る, while as an aspectual auxiliary expressing the conative aspect, it is written completely in hiragana as みる). Hiragana is also used for any word of either the Native Japanese or the Sino-Japanese stratum that is written with characters outside of the set prescribed by the Ministry of Education. However, I have usually tried to give the kanji spelling for lexical words where applicable.

(3) Katakana: this script was originally developed from shorthand renderings of Chinese characters and used in conjunction with Chinese characters as a reading aid. The script is now used to write any foreign names that are not part of the Sinosphere and foreign loanwords (these include all words from languages other than Chinese and Chinese loanwords that were borrowed after the 19th century). As mentioned above, some Foreign words used to be written in kanji as well, but this has ceased, and where applicable the kanji spelling has been included alongside the primary katakana one. For instance, buriki ‘tinplate’ used to have two kanji spellings: 錻力, which is a case of ateji, and 鉄葉, which is a case of semantically-based kanji assignment (characters meaning ‘steel’ and ‘leaf’); both have been discarded today in favour of the katakana spelling ブリキ.
Katakana are also used for animal names in a scientific context, even in cases where kanji are available. However, in cases like these I have chosen the kanji whenever possible.

The so-called mimetic words are usually written in hiragana, but sometimes in katakana for emphasis. For more discussion of the different scripts and the development of their modern usage, cf. Hayashi (1982) and Gottlieb (1995:ch.1).

Sometimes a word is written with different characters even though it can be argued that the variants constitute cases of polysemy. Since Japanese also employs Chinese characters to write Native Japanese words, semantic distinctions made in Chinese are sometimes superimposed on Native Japanese items; for instance, the verb miru “see” can be written with 4 or 5 different Chinese characters. Thus we have tobu 跳ぶ “jump” and tobu 飛ぶ “fly”, and fune 舟 “boat” and fune 船 “ship”. In such cases, I have included all the relevant variants in this field, as I regard the differences in writing to be secondary.

Free meaning

A meaning is entered here whenever necessary, that is, in cases where the Japanese word has a significantly different meaning from the LWT meaning, or is a hyperonym or hyponym of it. Also, when a given word was first attested with a meaning different from the LWT meaning, this is noted in W6 and, if possible, the first attestation for the LWT meaning is given (however, the word is still categorised according to when the form itself was attested, not the particular meaning). Commonly used compounds or phrasal expressions involving the word form are also mentioned in this field.

Grammatical info

This field is for any remarks on grammatical specifics related to the entry word. The most common are:

1. nouns that are used as predicates by means of the light verb suru, or used attributively by means of the attributive particle no.
2. verbs that are part of an intransitive-transitive verb pairing (cf. Jacobsen 1996): such a verb is usually marked as Semi-analyzable in the Analyzability field with no further analysis, and the counterpart verb is given in this field.

Comment on word form

This field is used for the following purposes:

1. Whenever a word underwent a noteworthy phonological or morphological change, this is noted here. Regular sound changes leading to forms in Modern Japanese that differ from those of older stages of Japanese are usually not recorded in the database.
2. Variants are also recorded in this field. These are usually limited to words sharing the same lexical material. Phrases and compounds that are also commonly used are noted in the Meaning field, and in case of loanwords synonyms are discussed in the Other Comments field.
3. Also, in cases of apparent homophony in Native words, the locus of the pitch-accent is noted. For instance, awa ‘foam’ and awa ‘millet’ can be ruled out as an instance of polysemy since the former has the accent on the second mora and the latter on the first.
4. When a word is written according to the principle of ateji, i.e. Chinese characters used according to their pronunciation, not their meaning, this is recorded here, and also if the ateji writing is still being used.


The following criteria have been followed in assigning the analyzability values in the database:


The value ‘unanalyzable’ is chosen when the word cannot be further analyzed in modern Japanese; this includes verbs and adjectives, which are inflected but whose stem cannot be further broken down. However, reduplicated forms and words that are historically analyzable are viewed as ‘semi-analyzable’.


The value ‘semi-analyzable’ covers the following cases:

A. Reduplicated forms whose base does not occur on its own. These are usually adverbs, and the process is not overly productive (cf. Hamano 1998). There is also another type of reduplication, which is regarded as a derived form and is discussed below.
B. Words that include a cranberry morph, i.e. an element that does not occur by itself. For instance, katatsumuri ‘snail’, consists of two elements, kata ‘single’ and tsumuri, with the latter not appearing on its own.
C. Words that include affixes that are no longer productive and often opaque. For instance taira, which is a contraction from a now opaque prefix ta- and the root hira ‘flat’. There are also a number of cases involving an adjective suffix –ka, which is no longer productive.
D. Words which are historically polymorphemic but which are now perceived to be monomorphemic. For instance, sakana ‘fish’ is originally a compound of sake ‘wine’ and na ‘food’, but it is no longer perceived as such.
E. Words whose original structure has been obscured through time. This is usually because of one of the following factors, or any combination of the three:
a. writing: a compound word is written with one character, obscuring the internal structure, or a derived word is written with a character different from that of the root it is derived from. For instance, mabuta ‘eyelid’ is originally a compound of me ‘eye’ and futa ‘lid’, which is obscured by the fact that mabuta is written with one character. Also, mise ‘shop’ is said to have been derived from the nominalised form of the verb miseru ‘to show’, but again this is obscured by the writing.
b. contraction: a compound word that has undergone exceptional phonological change (for instances of regular phonological or morphological change, see Frellesvig 1995). Examples include otto ‘husband’, a contraction of o ‘man’ and hito ‘person’; fude ‘pen’, from fumi ‘text’ and te ‘hand’; hōki ‘broom’, from ha ‘leaf’ and haki, the nominalised form of the verb haku ‘to sweep’.
c. phrase: in some cases, what was originally a phrase has become a single word. Often this involves two nouns connected with an attributive marker, such as no, na or the archaic tsu. Without exception, these words are written in one character, and have sometimes undergone further phonological changes of their own. Some examples: kinoko ‘mushroom’, from the phrase ki no ko ‘tree’s child’; honoo ‘flame’, from the phrase hi no ho ‘fire’s ear’; matsuge ‘eyelash’, from the phrase me tsu ke ‘eye’s hair’.
F. Verbs that are marked as transitive or intransitive. As discussed at length in Jacobsen 1992, there is a small number of verbs that form sets of intransitive-transitive pairs. This process is no longer productive and the marking is unpredictable. Whenever this is applicable to a verb in the database, it is noted in W4 whether it is transitive or intransitive and what the form of its counterpart verb is.
G. Sino-Japanese compounds were usually regarded as ‘analyzable compounds’ because of the fact that their internal structure was readily available on account of the logographic characters. However, in certain cases, exceptional phonological change has led to making this less obvious. Usually these words are also no longer written in Chinese characters, but rather with hiragana. Examples include yakan ‘kettle’, originally a compound of yaku ‘medicine’ and kan ‘can’,

Analyzable derived

A. Reduplicated forms that form the collective of the base. These are usually from nouns and are regarded as derived forms here. One example is hito-bito ‘people’ from hito ‘person’.
B. Words that are derived by affixes. As far as Native Japanese words are concerned, by far the most frequent case is a deverbal noun, derived by means of a nominalising suffix which is identical in form to the stem form (but not in accent, cf. Martin 1988:387). For instance, hikari 'light' is derived from the verb hikaru 'to light'. As for Sino-Japanese words, this usually refers to suffixes used to form nouns.

Analyzable compound

Compounds are clearly recognisable as such, both from the accent (cf. Akamatsu 2000:268-270) and also from certain morphophonological phenomena that occur with them. For the NJ stratum and some SJ words, rendaku is a common occurrence (however, it should be noted that this is a phenomenon with many exceptions, cf. Shibatani 1990:173-175 and Vance 1987:146-148), and for the SJ stratum, an assimilation of a bisyllabic first element to certain following consonants is common, as in gakkō 'school', where gaku becomes gak- in front of kō (Vance 1995: 155-164). For the Sino-Japanese words, it can be argued in some cases whether these are really compound words since some elements nowadays never occur outside of compounds, but due to the high degree of transparency of the Chinese characters to native speakers I have opted to analyse these as compounds.

Analyzable phrasal

Phrases were usually not entered into the database. A phrase was entered only when no other choice was available, or when the only nonphrasal expression available was nonstandard or extremely rare (sometimes, commonly used phrases were also entered into the W6 field).


Except for words marked as unanalyzable, and some marked as semi-analyzable, morpheme-by-morpheme glosses have been provided for all entries. The rules and abbreviations laid out in the Leipzig Glossing Rules were followed, with the following exceptions:

AR: adjectivizer
NR: nominalizer
SUFF: suffix
VR: verbalizer


The genealogical classification of the Japanese language is a famously controversial question. Except for the really far-fetched theories such as those linking Japanese to Indo-European, Basque or Sumerian, the majority of the scholars working on the question seems to prefer a relation to either Altaic or Austronesian. In the case of Altaic theories, some scholars restrict themselves to positing a closer relationship between Japanese and Korean (Martin 1966, Lewin 1976, Whitman 1995, and Beckwith 2005, the latter arguing for a connection between Koguryoic and Japanese rather than Korean and Japanese), while others then relate Japanese and Korean (and usually Ainu as well) to the Altaic family as a whole (Miller 1971, Miller 1996, and Vovin 1994). For the Austronesian theories, usually an Altaic-Austronesian superstrate-substrate mix is proposed (Polivanov 1918), although Benedict 1990 has proposed a genetic connection to Austronesian, Miao-Yao and Tai-Kadai, resulting in a super-family called “Japanese/Austro-Tai”. However, Vovin 1994 argues against Benedict 1990 on a number of methodological grounds. I will follow Vovin 1994 in its criticism and assume that Austronesian does not have a genetic link to Japanese, but very well might have a substrate relationship. Finally, it should be noted that even for the most convincing theory, the Altaic/Korean hypothesis, the number of cognates does not exceed 320. For this reason I decided to include the information on possible Korean-Japanese cognates in a separate custom field 8 “Korean”, rather than including this in the Age field, which I have restricted to periods for which Japanese written records are available. This sets the earliest period at the 8th century AD. The Age field follows a periodisation based on a superset of historical periods that seems to be agreed upon by most authors, even though there are slight differences:

— Old Japanese (Jōko Nihongo 上古日本語, abbreviated OJ): this is usually equated with the historical Nara Period (710-794). It represents the period with the earliest written records of Japanese, even though some ritual texts (Norito, see Philippi 1959:1-4 and Bentley 2001:6-36) that were recorded in that period might originally have been devised a century or so prior to their publication date.
— Late Old Japanese, also called Classical Japanese by some (Chūko Nihongo 中古日本語, abbreviated as CJ): this is usually equated with the historical Heian Period (794-1185).
— Middle Japanese (Chūsei Nihongo 中世日本語): this spans the three historical periods that are usually referred to as the Japanese Middle Ages: Kamakura, Muromachi and Azuchi-Momoyama, all together from 1185-1603. Contact with the West actually ensues from the mid-1500s, thus the first influx of vocabulary from Western languages still falls towards the end of Middle Japanese.
— Early Modern Japanese (Kinsei Nihongo 近世日本語): this is usually equated with the Edo Period (1604-1867), a period in which Japan isolates itself politically, but in which a trading outpost is maintained in Nagasaki with an intensive exchange with the Dutch trading mission posted there.
— Modern Japanese (Gendai Nihongo 現代日本語): Japan is forced to open up to the outside world in the 1850s and embarks on an endeavour of rapid modernisation and westernisation of the country, ushering in the Meiji Restoration in 1867/8.
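
The periodisation above can be read as a lookup from a year of first attestation to a period label. The following is a minimal sketch and not part of the database itself: the name period_for_year is hypothetical, the year ranges are copied from the list above, and both the start of Modern Japanese at 1868 and the assignment of a shared boundary year to the later period are assumptions made for illustration.

```python
from typing import Optional

# (label, first year, last year) as given in the periodisation above.
# The start year 1868 for Modern Japanese is an assumption; its end is open.
PERIODS = [
    ("Old Japanese", 710, 794),
    ("Late Old Japanese", 794, 1185),
    ("Middle Japanese", 1185, 1603),
    ("Early Modern Japanese", 1604, 1867),
    ("Modern Japanese", 1868, None),
]


def period_for_year(year: int) -> Optional[str]:
    """Return the period label for a year, or None for years before 710."""
    # Walk the list backwards so that a shared boundary year (794, 1185)
    # is assigned to the later of the two periods (an assumption).
    for label, start, end in reversed(PERIODS):
        if year >= start and (end is None or year <= end):
            return label
    return None


print(period_for_year(1550))
```

A word first attested in 1550, during the early contact with the West, would thus still be assigned to Middle Japanese.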

As far as the period prior to the first written records is concerned, cf. the remarks under W9 in the section on the Native Japanese stratum, regarding the putative early loans from Classical Chinese into Old Japanese and the alleged substrate items from Austronesian.


This field was only used very sparingly. The default value was set to “regular” for all entries, and only words that were specifically of a highly colloquial or of an exclusively formal nature were marked as such.

Numeric frequency

The frequency figures are based on the data in Kokuritsu Kokugo Kenkyūsho 2005. 70 magazines from the year 1994 were used in the study, with a total of 1,074,617 morphemes (the study usually counted inflectional verb desinences as separate words, but as far as derivational morphology goes, this depended on productivity). Due to the relatively low number of tokens, only approximately the 500 most frequent terms from the database were entered into this field.


The point of departure here is the stratification of the Japanese lexicon. The values are as follows:

0. no evidence for borrowing
1. very little evidence for borrowing
2. perhaps borrowed
3. probably borrowed
4. clearly borrowed
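
For readers who process the data programmatically, the five values can be held in an ordered enumeration so that entries can be compared against a threshold. This is only a sketch: the names BorrowedStatus and at_least are hypothetical, and the two toy entries and their scores are assumptions chosen for illustration.

```python
from enum import IntEnum


class BorrowedStatus(IntEnum):
    """The five-point borrowed scale listed above."""
    NO_EVIDENCE = 0
    VERY_LITTLE_EVIDENCE = 1
    PERHAPS_BORROWED = 2
    PROBABLY_BORROWED = 3
    CLEARLY_BORROWED = 4


def at_least(entries, threshold):
    """Return the word forms whose score is at or above the threshold."""
    return [form for form, score in entries if score >= threshold]


# Toy entries; the scores are assumed here purely for illustration.
entries = [
    ("henji", BorrowedStatus.NO_EVIDENCE),
    ("buriki", BorrowedStatus.CLEARLY_BORROWED),
]

print(at_least(entries, BorrowedStatus.PROBABLY_BORROWED))  # ['buriki']
```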

Native Japanese (NJ) stratum

NJ words are by default presumed to be 0. Dictionaries provide excellent resources for almost all words that entered the language after writing was adopted in Japan. The only problem lies with words that have been present in the language since the earliest records. There have been several attempts to link parts of the NJ stratum to various languages:

— Chinese: Karlgren 1926 offered a list of 23 Japanese words that might be borrowings from Chinese. He suggested that these might be comparable to "Lehnwörter", a term that designates words that have been borrowed into the German language but integrated phonologically in such a way that they are no longer recognizable as loans. Likewise, these early loans in Japanese would be integrated into the NJ lexicon to such an extent that they would not be recognizable as Chinese loans, unlike the much larger SJ vocabulary. In Karlgren’s view, these early loans could provide additional hints for the phonology of Archaic Chinese as they were borrowed well before the Middle Chinese period. Altogether, there is a list of about 29 forms that are said to be borrowed from Archaic Chinese. Except for three or four, on which the academic community seems to widely agree, the validity of most of these remains somewhat controversial (cf. Kamei 1954, which is largely a rebuttal to Karlgren 1926), but since they represent probable loan scenarios in terms of phonology, semantics and cultural context, they are assigned a 3. Miyake X has reviewed the list of loans proposed by Karlgren. If Miyake finds a particular word to be an “invalid” example of a loan (grade F in the study), then a 2 is assigned. If Miyake argues against the word being a loan (grade C in the study), a 2 is only assigned if there are compounding factors such as semantic likelihood of the term being borrowed.

— Austronesian: said to be a substratum of Japanese and as such a possible source for borrowings. While Shinmura 1908 is probably the earliest proponent of an Austronesian substrate theory, Polivanov 1918 offers a first systematic proposal. He notes a number of phonological and morphological characteristics that set Japanese apart from Korean and other languages usually said to be genetically related to Japanese, which he ascribes to an Austronesian influence on Japanese, for instance the presence of some prefixes and the fact that open syllables are typical. They found a supporter in Izui 1953, who proposed a list of sound correspondences extending to about 55 Proto-Malayo-Polynesian (PMP) forms, based on the reconstruction in Dempwolff (1934-8). A more recent account of potential loans from Old Javanese, in Kumar and Rose 2000, has not been considered for this paper, as they employ not the available proto-forms but Old Javanese; this would reflect a time depth of roughly 2000 years, which would make the case for contact even less probable. Most sound correspondences are quite plausible, even though the fact that both Japanese and PMP have simple phonotactics compounds the problem. Similarly, most of the semantic relations are also plausible. The biggest remaining problem is twofold: first, no particular cultural domain can be associated with the allegedly Austronesian words; second, even though there might be genetic evidence (Kumar and Rose 2000), there is no clear archaeological record that points to contact between the Malayo-Polynesian world and Japan (Peter Bellwood, p.c.). Thus, I assign these words a 1.

— Korean: Martin 1966 links 330 words of NJ to Korean. The vast majority, 243 out of 256, occur in either Old Japanese or Late Old Japanese. Thus, if the Japanese-Korean hypothesis holds, there would be a good chance that these words would be part of the inherited lexicon, so that it would not affect the borrowing scale here. However, I have annotated the entries where applicable. There are a number of items disputed to be borrowings from Korean, which are usually doubted by Japanese scholars. They are assigned either a 2 or 3, depending on how much support a theory has.

Sino-Japanese (SJ) stratum

The SJ stratum mainly consists of borrowings from Chinese, however there are some exceptions:

— so-called ateji, where Chinese characters have only been used for their phonetic value. The most important example of this in the database is kega “wound” written with the characters for “blame (v), suspicious” and “I”, ultimately a NJ word written in a SJ manner.
— where a NJ term is reanalyzed as a SJ term. For example, the NJ term kaerigoto 返事 “answer” was reanalysed as a SJ term henji, which is treated as a calque and hence as 0.
— there are a number of SJ terms that are likely to be Japanese neologisms. Roughly, two groups should be differentiated: neologisms created during the Meiji-era modernization of the second half of the 19th century and neologisms created before that. It is usually acknowledged that during the modernization of the 19th century in East Asia, Japanese took a pioneering role and thus created many neologisms that later spread to Korean and Chinese. For the time before the Meiji modernization, it is less likely that a term was coined in Japan and spread to Chinese and Korean, which means that a neologism from this era should ideally be unique to Japanese.
o Meiji modernisms are classified as 0 if a reputable source can be found to attest to their status as Japanese neologisms (e.g. notation in Morohashi as a Japanese coinage). However, if several sources disagree, they are classified as 1 or 2 depending on the credibility of said sources.
o It is classified as 1 if the term in question does not have currency in Mandarin (having no currency means here that it is not recorded in any of the big Chinese dictionaries and that Google hits of Chinese websites mainly point to a Japanese context) and if it belongs to a semantic domain typically associated with modernisation after the model of the West, e.g. Western style military, technology, Western social concepts.
o It is classified as 2 if the term in question does not have currency in Mandarin, but does not particularly belong to the domain of modernization. Some other criteria used are
- its absence in old texts (this can be a tricky criterion and can sometimes only be proven by omission, i.e. if dictionaries that usually cite classical Chinese texts as sources do not mention such sources for the word in question)
- if it is fairly frequent in Japanese but rare in Chinese
o It is classified as 3 if a source can be found in both Japanese and Chinese, even if there is a semantic divergence, but nevertheless seems to be absent from classical texts. This still might not rule out a coinage in Japan and subsequent borrowing into Chinese.
o It is classified as 4 if sources point towards a Chinese coinage, or there are precursors of this in ancient texts (for example, if a word first appears in the modernization era in a Japanese text, but nevertheless was already used in a classical Chinese text)
— for those Japanese SJ words from the pre-modernization period, the following rules apply:
o if a word is clearly referenced to be a Japanese coinage, it is accorded a 0; if the sources conflict with each other, a 1 or 2.
o If a word is not in currency in Mandarin, it is accorded a 3
o A 4 in all other cases. The basic guideline for pre-modernization SJ words is that the default assumption will be borrowing from Chinese rather than coinage within Japanese.
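
The rules for the SJ stratum amount to a small decision procedure, which can be sketched as follows. This is an illustrative reading and not the procedure actually used to code the database: the class SJEvidence, its field names and the function sj_borrowed_scores are hypothetical, and the order in which the criteria are checked is an assumption, since the rules above do not say which criterion takes precedence. Where the rules leave a choice between 1 and 2, both candidates are returned.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SJEvidence:
    meiji_era: bool             # term dates from the Meiji modernization era
    japanese_coinage: bool      # a reputable source marks it as a Japanese coinage
    sources_conflict: bool      # the sources disagree with each other
    current_in_mandarin: bool   # the term has currency in Mandarin
    modernization_domain: bool  # e.g. Western-style military, technology
    chinese_precedent: bool     # Chinese coinage or precursor in classical texts


def sj_borrowed_scores(e: SJEvidence) -> Tuple[int, ...]:
    """Return the candidate borrowed scores for a Sino-Japanese term."""
    if e.sources_conflict:
        return (1, 2)  # the choice depends on the credibility of the sources
    if e.japanese_coinage:
        return (0,)
    if not e.meiji_era:
        # Pre-modernization terms: borrowing from Chinese is the default.
        return (4,) if e.current_in_mandarin else (3,)
    if e.chinese_precedent:
        return (4,)
    if not e.current_in_mandarin:
        return (1,) if e.modernization_domain else (2,)
    # Found in both languages but without a classical precedent.
    return (3,)


# A Meiji-era term without currency in Mandarin, from a modernization domain.
print(sj_borrowed_scores(SJEvidence(True, False, False, False, True, False)))
```

Returning a tuple rather than a single number keeps the cases visible in which the final value still depends on a judgement about the sources.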

Foreign stratum

As for the Foreign stratum, the individual word etymologies are usually quite well-documented so that the assignment of either 0 (Japanese coinage) or 4 (borrowing) should be relatively straightforward.


As for a working definition, the following guidelines were used:
— a true calque is a loan translation, with all components being translated from another language
— a calque in a less strict sense is a “loan rendition” (in the case of a compound consisting of two elements), with one component being translated from another language and the other rendered with a semantically similar term, e.g. German Wolkenkratzer after English skyscraper.
— not a calque: a “loan formation” (Lehnschöpfung), which means a term whose coinage was due to external influence.

Both loan translations and loan renditions were classified as calques, but it was mentioned in “comments on borrowed” W10 what class they fall into.
The working hypothesis was that any NJ words that could potentially be calques, would be based on Western languages, mostly of the Meiji modernization era, but some of the terms also from the 15/16th century onwards when Japan came into contact with the West.
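
The working definitions above can be expressed as a simple classification over the components of a word. The sketch below is an illustration only: the function names and the three component labels are hypothetical, and extending the loan rendition case to words with more than two components is an assumption that goes beyond the definition given above.

```python
from typing import List

TRANSLATED = "translated"  # component translated from the model word
SIMILAR = "similar"        # component rendered by a semantically similar term
FREE = "free"              # component not based on the model word


def classify_formation(components: List[str]) -> str:
    """Classify a word by how its components relate to a foreign model."""
    if components and all(c == TRANSLATED for c in components):
        return "loan translation"
    if TRANSLATED in components and all(
        c in (TRANSLATED, SIMILAR) for c in components
    ):
        return "loan rendition"
    return "loan formation"


def is_calque(components: List[str]) -> bool:
    """Loan translations and loan renditions count as calques."""
    return classify_formation(components) != "loan formation"


# German Wolkenkratzer after English skyscraper: Wolken 'clouds' is only
# semantically similar to 'sky', while Kratzer translates 'scraper'.
print(classify_formation([SIMILAR, TRANSLATED]))  # loan rendition
```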

As for Meiji modernization era vocabulary,
- if a word can reliably, by way of references, be shown to be based on a Western item: 4 if it is a true calque and 3 if it is a loan rendition (the distinction between the two is also recorded in the field “created on loan basis”).
- if a word is a true neologism, not based on a classic Chinese term, but no further references can be found: 3 if it is a true calque and 2 if it is a loan rendition.

Some potentially problematic cases were classified as follows:

- In some cases a NJ term is reanalyzed as a SJ one. For example, the NJ term kaerigoto 返事 “answer” was reanalyzed as a SJ term henji. Cases like this were classified as a calque.
- Another tricky issue involves the case of potential NJ calques modeled after Chinese words. If a Chinese word exists with the same characters, a “1” was given, a “2” if the resulting NJ structure runs counter to the structure encountered in other NJ words (for instance, Chinese compounds usually exhibit a VO structure, while NJ compounds usually follow a OV pattern).
- Another problem for NJ vocabulary is the question what to do with words that are likely calques in Chinese that were then borrowed into Japanese, e.g. byōin 病院 “hospital” which is said to have been coined after a Dutch word. These cases have not been marked as calques.

Borrowed base

For words that are not loans but contain a borrowed element, this field is used to indicate this. It is also used to indicate whether a word is suspected to be a loan formation. Since loan formations are not easily recognisable and cannot be determined without more detailed philological work, these can only be conjectures. Typically, when a given word was representative of scientific or technological knowledge imported from the West in the course of the Meiji modernization, the word has usually been coded as a loan formation; all the more if it was marked by a dictionary as such. For a short discussion of loan formations and the somewhat similar category of loan translations or calques, see the annotation to the Calqued field.

Comment on borrowed

This field has been used for the following purposes:

— For SJ terms that were attested in annotated Chinese text. From the beginning of record keeping until the dawn of the Modern Period, the majority of administrative, religious and academic works was written in Chinese (kanbun). However, concomitant with the decline of linguistic competence in Chinese, texts were increasingly read in a peculiar mixed style called hentai-kanbun, which made it possible to read out a Chinese text in Japanese by means of adding Japanese desinences and particles as diacritic signs and transforming the Chinese word order into a Japanese one by means of reversal and return marks. For an English language introduction, see Crawcour 1965. If the Nippon Kokugo Daijiten gave such a text as earliest reference, it was used for the Age field, since such a text was intended to be read as Japanese and hence it could be argued that said word at this point had entered the Japanese language. Furthermore, not in all cases was the earliest reference for a “pure” Japanese text (wabun) also given. Nevertheless, in all cases the date of the wabun reference was entered here.
— For SJ terms that are controversial, especially from the Meiji Modernization, it has been noted whether the terms in question are likely to be neologisms or calques. For disputedly borrowed NJ terms, no specific mention has been made in this field, but cf. the remarks in the annotation to the Borrowed field.
— Any word marked as calque was also commented upon in this field. Cf. also the remarks on calques in the annotation to the Calqued field.
— Also, information on when an entity the term is referring to is thought to have been brought to Japan is recorded in this field.
— This field has also been used for additional remarks that would not fit anywhere else. For instance, synonyms for native words were recorded here (whereas for borrowed words, synonyms were usually entered in the field Other comments).

Loan history

If a loanword was itself a loanword from a different language, the earliest form for which written records were available was entered. Thus, in a case like oriibu, which came from the English olive, the Greek word elaion was entered into this field, while the "Loan history" field reflects the fact that the word was borrowed into Latin from the Greek, and then was borrowed from the Old French into the English language.

Some loanwords from Chinese were in turn calques from Sanskrit. In such a case, the information concerning the original Sanskrit term which was the source of the calque was entered into the "earlier source word" fields, with the "Loan history" field reflecting the status as a calque.

If a word is considered to be a loan translation of some sort, then the information concerning the assumed source word for the translation was entered into the "earlier source word" fields, with the "Loan history" field indicating the loan translation assumption.
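
The relation between the source word, the earlier source word and the loan history can be sketched as a small data structure. The following Python sketch is illustrative only and is not the database's actual schema: the names `Step`, `LoanEntry`, `earlier_stages`, `earlier_source_word` and `loan_history` are assumptions made for this example, and the Latin and Old French forms are left unset because they are not given in the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One stage in a borrowing chain (hypothetical structure)."""
    language: str
    form: Optional[str] = None  # None: form not stated in the description


@dataclass
class LoanEntry:
    """Hypothetical record linking a loanword to its source and history."""
    word_form: str
    source_word: Step  # immediate donor word
    earlier_stages: List[Step] = field(default_factory=list)  # oldest first

    def earlier_source_word(self) -> Optional[Step]:
        """Return the earliest stage with a recorded form, if any."""
        for step in self.earlier_stages:
            if step.form is not None:
                return step
        return None

    def loan_history(self) -> str:
        """Render the chain of languages from oldest stage to recipient."""
        languages = [s.language for s in self.earlier_stages]
        languages += [self.source_word.language, "Japanese"]
        return " > ".join(languages)


# The oriibu example from the text; intermediate forms are deliberately unset.
oriibu = LoanEntry(
    word_form="oriibu",
    source_word=Step("English", "olive"),
    earlier_stages=[Step("Greek", "elaion"), Step("Latin"), Step("Old French")],
)

print(oriibu.earlier_source_word().form)  # elaion
print(oriibu.loan_history())  # Greek > Latin > Old French > English > Japanese
```

Keeping the stages ordered from oldest to most recent means the earlier source word can be read off the front of the list, while the full list carries the loan history.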


Every entry has been checked against the Nippon Kokugo Daijiten, which is therefore usually not cited as a source. Other references are usually given in author-year format, with the following exceptions:

Daigenkai: Daigenkai, Ōtsuki (ed.)
MW: Merriam-Webster Online Dictionary.
NKDJ: Nippon Kokugo Daijiten.
OED: Oxford English Dictionary.

Daigenkai 大言海. 1932. Ōtsuki (ed.).
Ekushiido eiwa jiten エクシード英和辞典. 2005. 126,000 words. (EJ)
Ekushiido waei jiten エクシード和英辞典. 2005. 94,000 words. (JE)
Gendaigo kara kogo ga hikeru kogo ruigo jiten 現代語から古語が引ける古語類語辞典. 2007. Serifu (ed.). 10,000 Modern Japanese words, corresponding to 50,000 Old Japanese words.
Jidaibetsu kokugo daijiten – Jōdaihen 時代別国語大辞典上代編. 1967. Omodaka (ed.). 8,500 Old Japanese words. Sanseidō.
Jidaibetsu kokugo daijiten – Muromachi jidaihen 時代別国語大辞典室町時代編. 1985-2001. Omodaka (ed.). Approx. 60,000 words from the Muromachi and Azuchi-Momoyama eras. Sanseidō.
Konsaisu gairaigo jiten コンサイス外来語辞典. 1972. 20,000 foreign words.
Nippon kokugo daijiten 日本国語大辞典. 2000². 600,000 words. Shōgakkan.
Sanseidō sūpā daijirin 三省堂スーパー大辞林. 2005. 233,000 words.

Chinese and Chinese characters

Daikanwa Jiten 大漢和辞典. 1955. Morohashi (ed.). 50,000 characters and 530,000 words.
Chóngbiān guóyǔ cídiǎn 重編國語辭典. 2005. Approx. 170,000 words. Ministry of Education, Taiwan.
Shinmeikai Kanwa Jiten 新明解漢和辞典. 1992. 12,200 characters.
Xīn déhàn cídiǎn 新德汉词典. 1985. 6,000 characters, 70,000 words. (GC)


Esenseu hanyeong sajeon 엣센스 한영 사전. 2000. 105,000 words.
Poketto puroguresshibu kannichi nikkan jiten ポケットプログレッシブ韓日日韓辞典. 2004. 70,000 KJ, 20,000 JK. Shōgakkan.


Merriam-Webster Online Dictionary. http://www.merriam-webster.com/ (accessed on various dates)
Oxford English Dictionary.


Akamatsu, Tsutomu. 2000. Japanese phonology: A functional approach. München: LINCOM
Beckwith, Christopher I. 2004. The language of Japan’s continental relatives. Leiden: Brill.
Benedict, Paul K. 1990. Japanese/Austro-Tai. Ann Arbor: Karoma.
Bentley, John R. 2001. A descriptive grammar of early Old Japanese prose. Leiden: Brill.
Carroll, Tessa. 2001. Language planning and language change in Japan. Richmond, Surrey:
Comrie, Bernard, Haspelmath, Martin and Bickel, Balthasar. 2004. The Leipzig Glossing Rules:
  Convention for interlinear morpheme-by-morpheme glosses. Downloadable from
Crawcour, Sydney. 1965. An Introduction to Kanbun. Michigan Center for Japanese Studies.
Crystal, David. 1997. English as a global language. Cambridge University Press.
Dempwolff, O. 1934-8. Vergleichende Lautlehre des austronesischen Wortschatzes. Beihefte
zur Zeitschrift für Eingeborenen-Sprachen, vols 1-3: 15, 17, 19. Berlin: Dietrich
Frellesvig, Bjarke. 1995. A case study in diachronic phonology – the Japanese onbin sound
changes. Aarhus University Press.
Gottlieb, Nanette. 1995. Kanji politics. London: Kegan Paul.
Habein, Yaeko S. 1984. The history of the Japanese written language. University of Tokyo Press.
Hamano, Shoko. 1998. The sound-symbolic system of Japanese. Tokyo: Kurosio.
Hane, Mikiso. 1991. Premodern Japan – a historical survey. Oxford: Westview Press.
Hayashi, Ōki. 1982. Nihongo no goi no hyōki. In Saitō (ed.), Nihongo no goi no tokushoku, pp.

Irwin, Mark. 2005. Rendaku-based lexical hierarchies in Japanese: the behaviour of Sino-Japanese
mononoms in hybrid noun compounds. Journal of East Asian Linguistics 14, pp. 121-153.
Ito, Junko and Armin Mester. 1996. The phonological lexicon. In Natsuko Tsujimura (ed.), The
handbook of Japanese linguistics, pp. 62-100. Cambridge, MA: Blackwell.
Izui, Hisanosuke 泉井久之助. 1953. Nihongo to nantō shogo 日本語と南島諸語 [Japanese and
  the Languages of the Southern islands]. Minzokugaku Kenkyu 17.2.
Jacobsen, Wesley. 1992. The transitive structure of events in Japanese. Tokyo: Kurosio.
Kamei, Takashi 亀井孝. 1954. Chinese borrowings in prehistoric Japanese. Tokyo.
Karlgren, Bernhard. 1926. Philology and Ancient China, chapter VI. Oslo.
Karlgren, Bernhard. 1957. Grammatica serica recensa. Stockholm.
Kokuritsu Kokugo Kenkyūsho 国立国語研究所. 1962. Gendai zasshi kyūjusshu no yōgo yōji, i:
Sōki oyobi goihyō 現代雑誌九十種の用語用字:総記及び概評.
Kokuritsu Kokugo Kenkyūsho 国立国語研究所. 2005. Gendaizasshi no goi chōsa – 1994nen
hakkō 70shi 現代雑誌の語彙調査 -1994年発行70誌-. Tokyo: Kokuritsu
Kokugo Kenkyūsho.
Kumar, Ann and Phil Rose. 2000. Lexical evidence for early contact between Indonesian
languages and Japanese. Oceanic Linguistics 39-2, pp. 219-255.
Lewin, Bruno. 1976. Japanese and Korean: the problems and history of a linguistic comparison.
Journal of Japanese Studies, Vol. 2-2, pp. 389-412.
Loveday, Leo J. 1996. Language contact in Japan: A socio-linguistic history. Oxford.
McCawley, James D. 1968. The phonological component of a grammar of Japanese. The Hague:
Martin, Samuel E. 1966. “Lexical evidence relating Korean to Japanese”. Language, vol 42, pp.
Martin, Samuel E. 1988. A reference grammar of Japanese. Vermont: Tuttle.
Miller, Roy Andrew. 1967. The Japanese language, chapter 6. The University of Chicago
Miller, Roy Andrew. 1971. Japanese and the other Altaic languages. University of Chicago Press.
Miller, Roy Andrew. 1996. Languages and history: Japanese, Korean and Altaic. Bangkok: White
Orchid Press.
Miyajima, Tatsuo 宮島達夫. 1997. Zasshi kyūjisshu hyōkihō no tōkei 雑誌九十種の統計. In
Nihongo kagaku 1, pp. 92-104.
Ota, Mitsuhiko. 2004. The learnability of the stratified lexicon. Journal of Japanese Linguistics 20,
pp. 19-40.
Patrie, James. 1982. The genetic relationship of the Ainu language. University of Hawai’i Press.
Philippi, Donald L. 1959. Norito: a translation of the ancient Japanese ritual prayers. Princeton
University Press.
Polivanov, E.D. 1918. “One of the Japanese-Malayan parallels”. Reprinted in Selected works,
compiled by A.A. Leont’ev, 1974.
Pulleyblank, Edwin. G. 1991. Lexicon of reconstructed pronunciation in early Middle Chinese,
late Middle Chinese, and early Mandarin. Vancouver: UBC Press.
Shibatani, Masayoshi. 1990. The languages of Japan. Cambridge University Press.
Shinmura, Izuru. 1908. Kokugo keitō no mondai 國語系統の問題 [The question of the
genealogy of the Japanese language].
Sohn, Ho-Min. 1999. The Korean language. Cambridge University Press.
Tajima, Masaru. 1998. Kindai kanji hyōkigo no kenkyū. Osaka: Izumi.
Umegaki, Minoru 楳垣実. 1963. Nippon gairaigo no kenkyū 日本外来語の研究.
Vance, Timothy J. 1987. An introduction to Japanese phonology. Albany: State University of New
York Press.
Vovin, Alexander. 1994. “Is Japanese related to Austronesian?” In: Oceanic Linguistics, vol. 33,
No. 2, pp. 369-390.
Vovin, Alexander. 2003. “The genetic relationship of Japanese: Where do we go from here? 日本
語系討論の現在:これからどこへ”. In: Vovin and Osada (eds.), pp.15-40.
Vovin, Alexander and Toshiki Osada 長田俊樹 (eds.). 2003. Perspectives on the origins of the
Japanese language 日本語系討論の現在. Tokyo: International Research Center for
Japanese Studies.
Whitman, John B. 1985. The phonological basis for comparison of Japanese and Korean. Ph.D.
dissertation, Harvard University.
Yamada Yoshio. 山田孝雄. 1940. Kokugo no naka ni okeru kango no kenkyū 國語のなかに於け
  る漢語の研究. Tokyo: Hōbunkan.
Yamada Yūichirō. 2005. Gairaigo no shakaigaku 外来語の社会学. Tokyo: Shumpūsha.
Yoshitake, Saburō. Etymology of the Japanese word fude. Bulletin of the School of Oriental Studies VI. [cited in Kamei, with no further details]



When a new word is borrowed into a language that already has a word for the same concept, the new word rarely simply ‘replaces’ the old one. The two words co-exist, and one of them may eventually fall into disuse. I have assumed a replacement to have taken place when an older term has fallen into disuse or has become exceedingly rare. The exact details will only become clear after a thorough philological analysis. If it was unclear whether an older term existed and the concept was unlikely to have been newly introduced to Japan, the value was set to “No Information”. A number of examples of replacement, taken from the database:

• shinseki ‘relatives’ replaced ukara, yakara (obsolete)
• jishin ‘earthquake’ replaced nawi (can still be found in dialects)
• bokujō ‘pasture’ replaced maki (poetic language)
• koji ‘orphan’ replaced minashigo (poetic)
• basu ‘bus’ replaced noriaijidōsha (might be used in formal language)
• denki ‘electricity’ replaced ereikishiteito (obsolete)
• ninshin suru ‘get pregnant’ replaced haramu (rare)
• shitai ‘corpse’ replaced shikabane (rare)
• kenkō ‘healthy’ replaced sukoyaka (rare)

A variety of sources were used in the attempt to find words from older periods of Japanese that may since have fallen out of use: the Nippon Kokugo Daijiten, which was consulted for every entry, and in some cases Daigenkai and Serifu 2007.


If it can be assumed that the concept associated with the term was newly introduced to Japan together with the word, this value was used.


This was used when the old term continues to be used alongside the newly borrowed term. Loveday 1996:80-82 points out that there are a number of cases where an NJ or SJ term co-exists with an English loanword, with the former referring to a more traditional version of the concept or object and the English term to a more modern, Western one. For instance, tatami is the traditional Japanese matting, while kāpetto is a Western-style carpet. Instances like these do not appear often in the database, but where they do, they have been treated as cases of co-existence, as for instance in the case of tobira and doa ‘door’.
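
The decision between these values can be summarised as a short rule. The Python sketch below is only an approximation of the reasoning described above, not the procedure actually used for the database: the function name `classify_effect`, its parameters, and the labels other than “No Information” are assumptions introduced for illustration. In practice each decision rested on philological judgement rather than on three flags.

```python
from enum import Enum
from typing import Optional


class Effect(Enum):
    # Only "No Information" is quoted in the text; the other labels are assumed.
    REPLACEMENT = "Replacement"
    INSERTION = "Insertion"
    COEXISTENCE = "Coexistence"
    NO_INFORMATION = "No Information"


def classify_effect(
    concept_newly_introduced: bool,
    older_term_known: Optional[bool],
    older_term_still_used: bool = False,
) -> Effect:
    """Hypothetical summary of the rules for the effect of a borrowing.

    older_term_known is None when it is unclear whether an older term existed.
    """
    # The concept arrived together with the word: nothing was displaced.
    if concept_newly_introduced:
        return Effect.INSERTION
    # No older term can be identified for a concept that was already present.
    if not older_term_known:
        return Effect.NO_INFORMATION
    # The older term survives in ordinary use next to the loanword.
    if older_term_still_used:
        return Effect.COEXISTENCE
    # The older term is obsolete or exceedingly rare.
    return Effect.REPLACEMENT


# doa alongside tobira: the older term is still in use.
print(classify_effect(False, True, older_term_still_used=True).value)
# shinseki, whose older counterparts ukara and yakara are obsolete.
print(classify_effect(False, True).value)
```

Checking for a newly introduced concept first reflects the fact that “No Information” is reserved for concepts that were unlikely to be new to Japan.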


Integration has been treated as directly dependent on the stratum to which a word belongs. Some loanwords are no longer perceived as such and are felt to be part of the NJ stratum; these were marked “highly integrated”. SJ words have been part of the lexicon for so long that they too can be analysed as “highly integrated”, which leaves borrowings from other foreign languages as “unintegrated”.
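
Because integration follows from the stratum alone, the assignment amounts to a simple lookup. A minimal Python sketch follows, assuming stratum codes `"NJ"` and `"SJ"` as used in this description and a hypothetical code `"FOREIGN"` for all other borrowings; the function name is likewise an assumption.

```python
HIGHLY_INTEGRATED = "highly integrated"
UNINTEGRATED = "unintegrated"


def integration_for_stratum(stratum: str) -> str:
    """Map a word's stratum to its integration value (hypothetical helper).

    "NJ" covers loanwords now felt to be Native Japanese, "SJ" covers
    Sino-Japanese words; any other code is treated as a foreign borrowing.
    """
    if stratum in ("NJ", "SJ"):
        return HIGHLY_INTEGRATED
    return UNINTEGRATED


print(integration_for_stratum("SJ"))       # highly integrated
print(integration_for_stratum("FOREIGN"))  # unintegrated
```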


I have usually assumed that if a term was attested in Old Japanese, the concept was present prior to contact. The exception is the set of words presented in Karlgren 1926 that do not refer to clear cases of innovations transmitted from China to Japan; these are marked as “no information”.

Exotic animals and objects that might now be present in zoos or museums have been marked as “not present”.


AR: adjectivizer
NR: nominalizer
SUFF: suffix
VR: verbalizer