Japanese writing system

Japanese
Japanese;
	Japanese novel using kanji kana majiri bun (text with both kanji and kana), the most general orthography for modern Japanese. Ruby characters (or furigana) are also used to transcribe kanji words (in modern publications these would generally be omitted for well-known kanji). The text is in the traditional tategaki ("vertical writing") style; it is read down the columns and from right to left, like traditional Chinese. Published in 1908.
Script type	mixed logographic (Kanji), syllabic (hiragana and katakana)
Period	4th century AD to present
Direction	Top-to-bottom, right-to-left; Left-to-right, top-to-bottom;
Languages	Japanese language; Ryukyuan languages; Hachijō language
Related scripts
Parent systems	(See kanji and kana) Japanese;
ISO 15924
ISO 15924	Jpan (413), Japanese (alias for Han + Hiragana + Katakana)
Unicode
Unicode range	U+4E00–U+9FBF Kanji; U+3040–U+309F Hiragana; U+30A0–U+30FF Katakana

The modern Japanese writing system (日本語の表記体系, Nihongo no hyōki taikei) uses a combination of logographic kanji, which are adopted Chinese characters, and syllabic kana. Kana itself consists of a pair of syllabaries: hiragana, used primarily for native or naturalized Japanese words and grammatical elements; and katakana, used primarily for foreign words and names, loanwords, onomatopoeia, scientific names, and sometimes for emphasis. Almost all written Japanese sentences contain a mixture of kanji and kana. Because of this mixture of scripts, in addition to a large inventory of kanji characters, the Japanese writing system is considered to be one of the most complicated currently in use.¹²³

Several thousand kanji characters are in regular use, which mostly originate from traditional Chinese characters. Others made in Japan are referred to as "Japanese kanji" (和製漢字, wasei kanji), also known as "[our] country's kanji" (国字, kokuji). Each character has an intrinsic meaning (or range of meanings), and most have more than one pronunciation, the choice of which depends on context. Japanese primary and secondary school students are required to learn 2,136 jōyō kanji as of 2010.⁴ The total number of kanji is well over 50,000, though this includes tens of thousands of characters only present in historical writings and never used in modern Japanese.

In modern Japanese, the hiragana and katakana syllabaries each contain 46 basic characters, or 71 including diacritics. With one or two minor exceptions, each different sound in the Japanese language (that is, each different syllable, strictly each mora) corresponds to one character in each syllabary. Unlike kanji, these characters intrinsically represent sounds only; they convey meaning only as part of words. Hiragana and katakana characters also originally derive from Chinese characters, but they have been simplified and modified to such an extent that their origins are no longer visually obvious.

Texts without kanji are rare; most are either children's books—since children tend to know few kanji at an early age—or early electronics such as computers, phones, and video games, which could not display complex graphemes like kanji due to both graphical and computational limitations.

To a lesser extent, modern written Japanese also uses initialisms from the Latin alphabet, for example in terms such as "BC/AD", "a.m./p.m.", "FBI", and "CD". Romanized Japanese is most frequently used by foreign students of Japanese who have not yet mastered kana, and by native speakers for computer input.

Use of scripts

Kanji

Kanji (漢字) are logographic characters (Japanese-simplified since 1946) taken from Chinese script and used in the writing of Japanese.

It is known from archaeological evidence that the first contacts that the Japanese had with Chinese writing took place in the 1st century CE, during the late Yayoi period. However, the Japanese people of that era probably had little to no comprehension of the script, and they would remain relatively illiterate until the 5th century CE in the Kofun period, when writing in Japan became more widespread.

Kanji characters are used to write most content words of native Japanese or (historically) Chinese origin, which include the following:

many nouns, such as 川 (kawa, "river") and 学校 (gakkō, "school")
the stems of most verbs and adjectives, such as 見 in 見る (miru, "see") and 白 in 白い (shiroi, "white")
the stems of many adverbs, such as 速 in 速く (hayaku, "quickly") and 上手 as in 上手に (jōzu ni, "masterfully")
most Japanese personal names and place names, such as 田中 (Tanaka) and 東京 (Tōkyō). (Certain names may be written in hiragana or katakana, or some combination of these, plus kanji.)

In official administrative and legal contexts, the use of kanji is explicitly regulated to ensure absolute uniformity. According to the Kōyōbun guidelines and the 2010 Cabinet Instruction on Kanji Usage (公用文における漢字使用等について, Kōyōbun ni okeru kanji shiyō tō ni tsuite), verbs, adjectives, and adverbs are prescriptively written in kanji, subject strictly to the extensive hiraki exceptions detailed in the #Hiragana section below. Conversely, the 2010 Cabinet Instruction explicitly mandates that certain grammatical categories, which might otherwise be opened into hiragana in casual or commercial writing, must strictly retain their kanji. This includes a specific set of pronouns (e.g. 俺, 彼, 誰, 何, 僕, 私, 我々), adnominals (e.g. 明くる, 大きな, 来る, 去る, 小さな, 我が), and the four primary statutory conjunctions (及び, 並びに, 又は, 若しくは). Furthermore, the directive provides an exhaustive checklist of thirty-six specific adverbs that must default to kanji (余り, 至って, 大いに, 恐らく, 概して, 必ず, 必ずしも, 辛うじて, 極めて, 殊に, 更に, 実に, 少なくとも, 少し, 既に, 全て, 切に, 大して, 絶えず, 互いに, 直ちに, 例えば, 次いで, 努めて, 常に, 特に, 突然, 初めて, 果たして, 甚だ, 再び, 全く, 無論, 最も, 専ら, 僅か, 割に).

Some Japanese words are written with different kanji depending on the specific usage of the word—for instance, the word naosu (to fix, or to cure) is written 治す when it refers to curing a person, and 直す when it refers to fixing an object.

Most kanji have more than one possible pronunciation (or "reading"), and some common kanji have many. These are broadly divided into on'yomi, which are readings that approximate to a Chinese pronunciation of the character at the time it was adopted into Japanese, and kun'yomi, which are pronunciations of native Japanese words that correspond to the meaning of the kanji character. However, some kanji terms have pronunciations (gikun) that correspond to neither the on'yomi nor the kun'yomi readings of the individual kanji within the term, such as 明日 (ashita, "tomorrow") and 大人 (otona, "adult").

Unusual or nonstandard kanji readings may be glossed using furigana. Kanji compounds are sometimes given arbitrary readings for stylistic purposes. For example, in Natsume Sōseki's short story The Fifth Night, the author uses 接続って for tsunagatte, the gerundive -te form of the verb tsunagaru ("to connect"), which would usually be written as 繋がって or つながって. The word 接続, meaning "connection", is normally pronounced setsuzoku.

Kana

Kana (仮名) are a set of syllabic scripts used in the Japanese writing system. The term originally means “provisional” or “borrowed names.” In modern usage, kana are primarily divided into hiragana and katakana, though historically they also included other forms such as man’yōgana and sōgana. All types of kana ultimately derive from Chinese characters(kanji): they borrow the original phonetic values of kanji and were developed from simplified forms or components of those characters.

Historically, there was no strict functional distinction between hiragana and katakana. Texts could be written in a mixture of Chinese characters and either form of kana entirely alone without a fixed division of roles. This situation changed following language reforms after World War II, which established a functional division between the two syllabaries comparable to the distinction between lowercase and uppercase letters in Western alphabetic scripts.

This functional division closely parallels the historical development of the Latin alphabet: just as cursive Latin minuscules evolved to serve as the majority lowercase text while inscriptional Roman capitals were reserved for a minority uppercase role (such as proper nouns and emphasis), the flowing hiragana now functions as the default baseline script, with the angular, regular-script-derived katakana relegated to specialized minority applications.

In contemporary Japanese, the pronunciation and orthography of kana are regulated by the system known as modern kana usage (gendai kanazukai).

Hiragana

Hiragana (平仮名) emerged as a manual simplification via cursive script of the most phonetically widespread kanji among those who could read and write during the Heian period (794–1185). The main creators of the current hiragana were ladies of the Japanese imperial court, who used the script in the writing of personal communications and literature, such as The Tale of Genji.

Hiragana is used to write the following:

okurigana (送り仮名)—inflectional endings for adjectives and verbs—such as る in 見る (miru, "see") and い in 白い (shiroi, "white"), and respectively た and かった in their past tense inflections 見た (mita, "saw") and 白かった (shirokatta, "was white").
joshi (助詞)—small, usually common words that, for example, mark sentence topics, subjects and objects or have a purpose similar to English prepositions such as "in", "to", "from", "by" and "for".
miscellaneous native Japanese (和語, wago) or Sino-Japanese (漢語, kango) words that lack a kanji rendition, or whose kanji are obscure or considered too difficult to understand for the context (such as in children's books). In official administrative writing, the Kōyōbun guidelines mandate that words containing characters or readings outside the Jōyō table must be written in hiragana—provided they cannot be paraphrased, are not clarified using furigana, and are not biological or chemical terms (which default to katakana).
furigana (振り仮名)—phonetic renderings of hiragana placed above or beside the kanji character. Furigana may aid children or non-native speakers or clarify nonstandard, rare, or ambiguous readings, especially for words that use kanji not part of the jōyō kanji list.

There is also some flexibility for words with common kanji renditions to be instead written in hiragana, depending on the individual author's preference (all Japanese words can be spelled out entirely in hiragana or katakana, even when they are normally written using kanji). In the publishing and proofreading industries, the deliberate replacement of a kanji with its hiragana reading is known as hiraki (開き, "opening"), while the reverse is called toji (閉じ, "closing"). Writing words in kanji might give them a more formal, visually dense tone, while employing hiraki may impart a softer, more accessible, or emotional feeling.⁵ For example, the word kawaii, the Japanese equivalent of "cute", can be written entirely in hiragana as in かわいい, or with kanji as 可愛い. Beyond establishing an aesthetic tone, professional editors actively utilize hiraki to manage a text's "kanji ratio" (kanji no ganyūritsu (漢字の含有率)), aiming to prevent visual fatigue caused by overly dark, character-dense lines. To this end, modern style guides routinely prescribe hiraki for conjunctions (e.g. しかし instead of 然し), adverbs, and words utilizing complex phonetic ateji (e.g. さすが instead of 流石) where the characters merely represent sound rather than providing tangible semantic context to the reader.

Some lexical items that are normally written using kanji have become grammaticalized in certain contexts, functioning as auxiliary verbs (hojo dōshi (補助動詞)) or formal nouns (keishiki meishi (形式名詞)). According to standard Japanese orthographic conventions, especially the government's official document guidelines (公用文作成の考え方, Kōyōbun sakusei no kangaekata),⁶ these grammaticalized terms are prescriptively written in hiragana. For example, the root of the verb miru (見る, "see") is normally written with the kanji 見 for the mi portion. However, when used as an auxiliary verb as in tameshite miru (試してみる) meaning "to try out", the whole verb is prescribed to be written in hiragana as みる. The guidelines universally extend this hiraki mandate to auxiliary adjectives (e.g. 〜てほしい, 〜てよい), auxiliary verbal extensions (e.g. 〜のようだ, やむを得ない), and particles (e.g. くらい, ほど).

To ensure absolute uniformity, the government guidelines enforce a strict orthographic separation (kakiwake (書き分け)) between substantive and formal meanings for high-frequency polysemous words. Formal nouns like mono (thing), toki (time), and tokoro (place) are spelled with kanji when referring to concrete physical items, specific chronological moments, or geographic locations, but must be opened into hiragana when serving purely abstract grammatical functions (e.g. その時, sono toki, "at that time" vs. したとき, shita toki, "when one did"). This semantic separation dictates the script for verbs and conjunctions as well. For example, aru and nai are written in hiragana for general existence (問題がある; 欠点がない) but retain kanji when explicitly contrasting presence versus absence (財産が有る; 有り・無し) or emphasizing physical location (東に在る). Similarly, naru is opened to hiragana for abstract state changes (1万円になる) but retains kanji for literal composition or transformation (本表から成る, "consists of"; 金に成る, "promotes to a gold general"). The word tomo is written in hiragana when acting as an abstract conjunction (するとともに, "at the same time"), but retains kanji when describing a physical shared action (彼と共に, "together with him"). Furthermore, words whose original kanji meanings have significantly weakened into idiomatic greetings are prescribed in hiragana (e.g. ありがとう, おはよう, こんにちは, 逆さま), though the root adjective 有り難い retains its kanji.

Beyond grammaticalization, the government mandates hiragana across several specific parts of speech to reduce visual clutter, even when the characters exist in the Jōyō list. This includes the complete paradigm of demonstrative pronouns (e.g. これ, この, そこ), adnominals (e.g. あらゆる, いかなる, いわゆる), and a vast index of common adverbs (e.g. いろいろ, いかに, いずれ, かなり, ここに, たくさん, ちょうど, とても, やがて, よほど, わざと, わざわざ). Suffixes are also strictly opened into hiragana (e.g. げ, とも, たち, ら, ぶる, ぶり, み), as are the native honorific prefix o- and the Sino-Japanese go- when preceding a word written in hiragana due to non-Jōyō characters (e.g. ごちそう, ごもっとも). The guidelines also instruct writers to open certain conjunctions into hiragana (e.g. さらに, また) while reserving kanji for their adverbial or adjectival counterparts (e.g. 更に, 更なる).

Finally, the administrative framework heavily restricts the use of phonetic ateji and jukujikun. Unless explicitly listed in the appendix of the Jōyō Kanji List, these terms must be written in hiragana (e.g. いつ, いかん, 思わく, さすが, すばらしい, たばこ, ちょっと, ふだん, めった). This rule creates strict phonetic locks; for instance, 明後日 and 十八番 are only permitted to be written in kanji without furigana if they are intended to be read formally as myōgonichi and jūhachiban (rather than the colloquial asatte and ohako). Furthermore, to maintain complete structural alignment with national legislative and statutory drafting precedents, everyday prose enforces hiragana for high-frequency structural markers, programmatic conjunctions, and postpositional extensions (e.g. おそれ, かつ, ただし, ただし書, ほか, よる).

Katakana

Katakana (片仮名) emerged around the 9th century, in the Heian period, when scholars—primarily Buddhist monks and court bureaucrats—created a syllabary derived from Chinese characters to simplify their reading, using portions of the characters as a kind of shorthand. The origin of the system is often attributed to the monk Kūkai. Because it was allegedly developed specifically as an academic tool to squeeze Japanese phonetic readings and grammatical markers between the dense columns of regular-script Chinese texts (a practice known as kanbun kundoku), katakana naturally inherited the sharp, geometric aesthetics of the formal characters it was extracted from. Characterized by short, straight strokes and angular corners, it is visually distinct from the cursive hiragana, which evolved simultaneously in the aristocratic courts as a flowing script for native Japanese poetry and personal prose. This scholastic origin permanently cemented katakana as the specialized script for scholarship, foreign elements, and technical annotation. It is alternatively argued that katakana is based on chap'il kugyŏl (字筆口訣, 자필구결) from Silla.⁷

Katakana is used to write the following:

transliteration of foreign words and names, such as コンピュータ (konpyūta, "computer") and ロンドン (Rondon, "London"). This includes wasei-eigo and most foreign place names outside the Sinosphere. However, some older naturalized borrowings may be rendered in hiragana, such as たばこ (tabako, "tobacco") or かるた (karuta, "playing cards"). See also Transcription into Japanese.
commonly used names of animals and plants (such as トカゲ tokage "lizard", ネコ neko "cat", and バラ bara "rose"), alongside academic, technical, and scientific terminology, including chemical and mineral names (e.g. カリウム kariumu "potassium").
In general commercial publishing and unofficial writing, katakana is also used to write native or Sino-Japanese words with less common kanji (e.g. ゴミ gomi "trash", カギ kagi "key", or ローソク rōsoku "candle") in order to make the surrounding sentence easier to parse.
onomatopoeia, such as ワンワン (wan-wan, "woof-woof"), and other sound symbolism.
emphasis, much like italicisation in European languages.⁸ This includes highlighting a specific nuance of a word—for instance, writing Hiroshima in katakana (ヒロシマ) is often used specifically to invoke the context of the atomic bombing, separating it from the geographic city (広島). It is also frequently used to indicate slang, colloquialisms, or rough language (e.g. バカ baka "idiot" or ヤる yaru).
phonetic readings of kanji, such as displaying on'yomi in dictionaries or indicating a non-standard pronunciation.
list enumeration, functioning similarly to "A, B, C" or "1, 2, 3" in English (e.g. ア, イ, ウ), often following the traditional Iroha ordering.

In official administrative and legal contexts, the usage of katakana is strictly regulated. The foundational 1952 official document drafting guidelines (公用文作成の要領, Kōyōbun sakusei no yōryō) established hiragana as the absolute default script for the state, explicitly restricting katakana to "special cases" such as foreign names and loanwords. Crucially, this 1952 directive also codified the orthographic exception allowing deeply naturalized borrowings where the consciousness of their foreign origin had faded—such as tabako (たばこ, "tobacco"), karuta (かるた, "playing cards"), and sarasa (さらさ, "chintz")—to be written in hiragana. Today, while academic publishing universally defaults to katakana for biological and scientific names, the modern administrative guidelines (公用文作成の考え方, Kōyōbun sakusei no kangaekata) mandate katakana for these terms only when their traditional kanji or readings fall outside the Jōyō table (e.g. writing リン酸, rinsan, "phosphoric acid", ヨウ素, yōso, "iodine", or フッ素, fusso, "fluorine"). Furthermore, the guidelines strictly reject the commercial publishing practice of using katakana for general native words lacking Jōyō kanji (such as gomi or kagi). To maintain hiragana as the absolute administrative default, government writers are required to render these words entirely in hiragana (e.g. ごみ, かぎ).

Due to its scholarly pedigree, katakana historically held a more formal status than hiragana. It served as the standard script alongside kanji for official government documents, laws, and imperial edicts from the Meiji period until the end of World War II (such as the Meiji Constitution).

In modern media, katakana can also be used to impart the idea that words are spoken in a foreign or otherwise unusual accent. It is frequently employed to visually represent the broken or heavily accented Japanese of non-native speakers, or the flat, emotionless delivery of robots and early text-to-speech software.

Rōmaji

The first contact of the Japanese with the Latin alphabet occurred in the 16th century, during the Muromachi period, when they had contact with Portuguese navigators, the first European people to visit the Japanese islands. The earliest Japanese romanization system was based on Portuguese orthography. It was developed around 1548 by a Japanese Catholic named Anjirō.

The Latin alphabet is used to write the following:

Latin-alphabet acronyms and initialisms, such as NATO or UFO
Japanese personal names, corporate brands, and other words intended for international use (for example, on business cards, in passports, etc.)
foreign names, words, and phrases, often in scholarly contexts
foreign words deliberately rendered to impart a foreign flavour, for instance, in commercial contexts
other Japanized words derived or originated from foreign languages, such as Jリーグ (jei rīgu, "J. League"), Tシャツ (tī shatsu, "T-shirt") or B級グルメ (bī-kyū gurume, "B-rank gourmet [cheap and local cuisines]")

Arabic numerals

Arabic numerals (as opposed to traditional kanji numerals) are often used to write numbers in horizontal text, especially when numbering things rather than indicating a quantity, such as telephone numbers, serial numbers and addresses. Arabic numerals were introduced in Japan probably at the same time as the Latin alphabet, in the 16th century during the Muromachi period, the first contact being via Portuguese navigators. These numerals did not originate in Europe, as the Portuguese inherited them during the Arab occupation of the Iberian peninsula.

In the modern period, Japanese keyboards, such as the IME (Input Method Editor), primarily default their usage to the fullwidth Unicode Arabic numerals １ as opposed to 1, though most actual usage uses the common halfwidth one 1, especially when used to represent a quantity. The fullwidth character may be used for spacing purposes aesthetically.

Hentaigana

Hentaigana (変体仮名), a set of archaic kana made obsolete by the Meiji reformation, are sometimes used to impart an archaic flavor, like in items of food (esp. soba).

Additional mechanisms

Jukujikun is the writing of words using kanji that reflect the meaning of the word though the pronunciation of the word is entirely unrelated to the usual pronunciations of the constituent kanji. Conversely, ateji is the employment of kanji that appear solely to represent the sound of the compound word but are, conceptually, utterly unrelated to the signification of the word.

Examples

Sentences are commonly written using a combination of all three Japanese scripts: kanji (in red), hiragana (in purple), and katakana (in orange), and in limited instances also include Latin alphabet characters (in green) and Arabic numerals (in black):

Tシャツを3枚購入しました。

The same text can be transliterated to the Latin alphabet (rōmaji), although this will generally only be done for the convenience of foreign language speakers:

Tīshatsu o san-mai kōnyū shimashita.

Translated into English, this reads:

I bought 3 T-shirts.

All words in modern Japanese can be written using hiragana, katakana, and rōmaji, while only some have kanji. Words that have no dedicated kanji may still be written with kanji by employing either ateji (as in man'yogana, から = 可良) or jukujikun, as in the title of とある科学の超電磁砲 (超電磁砲 being used to represent レールガン).

Kanji	Hiragana	Katakana	Rōmaji	English translation
私	わたし	ワタシ	watashi	I, me
金魚	きんぎょ	キンギョ	kingyo	goldfish
煙草 or 莨	たばこ	タバコ	tabako	tobacco, cigarette
東京	とうきょう	トーキョー	tōkyō	Tokyo, literally meaning "eastern capital"
八十八	やそはち	ヤソハチ	yasohachi	eighty-eight
none	です	デス	desu	is, am, to be (hiragana, of Japanese origin); death (katakana, of English origin)

Although rare, there are some words that use all three scripts in the same word. An example of this is the term くノ一 (rōmaji: kunoichi), which uses a hiragana, a katakana, and a kanji character, in that order. It is said that if all three characters are put in the same kanji "square", they all combine to create the kanji 女 (woman/female). Another example is 消しゴム (rōmaji: keshigomu) which means "eraser", and uses a kanji, a hiragana, and two katakana characters, in that order.

Orthographic standards

The statutory baseline for modern Japanese orthography is established by Cabinet Notifications (内閣告示, Naikaku kokuji), most notably the Jōyō Kanji Table and the rules for attaching kana endings. While these notifications provide a broad, permissive framework for the general public, the state requires absolute uniformity for legal and administrative purposes. To achieve this, the government enforces a strict internal filter built upon two foundational pillars: the 2010 Cabinet Instruction on Kanji Usage (公用文における漢字使用等について, Kōyōbun ni okeru kanji shiyō tō ni tsuite), which provides rigid, legally binding checklists for script choices, and the 2022 official document drafting guidelines (公用文作成の考え方, Kōyōbun sakusei no kangaekata), which serve as the modern stylistic and accessibility framework. Because local municipalities, private corporations, and the commercial publishing industry routinely adopt these administrative directives to guarantee formal stylistic safety, this two-pillar framework functions as Japan's de facto standard orthography.

Despite this shared foundation, a recognized divergence exists between rigid administrative conventions and the orthography taught in compulsory school education, stemming from their differing objectives. When applying the Cabinet's rules for okurigana (inflectional kana endings), schools strictly teach the primary, unabbreviated spellings (e.g., 食品売り場, "food counter") to ensure children fully grasp the underlying morphology. Conversely, official state documents utilize the Cabinet's permitted exceptions. The 2010 Cabinet Instruction contains an exhaustive, legally binding list of exactly 186 specific uninflected compound nouns where okurigana must be systematically omitted to maximize conciseness and save physical space (e.g., 食品売場). Acknowledging that the general public relies on the explicit spellings learned in school, the modern 2022 guidelines instruct government writers to deviate from strict administrative standards and revert to educational orthography when drafting public relations materials (広報, kōhō) or general explanations.

In contrast, the grammatical principle of hiraki—writing formal nouns (like こと, koto) and auxiliary verbs in hiragana—demonstrates a top-down convergence. The Ministry of Education does not issue its own exhaustive pedagogical lists of words that must be opened into kana. Instead, to pass strict textbook authorization boards, which require educational materials to reflect "generally accepted societal standards," commercial textbook publishers voluntarily adopt the exhaustive administrative checklists codified in the 2010 Cabinet Instruction. Consequently, the government's internal bureaucratic mandates seamlessly become the fundamental grammar and spelling rules taught to Japanese students.

Historically, the visual landscape of official Japanese documents was governed by the 1952 document drafting guidelines (公用文作成の要領, Kōyōbun sakusei no yōryō). For seventy years, this framework mandated rigid, typewriter-era constraints, such as forcing the use of mazegaki (replacing non-Jōyō kanji in a compound word with hiragana, e.g., writing 改竄 as 改ざん). When the government adopted the 2022 Kangaekata guidelines, the 1952 rules were officially abolished. Modern orthography rejects forced mazegaki in favor of retaining the original kanji and attaching furigana (ruby text) to ensure both etymological clarity and reader accessibility.

Statistics

A statistical analysis of a corpus of the Japanese newspaper Asahi Shimbun from the year 1993 (around 56.6 million tokens) revealed:⁹

Character frequency
Characters	Types	Proportion of corpus (%)
Kanji	4,476	41.38
Hiragana	83	36.62
Katakana	86	6.38
Punctuation and symbols	99	13.09
Arabic numerals	10	2.07
Latin letters	52	0.46

Kanji frequency
Frequency rank	Cumulative frequency (%)
10	10.00
50	27.41
100	40.71
200	57.02
500	80.68
1,000	94.56
1,500	98.63
2,000	99.72
2,500	99.92
3,000	99.97

Collation

Collation (word ordering) in Japanese is based on the kana, which express the pronunciation of the words, rather than the kanji. The kana may be ordered using two common orderings, the prevalent gojūon (fifty-sound) ordering, or the old-fashioned iroha ordering. Kanji dictionaries are usually collated using the radical system, though other systems, such as SKIP, also exist.

Direction of writing

Traditionally, Japanese is written in a format called tategaki (縦書き), which was inherited from traditional Chinese practice. In this format, the characters are written in columns going from top to bottom, with columns ordered from right to left. After reaching the bottom of each column, the reader continues at the top of the column to the left of the current one.

Modern Japanese also uses another writing format, called yokogaki (横書き). This writing format is horizontal and reads from left to right, as in English.

A book printed in tategaki opens with the spine of the book to the right, while a book printed in yokogaki opens with the spine to the left.¹⁰

Spacing and punctuation

Japanese is normally written without spaces between words, and text is allowed to wrap from one line to the next without regard for word boundaries. This convention was originally modelled on Chinese writing, where spacing is superfluous because each character is essentially a word in itself (although compounds are common). However, in text involving kana, readers of Japanese must work out where word divisions lie based on an understanding of what makes sense. For example, あなたはお母さんにそっくりね。 must be mentally divided as あなた / は / お母さん / に / そっくり / ね。 (Anata wa okāsan ni sokkuri ne; "You're just like your mother"). In rōmaji, it may sometimes be ambiguous whether an item should be transliterated as two words or one. For example, 愛する ("to love"), composed of 愛 (ai; "love") and する (suru; (here a verb-forming suffix)), is variously transliterated as aisuru or ai suru. Particles, like the possessive particle の in 君の犬 ("your dog"), are sometimes joined with the preceding term (kimino inu), or written as separate words (kimi no inu).

Words in potentially unfamiliar foreign compounds, normally transliterated in katakana, may be separated by a punctuation mark called a 中黒 (nakaguro; "middle dot") to aid Japanese readers. For example, ビル・ゲイツ (Biru Geitsu; Bill Gates). This punctuation is also occasionally used to separate native Japanese words, especially in concatenations of kanji characters where there might otherwise be confusion or ambiguity about interpretation, and especially for the full names of people.

The Japanese full stop (。) and comma (、) are used for similar purposes to their English equivalents, though comma usage can be more fluid than is the case in English. There is no clear standard of where the positions of commas should be inserted in a Japanese sentence.¹¹ The question mark (？) is not used in traditional or formal Japanese, but it may be used in informal writing, or in transcriptions of dialogue where it might not otherwise be clear that a statement was intoned as a question. The exclamation mark (！) is restricted to informal writing. Colons and semicolons are available but are not common in ordinary text. Quotation marks are written as 「 ... 」, and nested quotation marks as 『 ... 』. Several bracket styles and dashes are available.

History of the Japanese script

Importation of kanji

Japan's first encounters with Chinese characters may have come as early as the 1st century AD with the King of Na gold seal, said to have been given by Emperor Guangwu of Han in AD 57 to a Japanese emissary.¹² However, it is unlikely that the Japanese became literate in Chinese writing any earlier than the 4th century AD.¹²

Initially Chinese characters were not used for writing Japanese, as literacy meant fluency in Classical Chinese, not vernacular Japanese. Eventually a system called kanbun (漢文) developed. This system, which closely resembled Classical Chinese in grammar and employed kanji, used diacritics to hint at the Japanese translation. Informal mokkan (木簡) wooden tablets dating from mid-7th to mid-8th century were written in both Classical Chinese and Old Japanese kanbun, suggesting that literacy was widespread in the late 7th century.¹³¹⁴ The earliest surviving written history of Japan, the Kojiki (古事記), compiled sometime before 712, was written in kanbun. Even today Japanese high schools and some junior high schools teach kanbun as part of the curriculum.

The development of man'yōgana

No full-fledged script for written Japanese existed until the development of man'yōgana (万葉仮名), which adapted kanji for their phonetic value (derived from their Chinese readings) rather than their semantic value. Man'yōgana was initially used to record poetry, as in the Man'yōshū (万葉集), compiled sometime before 759, whence the writing system derives its name. Some scholars claim that man'yōgana originated from Baekje, but this hypothesis is denied by mainstream Japanese scholars.¹⁵¹⁶ The modern kana, namely hiragana and katakana, are simplifications and systemizations of man'yōgana.

Due to the large number of words and concepts entering Japan from China which had no native equivalent, many words entered Japanese directly, with a similar pronunciation to the original Chinese. This Chinese-derived reading is known as on'yomi (音読み), and this vocabulary as a whole is referred to as Sino-Japanese in English and kango (漢語) in Japanese. At the same time, native Japanese already had words corresponding to many borrowed kanji. Authors increasingly used kanji to represent these words. This Japanese-derived reading is known as kun'yomi (訓読み). A kanji may have none, one, or several on'yomi and kun'yomi. Okurigana are written after the initial kanji for verbs and adjectives to give inflection and to help disambiguate a particular kanji's reading. The same character may be read several different ways depending on the word. For example, the character 行 is read i as the first syllable of iku (行く; "to go"), okona as the first three syllables of okonau (行う; "to carry out"), gyō in the compound word gyōretsu (行列; "line" or "procession"), kō in the word ginkō (銀行; "bank"), and an in the word andon (行灯; "lantern").

Some linguists have compared the Japanese borrowing of Chinese-derived vocabulary as akin to the influx of Romance vocabulary into English during the Norman conquest of England. Like English, Japanese has many synonyms of differing origin, with words from both Chinese and native Japanese. Sino-Japanese is often considered more formal or literary, just as latinate words in English often mark a higher register.

Script reforms

Meiji period

The significant reforms of the 19th century Meiji era did not initially impact the Japanese writing system. However, the language itself was changing due to the increase in literacy resulting from education reforms, the massive influx of words (both borrowed from other languages or newly coined), and the ultimate success of movements such as the influential genbun itchi (言文一致) which resulted in Japanese being written in the colloquial form of the language instead of the wide range of historical and classical styles used previously. The difficulty of written Japanese was a topic of debate, with several proposals in the late 19th century that the number of kanji in use be limited. In addition, exposure to non-Japanese texts led to unsuccessful proposals that Japanese be written entirely in kana or rōmaji. This period saw Western-style punctuation marks introduced into Japanese writing.¹⁷

In 1900, the Education Ministry introduced three reforms aimed at improving the process of education in Japanese writing:

standardization of hiragana, eliminating the range of hentaigana then in use;
restriction of the number of kanji taught in elementary schools to about 1,200;
reform of the irregular kana representation of the Sino-Japanese readings of kanji to make them conform with the pronunciation.

The first two of these were generally accepted, but the third was hotly contested, particularly by conservatives, to the extent that it was withdrawn in 1908.¹⁸

Pre–World War II

The partial failure of the 1900 reforms combined with the rise of nationalism in Japan effectively prevented further significant reform of the writing system. The period before World War II saw numerous proposals to restrict the number of kanji in use, and several newspapers voluntarily restricted their kanji usage and increased usage of furigana; however, there was no official endorsement of these, and much opposition. However, one successful reform was the standardization of hiragana, which involved reducing the possibilities of writing down Japanese morae down to only one hiragana character per morae, which led to labeling all the other previously used hiragana as hentaigana and discarding them in daily use.¹⁹

Post–World War II

The period immediately following World War II saw a rapid and significant reform of the writing system. This was in part due to influence of the Occupation authorities, but to a significant extent was due to the removal of traditionalists from control of the educational system, which meant that previously stalled revisions could proceed. The major reforms were:

gendai kanazukai (現代仮名遣い)—alignment of kana usage with modern pronunciation, replacing the old historical kana usage (1946);
promulgation of various restricted sets of kanji:
- tōyō kanji (当用漢字) (1946), a collection of 1850 characters for use in schools, textbooks, etc.;
- kanji to be used in schools (1949);
- an additional collection of jinmeiyō kanji (人名用漢字), which, supplementing the tōyō kanji, could be used in personal names (1951);
simplifications of various complex kanji letter-forms shinjitai (新字体).

At one stage, an advisor in the Occupation administration proposed a wholesale conversion to rōmaji, but it was not endorsed by other specialists and did not proceed.²⁰

In addition, the practice of writing horizontally in a right-to-left direction was generally replaced by left-to-right writing. The right-to-left order was considered a special case of vertical writing, with columns one character high, rather than horizontal writing per se; it was used for single lines of text on signs, etc. (e.g., the station sign at Tokyo reads 駅京東, which is 東京駅 from right-to-left).¹⁰ The post-war reforms have mostly survived, although some of the restrictions have been relaxed. The replacement of the tōyō kanji in 1981 with the 1,945 jōyō kanji (常用漢字)—a modification of the tōyō kanji—was accompanied by a change from "restriction" to "recommendation", and in general the educational authorities have become less active in further script reform.²¹ In 2004, the jinmeiyō kanji (人名用漢字), maintained by the Ministry of Justice for use in personal names, was significantly enlarged. The jōyō kanji list was later extended to 2,136 characters in 2010.⁴

Romanization

There are a number of methods of rendering Japanese in Roman letters. The Hepburn method of romanization, designed for English speakers, is a de facto standard widely used inside and outside Japan. The Kunrei-shiki system has a better correspondence with Japanese phonology. There are differences in the romanization, such as Kunrei-shiki writing "ち" as "ti", while the Hepburn writes it as "chi".²² Other systems of romanization include Nihon-shiki, JSL, and Wāpuro rōmaji.

Lettering styles

Variant writing systems

References

Serge P. Shohov (2004). Advances in Psychology Research. Nova Publishers. p. 28. ISBN 978-1-59033-958-9.
Kazuko Nakajima (2002). Learning Japanese in the Network Society. University of Calgary Press. p. xii. ISBN 978-1-55238-070-3.
Seeley 1991, p. ix.
"Japanese Kanji List". www.saiga-jp.com. Archived from the original on March 4, 2016. Retrieved February 23, 2016.
Joseph F. Kess; Tadao Miyamoto (January 1, 1999). The Japanese Mental Lexicon: Psycholinguistics Studies of Kana and Kanji Processing. John Benjamins Publishing. p. 107. ISBN 90-272-2189-8.
文化審議会 (January 7, 2022). "（3）常用漢字表に使える漢字があっても仮名で書く場合" (PDF). 公用文作成の考え方（建議） (in Japanese). p. 11–12.
Vovin, Alexander (2010). "Is Japanese Katakana Derived from Korean Kwukyel?". In Lee, Sang-Oak (ed.). Contemporary Korean Linguistics: International Perspectives. Thaehaksa Publishing. ISBN 978-89-5966-389-7.
"A Walk-Through of the Japanese Alphabet | Motto Japan Media - Japanese Culture & Living in Japan". motto-jp.com. Motto Japan. Archived from the original on November 15, 2025. Retrieved December 22, 2024.
Chikamatsu, Nobuko; Yokoyama, Shoichi; Nozaki, Hironari; Long, Eric; Fukuda, Sachio (2000). "A Japanese logographic character frequency list for cognitive science research". Behavior Research Methods, Instruments, & Computers. 32 (3): 482–500. doi:10.3758/BF03200819. PMID 11029823. S2CID 21633023.
Shiraishi-Miles, Rebecca (March 11, 2017). "Is Japanese Read from Right to Left or Left to Right?". Team Japanese. Retrieved February 7, 2025.
Murata, Masaki; Ohno, Tomohiro; Matsubara, Shigeki (2010). "Automatic Comma Insertion for Japanese Text Generation" (PDF). 2010 Conference on Empirical Methods in Natural Language Processing: 892–901.
Miyake 2003.
Piggott 1990.
Frellesvig, 2010 & 22. sfn error: no target: CITEREFFrellesvig201022 (help)
Shunpei Mizuno, ed. (2002). 韓国人の日本偽史—日本人はビックリ! (in Japanese). Shogakukan. ISBN 978-4-09-402716-7.
Shunpei Mizuno, ed. (2007). 韓vs日「偽史ワールド」 (in Japanese). Shogakukan. ISBN 978-4-09-387703-9.
Twine 1991.
Seeley 1991, pp. 143–144.
Hashi (January 25, 2012). "Hentaigana: How Japanese Went from Illegible to Legible in 100 Years". Tofugu. Retrieved March 11, 2016.
Unger 1996.
Gottlieb 1996.
"ローマ字表記 70年ぶり改定も視野に文化庁の審議会に検討諮問 | NHK" [Agency for Cultural Affairs asks council to consider first romanization change in 70 years]. NHK News. May 14, 2024. Archived from the original on May 14, 2024. Retrieved February 7, 2025.

Sources

Frellesvig, Bjarke (2010). A history of the Japanese language. Cambridge New York: Cambridge University Press. ISBN 978-0-521-65320-6.
Gottlieb, Nanette (1996). Kanji Politics: Language Policy and Japanese Script. Kegan Paul. ISBN 0-7103-0512-5.
Habein, Yaeko Sato (1984). The History of the Japanese Written Language. University of Tokyo Press. ISBN 0-86008-347-0.
Miyake, Marc Hideo (2003). Old Japanese: A Phonetic Reconstruction. RoutledgeCurzon. ISBN 0-415-30575-6.
Piggott, Joan R. (1990). "Mokkan. Wooden Documents from the Nara Period". Monumenta Nipponica. 45 (4): 449–470. doi:10.2307/2385379. ISSN 0027-0741. JSTOR 2385379.
Seeley, Christopher (1984). "The Japanese Script since 1900". Visible Language. XVIII. 3: 267–302.
Seeley, Christopher (1991). A History of Writing in Japan. University of Hawai'i Press. ISBN 0-8248-2217-X.
Twine, Nanette (1991). Language and the Modern State: The Reform of Written Japanese. Routledge. ISBN 0-415-00990-1.
Unger, J. Marshall (1996). Literacy and Script Reform in Occupation Japan: Reading Between the Lines. OUP. ISBN 0-19-510166-9.

External links

The Modern Japanese Writing System: An excerpt from Literacy and Script Reform in Occupation Japan, by J. Marshall Unger.
The 20th Century Japanese Writing System: Reform and Change by Christopher Seeley
Japanese Hiragana Conversion API by NTT Resonant Archived March 2, 2016, at the Wayback Machine
Japanese Morphological Analysis API by NTT Resonant Archived May 5, 2016, at the Wayback Machine

[Shohov2004-1] Serge P. Shohov (2004). Advances in Psychology Research. Nova Publishers. p. 28. ISBN 978-1-59033-958-9.

[Nakajima2002-2] Kazuko Nakajima (2002). Learning Japanese in the Network Society. University of Calgary Press. p. xii. ISBN 978-1-55238-070-3.

[FOOTNOTESeeley1991ix-3] Seeley 1991, p. ix.

[:0-4] "Japanese Kanji List". www.saiga-jp.com. Archived from the original on March 4, 2016. Retrieved February 23, 2016.

[KessMiyamoto1999-5] Joseph F. Kess; Tadao Miyamoto (January 1, 1999). The Japanese Mental Lexicon: Psycholinguistics Studies of Kana and Kanji Processing. John Benjamins Publishing. p. 107. ISBN 90-272-2189-8.

[Kangaekata-6] 文化審議会 (January 7, 2022). "（3）常用漢字表に使える漢字があっても仮名で書く場合" (PDF). 公用文作成の考え方（建議） (in Japanese). p. 11–12.

[7] Vovin, Alexander (2010). "Is Japanese Katakana Derived from Korean Kwukyel?". In Lee, Sang-Oak (ed.). Contemporary Korean Linguistics: International Perspectives. Thaehaksa Publishing. ISBN 978-89-5966-389-7.

[8] "A Walk-Through of the Japanese Alphabet | Motto Japan Media - Japanese Culture & Living in Japan". motto-jp.com. Motto Japan. Archived from the original on November 15, 2025. Retrieved December 22, 2024.

[9] Chikamatsu, Nobuko; Yokoyama, Shoichi; Nozaki, Hironari; Long, Eric; Fukuda, Sachio (2000). "A Japanese logographic character frequency list for cognitive science research". Behavior Research Methods, Instruments, & Computers. 32 (3): 482–500. doi:10.3758/BF03200819. PMID 11029823. S2CID 21633023.

[:1-10] Shiraishi-Miles, Rebecca (March 11, 2017). "Is Japanese Read from Right to Left or Left to Right?". Team Japanese. Retrieved February 7, 2025.

[11] Murata, Masaki; Ohno, Tomohiro; Matsubara, Shigeki (2010). "Automatic Comma Insertion for Japanese Text Generation" (PDF). 2010 Conference on Empirical Methods in Natural Language Processing: 892–901.

[FOOTNOTEMiyake2003-12] Miyake 2003.

[FOOTNOTEPiggott1990-13] Piggott 1990.

[FOOTNOTEFrellesvig201022-14] Frellesvig, 2010 & 22. sfn error: no target: CITEREFFrellesvig201022 (help)

[15] Shunpei Mizuno, ed. (2002). 韓国人の日本偽史—日本人はビックリ! (in Japanese). Shogakukan. ISBN 978-4-09-402716-7.

[16] Shunpei Mizuno, ed. (2007). 韓vs日「偽史ワールド」 (in Japanese). Shogakukan. ISBN 978-4-09-387703-9.

[FOOTNOTETwine1991-17] Twine 1991.

[FOOTNOTESeeley1991143–144-18] Seeley 1991, pp. 143–144.

[19] Hashi (January 25, 2012). "Hentaigana: How Japanese Went from Illegible to Legible in 100 Years". Tofugu. Retrieved March 11, 2016.

[FOOTNOTEUnger1996-20] Unger 1996.

[FOOTNOTEGottlieb1996-21] Gottlieb 1996.

[22] "ローマ字表記 70年ぶり改定も視野に文化庁の審議会に検討諮問 | NHK" [Agency for Cultural Affairs asks council to consider first romanization change in 70 years]. NHK News. May 14, 2024. Archived from the original on May 14, 2024. Retrieved February 7, 2025.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

Japanese
Japanese novel using kanji kana majiri bun (text with both kanji and kana), the most general orthography for modern Japanese. Ruby characters (or furigana) are also used to transcribe kanji words (in modern publications these would generally be omitted for well-known kanji). The text is in the traditional tategaki ("vertical writing") style; it is read down the columns and from right to left, like traditional Chinese. Published in 1908.
Script type	mixed logographic (Kanji), syllabic (hiragana and katakana)
Period	4th century AD to present
Direction	Top-to-bottom, right-to-left Left-to-right, top-to-bottom
Languages	Japanese language Ryukyuan languages Hachijō language
Related scripts
Parent systems	(See kanji and kana) Japanese
ISO 15924
ISO 15924	Jpan (413), Japanese (alias for Han + Hiragana + Katakana)
Unicode
Unicode range	U+4E00–U+9FBF Kanji U+3040–U+309F Hiragana U+30A0–U+30FF Katakana