bips/bip-0039-wordlists.md at f0dd2d58ab920647a8e288c4ab93129686b20be2

mirror of https://github.com/bitcoin/bips.git synced 2025-06-30 12:42:43 +00:00

bip39jp f0dd2d58ab Clarify necessity for ideographic spaces.

I left it unclear / open to interpretation on whether to use ideographic
spaces, but realized that without being specific on its necessity,
developers may implement something that would cause trouble with the
Japanese user. (two words looking like one word, or phrase verification
failing because it can't handle ideographic spaces, etc.)

2015-03-12 14:06:22 +09:00

2.0 KiB


#Wordlists

##Wordlists (Special Considerations)

###Japanese

Developers implementing phrase generation or checksum verification must separate words with ideographic spaces when displaying a phrase, and must accept ideographic spaces in phrases that users input.
(UTF-8 bytes: 0xE38080; C/C++/Java: "\u3000"; Python: u"\u3000")
However, code that only accepts Japanese phrases but does not generate or verify them should be fine as is. When the seed is generated, the normalization required by the spec (NFKD) automatically converts ideographic spaces into normal ASCII spaces. So as long as your code never shows the user an ASCII-space-separated phrase and never tries to split the phrase input by the user, handling ASCII spaces and ideographic spaces is the same.
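The following is a minimal Python sketch of the behaviour described above, assuming the NFKD normalization that BIP-0039 applies to mnemonic sentences. The helper names (`normalize_phrase`, `split_phrase`, `format_japanese_phrase`) are hypothetical and not part of any library, and the two sample words are only for illustration.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"


def normalize_phrase(phrase):
    """Apply NFKD; this maps U+3000 (ideographic space) to U+0020."""
    return unicodedata.normalize("NFKD", phrase)


def split_phrase(phrase):
    """Split user input into words, accepting ASCII or ideographic spaces.

    The returned words are in NFKD form, so they should be compared
    against a wordlist that has been normalized the same way.
    """
    return normalize_phrase(phrase).split()


def format_japanese_phrase(words):
    """Join words with ideographic spaces for display to the user."""
    return IDEOGRAPHIC_SPACE.join(words)


# Illustrative sample input using an ideographic space as the separator.
sample = "あいこくしん\u3000あいさつ"
words = split_phrase(sample)
```

Code that only feeds the phrase into seed generation can pass the output of `normalize_phrase` along without splitting it. Code that displays a phrase would use something like `format_japanese_phrase(words)`.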
Word-wrapping doesn't work well, so it may be necessary to make sure that lines only wrap at one of the ideographic spaces. A long word split in two could easily be mistaken for two shorter words. (This would be a problem with any of the 3 character sets in Japanese.)
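One way to wrap only at ideographic spaces is to build the lines yourself rather than rely on the display layer. The sketch below is a simple greedy wrapper. `wrap_phrase` and `max_chars` are hypothetical names, the sample words are illustrative, and the width is measured in code points, which is only an approximation of the rendered width.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def wrap_phrase(words, max_chars):
    """Greedily pack whole words into lines of at most max_chars code points.

    A word is never split. A single word longer than max_chars is placed
    on a line of its own rather than being broken.
    """
    lines = []
    current = ""
    for word in words:
        candidate = word if not current else current + IDEOGRAPHIC_SPACE + word
        if current and len(candidate) > max_chars:
            # Adding this word would overflow, so start a new line with it.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Illustrative sample words; each output line breaks only between words.
wrapped = wrap_phrase(["あいこくしん", "あいさつ", "あさひ"], 11)
```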

###Spanish

Words can be uniquely determined by typing the first 4 characters (sometimes fewer).
Special Spanish characters like 'ñ', 'ü', 'á', etc. are considered equal to 'n', 'u', 'a', etc. for the purpose of identifying a word. Therefore, there is no need to use a Spanish keyboard to enter the phrase: an application with the Spanish wordlist will be able to identify each word after its first 4 characters have been typed, even if the accented characters have been replaced with their unaccented equivalents.
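As a rough illustration of this matching rule, the Python sketch below strips accents by decomposing each character and dropping the combining marks, then looks words up by their first 4 unaccented characters. The function names and the five-word `SAMPLE_WORDS` list are assumptions made for the example. A real implementation would load the full 2048-word Spanish wordlist.

```python
import unicodedata


def strip_accents(text):
    """Map accented letters to their base letters (ñ -> n, ü -> u, á -> a)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_prefix_index(wordlist, prefix_len=4):
    """Index each word by its unaccented prefix; fail if a prefix repeats."""
    index = {}
    for word in wordlist:
        key = strip_accents(word.lower())[:prefix_len]
        if key in index:
            raise ValueError("prefix %r is not unique" % key)
        index[key] = word
    return index


def resolve_word(typed, index, prefix_len=4):
    """Return the wordlist entry matching the typed text, or None."""
    key = strip_accents(typed.strip().lower())[:prefix_len]
    return index.get(key)


# Tiny illustrative sample, not the real wordlist.
SAMPLE_WORDS = ["ábaco", "abdomen", "acción", "niño", "pingüino"]
index = build_prefix_index(SAMPLE_WORDS)
```

With this index, `resolve_word("nino", index)` and `resolve_word("niño", index)` both return the same entry, so the user does not need a Spanish keyboard layout.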
There are no words in common between the Spanish wordlist and any other language wordlist, so the language can be detected from a single word.
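A minimal sketch of single-word language detection, assuming the wordlists are available as sets keyed by a language name. `detect_language` and the tiny sample sets are hypothetical. The function returns `None` when a word matches zero lists or more than one, since this sketch does not assume that other pairs of wordlists are free of shared words.

```python
import unicodedata


def nfkd(text):
    """Normalize to NFKD so both sides of a comparison use the same form."""
    return unicodedata.normalize("NFKD", text)


def detect_language(word, wordlists):
    """Return the only language whose wordlist contains word, else None."""
    target = nfkd(word)
    matches = [
        lang
        for lang, words in wordlists.items()
        if target in {nfkd(w) for w in words}
    ]
    return matches[0] if len(matches) == 1 else None


# Illustrative samples only; real code would load the complete wordlists.
SAMPLE_WORDLISTS = {
    "spanish": {"ábaco", "abdomen"},
    "english": {"abandon", "ability"},
}
detected = detect_language("ábaco", SAMPLE_WORDLISTS)
```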

###Chinese

Chinese text typically does not use any spaces as word separators. For the sake of uniformity, we propose to use normal ASCII spaces (0x20) to separate words as per standard.
