5C Problem

Jun 29, 2019 18:48
Several days ago, I encountered the "5C problem."

The 5C problem is a kind of programming errors that could occur when using Japanese characters.

Japanese characters are usually represented by two bytes in computers and programming languages.

However, when using "Shift-JIS," which is one of the Japanese character codes, the second bytes of some Japanese characters (such as 表, 十, and ソ) become '5C'.

The '5C' represents a backslash character, and it has been adopted as the escape character for many programming languages.

Because of this, some Japanese characters have a special meaning in programming, hence they could induce errors.
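The 5C Problem

The other day, I ran into the "5C problem."

The 5C problem is a problem that can occur when handling Japanese text in programming and similar contexts.

In Japanese text, each character is usually represented by two bytes.

However, with Shift-JIS, one of the character encodings for Japanese, the second byte of certain characters (for example 表, 十, and ソ) is '5C'.

On its own, '5C' is the backslash character, which many programming languages have adopted as their escape character.

As a result, certain Japanese characters take on a special meaning in programming and can trigger errors and other problems.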
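
As a small illustration of the problem described above, here is a sketch in Python, assuming its built-in "shift_jis" codec. The helper names second_byte_is_5c and naive_strip_escapes are made up for this example and do not come from any real tool. The second helper only imitates a program that looks at raw bytes without knowing about two-byte characters.

```python
# Characters named in the post, plus あ as a contrast case.
CHARS = ["表", "十", "ソ", "あ"]

BACKSLASH = 0x5C  # ASCII code of the backslash character


def second_byte_is_5c(ch: str) -> bool:
    """Return True if ch encodes to two Shift-JIS bytes and the second is 0x5C."""
    encoded = ch.encode("shift_jis")
    return len(encoded) == 2 and encoded[1] == BACKSLASH


def naive_strip_escapes(raw: bytes) -> bytes:
    """Hypothetical byte-oriented tool: it treats every 0x5C byte as an
    escape character and drops it, ignoring character boundaries."""
    return raw.replace(b"\\", b"")


if __name__ == "__main__":
    for ch in CHARS:
        # Show each character, its Shift-JIS bytes in hex, and the check result.
        print(ch, ch.encode("shift_jis").hex(), second_byte_is_5c(ch))

    original = "表示".encode("shift_jis")
    damaged = naive_strip_escapes(original)
    # The second byte of 表 is removed, so the remaining bytes no longer
    # spell the original word.
    print(original.hex(), "->", damaged.hex())
    print(damaged.decode("shift_jis", errors="replace"))
```

One common way to avoid this is to decode the bytes into a Unicode string first and only then look for backslashes, because a decoded 表 is a single character rather than a pair of bytes.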
No. 1 Fieryterminator
  • 5C Problem
  • This sentence is perfect! No correction needed!
  • Several days ago, I encountered the "5C problem."
  • This sentence is perfect! No correction needed!
  • The 5C problem is a kind of programming errors that could occur when using Japanese characters.
  • The 5C problem is a kind of programming errors that can occur when using Japanese characters.

    Instead of writing "can", you can also write "may". Using "could" makes it sound like it no longer happens, and only occurred in the past, which does not seem true here.

  • Japanese characters are usually represented by two bytes in computers and programming languages.
  • This sentence is perfect! No correction needed!
  • However, when using "Shift-JIS," which is one of the Japanese character codes, the second bytes of some Japanese characters (such as 表, 十, and ソ) become '5C'.
  • This sentence is perfect! No correction needed!
  • The '5C' represents a backslash character, and it has been adopted as the escape character for many programming languages.
  • This sentence is perfect! No correction needed!
  • Because of this, some Japanese characters have a special meaning in programming, hence they could induce errors.
  • Because of this, some Japanese characters have a special meaning in programming, hence they can induce errors.

    The word "could" here was changed to "can" for the same reason I mentioned above.

This was interesting to learn. Do you also know about the set of kanji included in the JIS character set that are not real kanji? It's very fascinating stuff.

Toru
Thank you for the correction!

> Do you also know about the set of kanji included in the JIS character set that are not real kanji?
I didn't know about that. Is it a set of kanji for jokes or something?
Fieryterminator
I spent a long time searching, but I couldn't find the page where I learned this. The story is that when the designers first set out to build a Japanese kanji character set for computers (the JIS standard), they sent requests all over the country asking for each town name in kanji. When it came time to transcribe them, though, the designers made some mistakes and accidentally invented several kanji that have no meaning but are still in the character set today. It's fascinating.
This isn't the link I was talking about, but here is a site that collects more "fake kanji":
http://zht.glyphwiki.org/wiki/Group:%E5%89%B5%E4%BD%9C%E6%BC%A2%E5%AD%97%E3%82%B3%E3%83%B3%E3%83%86%E3%82%B9%E3%83%88
Fieryterminator
It looks like the link didn't work, but if you search for "創作漢字", it should take you to them.
Toru
Wow, thank you so much for letting me know about that! The story and the fake kanji (創作漢字) are very interesting. I will look into them more. :)
No. 2 Yalmar
  • 5C Problem
  • This sentence is perfect! No correction needed!
  • Several days ago, I encountered the "5C problem."
  • This sentence is perfect! No correction needed!
  • The 5C problem is a kind of programming errors that could occur when using Japanese characters.
  • The 5C problem is a kind of programming error that could occur when using some Japanese characters.
  • Japanese characters are usually represented by two bytes in computers and programming languages.
  • Japanese characters are usually represented by two bytes in computer and programming languages.
  • However, when using "Shift-JIS," which is one of the Japanese character codes, the second bytes of some Japanese characters (such as 表, 十, and ソ) become '5C'.
  • However, when using "Shift-JIS," which is one of the Japanese character sets, the second byte of some Japanese characters (such as 表, 十, and ソ) is '5C'.
  • The '5C' represents a backslash character, and it has been adopted as the escape character for many programming languages.
  • '5C' represents a backslash character, and it has been adopted as the escape character for many programming languages.
  • Because of this, some Japanese characters have a special meaning in programming, hence they could induce errors.
  • This sentence is perfect! No correction needed!
Toru
Thank you for the correction! :)
Yalmar
You're welcome :)