Shell does not support UTF-8 #13
Comments
Do you have some examples that don't work so we can more easily attempt to test/verify? (Perhaps a short shell script which shows the failure, or specific characters you're having trouble inputting?) I've done some very simple testing with:
$ docker run -it --rm busybox
/ # echo '?'
☠
/ # echo -e "\xE2\x98\xA0"
☠
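As a cross-check on the second command, the three escaped bytes should be the UTF-8 encoding of a single character, U+2620 (skull and crossbones). A minimal Python sketch for verifying that outside the container follows; it is not part of the original test, and the variable names are only illustrative.

```python
# The bytes passed to `echo -e` in the test above.
raw = b"\xE2\x98\xA0"

# Decode them as UTF-8; a single code point is expected.
char = raw.decode("utf-8")
code_point = ord(char)

# Show the character alongside its code point in hex and decimal.
print(char, hex(code_point), code_point)
```

If the decode succeeds and yields one character, the output path is handling valid UTF-8, which matches what the busybox shell printed.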
Output works, sorry about that, but input does not. When I type UTF-8 characters, the shell shows '?' marks, as you pointed out. In vi they show up as '.' marks (docker run -it --rm --entrypoint vi busybox), but the text edited in vi is printed correctly.
Interesting -- while looking into this, I see
This looks related: CONFIG_SUBST_WCHAR:
Typical values are 63 for '?' (works with any output device),
30 for ASCII substitute control code,
65533 (0xfffd) for Unicode replacement character.
Symbol: SUBST_WCHAR [=63]
Prompt: Character code to substitute unprintable characters with
Defined at Config.in:186
Depends on: UNICODE_SUPPORT
Location:
-> Busybox Settings
-> General Configuration
-> Support Unicode (UNICODE_SUPPORT [=y])
And this: CONFIG_LAST_SUPPORTED_WCHAR:
Any character with Unicode value bigger than this is assumed
to be non-printable on output device. Many applets replace
such chars with substitution character.
The idea is that many valid printable Unicode chars are
nevertheless not displayed correctly. Think about
combining characters, double-wide hieroglyphs, obscure
characters in dozens of ancient scripts...
Many terminals, terminal emulators, xterms etc will fail
to handle them correctly. Choose the smallest value
which suits your needs.
Typical values are:
126 - ASCII only
767 (0x2ff) - there are no combining chars in [0..767] range
(the range includes Latin 1, Latin Ext. A and B),
code is ~700 bytes smaller for this case.
4351 (0x10ff) - there are no double-wide chars in [0..4351] range,
code is ~300 bytes smaller for this case.
12799 (0x31ff) - nearly all non-ideographic characters are
available in [0..12799] range, including
East Asian scripts like katakana, hiragana, hangul,
bopomofo...
0 - off, any valid printable Unicode character will be printed.
Symbol: LAST_SUPPORTED_WCHAR [=767]
Prompt: Range of supported Unicode characters
Defined at Config.in:195
Depends on: UNICODE_SUPPORT
Location:
-> Busybox Settings
-> General Configuration
-> Support Unicode (UNICODE_SUPPORT [=y])
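To make the interaction of the two settings concrete, here is a rough Python model of the rule as the help text describes it: any character whose code point is above LAST_SUPPORTED_WCHAR gets replaced with SUBST_WCHAR, and 0 turns the limit off. The function `substitute` and its parameter names are hypothetical, and this is only a sketch of the documented behavior, not BusyBox's actual implementation (it ignores control characters and width handling, for example).

```python
def substitute(text: str, last_supported: int = 767, subst: int = 63) -> str:
    """Hypothetical model of the help text's rule, not BusyBox's real code.

    last_supported models LAST_SUPPORTED_WCHAR (0 means no limit);
    subst models SUBST_WCHAR (63 is '?').
    """
    if last_supported == 0:
        # "0 - off": nothing is replaced in this simplified model.
        return text
    return "".join(
        ch if ord(ch) <= last_supported else chr(subst) for ch in text
    )


# Cyrillic letters live in U+0400..U+04FF (1024..1279), above 767.
print(substitute("Привет"))
# Latin-1 letters such as U+00E9 (233) are inside the [0..767] range.
print(substitute("café"))
# With the limit off, the Cyrillic text passes through unchanged.
print(substitute("Привет", last_supported=0))
```

Under this model, the default values shown above (767 and 63) would turn Russian text and the U+2620 test character into '?' marks, which is at least consistent with the symptom reported for shell input.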
For what it's worth, here are the values Debian uses for its
Contrast that with Alpine's values:
See docker-library/busybox#13 for discussion.
I can't input or output UTF-8 text (Russian) in the shell.
This applies to all variants: busybox:uclibc, busybox:glibc, busybox:musl