[ruby-core:109489] [Ruby master Bug#18955] Kernel#sprintf - %c ignores a non-ASCII character's encoding
From: "nobu (Nobuyoshi Nakada)" <noreply@...>
Date: 2022-08-16 06:02:06 UTC
List: ruby-core #109489
Issue #18955 has been updated by nobu (Nobuyoshi Nakada).
`%c` expects a codepoint, so I think the former example shows the currently expected behavior.
The latter example is a bug.
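To illustrate what "a codepoint is expected" can mean in practice, here is a minimal sketch that passes an Integer to `%c`. The variable names are only illustrative, and the format string is explicitly encoded as UTF-8 so the result does not depend on the source file's encoding:

```ruby
# An Integer argument to %c is treated as a codepoint in the
# format string's encoding (UTF-8 here).
fmt = "%c".encode(Encoding::UTF_8)

# 0x419 is U+0419 CYRILLIC CAPITAL LETTER SHORT I ("Й")
from_codepoint = format(fmt, 0x419)

from_codepoint.encoding            # UTF-8
from_codepoint == "\u0419"         # true
```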
----------------------------------------
Bug #18955: Kernel#sprintf - %c ignores a non-ASCII character's encoding
https://bugs.ruby-lang.org/issues/18955#change-98657
* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* ruby -v: 3.0.3
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
I haven't found a similar existing issue, so I decided to create a new one.
I noticed that `sprintf("%c", string)` doesn't behave as expected when the format string and the string argument have different encodings and the string argument contains a non-ASCII character.
In this case it seems that `sprintf` just takes the character's binary representation and interprets it in the encoding of the format string.
I would expect `sprintf` to negotiate an encoding, convert both the character and the format string to it, and raise an error when negotiation fails.
Examples to illustrate this behavior:
```ruby
format = "%c".encode("Windows-1251")
string = "Й".encode(Encoding::KOI8_U)
r = sprintf(format, string)
r.encoding
# => #<Encoding:Windows-1251>
r == "Й".encode("Windows-1251")
# => false
r.codepoints
# => [234]
string.codepoints
# => [234]
```
In this example the result has the format string's encoding, but the codepoint is unchanged: it equals the character's codepoint in the original string's encoding. It should be different:
```ruby
"Й".encode("Windows-1251").codepoints
# => [201]
```
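As a possible workaround (a sketch only, not something proposed in this report), the argument can be transcoded to the format string's encoding with `String#encode` before calling `sprintf`. The names `format_str` and `converted` are illustrative:

```ruby
format_str = "%c".encode("Windows-1251")
string = "\u0419".encode(Encoding::KOI8_U)   # "Й" in KOI8-U, single byte 0xEA

# Transcode the argument explicitly so that both strings share one encoding
converted = string.encode(format_str.encoding)
result = sprintf(format_str, converted)

result.encoding                              # Windows-1251
result == "\u0419".encode("Windows-1251")    # true
result.codepoints                            # [201]
```

With the encodings aligned up front, `sprintf` no longer has to reinterpret bytes from one encoding in another.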
Another example:
```ruby
string = "À".encode(Encoding::CP1252)
sprintf("%c", string)
# => in `sprintf': invalid byte sequence in UTF-8 (ArgumentError)
```
In this example the error indicates that `sprintf` doesn't properly transcode the character from the string's encoding to UTF-8; it just uses the raw bytes.
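The raw-byte interpretation can be checked without `sprintf` at all. The following sketch (illustrative variable names, format string assumed to be UTF-8) compares reinterpreting the CP1252 byte as UTF-8 with actually transcoding it:

```ruby
string = "\u00C0".encode(Encoding::CP1252)    # "À" as the single byte 0xC0
string.bytes                                  # [192]

# Relabeling the byte as UTF-8 without transcoding gives an invalid sequence
relabeled = string.dup.force_encoding(Encoding::UTF_8)
relabeled.valid_encoding?                     # false

# Transcoding produces the two-byte UTF-8 form of U+00C0
transcoded = string.encode(Encoding::UTF_8)
transcoded.bytes                              # [195, 128]

# With the argument already in UTF-8, %c works with a UTF-8 format string
fmt = "%c".encode(Encoding::UTF_8)
formatted = sprintf(fmt, transcoded)
formatted == "\u00C0"                         # true
```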