[ruby-core:68987] [Ruby trunk - Feature #11094] Remove traces of 6-byte UTF-8

From: nobu@...
Date: 2015-04-25 04:42:44 UTC
List: ruby-core #68987
Issue #11094 has been updated by Nobuyoshi Nakada.

File 0001-enc-utf_8.c-pack.c-limit-UTF-8.patch added

What about `pack("U")` and `unpack("U")`?

Also, rubyspec seems to fail with the patch:

~~~
Array#pack with format 'U' encodes values larger than UTF-8 max codepoints ERROR
RangeError: pack(U): value out of range
~~~


----------------------------------------
Feature #11094: Remove traces of 6-byte UTF-8
https://bugs.ruby-lang.org/issues/11094#change-52240

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
UTF-8 was originally defined with a codespace of up to 31 bits, and therefore with up to 6 bytes per character. For many years now, all the relevant definitions (ISO, Unicode, IETF) have restricted it to a codespace of up to 0x10FFFF and a maximum of 4 bytes per character. Many places in the Ruby code base have been updated to this 4-byte limit (e.g. EncLen_UTF8 in enc/utf_8.c), but others have not (e.g. code_to_mbclen in enc/utf_8.c). This should be fixed.
[I have classified this as a feature because I wasn't able to find a way to expose this problem in Ruby code, but this should be reclassified as a bug if such a problem can be found.]
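
To make the 4-byte limit described above concrete, here is a minimal Ruby sketch of the length calculation it implies. The method name `utf8_byte_length` and the constant `UTF8_MAX_CODEPOINT` are hypothetical, chosen for illustration only. This is not the code in enc/utf_8.c, and the sketch ignores the surrogate range.

```ruby
# Hypothetical helper, not part of Ruby or of the attached patch.
# Returns how many bytes UTF-8 needs for a code point when the
# codespace is limited to 0x10FFFF (at most 4 bytes per character).
UTF8_MAX_CODEPOINT = 0x10FFFF

def utf8_byte_length(code)
  if code < 0 || code > UTF8_MAX_CODEPOINT
    # Under the old 31-bit definition these values took 4 to 6 bytes;
    # under the current definition they are simply out of range.
    raise RangeError, "code point out of range: #{code}"
  end

  if code <= 0x7F
    1
  elsif code <= 0x7FF
    2
  elsif code <= 0xFFFF
    3
  else
    4
  end
end

# For values inside the codespace, the result agrees with pack("U").
[0x41, 0x20AC, 0x10FFFF].each do |code|
  puts format("U+%04X: %d bytes, pack gives %d",
              code, utf8_byte_length(code), [code].pack("U").bytesize)
end
```

How `pack("U")` itself should treat values above 0x10FFFF is the open question in this thread, so the sketch only compares against it for in-range code points.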

---Files--------------------------------
0001-enc-utf_8.c-pack.c-limit-UTF-8.patch (6.68 KB)


-- 
https://bugs.ruby-lang.org/
