[ruby-core:77535] Re: [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
From:
RRRoy BBBean <rrroybbbean@...>
Date:
2016-10-10 02:44:51 UTC
List:
ruby-core #77535
PIPES: I wrote a small gem several years ago that handled a problem with
UTF-8 I/O. The key parts, extracted from their containing module &
class, are below. This is how I dealt with Hangeul (Korean) characters
used as data for a non-web application.
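require 'open3'

@stdio_encoding = 'UTF-8'

def run
  validate_and_configure
  @stdin, @stdout, @stderr, @wait_thread = Open3.popen3(
    @cmd_text, :chdir=>@cd_path )
  # Tag all three pipes so that strings read from (and written to) the
  # child process carry the UTF-8 encoding.
  @stdin.set_encoding @stdio_encoding
  @stdout.set_encoding @stdio_encoding
  @stderr.set_encoding @stdio_encoding
  # The monitor_* and attend_thread helpers live elsewhere in the gem.
  @running = monitor_stdout && monitor_stderr && attend_thread
end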
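
For anyone who wants to try the pipe technique in isolation, here is a minimal, self-contained sketch. It is not the gem's code: the helper name capture_utf8 and the child script are assumptions made up for illustration, and stderr is drained in a thread only so that a chatty child cannot block the pipe.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper (not part of the original gem): run a command and
# return everything it wrote to stdout as a UTF-8 tagged string.
def capture_utf8(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thread|
    # Same idea as in the snippet above: set the encoding on every pipe.
    [stdin, stdout, stderr].each { |io| io.set_encoding('UTF-8') }
    stdin.close
    err_reader = Thread.new { stderr.read } # drain stderr concurrently
    out = stdout.read
    err_reader.join
    wait_thread.value # wait for the child to exit
    out
  end
end

# The child writes the raw UTF-8 bytes of two Hangeul syllables.
child_script = 'STDOUT.binmode; STDOUT.write("\xED\x95\x9C\xEA\xB8\x80")'
text = capture_utf8(RbConfig.ruby, '-e', child_script)
puts text.encoding
puts text.valid_encoding?
```
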
DIRECTORY LISTINGS: From some other code, I use this trick to read
filenames in Hangeul.
Dir.entries(@titles_path, :encoding=>'UTF-8').each {|thing_in_directory|
  ... }
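
A small runnable sketch of the same directory-listing trick, under some assumptions: the helper name utf8_entries and the temporary directory are invented for the example, and the file system is assumed to accept a Hangeul file name.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: list a directory with every name tagged as UTF-8,
# skipping the "." and ".." entries.
def utf8_entries(path)
  Dir.entries(path, encoding: 'UTF-8').reject { |name| name == '.' || name == '..' }
end

Dir.mktmpdir do |dir|
  # "\uD55C\uAE00" spells the two Hangeul syllables of the word "Hangeul".
  FileUtils.touch(File.join(dir, "\uD55C\uAE00.txt"))
  utf8_entries(dir).each { |name| puts "#{name} (#{name.encoding})" }
end
```
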
FILE I/O with BOM: For file I/O with Hangeul, I use crazy stuff like this.
BOM = "\xEF\xBB\xBF".force_encoding("UTF-8")
Note that some applications (Firefox, Notepad++) recognize the Byte
Order Mark, and other applications are befuddled when they encounter it.
I, personally, prefer to use the Byte Order Mark because it immediately
identifies the file format as UTF-8 (for applications that recognize the
BOM).
def strip_bom line
  return nil if line.nil? || line.empty?
  line.force_encoding 'UTF-8'
  line.gsub( BOM, '' )
end
Also note that when files containing the BOM are concatenated or pasted
into one another by BOM-befuddled applications, one or more Byte Order
Marks can easily become embedded within the data. That's why I use the
above method.
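
As a possible complement, Ruby's 'r:BOM|UTF-8' open mode consumes a BOM at the very start of a file, but it does not touch BOMs embedded further in. The sketch below combines the two; the names BOM_CHAR and lines_without_bom and the sample file are assumptions for illustration, not code from the gem.

```ruby
require 'tempfile'

BOM_CHAR = "\uFEFF" # U+FEFF, which UTF-8 encodes as the bytes EF BB BF

# Hypothetical helper: read a file as UTF-8 and drop every BOM, both the
# leading one (handled by the open mode) and any embedded ones.
def lines_without_bom(path)
  File.open(path, 'r:BOM|UTF-8') do |file|
    file.each_line.map { |line| line.delete(BOM_CHAR).chomp }
  end
end

Tempfile.create('bom_demo') do |tmp|
  tmp.close
  # Simulate two BOM-prefixed files that were concatenated.
  File.binwrite(tmp.path, "\uFEFFfirst\n\uFEFFsecond\n")
  p lines_without_bom(tmp.path)
end
```
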
Anyway, I learned to cope with some of the UTF-8 issues in Ruby, because
of my work with Korean. I like the way Ruby handles UTF-8 now, although
it would be nice if everyone could adopt UTF-8 as the de facto standard.
I'm not claiming that my coding techniques are any good, but maybe this
will help someone.
On 10/07/2016 08:25 PM, ethan_j_brown@hotmail.com wrote:
> Issue #12650 has been updated by Ethan Brown.
>
>
> If you could rethink the plan to wait until Ruby 3, that would be great.
>
> I would expect Ruby to normalize on UTF-8 strings everywhere internally, and only convert to local codepage on the boundary (such as writing to console, file, etc).
>
> We are tracking a number of issues in Puppet that we believe are caused by the current behavior:
>
> * [Puppet Throws Exception when Running Under Unicode Windows User](https://tickets.puppetlabs.com/browse/PUP-6035)
> * [Bundler Fails when Running Under a Unicode Windows User](https://tickets.puppetlabs.com/browse/PUP-6034)
> * [Puppet Crashes when Unicode User Applies Manifest](https://tickets.puppetlabs.com/browse/PUP-5822)
>
> ----------------------------------------
> Feature #12650: Use UTF-8 encoding for ENV on Windows
> https://bugs.ruby-lang.org/issues/12650#change-60787
>
> * Author: Dāvis Mosāns
> * Status: Open
> * Priority: Normal
> * Assignee:
> ----------------------------------------
> Windows environment variables support Unicode (same wide WinAPI) and so there's no reason to limit ourselves to any codepage.
> Currently ENV would use locale's encoding (console's codepage) which obviously won't work correctly for characters outside of those codepages.
>
> I've attached a patch which implements this and fixes bug #9715
>
>
> ---Files--------------------------------
> 0001-Always-use-UTF-8-encoded-environment-on-Windows.patch (3.64 KB)
>
>
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>