Re: [OT] A question on input encodings

From: Tanaka Akira <akr@...17n.org>
Date: 2003-06-06 04:49:14 UTC
List: ruby-core #1132
In article <3EDF5585.1080104@pragprog.com>,
  Dave Thomas <dave@pragprog.com> writes:

> Now, someone in Japan wants to add Japanese comments (I have no idea 
> which encoding would be used, but let's say SJIS). If they fire up an 
> editor in SJIS mode, will that get confused by the existing UTF-8 
> characters in the file? If I edit the file later using a UTF-8 editor, 
> will it normalize the SJIS characters and destroy those comments? Is it
> possible in practice to edit a file that contains multiple incompatible 
> encodings?

It is not practical.

Some SJIS editors work as follows:

1. Read a file and convert it from SJIS to the internal encoding.
2. Modify the contents represented in the internal encoding.
3. Convert the contents from the internal encoding to SJIS and write them to
  the file.

If the internal encoding is also SJIS, the conversion is the identity
function for valid SJIS byte sequences, but it doesn't work well with
UTF-8 strings that cannot be recognized as SJIS.

(Some editors that assume a file is valid SJIS may read it without
conversion.  But such editors tend to SEGV in the 2nd stage when the
file is not valid SJIS.)

I imagine this is similar to editing an ISO-8859-1 file in a UTF-8
editor.  Since an ISO-8859-1 string is an almost arbitrary sequence of
bytes, some such strings are not valid UTF-8.  I think UTF-8 editors
will behave in various ways in that situation, no?
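For the ISO-8859-1 analogy, here is a small check in modern Ruby (1.9 or later, which has encoding-aware strings; the Ruby 1.8 of this thread does not). It relabels the single byte 0xC0 instead of transcoding it, which is roughly what an editor that assumes UTF-8 does when it reads an ISO-8859-1 file without conversion.

```ruby
# The byte 0xC0 as an ISO-8859-1 string: LATIN CAPITAL LETTER A WITH GRAVE.
latin1 = String.new("\xC0", encoding: Encoding::ISO_8859_1)

# Any byte sequence is valid ISO-8859-1.
p latin1.valid_encoding?                                      # => true

# Relabeling the same byte as UTF-8 (no conversion) gives malformed
# text: 0xC0 never appears in well-formed UTF-8.
p latin1.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # => false

# Transcoding instead produces the proper two-byte UTF-8 form.
p latin1.encode(Encoding::UTF_8).bytes                        # => [195, 128]
```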

If you have ruby with iconv :-), you can test a similar situation.

% ruby -riconv -e '
def Iconv.conv(to,from,str) Iconv.iconv(to,from,str).join end
iso_8859_1_str = "\xc0" # LATIN CAPITAL LETTER A WITH GRAVE
utf8_str = Iconv.conv("UTF-8", "ISO-8859-1", iso_8859_1_str)
p utf8_str

editor_internal_encoding = "Shift_JIS"
editor_external_encoding = "Shift_JIS"
p Iconv.conv(editor_internal_encoding, editor_external_encoding, utf8_str)
'
"\303\200"
-e:2:in `iconv': "\200" (Iconv::IllegalSequence)
        from -e:2:in `conv'
        from -e:9

In this case, iconv rejects the conversion.
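iconv was removed from Ruby's standard library in Ruby 2.0; since 1.9, `String#encode` covers the same ground. The following is a modern analogue of the test above, not a reproduction of it. The hypothetical editor here reads Shift_JIS but keeps text internally as UTF-8 (an assumption: with default options, `encode` from Shift_JIS to Shift_JIS would be a no-op and check nothing). The input is the UTF-8 bytes of "あ" (E3 81 82), whose last byte is a Shift_JIS lead byte with no trail byte after it. The method names are made up for illustration.

```ruby
# Hypothetical editor: files are Shift_JIS, text is held internally as UTF-8.
# Step 1 (read): decode from the external to the internal encoding.
# With no :invalid/:undef options, String#encode raises on bad input,
# much as Iconv.iconv did in the test above.
def editor_read(bytes)
  bytes.dup.force_encoding(Encoding::Shift_JIS).encode(Encoding::UTF_8)
end

def editor_rejects?(bytes)
  editor_read(bytes)
  false
rescue EncodingError # InvalidByteSequenceError and UndefinedConversionError inherit from it
  true
end

sjis_bytes = "日本".encode(Encoding::Shift_JIS).b # what an SJIS editor writes
utf8_bytes = "あ".b                               # E3 81 82, written by a UTF-8 editor

p editor_read(sjis_bytes) == "日本" # => true
p editor_rejects?(utf8_bytes)       # => true
```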

But I wouldn't be surprised by various other behaviors: removing the
non-SJIS segments, mapping them to editor-defined special characters or
to SJIS characters, SEGV, or other behaviors I can't predict.  In any
case, I can't assume the file survives a round trip.
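One of those behaviors is easy to sketch: an editor that silently substitutes a placeholder for bytes it cannot decode. The helpers below are hypothetical; they use `String#encode`'s `invalid: :replace` and `undef: :replace` options, with "?" as the placeholder (a choice made here for illustration). Loading the mixed bytes and saving them again does not give back the original file.

```ruby
# Hypothetical lenient editor: Shift_JIS on disk, UTF-8 in memory,
# and "?" substituted for anything it cannot convert.
LENIENT = { invalid: :replace, undef: :replace, replace: "?" }.freeze

def lenient_load(bytes)
  bytes.dup.force_encoding(Encoding::Shift_JIS).encode(Encoding::UTF_8, **LENIENT)
end

def lenient_save(text)
  text.encode(Encoding::Shift_JIS, **LENIENT).b
end

# An ASCII line whose comment was typed in a UTF-8 editor ("あ" = E3 81 82).
original = "x = 1 # ".b + "あ".b
saved = lenient_save(lenient_load(original))

p saved == original      # => false: the file did not round-trip
p saved.end_with?("?".b) # => true: the dangling 0x82 became "?"
```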

Of course, there are editors that don't use iconv, such as Emacs; their
programmers are free to implement whatever behavior they like, SEGV
included.

I recommend using a single encoding per file.
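Following that recommendation, a quick sanity check is to ask whether a file's bytes are well formed in the one encoding it is supposed to use. The helper name `single_encoding?` is hypothetical; for a real file the bytes would come from `File.binread(path)`.

```ruby
# Hypothetical helper: are these bytes well formed in the given encoding?
# Relabels a copy of the bytes (no conversion) and asks Ruby to validate them.
def single_encoding?(bytes, encoding)
  bytes.dup.force_encoding(encoding).valid_encoding?
end

sjis_only = "日本".encode(Encoding::Shift_JIS).b # 93 FA 96 7B
mixed     = sjis_only + "あ".b                   # plus UTF-8 E3 81 82

p single_encoding?(sjis_only, Encoding::Shift_JIS) # => true
p single_encoding?(mixed, Encoding::Shift_JIS)     # => false
p single_encoding?("あ".b, Encoding::UTF_8)        # => true
```

This catches malformed sequences only. A UTF-8 "é" (bytes C3 A9) happens to be two valid half-width katakana in Shift_JIS, so a passing check does not prove the file really is in the expected encoding.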
-- 
Tanaka Akira
