[ruby-talk:02490] Ruby 1.[56] and Gtk 1.4
From: Yasushi Shoji <yashi@...>
Date: 2000-04-17 00:47:10 UTC
List: ruby-talk #2490
Hello all,
could someone who knows character encodings and Ruby internals
write up a Ruby case study like the ones in the attached mail?
I assume we will just use a method such as .to_utf8 for
converting. Do we want the conversion to happen behind the scenes
for the library user? That is, should all user-strings be
automatically converted to UTF-8 in Ruby/Gtk?
Thanks in advance,
--
yashi
To: gtk-devel-list@redhat.com, gnome-bindings@helixcode.com
Cc: Moshe Zadka <moshez@math.huji.ac.il>
Subject: Strings and bindings
From: Owen Taylor <otaylor@redhat.com>
Date: 15 Apr 2000 14:13:41 -0400
Message-ID: <ybe4s93cjvu.fsf@fresnel.labs.redhat.com>
A little while ago, Moshe Zadka sent me some mail asking
about how strings would be exported to language bindings
in GTK+-1.4. I replied that I thought that GTK_TYPE_STRING
would continue to be sufficient, since strings will
always be 8-bit and UTF-8.
But, with a bit more consideration, I'm not 100% sure
that is the right answer, so I thought I'd send this
mail out.
For GTK+-1.4, there will essentially be two types of strings:
 - Strings for display. These strings are specified to
   be ISO 10646 encoded in UTF-8. We'll call these user-strings.
 - Strings not for display (for instance, the strings passed to
   gtk_object_set_data(), gtk_signal_emit_by_name() or
   gtk_text_tag_create()). These do not have a specified
   encoding. We'll call these key-strings.
   (Programs would be advised to stick to straight ASCII
   for such keys, but there is no requirement for this.)
The C language mapping (a char *), and rules for passing
and memory management are the same. However, it isn't
clear that they should always map the same for all
language bindings.
Let's look at some case studies:
Perl (5.6)
==========
Strings are not marked as Unicode or non-Unicode. Instead, UTF-8
processing can be turned on for a block using the 'use utf8' pragma.
This works very well with GTK+-1.4. No changes in the binding or
applications are necessary.
Applications should be advised to use 'use utf8' whenever processing
strings from GTK+, but that is the only change necessary.
Python (1.6)
============
There are distinct Unicode-string and normal-string types.
Conversions can be made both ways; by default the conversions
assume UTF-8, but the encoding for normal strings can also
be declared explicitly.
The simplest way of handling things is to simply say that
the binding considers all normal Python strings to be
encoded in UTF-8. Then the rules for passing strings into
GTK+ are simple:
- normal strings are passed through unconverted
 - Unicode strings are converted from the internal representation
   to UTF-8 before being passed to GTK+
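What follows is a rough Ruby analogue of these pass-in rules. It is a minimal sketch, not Ruby/Gtk's actual behavior. Modern Ruby has a single String class whose instances carry an encoding tag. Untagged byte strings (BINARY) can therefore stand in for Python's normal strings and pass through as bytes assumed to be UTF-8, while strings tagged with another encoding are transcoded first. The sketch assumes Ruby 2.0 or later (String#encode, String#b), which is far newer than the Ruby 1.4 discussed on this list. to_gtk_user_string is a hypothetical name.

```ruby
# Hypothetical helper, not part of Ruby/Gtk: produce the bytes to hand to
# GTK+ for a user-string argument. Requires Ruby 2.0+ (String#encode, String#b).
def to_gtk_user_string(str)
  case str.encoding
  when Encoding::UTF_8, Encoding::BINARY
    # Already UTF-8, or raw bytes assumed to be UTF-8: pass through unconverted.
    str.b
  else
    # Any other tagged encoding: convert to UTF-8 first.
    str.encode(Encoding::UTF_8).b
  end
end

latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
p to_gtk_user_string("caf\u00E9")  # => "caf\xC3\xA9"
p to_gtk_user_string(latin1)       # => "caf\xC3\xA9"
```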
The only question is for returning key-strings
(something which is very rare in GTK+ currently) - do we
a) Return them as unicode strings, like user visible strings
b) Return them as normal strings
Option b) requires a distinction in the type system between
the two types of GTK+ strings.
Things become more complex if you want to allow for setting
the assumed encoding for normal strings to something other
than utf-8. In that case, you need to do conversions
when passing in normal strings to GTK+ for user-strings.
I don't think current plans for Python have the idea of
a "runtime encoding for normal strings", though there
probably will be some provision for specifying the encoding
of scripts during parsing. So, this may not be a necessary
feature.
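A binding that did support a runtime encoding for normal strings would need the two-type distinction. User-strings are converted to UTF-8, while key-strings, which have no specified encoding, pass through byte for byte. Below is a minimal Ruby sketch under the same assumption (Ruby 2.0+). StringPolicy and its methods are hypothetical and do not correspond to any existing binding API.

```ruby
# Hypothetical binding-side policy: normal (byte) strings are assumed to be
# in a configurable runtime encoding. User-strings must reach GTK+ as UTF-8,
# so they are converted; key-strings have no specified encoding, so their
# bytes are passed through untouched.
class StringPolicy
  def initialize(assumed_encoding = Encoding::UTF_8)
    @assumed = assumed_encoding
  end

  # Bytes for a user-string argument: reinterpret in the assumed encoding,
  # then transcode to UTF-8.
  def user_string(bytes)
    bytes.dup.force_encoding(@assumed).encode(Encoding::UTF_8).b
  end

  # Bytes for a key-string argument: no conversion at all.
  def key_string(bytes)
    bytes.b
  end
end

policy = StringPolicy.new(Encoding::ISO_8859_1)
name = "caf\xE9".b
p policy.user_string(name)  # => "caf\xC3\xA9"
p policy.key_string(name)   # => "caf\xE9"
```

With the default (UTF-8) policy, both methods reduce to plain pass-through, which matches the simple case described earlier.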
C++
===
The C++ standard does not say anything about encodings. There
are two standard string types - string, and wstring, with
wstring being a sequence of wide characters.
One problem with wide characters is the choice between 16-bit and 32-bit characters.
GTK+, because it handles things in UTF-8, has almost no
overhead for allowing 32-bit characters, and therefore
does so. But a fixed-width wide character encoding that
uses 32-bit characters is quite expensive.
The Unicode standard currently uses only 16-bit characters,
all common characters for living languages are planned to be
included in the 16-bit space, and many systems (Windows, Java,
Python) do use 16-bit characters.
However, there will soon be some character sets defined
outside of the 16-bit "Basic Multilingual Plane", and allowing
32-bit characters is, IMO, nicer than confining oneself to
an almost-full character space.
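This Ruby snippet makes the size trade-off concrete by counting the bytes one character takes in UTF-8, UTF-16 and UTF-32. It assumes Ruby 1.9+ for String#encode, and unicode_sizes is a hypothetical helper. U+1D11E, a musical symbol, serves only as an example of a code point outside the Basic Multilingual Plane.

```ruby
# Bytes needed for one character in UTF-8, UTF-16 and UTF-32.
# Assumes Ruby 1.9+ (String#encode).
def unicode_sizes(ch)
  [Encoding::UTF_8, Encoding::UTF_16LE, Encoding::UTF_32LE].map do |enc|
    ch.encode(enc).bytesize
  end
end

p unicode_sizes("A")          # => [1, 2, 4]  ASCII: UTF-32 is 4x the UTF-8 size
p unicode_sizes("\u00E9")     # => [2, 2, 4]  e-acute, inside the BMP
p unicode_sizes("\u{1D11E}")  # => [4, 4, 4]  outside the BMP: UTF-16 needs a surrogate pair
```

A fixed-width 32-bit encoding spends 4 bytes even on ASCII. UTF-16 stays compact for the BMP but stops being fixed-width once characters beyond it appear.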
There are at least three ways I can think of to handle
GTK+'s utf-8 strings in Unicode:
 - Convert them to wstring.
 - Convert them to basic_string<gunichar>; that is, avoid
   the problem of the unspecified width by defining a new
   string type using a type of specified width.
 - Create an STL-string-like wrapper for a UTF-8 string. The
   problem here is that you don't get O(1) random access, which
   will no doubt disturb some of the people reading this.
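A short Ruby sketch shows the cost of the third option. To locate the n-th character of a UTF-8 byte string, you have to scan from the start and skip continuation bytes, so indexing is O(n) rather than O(1). The helper names are hypothetical, the input is assumed to be valid UTF-8, and Ruby 2.0+ is assumed for String#b.

```ruby
# Find the byte offset of the n-th character (0-based) in a UTF-8 byte
# string by scanning from the start. Continuation bytes have the form
# 10xxxxxx and never start a character. Assumes valid UTF-8.
def utf8_char_offset(bytes, n)
  count = 0
  bytes.each_byte.with_index do |b, i|
    next if (b & 0xC0) == 0x80   # continuation byte: skip
    return i if count == n
    count += 1
  end
  nil                            # fewer than n + 1 characters
end

# Extract the n-th character; needs two linear scans.
def utf8_char_at(bytes, n)
  start = utf8_char_offset(bytes, n)
  return nil unless start
  stop = utf8_char_offset(bytes, n + 1) || bytes.bytesize
  bytes.byteslice(start, stop - start).force_encoding(Encoding::UTF_8)
end

s = "a\u00E9\u{1D11E}z".b         # 1 + 2 + 4 + 1 bytes
p utf8_char_offset(s, 3)          # => 7
p utf8_char_at(s, 1) == "\u00E9"  # => true
```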
Whatever the solution on the C++ side, the binding situation
is much like Python's:
 - There should be implicit conversions between 8-bit strings
   and Unicode strings that assume that the 8-bit strings are
   UTF-8.
 - For simplicity, all function arguments and results could
   be wide/Unicode strings, or, for performance, you could distinguish
   between user-strings (mapped to wide/Unicode strings) and
   key-strings (mapped to strings).
If one did use the standard STL wstring type, then one would
run into the problem that there is no

    wstring (const char *eightbit_string);

constructor, so you would probably have to subclass it to add
that converter in any case. But I'm not enough of a C++ expert
to really comment.
So, the question is, do we need two types in the type system for
user-visible and non-user-visible strings or just one? My default
answer is that we should keep it simple, and just have one, but I'm
very willing to accept input on this issue.
Regards,
Owen