[ruby-core:79693] [Ruby trunk Feature#13241] Method(s) to access Unicode properties for characters/strings
From: shevegen@...
Date: 2017-02-22 17:21:55 UTC
List: ruby-core #79693
Issue #13241 has been updated by Robert A. Heiler.
Jan Lelis wrote:
> I think, it should be always plural methods which return a list of properties used in the
> string, since Ruby does not distinguish between single characters and strings. The first
> example would then rather be: "Aあア".scripts => [:hiragana, :katakana, :latin] (like the
> fourth example).
I agree: your example makes more sense than the first example,
"Aあア".script => :latin # returns script of first character only
which returns only one result. I understand it was just an example, but it confused me
because I wondered what happened to the other characters.
I like the name "property" or "properties" more than "script" - script sounds a bit
non-descript (pun intended!).
Since matz said that the name should indicate Unicode, e.g. with a unicode_ prefix,
the example by Jan Lelis seems good:
"string here".unicode_properties(optional_args)
Other name suggestions:
.unicode_category
.unicode_categories
.unicode_tokenset
.unicode_token_set
.unicode_tokens
And similar perhaps.
PS: By the way, what should it return for an empty string like ""? Or numbers
or similar semi-common tokens?
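For what it's worth, the existing regex properties give a hint about those edge cases. This is only a quick sketch of what Onigmo reports today; whether a new method would behave the same way is an assumption, since nothing has been decided:

```ruby
# Uses only regex property classes that Ruby already supports.
empty_hit    = "" =~ /\p{Latin}/           # nothing to match in an empty string
digit_common = "1".match?(/\p{Common}/)    # digits belong to the "Common" script
digit_nd     = "1".match?(/\p{Nd}/)        # general category: decimal number
digit_latin  = "1".match?(/\p{Latin}/)     # a digit is not Latin

p empty_hit      # => nil
p digit_common   # => true
p digit_nd       # => true
p digit_latin    # => false
```

So an array-returning method could plausibly give [] for "" and something like :common for digits, but that is a guess on my part.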
----------------------------------------
Feature #13241: Method(s) to access Unicode properties for characters/strings
https://bugs.ruby-lang.org/issues/13241#change-63114
* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
[This is currently an exploratory proposal.]
Onigmo allows Unicode properties in regular expressions. With this, it's e.g. possible to check whether a string contains some Hiragana:
```
"ABC あ DEF" =~ /\p{hiragana}/
```
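As a small illustration of what the existing regex support gives (a sketch using only current String and Regexp methods; the variable names are arbitrary):

```ruby
s = "ABC あ DEF"

idx  = s =~ /\p{Hiragana}/        # character index of the first Hiragana, or nil
kata = s.match?(/\p{Katakana}/)   # true/false test without setting $~
hits = s.scan(/\p{Hiragana}/)     # all Hiragana characters in the string

p idx    # => 4
p kata   # => false
p hits   # => ["あ"]
```

These calls can confirm that a given property occurs in a string, but the property has to be named in advance.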
However, it is currently impossible to ask for e.g. the script of a character. I propose to add a method (or some methods) to String to be able to get such properties. Various (to some extent conflicting) examples:
```
"Aあア".script => :latin # returns script of first character only
"Aあア".script => [:latin, :hiragana, :katakana] # returns array of property values
"Aあア".property(:script) => :latin # returns specified property of first character only
"Aあア".property(:script) => [:latin, :hiragana, :katakana] # returns array of specified properties' values
"Aあア".properties([:script, :general_category]) => [[:latin, :Lu], [:hiragana, :Lo], [:katakana, :Lo]]
# returns arrays of property values, one array per character
```
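To make the array-returning variant more concrete, here is a rough sketch of how it can be approximated today with regex properties. The helper name `scripts_of` and the `SCRIPT_PATTERNS` table are hypothetical and not part of this proposal; the table covers only a few scripts, whereas a real implementation would presumably look the value up directly in the Unicode data:

```ruby
# Hypothetical helper, not part of Ruby: tests each character against a
# small, hand-picked list of script properties that Onigmo already supports.
SCRIPT_PATTERNS = {
  latin:    /\p{Latin}/,
  hiragana: /\p{Hiragana}/,
  katakana: /\p{Katakana}/,
  han:      /\p{Han}/,
  common:   /\p{Common}/
}.freeze

def scripts_of(str)
  str.each_char.map do |ch|
    # Take the first script whose pattern matches this character.
    found = SCRIPT_PATTERNS.find { |_name, re| re.match?(ch) }
    found ? found.first : :unknown   # :unknown = not in our short list
  end
end

p scripts_of("Aあア")   # => [:latin, :hiragana, :katakana]
```

The sketch has to try every candidate script for every character, which is part of why direct access to the property value would be useful.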
The interface is still in flux, comments welcome!
Implementation depends on #13240.
In Python, such functionality (however, quite limited in property coverage, and not directly on String) is available in the standard library (see https://docs.python.org/3/library/unicodedata.html).
--
https://bugs.ruby-lang.org/