[ruby-core:44759] Re: [ruby-trunk - Feature #6361] Bitwise string operations

From: Joshua Ballanco <jballanc@...>
Date: 2012-04-29 16:50:11 UTC
List: ruby-core #44759
On Saturday, April 28, 2012 at 8:52 AM, KOSAKI Motohiro wrote:
> On Fri, Apr 27, 2012 at 8:53 PM, MartinBosslet (Martin Bosslet)
> <Martin.Bosslet@googlemail.com> wrote:
> >  
> > Issue #6361 has been updated by MartinBosslet (Martin Bosslet).
> >  
> >  
> > nobu (Nobuyoshi Nakada) wrote:
> > > Then what kind of methods should Blob have?
> > >  
> > > And does it need to be built-in?
> >  
> > A real advantage of having it built-in could be
> > that this gives us the chance to fix #5741 at
> > the same time. I could imagine that we have two
> > kinds of "byte array" classes - one, mutable,
> > that shares COW semantics and all the other
> > optimizations with String, but with no notion of
> > encoding and a yet-to-be-defined interface.
> >  
> > And then a second class, which is basically the
> > immutable version of the first one. By sharing
> > only a reference we could ensure that the content
> > would not be proliferated and we could securely
> > erase its contents after use.
> >  
>  
>  
> I don't dislike a built-in idea. But you haven't shown a detailed spec
> and I don't think I clearly understand your idea. Can you spend a
> little time writing a spec? (A rough explanation of a few lines is
> probably enough.)
>  
>  
>  


If I may intrude for a moment… I think the advantage of having a built-in Data/Blob library would be that it could be used in all places where a data class is more appropriate than a string. For example, the Socket library currently returns Strings for data read in from a socket. I think a Data class is more appropriate here, since the socket itself carries no encoding information: either an arbitrary default encoding needs to be set, a heuristic has to guess the encoding, or the encoding is fixed by a previously agreed-upon convention. You cannot ask a socket for its encoding.
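To make the socket point concrete, here is a minimal sketch. It uses `IO.pipe` as a stand-in for a socket (an assumption made only so the example runs without a network), and the payload is made up. Bytes read with an explicit length come back tagged as binary, and the reader has to supply the encoding from some outside convention.

```ruby
# A pipe stands in for a socket here (assumption, to keep this self-contained).
reader, writer = IO.pipe
writer.binmode
writer.write("h\u00E9llo".b) # the six UTF-8 bytes of "héllo", sent as raw bytes
writer.close

raw = reader.read(6)         # IO#read with a length returns a binary string
reader.close

# The stream cannot say what the bytes mean; the caller applies a convention.
text = raw.dup.force_encoding(Encoding::UTF_8)

puts raw.encoding            # binary (ASCII-8BIT)
puts text.valid_encoding?
```

Nothing in `raw` records that the sender meant UTF-8; the `force_encoding` call encodes an agreement made outside the stream.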

As for a spec, I think it should be kept relatively simple. The one interesting optimization from NSData that might be useful is making the copy of the bytes on instantiation optional. Copying is the default, but it is also possible to create a Data object that merely points at the storage of another live object and allows byte-wise manipulation. This is particularly interesting for the case of strings, since I would guess that String and Data would have an identical storage layout, which would allow creating a Data from a String with no copying at all.

A quick attempt at a spec:

-----
Data.new #=> New, dynamically resizable container to store some bytes
Data.new('Test') #=> Can be created from any object that responds to #bytes with an enumerator
Data.new('Hello', copy_bytes: false) #=> Creates the Data from the String by merely pointing to the same storage

Data.open('./foo/test.txt') #=> Create a Data object from a File
Data.open('./bar/test.txt', copy_bytes: false) #=> Same as open above, but manipulates IO#pos for access
Data.write('./baz/test.txt') #=> Writes the bytes to disk.

d = Data.new(a_string)
d[2] #=> Returns the third byte, same as a_string.bytes.to_a[2]
d[2] = 42 #=> Same as a_string.setbyte(2, 42)
d.each #=> Equivalent to a_string.each_byte
d.length #=> Number of bytes currently being stored
d.slice(2, 4) #=> Similar to String#slice
d.slice(2, 4, copy_bytes: false) #=> New data object from slice shares storage with the original
d << other_data #=> Appends bytes from other_data
d.to_s #=> Returns a string using the default internal encoding
d.string_with_encoding('UTF-16') #=> Returns a string using the encoding passed
-----
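
A rough pure-Ruby sketch of part of the spec above, to show the intended semantics rather than a real implementation. The class name `ByteBlob` is hypothetical (chosen to avoid clashing with any existing `Data` constant), the file-based methods are left out, and "sharing storage" is approximated by holding a reference to the caller's String instead of pointing at its buffer.

```ruby
# Hypothetical ByteBlob: illustrates the proposed semantics only.
class ByteBlob
  include Enumerable

  # source: any object responding to #bytes. With copy_bytes: false and a
  # String source, the blob works directly on the caller's String.
  def initialize(source = "", copy_bytes: true)
    if source.is_a?(String) && !copy_bytes
      @bytes = source
    else
      @bytes = source.bytes.to_a.pack("C*") # private binary copy
    end
  end

  def [](index)
    @bytes.getbyte(index)
  end

  def []=(index, byte)
    @bytes.setbyte(index, byte)
  end

  def each(&block)
    return enum_for(:each) unless block_given?
    @bytes.each_byte(&block)
    self
  end

  def length
    @bytes.bytesize
  end

  # Always copies in this sketch; the copy_bytes: false variant is omitted.
  def slice(start, count)
    ByteBlob.new(@bytes.byteslice(start, count))
  end

  # Appends raw bytes. The encoding tag is switched to binary for the append
  # and then restored, so no compatibility check can fail.
  def <<(other)
    original = @bytes.encoding
    @bytes.force_encoding(Encoding::BINARY) << other.to_binary
    @bytes.force_encoding(original)
    self
  end

  def to_s
    string_with_encoding(Encoding.default_internal || Encoding.default_external)
  end

  def string_with_encoding(encoding)
    @bytes.dup.force_encoding(encoding)
  end

  def to_binary
    @bytes.b
  end
end

text = "Hello".dup
shared = ByteBlob.new(text, copy_bytes: false)
shared[0] = 74                 # writes through to text
copied = ByteBlob.new(text)    # independent copy of the current bytes
copied[0] = 72                 # text is unaffected
copied << ByteBlob.new("!")
```

After this runs, `text` has been changed through `shared`, while the edits made through `copied` stay private to it.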

I know it seems like this class is just wrapping String and always defaulting to byte-wise operations, but it's more fundamental than that. Because there is no encoding on the bytes, there will never be an encoding error when working with them. This could be extremely useful for applications that combine bytes from multiple sources (e.g. Socket data + a file on disk + immediate strings in code) that could potentially have different encodings. By operating on bytes, you can defer the encoding checks until later, if at all.
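
A small illustration of the difference, using only existing String methods (the sample strings are made up). Concatenating two Strings with incompatible encodings raises, while the same bytes combine without complaint once the encoding is out of the picture.

```ruby
utf8  = "caf\u00E9"                    # UTF-8, e.g. a literal in code
utf16 = "caf\u00E9".encode("UTF-16LE") # e.g. bytes that arrived from a file

begin
  utf8 + utf16
  string_concat_ok = true
rescue Encoding::CompatibilityError
  string_concat_ok = false             # String refuses to mix the two
end

# Treated as plain bytes, the two sources combine freely.
combined = utf8.b + utf16.b

# The encoding decision is made later, and only where it is needed.
decoded = utf8 + utf16.encode("UTF-8")
```

`String#b` is standing in for the proposed class here: it returns a binary-tagged copy, which is about as close as current Ruby gets to bytes with no encoding.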

- Josh
