[ruby-core:33683] [feature:trunk] Enumerable#categorize

From: Tanaka Akira <akr@...>
Date: 2010-12-12 11:13:29 UTC
List: ruby-core #33683
Hi.

How about a method for converting an enumerable to a hash?

  enum.categorize([opts]) {|elt| [key1, ..., val] } -> hash

categorizes the elements in _enum_ and returns a hash.

The block is called for each element in _enum_.
The block should return an array which contains
one or more keys and one value.

  p (0..10).categorize {|e| [e % 3, e % 5] }
  #=> {0=>[0, 3, 1, 4], 1=>[1, 4, 2, 0], 2=>[2, 0, 3]}

The keys and value are used to construct the result hash.
If two or more keys are provided
(i.e. the length of the array is longer than 2),
the result hash will be nested.

  p (0..10).categorize {|e| [e&4, e&2, e&1, e] }
  #=> {0=>{0=>{0=>[0, 8],
  #            1=>[1, 9]},
  #        2=>{0=>[2, 10],
  #            1=>[3]}},
  #    4=>{0=>{0=>[4],
  #            1=>[5]},
  #        2=>{0=>[6],
  #            1=>[7]}}}
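
The default behavior shown in the two examples above can be approximated in plain Ruby. This is only a minimal sketch: categorize_sketch is a hypothetical helper name, not the attached patch, and it ignores the :seed, :op and :update options.

```ruby
# Hypothetical stand-in for the proposed Enumerable#categorize, default
# behavior only (no :seed, :op or :update). This is not the attached patch.
def categorize_sketch(enum)
  result = {}
  enum.each do |elt|
    ary = yield(elt)
    raise ArgumentError, "array too short" if ary.length < 2
    # Everything before the last two elements is a nesting key.
    *outer_keys, last_key, val = ary
    # Walk down one nested hash per leading key, creating them on demand.
    h = outer_keys.inject(result) { |node, k| node[k] ||= {} }
    # The innermost value is an array collecting every value for this key.
    (h[last_key] ||= []) << val
  end
  result
end

flat = categorize_sketch(0..10) { |e| [e % 3, e % 5] }
# flat == {0=>[0, 3, 1, 4], 1=>[1, 4, 2, 0], 2=>[2, 0, 3]}

nested = categorize_sketch(0..10) { |e| [e & 4, e & 2, e & 1, e] }
# nested[0][0][0] == [0, 8] and nested[4][2][1] == [7]
```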

Each value in the innermost hash is an array which contains the values for
the corresponding keys.
This behavior can be customized with the :seed, :op and :update options.

This method can take an option hash.
The available options are as follows:

- :seed specifies seed value.
- :op specifies a procedure from seed and value to next seed.
- :update specifies a procedure from seed and block value to next seed.

:seed, :op and :update customize how the innermost hash value
is generated.
:seed and :op behave like the arguments of Enumerable#inject.

If _seed_ and _op_ are specified, the result value is generated as follows.
  op.call(..., op.call(op.call(seed, v0), v1), ...)

:update works like :op, except that its second argument is the whole array
returned by the block instead of only that array's last element.

If the :seed option is not given, the first value is used as the seed.

  # The arguments for :op option procedure are the seed and the value.
  # (i.e. the last element of the array returned from the block.)
  r = [0].categorize(:seed => :s,
                     :op => lambda {|x,y|
                       p [x,y]               #=> [:s, :v]
                       1
                     }) {|e|
    p e #=> 0
    [:k, :v]
  }
  p r #=> {:k=>1}

  # The arguments for :update option procedure are the seed and the array
  # returned from the block.
  r = [0].categorize(:seed => :s,
                     :update => lambda {|x,y|
                       p [x,y]               #=> [:s, [:k, :v]]
                       1
                     }) {|e|
    p e #=> 0
    [:k, :v]
  }
  p r #=> {:k=>1}
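
The option handling described above might be sketched in plain Ruby as follows. The helper name categorize_fold is an assumption made for illustration, it is not the attached patch, and unlike the proposal it requires either :op or :update to be given.

```ruby
# Hypothetical stand-in for the :seed/:op/:update handling; the name
# categorize_fold is an assumption, and this is not the attached patch.
def categorize_fold(enum, **opts)
  op, update = opts[:op], opts[:update]
  raise ArgumentError, "both :update and :op specified" if op && update
  fn = op || update
  raise ArgumentError, "this sketch needs :op or :update" unless fn

  result = {}
  enum.each do |elt|
    ary = yield(elt)
    raise ArgumentError, "array too short" if ary.length < 2
    *outer_keys, last_key, val = ary
    h = outer_keys.inject(result) { |node, k| node[k] ||= {} }
    # :op sees only the value; :update sees the whole array from the block.
    arg = update ? ary : val
    step = lambda do |acc|
      fn.is_a?(Symbol) ? acc.public_send(fn, arg) : fn.call(acc, arg)
    end
    h[last_key] =
      if h.key?(last_key)
        step.call(h[last_key])   # fold into the value accumulated so far
      elsif opts.key?(:seed)
        step.call(opts[:seed])   # first occurrence: start from the seed
      else
        val                      # no :seed: the first value is the seed
      end
  end
  result
end

counts = categorize_fold("foobarbaz".chars, op: :+) { |ch| [ch, 1] }
# counts == {"f"=>1, "o"=>2, "b"=>2, "a"=>2, "r"=>1, "z"=>1}

pairs = categorize_fold("foobarbaz".chars, seed: [], update: :+) { |ch| [ch, 1] }
# pairs["o"] == ["o", 1, "o", 1]
```

A Symbol is sent to the accumulator, while anything else is invoked with call; this mirrors the two branches the description distinguishes for :op and :update.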

The default behavior, array construction, can be implemented as follows.
  :seed => nil
  :op => lambda {|s, v| !s ? [v] : (s << v) }
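
To see why that pair reproduces array construction, the lambda can be applied by hand with a nil seed. The sketch below handles a single key only, and default_via_op is a hypothetical helper name, not part of the patch.

```ruby
# Applies the :op lambda quoted above by hand, starting from a nil seed.
# default_via_op is a hypothetical name used only for this illustration.
def default_via_op(enum)
  op = lambda { |s, v| !s ? [v] : (s << v) }
  result = {}
  enum.each do |elt|
    key, val = yield(elt)
    # A missing key reads as nil (the seed), so op starts a fresh array;
    # afterwards op appends to the array already stored for that key.
    result[key] = op.call(result[key], val)
  end
  result
end

grouped = default_via_op(1..6) { |e| [e % 3, e] }
# grouped == (1..6).group_by { |i| i % 3 }
```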

Note that matz is not satisfied with the method name, "categorize".
[ruby-dev:42681]

Also note that matz wants a separate method in which each hash value is
the last value for its key rather than an array of all values.
This can be implemented with enum.categorize(:op=>lambda {|x,y| y}) { ... },
but a good method name has not been found yet.
[ruby-dev:42643]
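
For a single key, the "last value wins" variant can be approximated with existing methods. This is a sketch under that assumption; last_value_hash is a placeholder name, since no name has been agreed on.

```ruby
# Placeholder name; sketches the "last value wins" variant for one key.
def last_value_hash(enum)
  enum.each_with_object({}) do |elt, h|
    key, val = yield(elt)
    h[key] = val   # a later value simply overwrites an earlier one
  end
end

h = { "n" => 100, "m" => 100, "y" => 300, "d" => 200, "a" => 0 }
inverted = last_value_hash(h) { |k, v| [v, k] }
# inverted == h.invert
```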
-- 
Tanaka Akira

Attachments (1)

enum-categorize.patch (9.21 KB, text/x-diff)
% svn diff --diff-cmd diff -x '-u -p'
Index: enum.c
===================================================================
--- enum.c	(revision 30148)
+++ enum.c	(working copy)
@@ -15,7 +15,7 @@
 #include "id.h"
 
 VALUE rb_mEnumerable;
-static ID id_next;
+static ID id_next, id_call, id_seed, id_op, id_update;
 #define id_each idEach
 #define id_eqq  idEqq
 #define id_cmp  idCmp
@@ -2595,6 +2595,211 @@ enum_slice_before(int argc, VALUE *argv,
     return enumerator;
 }
 
+struct categorize_arg {
+    VALUE seed;
+    VALUE op;
+    VALUE update;
+    VALUE result;
+};
+
+static VALUE
+categorize_update(struct categorize_arg *argp, VALUE acc, VALUE ary, VALUE val)
+{
+    if (argp->op != Qundef) {
+        if (SYMBOL_P(argp->op))
+            return rb_funcall(acc, SYM2ID(argp->op), 1, val);
+        else
+            return rb_funcall(argp->op, id_call, 2, acc, val);
+    }
+    else if (argp->update != Qundef) {
+        if (SYMBOL_P(argp->update))
+            return rb_funcall(acc, SYM2ID(argp->update), 1, ary);
+        else
+            return rb_funcall(argp->update, id_call, 2, acc, ary);
+    }
+    else {
+        if (NIL_P(acc))
+            return rb_ary_new3(1, val);
+        else {
+            Check_Type(acc, T_ARRAY);
+            rb_ary_push(acc, val);
+            return acc;
+        }
+    }
+}
+
+static VALUE
+categorize_i(VALUE i, VALUE _arg, int argc, VALUE *argv)
+{
+    struct categorize_arg *argp;
+    VALUE ary, h;
+    VALUE lastk, val, acc;
+    long j;
+
+    ENUM_WANT_SVALUE();
+
+    argp = (struct categorize_arg *)_arg;
+
+    ary = rb_yield(i);
+    ary = rb_convert_type(ary, T_ARRAY, "Array", "to_ary");
+    if (RARRAY_LEN(ary) < 2) {
+        rb_raise(rb_eArgError, "array too short");
+    }
+    lastk = RARRAY_PTR(ary)[RARRAY_LEN(ary)-2];
+    val = RARRAY_PTR(ary)[RARRAY_LEN(ary)-1];
+    h = argp->result;
+    for (j = 0; j < RARRAY_LEN(ary) - 2; j++) {
+        VALUE k = RARRAY_PTR(ary)[j];
+        VALUE h2;
+        h2 = rb_hash_lookup2(h, k, Qundef);
+        if (h2 == Qundef) {
+            h2 = rb_hash_new();
+            rb_hash_aset(h, k, h2);
+        }
+        else {
+            Check_Type(h2, T_HASH);
+        }
+        h = h2;
+    }
+    acc = rb_hash_lookup2(h, lastk, Qundef);
+    if (acc == Qundef) {
+        if (argp->seed == Qundef)
+            acc = val;
+        else
+            acc = categorize_update(argp, argp->seed, ary, val);
+    }
+    else {
+        acc = categorize_update(argp, acc, ary, val);
+    }
+    rb_hash_aset(h, lastk, acc);
+    return Qnil;
+}
+
+/*
+ * call-seq:
+ *   enum.categorize([opts]) {|elt| [key1, ..., val] } -> hash
+ *
+ * categorizes the elements in _enum_ and returns a hash.
+ *
+ * The block is called for each elements in _enum_.
+ * The block should return an array which contains
+ * one or more keys and one value.
+ *
+ *   p (0..10).categorize {|e| [e % 3, e % 5] }
+ *   #=> {0=>[0, 3, 1, 4], 1=>[1, 4, 2, 0], 2=>[2, 0, 3]}
+ *
+ * The keys and value are used to construct the result hash.
+ * If two or more keys are provided
+ * (i.e. the length of the array is longer than 2),
+ * the result hash will be nested.
+ *
+ *   p (0..10).categorize {|e| [e&4, e&2, e&1, e] }
+ *   #=> {0=>{0=>{0=>[0, 8],
+ *   #            1=>[1, 9]},
+ *   #        2=>{0=>[2, 10],
+ *   #            1=>[3]}},
+ *   #    4=>{0=>{0=>[4],
+ *   #            1=>[5]},
+ *   #        2=>{0=>[6],
+ *   #            1=>[7]}}}
+ *
+ * The value of innermost hash is an array which contains values for
+ * corresponding keys.
+ * This behavior can be customized by :seed, :op and :update option.
+ *
+ *   a = [{:fruit => "banana", :color => "yellow", :taste => "sweet"},    
+ *        {:fruit => "melon", :color => "green", :taste => "sweet"},      
+ *        {:fruit => "grapefruit", :color => "yellow", :taste => "tart"}]      
+ *   p a.categorize {|h| h.values_at(:color, :fruit) }
+ *   #=> {"yellow"=>["banana", "grapefruit"], "green"=>["melon"]}
+ *
+ *   pp a.categorize {|h| h.values_at(:taste, :color, :fruit) }
+ *   #=> {"sweet"=>{"yellow"=>["banana"], "green"=>["melon"]},
+ *   #    "tart"=>{"yellow"=>["grapefruit"]}}
+ *
+ * This method can take an option hash.
+ * Available options are follows:
+ *
+ * - :seed specifies seed value.
+ * - :op specifies a procedure from seed and value to next seed.
+ * - :update specifies a procedure from seed and block value to next seed.
+ *
+ * :seed, :op and :update customizes how to generate
+ * the innermost hash value.
+ * :seed and :op behavies like Enumerable#inject.
+ *
+ * If _seed_ and _op_ is specified, the result value is generated as follows.
+ *   op.call(..., op.call(op.call(seed, v0), v1), ...)
+ *
+ * :update works as :op except the second argument is the block value itself
+ * instead of the last value of the block value.
+ *
+ * If :seed option is not given, the first value is used as the seed.
+ *
+ *   # The arguments for :op option procedure are the seed and the value.
+ *   # (i.e. the last element of the array returned from the block.)
+ *   r = [0].categorize(:seed => :s,
+ *                      :op => lambda {|x,y|
+ *                        p [x,y]               #=> [:s, :v]
+ *                        1
+ *                      }) {|e|
+ *     p e #=> 0
+ *     [:k, :v]
+ *   }
+ *   p r #=> {:k=>1}
+ *
+ *   # The arguments for :update option procedure are the seed and the array
+ *   # returned from the block.
+ *   r = [0].categorize(:seed => :s,
+ *                      :update => lambda {|x,y|
+ *                        p [x,y]               #=> [:s, [:k, :v]]
+ *                        1
+ *                      }) {|e|
+ *     p e #=> 0
+ *     [:k, :v]
+ *   }
+ *   p r #=> {:k=>1}
+ *
+ * The default behavior, array construction, can be implemented as follows.
+ *   :seed => nil
+ *   :op => lambda {|s, v| !s ? [v] : (s << v) }
+ *
+ */
+static VALUE
+enum_categorize(int argc, VALUE *argv, VALUE enumerable)
+{
+    VALUE opts;
+    struct categorize_arg arg;
+
+    RETURN_ENUMERATOR(enumerable, 0, 0);
+
+    rb_scan_args(argc, argv, "0:", &opts);
+
+    if (NIL_P(opts)) {
+        arg.seed = Qnil;
+        arg.op = Qundef;
+        arg.update = Qundef;
+    }
+    else {
+        arg.seed = rb_hash_lookup2(opts, ID2SYM(id_seed), Qundef);
+        arg.op = rb_hash_lookup2(opts, ID2SYM(id_op), Qundef);
+        arg.update = rb_hash_lookup2(opts, ID2SYM(id_update), Qundef);
+        if (arg.op != Qundef && arg.update != Qundef) {
+            rb_raise(rb_eArgError, "both :update and :op specified");
+        }
+        if (arg.op != Qundef && !SYMBOL_P(arg.op))
+            arg.op = rb_convert_type(arg.op, T_DATA, "Proc", "to_proc");
+        if (arg.update != Qundef && !SYMBOL_P(arg.update))
+            arg.update = rb_convert_type(arg.update, T_DATA, "Proc", "to_proc");
+    }
+
+    arg.result = rb_hash_new();
+
+    rb_block_call(enumerable, id_each, 0, 0, categorize_i, (VALUE)&arg);
+
+    return arg.result;
+}
+
 /*
  *  The <code>Enumerable</code> mixin provides collection classes with
  *  several traversal and searching methods, and with the ability to
@@ -2662,6 +2867,11 @@ Init_Enumerable(void)
     rb_define_method(rb_mEnumerable, "cycle", enum_cycle, -1);
     rb_define_method(rb_mEnumerable, "chunk", enum_chunk, -1);
     rb_define_method(rb_mEnumerable, "slice_before", enum_slice_before, -1);
+    rb_define_method(rb_mEnumerable, "categorize", enum_categorize, -1);
 
     id_next = rb_intern("next");
+    id_call = rb_intern("call");
+    id_seed = rb_intern("seed");
+    id_op = rb_intern("op");
+    id_update = rb_intern("update");
 }
Index: test/ruby/test_enum.rb
===================================================================
--- test/ruby/test_enum.rb	(revision 30148)
+++ test/ruby/test_enum.rb	(working copy)
@@ -384,4 +384,33 @@ class TestEnumerable < Test::Unit::TestC
                  ss.slice_before(/\A...\z/).to_a)
   end
 
+  def test_categorize
+    assert_equal((1..6).group_by {|i| i % 3 },
+                 (1..6).categorize {|e| [e % 3, e] })
+    assert_equal(Hash[ [ ["a", 100], ["b", 200] ] ],
+                 [ ["a", 100], ["b", 200] ].categorize(:op=>lambda {|x,y| y }) {|e| e })
+    h = { "n" => 100, "m" => 100, "y" => 300, "d" => 200, "a" => 0 }
+    assert_equal(h.invert,
+                 h.categorize(:op=>lambda {|x,y| y }) {|k, v| [v, k] })
+    assert_equal({"f"=>1, "o"=>2, "b"=>2, "a"=>2, "r"=>1, "z"=>1},
+                 "foobarbaz".split(//).categorize(:op=>:+) {|ch| [ch, 1] })
+    assert_equal({"f"=>1, "o"=>2, "b"=>2, "a"=>2, "r"=>1, "z"=>1},
+                 "foobarbaz".split(//).categorize(:update=>lambda {|s, a| s + a.last }) {|ch| [ch, 1] })
+    assert_equal({"f"=>["f", 1],
+                  "o"=>["o", 1, "o", 1],
+                  "b"=>["b", 1, "b", 1],
+                  "a"=>["a", 1, "a", 1],
+                  "r"=>["r", 1],
+                  "z"=>["z", 1]},
+                 "foobarbaz".split(//).categorize(:seed=>[], :update=>:+) {|ch| [ch, 1] })
+    assert_raise(ArgumentError) { [0].categorize {|e| [] } }
+    assert_raise(ArgumentError) { [0].categorize {|e| [1] } }
+    assert_equal(
+      {"f"=>{"o"=>{"o"=>{:c=>1}}},
+       "b"=>{"a"=>{"r"=>{:c=>1},
+                   "z"=>{:c=>1}}}},
+      %w[foo bar baz].categorize(:op=>:+) {|s| s.split(//) + [:c, 1] })
+    #assert_raise(TypeError) { [[1, 2], [1, 2, 3]].categorize {|e| e } }
+  end
+
 end
