[#108461] [Ruby master Bug#18762] Add an Array#undigits that compliments Integer#digits — "shan (Shannon Skipper)" <noreply@...>

Issue #18762 has been reported by shan (Shannon Skipper).

8 messages 2022/05/02

[#108499] [Ruby master Bug#18767] IO.foreach hangs up when passes limit=0 — "andrykonchin (Andrew Konchin)" <noreply@...>

Issue #18767 has been reported by andrykonchin (Andrew Konchin).

9 messages 2022/05/10

[#108500] [Ruby master Bug#18768] Inconsistent behavior of IO, StringIO and String each_line methods when return paragraph and chomp: true passed — "andrykonchin (Andrew Konchin)" <noreply@...>

Issue #18768 has been reported by andrykonchin (Andrew Konchin).

7 messages 2022/05/10

[#108511] [Ruby master Feature#18773] deconstruct to receive a range — "kddeisz (Kevin Newton)" <noreply@...>

Issue #18773 has been reported by kddeisz (Kevin Newton).

12 messages 2022/05/11

[#108514] [Ruby master Feature#18774] Add Queue#pop(timeout:) — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18774 has been reported by Eregon (Benoit Daloze).

17 messages 2022/05/11

[#108522] [Ruby master Feature#18776] Object Shapes — "jemmai (Jemma Issroff)" <noreply@...>

Issue #18776 has been reported by jemmai (Jemma Issroff).

25 messages 2022/05/11

[#108543] [Ruby master Bug#18779] `GC.compact` and other compaction related methods should be defined as rb_f_notimplement on non supported platforms. — "byroot (Jean Boussier)" <noreply@...>

Issue #18779 has been reported by byroot (Jean Boussier).

10 messages 2022/05/13

[#108546] [Ruby master Bug#18780] Incorrect binding receiver for C API rb_eval_string() — "daveola (David Stellar)" <noreply@...>

Issue #18780 has been reported by daveola (David Stellar).

21 messages 2022/05/13

[#108549] [Ruby master Bug#18781] MJIT tests failing with Ubuntu focal with gcc-11 and some flags — "jaruga (Jun Aruga)" <noreply@...>

Issue #18781 has been reported by jaruga (Jun Aruga).

8 messages 2022/05/14

[#108552] [Ruby master Bug#18782] Race conditions in autoload when loading the same feature with multiple threads. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18782 has been reported by ioquatix (Samuel Williams).

11 messages 2022/05/14

[#108565] [Ruby master Bug#18784] `FileUtils.rm_f` and `FileUtils.rm_rf` should not mask exceptions — deivid <noreply@...>

Issue #18784 has been reported by deivid (David Rodr鱈guez).

33 messages 2022/05/16

[#108590] [Ruby master Feature#18788] Support passing Regexp options as String to Regexp.new — janosch-x <noreply@...>

Issue #18788 has been reported by janosch-x (Janosch M端ller).

10 messages 2022/05/17

[#108659] [Ruby master Bug#18798] `UnboundMethod#==` with inherited classes — "ko1 (Koichi Sasada)" <noreply@...>

Issue #18798 has been reported by ko1 (Koichi Sasada).

16 messages 2022/05/24

[#108708] [Ruby master Bug#18808] Cannot compile ruby 3.1.2 on powerpc64le-linux without disabling the jit features — "npn (John Davis)" <noreply@...>

Issue #18808 has been reported by npn (John Davis).

17 messages 2022/05/26

[#108724] [Ruby master Feature#18809] Add Numeric#ceildiv — "kyanagi (Kouhei Yanagita)" <noreply@...>

Issue #18809 has been reported by kyanagi (Kouhei Yanagita).

9 messages 2022/05/27

[#108728] [Ruby master Bug#18810] Make `Kernel#p` interruptable. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18810 has been reported by ioquatix (Samuel Williams).

13 messages 2022/05/28

[ruby-core:108498] [Ruby master Feature#18611] Promote best practice for combining multiple values into a hash code

From: "ioquatix (Samuel Williams)" <noreply@...>
Date: 2022-05-10 06:24:17 UTC
List: ruby-core #108498
Issue #18611 has been updated by ioquatix (Samuel Williams).


This makes me happy, thanks everyone!

----------------------------------------
Feature #18611: Promote best practice for combining multiple values into a hash code
https://bugs.ruby-lang.org/issues/18611#change-97543

* Author: chrisseaton (Chris Seaton)
* Status: Closed
* Priority: Normal
----------------------------------------
User-defined hash methods often work by combining the hash code of several values into one. This requires some logic to combine the values. In our experience, users are making a variety of choices for the algorithm for this, and in many cases are picking an option which may not be the best for security and performance in multiple ways. It's also a shame that users are having to think about how to do this basic operation themselves.

For example, this hash method creates a single hash code by combining the hash value of three values that make up the object. The user has combined the values using the xor operator and multiplication by prime numbers to distribute bits. This is an ok way to combine multiple values into a hash code.

```ruby
def hash
  x.hash ^ (y.hash * 3) ^ (z.hash * 5)
end
```

But users have to know to do this, and they sometimes get it wrong, writing things like this.

```ruby
def hash
  x.hash ^ y.hash ^ z.hash
end
```

This isn't distributing the bits very well. A bad hash code may harm performance if it cause more collisions in a hash table. Collisions may also cause excess cache eviction, which would further harm performance. If performance is reduced in this way there's a potential security risk due to denial-of-service. (We don't think this is an immediate practical security problem, which is why we're discussing in the open issue tracker, not the security mailing list.)

The `x.hash ^ (y.hash * 3) ^ (z.hash * 5)` pattern is still not ideal, as users have to manually write it, and it's a lot of logic to execute in the Ruby interpreter, when it could be possibly be done in native code instead. A better pattern we think is this:

```ruby
def hash
  [x, y, z].hash
end
```

This leaves the logic of creating a suitable and safe hash value to `[...].hash`, which does it correctly.

Why doesn't everyone already use this pattern? Because it's not documented as the right thing to do. We want to present a couple of options for what could be done to encourage people to use this pattern or an equivalent, to help people write more concise and clear code that is also more performant and secure.

# Document `[...].hash` as best practice and optimise it

If we want people to use `[...].hash`, we should say that in the documentation for `Kernel#hash` as the best practice. Wording along the lines of

> If you're implementing a hash code for a compound set of values, best practice is to combine them with `[...].hash`. For example....

This way people reading the documentation on `Kernel#hash` get pointed in the clear, concise, performant, secure direction.

We can combine this recommendation with an optimisation to `[...].hash` to remove the array allocation in implementation of Ruby without escape analysis and scalar replacement, similar to what is done for `Array#min` and `#max`. This way the best practice is even faster.

# Introduce a new similar method, but specifically for the purpose so it is discoverable

A second option is to introduce a new method, specifically for this task, `hash_objects(...)`. This is inspired By Java's `Objects.hash(...)`. The reason for the new method is that it should make it more discoverable - if you go looking for a tool to combine hash values you'd find one. We'd still link to it from `Kernel#hash`. This method would not require the array allocation removal optimisation, as it's just a simple call.

# Examples of hash methods

Even the MRI codebase has some suboptimal hash methods we don't need to look very far for examples. For example lib `lib/resolv.rb`, these two hash methods don't distribute the bits they combine

* https://github.com/ruby/ruby/blob/c445963575a8572f6b0baf7135093c128adab3b9/lib/resolv.rb#L1734
* https://github.com/ruby/ruby/blob/c445963575a8572f6b0baf7135093c128adab3b9/lib/resolv.rb#L1307

Both these examples could be replaced with either of our proposals.

A good example of someone already using best practice is this.

* https://github.com/ruby/ruby/blob/128972189284f4338722e8a910d0b4f6e7a02b31/lib/bundler/source/git.rb#L50

But this would still be faster with the optimisation we proposed, or using `hash_objects(...)`, as that'd remove the array allocation and the `hash` call.

# Other things we've already done

We've proposed a RuboCop cop to try to catch the pattern we think is suboptimal https://github.com/rubocop/rubocop/pull/10441.

Co-authored with @sambostock.




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread

Prev Next