Re: Bob Jenkins' hashing implementation in Ruby
From:
Mauricio Fern疣dez <batsman.geo@...>
Date:
2003-03-02 21:43:01 UTC
List:
ruby-core #905
On Mon, Mar 03, 2003 at 06:13:38AM +0900, Mathieu Bouchard wrote: > > On Sat, 1 Mar 2003, Mauricio Fern疣dez wrote: > > > I am using the Jenkins' implementation from Perl 5.8. Moreover, I got > > rid of the modulus operations (and the prime numbers) by using bitwise > > AND and ensuring that num_bins is 2^n - 1 (this implies that I've had > > to add 1 in a number of places, but I believe it is more important to > > make FIND_ENTRY as fast as possible). > > The point of modulo-big-prime is that you minimize the chance that a bad > hashfunction maps to a limited number of slots. With 2**n slots you > _almost_ _maximize_ it! I was assuming we would use a good hash function... > > > 10000 30000 100000 300000 500000 1000000 > > ruby-1.6.8.jenkins 0.0910 0.3198 1.2080 10.0227 7.7709 18.0764 > > ruby-1.6.8.jenkins.orig 0.0907 0.3254 1.2281 10.2327 8.0457 18.3917 > > ruby.jenkins 0.0857 0.2942 1.3588 4.2214 8.0908 17.9354 > > ruby.jenkins.orig 0.0902 0.3133 1.4469 4.5248 8.5991 18.7564 > > [BTW it seems 1.8's GC has been improved quite a bit as shown by the > > great difference for 300000 iterations; great work! :) ] > > Because in the 1.6.8 case the result of 300000 is (significantly!) bigger > than in the 500000 case, I consider the whole test to be completely bogus. > You'd have to find why it fails to produce results that make sense. It's because of the GC. I got similar speed increases when disabling it. > And THEN you'd have to make the comparison on Symbols, Fixnums, plain > Objects, Arrays of whatever, etc. Pick at least three different common > cases to make sure you aren't deoptimising too many things. We can probably find one way to hash these so that collisions don't grow too much. I'll check it when I get some time. -- _ _ | |__ __ _| |_ ___ _ __ ___ __ _ _ __ | '_ \ / _` | __/ __| '_ ` _ \ / _` | '_ \ | |_) | (_| | |_\__ \ | | | | | (_| | | | | |_.__/ \__,_|\__|___/_| |_| |_|\__,_|_| |_| Running Debian GNU/Linux Sid (unstable) batsman dot geo at yahoo dot com * JHM wonders what Joey did to earn "I'd just like to say, for the record, that Joey rules." -- Seen on #Debian