[#70464] ljust, rjust... — "Chris Pine" <nemo@...>

Just thought I would run these ideas by everyone:

11 messages 2003/05/01

[#70502] temporary redirection of stdout — Andrew Walrond <andrew@...>

I'm new to ruby, so forgive any obvious stupididity, but can anyone

20 messages 2003/05/02

[#70535] SWIG on Solaris problem — Jim Freeze <jim@...>

Hi folks.

14 messages 2003/05/02

[#70594] Why is PHP so popular? What can we learn from the PHP camp? — ptkwt@...1.aracnet.com (Phil Tomson)

....and what can we learn from PHP's rapid rise to success?

99 messages 2003/05/05
[#70641] Re: Why is PHP so popular? What can we learn from the PHP camp? — Andreas Schwarz <usenet@...> 2003/05/05

Aredridel wrote:

[#70652] A wishlist for a "Ruby Standard Library" — Aredridel <aredridel@...> 2003/05/05

A wishlsit for a "Ruby Standard Library":

[#70655] Re: A wishlist for a "Ruby Standard Library" — Brian Candler <B.Candler@...> 2003/05/05

On Tue, May 06, 2003 at 07:39:54AM +0900, Aredridel wrote:

[#70673] Re: A wishlist for a "Ruby Standard Library" — Mark Wilson <mwilson13@...> 2003/05/06

[snipped many wonderful things.]

[#70759] Testing for a class existence — "Gennady" <gfb@...>

Does anybody know an easy way to test for a class/module existence in Ruby? What I do is adding a method to ObjectSpace for this, so that I can test it like:

15 messages 2003/05/06

[#70770] capture output — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

I have seen much talking about this topic, but no working code!

68 messages 2003/05/06
[#70929] Re: IO.pipe + thread = hangs (was: Re: capture output) — "Robert Klemme" <bob.news@...> 2003/05/08

[#71741] Named Pipes — Mark Firestone <nedry@...> 2003/05/19

What is the recommended procedure for using named pipes in Ruby. Does one

[#71745] Re: Named Pipes — Brian Candler <B.Candler@...> 2003/05/19

On Mon, May 19, 2003 at 06:33:17PM +0900, Mark Firestone wrote:

[#70842] Symbiosis offer: trade Ruby for German :-) — Mauricio Fern疣dez <batsman.geo@...>

17 messages 2003/05/07

[#70865] access a variables name? — "meinrad.recheis" <my.name.here@...>

is it possible to access the variable-name of an object?

14 messages 2003/05/07

[#70891] Syck 0.25 + YAML.rb -- Objects in plain-text — why the lucky stiff <ruby-talk@...>

..my faithful friends..

20 messages 2003/05/07

[#70919] petition for raa-install to be included in 1.8 — ptkwt@...1.aracnet.com (Phil Tomson)

Similar to the YamlInRuby petition which has now closed.

14 messages 2003/05/08
[#70920] Re: petition for raa-install to be included in 1.8 — Sam Roberts <sroberts@...> 2003/05/08

I just looked again, and remember why I don't know anything about

[#70921] Re: petition for raa-install to be included in 1.8 — why the lucky stiff <ruby-talk@...> 2003/05/08

You can find a tutorial on using raa-install (as well as its API) at:

[#70985] Can a global be a constant? — Jim Freeze <jim@...>

Hi

36 messages 2003/05/08
[#71001] Re: Can a global be a constant? — "Hal E. Fulton" <hal9000@...> 2003/05/08

----- Original Message -----

[#71003] Re: Can a global be a constant? — Jim Freeze <jim@...> 2003/05/08

On Friday, 9 May 2003 at 8:23:52 +0900, Hal E. Fulton wrote:

[#71007] Re: Can a global be a constant? — dblack@... 2003/05/08

Hi --

[#71036] Re: Regexp: why does (re)* return only last repetition? — "Robert Klemme" <bob.news@...>

21 messages 2003/05/09
[#71209] Re: Regexp: why does (re)* return only last repetition? — "Robert Klemme" <bob.news@...> 2003/05/12

[#71225] Re: Regexp: why does (re)* return only last repetition? — Austin Ziegler <austin@...> 2003/05/12

On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:

[#71229] Re: Regexp: why does (re)* return only last repetition? — Brian Candler <B.Candler@...> 2003/05/12

On Mon, May 12, 2003 at 10:18:00PM +0900, Austin Ziegler wrote:

[#71266] Re: Regexp: why does (re)* return only last repetition? — Austin Ziegler <austin@...> 2003/05/12

On Mon, 12 May 2003 23:51:44 +0900, Brian Candler wrote:

[#71042] TCP Sockets — Dominik Werder <dwerder@...>

Hi there,

28 messages 2003/05/09
[#71089] Re: TCP Sockets — Tom Felker <tcfelker@...> 2003/05/09

On Fri, 2003-05-09 at 05:40, Dominik Werder wrote:

[#71543] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

>> How can I tell how many bytes can be read from an IO object without

[#71547] Re: TCP Sockets — Brian Candler <B.Candler@...> 2003/05/16

On Fri, May 16, 2003 at 05:14:17PM +0900, Dominik Werder wrote:

[#71550] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

my problem is not the http protocol itself (not at this time :) but the IO-

[#71551] Re: TCP Sockets — Brian Candler <B.Candler@...> 2003/05/16

On Fri, May 16, 2003 at 07:20:30PM +0900, Dominik Werder wrote:

[#71553] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

> Maybe, but threads are really the "ruby way" to solve this problem.

[#71557] Re: TCP Sockets — Brian Candler <B.Candler@...> 2003/05/16

On Fri, May 16, 2003 at 07:53:39PM +0900, Dominik Werder wrote:

[#71562] Re: TCP Sockets — Dominik Werder <dwerder@...> 2003/05/16

> That would mean mixing the binary streams in a non-deterministic way,

[#71107] RCR for child execution — Brian Candler <B.Candler@...>

Looking on RubyGarden it seems that the RCR process there is "resting", so

99 messages 2003/05/10
[#71122] Re: RCR for child execution — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/10

On Sun, 11 May 2003 01:50:49 +0900, Brian Candler wrote:

[#71126] Re: RCR for child execution — Brian Candler <B.Candler@...> 2003/05/10

On Sun, May 11, 2003 at 01:27:31AM +0900, Simon Strandgaard wrote:

[#71364] Re: RCR for child execution — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/13

On Tue, 13 May 2003 21:11:08 +0000, ahoward wrote:

[#71385] Re: RCR for child execution — matz@... (Yukihiro Matsumoto) 2003/05/14

Hi,

[#71152] Is Rubygarden's wiki restricted to English? — Mauricio Fern疣dez <batsman.geo@...>

20 messages 2003/05/11
[#71160] Re: Is Rubygarden's wiki restricted to English? — "Hal E. Fulton" <hal9000@...> 2003/05/11

----- Original Message -----

[#71165] Re: Is Rubygarden's wiki restricted to English? — Mauricio Fern疣dez <batsman.geo@...> 2003/05/11

On Mon, May 12, 2003 at 12:40:26AM +0900, Hal E. Fulton wrote:

[#71189] efficiency advice needed — "meinrad.recheis" <my.name.here@...>

hi,

12 messages 2003/05/11

[#71297] State Pattern Implementation — "Robert Klemme" <bob.news@...>

22 messages 2003/05/13

[#71361] Objects VS Datastructures — Simon Vandemoortele <deliriousREMOVEUPPERCASETEXTTOREPLY@...>

19 messages 2003/05/13

[#71447] Embedding/GC/heap corruption problem — "Jan Bernhardt" <j.bernhardt@...>

Hi,

22 messages 2003/05/14

[#71488] Test::Unit sequencing — Brian Candler <B.Candler@...>

A question for more experienced Test::Unit users.

23 messages 2003/05/15
[#71492] Re: Test::Unit sequencing — Anders Bengtsson <ndrsbngtssn@...> 2003/05/15

--- Brian Candler <B.Candler@pobox.com> wrote:

[#71508] Re: Test::Unit sequencing — ahoward <ahoward@...> 2003/05/15

On Thu, 15 May 2003, [iso-8859-1] Anders Bengtsson wrote:

[#71510] RCR: $INCLUDED global var — martindemello@... (Martin DeMello)

$INCLUDED = (__FILE__ != $0)

25 messages 2003/05/15
[#71515] Re: RCR: $INCLUDED global var — matz@... (Yukihiro Matsumoto) 2003/05/15

Hi,

[#71525] Re: RCR: $INCLUDED global var — ahoward <ahoward@...> 2003/05/15

On Fri, 16 May 2003, Yukihiro Matsumoto wrote:

[#71520] public/protected/private syntax — Guillaume Marcais <guslist@...>

I tend to find the public/protected/private keywords in Ruby a little odd.

27 messages 2003/05/15
[#71540] Re: public/protected/private syntax — "Robert Klemme" <bob.news@...> 2003/05/16

[#71573] Re: public/protected/private syntax — Guillaume Marcais <guillaume.marcais@...> 2003/05/16

On Friday 16 May 2003 03:38 am, you wrote:

[#71595] Re: public/protected/private syntax — Austin Ziegler <austin@...> 2003/05/16

On Fri, 16 May 2003 23:33:21 +0900, Guillaume Marcais wrote:

[#71560] gzip cgi compression — Dominik Werder <dwerder@...>

Is zlib compatible with HTTP-gzip-output-compression?

14 messages 2003/05/16

[#71636] select strange behavier — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

'select' is suppose to watch some file-descriptors and when an event

22 messages 2003/05/17

[#71673] An Object Going Out Of Scope — "vinita Papur" <gkapur@...>

A quick question. How can one discern when an object goes out of scope?

46 messages 2003/05/18
[#71678] Re: An Object Going Out Of Scope — "MikkelFJ" <mikkelfj-anti-spam@...> 2003/05/18

[#71680] Re: An Object Going Out Of Scope — Mauricio Fern疣dez <batsman.geo@...> 2003/05/18

On Sun, May 18, 2003 at 06:08:43PM +0900, MikkelFJ wrote:

[#71681] ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

i need this for a realtime game application which has embedded ruby -- after

[#71683] Re: ruby garbage collection — Mauricio Fern疣dez <batsman.geo@...> 2003/05/18

On Sun, May 18, 2003 at 08:35:11PM +0900, Gaffer wrote:

[#71685] Re: ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

strange, i found the rb_gc call on my own and called that to good effect

[#71688] Re: ruby garbage collection — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/18

On Sun, 18 May 2003 22:10:18 +0900, Gaffer wrote:

[#71689] Re: ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

i think its actually the GC cleaning up matrix and vector classes (my own

[#71691] Re: ruby garbage collection — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/18

On Sun, 18 May 2003 22:39:17 +0900, Gaffer wrote:

[#71692] Re: ruby garbage collection — "Gaffer" <gaffer@...> 2003/05/18

i'm pretty sure i've tracked down the cause, this is my first time embedding

[#71695] Re: ruby garbage collection — "Simon Strandgaard" <0bz63fz3m1qt3001@...> 2003/05/18

On Sun, 18 May 2003 23:48:28 +0900, Gaffer wrote:

[#71948] How I'd like method-wrapping to work... — "Hal E. Fulton" <hal9000@...>

OK, I read Matz's blog entries as well as I could.

16 messages 2003/05/21

[#72030] why is "does" missing from this sub!-stitution? — Dave Oshel <dcoshel@...>

[~/Desktop] dave$ cat foobar.rb ; foobar.rb

19 messages 2003/05/22
[#72037] Re: why is "does" missing from this sub!-stitution? — Dave Oshel <dcoshel@...> 2003/05/22

In article <20030522202818.GA24497@student.ei.uni-stuttgart.de>,

[#72056] Naive CGI question — "Hal E. Fulton" <hal9000@...>

I'm betting this is either impossible

15 messages 2003/05/23

[#72134] Problem compiling extension on Solaris — "Tim Hunter" <cyclists@...>

I have an user who is trying to build RMagick on Solaris with Ruby 1.6.8.

22 messages 2003/05/25
[#72262] Re: Problem compiling extension on Solaris — Daniel Berger <djberge@...> 2003/05/27

[#72150] Binary Tree vs. Hash — Xiangrong Fang <xrfang@...>

Hi ruby fans,

47 messages 2003/05/26

[#72184] Project Directory Structure — Jim Freeze <jim@...>

Hi:

47 messages 2003/05/26
[#72218] Re: Project Directory Structure — "MikkelFJ" <mikkelfj-anti-spam@...> 2003/05/26

[#72222] Re: Project Directory Structure — Jim Freeze <jim@...> 2003/05/26

Thanks everyone for your input so far.

[#72244] Re: Project Directory Structure — Robert Feldt <feldt@...> 2003/05/27

On Tue, 27 May 2003, Jim Freeze wrote:

[#72260] Re: Project Directory Structure — Jim Freeze <jim@...> 2003/05/27

On Tuesday, 27 May 2003 at 18:26:53 +0900, Robert Feldt wrote:

[#72265] Re: Project Directory Structure — Jim Freeze <jim@...> 2003/05/27

Thanks for all the input. A description of the Project

[#72269] Re: Project Directory Structure — Robert Feldt <feldt@...> 2003/05/27

On Wed, 28 May 2003, Jim Freeze wrote:

[#72274] RCR: unpack/pack Bignum — Robert Feldt <feldt@...>

I'm sure this has been discussed before and maybe there are good reasons

27 messages 2003/05/27
[#72375] Re: RCR: unpack/pack Bignum — Robert Feldt <feldt@...> 2003/05/28

No one seems to be interested in this issue so I'll have to reply to

[#72381] Re: RCR: unpack/pack Bignum — nobu.nokada@... 2003/05/28

Hi,

[#72394] Re: RCR: unpack/pack Bignum — Robert Feldt <feldt@...> 2003/05/29

On Thu, 29 May 2003 nobu.nokada@softhome.net wrote:

[#72403] Re: RCR: unpack/pack Bignum — nobu.nokada@... 2003/05/29

Hi,

[#72600] What is BER compression? (was RCR: unpack/pack Bignum) — Sam Roberts <sroberts@...> 2003/05/31

Is it documented anywhere, what this 'w' template is useful for?

[#72371] Windows Installer for Ruby 1.8.0 (CVS) — Andrew Hunt <andy@...>

Hi all,

16 messages 2003/05/28

[#72388] Array.extend versus instance.extend — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

I want to install 'shift_until_kind_of' in the global Array class

18 messages 2003/05/29

[#72420] Metakit for Ruby - Would you want it? — bobx@... (Bob)

I have a gentleman in England who I have been talking with who is

23 messages 2003/05/29

[#72439] Iteration - last detection — "Orion Hunter" <orion2480@...>

Is there any built in functionality for iteration that will allow me to

41 messages 2003/05/29
[#72510] Re: Iteration - last detection — Carlos <angus@...> 2003/05/30

> Is there any built in functionality for iteration that will allow me to

[#72577] IF statement in ruby 1.8.0 (2003-05-26) [i386-mswin32] — "Shashank Date" <sdate@...>

Just when I thought that I had perfectly understood the IF statement in

14 messages 2003/05/31

Re: Binary Tree vs. Hash

From: "MikkelFJ" <mikkelfj-anti-spam@...>
Date: 2003-05-26 16:40:05 UTC
List: ruby-talk #72183
"Xiangrong Fang" <xrfang@hotmail.com> wrote in message
news:20030526135729.583F.XRFANG@hotmail.com...
> Hi ruby fans,
>
> I have a question about algorithm efficiency.

OK

> I have used hash in my ruby program. It worked fine. However, I found
> that the memory consumed by a hash is tremendous. While the hash file is
> stored on disk (using pstore) it is merely 100-200KB, but in memory, it
> consumes apporximately 5-10 Megabytes! I don't know if it is related to
> the cache algorithm used by ruby?

I'm assuming 32 bit architecture below.

I don't know how the hash works in Ruby, or how you dumped it on disk.
But if the disk is non-indexed such that it is loaded into memory before
becoming useful, you can save an awful lot of memory.
A hash can be quite memory consuming. Each entry in the hash typically has:
hashkey, reference to real key, reference to data, reference to next
collision.

This is typically 4 32bit words plus the actually key.

For afficient allocation you typically have 25% extra entries but this can
vary, so you actually use 5 x 32bit in this case.
You also need a bucket table. A bucket table could be that array of entries
directly, but it is better to have a separate table. The bucket table takes
up on reference to the entry. The bucket table is preferably more empty than
full (which is why it is better to not use the entry table directly.
How much overcapacity you use in the bucket table is a design decision, but
for at least half empty you need two references.

So we are totalling 7 x 32bit per stored entry, not counting any keys
(strings or whatever). Keys are typically short, even for strings, so one or
two 32bit words probably.

A total rough estimate is therefore 8 32 bit words per stored entry in the
hash table.
If you have 2 32 bit words worth of data stored in a flat file unindexed,
you use 4 times that memory in a in-memory hash.
This suggests that Ruby should use around 1-2MB if your on disk
datastructure uses 100-200KB. However, Ruby probably has a lot of extra
per-object overhead for keys.

>
> I tried to reduce this memory requirment and thought of using Binary
> Tree. Just did some test in Delphi, it seems good. I loaded an English
> dictionary of about 1M (90000+ lines) it just took about 6-8M of memory,
> seems much better than hash.

Binary trees are typically implemented as red-black trees. The have
reference to data, reference to left and right child, often also reference
to parent. And they have some bits to store its color. This can be done
tagging a bit in one of the pointers, but typically a node useds 5 32 bit
words and the keys. If we use 2 32 bit words for storing keys, we also end
up using 7 32bit words per entry.
That is only marginally better than hash tables.

> Now my question is, since hash is O(1) and BTree is O(LOG(N)), does it
> mean that hash is born to be less storage-efficient than BTree? How can
> I get the best of both BTree and hash?

Don't worry too much about the Log(N) in B-Trees. It is not log2(N)
but logM(N), where M is in the range of 10 to 1000. For most practical
purposes
the B-Tree can be viewed as close to O(1). However, the constant factor
may be somewhat larger than for hash tables. It just means a B-Tree scales
well.
A binary tree does not scale nearly as well. Here you truly get log(N)
performance.

A binary tree and a B-Tree keeps an order relation, a hash does not
- or rather:

Usually you do not get to keep insertion order in hashtables, but it is in
fact possible to do so without paying any significant time/space overhead.
I have written such a hashtable but only implemented it for 32bit keys
(which
therefor can be stored in-place in the hash entry. This is not sorting, but
it is
extra ordering information.

Searching Binary trees are usually faster than most things for small (<100)
items.
This assumes the tree allocates tree node from a common pool so all nodes
are close
to each other (for cache performance).
Arrays are faster for (<10) items (linear search, not binary search).

A hash table generally performs very well for medium to large amounts of
data.
One problem is the memory consumption which hurts cache efficiency. Another
problem is the large bucket table with random access pattern. This is also
bad for
cache performance. However, despite these problems, a good hash so fast that
it still manages to beat more clever datastructures except in the
competition for size.

A B-Tree uses [reference to key, reference to data]. It also uses additional
memory
for internal tree nodes (assuming all data is stored at leaf entries).
Had the B-Tree been binary, it would use the same amount of memory for
internal nodes
as for leaf storage, but the branching is higher, so the memory consuption
drops. I don't
have any exact mesures here, but the cost of internal nodes are limited.
A B-Tree operates at about 2/3 of allocated memory. So you need 3 32 bit
words plus internal
node overhead. A rough estimate is therefore 4 entries. Then you need to add
the overhead
for any keys you store externally. Thus we get to about 5 bit words per
entry.

The best way to work with in-memory B-Trees is to ensure large branchfactor
that still is cache friendly.
Furthermore, we want to scan a B-Tree node linearly and not using binary
search becuase it is too slow
for anything below 100 entries. A good choice for in memory nodes appear to
be around 7-15 entries,
but that depends on processor and the data stored. If the key is external,
it may be relevant to use
binary seach to reduce the number of comparions.
The B-Tree generally has very good cache performance, except it can be hurt
be external keys.

The Judy tree / trie is even more complex than the B-Tree but is more memory
efficient than B-Trees
and may be slightly more cache friendly.

The B-Tree and Judy tree both scales well to very large datastructures,
whereas a hash table gets into
trouble as soon as you start trashing to disk. A Binary tree is just not as
efficient long before memory
becomes a problem.

The Judy trie does not have a problem a problem with external keys. I have
developed a B-Trie which are
stacked B-Trees each holding a part of the key. It uses too much memory
becuase there are many
mostly empty B-Tree nodes, but this can fixed by optimizing B-Trees for very
small collections.

A believe the B-Tree is easier to update than Judy trees and is better for
combined in-memory and on-disk
datastructures. However, for strictly in-memory operation the Judy tree may
be better.

Another datastructure, the Skip-list datastructure has been very popular
becuase it is 10 times
easier to implement than an efficient in-memory B-Tree. It has been claimed
to better than B-Trees.
This is not true. A skip-list jumps all over the memory and has poor cache
performance. It suffers from
pointer chasing. I have done some benchmarks, and I 've seen one other
person do some similar
investigations reaching the same conclusion.
This does not mean you shouldn't use skip-lists. They are still useful
datastructures - they may be better than than red-black trees which are
typically used for map
implementations.

B-Trees are very kind to efficient allocation of memory. Judy trees and skip
lists allocate all sorts of
different memory sizes. A B-Tree only allocates one node size (and possibly
one smaller node size
for the optimized special case of very small collections).

Mikkel



In This Thread