Re: Need IO Optimization help

From: David King Landrith <dave@...>
Date: 2003-04-17 16:11:07 UTC
List: ruby-talk #69614
On Thursday, April 17, 2003, at 09:46 AM, Brian Candler wrote:

> On Thu, Apr 17, 2003 at 10:34:53PM +0900, ts wrote:
>> Jim Freeze <jim@freeze.org> writes:
>>>
>>> Good luck, for this
>>>
>>> MMapIO.new(readfile).each { |line|
>>>     # start rockin'
>>>     f << line
>>> }
>>>
>>>
>>> Has this been written before I start to do it myself?
>>
>>  yes, but don't expect that it will be faster ...
>
> I agree completely, because I've run essentially that program (but using
> Ruby's standard I/O library) through a profiler. It spends around 5% of its
> time doing I/O.
>
> The problems are:
> - you want to process the file a *line* at a time
> - you are allocating a new String object for each line
> - you are calling 'yield' on a Ruby block for each line

I've found that in the real world the speed advantages of using mmap
are more significant than you seem to allow for (see "My experiments"
below).

> As a guess, I'd say your savings would be minimal. Should you decide to mmap
> the whole 260MB into your address space at once you may actually get worse
> overall performance, unless you have that amount of free RAM available.

This is a real possibility, if not a probability. I believe I alluded
to this as a disadvantage of mmap in my original email. However, it
depends very heavily on how your system implements mmap, and
implementations vary wildly. For example, Linux kernels before 2.4.20
probably don't handle files that size well. mmap can also perform very
differently across systems, and with different functions on the same
system. I'm using Mac OS X, for example, and its mmap is happy with
strncpy(x, mmapVar, n) but takes a significant performance hit when
you do strncat(x, mmapVar, n); I have no idea why. I do not remember
such a problem with Linux.

My experiments
--------------
I wrote a function as a Ruby extension (Text_CSV.decode) that uses a
state machine to parse files one byte at a time and convert each line
from a character-delimited record to an array. At a high level, it
works like this: accumulate the bytes for a given field into a C
string. When the field terminator is found, convert the C string to a
Ruby string and push it onto a Ruby array. When the record terminator
is found, either yield the array or push a clone of it onto another
array. (There are also other issues like quotes, escaped characters,
and the ability to filter non-visible ASCII characters.) Thus, there
are a lot of comparisons, conditionals, and strncpy calls from the
mmap to the field accumulator. Moreover, it tests whether a block is
given and, if so, yields each completed row as a Ruby array.
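For readers without the C source, here is a rough pure-Ruby sketch of
the same state machine. It is not Text_CSV.decode itself: the method
name decode_records, its keyword arguments, and the tab/newline
defaults are hypothetical, and quoting, escaping, and the
non-visible-character filter are deliberately left out.

```ruby
# Hypothetical pure-Ruby sketch of the byte-at-a-time state machine
# described above (NOT the Text_CSV.decode C code). No quoting,
# escaping, or character filtering is handled.
def decode_records(data, field_sep: "\t", record_sep: "\n", &block)
  rows  = []        # collected records when no block is given
  row   = []        # working array for the current record
  field = "".dup    # accumulator for the current field

  finish_record = proc do
    row << field
    field = "".dup
    if block
      block.call(row)    # yield the working array itself...
    else
      rows << row.clone  # ...or keep a clone of it
    end
    row.clear            # reuse the working array for the next record
  end

  data.each_char do |ch|
    if ch == field_sep
      row << field       # field terminator: push the finished field
      field = "".dup
    elsif ch == record_sep
      finish_record.call # record terminator: emit the row
    else
      field << ch        # ordinary character: accumulate it
    end
  end

  # Flush a final record that lacks a trailing record terminator.
  finish_record.call unless field.empty? && row.empty?
  block ? nil : rows
end

p decode_records("a\tb\nc\td\n")   # → [["a", "b"], ["c", "d"]]
decode_records("x\ty\nz") { |r| puts r.join(",") }
```

As in the description, the block receives the single working array,
which is cleared after each record, so a block that wants to keep rows
should call dup on them; without a block, clones are collected and
returned.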

My early versions did buffered IO on top of the low-level open and
close calls in unistd.h. I switched to mmap because I was frustrated
with the performance. Unfortunately, I don't have the early code that
used buffered IO because I never bothered to check it in (shame on me).
I ran into the following problems with buffered IO that mmap
immediately solved:

1. The C code that handles the mmap is much simpler, because it
eliminates the need for a buffer. As far as the rest of my code is
concerned, I'm simply iterating through a char*.
2. I was unable to use a buffer of more than 10k (10,240 chars) to
read data without getting a segfault. (I was allocating it as a local
array declared at the top of my function; I did not try using malloc.)
3. Since I'm analyzing the file byte by byte, using a buffer requires
testing for the end of the buffer/end of the file after each byte,
which is slightly more complicated than simply comparing an
incremented value to a limit.

While I was developing, I gauged performance using a file with 20,000
lines, 424,250 words, and 3,443,296 characters (which I do still have).
I did both a bulk read (returning an array of arrays for the entire
file) and an iterative read (where each row is yielded to a block as an
array; the block in this case joins the row with a tab). For a file
this size, there was no noticeable difference between the two read
methods. Performance parsing this file with either one was as follows:

1. Using C open/close with a file descriptor: approx. 10 seconds to process
2. Using mmap: just over 2 seconds to process

Simple profiling gave the following results:

1. Running ruby -rprofile for the bulk read gave the following:
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  71.59     3.78      3.78        1  3780.00  5200.00  Text_CSV.decode
  26.89     5.20      1.42    20000     0.07     0.07  Array#clone
   0.95     5.25      0.05        5    10.00    16.00  Kernel.require
...
(Keep in mind that the program runs significantly slower under
-rprofile, so the times indicated above do not reflect the real,
non-profiled execution time.)

2. Running ruby -rprofile for the iterative read gave the following:
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  75.94     4.64      4.64        1  4640.00  6050.00  Text_CSV.decode
  23.08     6.05      1.41    20000     0.07     0.07  Array#join
   0.49     6.08      0.03        5     6.00    12.00  Kernel.require
....
(Keep in mind that the program runs significantly slower under
-rprofile, so the times indicated above do not reflect the real,
non-profiled execution time.)

Lastly, if my memory serves me correctly, the Perl Text_CSV module
(which uses stdio.h) is about the same speed as my Ruby version that
used the unistd.h functions, and so it is much slower than my Ruby
routine that uses mmap.

Best,

Dave

-------------------------------------------------------
David King Landrith
   (w) 617.227.4469x213
   (h) 617.696.7133

One useless man is a disgrace, two
are called a law firm, and three or more
become a congress   -- John Adams
-------------------------------------------------------
public key available upon request

