Re: Parsing pdf files

From: Arun Kumar <arun.einstein@...>
Date: 2009-08-23 13:05:23 UTC
List: ruby-talk #344617
Hello Axel,

Suppose the data in the PDF is laid out in two columns (as is the case in
research papers) or contains tables. The copied version should keep the same
amount of space between words and columns. I have attached an example of
two-column text for your reference. For the program I'm writing, the layout
is essential.

If you cannot see the two-column output in your text editor (the lines are
probably longer than 80 characters), please reduce the editor's font size
(or use a large monitor ;) ).

Observe how there is space between the two columns in the output. This was
produced by copying from evince to gedit or emacs.
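
To make the requirement concrete, here is a minimal Ruby sketch of what a program could do once the spacing between columns is preserved. It is only an illustration under stated assumptions: the helper names `guess_gutter` and `split_columns` are hypothetical, the sample lines are made up, and it assumes the right-hand column starts at the same character offset on most lines, which real copied text may not guarantee.

```ruby
# Hypothetical helper: guess the offset where the right-hand column starts,
# taken as the most common position at which text resumes after a run of
# three or more spaces. Returns nil if no such run is found.
def guess_gutter(lines)
  counts = Hash.new(0)
  lines.each do |line|
    line.scan(/ {3,}(?=\S)/) { counts[Regexp.last_match.end(0)] += 1 }
  end
  counts.max_by { |_, n| n }&.first
end

# Hypothetical helper: split one line into [left, right] at a fixed offset.
# Lines shorter than the offset yield an empty right-hand part.
def split_columns(line, gutter)
  left  = (line[0...gutter] || "").rstrip
  right = (line[gutter..-1] || "").strip
  [left, right]
end

# Made-up sample in the same shape as the attachment: left text padded
# with spaces so that the right column starts at offset 40.
sample = [
  ["Job Title: Senior DBMS Consultant", "page."],
  ["Location: Dallas,TX", "IE systems can also be used"],
  ["Responsibilities:", "from less-structured web sites"],
].map { |l, r| l.ljust(40) + r }

gutter = guess_gutter(sample)
p gutter                                   # 40 for this made-up sample
sample.each { |line| p split_columns(line, gutter) }
```

This only works when the extracted text keeps the inter-column spacing, which is exactly why the layout matters for my program.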




On Sun, Aug 23, 2009 at 4:50 PM, Axel Etzold <AEtzold@gmx.de> wrote:
>
> -------- Original Message --------
> > Date: Sun, 23 Aug 2009 19:46:23 +0900
> > From: Arun Kumar <arun.einstein@gmail.com>
> > To: ruby-talk@ruby-lang.org
> > Subject: Re: Parsing pdf files
>
> > Hello Axel,
> > Thank you. But I would like to point out that it's not very accurate in
> > maintaining the layout. I already tried it out. If you copy a PDF file
> > from evince to gedit, you get a more accurate layout. What escapes me
> > is how to do it programmatically :)
> >
> > cheers & regards,
> > Arun
>
> Dear Arun,
>
> Could you say something more about which layout features you need?
>
> Best regards,
>
> Axel
>



--
|| श्री जानकीरघुनाथो विजयते ||

Attachments (1)

ie.txt (8.83 KB, text/plain)
Sample Job Posting:                                              number, publisher, and price of book from an Amazon web
Job Title: Senior DBMS Consultant                                page.
Location: Dallas,TX                                              IE systems can also be used to extract data or knowledge
Responsibilities:                                                from less-structured web sites by using both the HTML text
DBMS Applications consultant works with project teams            in their pages as well as the structure of the hyperlinks be-
to define DBMS based solutions that support the enterprise        tween their pages. For example, the WebKB project at
deployment of Electronic Commerce, Sales Force Automa-           Carnegie Mellon University has explored extracting struc-
tion, and Customer Service applications.                         tured information from university computer-science depart-
Desired Requirements:                                            ments [22]. The overall WebKB system attempted to iden-
3-5 years exp. developing Oracle or SQL Server apps using        tify all faculty, students, courses, and research projects in
Visual Basic, C/C++, Powerbuilder, Progress, or similar.         a department as well as relations between these entities
Recent experience related to installing and configuring           such as: instructor(prof, course), advisor(student, prof),
Oracle or SQL Server in both dev.             and deployment     and member(person, project).
environments.
                                                                 2.2     IE Methods
Desired Skills:
Understanding of UNIX or NT, scripting language. Know            There are a variety of approaches to constructing IE sys-
principles of structured software engineering and project        tems. One approach is to manually develop information-
management                                                       extraction rules by encoding patterns (e.g. regular expres-
                                                                 sions) that reliably identify the desired entities or relations.
Filled Job Template:                                             For example, the Suiseki system [8] extracts information on
                                                                 interacting proteins from biomedical text using manually de-
title: Senior DBMS Consultant
                                                                 veloped patterns.
state: TX
city: Dallas                                                     However, due to the variety of forms and contexts in which
country: US                                                      the desired information can appear, manually developing
language: Powerbuilder, Progress, C, C++, Visual Basic           patterns is very difficult and tedious and rarely results in
platform: UNIX, NT                                               robust systems. Consequently, supervised machine-learning
application: SQL Server, Oracle                                  methods trained on human annotated corpora has become
area: Electronic Commerce, Customer Service                      the most successful approach to developing robust IE sys-
required years of experience: 3                                  tems [14]. A variety of learning methods have been applied
desired years of experience: 5                                   to IE.
                                                                 One approach is to automatically learn pattern-based ex-
                                                                 traction rules for identifying each type of entity or relation.
     Figure 2: Sample Job Posting and Filled Template
                                                                 For example, our previously developed system, Rapier [12;
                                                                 13], learns extraction rules consisting of three parts: 1) a
                                                                 pre-filler pattern that matches the text immediately pre-
affiliated with a specific organization [73; 24]. In biomedical
                                                                 ceding the phrase to be extracted, 2) a filler pattern that
text, one can identify that a protein interacts with another
                                                                 matches the phrase to be extracted, and 3) a post-filler pat-
protein or that a protein is located in a particular part of the
                                                                 tern that matches the text immediately following the filler.
cell [10; 23]. For example, identifying protein interactions
                                                                 Patterns are expressed in an enhanced regular-expression
in the abstract excerpt in Figure 1 would require extracting
                                                                 language, similar to that used in Perl [72]; and a bottom-up
the relation: interacts(NOSIP, eNOS).
                                                                 relational rule learner is used to induce rules from a corpus
IE can also be used to extract fillers for a predetermined set
                                                                 of labeled training examples. In Wrapper Induction [37] and
of slots (roles) in a particular template (frame) relevant to
                                                                 Boosted Wrapper Induction (BWI) [30], regular-expression-
the domain. In this paper, we consider the task of extract-
                                                                 type patterns are learned for identifying the beginning and
ing a database from postings to the USENET newsgroup,
                                                                 ending of extracted phrases. Inductive Logic Programming
austin.jobs [12]. Figure 2 shows a sample message from
                                                                 (ILP) [45] has also been used to learn logical rules for iden-
the newsgroup and the filled computer-science job template
                                                                 tifying phrases to be extracted from a document [29].
where several slots may have multiple fillers. For exam-
                                                                 An alternative general approach to IE is to treat it as a se-
ple, slots such as languages, platforms, applications, and
                                                                 quence labeling task in which each word (token) in the docu-
areas usually have more than one filler, while slots related
                                                                 ment is assigned a label (tag) from a fixed set of alternatives.
to the job s title or location usually have only one filler.
                                                                 For example, for each slot, X, to be extracted, we include a
Similar applications include extracting relevant sets of pre-
                                                                 token label BeginX to mark the beginning of a filler for X
defined slots from university colloquium announcements [29]
                                                                 and InsideX to mark other tokens in a filler for X. Finally,
or apartment rental ads [67].
                                                                 we include the label Other for tokens that are not included
Another application of IE is extracting structured data from
                                                                 in the filler of any slot. Given a sequence labeled with these
unstructured or semi-structured web pages. When applied
                                                                 tags, it is easy to extract the desired fillers.
to semi-structured HTML, typically generated from an un-
                                                                 One approach to the resulting sequence labeling problem is
derlying database by a program on a web server, an IE sys-
                                                                 to use a statistical sequence model such as a Hidden Markov
tem is typically called a wrapper [37], and the process is
                                                                 Model (HMM) [57] or a Conditional Random Field (CFR)
sometimes referred to as screen scraping. A typical applica-
tion is extracting data on commercial items from web stores      [38]. Several earlier IE systems used generative HMM mod-
for a comparison shopping agent (shopbot) [27] such as MySi-     els [4; 31]; however, discriminately-trained CRF models have
mon (www.mysimon.com) or Froogle (froogle.google.com).           recently been shown to have an advantage over HMM s [54;
For example, a wrapper may extract the title, author, ISBN       65]. In both cases, the model parameters are learned from
