Re: HTML -> list of sentences? (semi-impossible task)

From: "Hal E. Fulton" <hal9000@...>
Date: 2003-06-12 23:16:07 UTC
List: ruby-talk #73401
----- Original Message -----
From: "Dave Oshel" <dcoshel@vcmails.com>
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Thursday, June 12, 2003 9:29 AM
Subject: Re: HTML -> list of sentences? (semi-impossible task)


> It depends on what you mean by "sentence", 'ey?  Do you mean natural
> language (English? Rumanian? Urdu? Hakka? Thai? Japanese?), or
> artificial formalisms like programming languages (Perl, Ruby, FORTH)?

In this case, English sentences. Not as in formal grammars, or as
in prison sentences. Not that those two are so different.

> But someone went to a lot of trouble to carve up their perceptions of
> reality (heh) into procrustean HTML, so you may as well begin there.
> Determine the major syntactical units  (TABLE, DIV, P, HR, PRE, TT, H1,
> etc.).  Recursing, determine what is a "sentence" on semantic,
> idiomatic (BR, B, U), or at least grammatical (カ、ネー、ニ、ヘ、。。。), grounds.
>   Collect these purely formal "sentences" and send the list to
> post-processing (possibly human inspection) to be vetted and refined
> (e.g., does your system account for utterances which are meaningful but
> grammatically abbreviated, like "What up?" (MTV argot used by
> advertisers to slide nickels out of pockets) or "Annta desu" (kids
> choosing sides for oni in Osaka). )

I think even that is perhaps too much intelligence. I don't want to
build in knowledge about nouns and verbs.

> If you have access to a page's CSS, your hints about what the author(s)
> intended are much expanded.  Maybe not so impossible after all?  This
> does not seem like a difficult task to me, but maybe I haven't
> appreciated the context from which the question is posed?

My parents sometimes quote a comedian from before I was born: "Easy for
you, difficult for me."

>  Does the
> solution have to be extremely general, or is it a one-shot?

Ehh, somewhat general in the sense of several chapters. But very
one-shot in that I'm looking at one particular document, and it's
about Ruby. ;)

I think the replies I've got are fairly promising along with my
own dirty hack from last night.

Cheers,
Hal


> David
>
>
> On Wednesday, June 11, 2003, at 09:38  PM, Hal E. Fulton wrote:
>
> > Hello, all.
> >
> > Here's an idea I'm toying with. Suggestions
> > are welcome.
> >
> > I want to take an HTML document (reasonably
> > well-formed, but not guaranteed) and remove
> > all the tags from it...
> >
> > ...and get a list of the *sentences* in the
> > document.
> >
> > There are, of course, several things that make
> > this difficult:
> >   - need to distinguish between end-of-sentence
> >     and embedded punctuation, including both
> >     abbreviations and textual references to
> >     Ruby methods such as eof? and split!
> >   - need to treat sentence fragments as sentences
> >   - need to ignore blocks of code
> >   - etc.
> >
> > My current approach is to start with htmlsplit
> > from the RAA. This is fairly simplistic, but
> > at least it doesn't have any dependencies.
> >
> > Not sure whether to do it in two steps or not:
> > 1. Convert to text
> > 2. Process
> >
> > Might be just as easy to do it in one step if
> > I knew what I was doing.
> >
> > Also not sure what is the best tool/library for
> > this job.
> >
> > Comments welcome.
> >
> > Hal
> >
> > --
> > Hal Fulton
> > hal9000@hypermetrics.com
> >
> >
> >
> >
> --
> David C. Oshel               mailto:dcoshel@mac.com
> Cedar Rapids, Iowa       http://homepage.mac.com/dcoshel
> ``I think most pleasantly in metaphors, and smoking brings metaphors to
> mind." - Augustus Srb, in Alexei Panshin's  _Star Well_
>
>
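
The two-step approach from the original post (convert to text, then
process) might look roughly like the sketch below. It is not htmlsplit
and not the "dirty hack" mentioned above; the names html_to_text,
split_sentences, split_block, BLOCK_TAGS and ABBREVIATIONS are invented
for illustration, and the abbreviation list is an assumed starting set.
The heuristic: a token ending in ".", "?" or "!" closes a sentence only
when the next token starts with a capital letter (or the block ends), so
"eof? to test" and "split! on" stay inside their sentences.

```ruby
# Assumed, illustrative list; extend it for the document at hand.
ABBREVIATIONS = %w[e.g. i.e. etc. Mr. Mrs. Dr. vs.].freeze

# Tags treated as block boundaries (an assumption, not a complete list).
BLOCK_TAGS = /<\/?(?:p|div|h[1-6]|li|ul|ol|br|hr|tr|table|blockquote)\b[^>]*>/i

# Step 1: convert HTML to text. <pre> blocks (code) are dropped,
# block-level tags become newlines, and inline tags are removed.
def html_to_text(html)
  text = html.gsub(%r{<pre\b[^>]*>.*?</pre\s*>}mi, "\n")
  text = text.gsub(BLOCK_TAGS, "\n")
  text = text.gsub(/<[^>]*>/m, "")
  text.gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")
end

# Step 2: split each text block into sentences.
def split_sentences(text)
  text.split("\n").flat_map { |block| split_block(block) }
end

def split_block(block)
  sentences = []
  current = []
  tokens = block.split
  tokens.each_with_index do |tok, i|
    current << tok
    nxt = tokens[i + 1]
    ends_here = tok.match?(/[.?!]["')]*\z/) && !ABBREVIATIONS.include?(tok)
    next_starts = nxt.nil? || nxt.match?(/\A["'(]*[A-Z]/)
    if ends_here && next_starts
      sentences << current.join(" ")
      current = []
    end
  end
  # A block without closing punctuation is kept as a sentence fragment.
  sentences << current.join(" ") unless current.empty?
  sentences
end

html = "<h1>Files</h1>\n" \
       "<p>Use <b>eof?</b> to test for end of file, e.g. in a loop. " \
       "Then call split! on it.</p>\n" \
       "<pre>x = 1. Foo</pre>\n" \
       "<p>No trailing period</p>"
puts split_sentences(html_to_text(html))
```

Known gaps in this sketch: a method name followed by a capitalized word
("eof? Then ...") is still split, and a sentence that really ends in
"etc." is glued to the next one. Both would need document-specific rules.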

