RE: [syndication] Re: Translate non-structured documents into Xml RSS format
We do use Webmethods, but we have customized it extensively and much of what
we use is bespoke. Webmethods is now more of a B2B tool than a scraping
tool.
Dave
. . . . . . . . . . . . . . . . .
David Galbraith - Chief Architect, founder
Moreover.com - the webfeed company
david@moreover.com
415-577-8828 (US)
0777-565-8880 (UK)
favorite webfeed:
http://www.moreover.com/xml
> -----Original Message-----
> From: ben@ubiquick.com [mailto:ben@ubiquick.com]
> Sent: Tuesday, September 26, 2000 4:08 AM
> To: syndication@egroups.com
> Subject: [syndication] Re: Translate non-structured documents into Xml
> RSS format
>
>
> Thanks a lot for your answer. Ian, you mentioned Webmethods: do you
> know the name of the product that Moreover uses to scrape
> headlines?
>
>
> --- In syndication@egroups.com, Ian Davis <ian@c...> wrote:
> >
> > On Monday, September 25, 2000, 10:45:19 PM, Jeff wrote:
> >
> > > I think that Ben is asking for an HTML scraper. They
> > > generally use some obscenely complex Perl regular
> > > expressions to extract the relevant headlines from
> > > a page. The expressions are specific to the page.
> >
> > > I know that Ian over at Internet Alchemy runs one.
> > I do run one still, although I don't maintain it as much as I
> > should.
> >
> > > I'm not a big fan of scraping -- it seems to be
> > > fragile and error-prone -- if the site changes
> > > its format the regular expressions could break.
> > Scrapers can be fragile, but breakage is not as frequent as you
> > might think. Many sites use them. I believe Moreover uses
> > WebMethods to create their large set of feeds.
> >
> > Ian
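
To illustrate the kind of page-specific regular-expression scraper Jeff describes above, here is a minimal sketch. It is written in Python rather than the Perl mentioned in the thread, and it is not how Moreover or Internet Alchemy do it. The sample markup, the "headline" class name, and the names SAMPLE_HTML, HEADLINE_RE and scrape_headlines are all hypothetical, chosen only for illustration.

```python
import html
import re

# Hypothetical page markup; a real scraper would fetch the page over HTTP.
SAMPLE_HTML = """
<html><body>
<div class="news">
  <a class="headline" href="/story/1.html">First story</a>
  <a class="headline" href="/story/2.html">Second &amp; third story</a>
  <a class="other" href="/about.html">About us</a>
</div>
</body></html>
"""

# The pattern is tied to this exact markup: attribute order, quoting and
# class name must all match, which is why such scrapers are page-specific.
HEADLINE_RE = re.compile(
    r'<a\s+class="headline"\s+href="([^"]+)"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def scrape_headlines(page):
    """Return a list of (url, title) pairs found in the page text."""
    return [
        (url, html.unescape(title.strip()))
        for url, title in HEADLINE_RE.findall(page)
    ]


if __name__ == "__main__":
    for url, title in scrape_headlines(SAMPLE_HTML):
        print(url, title)
```

The fragility discussed in the thread is easy to reproduce with this sketch: if the site merely swaps the attribute order to href first and class second, the pattern no longer matches and scrape_headlines returns an empty list, with no error to signal that anything went wrong.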