How to scrape?
Hi,
I'd like to "scrape" either the headlines or full stories from a couple of
different sites that are not currently producing an RSS file or available
through any existing aggregators.
The legal issues are not really a consideration in my case: it is only
going to be done with a couple of sites that have already given me
permission to do so.
I'm guessing I'd do it by spidering the pages somehow, but I can't seem to
find much information on the web about how it could be done.
If anyone has any suggestions or knows of any code examples or literature on
the subject, I'd love to hear about it.
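
To make the question concrete, here is a rough sketch of the kind of thing I have in mind, using only the Python standard library. It assumes (purely for illustration) that each headline on the page is a link inside an <h2> tag; the real sites will almost certainly need different rules. The names HeadlineParser, extract_headlines and fetch are made up for this example.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (title, absolute URL) pairs from links found inside <h2> tags.

    The "<a> inside <h2>" rule is an assumption about the page layout.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.headlines = []
        self._in_h2 = False
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            if title:
                url = urljoin(self.base_url, self._href)
                self.headlines.append((title, url))
            self._href = None
        elif tag == "h2":
            self._in_h2 = False
            self._href = None  # drop any unclosed link


def extract_headlines(html, base_url):
    """Return a list of (title, url) tuples found in the HTML text."""
    parser = HeadlineParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.headlines


def fetch(url):
    """Download a page and decode it, falling back to UTF-8."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Something like extract_headlines(fetch(page_url), page_url) would then give a list of titles and links that could be written out as RSS items, but I don't know whether this is a sensible way to go about it.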
Thanks
With Kindest Regards
Alis Marsden
Purple Pages
http://www.purplepages.ie
e: alis@purplepages.ie
t: + 353 1 4961943
f: + 353 1 4911497