Translating unstructured documents into RSS (XML) format
Hi all,
Though I'm pretty new to this list, I don't think this issue has
been discussed before...
I would like to know if anybody has already worked on a bot that
could grab unstructured documents and translate them into RSS format.
It should work as follows:
Someone fills a queue with several URLs. The bot then browses
those web pages like a spider, recognises documents, grabs them and
saves them into a database using the different fields required by the
RSS format. Then you only have to export your data in RSS format.
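A minimal sketch of that pipeline, using only the Python standard library. It assumes the page fetching and the document recognition are supplied as two functions (`fetch` and `extract`, hypothetical names) and only stores three RSS 2.0 item fields (`title`, `link`, `description`); a real bot would add HTTP fetching, politeness delays and more fields such as `pubDate`.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# extract(url, html) -> (item dict or None, links found on the page); hypothetical interface.
Extractor = Callable[[str, str], Tuple[Optional[Dict[str, str]], List[str]]]


def crawl(seed_urls: Iterable[str], fetch: Callable[[str], str],
          extract: Extractor, db: sqlite3.Connection) -> None:
    """Breadth-first spider: visit each queued URL once and store recognised items."""
    db.execute("CREATE TABLE IF NOT EXISTS items "
               "(link TEXT PRIMARY KEY, title TEXT, description TEXT)")
    queue = deque(seed_urls)
    seen = set()
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        item, links = extract(url, fetch(url))
        if item is not None:
            # One row per document, keyed on its link so re-crawls overwrite it.
            db.execute("INSERT OR REPLACE INTO items (link, title, description) "
                       "VALUES (?, ?, ?)",
                       (item["link"], item["title"], item["description"]))
        queue.extend(link for link in links if link not in seen)
    db.commit()


def export_rss(db: sqlite3.Connection, title: str, link: str, description: str) -> str:
    """Serialise every stored item as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_link, item_title, item_desc in db.execute(
            "SELECT link, title, description FROM items ORDER BY link"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = item_desc
    return ET.tostring(rss, encoding="unicode")
```

Keeping the crawl and the export separate mirrors the description: the database is the intermediate store, and the RSS output can be regenerated from it at any time.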
Each time you grab documents from a specific URL, the bot would
need to be trained to recognise the document structure from the
website's layout (its graphical design).
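One simple stand-in for that "training" is a hand-written template per site, telling the bot which element holds each RSS field. The sketch below assumes such templates (the `SITE_TEMPLATES` table, the host `news.example.com` and the tag/class pairs are all hypothetical) and uses the standard-library `html.parser` to pull out the text of the first element matching each template entry; it is a rule-based substitute, not a learning algorithm.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical per-site templates: RSS field -> (tag, CSS class) locating it on that site.
SITE_TEMPLATES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "news.example.com": {"title": ("h1", "headline"), "description": ("div", "summary")},
}


class _FieldGrabber(HTMLParser):
    """Collect the text inside the first element matching (tag, class)."""

    def __init__(self, tag: str, cls: str) -> None:
        super().__init__()
        self.tag, self.cls = tag, cls
        self.depth = 0      # > 0 while inside the matched element (counts nested same tags)
        self.done = False   # set once the matched element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and self.cls in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_with_template(url: str, html: str) -> Optional[Dict[str, str]]:
    """Return an RSS-ready dict for a known site, or None if no template or field matches."""
    template = SITE_TEMPLATES.get(urlparse(url).hostname or "")
    if template is None:
        return None
    item = {"link": url}
    for field, (tag, cls) in template.items():
        grabber = _FieldGrabber(tag, cls)
        grabber.feed(html)
        grabber.close()
        if not grabber.parts:
            return None
        # Collapse whitespace left over from the page markup.
        item[field] = " ".join("".join(grabber.parts).split())
    return item
```

When a site changes its layout, only its template entry needs updating; a learned extractor would replace the table with something inferred from example pages.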
I think Ondisplay has such a bot... has anyone already read about
something like this?
Thanks a lot
Ben.