Re: [syndication] robots.txt and rss
> > opinion: RSS aggregators probably *should* respect robots.txt.
>
> How so? How is the robots.txt file germane to a reader's behavior?
when i said 'aggregator', i wasn't necessarily talking about end-user
tools like radio, but more about large-scale tools and sites that collect
mountains of feeds at once.
this isn't really a short-term thing, but something that will be handy in
the *long* term. let's say that i'm using RSS to syndicate indices of
content, and that the RSS is generated on-the-fly - maybe inside Zope or
some other environment. well, i probably DON'T want a site like syndic8 to
re-collect those feeds on a daily or even weekly basis, if there's a lot
of content - HUGE server load could result.
and *who knows* how people might use the feature? i sure don't.
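to make that concrete, here's a rough sketch of what the aggregator side
could look like, using python's standard urllib.robotparser. everything
specific here is made up for illustration: the example.org urls, the
"bulk-feed-collector" agent name, the /generated/ path, and the
allowed_feeds helper. a real collector would fetch /robots.txt from each
host instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt a publisher might serve: keep every robot out
# of the on-the-fly generated feeds, leave the rest of the site open
ROBOTS_TXT = """\
User-agent: *
Disallow: /generated/
"""


def allowed_feeds(robots_txt, user_agent, feed_urls):
    """Return only the feed URLs this user agent may fetch.

    Assumes every URL in feed_urls lives on the host that served robots_txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in feed_urls if parser.can_fetch(user_agent, url)]


feeds = [
    "http://example.org/generated/index.rss",  # expensive, built per request
    "http://example.org/static/news.rss",      # cheap static file
]
to_collect = allowed_feeds(ROBOTS_TXT, "bulk-feed-collector", feeds)
```

the collector drops the generated feed from its queue and keeps the static
one, so the expensive feed never gets hit by the bulk crawl.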
> The only way a robots.txt file is going to have any relevance here is
> that the robots.txt file could indicate that a particular user-agent
> should NOT load from within a given part of the hierarchy. This would
> be equivalent to not having the feed available.
maybe i want to block out specific software users or types of users, and
allow individuals to keep doing what they're doing.
> If you're interested in blocking the hammering of a feed then you'd need to use
> other means to do so. Ban the IP address of the offending client machine. Or
> use the server mechanisms to detect the user-agent and block it that way.
maybe i don't want to block the whole site, just an RSS feed - maybe
because of some peculiar interaction between a particular site (or
something exposed on it - maybe some sort of service?) and an aggregator.
(i'm making this up, clearly - sometimes that's useful.)
> But you're on a slippery slope here if you speak of banning user-agents in a
> wholesale manner and that's all robots.txt would allow.
nah, i just might want to use the per-file granularity that robots.txt
supports.
see the example at this url:
http://www.searchtools.com/robots/robots-txt.html
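here's a small sketch of the kind of per-file rule i mean, checked with
python's standard urllib.robotparser. the agent names, the example.com
urls and the feed path are all invented for illustration, not taken from
any real site or tool.

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt: one named aggregator is kept away from a
# single feed file; every other agent may fetch anything
ROBOTS_TXT = """\
User-agent: example-aggregator
Disallow: /feeds/index.rss

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

FEED = "http://example.com/feeds/index.rss"
PAGE = "http://example.com/index.html"

# the named aggregator is blocked from the one feed only
aggregator_feed = parser.can_fetch("example-aggregator", FEED)
aggregator_page = parser.can_fetch("example-aggregator", PAGE)

# an individual's reader (hypothetical name) matches only the "*" record
reader_feed = parser.can_fetch("some-desktop-reader", FEED)
```

so the aggregator loses one file rather than the whole site, and
individuals keep doing what they're doing.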
~elijah