I'd been ignoring this error message in my log for a while:

ActionController::RoutingError (Recognition failed for "/robots.txt"):

I had never touched robots.txt. So I decided to make it a proper robots.txt file.

I found this great article...

http://www.ilovejackdaniels.com/seo/robots-txt-file/

...where Dave explains the ins and outs of the file.

Before I changed mine, I thought I'd poll the group to see if anyone had any good thoughts on the subject, like any Rails-specific excludes, and whether some samples could be posted.

Mine was going to look like this:

User-agent: *
Disallow: /404.php

Thanks,
Steve
http://www.smarkets.net
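(For a stock Rails app, the usual fix is a static file at RAILS_ROOT/public/robots.txt, which the web server delivers before the request ever reaches the Rails dispatcher, so the RoutingError goes away. A minimal sketch; the Disallow paths are only illustrative examples, not a recommended list:)

```
# public/robots.txt -- served statically, never hits Rails routing
User-agent: *
# hypothetical app-specific excludes
Disallow: /404.php
Disallow: /admin
```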
Hi Steve,
I would like to warn you about the issue of abusive bots, i.e. bots that do
not obey robots.txt. Such bots can really eat up your bandwidth fast.
On one of my servers, total bandwidth usage last October was 2300GB
(serving just text and images). A detailed log scan showed that most of
the bandwidth was used by bots in Asian countries.
Scraping is really "hot" these days, so you will want to identify the
various abusive bots and include them in your robots.txt file.
Since I manage multiple domains on dedicated clusters, the solution for me
was to ban these bots using mod_rewrite. If you would like, I can post a copy
of my mod_rewrite banned user agents.
I recommend putting a Crawl-delay directive in your robots.txt file (if you
have a large website); otherwise bots like MSN can hit your site hard:
User-agent: *
Crawl-delay: 17
Not that this answers your question, but I thought it may help.
Frank
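(As a rough sketch of the kind of mod_rewrite ban Frank mentions, an .htaccess rule set generally looks like the following. The user-agent names are illustrative placeholders, not Frank's actual list:)

```apache
RewriteEngine On
# Match any of the listed substrings in the User-Agent header, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (EmailCollector|WebZIP|HTTrack) [NC]
# Return 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]
```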
_______________________________________________
Rails mailing list
Rails@lists.rubyonrails.org
http://lists.rubyonrails.org/mailman/listinfo/rails
Yes, keep in mind that robots.txt is just a *suggestion* -- nothing is
obliged to obey it.

Another easy way of banning abusive bots is to use Allow/Deny rules in your
Apache configuration and ban them by IP or subnet, e.g.

Deny from www.xxx.yyy.zzz

http://www.brainstormsandraves.com/archives/2005/10/09/htaccess/

Tony
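(The same subnet check can also be done at the application layer. A minimal Ruby sketch using the standard-library IPAddr class; the subnets below are documentation-range placeholders, not real abusers:)

```ruby
require 'ipaddr'

# Placeholder subnets for illustration; substitute the ranges you
# actually see abusing your logs.
BANNED_SUBNETS = [IPAddr.new("192.0.2.0/24"), IPAddr.new("198.51.100.0/24")]

# True if the remote address falls inside any banned subnet.
def banned_ip?(remote_addr)
  ip = IPAddr.new(remote_addr)
  BANNED_SUBNETS.any? { |subnet| subnet.include?(ip) }
end
```

In a Rails controller this could back a before_filter that renders a 403 whenever banned_ip?(request.remote_ip) is true, though doing the block in Apache keeps the load off Rails entirely.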
Hello Kevin,
Thank you for your reply.
Here is a mini tutorial I created on how to install/configure mod_security
for Apache:
http://frankmash.blogspot.com/2005_12_09_frankmash_archive.html
Here I have posted my current useragents.conf file and a sample of how
to ban using .htaccess/mod_rewrite, as well as links to a great three-part
discussion on WebmasterWorld:
http://frankmash.blogspot.com/2006/02/banning-abusing-bots-using-modrewrite.html
And finally, I also posted a current copy of my blacklist.conf file for
mod_security
http://network-security-blacklists.blogspot.com/2006/02/my-etcmodsecurityblacklistconf.html
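(For context, entries in a mod_security 1.x blacklist file generally take this shape; the patterns are illustrative placeholders, not taken from Frank's actual file:)

```apache
SecFilterEngine On
# Deny requests whose User-Agent matches a known scraper signature
SecFilterSelective HTTP_USER_AGENT "EmailCollector" "deny,log,status:403"
SecFilterSelective HTTP_USER_AGENT "WebZIP" "deny,log,status:403"
```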
Please feel free to ask any questions you may have regarding this.
Hope this helps.
Frank
Kevin Skoglund <kevin@pixelandpress.com> wrote: Please do post (or send
just to me) your mod_rewrite banned user agents. That would be very helpful.
Thanks,
Kevin