KathysKode-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2008-Jun-19 17:21 UTC
Google, MSN, Yahoo spiders crawling off my 'database universe'?
I recently figured out how to create a fairly complex Google Sitemap file and am happy to share this code with anyone who asks. As I have a highly nested database, a common URL for me will look something like:

www.MyWebsite.com/parents/23/children/45

The spiders come to my website and 'crawl' along, incrementing these sequences, which eventually puts them at:

www.MyWebsite.com/parents/23/children/46

As this URL has gone off the edge of my database universe, my exception_notification plugin sends me an email. Is there a way to put logic somewhere so that if a spider (or person) is messing around and requests a URL that isn't there, a routine kicks in telling them "You've gone off the edge of my database universe" on a view, and then takes them back to where they were?

Thank you for any thoughts you may offer.
Kathleen

You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org. To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
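[Editor's note] One way to do what Kathleen asks, sketched under the assumption of a Rails 2.x app (where `rescue_from` is available): catch `ActiveRecord::RecordNotFound` in `ApplicationController`, show a friendly message, and send the visitor back where they came from. The `record_not_found` handler name and flash wording are illustrative, not from the thread.

```ruby
class ApplicationController < ActionController::Base
  # Instead of letting a missing record bubble up to exception_notification
  # (and generate an email), handle it and route the visitor somewhere sensible.
  rescue_from ActiveRecord::RecordNotFound, :with => :record_not_found

  private

  # Hypothetical handler: show the message, then return the visitor to the
  # referring page, or the home page if there is no Referer header.
  def record_not_found
    flash[:notice] = "You've gone off the edge of my database universe"
    redirect_to(request.env["HTTP_REFERER"] || root_path)
  end
end
```

Note that for spiders specifically, rendering a page with a real 404 status (`render :status => 404`) is friendlier than a redirect, since it lets the dead URL fall out of the search engine's index.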
Dhaval Parikh
2008-Jun-20 04:54 UTC
Re: Google, MSN, Yahoo spiders crawling off my 'database universe'?
Well, you can set up a proper robots.txt and disallow certain things from being accessed on your site. This is the standard way to put restrictions on crawling.

Hope this helps.

Thanks,
Dhaval Parikh
Software Engineer
http://www.railshouse.com
sales(AT)railshouse(DOT)com

-- Posted via http://www.ruby-forum.com/.
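[Editor's note] A robots.txt for Dhaval's suggestion might look like the sketch below; the disallowed path is hypothetical, standing in for whichever deeply nested subtree Kathleen wants crawlers to stay out of. It lives at the site root (e.g. Rails' public/robots.txt).

```text
# Keep all well-behaved crawlers out of the nested resource tree.
User-agent: *
Disallow: /parents/

# Or target one crawler specifically:
User-agent: Googlebot
Disallow: /parents/
```

This only restrains well-behaved crawlers; it does nothing about a person typing URLs by hand, so it complements rather than replaces proper 404 handling.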
Matt Stone1
2008-Jun-20 06:30 UTC
Re: Google, MSN, Yahoo spiders crawling off my ''database universe''?
KathysKode-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> I recently figured out how to create a fairly complex Google Sitemap
> file [...] Is there a way to put logic somewhere so that if a spider
> (or person) is messing around and requests a URL that isn't there, a
> routine kicks in telling them "You've gone off the edge of my database
> universe" on a view, and then takes them back to where they were?

Hi there,

I don't think that spiders work that way. I thought that they follow existing links, rather than looking at the URL and guessing what another one could be. If there are no links to a page, a spider should not get to it.

Unless the link used to exist but does not now, i.e. you are dynamically generating your URLs using table ids and have removed some records from the table. In this case, the links will still be in the search engine's index and polled every so often. This is not a bad thing, as the link will eventually fall out of the search engine's index. You can request removal through the webmaster admin tools if this bugs you.

Yahoo's spider deliberately generates a dummy URL to trigger a "404 - page not found" error, to understand how your site handles this. So don't be too worried that Yahoo has weird links for your site in its index.

Otherwise you can put something like this in your logic:

  user_agent = request.user_agent.to_s.downcase  # to_s guards against a missing User-Agent
  if ![ 'msnbot', 'yahoo! slurp', 'googlebot' ].detect { |b| user_agent.include? b }
    # This request is not from one of the msn, yahoo, or google spiders,
    # so process accordingly
  end

rgds,
- matt
http://www.internetschminternet.com

-- Posted via http://www.ruby-forum.com/.
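[Editor's note] Matt's check can be pulled out into a small standalone helper, runnable outside Rails. The method name `bot_request?` is an editorial invention; the substring list is Matt's, and `to_s` handles requests that arrive with no User-Agent header at all.

```ruby
# Substrings that identify the big three crawlers, per Matt's snippet.
BOT_SIGNATURES = ['msnbot', 'yahoo! slurp', 'googlebot']

# True if the given User-Agent string looks like one of those spiders.
def bot_request?(user_agent)
  ua = user_agent.to_s.downcase  # to_s: the header may be missing (nil)
  BOT_SIGNATURES.any? { |sig| ua.include?(sig) }
end

puts bot_request?('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')  # => true
puts bot_request?('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')                        # => false
```

In a controller this could gate the "back to where you were" behaviour: redirect humans, but give spiders a plain 404 so the bad URL drops out of their index.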