cyber y.
2012-Jan-04 11:26 UTC
How to get all image, pdf and other files links from a website?
I have to develop an application which fetches the links to all image, pdf, cgi, etc. files from a website. Can anybody guide me on where I should begin?
-- Posted via http://www.ruby-forum.com/.
-- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
Felipe Fontoura
2012-Jan-04 11:44 UTC
Re: How to get all image, pdf and other files links from a website?
You can find useful information at http://railscasts.com/episodes?utf8=%E2%9C%93&search=nokogiri
Especially Mechanize.
[]'s
--- Felipe Fontoura
Eng. de Computação
http://www.felipefontoura.com
Peter Hickman
2012-Jan-04 12:10 UTC
Re: How to get all image, pdf and other files links from a website?
Well, wget has a mirror mode that will clone a website:

    wget --mirror http://www.example.com

Or you could look at Nutch (http://wiki.apache.org/nutch/), which is a web crawler for building search engines.
cyber y.
2012-Jan-05 09:42 UTC
How to get all image, pdf and other files links from a website?
I am working on an application where I have to 1) get all the links of a website and 2) then get the list of all the files and file extensions in each web page/link. I am done with the first part of it :) Now I have to get all the files/file extensions in each page. Can anybody guide me on how to parse the links/webpage and get the file extensions in the page?
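For the second step, a rough sketch of one way to pull file links out of a page and group them by extension might look like the following. It uses only Ruby's standard library, and everything in it is an assumption for illustration: the helper name `file_links_by_extension`, the sample HTML, and the example.com base URL are all hypothetical. The regex scan is a crude stand-in for a real HTML parser such as Nokogiri, which would handle messy markup far better.

```ruby
require 'uri'

# Hypothetical helper: collect href/src targets from an HTML string and
# group the resulting absolute URLs by file extension.
# NOTE: a regex scan is only an approximation of real HTML parsing.
def file_links_by_extension(html, base_url)
  links = html.scan(/(?:href|src)\s*=\s*["']([^"']+)["']/i).flatten
  grouped = Hash.new { |hash, key| hash[key] = [] }

  links.each do |link|
    begin
      # Resolve relative links against the page URL.
      absolute = URI.join(base_url, link)
    rescue URI::Error
      next # skip anything that cannot be parsed as a URL
    end

    # Take the extension from the path only, so query strings are ignored.
    ext = File.extname(absolute.path.to_s).downcase.delete('.')
    next if ext.empty? # no extension: probably a page, not a file

    url = absolute.to_s
    grouped[ext] << url unless grouped[ext].include?(url)
  end

  grouped
end

# Made-up sample page, standing in for HTML fetched in step 1.
sample = <<~HTML
  <a href="/docs/manual.pdf">Manual</a>
  <img src="images/logo.PNG">
  <a href="http://www.example.com/cgi-bin/search.cgi?q=ruby">Search</a>
  <a href="/about">About</a>
HTML

result = file_links_by_extension(sample, 'http://www.example.com/')
result.each { |ext, urls| puts "#{ext}: #{urls.join(', ')}" }
```

Running the helper over every page collected in step 1 and merging the hashes would give a site-wide list. Links without an extension (like /about above) are skipped, which may or may not be what you want.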
Peter Hickman
2012-Jan-05 09:48 UTC
Re: How to get all image, pdf and other files links from a website?
Is it me or has this particular homework question turned up a few times already? Hint: This has been asked and answered before quite recently (yesterday even) so try reading the mailing list.