Given a user's search query, my app goes off and scrapes a few sites and provides the results to the user. The user can also choose to filter these results further by category, age, etc., and this will be updated via AJAX without refreshing. The result items are not static: except for its title, the information for an item changes every two hours, so there's no point in permanently caching the data in a database.

Given that I want to allow filtering of the results, how should I go about storing them after scraping? There will be at most about 1000 results, each comprising about 300 characters.

Can I just store them in a @@results class variable? How do I overcome the wiping of that data whilst in development mode?

I'm new to Rails. I've read a bit about sessions, memcached, etc., but I'm not really sure whether they're what's needed for this situation. Can anyone help?

--
Posted via http://www.ruby-forum.com/.

You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
On Sun, Apr 5, 2009 at 5:33 AM, Adam Akhtar <rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:

Why not write the results to a file? You could write the raw (pre-scraped) data to a file and re-scrape it, or you could save the parsed data structure in some serialized format (YAML is an option here).

Andrew Timberlake
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain
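The file-based suggestion can be sketched in a few lines. The hash structure below is an assumption about what the scraper produces (the thread never shows it); the point is just that whatever structure you build can round-trip through YAML:

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical scraped results - plain hashes with string keys
# (string keys keep YAML loading simple under safe-load defaults).
results = [
  { "title" => "Item one", "category" => "books", "age" => 3 },
  { "title" => "Item two", "category" => "games", "age" => 1 }
]

# Persist the scraped structure to disk...
path = File.join(Dir.tmpdir, "scrape_results.yml")
File.write(path, YAML.dump(results))

# ...and restore it on a later request.
restored = YAML.load_file(path)
```

`restored` compares equal to the original array, so any filtering code can run the same way against fresh or reloaded data.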
You can easily create a table and stick each result in as a row. In Rails, SQLite is easy enough to set up, and if your site gets bigger you can move to something like DB2. If it's like most sites, you'd make a "result" table that is associated with a user table.
Hi, thanks for your replies.

My main concern is performance. The data is not scraped in advance; it's scraped on demand by my users. They submit a search query, which I then run against several sites, scrape their results, and aggregate them for the user. My site is basically a meta search engine.

Storing results in a db:

Pros: I get to use MySQL find conditions when the user wants to filter the results further.

Cons: I'll only be temporarily storing these results. As soon as the user does a new search, they're gone forever. I don't know the performance hit of storing 1000 results in a db across several fields. Is a db still a wise choice?

Using YAML:

Pros: not sure, but hey, I like using it!
Cons: no MySQL conditions, so I'd have to write my own filtering methods.

Does the above change anything?
On Sun, Apr 5, 2009 at 10:31 AM, Adam Akhtar <rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:

The benefit of YAML is that once you've scraped the data, you probably already have a structure in place which can easily be saved and restored. You could combine the two by storing the YAML in the database.

From a performance perspective, consider caching the results of the scraping for at least some period of time so that you don't have to scrape on every search (unless the source websites change VERY frequently).

Andrew Timberlake
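The "store the YAML in the database with a time stamp" idea can be sketched without any Rails machinery. The class name and store below are made up for illustration (a real version would be an ActiveRecord model with `query`, `payload`, and `created_at` columns), but the expiry logic is the same:

```ruby
require 'yaml'

# A minimal sketch of the caching idea, assuming a simple keyed store.
class ScrapeCache
  TTL = 2 * 60 * 60 # two hours, per the original post

  def initialize(clock: -> { Time.now })
    @store = {}   # stands in for a database table
    @clock = clock
  end

  # Save the scraped structure as YAML, stamped with the current time.
  def write(query, results)
    @store[query] = { payload: YAML.dump(results), at: @clock.call }
  end

  # Return the cached results, or nil if missing or older than TTL,
  # in which case the caller should re-scrape.
  def read(query)
    entry = @store[query]
    return nil unless entry
    return nil if @clock.call - entry[:at] > TTL
    YAML.load(entry[:payload])
  end
end
```

The injectable `clock` is only there so the expiry path can be exercised without waiting two hours.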
Thanks Andrew for clearing up any doubts I had regarding using YAML. I will cache the results in the db for around two hours, then.

I'm now wondering how this will affect the performance of filtering. My guess is that when a user selects some filters on the results screen, these get passed as params back to the controller's index action. Logic there will determine that it's a request to filter existing results, access the cache in the db, and grab the YAML. Then use YAML to turn the data back into the relevant objects, and use Enumerable's find_all method to filter the results.

Do you think that approach is OK, or is there a better way of doing it?

Many thanks once again. You have been a great help.
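That find_all step might look like the sketch below, assuming the cached YAML deserializes back into hashes (the field names here are invented for illustration; a Result model would work the same way):

```ruby
# Deserialized cache contents - hypothetical fields.
results = [
  { "title" => "A", "category" => "books", "age" => 2 },
  { "title" => "B", "category" => "games", "age" => 5 },
  { "title" => "C", "category" => "books", "age" => 7 }
]

# Apply whichever filters the user actually sent as params;
# nil means "don't filter on this field".
def filter(results, category: nil, max_age: nil)
  results.find_all do |r|
    (category.nil? || r["category"] == category) &&
      (max_age.nil? || r["age"] <= max_age)
  end
end

filtered = filter(results, category: "books", max_age: 5)
# only the "A" result satisfies both conditions
```

With 1000 items of ~300 characters each, an in-memory pass like this is cheap; the deserialization is likely to cost more than the filtering itself.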
On Sun, Apr 5, 2009 at 12:17 PM, Adam Akhtar <rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:

Sounds good to me. I always focus on getting the job done in the simplest way possible first, then work on optimisation if you see a bottleneck. Your biggest problem is likely to be fetching all the other sites for scraping, which the caching will hopefully help with.

Andrew Timberlake
Excellent, thanks once again Andrew! I appreciate your advice.
Just thinking: your scrape should probably run in a worker that sticks the results in a db. Depending on what you're using, you could even configure it as a temp table. Then in your search window you can do AJAX-based updates from the scrape, with the ability to clear up the cache afterwards. You get more concurrency, and with the right JavaScript you could cancel a scrape in progress.

I think this would scale and be more responsive.
Thanks glennswest. I'm relatively new to Rails; whilst I think I understood what you said, can you (or anyone else) elaborate further on the points below? I really appreciate your help.

> Just thinking, your scrape should probably be in a worker,

When you say a worker, I take it you mean some temporary database?

> Depending on what you're using, you configure it to be a temp
> table even. Then in your search window you can do ajax based
> updates from the scrape.

From the above, do you mean that whilst I'm scraping results from sites — when one site's results get added to the db and I go off scraping another site's results — I can simultaneously show the results that were just added on screen?

> With the ability to then clear up the cache.

After I get all the results and display them on screen, I can then clear the table?

> You get more concurrency,

Wasn't too sure what you meant by this, but that's because I'm fresh to Rails and can't gather it from the context.

> and with the right javascript you could cancel the scrape in process.

Ahh, so if, whilst I'm scraping and simultaneously presenting already-scraped data from the db, the user decides to cancel the request, I can terminate the outstanding scrape tasks via some JavaScript call and move on?

> Think this would scale and be more responsive

In general, how fast or slow is it to insert around 1000 results into a table? Is it fast enough to handle this situation? I'd prefer to stick the objects in a temporary db table because then I'd get to use the existing ActiveRecord methods and MySQL statements. I'm just worried about the performance.
Here's your problem in Rails: your web server is "single" threaded, so while you're scraping, it's not doing anything else, so you will need more mongrels to take care of the users. Generally you scale by having more threads and CPUs working on the problem. The database is probably not going to be your bottleneck for a while; it's more the style.

Why don't I train you a bit? We can do a screen share / Skype session.
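The worker idea can be illustrated with plain Ruby threads and a Queue. In production this would live in a real background-job system outside the Mongrel process, and `fake_scrape` below is a stand-in for the actual HTTP fetch:

```ruby
# Sketch of scraping several sites concurrently, assuming each site's
# fetch is independent. Site names and result shapes are invented.
sites = ["site-a", "site-b", "site-c"]
queue = Queue.new # thread-safe; stands in for the temp results table

# Stand-in for the real fetch-and-parse step.
def fake_scrape(site)
  [{ "title" => "#{site}-1" }, { "title" => "#{site}-2" }]
end

# One worker thread per site, so a slow site doesn't block fast ones.
workers = sites.map do |site|
  Thread.new { fake_scrape(site).each { |r| queue << r } }
end
workers.each(&:join)

# Drain the queue. In the app, an AJAX poll would read results
# incrementally as they arrive instead of waiting for all of them.
all_results = []
all_results << queue.pop until queue.empty?
```

Because the workers write to a shared queue (or table) as they go, the request cycle only has to read what is already there, which is what makes the incremental AJAX updates and cancellation possible.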
Hi glennswest, sorry for the late reply. I'd be up for chatting over Skype if you are. Let me know either here or via a message. Thank you for your kind offer!

adam.