I'm having a little trouble understanding how to structure some of my classes with ActiveRecord when a file lives in my lib directory.

Brief example. Here's the outline of the files in use:

    app
        controllers
            application_controller.rb
            rushing_offenses_controller.rb
        models
            rushing_offense.rb
    lib
        scraper.rb
        tasks
            scraper.rake

The rushing_offense.rb file contains:

    class RushingOffense < ActiveRecord::Base
    end

The scraper.rb file contains:

    class Scraper < ActiveRecord::Base
      # METHOD that defines which URL to parse
      # METHOD that parses the data into an instance variable called @rows
      # METHOD that should be updating my database table called "rushing_offenses"

      # Update Rushing Offense
      def update_rushing_offense
        for i in 0..@numrows-1
          update_all(:name => @rows[i][0], :games => @rows[i][1])
          puts "Updating Team Name = #{@rows[i][0]}."
        end
      end
    end

The scraper.rake file contains:

    desc "This task will parse data from ncaa.org and upload the data to our db"
    task :scraper => :environment do
      offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org', 'table', 'statstable', '//tr')
      offensive_rushing.scrape_data
      offensive_rushing.clean_celldata
      offensive_rushing.print_values
      offensive_rushing.update_rushing_offense  # the call to the method above
    end

Now if I run the rake task, I get an error stating:

    Table 'project_development.scrapers' doesn't exist

I believe I understand why that's happening, but I'm not sure how to fix it from a long-term perspective. Here's why: the Scraper class inherits from ActiveRecord::Base, so ActiveRecord assumes its table is the pluralized class name, scrapers.

I then thought, well, maybe I need to put the code in the rushing_offenses_controller.rb file, in that class. But here's the issue I'm having: the Scraper class should be a class that other classes can call to do repetitive tasks on many different URLs. I've set up the class to do that, with methods that can retrieve different URLs. So, I want my Scraper class to act as a utility class that other classes use to parse data and upload it to the correct database table.

If I place the Scraper class inside the rushing_offenses_controller file, then I'm not following DRY principles. I don't want to have to repeat code over and over.

Any ideas on how I can rectify this issue?

--
Posted via http://www.ruby-forum.com/.

To expand upon the issue:

There are approximately 37 different statistical categories for college football. I will be parsing 37 different URLs to retrieve data that will be pushed to my database. The Scraper class is the tool for doing that. Each call within my rake task will fetch a specific URL using the methods in the Scraper class, but will update a specific table.

Example:

    rushing_offense.rb ---> connects to the rushing_offenses table
    passing_offense.rb ---> connects to the passing_offenses table
    scoring_offense.rb ---> connects to the scoring_offenses table

    Call to scraper.rb to parse data from a rushing offense URL
    Call to scraper.rb to update data to rushing_offenses table

    Call to scraper.rb to parse data from a passing offense URL
    Call to scraper.rb to update data to passing_offenses table

    Call to scraper.rb to parse data from a scoring offense URL
    Call to scraper.rb to update data to scoring_offenses table

    etc. for 37 different categories

To add another thought to the mix:

The only reason I'm defining a rake task is that eventually it will be run by a cron job to populate my database on a weekly basis (say, every Sunday night). The bulk of the rest of my project will just be controllers and views for how the site presents the data. So, populating data from an external source is the big issue right now.

Another thing I considered is inheritance. If I do

    class Scraper < RushingOffense

then Scraper would inherit from the RushingOffense class located in the rushing_offense.rb model. Then I could possibly put the following in my rake task:

    offensive_rushing = RushingOffense::Scraper.new

However, I would want Scraper to be a part of every statistical class I create. So, it would have to be a member of:

    RushingOffense
    PassingOffense
    ScoringOffense
    etc.

How would I force inheritance across multiple classes?

I think I found my own answer to the last question - a single class cannot inherit across multiple classes. :(
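
A Ruby class has only one superclass, but shared behaviour can be mixed into several classes with a module. The following is only a minimal sketch of that idea: the module name Scrapable, the method names, and the :attempts column are hypothetical, and the classes are plain Ruby stand-ins rather than the real ActiveRecord models.

```ruby
# Hypothetical mixin; Scrapable and import_rows are illustrative names only.
module Scrapable
  # Pair each scraped row (an array of cell values) with the column
  # list supplied by the class that extends this module.
  def import_rows(rows)
    rows.map { |row| Hash[scrape_columns.zip(row)] }
  end
end

# Plain Ruby stand-ins for the models (no ActiveRecord involved).
class RushingOffense
  extend Scrapable

  def self.scrape_columns
    [:rank, :name, :games]
  end
end

class PassingOffense
  extend Scrapable

  def self.scrape_columns
    [:rank, :name, :attempts] # :attempts is an assumed column name
  end
end

p RushingOffense.import_rows([[1, "Team A", 12]])
p PassingOffense.import_rows([[1, "Team B", 300]])
```

Each class keeps its own superclass and only picks up the shared method through extend.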

On Jun 7, 8:30 pm, "J. D." <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> I think I found my own answer to the last question - a single class
> cannot inherit across multiple classes. :(

Does Scraper need to be an ActiveRecord class at all? You could pass it the class whose table needs to be updated, i.e.

    def do_something(some_klass)
      some_klass.update_all(...)
    end

or perhaps you might want to couple things a little more loosely:

    def do_something(some_klass)
      some_klass.handle_scraper_data(...)
    end

Fred
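
The looser coupling suggested here can be sketched in plain Ruby. This is an illustration under assumptions, not the thread's actual code: RowScraper, deliver_to and FakeRushingOffense are hypothetical names, and the fake class stores rows in memory where a real model would write to its table.

```ruby
# Plain Ruby utility: it knows nothing about tables or ActiveRecord.
class RowScraper # hypothetical name
  def initialize(rows)
    @rows = rows # in the real scraper these would come from the parsed page
  end

  # Hand the parsed rows to whichever class should store them.
  def deliver_to(target)
    target.handle_scraper_data(@rows)
  end
end

# In-memory stand-in for a model; a real model would save to its own table.
class FakeRushingOffense
  @saved = []

  class << self
    attr_reader :saved

    def handle_scraper_data(rows)
      rows.each { |row| @saved << { :name => row[0], :games => row[1] } }
    end
  end
end

RowScraper.new([["Team A", 12], ["Team B", 11]]).deliver_to(FakeRushingOffense)
p FakeRushingOffense.saved
```

The scraper only needs the target to respond to handle_scraper_data, so the same scraper class could serve every statistics category.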

Frederick Cheung wrote:
> Does Scraper need to be an activerecord class at all ? you could pass
> to it the class whose table needs to be updated ie
>
> def do_something(some_klass)
>   some_klass.update_all(...)
> end
>
> or perhaps you might want to couple things a little more loosely
>
> def do_something(some_klass)
>   some_klass.handle_scraper_data(...)
> end
>
> Fred

Hi Fred:

Here's what I managed to do on my own (believe it or not - lol):

My rake task, which basically calls the RushingOffense class from models:

    desc "Parse Rushing Offenses data from ncaa.org"
    task :parse_rushing_offenses => :environment do
      update_rushing = RushingOffense.new
      update_rushing.scrape
    end

My model for Rushing Offense, where I created a "scrape" method that scrapes data using the Scraper class. Since this model inherits from ActiveRecord, it should be able to update:

    class RushingOffense < ActiveRecord::Base
      def scrape
        offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org', 'table', 'statstable', '//tr')
        offensive_rushing.scrape_data
        offensive_rushing.clean_celldata
        for i in 0..offensive_rushing.numrows-1
          puts "Updating Team Name = #{offensive_rushing.rows[i][1]}."
          RushingOffense.update_all(:name => offensive_rushing.rows[i][1], :games => offensive_rushing.rows[i][2])
        end
      end
    end

Then finally, I have my scraper.rb file:

    #== Scraper Version 1.0
    #
    #*Created By:* _Elricstorm_
    #
    # _Special thanks to Soledad Penades for his initial parse idea which I worked with to create the Scraper program.
    # His article is located at http://www.iterasi.net/openviewer.aspx?sqrlitid=wd5wiad-hkgk93aw8zidbw_
    #
    require 'hpricot'
    require 'open-uri'

    # This class is used to parse and collect data out of an html element
    class Scraper #< ActiveRecord::Base
      attr_accessor :url, :element_type, :clsname, :childsearch, :doc, :numrows, :rows

      # Define what the url is, what element type and class name we want to parse and open the url.
      def initialize(url, element_type, clsname, childsearch)
        @url = url
        @element_type = element_type
        @clsname = clsname
        @childsearch = childsearch
        @doc = Hpricot(open(url))
        @numrows = numrows
        @rows = rows
      end

      # Scrape data based on the type of element, its class name, and define the child element that contains our data
      def scrape_data
        @rows = []
        (doc/"#{@element_type}.#{@clsname}#{@childsearch}").each do |row|
          cells = []
          (row/"td").each do |cell|
            if (cell/" span.s").length > 0
              values = (cell/"span.s").inner_html.split('<br />').collect{ |str|
                pair = str.strip.split('=').collect{|val| val.strip}
                Hash[pair[0], pair[1]]
              }
              if(values.length==1)
                cells << cell.inner_text.strip
              else
                cells << values.strip
              end
            else
              cells << cell.inner_text.strip
            end
          end
          @rows << cells
        end
        @rows.shift # Shifting removes the row containing the <th> table header elements.
        @rows.delete([]) # Remove any empty rows in our array of arrays.
        @numrows = @rows.length
      end

      def clean_celldata
        @rows[@numrows-1][0] = 120
      end

      # Print a joined list by row to see our results
      def print_values
        puts "Number of rows = #{numrows}."
        for i in 0..@numrows-1
          puts @rows[i].join(', ')
        end
      end
    end

--------------------------------

Now the only problem I have is that when I run the rake task, I don't get any errors and I see the puts for each team as it's being updated (or supposed to be updated). So, it's counting each row as I expected. I only tried to update 2 fields just for a test, but no data is being listed in the database.

Any ideas of what I might be doing wrong?

This has still been a great day because even though I've seen tons of errors, I'm learning.

On Jun 7, 10:01 pm, "J. D." <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Any ideas of what I might be doing wrong?

You're not using update_all correctly - check the documentation.

Fred

On Jun 8, 12:02 am, Frederick Cheung <frederick.che...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> You're not using update_all correctly - check the documentation

Well, the documentation may not mention the usage you are using, but it does exist, sorry about that. You do seem to be using it slightly oddly though: you call update_all multiple times, but you don't specify any conditions, so each call to update_all overwrites the changes made by the previous one.

Fred
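
The overwriting effect can be imitated without a database. This is a rough simulation under assumptions, not ActiveRecord itself: the array of hashes stands in for a table, and update_all_rows is a hypothetical helper that mimics an update with no conditions.

```ruby
# In-memory stand-in for a table. This only imitates what an update
# without conditions does; it is not ActiveRecord.
table = [
  { :id => 1, :name => "old 1", :games => 0 },
  { :id => 2, :name => "old 2", :games => 0 }
]

# Imitates update_all with no conditions: every row receives the new values.
def update_all_rows(table, attrs)
  table.each { |record| record.merge!(attrs) }
end

scraped = [["Team A", 12], ["Team B", 11]]
scraped.each do |name, games|
  update_all_rows(table, :name => name, :games => games)
end

p table # every row now carries the values from the last scraped row

# An update never inserts rows, so an empty table stays empty.
empty_table = []
update_all_rows(empty_table, :name => "Team A", :games => 12)
p empty_table
```

The second half of the sketch matches the symptom reported earlier in the thread: the loop runs and prints every team, yet an empty table gains no rows.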

Hi Fred,

Yeah, I'm stuck with this one. I've checked the documentation but I'm just not following it. What I basically need is to update the table with the data that's parsed into @rows. In this case @rows is accessed as:

    offensive_rushing.rows[i][1] (:name)
    offensive_rushing.rows[i][2] (:games)

I was trying to use a for loop to go through all of the rows and send the new data to the database. I'm just not sure how to do it properly. I catch on quickly, but I've been searching the web and reading the documentation and I just don't see a detailed example of what I'm trying to do.

So, in a readable format, what I see is:

    for i in 0..offensive_rushing.numrows-1
    --> Start my loop; it's going to repeat approx. 120 times (120 teams)

    puts "Updating Team Name = #{offensive_rushing.rows[i][1]}."
    --> Print me out an update to show me that you are updating the teams

    RushingOffense.update_all(:name => offensive_rushing.rows[i][1], :games => offensive_rushing.rows[i][2])
    --> Update the :name with the name of the team
    --> Update the :games with the number of games that team has played
    --> Update it if the team already exists (not sure how to do this part)
    --> Add new data if the team doesn't exist (don't know how to do this part)

I hope that helps.

On Jun 8, 12:22 am, "J. D." <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> puts "Updating Team Name = #{offensive_rushing.rows[i][1]}."
> --> Print me out an update to show me that you are updating the teams
> RushingOffense.update_all(:name => offensive_rushing.rows[i][1],
> :games => offensive_rushing.rows[i][2])
> --> Update the :name with the name of the team
> --> Update the :games with the number of games that team has played
> --> Update it if the team already exists (not sure how to do this part)
> --> Add new data if the team doesn't exist (don't know how to do this
> part)

Sounds like you shouldn't be using update_all at all here. Rather, you should be using find to find an appropriate row to update, and if there is none, create a new one.

Fred
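
The find-then-update-or-create flow described here can be sketched with an in-memory stand-in. Everything below is an assumption for illustration: FakeTable and upsert are hypothetical names, and the lookup by team name is a guess at what "an appropriate row" would mean for this data.

```ruby
# In-memory stand-in for the rushing_offenses table (not ActiveRecord).
class FakeTable
  attr_reader :records

  def initialize
    @records = []
  end

  # Find the record for this team name, or create it, then apply the new values.
  def upsert(name, attrs)
    record = @records.find { |r| r[:name] == name }
    if record.nil?
      record = { :name => name }
      @records << record
    end
    record.merge!(attrs)
  end
end

table = FakeTable.new
table.upsert("Team A", :games => 11)
table.upsert("Team B", :games => 11)
table.upsert("Team A", :games => 12) # existing row is updated, not duplicated
p table.records
```

With a real model, the lookup would be a find with conditions on the team name and the two branches would save an existing record or create a new one.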

Frederick Cheung wrote:
> Sounds like you shouldn't be using update_all at all here, rather you
> should be using find to find an appropriate row to update and if there
> is none, create a new one.
>
> Fred

Again, the problem is I don't know how. I'm simply guessing based on what I see in the documentation. I don't have any working examples and most of the tutorials I see are very basic.

How I plan to manage the data is important as well. For instance, I want to keep weekly data snapshots. So, as an example, just using the rushing offense table: a user will be able to check by a particular week (the cron job will run the rake task once per week). Therefore, my database table needs to account for "new data" every single week.

Scenario:

    Rake task begins
    Check for weekly snapshot data (for current week)
    -- If no snapshot data then create it
    -- If data already exists for current week do nothing

    Next week
    Rake task begins
    Check for weekly snapshot data (for current week)
    -- If no snapshot data then create it
    -- If data already exists for current week do nothing

So, let's look at my current table structure:

    :rank :name :games :carries :net :avg :tds :ydspg :wins :losses :ties

The first issue I see is that I do not have a column that records which weekly snapshot a row belongs to. Would you recommend this be tied to a timestamp? How would I check (based on the conditions above) against a particular timestamp range and produce the results? Or should I create another column for this?

And, lastly, is there somewhere online where code for "advanced table manipulation" is available to view? Much of the code I have found is either very outdated, very basic, or not something I can use. The documentation is a decent start but it does not contain a lot of advanced examples.

I know I may be asking a lot of questions (and I apologize if I am). However, I do learn quickly and I'm the type of person who likes to dive in and get started. I've read one full Ruby book and am midway through my first Rails book. However, even these books do not give me scenario-based examples. This is why I'm here. I understand code better when I see code. I don't mind working through code that contains errors and trying to get it to work. That just helps me gain an understanding of what occurs.

The API is only useful as a reference for bits of code. I always look there first, but you have to know which code you are looking for. If you know exactly which method you are going to be working with, looking in the API and then scouring the web for information is a little easier. In the case of my example above, I'm not sure exactly which methods I will need to accomplish my task.

Thanks.

By the way Fred, I really do appreciate you taking the time to help me isolate some of my issues. I want to be proactive with my own code and, later on, with helping others. My goal is to understand best practices and start using them in my code from the start. I want to do whatever I need to do to get things going. If you say I need to go to X site, I'll go to X site, etc. I'm very focused on the task at hand.

Hi Fred,

I think I will use this for my find parameter:

    start_date = Time.now.beginning_of_week
    end_date = Time.now.end_of_week
    @rushing_offenses = RushingOffense.find(:all, :conditions => ['created_at > ? and created_at < ?', start_date, end_date])

That will let me find anything created within the set week. Now I just have to figure out how to check whether or not it returns nil and create data.

On Jun 8, 5:11 am, "J. D." <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> That will let me find anything created within the set week. Now I just
> have to figure out how to check whether or not it returns nil and create
> data..

It will never return nil. It will return an array (possibly an empty one). You might want to set your own timestamp and use that rather than relying on created_at (so that the date is one that is significant to your data and not just when you happened to run your scraper).

Fred
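
The weekly check, and the point that an empty result is [] rather than nil, can be sketched in plain Ruby. This is a sketch under assumptions: week_start and snapshot_exists? are hypothetical helpers, week_start is a stand-in for ActiveSupport's beginning_of_week (which is not loaded here), and the week is assumed to run Monday to Sunday.

```ruby
require 'date'

# Monday of the week containing the given date. Plain Ruby stand-in for
# beginning_of_week, assuming weeks start on Monday.
def week_start(date)
  date - ((date.wday - 1) % 7)
end

# True when any snapshot date falls inside the same Monday-to-Sunday week.
def snapshot_exists?(snapshot_dates, today)
  first = week_start(today)
  last = first + 6
  matches = snapshot_dates.select { |d| d >= first && d <= last }
  !matches.empty? # select returns [] rather than nil when nothing matches
end

today = Date.new(2008, 6, 8)
puts week_start(today)
puts snapshot_exists?([Date.new(2008, 6, 3)], today)
puts snapshot_exists?([], today)
```

Testing with empty? avoids any comparison against nil, which would never match for an array result.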

Frederick Cheung wrote:
> It will never return nil. It will return an array (possibly an empty
> one). You might want to set your own timestamp and use that rather
> than relying on created at (so that the date is one that is
> significant to your data and not just when you happened to run your
> scraper)
>
> Fred

Hi Fred,

Yep, you were correct. If the query finds nothing it returns an empty array [], so I'll make some checks against that. I'll also take your advice and create a new column called compiled_on to hold my own timestamp.

Thanks.

Okay,

The end result was modifying the model for the table I was working with to do the following:

    class RushingOffense < ActiveRecord::Base
      def scrape
        offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org', 'table', 'statstable', '//tr')
        offensive_rushing.scrape_data
        offensive_rushing.clean_celldata
        start_date = Time.now.beginning_of_week
        end_date = Time.now.end_of_week
        current_date = Time.now
        @rushing_offenses = RushingOffense.find(:all, :conditions => ['compiled_on > ? and compiled_on < ?', start_date, end_date])
        if @rushing_offenses == [] # means we have an empty array
          for i in 0..offensive_rushing.numrows-1
            puts "Updating Offensive Rushing Statistics for #{offensive_rushing.rows[i][1]}."
            RushingOffense.create(:rank => offensive_rushing.rows[i][0],
                                  :name => offensive_rushing.rows[i][1],
                                  :games => offensive_rushing.rows[i][2],
                                  :carries => offensive_rushing.rows[i][3],
                                  :net => offensive_rushing.rows[i][4],
                                  :avg => offensive_rushing.rows[i][5],
                                  :tds => offensive_rushing.rows[i][6],
                                  :ydspg => offensive_rushing.rows[i][7],
                                  :wins => offensive_rushing.rows[i][8],
                                  :losses => offensive_rushing.rows[i][9],
                                  :ties => offensive_rushing.rows[i][10],
                                  :compiled_on => current_date)
          end
        end
        if @rushing_offenses != [] # means the current week's data is not empty
          puts "Current Week's Data Is Already Populated!"
        end
      end
    end

This code works 100% and doesn't overlap. However, if you could take a look at it and let me know whether there's something I should change to make it "better", follow "best practices", shorten it, or make it more efficient, I would be appreciative.

I feel great now, having completed my first difficult task with Rails.
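
One possible tidy-up, offered only as a sketch: the eleven indexed assignments can be built by pairing a column list with each row. COLUMNS and attributes_for are hypothetical names, the column order is assumed to match the cell order used in the model code of this thread, and the sample row is made-up data.

```ruby
# Column order assumed to match the cell order in each scraped row.
COLUMNS = [:rank, :name, :games, :carries, :net, :avg,
           :tds, :ydspg, :wins, :losses, :ties]

# Build the attribute hash for one row and add the snapshot date.
def attributes_for(row, compiled_on)
  Hash[COLUMNS.zip(row)].merge(:compiled_on => compiled_on)
end

# Made-up sample row, for illustration only.
row = [1, "Team A", 12, 600, 3500, 5.8, 40, 291.7, 10, 2, 0]
attrs = attributes_for(row, "2008-06-08")
p attrs
```

With the real model, each hash could be passed straight to RushingOffense.create, and the two separate if checks could collapse into a single if/else on @rushing_offenses.empty?.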

Just to throw another spanner in the works for you, I wonder if this wouldn't be achieved more easily using scRUBYt!. The latest skimr branch (http://github.com/scrubber/scrubyt/tree/skimr) lets you quite easily store the results of a scrape directly into an ActiveRecord model.

Drop me a line if you need me to provide a more concrete example.

Glenn