thr3ads.net - Rails - Rake Tasks [Jun 2009]

J. D.

2009-Jun-06 20:54 UTC

Rake Tasks

Hi Everyone,

I just need some further help clarifying a custom rake task I'm building
and the logistics of how it should be working.

I've created a custom rake task in lib/tasks called scraper.rake which
so far just contains the following:

desc "This task will parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  # code goes here for scraping
end

This rake task will be parsing data from ncaa.org and placing it into my
DB for further processing.  The .rb file I created has the following:

==============================
#== Scraper Version 1.0
#
#*Created By:* _Elricstorm_
#
# _Special thanks to Soledad Penades for his initial parse idea which I
# worked with to create the Scraper program.
# His article is located at
# iterasi.net/openviewer.aspx?sqrlitid=wd5wiad-hkgk93aw8zidbw_
#
require 'hpricot'
require 'open-uri'

# This class is used to parse and collect data out of an html element
class Scraper
  attr_accessor :url, :element_type, :clsname, :childsearch, :doc, :numrows

  # Define what the url is, what element type and class name we want to
  # parse and open the url.
  def initialize(url, element_type, clsname, childsearch)
    @url = url
    @element_type = element_type
    @clsname = clsname
    @childsearch = childsearch
    @doc = Hpricot(open(url))
    @numrows = 0 # set for real by scrape_data
  end

  # Scrape data based on the type of element, its class name, and define
  # the child element that contains our data
  def scrape_data
    @rows = []

    (doc/"#{@element_type}.#{@clsname}#{@childsearch}").each do |row|
      cells = []
      (row/"td").each do |cell|
        if (cell/"span.s").length > 0
          # Split "key = value" pairs separated by <br /> into one-entry hashes
          values = (cell/"span.s").inner_html.split('<br />').collect { |str|
            pair = str.strip.split('=').collect { |val| val.strip }
            Hash[pair[0], pair[1]]
          }

          if values.length == 1
            cells << cell.inner_text.strip
          else
            cells << values # values is an Array, which has no #strip
          end
        else
          cells << cell.inner_text.strip
        end
      end
      @rows << cells
    end
    @rows.shift # Shifting removes the row containing the <th> table header elements.
    @rows.delete([]) # Remove any empty rows in our array of arrays.
    @numrows = @rows.length
  end

  # Set the first cell of the last row to 120
  def clean_celldata
    @rows[@numrows - 1][0] = 120
  end

  # Print a joined list by row to see our results
  def print_values
    puts "Number of rows = #{numrows}."
    @rows.each do |row|
      puts row.join(', ')
    end
  end
end

# In our search we are supplying the website url to parse, the type of
# element (ex: table), the class name of that element
# and the child element that contains the data we wish to retrieve.
offensive_rushing = Scraper.new('web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
  'table', 'statstable', '//tr')
offensive_rushing.scrape_data
offensive_rushing.clean_celldata
offensive_rushing.print_values

===============================
If you tested that out, you will see a printout of 120 rows of data.

What I want to do is to utilize the .rb file I created with my rake
task.  However, I'm not sure how to incorporate that into rails.  Once I
get past this hurdle it should help with future issues.

So, here is my list of questions in order of what I am curious to
know:

1.  Where do custom .rb files go inside of my rails project?  (For
instance, I understand MVC, but with a rake task - in my brain it's
outside of the project and I'm not sure how it is supposed to
communicate with controllers or pull/associate variables from those
areas.)

2.  With my custom .rb I'm also requiring 'hpricot'.  Is there anything
special I need to do with a .rake file to make sure that it knows to
pull this gem?  And, if I export to my real site, how do I ensure that
hpricot is loaded there too?  In other words, what expectations should I
be relying on?

3.  When I run a rake task and need to communicate with my database (for
uploading purposes) is there an easy way to do this?  Can I utilize
.rake with my DB inside of my rails environment? Or, are rake tasks
completely separate and distinct and need to be considered outside of
scope?

4.  Can anyone provide me a summarized step by step (nothing too fancy
or that takes up too much of your own time) with how "you" would
accomplish this kind of rake task given a similar .rb and .rake file?
What generalized steps would you take?  Create a class? (if so, where
would you place it)  How would you communicate with the DB within rails?
etc.

I know these are a lot of questions but I figure even if one or two of
them get answered, I'm happy.  You don't have to feel that you can't
reply if you don't have the answers to all of them.  Any answers that
can be touched upon would be greatly appreciated.

I am a newbie and learning rails (but many books do not talk about these
particulars).  So, I'm relying on others that have patience and
understanding to help enlighten me so that one day I too, can help
others that need similar help.

Thanks.
-- 
Posted via ruby-forum.com.

Maurício Linhares

2009-Jun-07 03:40 UTC

Re: Rake Tasks

On Sat, Jun 6, 2009 at 5:54 PM, J. D.
<rails-mailing-list-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> 1.  Where do custom .rb files go inside of my rails project?  (for
> instance I understand the MVC but with a rake task - in my brain it's
> outside of the project and I'm not sure how it is supposed to
> communicate with controllers or pull/associate variables from those
> areas.
>

This custom file should be called scraper.rb and should be placed in
the /lib folder of your application. In a rake task you don't really
access or call controllers, you just run the task, which tells
the scraper to load the data and then save it to the DB.

> 2.  With my custom .rb I'm also requiring 'hpricot'.  Is there anything
> special I need to do with a .rake file to make sure that it knows to
> pull this gem?  And, if I export to my real site, how do I ensure that
> hpricot is loaded there too?  In otherwords, what expectations should I
> be relying on?
>

You don't need to do anything else. Rails will automatically enable
rubygems, and by requiring hpricot you will tell it to load the gem.
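
As an aside on the "real site" half of the question: a plain require only works if the gem is installed on that machine. A minimal sketch of failing early with a readable hint follows; the helper name require_or_explain is hypothetical, not something Rails or RubyGems provides.

```ruby
# Hypothetical helper (not part of Rails or RubyGems): try to require a
# library and, if it is not installed, print a hint instead of a backtrace.
def require_or_explain(lib)
  require lib
  true
rescue LoadError
  warn "Missing library '#{lib}'. Try: gem install #{lib}"
  false
end

# 'set' ships with Ruby, so this succeeds; in scraper.rb the argument
# would be 'hpricot'.
require_or_explain('set')
```

Calling it with the name of a gem that is absent returns false and prints the hint, which makes a missing gem on the production server obvious the first time the task runs.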
> 3.  When I run a rake task and need to communicate with my database (for
> uploading purposes) is there an easy way to do this?  Can I utilize
> .rake with my DB inside of my rails environment? Or, are rake tasks
> completely seperate and distinct and need to be considered outside of
> scope?
>

Now you have to learn ActiveRecord, the Rails database access
framework. You should find plenty of material about it.
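
To make that a little more concrete, here is a rough sketch of the step between scraping and saving: turning each scraped row (an array of cell strings) into an attributes hash. The column names and the sample row are made up for illustration, and RushingStat is a hypothetical model name; the thread never says what the table looks like.

```ruby
# Assumed column names, for illustration only.
COLUMNS = [:rank, :name, :yards]

# Pair each cell with a column name, producing one hash per row.
def rows_to_attributes(rows, columns = COLUMNS)
  rows.map { |cells| Hash[columns.zip(cells)] }
end

# Made-up sample row standing in for the scraper's @rows.
sample_rows = [['1', 'Team A', '100']]

rows_to_attributes(sample_rows).each do |attrs|
  # Inside a task that depends on :environment the models are loaded, so
  # this line could become: RushingStat.create!(attrs)
  puts attrs.inspect
end
```

Because the task depends on :environment, the model classes behave in the task exactly as they do in a controller, so no extra database setup should be needed in the .rake file.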


-
Maurício Linhares
alinhavado.wordpress.com (pt-br) |
codeshooter.wordpress.com (en)

J. D.

2009-Jun-07 12:40 UTC

Re: Rake Tasks

Maurício Linhares wrote:
> This custom file should be called scraper.rb and should be placed at
> the /lib folder of your application. In a rake task you don't really
> access or call controllers, you just run the task, which is telling
> the scrapper to load the data and then save it to the DB.
>

So, just to make sure I understand correctly..

The scraper.rb file would go in the lib folder and my scraper.rake file
would go in the lib\tasks folder?  The rake file - would I have to
include anything to call that .rb file?  I'm sorry if I am
misunderstanding this portion of the mechanics.

>
> You don't need to do anything else, Rails will automatically enable
> rubygems and by requiring hpricot you will tell it to load the gem.
>

Thanks - that part is easy enough.  When I go to port my app to my real
site, I will have to install hpricot there as well?  Or, can I include
hpricot in vendor\plugins?

>
> Now you have to learn the Rails database access framework,
> ActiveRecord, you should probably find plenty of material about it.
>

This is the part that I'm currently studying/reading on and it's
wonderful thus far.  I just didn't know if the rake task or the ruby
file had to communicate with activerecord in a certain way..

i.e. - When I have a controller created it can communicate through the
model and access the database.  Since the rake and rb file are not part
of MVC, I just didn't know if it also used similar mechanics or not.

Thanks for the feedback.  I'll wait for the reply on the first note to
figure it out..
-- 
Posted via ruby-forum.com.

Frederick Cheung

2009-Jun-07 13:04 UTC

Re: Rake Tasks

On Jun 7, 1:40 pm, "J. D."
<rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Maurício Linhares wrote:
> > This custom file should be called scraper.rb and should be placed at
> > the /lib folder of your application. In a rake task you don't really
> > access or call controllers, you just run the task, which is telling
> > the scrapper to load the data and then save it to the DB.
>
> So, just to make sure I understand correctly..
>
> The scraper.rb file would go in the lib folder and my scraper.rake file
> would go in the lib\tasks folder?  The rake file - would I have to
> include anything to call that .rb file?  I'm sorry if I am
> misunderstanding this portion of the mechanics.
>

Because your task depends on :environment, Rails is loaded. In
particular, its dependency management is loaded, so it will find your
Scraper class as long as it's in scraper.rb somewhere on its search
path. Don't take my word for it though, try it!
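
The lookup relies on a naming convention: the class name is turned into an underscored file name. A simplified sketch of that mapping follows; file_name_for is a hypothetical stand-in for what ActiveSupport does with String#underscore, and it ignores namespaces and acronyms.

```ruby
# Simplified illustration of the class-name-to-file-name convention.
# Not the real ActiveSupport code: no namespace or acronym handling.
def file_name_for(const_name)
  const_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase + '.rb'
end

puts file_name_for('Scraper')
puts file_name_for('OffensiveRushing')
```

So a class named Scraper is expected in scraper.rb, and a class named OffensiveRushing would be expected in offensive_rushing.rb.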

Fred

J. D.

2009-Jun-07 14:01 UTC

Re: Rake Tasks

Frederick Cheung wrote:
> On Jun 7, 1:40 pm, "J. D."
> <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
>> include anything to call that .rb file?  I'm sorry if I am
>> misunderstanding this portion of the mechanics.
>>
> Because your task depends on :environment Rails is loaded, in
> particular its dependency management is loaded so it will find your
> Scraper class as long as it's in scraper.rb somewhere on its search
> path. Don't take my word for it though, try it!
>
> Fred

Thanks,

I'll test a couple of generic variables and puts...
-- 
Posted via ruby-forum.com.

J. D.

2009-Jun-07 14:08 UTC

Re: Rake Tasks

Thanks - I understand that part now!

I put scraper.rb in my lib folder
I put scraper.rake in my lib/tasks

I took the end portion of scraper.rb and removed it, placing it in my
rake file:

desc "This task will parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  # In our search we are supplying the website url to parse, the type of
  # element (ex: table), the class name of that element
  # and the child element that contains the data we wish to retrieve.
  offensive_rushing = Scraper.new('web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
    'table', 'statstable', '//tr')
  offensive_rushing.scrape_data
  offensive_rushing.clean_celldata
  offensive_rushing.print_values
end

And it did a printout when I called the rake task.  So, now I'll have to
test this with the database and see how it works...

Thanks a ton (I understand it now)..

The "=> :environment do" part was telling my rake task to make
sure that the environment was fully loaded before running it.

So, if I wanted to run another rake task in the same rake file and I
wanted to make sure the first was done, I'd do something like:

task :next_task => :scraper do
  # code
end

which would make it run only after the scraper task had finished..
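
A small standalone sketch of that ordering, runnable outside Rails: the task bodies here are stubs that only record their names, the RUN_ORDER constant is introduced purely for the demonstration, and :environment is left out because there is no Rails app to load.

```ruby
require 'rake'
extend Rake::DSL # makes `task` available outside a Rakefile

RUN_ORDER = [] # records which task body ran, in order

# Stub standing in for the real scraper task.
task :scraper do
  RUN_ORDER << :scraper
end

# :scraper is a prerequisite, so Rake runs it before this body.
task :next_task => :scraper do
  RUN_ORDER << :next_task
end

Rake::Task['next_task'].invoke
puts RUN_ORDER.inspect
```

Invoking next_task prints [:scraper, :next_task], so the prerequisite really does finish first. Rake also runs each task at most once per invocation, so listing :scraper as a prerequisite of several tasks should not scrape the site several times.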



-- 
Posted via ruby-forum.com.
