I am writing a test program for ActiveRecord. It reads a document which
is about 6000 words long, and then tallies up the words with:

  recordWord = Word.find_by_s(word)
  if recordWord.nil?
    recordWord = Word.new
    recordWord.s = word
  end
  if recordWord.count.nil?
    recordWord.count = 1
  else
    recordWord.count += 1
  end
  recordWord.save

So this part loops 6000 times... and it takes at least a few minutes to
run using sqlite3. Is that normal? I was expecting it to run within a
couple of seconds... can MySQL speed it up a lot?
Have you got an index on the s column?

Colin
I'm glad you said it was a test, because it wouldn't be nice to do that
for real.

It looks like you've got at least 12000 transactions in there (select +
update). Any indexes would make it worse.

I'm guessing this would take around 5 minutes on decent single-disk PC
hardware.

No database engine (even Oracle) is going to do that in 2 seconds on a
standard PC with a single SATA/IDE hard disk. You're asking for 6000
read/write operations per second. You'll need some mighty big hardware
to do that!

If you did the whole thing in a single transaction you might get it to
go quite a bit faster.

Cheers,
Gary.
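P.S. Something along these lines (untested, and assuming the Word model
from the original post) is what I mean by a single transaction:

  Word.transaction do
    words.each do |word|
      recordWord = Word.find_by_s(word) || Word.new(:s => word, :count => 0)
      recordWord.count = (recordWord.count || 0) + 1
      recordWord.save
    end
  end

Here "words" stands for whatever collection your 6000 words come from.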
Would not the index make the lookup faster but the write slower? Or is
it that the caching would mean the lookup would be pretty quick anyway,
even without an index?

Colin
Gary Doades wrote:
> If you did the whole thing in a single transaction you might get it
> to go quite a bit faster.

Will record.save automatically make it a transaction? So how do I make
it into one transaction? Can't I make it a hash or an array of records
and, at the end, do a loop of record.save? Because each save will be a
transaction... ?

What about using memcache? The db was about 45MB when I ran the
6000-word test on a few pages... so with 64MB of memcache... maybe
everything will happen in RAM and it can be really fast?
Colin Law wrote:
> Would not the index make the lookup faster but the write slower?

Yeah, it seems like the search will be a lot faster with the index... I
was also thinking of running it in production mode, since development
mode writes all the SQL to the log and slows down the operation...
Is this a test or a real requirement? If it is a real requirement, then
count the words in memory first and then update the db, so that each
record is only written once.

Colin
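P.S. Roughly like this (untested), where "words" stands for whatever
collection you read the document into:

  frequencies = Hash.new(0)
  words.each { |word| frequencies[word] += 1 }
  # now one db write per *unique* word instead of one per occurrence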
Yes, an index/caching will make the lookup faster, but even at the
optimum you are looking at 6000 transactions. Never in two seconds
unless you have some unsafe (buffered, no fsync) setting on your db
engine.

Cheers,
Gary.
Colin Law wrote:
> Is this a test or a real requirement? If it is a real requirement,
> then count the words in memory first and then update the db.

It is a real requirement... so how do I update the db at the end in one
operation? Thanks.
Frederick Cheung
2009-May-09 17:28 UTC
Re: updating the db 6000 times will take a few minutes?
On May 9, 5:48 pm, Jian Lin wrote:
> it is a real requirement... so how do i update the db at the end in 1
> operation? thanks.

Well, like others have said, you can wrap everything in a single
transaction (look at the Rails API docs for transactions), which at
least means that your database won't have to flush to disk on each
write. There's nothing built into Rails for bulk inserts, but there's
nothing stopping you from building up the appropriate insert statement
yourself. There's also a plugin that does that sort of thing:
http://github.com/zdennis/ar-extensions/tree/master

Fred
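P.S. Building the statement by hand would look roughly like this --
untested, and MySQL syntax (sqlite3 doesn't accept multi-row VALUES, so
there you'd still loop inside one transaction). It also assumes the
words contain no quote characters:

  rows = frequencies.map { |word, freq| "('#{word}', #{freq})" }
  Word.connection.execute(
    "INSERT INTO words (s, count) VALUES #{rows.join(', ')}"
  )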
Michael Schuerig
2009-May-09 17:47 UTC
Re: updating the db 6000 times will take a few minutes?
On Saturday 09 May 2009, Jian Lin wrote:
> i am writing a test program for ActiveRecord, and it reads a document
> which is like 6000 words long. And then i just tally up the words
> [...]
> and so this part loops for 6000 times...

Simply put, this is an inappropriate use of a relational database. As
the other respondents said, you're executing 12000 transactions, each
with a single database access. If you wrap the code above in a
transaction block

  Word.transaction do
    ...
  end

you're down to one transaction, but still have 12000 accesses.

IIUC, you have a word list of 6000 words (or something) in memory
already, so why not do the frequency counting where it's easy?

  frequencies = Hash.new(0)
  wordlist.each { |word| frequencies[word] += 1 }

  Word.transaction do
    frequencies.each do |word, freq|
      unless w = Word.find_by_s(word)
        w = Word.new(:s => word, :count => 0)
      end
      w.count += freq
      w.save!
    end
  end

This way still incurs two database accesses per unique word in your
list. To cut this down without custom SQL ([1], [2]) you have to
know/decide whether there are mostly known or mostly unknown words in
every new document. Assuming most words are already in the database,
you could do the update like this:

  Word.transaction do
    frequencies.each do |word, freq|
      update_count = Word.update_all(["count = count + ?", freq],
                                     ["s = ?", word])
      if update_count == 0
        Word.create!(:s => word, :count => freq)
      end
    end
  end

Michael

[1] http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
[2] http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
Great... will try it out now.

Actually, I was thinking: the computer sometimes has 1GB or 3GB of free
RAM (out of 4GB). How come the OS doesn't automatically create a cache
for the 45MB db file? If the OS cached it, everything would happen in
memory and it should be quite fast.

Is it true that sqlite3 actually flushes the data to the hard drive?
But at least for the searching part, can that still happen just in RAM?
Can't MySQL or sqlite3 have a mode so that the db is cached in RAM as
much as possible?
Jian Lin wrote:
> Can't MySQL or sqlite3 have a mode so that the db is cached in RAM as
> much as possible?

The database can be cached in RAM by most database engines. However,
most database engines will also make sure that every transaction
(insert, update, delete) is committed to disk by forcing a write to the
physical disk for *every* transaction. If the database engine didn't do
this, you would risk losing part or all of your database if some kind
of failure happened partway through your updates.

If you don't care whether you lose part or all of your database, most
database engines also have a setting for that.....

Cheers,
Gary.
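P.S. For sqlite3 that setting is a pragma. Something like this on the
connection before the inserts should do it -- at your own risk: with
synchronous off, a crash partway through can corrupt the database:

  ActiveRecord::Base.connection.execute("PRAGMA synchronous = OFF")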
Gary Doades wrote:
> If you don't care whether you lose part or all of your database, most
> database engines also have a setting for that.....

So I re-ran the test and it worked quite quickly... down to 1 minute or
so instead of 30 minutes (for several pages)... and the db size is only
2.5MB (I created another Rails project to start anew). So if we are
just writing scripts and running them with script/runner test.rb, how
can we turn off the "force write" feature of sqlite3?
Jian Lin wrote:
> how can we turn off the "force write" feature of sqlite3?

Are you running inside a transaction? sqlite3 in transaction mode is
super-fast...

(Jumping into the thread late: why should a test push 6000 records?
Such a test is not exactly a "unit" test, and alternate strategies
should work better.)

--
Phlip
http://flea.sourceforge.net/resume.html
Phlip wrote:
> Are you running inside a transaction? sqlite3 in transaction mode is
> super-fast...

I tried both record.save and record.save!, and both needed a few
minutes at the end of the program to write the data to the db...

I am writing a program to tally up the common words or phrases (say,
either in English or Chinese) that appear often. So for example, at the
end of the program (I used ruby script/runner check.rb to run it with
ActiveRecord support), when the results are in, there may be a table of
6000 keys and 6000 frequency counts. So I need to write the data into
the db. That's it. I wonder why it takes a few minutes, and how to make
it just a few seconds.

If I just wrote the results to a flat text file in CSV format, it would
take no more than a few seconds, I think.
Have you made the suggested change to count the words in the document
first and then update the db, so that you are not updating each record
in the database many times (presumably hundreds of times for common
words)? This will change the number of db accesses from the total
number of words in the document to the number of unique words in the
document.

Colin
Colin Law wrote:
> Have you made the suggested change to count the words in the document
> first and then update the db?

Yes, right now I first tally up the counts in a Hash, and then at the
end write all the data to the table. It is strange that writing all
this data takes a minute or so... because the final db is only 2.5MB,
and if I wrote all the data to a flat file, I think it would be done in
2 seconds.

So right now I want to find out:

1) can I tell sqlite3 not to write to the db every time I add a record
(a row), but to do it all in memory and then write to the table once at
the end?

2) can I use memcache to do it?

3) someone at stackoverflow.com suggested using AR:Extensions:
http://stackoverflow.com/questions/843524/updating-the-db-6000-times-will-take-few-minutes

But I just want to use the most basic way to do it... such as by
turning off the force-write of sqlite3.
2009/5/10 Jian Lin wrote:
> It is strange that writing all this data takes a minute or so...

Have you checked the SQL in the log to make sure that it is only
writing each word once? Have you added the index to the word field? (I
think now that each record is only being written once, the index should
be beneficial as it will speed up the find; someone correct me if I am
wrong.)

Another thing you could do is comment out the db access bit just to
make sure that the time is not going somewhere else. Initially comment
out the save, then comment out the find also.

Colin
Jian Lin wrote:
> so right now, i want to find out
> 1) can i tell sqlite3 not to write to the db every time i add a
> record...

If you have made the change to count up the words first and then
*insert* all the (word, count) records into the database in a *single*
transaction, then it ought to take less than a second. I would expect
the total number of (word, count) records to be in the order of
hundreds, or perhaps a thousand or so. Any decent DB ought to insert
that in under a second as a *single* transaction.

If it is still taking minutes then you are probably not doing the above
somehow. I think in that case you need to post your code again so we
can see what it is doing now.

Cheers,
Gary.
Gary Doades wrote:> If you have made the change to count up words first and then *insert* > all the (word,count) records into the database in a *single* transaction > then it ought to take less than a second. I would expect that the total > number of (word,count) records is in the order or hundreds or perhaps a > thousand or so? Any decent DB ought to insert that in under a second as > a *single* transaction. > > If it is still taking minutes then you are probably not doing the above > somehow. I think in that case you need to post your code again so we can > see what it is doing now.i was doing it like this: all_phrases = frequencies.keys Phrase.transaction do all_phrases.each do |phrase| recordPhrase = Phrase.new(:s => phrase, :frequency => frequencies[phrase], :length => lengths[phrase]) recordPhrase.save end end i am using "Phrase" instead of "Word"... but it is the same thing... all_phrases.length is about 34000 for all data... it would run for at least a minute... that''s kind of weird... -- Posted via http://www.ruby-forum.com/.
Jian Lin wrote:
> i was doing it like this: [...]

By the way, the table has indexes on all fields: s, frequency, and
length.
Jian Lin wrote:
> all_phrases.length is about 34000 for all the data...

I'm not sure how you get 34000 records from 6000 words. Surely you
should get fewer, not more.

> it runs for at least a minute... that's kind of weird...

Are you sure it is just the above code that is taking a minute, and not
the bit of code that counts the words?

What hardware is this on, exactly?

Cheers,
Gary.
Jian Lin wrote:
> by the way, the table has indexes on all fields: s, frequency, and
> length.

That'll slow down inserts/updates, but not by a huge amount on such a
small table.
Gary Doades wrote:
> I'm not sure how you get 34000 records from 6000 words. Surely you
> should get fewer, not more.

It is 34000 records because it actually counts phrases instead of
words... For example, for "a quick brown fox jumps over the lazy dog" I
will actually count phrases such as "a quick", "a quick brown", "a
quick brown fox", etc... So the final result is 34000 hash entries,
mapping to frequencies and word counts.

The hardware should be pretty good... it is an HP TouchSmart IQ804,
with a Core 2 Duo and an ST3320820AS hard drive (7200rpm, 8MB buffer).

Hm... makes me wonder: if the db is just a few MB, how come the hard
drive's 8MB buffer didn't handle it all in RAM and be super fast? Did
it actually force-write to the physical disc?
Michael Schuerig
2009-May-10 09:44 UTC
Re: updating the db 6000 times will take a few minutes?
On Sunday 10 May 2009, Jian Lin wrote:
> i was doing it like this: [...]

Consider

  Phrase.transaction do
    frequencies.each do |phrase, freq|
      Phrase.create!(:s => phrase, :frequency => freq)
    end
  end

Hash#each passes keys and values to your block and avoids unnecessary
lookups.

My snippet above doesn't take into account your :length =>
lengths[phrase], and that is as it should be. Presumably,
lengths[phrase] == phrase.length. Then

  class Phrase < ActiveRecord::Base
    attr_protected :length
    ...
    def s=(value)
      self[:s] = value
      self.length = value.length
    end
  end

would be much cleaner code, because it puts responsibility for setting
the length attribute where it belongs. (Note the writer also has to
store s itself, since defining s= overrides the generated attribute
writer.)

Michael
Michael Schuerig wrote:
> Consider Phrase.create! inside frequencies.each...

I changed it to

  time_start = Time.now
  Phrase.transaction do
    all_phrases.each do |phrase|
      recordPhrase = Phrase.create!(:s => phrase,
                                    :frequency => frequencies[phrase],
                                    :length => lengths[phrase])
    end
  end
  p "took #{Time.now - time_start} seconds to finish"

but it is still the same: it took 75 seconds...

I wanted the length, which is the count of words, because in the future
I might want to do a query such as "select * where length > 3", and if
I had to count the words by parsing "s" first, it would be really slow,
wouldn't it? If length is stored and indexed, then "length > 3" can be
super fast?
Jian Lin wrote:
> but it is still the same: it took 75 seconds...

That's because it's still really the same code!

> if length is stored and indexed, then "length > 3" can be super fast?

Just because it has an index on it doesn't automatically make it super
fast. It depends largely on the selectivity of the length field. I
suspect that most of your phrases are longer than 3 characters, so such
a query will result in a full table scan anyway. In that case, making
the table smaller by leaving out the length may actually make it
faster. However, selecting all the phrases where length < 5 would
almost certainly use the index and be faster. Only you know what you
are likely to do in general here, so you need to decide (and test)
whether it is better to have the index and length column or not.

You can always do "select * from phrases where length(s) > 3" or
something like that.

Are you sure your overall run time is not limited by CPU rather than
IO? How much CPU time is used to run your code?

Cheers,
Gary.
Gary Doades wrote:
> Are you sure your overall run time is not limited by CPU rather than
> IO? How much CPU time is used to run your code?

Oh sorry, I should have clarified... the length is actually the count
of words... so length > 3 means at least 3 words. Or I might do a query
with length > 5 or 7... So will having a column and an index make such
a query fast? I intend, for example, to get back only 3% or 5% of all
records, so the index might make it fast.

By the way, the 75 seconds is the time from the moment of the

  Phrase.transaction do

to the end of it... so... I think during that time it is mostly IO
time...

Let me actually write up a test case with some dummy data to simulate
it, so that everyone is on the same ground...
OK, the following code with 6000 record inserts takes 13.3 seconds to
finish (just for the db portion). If I change 6000 to 30000 then it
takes 67 seconds. It is being run using ruby script/runner foo.rb.

Code:

  srand(123)
  frequencies = {}
  lengths = {}
  6000.times do
    phrase = (65 + rand(26)).chr + "#{rand(100_000_000)}"
    frequencies[phrase] = rand(20)
    lengths[phrase] = rand(10)
  end
  #p frequencies
  all_phrases = frequencies.keys

  p "Starting inserting records"
  time_start = Time.now
  Phrase.transaction do
    all_phrases.each do |phrase|
      recordPhrase = Phrase.create(:s => phrase,
                                   :frequency => frequencies[phrase],
                                   :length => lengths[phrase])
    end
  end
  p "took #{Time.now - time_start} seconds to finish"
  p Phrase.count

The schema.rb for the table is:

  create_table "phrases", :force => true do |t|
    t.string   "s"
    t.integer  "frequency"
    t.integer  "length"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  add_index "phrases", ["frequency"], :name => "index_phrases_on_frequency"
  add_index "phrases", ["length"], :name => "index_phrases_on_length"
  add_index "phrases", ["s"], :name => "index_phrases_on_s"
Jian Lin wrote:
> oh sorry i should have clarified... the length is actually the count
> of words...

Ah, yes, that makes it different :)

> by the way, the 75 seconds is the time from the moment of the
> Phrase.transaction do to the end of it...

OK, just need to look at the CPU time used then. Is this Windows or
Linux or something else?

> let me actually write up a test case with some dummy data to simulate
> it...

That would be the best of all. I can run your code with your sample
data against MySQL and Postgres to give you some ideas. I don't really
know much about sqlite, but I must admit I'm curious as to where the
time has gone in such an apparently simple situation.

Don't forget to include the database definition and data as well as the
code.

Cheers,
Gary.
Jian Lin wrote:
> ok, the following code with 6000 record inserts takes 13.3 seconds to
> finish (just for the db portion). if i change 6000 to 30000 then it
> takes 67 seconds.

OK, I created the table and the index and ran your code for 30000
records, but I wrapped the database part in a Benchmark.measure {}.

Using MySQL, for 30000 inserts with no indexes:

  "Starting inserting records"
  31.996000   0.639000  32.635000 ( 35.356000)
  30000

For 30000 inserts with all indexes:

  "Starting inserting records"
  32.795000   0.982000  33.777000 ( 37.103000)
  30000

That's 33 seconds of CPU time with 37 seconds elapsed, on a quad-core
2.4 GHz with a Seagate Barracuda SATA drive with 32MB cache. As you can
see, it's pretty much all CPU time!!!!

The result was essentially the same in Postgres and MS SQL Server!

So you can forget about the database itself. None of the database
engines were unduly taxed by the test.

Just for fun, I changed the program to output the data as SQL INSERT
statements and then ran that (with 30000 inserts wrapped in a
transaction) against MySQL. Imported in 1.2 seconds!!

I don't know if it is the hash lookup code or ActiveRecord that is
gobbling up the time, but it certainly isn't the database. You'll need
to tinker with, or better, profile your code to find out what is
sucking up the time.

Cheers,
Gary.
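P.S. The generator was nothing fancy -- roughly this (a sketch, not the
exact code; the random test data above contains no quote characters, so
no escaping was needed):

  File.open("phrases.sql", "w") do |f|
    f.puts "BEGIN;"
    all_phrases.each do |phrase|
      f.puts "INSERT INTO phrases (s, frequency, length) " +
             "VALUES ('#{phrase}', #{frequencies[phrase]}, #{lengths[phrase]});"
    end
    f.puts "COMMIT;"
  end

Then just feed phrases.sql to the mysql command-line client.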
Gary Doades wrote:
> Just for fun, I changed the program to output the data as SQL INSERT
> statements and then ran that (with 30000 inserts wrapped in a
> transaction) against MySQL. Imported in 1.2 seconds!!

Yeah, that's what I was going to say... the line

  Phrase.transaction do

didn't cause any "transaction" statement to show up in
development.log... so is it supposed to have begun a transaction?

So your 1.2 second result is from collecting all those INSERT
statements, wrapping them in a "begin transaction" and "commit", and
running that... I wonder, then: can't this be achieved in ActiveRecord?
Gary Doades wrote:
> I don't know if it is the hash lookup code or ActiveRecord that is
> gobbling up the time, but it certainly isn't the database.

By the way... interesting to find out that the CPU time is so high... I
thought this code was I/O bound at first.

I am using Rails 2.3.2 and Ruby 1.8.6 patchlevel 287, on Windows 7.

Hm... the hash lookup time should be really small... for example, if I
just wrote all the data to a text file, it would be really fast... It
might be ActiveRecord, although I thought ActiveRecord merely
translates the method calls into SQL statements and so shouldn't be so
CPU intensive.
Jian Lin wrote:
> it might be ActiveRecord, although I thought ActiveRecord merely
> translates the method calls into SQL statements and so shouldn't be
> so CPU intensive.

Aha.... It looks like ActiveRecord has an enormous overhead in
creating/saving records.

If you change the inserts to this....

  puts Benchmark.measure {
    Phrase.transaction do
      all_phrases.each do |phrase|
        Phrase.connection.execute(
          "insert into phrases(s, frequency, length) " +
          "values('#{phrase}', #{frequencies[phrase]}, #{lengths[phrase]})")
      end
    end
  }

you get this:

  "Starting inserting records"
  1.123000   0.686000   1.809000 (  5.096000)
  30000

Which is exactly what you want, I think :)

This falls down, of course, if the phrases contain single-quote
characters or come from an untrusted source. You will need to at least
escape quotes before putting them in the insert statement. That will
slow it down a bit, but not as much as before, I would think.

Note to self: don't attempt to do lots of record creations with
standard AR code!

Cheers,
Gary.
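P.S. The easiest way to handle the escaping is to let the adapter do
it. connection.quote returns the value already wrapped in quotes with
embedded quotes escaped, so something like this (untested):

  Phrase.connection.execute(
    "insert into phrases(s, frequency, length) " +
    "values(#{Phrase.connection.quote(phrase)}, " +
    "#{frequencies[phrase]}, #{lengths[phrase]})")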
Gary Doades wrote:
> Note to self: don't attempt to do lots of record creations with
> standard AR code!

I was googling for "ActiveRecord transaction" and got this page:

http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html

where it says: "Save and destroy are automatically wrapped in a
transaction".

So I think create also invokes a save... do you think so? If it does,
then each create is a transaction by itself, and that's why it is so
slow... So it can't do lots of record creations in a single
transaction? Or maybe there is a method and we don't know it yet (maybe
a method using ar-extensions... or why not have a standard ActiveRecord
mechanism to do huge updates/inserts, I wonder).
Frederick Cheung
2009-May-10 14:26 UTC
Re: updating the db 6000 times will take a few minutes?
On May 10, 10:41 am, Jian Lin wrote:
> hm... makes me wonder: if the db is just a few MB, how come the hard
> drive's 8MB buffer didn't handle it all in RAM and be super fast? did
> it actually force-write to the physical disc?

It's never that simple (and yes, under the appropriate circumstances
the drive is supposed to flush to the actual disk). For example,
there's a constant amount of overhead with each query: network latency,
etc. Not a huge amount (probably less than a millisecond), but multiply
that by 34000 and it adds up. If you really need to manipulate this
much data, you'd be well advised to do the inserts in bulk.

Fred
Jian Lin wrote:
> "Save and destroy are automatically wrapped in a transaction"
> so I think create also invokes a save...

I just read in the Learning Rails book (p. 50, if I remember correctly)
that create is 3 operations in one: a new, an assignment of values, and
a save. So create involves a save, and is therefore in a transaction by
itself.
Frederick Cheung wrote:
> If you really need to manipulate this much data, you'd be well
> advised to do the inserts in bulk.

So how do I do inserts in bulk? By using

  Phrase.connection.execute(
    "insert into phrases(s, frequency, length) " +
    "values('#{phrase}', #{frequencies[phrase]}, #{lengths[phrase]})")

? Or by ar-extensions? Can ActiveRecord have a mode for saving without
being in a transaction, or does ActiveRecord have some standard method
for doing bulk inserts?
Frederick Cheung
2009-May-10 17:07 UTC
Re: updating the db 6000 times will take a few minutes?
On May 10, 4:36 pm, Jian Lin wrote:
> so how do I do inserts in bulk? by using
> Phrase.connection.execute("insert into phrases(...) values(...)")?
> or by ar-extensions?

Basically, yes. You can also insert more than one tuple in a statement
like that (although perhaps not with all databases).

ar-extensions provides bulk inserts if the db supports it.

Fred
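P.S. From memory, the ar-extensions API is along these lines -- check
the plugin's docs rather than trusting this sketch:

  require 'ar-extensions'  # the plugin from the github link earlier

  columns = [:s, :frequency, :length]
  values  = frequencies.keys.map do |phrase|
    [phrase, frequencies[phrase], lengths[phrase]]
  end
  Phrase.import(columns, values)

On MySQL that becomes a single multi-row INSERT behind the scenes.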
Michael Schuerig
2009-May-10 17:21 UTC
Re: updating the db 6000 times will take a few minutes?
On Sunday 10 May 2009, Jian Lin wrote:
> so how do I do inserts in bulk?

You might be using the wrong tool, and you might be reinventing
existing solutions. Did you have a look at existing text mining
tools/frameworks, in Ruby as well as other languages?

Michael
Michael Schuerig wrote:
> You might be using the wrong tool, and you might be reinventing
> existing solutions.

I want to do this project and at the same time get a chance to learn to
use ActiveRecord... Perhaps I can also use some kind of "traverse" tool
to walk the internet... something that can start at a page and then
walk down all the pages to a depth of 1 or 2 or any number. It wouldn't
be so hard to write, except that sometimes a "base url" is used, and
that takes more processing than absolute urls.