Hey Folks,
I'm looking for some suggestions here. Each day I need to process 200
gzip-compressed files that contain comma-delimited information in
the following format:
HostName, FileName, DirName, Modified_On
Each file contains upwards of 200K rows and works out to about 20 MB when
uncompressed. I need to compare this
information to the information received the day before to look for
files that have changed.
My current plan was to:
##
for each file
  read and uncompress each line
  import the line into MySQL using ActiveRecord
end
run several comparison queries to find changes; save the changes to a table
automatically review the changes based on rules; whatever is left over is an
unauthorized change and needs research by the network admins
##
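
For reference, the import step currently looks roughly like this
(simplified; FileRecord is just my shorthand for an ActiveRecord model
over a table with host_name, file_name, dir_name, and modified_on columns):

require 'zlib'

# Simplified version of my import loop: stream the gzip file line by
# line and insert each row through ActiveRecord.
def import_file(path)
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line do |line|
      host, file, dir, modified_on = line.chomp.split(',').map(&:strip)
      FileRecord.create!(
        host_name:   host,
        file_name:   file,
        dir_name:    dir,
        modified_on: modified_on
      )
    end
  end
end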
I've written everything, and it's working. However, the MySQL import is
SLOOOOW. It's taking 10 minutes per file. Extrapolating, that's 2000
minutes, or about 33 hours, each day just for the import. Unfortunately,
Earth days only have 24 hours.
So, I need some way to compare today's files to yesterday's and see the
changes. I really want all the data to end up in MySQL or SQL Server, so
I'm hesitant to use any solution that requires comparing the text files
directly.
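
Once both days are loaded, the kind of comparison query I have in mind is
something like this (the imports_today / imports_yesterday table names are
placeholders for however the daily loads end up stored):

# Rows whose Modified_On changed between yesterday's and today's import.
changed = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT t.host_name, t.file_name, t.dir_name,
         y.modified_on AS old_modified_on,
         t.modified_on AS new_modified_on
  FROM   imports_today t
  JOIN   imports_yesterday y
    ON   y.host_name = t.host_name
   AND   y.file_name = t.file_name
   AND   y.dir_name  = t.dir_name
  WHERE  y.modified_on <> t.modified_on
SQL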
Ideally, I'd like a way to speed up the import. I saw another post about
how ActiveRecord wraps each insert in its own transaction by default, so
the only thing I've tried so far is to open one transaction outside the
loop and commit it on completion. That didn't seem to help much...
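
In case it matters, that transaction attempt looked roughly like this:

# One transaction per file instead of one per INSERT.
ActiveRecord::Base.transaction do
  import_file(path)   # the same loop as above, unchanged
end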
Thanks :)
Brian