Richard Schneeman wrote:> Although i''m using validations in my models, i''ve managed
to get a few
> duplicate entries on my site. I''ve got roughly 300,000 entries,
and my
> administration only allows long running rake tasks of up to 5 hours. I
> need to remove all entries that have the same :word and :language.
>
> Is there an easy way to iterate through all entries, check to see if any
> other entries have the same :word and :language, and delete those from
> the DB?
Long term, I recommend adding a UNIQUE INDEX constraint to the database
table (preferably via a migration) in addition to a
validates_uniqueness_of in the model. The ActiveRecord validations are
susceptible to race conditions, but the database constraint should not
be.
Before you can add this index you''ll need to remove the duplicates.
There are a couple ways to do this, and the best choice depends on how
many dupes you have within the entire dataset. In most cases there are
relatively few dupes (say less than 1000 in your 300K), and here''s my
preference (performed in a rake task or a migration, or manually using
the SQL tool of your choice, whichever way fits your workflow best):
1. Fetch the set of distinct tuples that are duplicated (word + language
in your case). Something like this:
select word, language from entries group by word, language having
count(*) > 1
2. Once you know what your duplicates are, iterate over them and delete
all but one of the records:
dupes.each do |dupe|
# this algorithm retains only the most recently added dupe, modify
as needed
# make sure you have an index on (word, language), not yet unique
entries = Entry.all(
:conditions => {
:word => dupe.word,
:language => dupe.language},
:order => ''id desc'')
entries.each_with_index do |e, index|
e.destroy if index > 0
end
end
3. Re-run the query and confirm that you don''t have any lingering
duplicates.
4. Add the UNIQUE INDEX constraint to prevent more dupes from being
created. Make sure you have a validates_uniqueness_of in the model too.
Hope that helps!
Jeremy
http://jeronrails.blogspot.com
--
Posted via http://www.ruby-forum.com/.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---