I've been trying to use Rails on a database where the biggest data-warehouse-style table doesn't have a primary key. A key really doesn't make sense there and just adds unnecessary space. I do have, however, a column that I normally search on (IP) that returns many rows, and the table also references several other tables through Rails-friendly foreign keys.

Simple ActiveRecord works just fine, but I've been running into trouble with more advanced stuff and I'm not sure how to get around it. My biggest problem is that the :include option to find (which enables eager loading of associations) fails because it assumes there is an id column and tries to select it in the join. Similarly, the paginate helper bombs when it tries to get the number of results for the query, because it attempts a SELECT COUNT(DISTINCT mytable.id).

I have read all the legacy docs that I can find, but none seem to address the case of a non-existent primary key. Is there anything that I missed? Is there any way to fix the above problems?

thanks
-Worky
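For concreteness, a rough sketch of the two failing calls in Rails 1.x-era syntax (the Flow model and the :src_host association are made-up names for illustration):

  # Flow maps the keyless table; Host is an ordinary AR model
  flows = Flow.find(:all,
                    :conditions => ["src_ip = ?", ip],
                    :include    => :src_host)
  # fails: the eager-load JOIN tries to SELECT flows.id

  # in a controller, the classic pagination helper
  @flow_pages, @flows = paginate :flows,
                                 :conditions => ["src_ip = ?", ip]
  # fails: the count query is SELECT COUNT(DISTINCT flows.id) ...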
Ask if you can add a primary key to the table. You'll feel better when your table has one. Go on.
Dr Nic wrote:
> Ask if you can add a primary key to the table. You'll feel better when
> your table has one. Go on.

I'm assuming that you are joking ...

I've had a similar problem trying to put a Rails frontend onto a very large table. Perhaps we are storing the same data ... flow records? For large tables like these (mine is 200GB) it's neither logical nor practical to store a primary key, especially not a meaningless integer one. Accesses never return a single row, and aggregates are often used, which is sometimes frustrating with Rails since that's not quite the environment it was designed for.

I've hacked around the problem by using find_by_sql a lot more than I would like (i.e. almost everywhere), but that is not really the Rails way. If someone has better solutions I'd love to hear them as well.
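For the record, the find_by_sql workaround looks roughly like this (the table and column names are my guesses at a flow schema, not anything from the posts above):

  # bind variables keep the interpolation safe; rows come back as
  # Flow instances even though the model has no usable primary key
  flows = Flow.find_by_sql([
    "SELECT * FROM flows WHERE src_ip = ? AND start >= ?",
    ip, 1.day.ago
  ])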
Mike wrote:
> Dr Nic wrote:
>> Ask if you can add a primary key to the table. You'll feel better when
>> your table has one. Go on.
>
> I'm assuming that you are joking ...

I doubt he is.

> I've hacked around the problem by using find_by_sql a lot more than I
> would like (i.e. almost everywhere), but that is not really the Rails
> way. If someone has better solutions I'd love to hear them as well.

The Rails way is to have some way to uniquely identify each row in the database, so that each object (each instance, actually) can be mapped to a unique row. Without any uniqueness, you're not using ActiveRecord as the object-relational mapper that it is. You'll likely run into the same situation no matter what ORM tool you use: mapping objects to rows in a database works best by far when you can guarantee the uniqueness of each row.

Given your copious use of find_by_sql, you may be better off just using the database connector for your database platform. I'm not saying what you need can't be done in AR, just that you may save yourself some serious grief if you skip it.
On Jul 27, 2006, at 12:58, Kevin Williams wrote:
> The Rails way is to have some way to uniquely identify each row in the
> database. This is so that each object (each instance, actually) can be
> mapped to a unique row.

Just to be clear, having a unique integer column (such as that provided by MySQL's AUTO_INCREMENT or Postgres' SERIAL columns) is only one way to guarantee row uniqueness; the caveat is that it is the way stock Rails requires to uniquely identify rows. Another way is to enforce uniqueness on a single column of some other type, or to use a composite key to guarantee the uniqueness of some combination of columns.

Michael Glaesemann
grzm seespotcode net
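As a sketch of that second option, a migration could enforce uniqueness over the natural key without adding a surrogate id. The table, columns, and index name here are hypothetical:

  class EnforceFlowUniqueness < ActiveRecord::Migration
    def self.up
      # a multi-column unique index guarantees row uniqueness at the DB level
      add_index :flows, [:src_ip, :src_port, :dest_ip, :dest_port, :start],
                :unique => true, :name => "flows_natural_key"
    end

    def self.down
      remove_index :flows, :name => "flows_natural_key"
    end
  end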
> Just to be clear, having a unique integer column (such as that
> provided by MySQL's AUTO_INCREMENT or Postgres' SERIAL columns) is
> only one way to guarantee row uniqueness ... Another way is to enforce
> uniqueness on a single column of some other type, or to use a composite
> key to guarantee the uniqueness of some combination of columns.

Yeah, I guess I should have been a bit clearer. I don't have an integer, single-column primary key; however, the rows are unique over multiple columns. Know of any way to "solve" this without a DB view?
On Jul 27, 2006, at 18:15, Michael Artz wrote:
> Yeah, I guess I should have been a bit clearer. I don't have an
> integer, single-column primary key; however, the rows are unique over
> multiple columns. Know of any way to "solve" this without a DB view?

I've heard someone has recently developed a composite-key plugin or patch. I don't think it's in edge Rails or even in the patch queue, but some googling should help you find it. I haven't used it myself. Good luck!

Michael Glaesemann
grzm seespotcode net
http://wiki.rubyonrails.org/rails/pages/Uses+Guid+Plugin

On 7/27/06, Michael Glaesemann <grzm@seespotcode.net> wrote:
> On Jul 27, 2006, at 18:15, Michael Artz wrote:
>
>> Yeah, I guess I should have been a bit clearer. I don't have an
>> integer, single-column primary key; however, the rows are unique over
>> multiple columns. Know of any way to "solve" this without a DB view?
>
> I've heard someone has recently developed a composite-key plugin or
> patch. I don't think it's in edge Rails or even in the patch queue,
> but some googling should help you find it. I haven't used it myself.
> Good luck!
On Jul 27, 2006, at 20:03, Pratik wrote:
> http://wiki.rubyonrails.org/rails/pages/Uses+Guid+Plugin

While the above looks interesting, it isn't support for composite keys. A quick google turned up:

http://compositekeys.rubyforge.org/

Again, I haven't used it, but you may find it helpful.

Michael Glaesemann
grzm seespotcode net
> http://compositekeys.rubyforge.org/

Yep, and I'm adding composite foreign key support soon too. Then you'll be able to do fun stuff like:

  class Foo < ActiveRecord::Base
    set_primary_key :id # just for completeness
  end

  class Bar < ActiveRecord::Base
    set_primary_keys :foo_id, :seq
    has_many :bars, :foreign_key => [:foo_id, :seq]
  end

  class Tar < ActiveRecord::Base
    belongs_to :bar, :foreign_key => [:foo_id, :seq]
  end

But I'll try to make it unnecessary to explicitly specify :foreign_key when the foreign key column names match the primary key column names.

Cheers
Nic
> class Bar < ActiveRecord::Base
>   set_primary_keys :foo_id, :seq
>   has_many :bars, :foreign_key => [:foo_id, :seq]
> end

That should be:

  has_many :tars, :foreign_key => [:foo_id, :seq]
Yes, I am storing network flow data. This means that I have a *large* table that looks like:

  src_ip    IP
  src_port  INT
  dest_ip   IP
  dest_port INT
  start     TIMESTAMP
  end       TIMESTAMP
  bytes     INT

So there isn't a defined uniqueness constraint; however, there is an implicit one across all the columns (i.e. it would be *very* unlikely to have a duplicate row).

I also have a similar problem with network event data, e.g. IDS events or p0f data. Basically, I have a table that looks like:

  src_ip      IP
  event       TEXT
  ...other columns...
  first_seen  TIMESTAMP
  last_seen   TIMESTAMP
  number_seen INT

so a composite primary key would span all the columns except for {first,last,number}_seen. I have not defined one, however, as it doesn't really fit the access pattern, and I use PostgreSQL triggers to merge new data in to "ensure" uniqueness.

> http://compositekeys.rubyforge.org/

Looks very interesting ... I'll see if it fixes the :include problems that I mentioned initially. Like Mike mentioned previously, I never use the find(pkey) method, as I'm always interested in many rows (find_by_src_ip).
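Since the access pattern is always "many rows by IP", a rough sketch of a model over the flow table above, using a dynamic finder that never touches the id column (the Flow name and "flows" table name are assumptions):

  # hypothetical model over the keyless flow table
  class Flow < ActiveRecord::Base
    set_table_name "flows"
  end

  # find_all_by_* builds the WHERE clause from the column name alone,
  # so no primary key is needed
  flows = Flow.find_all_by_src_ip("10.0.0.1")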
Worky wrote:
> So there isn't a defined uniqueness constraint; however, there is an
> implicit one across all the columns (i.e. it would be *very* unlikely
> to have a duplicate row).
> [...]
> so a composite primary key would span all the columns except for
> {first,last,number}_seen. I have not defined one, however, as it
> doesn't really fit the access pattern, and I use PostgreSQL triggers
> to merge new data in to "ensure" uniqueness.

Although the size of the table presents an interesting challenge, this is exactly the case for an otherwise meaningless surrogate key: indexing on, and selecting by, that one :id column is much more performant than a 6-10 column natural key.

Assuming you still don't want surrogate keys, how about the table partitioning feature of MySQL 5.1? Taking the table size out of the equation is the only thing coming to mind.
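For illustration, partitioning like that could be applied from a migration with raw DDL. A hedged sketch, assuming a MySQL 5.1 `flows` table with the `start` timestamp from the schema above (table and partition names are made up):

  class PartitionFlows < ActiveRecord::Migration
    def self.up
      # MySQL 5.1 range partitioning, one partition per month of traffic
      execute <<-SQL
        ALTER TABLE flows
        PARTITION BY RANGE (TO_DAYS(start)) (
          PARTITION p200607 VALUES LESS THAN (TO_DAYS('2006-08-01')),
          PARTITION p200608 VALUES LESS THAN (TO_DAYS('2006-09-01')),
          PARTITION pmax    VALUES LESS THAN MAXVALUE
        )
      SQL
    end

    def self.down
      execute "ALTER TABLE flows REMOVE PARTITIONING"
    end
  end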
> Although the size of the table presents an interesting challenge, this
> is exactly the case for an otherwise meaningless surrogate key: indexing
> on, and selecting by, that one :id column is much more performant than a
> 6-10 column natural key.

I think the idea is that no access to the table will ever return a single row, so there is no reason, practical or logical, to have a key on this table. In my case, most queries use some form of IP and port and generate a series of records. And (in my case, not sure about the OP) we use Postgres' table partitioning, which works great.

The issue isn't necessarily table size for query performance, but just table size in general: I have similar (legacy) projects generating 200-300GB per day of traffic, so every bit we can save per record results in a fairly large savings overall.

Rails has been a quick way to put a face on those databases, and it works great for simple models or those that conform to Rails conventions. That composite key plugin looks mighty useful, however.
Michael Artz wrote:
>> Although the size of the table presents an interesting challenge, this
>> is exactly the case for an otherwise meaningless surrogate key: indexing
>> on, and selecting by, that one :id column is much more performant than a
>> 6-10 column natural key.
>
> I think the idea is that no access to the table will ever return a
> single row, so there is no reason, practical or logical, to have a key
> on this table.

I guess it depends on your usage of this data. If it's reporting, fine. If you have to update row #1907135067, having a unique ID is what works best with an ORM solution.

> ... I have similar (legacy) projects generating 200-300GB per day of
> traffic, so every bit we can save per record results in a fairly large
> savings overall.

What is your memory footprint when loading all these records? In all the ORM projects I've been on, memory usage became an issue when the tables were large, because a unique object is created for each row: if you load 2 million rows, you have 2 million objects in memory. This is another case where ORM is not the best way to go.

> Rails has been a quick way to put a face on those databases, and it
> works great for simple models or those that conform to Rails
> conventions. That composite key plugin looks mighty useful, however.

I'd love to find out your experience with that plugin, as legacy/enterprise integration is key to being able to use Rails at work for many Java/DotNet developers.
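If you do have to walk a big result set in AR, one way to cap memory is to window the query with :limit/:offset instead of loading everything at once. A sketch under the same hypothetical Flow model (the process call is a placeholder):

  # fetch and handle 1,000 rows at a time instead of 2 million at once;
  # the :order keeps the offset paging stable across queries
  offset = 0
  loop do
    batch = Flow.find(:all,
                      :conditions => ["src_ip = ?", ip],
                      :order  => "start",
                      :limit  => 1000,
                      :offset => offset)
    break if batch.empty?
    batch.each { |flow| process(flow) }
    offset += batch.size
  end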
> > I think the idea is that no access to the table will ever return a
> > single row, so there is no reason, practical or logical, to have a key
> > on this table.
>
> I guess it depends on your usage of this data. If it's reporting, fine.
> If you have to update row #1907135067, having a unique ID is what works
> best with an ORM solution.

Yep, reporting. My DB is a read-only data warehouse, which I guess would be considered an "integration" database from Rails' perspective.

> What is your memory footprint when loading all these records? In all
> the ORM projects I've been on, memory usage became an issue when the
> tables were large, because a unique object is created for each row: if
> you load 2 million rows, you have 2 million objects in memory. This is
> another case where ORM is not the best way to go.

I don't load all the records into the interface ... that would be crazy :) I just need to put a quick query interface onto the database so that an analyst can issue queries, and anything that would be too unwieldy for AR is probably too unwieldy for a normal analyst :)

> I'd love to find out your experience with that plugin, as
> legacy/enterprise integration is key to being able to use Rails at work
> for many Java/DotNet developers.

I'll keep you posted, once I have a chance to get to it.