Hey folks, am hoping someone can set me straight with a solution to an issue that has come up involving the timing of rendered page elements in relation to the completion of an action's execution.

I have an RoR search app which searches through two tables: one with 7,000 page URLs and one with 87,000 related keyphrases found within those pages. Search results can take anywhere from 2 seconds to 5-8 seconds to be displayed, so I am trying to break the process up so a loading/progress message can be displayed using Ajax and the following logic flow:

  Search button is pressed
    search action sets session data to 'loading'
    render :update do |page|
      does a replace_html on a 'status' div for the loading message
      sets the 'status' div to show
    end
    search is performed
  end

(The status div contains Ajax calls that keep checking the progress until the search is completed.)

The above all works, however I notice the initial render doesn't affect the screen until the search has finished: by the time I see the loading message for the first time, I also see in my Lighttpd console that the search has completed.

How might the above be modified so that the rendering takes place where it should in the logic flow, but is also visible in the browser, instead of waiting until the full action has completed?

I'd be grateful to hear any suggestions.
Thanks,
Andy

--
Posted via http://www.ruby-forum.com/.
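For concreteness, the flow described above would look roughly like this in a Rails 1.x controller. This is a minimal sketch only: the controller, action, helper and div names are invented, and it reproduces the problem as described - the RJS built by render :update is not sent to the browser until the whole action (including the slow search) has returned.

# Hypothetical controller sketch of the flow described above (Rails 1.x style).
class SearchController < ApplicationController
  def search
    session[:search_status] = 'loading'

    # Builds the RJS response, but it is only delivered to the browser once
    # the action returns -- i.e. after the slow search below has finished.
    render :update do |page|
      page.replace_html 'status', 'Searching, please wait...'
      page.show 'status'
    end

    # perform_search is a made-up helper standing in for the slow part.
    @results = perform_search(params[:keyword])
    session[:search_status] = 'done'
  end
end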
This search time is far too long for tables that are essentially empty. On 1,000,000 rows, with proper indexing, you should see no more than 40 milliseconds on sub-gigahertz desktop systems. The curve is logarithmic, so you should see about 20 milliseconds for these tables "as-is".

Can you show the DDL that creates the database and the DML that is retrieving this information?

On Fri, 2006-04-28 at 03:06 +0200, Andrew Cowan wrote:
> Hey folks, am hoping someone can set me straight with a solution to an
> issue that has come up involving the timing of rendered page elements in
> relation to the completion of an action's execution.
> [...]
Sure, here is the info you requested:

CREATE TABLE urls (
  id int(11) NOT NULL auto_increment,
  url varchar(250) default NULL,
  title varchar(250) default NULL,
  mdesc blob,
  udate datetime default NULL,
  phrases blob,
  rel_phrases blob,
  PRIMARY KEY (id)
)

CREATE TABLE phrases (
  id int(11) NOT NULL auto_increment,
  phrase varchar(250) default NULL,
  urls blob,
  rel_urls blob,
  synonyms blob,
  auth varchar(250) default NULL,
  xmlfeed varchar(250) default NULL,
  date_last_mention varchar(15) default NULL,
  date_last_external varchar(15) default NULL,
  date_last_photo varchar(15) default NULL,
  date_last_rss varchar(15) default NULL,
  seed_external1 blob,
  seed_external2 blob,
  seed_external3 blob,
  seed_photo1 blob,
  seed_photo2 blob,
  seed_photo3 blob,
  auth_external blob,
  auth_photo blob,
  auth_rss blob,
  PRIMARY KEY (id)
)

There are only two SQL queries:

1. Phrase.find( :all, :conditions => ["phrase LIKE ?", keyword ] )

2. Url.find_all

An alternative approach I tried was with an inner join table called relations, which had one url_id, one phrase_id and one record for each associated url/phrase pair - a total of 235,000 relations. I used has_many :through in the Url and Phrase models, but the performance was horrible (about 4 times as long), so I did away with that and am storing the relation info in each of the two tables. I am open to any suggestions on how to better deal with this relationship ...

-Andy

David Johnson wrote:
> This search time is far too long for tables that are essentially empty. On
> 1,000,000 rows, with proper indexing, you should see no more than 40
> milliseconds on sub-gigahertz desktop systems.
> [...]

--
Posted via http://www.ruby-forum.com/.
Like David mentioned, with the right indexes you should be getting good results. This is not a Rails problem.

Ensure your phrases.phrase column is indexed, since you are searching against it.

You need to narrow the scope of your URL search, because you are bringing back all 7,000 URLs, which will take some time, especially if those three blobs are on the large side. One thing you may want to think about is moving those blobs out of the table and retrieving them in a lazy manner, i.e. when you need them.

Hope that helps.

Carl

On 4/27/06, Andrew Cowan <icculus@gmdstudios.com> wrote:
> Sure, here is the info you requested:
>
> CREATE TABLE urls (
>   id int(11) NOT NULL auto_increment,
>   url varchar(250) default NULL,
>   ...
> )
> [...]
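For reference, the index Carl mentions could be added with a plain CREATE INDEX or with a Rails migration; a sketch follows (the migration class name and number are arbitrary, not from the thread).

# db/migrate/004_add_index_to_phrases.rb -- hypothetical file name/number
class AddIndexToPhrases < ActiveRecord::Migration
  def self.up
    # Adds an index on phrases.phrase so the phrase lookup no longer scans the table.
    add_index :phrases, :phrase
  end

  def self.down
    remove_index :phrases, :phrase
  end
end

Note that an index on phrase only helps the LIKE query when the pattern has no leading wildcard (e.g. 'foo%' rather than '%foo%').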
David Johnson
2006-Apr-28 03:07 UTC
[Rails] Re: Q: Timing of render :update in an action ?
1. There are no indexes apart from the primary key, so every query execution must scan the entire table.

2. The use of the LIKE predicate would prevent the use of an index (at least when the pattern starts with a wildcard), even if one existed.

3. The heavy use of blobs is a huge performance drag - blobs cost an extra I/O per record (at least).

Essentially, you will want a table of phrases indexed on the phrase, with one phrase to a row, a table of URLs, and an associative table that is indexed by both phrase_id and url_id (a single compound index). There is also a list of synonyms that you have not stated yet, but which belongs in the data model.

Don't be put off by a few million rows. The index is a B+ tree with a cardinality determined by the number of keys per page and the page size less some overheads. After about 1,000 rows, performance does not degrade noticeably.

With an 8k page size, a 4-byte key, 4 bytes allowed for the row index, and some other overheads, you can expect about 800 index entries per page. The first 800 rows require 1 I/O to locate in the index and 1 I/O to look up in the table (2 I/Os in total). You will average a total of 3 I/Os per search between 800 and 640,000 rows, and 4 I/Os between 640,000 and 512,000,000 rows (yes, there are real-life examples that get that big and perform well).

1 I/O = 10 milliseconds worst case on current hardware. This does not allow for the performance boost that the RDBMS buffer pool caching will give automatically.

This is not quite right, but it should be close:

CREATE TABLE urls (
  id int(11) NOT NULL auto_increment,
  url varchar(250) default NULL,
  title varchar(250) default NULL,
  mdesc blob,
  udate datetime default NULL,
  PRIMARY KEY (id)
);

CREATE TABLE phrases (
  id int(11) NOT NULL auto_increment,
  phrase varchar(250) default NULL,
  auth varchar(250) default NULL,
  xmlfeed varchar(250) default NULL,
  date_last_mention varchar(15) default NULL,
  date_last_external varchar(15) default NULL,
  date_last_photo varchar(15) default NULL,
  date_last_rss varchar(15) default NULL,
  seed_external1 blob,
  seed_external2 blob,
  seed_external3 blob,
  seed_photo1 blob,
  seed_photo2 blob,
  seed_photo3 blob,
  auth_external blob,
  auth_photo blob,
  auth_rss blob,
  PRIMARY KEY (id)
);
create unique index phrase_ndx1 on phrases (phrase);

create table synonyms (
  id int(11) NOT NULL auto_increment PRIMARY KEY,
  phrase_id int(11) NOT NULL references phrases(id),
  synonym_id int(11) NOT NULL references phrases(id)
);
create unique index phrase_synonym on synonyms (phrase_id, synonym_id);

create table relations (
  id int(11) NOT NULL auto_increment PRIMARY KEY,
  phrase_id int(11) NOT NULL references phrases(id),
  url_id int(11) NOT NULL references urls(id)
);
-- This index is useful in some RDBMSs and superfluous in others.
-- Experiment to see whether the indexes created by the "references" clauses
-- above work better or worse than this index for your RDBMS.
-- In some RDBMSs this allows an index-only scan, eliminating one I/O from each fetch.
create unique index phrase_url on relations (phrase_id, url_id);

The SQL to retrieve all URLs for a given phrase would be something along the lines of this (someone else will have to express it in Rails-ese - I know all about making an RDBMS perform, but my Rails is still weak):

select a.url
from
  urls a,
  relations b,
  phrases c
where
  a.id = b.url_id
  and b.phrase_id = c.id
  and c.phrase = ?

_Here's a gotcha_: I normally use prepared statements. Rails does not have prepared statements.
The people running the show so far seem to think that string substitution provides adequate performance, which is true in a limited context but not in all contexts.

If Rails had prepared statements this would perform much better. Rails' string substitution model means that the heavy work of building the access path from the SQL must happen on every call to the RDBMS. A prepared statement would allow the access path to be built once (at prepare time) and reused for the duration of the RDBMS connection.

So, while the actual data access may be fast with this schema, it is possible that the overheads of the prepare phase may overwhelm the benefits of the data model until/unless we can convince someone that prepared statements are more than an oddity.

On Fri, 2006-04-28 at 04:07 +0200, Andrew Cowan wrote:
> Sure, here is the info you requested:
>
> CREATE TABLE urls (
>   id int(11) NOT NULL auto_increment,
>   url varchar(250) default NULL,
> [...]
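Expressed in Rails-ese, the three-table join David sketches above maps fairly directly onto has_many :through. This is a sketch only, untested; the model and association names are guesses based on the tables above.

# Hypothetical models for the urls / relations / phrases tables above (Rails 1.1-era).
class Relation < ActiveRecord::Base
  belongs_to :phrase
  belongs_to :url
end

class Phrase < ActiveRecord::Base
  has_many :relations
  has_many :urls, :through => :relations
end

class Url < ActiveRecord::Base
  has_many :relations
  has_many :phrases, :through => :relations
end

# All URLs for a given phrase; keyword holds the search term. An exact match
# (or a LIKE with no leading '%') lets the unique index on phrases.phrase be used.
phrase = Phrase.find(:first, :conditions => ["phrase = ?", keyword])
urls   = phrase ? phrase.urls : []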
Wow, looks like a major revision to the DB structure is needed; I did not realize the use of table indexes is so crucial.

One thing I am wondering is whether Rails' has_many :through relation implicitly performs the kind of query you describe with the url/phrase/relation lookup. I am guessing it does, and that adding the index (which I did not have with my last version) will make a big difference.

I'm also thinking of losing all of the blob data from the phrases table and perhaps moving it to a separate table, since it's not needed for the bulk of the search logic.

A lot to think about. Thank you for taking the time to post this; I think the suggested revisions will likely help the app's performance quite a bit! :)

-Andy

David Johnson wrote:
> 1. There are no indexes apart from the primary key, so every query
> execution must scan the entire table.
>
> 2. The use of the LIKE predicate would prevent the use of an index (at
> least when the pattern starts with a wildcard), even if one existed.
>
> 3. The heavy use of blobs is a huge performance drag - blobs cost an
> extra I/O per record (at least).
> [...]

--
Posted via http://www.ruby-forum.com/.
Andrew Cowan
2006-Apr-28 19:49 UTC
[Rails] Re: How to access a member from an inner join table?
David/Carl - thanks again for your replies. I started from scratch this morning with your suggestions and the response time is much, much faster; I'd go as far as to say it's even speedy.

There is one issue that the database rework brings up, and maybe there is an easy solution.

With the removal of the phrases and rel_phrases fields from the urls table, and the similar removal of the urls and rel_urls fields from the phrases table, I am using the relations table to specify whether the relation between a given url_id and phrase_id is relevant or not -- done through a field Relation.isrel int(1).

I have the urls and phrases linked together through a has_many :through relationship, but I am unsure how to access the specific relation to get the isrel member.

e.g.

url.phrases is an array of all phrases where rel.url_id = url.id and rel.phrase_id = phrase.id -- but given url.phrases, how is the underlying relation object accessed?

I am hoping my description is clear and that there is a solution I am unaware of.

Again, thank you guys so much for pointing out a better DB design.
Andy

--
Posted via http://www.ruby-forum.com/.
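One way to get at the join row (and its isrel flag) is to go through the join model's own association rather than the :through collection. A sketch, untested, reusing the hypothetical model names from earlier in the thread:

# Assumes Url has_many :relations (the join model) as well as
# has_many :phrases, :through => :relations, and that Relation
# belongs_to :url / belongs_to :phrase and has the isrel column.
url = Url.find(some_url_id)    # some_url_id is a placeholder

# Walk the join rows directly; each Relation row carries isrel.
url.relations.each do |rel|
  puts "#{rel.phrase.phrase}: relevant? #{rel.isrel == 1}"
end

# Or fetch the single join row for a known url/phrase pair.
rel = Relation.find(:first,
        :conditions => ["url_id = ? AND phrase_id = ?", url.id, some_phrase_id])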
David Johnson
2006-Apr-29 01:47 UTC
[Rails] Re: How to access a member from an inner join table?
I'm still a newbie to Rails.

If you simply want to filter out the rows based on the isrel column, just append "and isrel = 1" to the SQL.

If you have a more complex use for the relationship, my next step would be to trace into the Rails source and figure out how Rails is handling the relation.

I'm listening to see what a Rails guru comes up with.

Good Luck.

On Fri, 2006-04-28 at 21:49 +0200, Andrew Cowan wrote:
> David/Carl - thanks again for your replies. I started from scratch this
> morning with your suggestions and the response time is much, much faster;
> I'd go as far as to say it's even speedy.
> [...]
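In ActiveRecord terms, one way to apply that isrel = 1 filter without hand-editing SQL is to query the join model directly. A sketch, untested, using the same hypothetical names as above:

# Only the join rows flagged as relevant for this URL, with their phrases
# pulled in by the same query via :include.
rels = Relation.find(:all,
         :conditions => ["url_id = ? AND isrel = 1", url.id],
         :include    => :phrase)
relevant_phrases = rels.map { |r| r.phrase }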
Andrew Cowan
2006-Apr-29 02:23 UTC
[Rails] Re: Re: How to access a member from an inner join table?
David, as a temp solution I added a second join table called rrelations and only added in the url/phrase pairs that were tagged as relevant -- that works fine as far as I can tell. I can access phrase.urls with the relations inner join table via has_many :through, and phrase.rel_urls via the rrelations join table.

However, after playing a lot more with the app I have found a problem: the inner join table implemented via has_many :through works great in only one direction, accessing the urls list from a phrase object (phrase.urls, or phrase.rel_urls for the second join table).

If I attempt to access url.phrases for a given url object, my server console starts to fill up with these:

Phrase Load (0.004271) SELECT phrases.* FROM phrases INNER JOIN relations ON phrases.id = relations.phrase_id WHERE (relations.url_id = 4198)

one of those for each url in the list, and a typical search may have 1000 - 7000 or so urls. This was the exact problem that caused me to scrap the inner join table with my first attempt at it, before I added the indexes for each of the tables.

When I access phrase.urls for a phrase object, I see no such messages, so it seems the indexing is only working in one direction.

Any ideas what might be the problem?

-Andy

David Johnson wrote:
> I'm still a newbie to Rails.
>
> If you simply want to filter out the rows based on the isrel column, just
> append "and isrel = 1" to the SQL.
> [...]

--
Posted via http://www.ruby-forum.com/.
David Johnson
2006-Apr-29 13:14 UTC
[Rails] Re: Re: How to access a member from an inner join table?
Rails is trying to query for these individually, generating SQL from each loaded object. This is a Rails-specific problem, and I am still new to Rails. I would suspect that one of three things is happening:

1. You are using the wrong Rails facility to represent this relationship,

2. Rails has a mechanism that will allow you to control this facility better, or

3. Rails has a bug in its implementation of this feature.

The only way to figure this out will be to dive into the Rails source code, unless a Rails guru is following this thread and can give more insight from experience. I am very interested in hearing what you find out, because most of my work involves at least a couple of layers of indirection.

On Sat, 2006-04-29 at 04:23 +0200, Andrew Cowan wrote:
> David, as a temp solution I added a second join table called rrelations
> and only added in the url/phrase pairs that were tagged as relevant --
> that works fine as far as I can tell.
> [...]
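For what it's worth, the per-url queries shown above are the classic "one query per row" (N+1) pattern. A common workaround - not something either poster suggested here, just a sketch using the hypothetical model names from earlier - is to fetch all the join rows for the loaded URLs in a single query and group them in Ruby:

# One query for all relations of the loaded URLs (instead of url.phrases per URL),
# then grouped by url_id in Ruby.
urls = Url.find(:all)
rels = Relation.find(:all,
         :conditions => ["url_id IN (?)", urls.map { |u| u.id }],
         :include    => :phrase)

phrases_by_url = Hash.new { |h, k| h[k] = [] }
rels.each { |r| phrases_by_url[r.url_id] << r.phrase }

urls.each do |url|
  phrases = phrases_by_url[url.id]
  # ... use phrases for this url ...
end

With 7,000 URLs the IN list gets large, so treat this as a direction to explore rather than a drop-in fix.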