neongrau __
2007-Mar-14 15:45 UTC
[Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records
hi! after wondering why i can''t find alot of records i eventually found the problem in the sqlserver_adapters implementation of "add_limit_offset!". the problem is when using MSSQL with the sqlserver_adapter paging will only work when at least one column is defined in ":order". for example i was indexing a table with 2912 records, the generated sql for the batches were these: SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 1000 * FROM table_name) AS tmp1 ) AS tmp2 SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 2000 * FROM table_name) AS tmp1 ) AS tmp2 SELECT * FROM (SELECT TOP 912 * FROM (SELECT TOP 2912 * FROM table_name) AS tmp1 ) AS tmp2 as you can imagine it was indexing the same top 1000 records 3 times :( i think a default ordering by the primary key would help to eliminate that problem. -- Posted via http://www.ruby-forum.com/.
Jens Kraemer
2007-Mar-14 16:14 UTC
[Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records
On Wed, Mar 14, 2007 at 04:45:04PM +0100, neongrau __ wrote:> hi! > > after wondering why i can''t find alot of records i eventually found the > problem in the sqlserver_adapters implementation of "add_limit_offset!". > > the problem is when using MSSQL with the sqlserver_adapter paging will > only work when at least one column is defined in ":order". > > for example i was indexing a table with 2912 records, the generated sql > for the batches were these: > > SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 1000 * FROM > table_name) AS tmp1 ) AS tmp2 > SELECT * FROM (SELECT TOP 1000 * FROM (SELECT TOP 2000 * FROM > table_name) AS tmp1 ) AS tmp2 > SELECT * FROM (SELECT TOP 912 * FROM (SELECT TOP 2912 * FROM table_name) > AS tmp1 ) AS tmp2 > > > as you can imagine it was indexing the same top 1000 records 3 times :( > > i think a default ordering by the primary key would help to eliminate > that problem.I just committed this, so could you please try if current trunk fixes this for you? Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa
neongrau __
2007-Mar-14 16:43 UTC
[Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records
Jens Kraemer wrote:> I just committed this, so could you please try if current trunk fixes > this for you?hi! thanks for your fast response. yes it looks like everything gets indexed now. but now somehow find_id_by_contents doesnt work as before, has it changed? i was using this to get all id''s as an array: .find_id_by_contents(search, {:limit => :all}).collect {|x| x[:id]} -- Posted via http://www.ruby-forum.com/.
Jens Kraemer
2007-Mar-14 16:57 UTC
[Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records
On Wed, Mar 14, 2007 at 05:43:35PM +0100, neongrau __ wrote:> Jens Kraemer wrote: > > I just committed this, so could you please try if current trunk fixes > > this for you? > > hi! > > thanks for your fast response. > yes it looks like everything gets indexed now. > > but now somehow find_id_by_contents doesnt work as before, has it > changed? > > i was using this to get all id''s as an array: > .find_id_by_contents(search, {:limit => :all}).collect {|x| x[:id]}if used without a block, find_id_by_contents returns a 2-element-array where the first element is the number of total hits and the last element is the results array, so .find_id_by_contents(search, {:limit => :all}).last.collect {|x| x[:id]} should work. I''m not sure when this behaviour has been introduced, but I guess it''s been a while ago... Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa
neongrau __
2007-Mar-14 17:11 UTC
[Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records
Jens Kraemer wrote:> if used without a block, find_id_by_contents returns a 2-element-array > where the first element is the number of total hits and the last element > is the results array, sook, found that while checking on the console. but whats the difference to just ask for .length on the resulting array on the old behavior? -- Posted via http://www.ruby-forum.com/.
Jens Kraemer
2007-Mar-15 08:28 UTC
[Ferret-talk] aaf batch_size limits indexing on mssql to 1000 records
On Wed, Mar 14, 2007 at 06:11:17PM +0100, neongrau __ wrote:> Jens Kraemer wrote: > > if used without a block, find_id_by_contents returns a 2-element-array > > where the first element is the number of total hits and the last element > > is the results array, so > > ok, found that while checking on the console. > > but whats the difference to just ask for .length on the resulting array > on the old behavior?if you use :limit to only fetch 20 or so results, total_hits will still tell you the total number of results, while results.length will be 20. Jens -- Jens Kr?mer webit! Gesellschaft f?r neue Medien mbH Schnorrstra?e 76 | 01069 Dresden Telefon +49 351 46766-0 | Telefax +49 351 46766-66 kraemer at webit.de | www.webit.de Amtsgericht Dresden | HRB 15422 GF Sven Haubold, Hagen Malessa