Matteo Vaccari
2009-Nov-04 17:13 UTC
references_eager_loaded_tables? matches string literals containing dots
It all began when a collegue said that "our free text search throws if the user enters something with a dot". A bit of investigation showed that ActiveRecord produced a different set of queries depending on the user input containing a dot or not. The :include is handled with a single query with a join, or with two separate queries. This is not a big problem in most cases; it only has a performance impact. In our case there was a failure because the :include joins a table which another scope joins for a different reason. We would never have noticed this otherwise. I dug a bit, and found that the reason is that method ActiveRecord::Base.tables_in_string treats anything before a dot like a table name. So a condition like ["x = ?", params[:input]] can produce a query "x = ''foo.bar''", and tables_in_string reports that we reference a table named "foo". It''s not a big problem for our application; the simple workaround for me was to filter out dots in user input. But I think it''s not good that AR can behave differently according to user input. So I filed a ticket and prepared a patch. https://rails.lighthouseapp.com/projects/8994/tickets/3446-tables_in_string-matches-literal-string-containing-dot#ticket-3446-2 Cheers, Matteo --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group. To post to this group, send email to rubyonrails-core@googlegroups.com To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en -~----------~----~----~----~------~----~------~--~---
Jeremy Evans
2009-Nov-04 18:37 UTC
Re: references_eager_loaded_tables? matches string literals containing dots
On Wed, Nov 4, 2009 at 9:13 AM, Matteo Vaccari <matteo.vaccari@gmail.com> wrote:> > It all began when a collegue said that "our free text search throws if the > user enters something with a dot". A bit of investigation showed that > ActiveRecord produced a different set of queries depending on the user input > containing a dot or not. The :include is handled with a single query with a > join, or with two separate queries. > > This is not a big problem in most cases; it only has a performance impact. > In our case there was a failure because the :include joins a table which > another scope joins for a different reason. We would never have noticed > this otherwise. > > I dug a bit, and found that the reason is that method > ActiveRecord::Base.tables_in_string treats anything before a dot like a > table name. So a condition like > > ["x = ?", params[:input]] > > can produce a query "x = ''foo.bar''", and tables_in_string reports that we > reference a table named "foo". > > It''s not a big problem for our application; the simple workaround for me was > to filter out dots in user input. But I think it''s not good that AR can > behave differently according to user input. So I filed a ticket and > prepared a patch. > > https://rails.lighthouseapp.com/projects/8994/tickets/3446-tables_in_string-matches-literal-string-containing-dot#ticket-3446-2Attempting to parse SQL using regular expressions may work for simple cases, but not more complex ones. The following will still fail with your patch: irb(main):001:0> puts "name = ''foo\\\\'' AND ''foo.bar '' = number" name = ''foo\\'' AND ''foo.bar '' = number irb(main):002:0> puts "name = ''foo\\\\'' AND ''foo.bar '' number".gsub(/''(\\''|[^''])*''|"(\\"|[^"])*"/, '''') name = foo.bar '' = number irb(main):003:0> "name = ''foo\\\\'' AND ''foo.bar '' number".gsub(/''(\\''|[^''])*''|"(\\"|[^"])*"/, '''').scan(/([a-zA-Z_][\.\w]+).?\./).flatten => ["foo"] To be fair, ActiveRecord attempts to parse SQL using regular expressions all over the place, and your patch handles the most common current failure cases, so it''s still probably a good candidate for inclusion. Jeremy
Matteo Vaccari
2009-Nov-05 13:37 UTC
Re: references_eager_loaded_tables? matches string literals containing dots
On Wed, Nov 4, 2009 at 7:37 PM, Jeremy Evans <jeremyevans0@gmail.com> wrote:> > On Wed, Nov 4, 2009 at 9:13 AM, Matteo Vaccari <matteo.vaccari@gmail.com> wrote: >> >> It all began when a collegue said that "our free text search throws if the >> user enters something with a dot". A bit of investigation showed that >> ActiveRecord produced a different set of queries depending on the user input >> containing a dot or not. The :include is handled with a single query with a >> join, or with two separate queries. >> >> This is not a big problem in most cases; it only has a performance impact. >> In our case there was a failure because the :include joins a table which >> another scope joins for a different reason. We would never have noticed >> this otherwise. >> >> I dug a bit, and found that the reason is that method >> ActiveRecord::Base.tables_in_string treats anything before a dot like a >> table name. So a condition like >> >> ["x = ?", params[:input]] >> >> can produce a query "x = ''foo.bar''", and tables_in_string reports that we >> reference a table named "foo". >> >> It''s not a big problem for our application; the simple workaround for me was >> to filter out dots in user input. But I think it''s not good that AR can >> behave differently according to user input. So I filed a ticket and >> prepared a patch. >> >> https://rails.lighthouseapp.com/projects/8994/tickets/3446-tables_in_string-matches-literal-string-containing-dot#ticket-3446-2 > > Attempting to parse SQL using regular expressions may work for simple > cases, but not more complex ones. The following will still fail with > your patch: > > irb(main):001:0> puts "name = ''foo\\\\'' AND ''foo.bar '' = number" > name = ''foo\\'' AND ''foo.bar '' = number > irb(main):002:0> puts "name = ''foo\\\\'' AND ''foo.bar '' > number".gsub(/''(\\''|[^''])*''|"(\\"|[^"])*"/, '''') > name = foo.bar '' = number > irb(main):003:0> "name = ''foo\\\\'' AND ''foo.bar '' > number".gsub(/''(\\''|[^''])*''|"(\\"|[^"])*"/, > '''').scan(/([a-zA-Z_][\.\w]+).?\./).flatten > => ["foo"] > > To be fair, ActiveRecord attempts to parse SQL using regular > expressions all over the place, and your patch handles the most common > current failure cases, so it''s still probably a good candidate for > inclusion.You are right. I improved the regexp and updated the ticket with a new patch. Now it will treat any character after a backslash as part of the string literal: /''(\\.|[^''])*''|"(\\.|[^"])*"/ I agree that parsing SQL correctly with regexps is hopeless. But the string literal syntax should be well within the possibilities of a finite-state automa. I believe this should work... I''ll be happy to be refuted :-) I have a few question, being an absolute newbie to this list... was it bad manners on my part to assign the ticket to someone? I did choose Pratik because he handled a very similar ticket some time ago. Also, is it OK to add unit tests for a private method? Matteo
Jeremy Evans
2009-Nov-05 16:47 UTC
Re: references_eager_loaded_tables? matches string literals containing dots
On Thu, Nov 5, 2009 at 5:37 AM, Matteo Vaccari <matteo.vaccari@gmail.com> wrote:> You are right. I improved the regexp and updated the ticket with a new > patch. Now it will treat any character after a backslash as part of > the string literal: /''(\\.|[^''])*''|"(\\.|[^"])*"/It''s still going to be broken on PostgreSQL if standard_conforming_strings is on, and potentially other databases too if they conform to the SQL standard (where backslashes are not special): a = ''foo\'' AND ''foo.bar '' = a In standard SQL, ''foo\'' and ''foo.bar'' are two separate string literals. Your remove_literal_strings converts it to: a = foo.bar '' = a> I agree that parsing SQL correctly with regexps is hopeless. But the > string literal syntax should be well within the possibilities of a > finite-state automa. I believe this should work... I''ll be happy to > be refuted :-)It''ll work on some databases and not others, because of the differences in SQL parsers. On PostgreSQL, whether it works depends on the standard_conforming_strings setting. You could make it adapter dependent, I suppose. Also, in the SQL standard double quotes specify identifiers, not strings, so remove_literal_strings is a misnomer, since it also removes quoted identifiers. Whether backslashes are allowed and how they are handled inside quoted identifiers is another thing that is going to vary depending on the database used. Maybe since ActiveRecord is getting a major bump to 3.0, it''s time to break backwards compatibility and use a different .find hash key for preloaded includes (which use separate queries) vs. regular includes (which use joins). That would avoid the issue of scanning strings looking for whether or not tables are referenced in order to decide whether to preload them. Jeremy