Bashed this together after reading Ola Bini''s post about his annoyances about the MySQL centric approach of ActiveRecord at http://ola-bini.blogspot.com/2006/09/rails-databases-activerecord-and-path.html I decided to find out if it was really that bad, and I suppose it might be. I spent some time trying to get the type information about columns from ActiveRecord into the driver, but I couldn''t come up with a good way to do it. So I started over and wrote this query cache thingie into the PostgreSQL driver. It''s a giant hack, ugly and buggy, but it passes all ActiveRecord tests except one (where it results in two queries being made, one for the prepare and another for the execute, instead of a single query doing select) and I managed to try it out on a site of mine in development mode. With prepared statements I got: Completed in 1.59062 (0 reqs/sec) | Rendering: 1.31451 (82%) | DB: 0.25888 (16%) | 200 OK [http://192.168.0.46/characters] Completed in 1.61538 (0 reqs/sec) | Rendering: 1.36034 (84%) | DB: 0.23845 (14%) | 200 OK [http://192.168.0.46/characters] Without prepared statements: Completed in 1.74804 (0 reqs/sec) | Rendering: 1.23182 (70%) | DB: 0.49718 (28%) | 200 OK [http://192.168.0.46/characters] Completed in 1.67329 (0 reqs/sec) | Rendering: 1.21762 (72%) | DB: 0.43935 (26%) | 200 OK [http://192.168.0.46/characters] So that''s a nice speed up, and while that might be a poor example beause of the high number of repeated selects in it, I think that prepared statements should be supported in some way by ActiveRecord, as soon as possible. It''s a giant hack and I''m actually surprised it works at all, I''m not proud of this code but it told me what I wanted to know, whether or not prepared statements really are that cool or not (turns out they are). Index: activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb ==================================================================--- activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb (revision 5147) +++ activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb (working copy) @@ -1,4 +1,5 @@ require ''active_record/connection_adapters/abstract_adapter'' +require ''csv'' module ActiveRecord class Base @@ -54,6 +55,8 @@ def initialize(connection, logger, config = {}) super(connection, logger) + @statements = {} + @prepared_statements = [] @config = config @async = config[:allow_concurrency] configure_connection @@ -141,8 +144,184 @@ result.first if result end + def try_prepared_select(sql) + # split the query into the statement part (select a from x left join y...) and the conditions part (where b = c and x.d = y.d) + return nil unless sql =~ /WHERE/ + statement, conditions = sql.split("WHERE", 2) + return nil if conditions == nil + + # try to find a "primary table", for simple queries where column names aren''t fully qualified + # select name from developers where name = ''Joel'' + # ^-- this is the primary table + # vs + # select developers.name from developers where developers.name = ''Joel'' + # if we find columns without a table in the conditions, we prefix them with the primary table + # we don''t always get a primary table, but when we don''t, all columns are fully qualified + sql =~ /FROM ([\w\s,]+?)\s+(WHERE|LEFT|INNER|OUTER)/ + primary_table = $1.strip if $1 + + # table aliases cause all kinds of trouble, since we need to scan the real table to find column types + # select name from developers d where d.name = ''Joel'' + # ^ ^-- we need to translate this into "developers.name" when trying to find out the type + # \-- d is an alias for developers + alias_table = {} + tables = [] + if primary_table && (primary_table =~ /\w+\s+\w+/) # aliases + primary_table.split(/,/).each_with_index { |table_alias, i| + pri, ali = table_alias.strip.split(/\s+/) + pri.strip! + ali.strip! + primary_table = pri if i == 0 + tables << pri + alias_table[ali] = pri + } + end + tables << primary_table if primary_table + tables = tables.sort.uniq + + # next we need to normalize the conditions, since the following two statements are functionally the same and should use the same stored procedure + # select name from developers d where d.name = ''Joel'' + # select name from developers d where d.name = ''Andreas'' + # so, we should normalize the conditions into "name = $1" for the comparison and for the prepared statement + variables_values = [] + conditions.split(/\s+(AND|OR|LIMIT|ORDER|OFFSET|GROUP|HAVING|FOR)\s+/i).each { |part| + variable, operand, value part.strip.split(/\s*(=|>|>=|<>|<|<=|!=)\s*/, 3) + if variable and value + r1 = Regexp.new(/^[\(\)\s]*/) # remove leading parantheses and spaces ... + r2 = Regexp.new(/[\(\)\s]*$/) # ... and remove trailing parantheses and spaes + r3 = Regexp.new(/^\w+\(/) + variable.sub!(r1, '''') + variable.sub!(r2, '''') + variable.sub!(r3, '''') + value.sub!(r1, '''') + value.sub!(r2, '''') + + if value =~ /^(''|"|\d)/ + variables_values << [ variable, operand, value ] + end + end + } + return nil if variables_values.length == 0 + + # make the replacements and store the columns involved + variables = [ ] + aliased_columns = [ ] + normalized_conditions = conditions + variables_values.each_with_index { |v,i| + name, operand, value = v + variables << value + aliased_columns << name + normalized_conditions.sub!(/\s*#{operand}\s*#{value}/, "#{operand} $#{i+1}") + normalized_conditions.sub!(/\s*#{operand}\s*#{value}/, "#{operand} $#{i+1}") + } + + # keep track of how many times we''ve seen this statement + if @statements.has_key?(statement+normalized_conditions) + @statements[statement+normalized_conditions] += 1 + else + @statements[statement+normalized_conditions] = 0 + end + + # now that we have the variables and normalized conditions, we can look for a prepared statement + if @prepared_statements.include?(statement+normalized_conditions) + # if we have one, go for it + number @prepared_statements.index(statement+normalized_conditions) + return execute("EXECUTE prep#{number}(#{variables.join('', '')})") + # if we have some number of hits on this statement, we deem it worthy of being turned into a prepared statement + elsif @statements.fetch(statement+normalized_conditions, 0) > 3 + columns = [] + # here we want to match d.name and turn it into developers.name, and save it for the next step + aliased_columns.each { |c| + table, column = c.split(/\./) + table = alias_table[table] if alias_table[table] + if column + columns << "#{table}.#{column.strip.gsub(/\"/, "")}" + else + columns << "#{primary_table}.#{table.strip.gsub(/\"/, "")}" + end + } + + # now we need to find the type of all the columns in the conditions. we should find out that developers.name is a char varying(255) + # the first step is to store an array of columns for every table in condition_columns[table] = [ column1, column2, ... ] + columns + condition_columns = { } + columns.uniq.each { |c| + t, c = c.split("\.") + t.strip! if t + c.strip! if c + if condition_columns[t] + condition_columns[t] << c + else + condition_columns[t] = [ c ] + end + } + # then we go through every table and find the types for (hopefully) all the columns we care about + column_types = { } + condition_columns.each_key { |t| + column_definitions(t).collect do |name, type, default, notnull, typmod| + if condition_columns[t].include?(name) + column_types["#{t}.#{name}"] = type + end + end + } + + # and now that we finally have all types of the columns where interested in... + argument_types = [] + columns.each { |c| + return nil unless column_types.has_key?(c) # actually sometimes we don''t, and in that case we give up + argument_types << column_types[c] + } + # ... but when we do, we can prepare a statement + execute("PREPARE prep#{@prepared_statements.length}(#{argument_types.join('', '')}) AS #{statement} WHERE #{normalized_conditions}") + @prepared_statements << (statement+normalized_conditions) + number @prepared_statements.index(statement+normalized_conditions) + return execute("EXECUTE prep#{number}(#{variables.join('', '')})") + end + return nil + end + + def try_prepared_insert(sql) + statement = sql.split("VALUES")[0] + values = sql.split("VALUES")[1] + statement =~ /INSERT INTO (\w+) \((.+)\)/i + table = $1 + columns = $2 + if @statements.fetch(statement, 0) > 3 + value_array = CSV::parse_line(values).map { |e| e.to_s.strip } + column_array = CSV::parse_line(columns.gsub(/\"/, "")).map { |e| e.to_s.strip } + argument_types = {} + if tables.include?(table) + column_definitions(table).collect do |name, type, default, notnull, typmod| + if column_array.include?(name) + argument_types[name] = type + column_array.delete(name) + end + end + if column_array.length == 0 # we matched all columns + type_array = [] + column_array = CSV::parse_line(columns.gsub(/\"/, "")).map { |e| e.to_s.strip } + column_array.each { |c| type_array << argument_types[c] } + unless @prepared_statements.include?(statement) + dollar_arguments = ((1 .. column_array.length).map { |i| ''$'' + "#{i}" }).join(", ") + execute("PREPARE prep#{@prepared_statements.length} (#{type_array.join('', '')}) AS INSERT INTO #{table}(#{column_array.join('', '')}) VALUES(#{dollar_arguments})") + @prepared_statements << statement + end + number = @prepared_statements.index(statement) + return execute("EXECUTE prep#{number}#{values}") + end + end + else + if @statements.has_key?(statement) + @statements[statement] += 1 + else + @statements[statement] = 0 + end + end + return nil + end + def insert(sql, name = nil, pk = nil, id_value = nil, sequence_name = nil) #:nodoc: - execute(sql, name) + execute(sql, name) if try_prepared_insert(sql) == nil table = sql.split(" ", 4)[2] id_value || last_insert_id(table, sequence_name || default_sequence_name(table, pk)) end @@ -405,7 +584,8 @@ end def select(sql, name = nil) - res = execute(sql, name) + res = try_prepared_select(sql) + res = execute(sql, name) if res == nil results = res.result rows = [] if results.length > 0 --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---
Well, you sure have got an awesome proof of concept! Why don''t you take it up with the maintainer of the PostgreSQL adapter or the core team? It would be great to see more of this. Regards, Roderick -- Posted via http://www.ruby-forum.com/. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---
Roderick van Domburg wrote:> Well, you sure have got an awesome proof of concept! Why don''t you take > it up with the maintainer of the PostgreSQL adapter or the core team? It > would be great to see more of this.Thank you! However... If this was for the PostgreSQL adapter and only the PostgreSQL adapter, that might be worth doing, but it''s more than that. Any database supporting prepared statements could do this. Let''s turn it into a wrapper around all the calls to select and insert for all the databases that support it. That would be great, right? Actually, not really, for several reasons. 1) even if it was rewritten to do proper parsing of the SQL, it would still cause unnecessary overhead (possibly even more so if it did proper parsing) 2) there''s no control over when the prepared statements are being made. Here I use a minimum of 3 times, then the statement is prepared. In a production setting that would mean that almost every statement is prepared, which isn''t optimal. Higher limits would still get just about every query prepared, which isn''t always optimal. 3) if this was solid it would still only be treating the symptoms and hiding the real problem What real problem? ActiveRecord is, uhm... well, let''s just say it''s not well structured for this kind of thing (I''d love to be proven wrong on this) and needs to be refactored, and rewritten in large parts, if this was to be done cleanly. SQL statements are built in several parts, piece for piece, and no information is saved about what actually goes into it. That''s why we need to parse the SQL string to find things we should already know. ActiveRecord made that SQL string, so ActiveRecord should know what variables, tables and values are in there, and we should be able to use that to do useful things like prepare/execute before it hits the adapter. Adapters are stupid, and they should be. I think ActiveRecord is stupid because it''s throwing all this information away, and it shouldn''t be. Ideally ActiveRecord would use a slightly more abstract format for queries that then gets turned into SQL in one place... and while I don''t know a lot about ActiveResource, standing on the outside and looking in, it seems to me like it should be an ActiveRecord with different storage. A method like ''delete'' shouldn''t build a SQL string, it should send a delete message to the object''s storage object, whether it''s REST or SQL or something else (file based storage could have some uses I bet). Or implementing caching for all "ActiveObjects", which should also be done above the SQL/REST stuff, there''s no way to do that either from what I can tell. I bet I''m going to have to regret that last part. :) Anyways, the main focus is and should be that ActiveRecord isn''t very well structured for implementing features like prepare/execute, and until it is, I can''t see how it can be done well. And if we can''t do it well, we might as well not do it at all (unless it''s for testing or research, like this was). If I''m wrong and there is a good way to implement it, it should be made a top priority for 2.0 or 2.1, because this is more important that providing even more helpers. People like to call Ruby on Rails slow, and the defenders like to say that most of the time is spent in the database anyway, yet here we could get a huge database increase and yet it''s still not done. Someone posted a couple of weeks ago about how he had been called up by the DBA and was told to shut down his Rails application because it caused too much load on the database, and 80% of the time was spent preparing statements. I don''t think there''s any good excuse for this, and it''s time to do something about it, but this here isn''t the way to go. In my humble opinion. ;) Probably came across as pompous bastard there, which wasn''t my intention. I''d love to help working on a real solution to this problem but I don''t know where to start (and even if I did, I couldn''t do it alone, and if I did it wouldn''t be accepted anyways). I''m trying to show how useful prepared statements can be to hopefully bring some more attention to this. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---
On 9/21/06, Joel Wilsson <joel.wilsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> > That''s why we need to parse the SQL string to find things we should > already know. ActiveRecord made that SQL string, so ActiveRecord should > know what variables, tables and values are in there, and we should be > able to use that to do useful things like prepare/execute before it > hits the adapter. > Adapters are stupid, and they should be. I think ActiveRecord is stupid > because it''s throwing all this information away, and it shouldn''t be. > > Ideally ActiveRecord would use a slightly more abstract format for > queries that then gets turned into SQL in one place... and while I > don''t know a lot about ActiveResource, standing on the outside and > looking in, it seems to me like it should be an ActiveRecord with > different storage. > A method like ''delete'' shouldn''t build a SQL string, it should send a > delete message to the object''s storage object, whether it''s REST or SQL > or something else (file based storage could have some uses I bet). > Or implementing caching for all "ActiveObjects", which should also be > done above the SQL/REST stuff, there''s no way to do that either from > what I can tell. >Joel, good input all around. Not to punt on this, but I fear that refactoring will turn into a class-for-every-noun design spree/quagmire. AR works really well for its common case; we don''t need to solve the non-problems that some ORMs relish. However, it''s getting creaky in spots and could use some love. A good example is how anna-BeCJdpYfg9U@public.gmane.org reworked eager associations: http://dev.rubyonrails.org/ticket/3913 jeremy --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---
johnson_d-j9pdmedNgrk@public.gmane.org
2006-Sep-22 02:25 UTC
Re: Prepared statements (PostgreSQL)
I''m with you ... but it''s about #10 on my list right now. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---
You''ve got some good ideas here. I agree that we need to get prepared statements in AR. I just spent about 20 minutes working on this proof of concept. It''s really ugly, but it shows a different approach to using PS. Here''s how it works. First you extend AR::Base with the module below. Then your class could look like: class Person < ActiveRecord::Base prepare :find_by_name end When the Person class is loaded, it will build a prepared statement and execute it against the database. Then it defines find_by_name to execute the PS instead of building a standard SQL query. The main advantage that I can see is that the prepared statement is explicit. Instead of trying to figure out if it should prepare a statement after a certain number of queries, you can just say define a particular method to use a prepared statement. My implementation blows (it''s at the bottom of this post), but as I said it was just a quick proof of concept. I''m not super well-versed in AR, so I don''t know what exactly I can use. Ideally I''d like to somehow get a string that represents the SQL that AR would generate on its own, then use that to create the prepared statement. I think it could be implemented to handle inserts easily. There are some issues that need to be handled, like making sure not to prepare an exising PS, or making sure one exists before executing it. But those are of course things that can come later if this seems like a good approach. Also I think some of this probably belongs in the connector, though I''m not entirely sure what parts. What do you think of this approach? Is it worth pursuing? Pat module PreparedPostgres def prepare(method_id) if match = /find_(by)_([_a-zA-Z]\w*)/.match(method_id.to_s) finder = determine_finder(match) attribute_names = extract_attribute_names_from_match(match) super unless all_attributes_exists?(attribute_names) cols = columns.select { |c| attribute_names.include?(c.name) } col_types = cols.collect { |c| ps_type c.type } conds = [] cols.each { |c| conds << "#{c.name}=$#{conds.size + 1}" } prep = "PREPARE #{method_id.to_s} (#{col_types.join(", ")}) AS " + "SELECT * FROM #{table_name} WHERE #{conds.join(" AND ")};" connection.execute prep; (class << self; self; end).instance_eval do define_method(method_id) do |*args| qargs = args.collect { |a| quote a } statement = "EXECUTE #{method_id.to_s}(#{qargs.join(", ")});" find_by_sql(statement).first; end end end end private def ps_type(type) case type when :integer then :int when :string then :text end end end --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---
I am glad to see that in the last week we have gone from "we should do something" to "here''s a proof of concept." Here''s a couple considerations and implications: 1. Only the database connector knows (or should be responsible for knowing) whether or not the underlying RDBMS supports prepared statements. 2. ActiveRecord needs to be the one responsible for determining whether or not a prepared statement exists. Otherwise it will generate a lot of unneeded sql strings that are subsequently ignored in favor of the prepared statements. The implication is that the availability of prepared statements must be exposed as a standardized (boolean) property of the RDBMS connector, and then the ActiveRecord can ignore (current behavior) or use it later as it matures. If a mechanism for generating a "footprint" for a prepared statements can be determined, that is more efficient than generating an SQL string, then we are in very good shape. The "find_by_id" idiom is a good trivial case, reflecting 90% of SQL in my own rails apps. However, it is incomplete. A trivial footprint consists of the SQL from the "from" clause to the end of the statement, prior to sanitize/populate parameters. The code for this pre-exists, and as a String it provides a good key into a hash map. Once the "from" and "where" clause are computed, the actual prepared statement could be obtained by something along these lines # This is entirely pseudocode. Any resemblance to Ruby is # randomly intentional but entirely unreliable footprint=... # existing "from ... where ..." code if connector.supports_prepared_statement then stmt=stmt_map[footprint] # populate parameters ... ... # execute query result = stmt.execute end else # existing code end Unless specifically noted (like in find_by_sql), prepared statement queries would return all columns from all addressed tables. find_by_sql should use the full sql as a footprint. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk -~----------~----~----~----~------~----~------~--~---