I've been looking into memory behavior in Rails workers. One thing I've noticed is that it's easy to instantiate tens of thousands of objects on the Ruby heap even with find_each operating in batches of 1000. Most of these objects appear to be highly redundant.

Consider loading 1000 instances of an AR object MyClass which has 20 database fields. There will be at least 20 x 1000 strings allocated, as measured by GC.start; ObjectSpace.count_objects[:T_STRING]. Digging deeper, it looks like each instance has an internal attributes hash in an instance variable. The first key is typically the string "id". Each "id" string is an individual object, as determined by object_id, even though all of these strings are frozen for use as hash keys.

Would it be possible to take advantage of the very large amount of duplication in the keys of this hash to save thousands of unnecessary objects from being allocated every time a bulk query is run? Maybe something like a StringPool, or getting the column name directly from a lower layer, or using symbols would work.

There are also a bunch of empty hashes which could probably be shared in a copy-on-write style. Right now it looks like six Hashes per AR instance, with four of them initially empty, in a typical find query.

Thanks for any thoughts!

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-core/-/KHOjbe_BlM0J.
To post to this group, send email to rubyonrails-core@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-core+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.
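A minimal sketch of the measurement described in the message above, in plain Ruby with no Rails involved. The helper name count_retained_strings and the sample rows are assumptions made for illustration; absolute numbers vary between Ruby versions and implementations, so the result is best read as a rough delta rather than an exact count.

```ruby
# Hypothetical helper (not part of Rails or Ruby): reports roughly how many
# String slots are still occupied after the block has run.
def count_retained_strings
  GC.start
  before = ObjectSpace.count_objects[:T_STRING]
  retained = yield            # keep a reference so the objects stay alive
  GC.start
  after = ObjectSpace.count_objects[:T_STRING]
  [after - before, retained]
end

# Stand-in for a batch of 1000 loaded records with one string value each.
delta, rows = count_retained_strings do
  Array.new(1000) { |i| { "id" => i, "name" => "row-#{i}" } }
end

puts "rows: #{rows.size}, extra strings: about #{delta}"

# equal? (or comparing object_id) tells two equal strings apart as objects.
a = "id".dup
b = "id".dup
puts a == b        # true: same content
puts a.equal?(b)   # false: different objects
```

The same before/after pattern can be wrapped around a real query to compare allocation counts before and after a change.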
Basically every Rails request allocates a zillion objects. I'm sure there's tons of work that could be done here.
On Tuesday, September 11, 2012 5:22:10 PM UTC-4, Masterleep wrote:
> Maybe something like a StringPool

That's a big one, and it would be something that needs to be addressed in Ruby, not in Rails. But the problem is that you would have unintuitive behavior for those used to doing things like:

    s = 'Error'
    s.chomp!('or')

In today's Ruby and jruby-1.7.0.preview2:

    $ irb
    jruby-1.7.0.preview2 :001 > "Error".object_id
     => 2042
    jruby-1.7.0.preview2 :002 > "Error".object_id
     => 2044
    jruby-1.7.0.preview2 :003 > "Error".chomp!('or').object_id
     => 2046
    jruby-1.7.0.preview2 :004 > s = "Error"
     => "Error"
    jruby-1.7.0.preview2 :005 > s.object_id
     => 2048
    jruby-1.7.0.preview2 :006 > s.chomp!('or')
     => "Err"
    jruby-1.7.0.preview2 :007 > s.object_id
     => 2048

See, when you are just working with strings willy-nilly, it creates new instances and you don't have to worry about things like the "bang" methods altering the same object. In a StringPool'd Ruby, the bang methods would need to return a string with the same object_id so that past implementations that depend on object equivalence would still work, but they could not alter the "Error" string in the StringPool or things would go terribly wrong.

Feel free to take this up on the Ruby list, and post back the link. I'm sure that those guys could figure out a way to make it work if they've not already discussed it, but my guess is it would be a breaking major change, even if it is necessary to reduce the number of objects and make things faster.
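To make the concern in the message above concrete, here is a small sketch in plain Ruby. It uses String#dup to get mutable copies regardless of any frozen-string-literal setting, and the variable names are illustrative only.

```ruby
# Two equal strings are still two separate objects.
a = "Error".dup
b = "Error".dup
puts a == b        # true
puts a.equal?(b)   # false

# A bang method changes the receiver in place, so its identity is preserved
# and the other copy is unaffected.
id_before = a.object_id
a.chomp!("or")
puts a                          # Err
puts a.object_id == id_before   # true
puts b                          # Error

# If several names pointed at one shared (pooled) object, mutation would have
# to be forbidden, which is what freezing does.
pooled = "Error".freeze
begin
  pooled.chomp!("or")
  mutated = true
rescue RuntimeError   # FrozenError on newer Rubies is a RuntimeError subclass
  mutated = false
end
puts mutated       # false
```

This is why sharing works without surprises for frozen strings such as hash keys, while sharing mutable strings would change what the bang methods are allowed to do.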
Something Ruby-ish that would work instead of a StringPool is the use of symbols. Symbols are Ruby's answer to the StringPool. If things are stored as symbols, you can work with them much as you would expect and reduce the number of objects, e.g.

    jruby-1.7.0.preview2 :008 > :error.object_id
     => 2050
    jruby-1.7.0.preview2 :009 > :error.object_id
     => 2050
    jruby-1.7.0.preview2 :010 > :error.to_s.chomp!('or').to_sym
     => :err
    jruby-1.7.0.preview2 :011 > :error.to_s.chomp!('or').to_sym.object_id
     => 2052
    jruby-1.7.0.preview2 :012 > :error.to_s.chomp!('or').to_sym.object_id
     => 2052

So basically, a good goal would be for everywhere in the Rails documentation that refers to strings to specify constants instead, and to add support wherever a method doesn't accept them:
http://guides.rubyonrails.org

But still, whenever you output one to a log, it becomes a string. So you might be able to make some inroads with changes to Rails and related documentation, but if Ruby "fixed it" instead via something like StringPool (again, a major and breaking change), then you wouldn't have to worry about spending all that time on the Rails side.

In addition, many text editors and IDEs use a different color for strings, so that keys and values stand out better in examples like:

    class Employee < ActiveRecord::Base
      has_many :subordinates, :class_name => "Employee",
                              :foreign_key => "manager_id"
      belongs_to :manager, :class_name => "Employee"
    end

So, if you switch to all symbols, it is a little more monotone, colorwise. However, if you switch to the Ruby 1.9 key/value syntax, the key in a: :b could be colored differently because it ends in a colon rather than starting with one. Unfortunately, the existing default color schemes don't usually do that.
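The symbol identity shown in the irb session above can also be checked directly with equal?, which avoids depending on particular object_id values. This is only a sketch: the non-bang chomp is a deliberate substitution for the chomp! used in the session, so that the string returned by to_s is never mutated.

```ruby
# Every reference to the same symbol yields the same object.
puts :error.equal?(:error)            # true
puts "error".to_sym.equal?(:error)    # true

# Strings with equal content are distinct objects unless explicitly shared.
puts "error".dup.equal?("error".dup)  # false

# Round trip: symbol -> string -> modified string -> symbol.
shortened = :error.to_s.chomp("or").to_sym
puts shortened.inspect                # :err
puts shortened.equal?(:err)           # true
```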
Richard Schneeman
2012-Sep-12 15:06 UTC
Re: Re: Excessive redundant object allocation in AR
Symbols are never garbage collected in Ruby.

http://stackoverflow.com/questions/659755/ruby-symbols-are-not-garbage-collected-then-isnt-it-better-to-use-a-string

-- 
Richard Schneeman
http://heroku.com
@schneems (http://twitter.com/schneems)

On Wednesday, September 12, 2012 at 11:00 AM, Gary Weaver wrote:
> Something that would work instead of a StringPool that is Ruby-ish is use of symbols. Symbols are Ruby's answer to the StringPool. [...]
On Wednesday, September 12, 2012 11:00:08 AM UTC-4, Gary Weaver wrote:
> Symbols are Ruby's answer to the StringPool.

By the way, I shouldn't have said it like that. That makes it sound like symbols were invented as a reaction to Java's StringPool. I just said this because the way you can continuously refer to a symbol and get the same object_id is similar to referring to a string that has been stored in and retrieved from the StringPool in Java.
On Wednesday, September 12, 2012 11:06:43 AM UTC-4, richard schneeman wrote:
> Symbols are never garbage collected in Ruby.

Good point. However, for Rails, I'd think you'd still use less memory if symbols were just used for class, controller, model, and field names in views, etc. Even if Rails had to do handfuls (or hundreds) of symbol -> string -> modified string -> to_sym conversions during startup, the memory consumption would very likely be less than not doing it. You wouldn't want to store every value retrieved from a database as a symbol, obviously, nor store all values in incoming request params as symbols, and if things in Rails are running regexps on something, it wouldn't make sense to constantly be calling to_s (in one way or another) to operate on them.

There is a balance between needing to garbage collect and needing to keep too many objects from being instantiated, even if they are GC'd. But you are right: the Java StringPool would GC something that was no longer referenced, I believe, and if symbols are used for large varying strings, that's a memory leak, but that's not what I'm talking about.
On Wednesday, September 12, 2012 11:24:26 AM UTC-4, Gary Weaver wrote:
> if symbols are used for large varying strings, that's a memory leak, but
> that's not what I'm talking about.

Sorry, I meant if a large number of varying strings were symbols, that could be like a memory leak because of the lack of GC. (Just reword what I say to make sense. I need some caffeine.)
On Wednesday, September 12, 2012 7:33:50 AM UTC-7, Gary Weaver wrote:
> On Tuesday, September 11, 2012 5:22:10 PM UTC-4, Masterleep wrote:
>> Maybe something like a StringPool
>
> That's a big one, and it would be something that needs to be addressed in
> Ruby, not in Rails. But the problem is that you would have unintuitive
> behavior for those used to doing things like:

It could be implemented in Rails by using a container class to hold the database field names that are used as the keys inside the AR @attributes hash, and reusing the same string object across instances. Those strings are frozen anyway, so the concern about modification doesn't apply. Based on the ObjectSpace data, that one change would have a large impact on the number of allocated subobjects for each AR model instance.
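A rough sketch of what such a container could look like. FieldNamePool and build_attributes are hypothetical names invented for this illustration and are not ActiveRecord classes or methods; the point is only that one frozen string per field name can be handed out and reused as a hash key.

```ruby
# Hypothetical container (not an ActiveRecord class): hands out one frozen
# String object per distinct field name.
class FieldNamePool
  def initialize
    @names = {}
  end

  # Returns the shared frozen copy of +name+, creating it on first use.
  def fetch(name)
    @names.fetch(name) do
      shared = name.dup.freeze
      @names[shared] = shared
    end
  end
end

# Stand-in for building an attributes hash from one raw database row.
def build_attributes(pool, columns, values)
  attributes = {}
  columns.each_with_index do |column, index|
    attributes[pool.fetch(column)] = values[index]
  end
  attributes
end

pool   = FieldNamePool.new
first  = build_attributes(pool, ["id".dup, "name".dup], [1, "Ann"])
second = build_attributes(pool, ["id".dup, "name".dup], [2, "Bob"])

# Frozen keys are stored as-is, so both hashes share the same key objects.
puts first.keys[0].equal?(second.keys[0])   # true
puts first.keys[1].equal?(second.keys[1])   # true
puts first.keys[0].frozen?                  # true
```

Because the pooled strings are frozen, nothing that receives a key can modify it, which is why the modification concern raised earlier in the thread does not arise here.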
On 12/09/12 17:29, Masterleep wrote:
> It could be implemented in Rails by using a container class to hold the
> database field names that are used as the keys inside the AR @attributes
> hash, and reusing the same string object across instances. Those strings
> are frozen anyway so the concern about modification doesn't apply.
> Based on the ObjectSpace data, that one change would have a large
> impact on the number of allocated subobjects for each AR model instance.

To be honest I think we should just change @attributes to be keyed by symbols. I don't see that there is a DoS vector in doing this since the keys aren't going to come from user input (however, I do need to think about that a bit more before I say so confidently).

I changed @attributes_cache to be keyed by symbols recently, which led to a nice speed-up in attribute access (before then we were creating a new string every time you called an attribute method).

It should be noted that these things could theoretically be optimised at the implementation level. I did some benchmarking a while back and there was no difference between using symbols and strings in @attributes on JRuby. However, on a practical level, I think we should change it.

I'm interested to hear what Mr T. Love thinks.

-- 
http://jonathanleighton.com/
On Wednesday, September 12, 2012 9:36:29 AM UTC-7, Jon Leighton wrote:
> On 12/09/12 17:29, Masterleep wrote:
> > It could be implemented in Rails by using a container class to hold the
> > database field names that are used as the keys inside the AR @attributes
> > hash, and reusing the same string object across instances.
>
> To be honest I think we should just change @attributes to be keyed by
> symbols. I don't see that there is a DoS vector in doing this since the
> keys aren't going to come from user input (however, I do need to think
> about that a bit more before I say so confidently).

That would be even better, if it's not too hard to change to symbols.
I added https://github.com/rails/rails/issues/7629 on this subject.
Adding issues does not help, and only creates noise on the tracker.

Specifics about WHAT is causing over-allocation or HOW to fix it may be valid. But an open issue for 'tons o' objects' helps nobody and is not productive. It is far too general.
The bug report specifies what is causing the over-allocation and how to fix it. It's pretty specific.

On Thursday, September 13, 2012 12:39:23 PM UTC-7, Steve Klabnik wrote:
> Adding issues does not help, and only creates noise on the tracker.
>
> Specifics about WHAT is causing over-allocation or HOW to fix it may be
> valid. But an open issue for 'tons o' objects' helps nobody and is not
> productive. It is far too general.
On Thu, Sep 13, 2012 at 12:43 PM, Masterleep <billspam@lipa.name> wrote:
> The bug report specifies what is causing the over-allocation and how to fix
> it. It's pretty specific.

It's pretty specific in terms of what the problem is, but it's not at all descriptive of what the actual cause is and how to fix it.

In this case, the cause is that ActiveRecord is using unfrozen strings as keys. When you use an unfrozen string as a hash key, Ruby dups it, freezes the dup, and uses the frozen dup as the hash key. The simple fix to reduce the number of allocated strings from columns * (rows + 1) to just columns is to freeze the columns before using them as hash keys.

Pull request filed: https://github.com/rails/rails/pull/7631

Jeremy
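The behavior described in the message above can be observed in plain Ruby. The sketch below is an illustration only, not the code from the pull request; the column names and row count are made up. Some Ruby versions may share the frozen copies internally, so the checks compare object identity against the original strings, which should hold either way.

```ruby
# An unfrozen string used as a hash key is copied, and the copy is frozen.
column = "name".dup
attributes = {}
attributes[column] = "Ann"
stored_key = attributes.keys.first
puts stored_key.equal?(column)   # false: the hash holds its own copy
puts stored_key.frozen?          # true
puts column.frozen?              # false: the original is left alone

# A frozen string is stored as-is, so freezing the column names once lets
# every row reuse the same key objects.
columns = %w[id name email].map { |c| c.dup.freeze }
rows = Array.new(1000) do |i|
  columns.each_with_object({}) { |c, attrs| attrs[c] = i }
end

shared = rows.all? do |attrs|
  attrs.keys.zip(columns).all? { |key, col| key.equal?(col) }
end
puts shared                      # true
```

With 3 columns and 1000 rows, the frozen variant needs only the 3 key strings in total, which is the "just columns" figure from the message.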
On Thursday, September 13, 2012 2:16:51 PM UTC-7, Jeremy Evans wrote:
> Pull request filed: https://github.com/rails/rails/pull/7631

Excellent! I verified that your fix did eliminate the redundancy on these field name strings in the case I was studying (from 15 extra strings per instance down to 2, where the 2 were attribute values, not field names).
On Thursday, September 13, 2012 5:38:57 PM UTC-4, Masterleep wrote:
> Excellent! I verified that your fix did eliminate the redundancy on these
> field name strings in the case I was studying (from 15 extra strings per
> instance down to 2, where the 2 were attribute values, not field names).

Attribute values would be a good case for the StringPool, I guess, though I still think that would be something that should be introduced in Ruby, not Rails. Because a string's bang methods alter the object itself, a lot of existing user code assumes object_id equivalence between a string and the object returned by one of its bang methods, so it would be a major change.

I know you wanted to focus on AR, but if you only focused on AR attribute values and just had a StringPool for them, then AR attribute values would be object-equivalent and have the same bang-method weirdness while other strings wouldn't act that way, and this would be much more evil than doing it in Ruby.
On Friday, September 14, 2012 9:03:32 AM UTC-4, Gary Weaver wrote:
> Attribute values would be a good case for the StringPool I guess, even
> though I still think that would be something that should be introduced in
> Ruby, not Rails [...]

Take a look at these:

Bartosz Dziewoński wrote in post #1077524:
> http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
> http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values

That probably doesn't help, because the Ruby optimization happens for strings that are 23 chars or more, and I guess that most attribute names are shorter (and many attribute values may be also).