Jason Wright
2011-Jan-22 00:52 UTC
[Puppet Users] high 500 error rate on file metadata operations
Howdy,

Lately our puppetmasters have been returning 500 errors at a reasonably high rate. On any given day, 200-300 (out of many thousands) of our clients report that they've received a 500 during a run. So far, the only firm reports I've received have been regarding our Ubuntu clients, for which we happen to have central logging of puppet errors. I've done some analysis of the logs that I have and found that the majority of the 500s (~83%) received by our Ubuntu clients are on file metadata operations:

//shell/File[/etc/csh.cshrc]: Failed to retrieve current state of resource: Error 500 on SERVER: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>500 Internal Server Error</title> [etc...]

The selection of files appears to be random, and retrying the puppet run generally succeeds. Our puppetmasters are behind load-balanced VIPs and we're running 0.25.5 on Ubuntu Lucid with the stock Lucid Apache2/Ruby/Passenger stack.

I've gone through the bug tracker, and while I see some issues that are close, they don't match our specific symptoms (most of the bugs I saw had clients getting 400s, not 500s, or affect sites already on 2.6.x). I'd really like to say that this started when we upgraded the puppetmasters to Lucid, but we weren't tracking the Apache error rates at the time of the upgrades, so all I have to go on is the bug report, which was filed weeks after the Lucid upgrades were complete.

To answer some questions I've seen while searching previous puppet-users threads about metadata fetch problems:

- We do use environments, and every client is in a non-default environment. That is to say, we do not have an environment named 'production' at all.
- We do use an external node classifier, but I don't think this is issue #3910. For one thing, the clients are receiving error code 500, not 400 as Nigel reported in #3910. I can in fact get a 400 just by passing '--environment production', as reported in #3910, and the resulting stack trace is different from the ones I list below.
- We have a cronjob that runs on the puppetmasters every 10 minutes to update the manifests from version control; it does a checkout to a temporary location and then rsyncs over to the live location. It is possible that files are being updated during some of the 500 errors, but our OS teams have well-defined release schedules (releases every 2-4 weeks) and the 500s continue between releases.

I enabled debug logging and tracing on a few of our puppetmasters so I could get the stack traces logged. I'm getting the stack traces in the Apache error log, but they arrive without timestamps and without the actual exception detail message. Unfortunately, we're also getting a lot of exceptions right now due to an unrelated problem with our Mac clients, so the logs are very noisy and it's difficult to tell what is what. And since there are no timestamps[1], it's hard to correlate the stack traces with specific logs from puppetmasterd.
That said, here are some candidate stack traces, which are hopefully complete.

Related to locking:

/usr/lib/ruby/1.8/puppet/external/lock.rb:36:in `flush'
/usr/lib/ruby/1.8/puppet/external/lock.rb:36:in `lock_exclusive'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:38:in `writelock'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:37:in `open'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:37:in `writelock'
/usr/lib/ruby/1.8/sync.rb:230:in `synchronize'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:36:in `writelock'
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:40:in `save'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:262:in `save'
/usr/lib/ruby/1.8/puppet/node/facts.rb:15:in `save'
/usr/lib/ruby/1.8/puppet/indirector.rb:65:in `save'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:26:in `extract_facts_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:31:in `find'

Two different paths to a "fileserver module not mounted" exception:

/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:133:in `list'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:68:in `describe'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

And there are a few of these:

/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

/usr/lib/ruby/1.8/puppet/network/handler/master.rb:65:in `getconfig'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31

Jason

[1] I'm not sure why there are no timestamps on the logged stack traces. Other logs that come directly from Apache (such as the server restart message) have timestamps.
Daniel Pittman
2011-Jan-23 21:59 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Jan 22, 2011 8:46 AM, "Jason Wright" <jwright@google.com> wrote:
> Lately our puppetmasters have been returning 500 errors at a
> reasonably high rate. On any given day, 200-300 (out of many
> thousands) of our clients are reporting that they've received a 500
> during a run. So far, the only firm reports I've received have been
> regarding our Ubuntu clients, for which we happen to have central
> logging of puppet errors. I've done some analysis of the logs that I
> have and found that the majority of the 500s (~83%) received by our
> Ubuntu clients are on file metadata operations:

I would love to know what the details of your configuration are, including web server version (Apache, Passenger, etc), and what timeouts you have configured at that level.

I would also be interested to know what the concurrency levels you are seeing are like on the server, and ideally what sort of I/O you have - is this a heavily contended or slow disk, and how fast are writes generally completing?

Regards, Daniel
Jason Wright
2011-Jan-24 21:20 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Sun, Jan 23, 2011 at 1:59 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> I would love to know what the details of your configuration are, including
> web server version (Apache, Passenger, etc), and what timeouts you have
> configured at that level.

Let's see here...

ii  puppet              0.25.5-1gg11    centralised configuration management for networks
ii  puppet-common       0.25.5-1gg11    common files for puppet and puppetmaster
ii  puppetmaster        0.25.5-1gg11    centralised configuration management control daemon

(Our puppet .deb is built from the Debian 0.25.5 git tag. We have one patch which removes the distribution config.ru from the package.)

These are all stock Ubuntu Lucid:

ii  apache2-mpm-worker        2.2.14-5ubuntu8.4   Apache HTTP Server - high speed threaded model
ii  apache2-utils             2.2.14-5ubuntu8.4   utility programs for webservers
ii  apache2.2-bin             2.2.14-5ubuntu8.4   Apache HTTP Server common binary files
ii  apache2.2-common          2.2.14-5ubuntu8.4   Apache HTTP Server common files
ii  libapache2-mod-passenger  2.2.7debian-1       Rails and Rack support for Apache2
ii  ruby1.8                   1.8.7.249-2         Interpreter of object-oriented scripting language Ruby 1
ii  librack-ruby1.8           1.1.0-3             A modular Ruby webserver interface (Ruby 1.8)

Passenger config, which we've been using for a while and I keep forgetting to try to tune:

<IfModule mod_passenger.c>
  PassengerMaxPoolSize 8
  PassengerPoolIdleTime 300
  PassengerStatThrottleRate 600
  PassengerUseGlobalQueue on
  PassengerHighPerformance on
</IfModule>

> I would also be interested to know what the concurrency levels you are
> seeing are like on the server, and ideally what sort of I/O you have - is
> this a heavily contended or slow disk, and how fast are writes generally
> completing?

Our config servers are all implemented as Xen virtual machines with 4 vCPUs and 4GB of RAM each. They are organized into VIPs with 2-4 backends per VIP. The VMs are dedicated to puppet but tend to share physical hardware with VMs running other services. The occasional spike aside, the typical config server VM under peak load exhibits:

- 15min load < 2.0
- CPU utilization < 20%
- No swapping
- < 15KB/s average disk throughput
- Write latency of 100-200ms or less
- 20-45 RESTful API calls per second (sorry, I don't have the detailed breakdown of which API calls)
- < 8 catalog requests per minute across all environments

The service doesn't seem very loaded to me, especially compared to the Bad Old Days, when our config server VMs ran with 15min load averages of 12-14 on the same hardware specs.

For the record, I can't actually upgrade to 2.6.x right now. My natural inclination would have been to do so before starting this thread in the first place, but we have a defined type which uses "stage" as a parameter. This conflicts hilariously with run stages, and I have to wait for an OS release with the fix to work its way through the pipeline before I can resume work on a 2.6.x upgrade.

Jason
Daniel Pittman
2011-Jan-24 22:30 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Mon, Jan 24, 2011 at 13:20, Jason Wright <jwright@google.com> wrote:
> On Sun, Jan 23, 2011 at 1:59 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
>> I would love to know what the details of your configuration are, including
>> web server version (Apache, Passenger, etc), and what timeouts you have
>> configured at that level.

[...]

Thanks for sharing those details. As I expected, none of that looks like it should have any relationship with the failure, and helps me eliminate some "is it slow I/O" from my early mental space to explore to understand the problem.

> For the record, I can't actually upgrade to 2.6.x right now. My
> natural inclination would have been to have done so prior to starting
> this thread up in the first place but we have a defined type which
> uses "stage" as a parameter. This conflicts hilariously with run
> stages and I have to wait for an OS release with the fix to work its
> way through the pipeline before I can resume work on a 2.6.x upgrade.

By hilariously, how terrible is it? "Fails in nasty but obvious ways" would be OK; "sets fire to your cat" would be pretty undesirable, and would encourage me to think about how to better handle that in future.

Regards, Daniel
Jason Wright
2011-Jan-24 22:39 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Mon, Jan 24, 2011 at 2:30 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> Thanks for sharing those details. As I expected, none of that looks
> like it should have any relationship with the failure, and helps me
> eliminate some "is it slow I/O" from my early mental space to explore
> to understand the problem.

No worries.

> By hilariously, how terrible is it? "Fails in nasty but obvious ways"
> would be OK; "sets fire to your cat" would be pretty undesirable, and
> would encourage me to think about how to better handle that in future.

If I upgrade a puppetmaster to 2.6.3, catalog compilation fails for all environments using this particular defined type, with an error to the effect of "you're doing run stages wrong". I can't remember the exact message, sorry.

Jason
Daniel Pittman
2011-Jan-24 22:42 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Mon, Jan 24, 2011 at 14:39, Jason Wright <jwright@google.com> wrote:
> On Mon, Jan 24, 2011 at 2:30 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:

[...]

>> By hilariously, how terrible is it? "Fails in nasty but obvious ways"
>> would be OK; "sets fire to your cat" would be pretty undesirable, and
>> would encourage me to think about how to better handle that in future.
>
> If I upgrade a puppetmaster to 2.6.3, catalog compilation fails for
> all environments using this particular defined type with an error to
> the effect of "you're doing run stages wrong". I can't remember the
> exact message, sorry.

That qualifies into the first class of problems. I don't like it, but it isn't miscompiling the catalog and sending something entirely broken to the client without warning, so that isn't so dire as it could be.

Thanks, Daniel
Jason Wright
2011-Jan-24 23:09 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Mon, Jan 24, 2011 at 2:42 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> That qualifies into the first class of problems. I don't like it, but
> it isn't miscompiling the catalog and sending something entirely
> broken to the client without warning, so that isn't so dire as it
> could be.

Puppet is doing the right thing. I was just giving some background on why we weren't on 2.6.x yet.

Jason
Daniel Pittman
2011-Jan-25 05:24 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Mon, Jan 24, 2011 at 14:30, Daniel Pittman <daniel@puppetlabs.com> wrote:
> On Mon, Jan 24, 2011 at 13:20, Jason Wright <jwright@google.com> wrote:
>> On Sun, Jan 23, 2011 at 1:59 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
>
>>> I would love to know what the details of your configuration are, including
>>> web server version (Apache, Passenger, etc), and what timeouts you have
>>> configured at that level.
>
> [...]
>
> Thanks for sharing those details. As I expected, none of that looks
> like it should have any relationship with the failure, and helps me
> eliminate some "is it slow I/O" from my early mental space to explore
> to understand the problem.

So, I was looking further at this to try and narrow down the places where the failure originates, and I am vaguely suspicious of the disk subsystem still, at least for the trace involving 'flush' / locking. Do any of the exceptions you find elsewhere look like they are related to I/O failures?

For the other two exceptions, do you have 'ArgumentError' "Could not find hostname" raised anywhere, or FileServerError, "Fileserver module %s is not mounted"? They also, ultimately, lead down to a place where I/O subsystem errors could cause a false failure, and it would be interesting to know if either of those two were thrown.

Right now my best guess would be that you are seeing some transient disk errors on the system, which are causing File#flush (which calls into the kernel to write data, but isn't fsync), and FileTest.exist?, to return errors that are not otherwise handled. Working out if those two errors are thrown or not would be a big help in narrowing that down again.

Regards, Daniel
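[A minimal Ruby illustration of the flush/fsync distinction Daniel draws, and of why a transient I/O error at flush time surfaces as a raised exception rather than a logged warning. The path and data are made up; this is not Puppet code.]

# Illustration only: made-up path, not Puppet code.
File.open("/tmp/example-facts.yaml", "w") do |f|
  f.write("--- {}\n")
  f.flush   # hands Ruby's userspace buffer to the kernel via write(2);
            # a transient failure raises here (Errno::EIO, Errno::ENOSPC, ...)
  # f.fsync # would additionally ask the kernel to commit the data to disk;
            # flush alone gives no such durability guarantee
end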
Jason Wright
2011-Jan-25 17:47 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Mon, Jan 24, 2011 at 9:24 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> For the other two exceptions, do you have 'ArgumentError' "Could not
> find hostname" raised anywhere, or FileServerError, "Fileserver module
> %s is not mounted"? They also, ultimately, lead down to a place where
> I/O subsystem errors could cause a false failure, and it would be
> interesting to know if either of those two were thrown.

These two lead to a fileserver module not mounted exception:

/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:133:in `list'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:68:in `describe'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

Jason
Brice Figureau
2011-Jan-25 18:25 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On 25/01/11 18:47, Jason Wright wrote:
> On Mon, Jan 24, 2011 at 9:24 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
>> For the other two exceptions, do you have 'ArgumentError' "Could not
>> find hostname" raised anywhere, or FileServerError, "Fileserver module
>> %s is not mounted"? They also, ultimately, lead down to a place where
>> I/O subsystem errors could cause a false failure, and it would be
>> interesting to know if either of those two were thrown.
>
> These two lead to a fileserver module not mounted exception:
>
> /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
> /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
> /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:133:in `list'
> /usr/lib/ruby/1.8/rubygems/custom_require.rb:31
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
> /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
> /usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'
>
> /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
> /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
> /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:68:in `describe'
> /usr/lib/ruby/1.8/rubygems/custom_require.rb:31
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
> /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
> /usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
> /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
> /usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

xmlrpc?
Do you still have 0.24.x clients?

You omitted one important piece of information which is the kind of exception along with the error message. Can you post it so that we can understand what happens?
Nigel Kersten
2011-Jan-25 18:27 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Tue, Jan 25, 2011 at 10:25 AM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> On 25/01/11 18:47, Jason Wright wrote:
>> On Mon, Jan 24, 2011 at 9:24 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
>>> For the other two exceptions, do you have 'ArgumentError' "Could not
>>> find hostname" raised anywhere, or FileServerError, "Fileserver module
>>> %s is not mounted"?
>>
>> These two lead to a fileserver module not mounted exception:
>>
>> [...]
>
> xmlrpc?
> Do you still have 0.24.x clients?
>
> You omitted one important piece of information which is the kind of
> exception along with the error message. Can you post it so that we can
> understand what happens?

Brice, I'm pretty sure we still had some XMLRPC left in 0.25.x, I don't believe we completely got rid of it until 2.6.x
Brice Figureau
2011-Jan-25 18:48 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On 25/01/11 19:27, Nigel Kersten wrote:
> On Tue, Jan 25, 2011 at 10:25 AM, Brice Figureau
> <brice-puppet@daysofwonder.com> wrote:
>> On 25/01/11 18:47, Jason Wright wrote:
>>> These two lead to a fileserver module not mounted exception:
>>>
>>> [...]
>>
>> xmlrpc?
>> Do you still have 0.24.x clients?
>>
>> You omitted one important piece of information which is the kind of
>> exception along with the error message. Can you post it so that we can
>> understand what happens?
>
> Brice, I'm pretty sure we still had some XMLRPC left in 0.25.x, I
> don't believe we completely got rid of it until 2.6.x

Oh, I'm well aware of this. I was asking about 0.24.x clients because I thought the only handler not ported to REST was the filebucket; I was pretty sure file_metadata and file_content were handled through full-blown REST.

BTW, I really think this is a thread race, as the first trace reminds me of something I reported (and we fixed) for 2.6. Looking at the 0.25.5 code of the xmlrpc fileserver handler, when mounting it tries to find the current node, which might trigger a call to the ENC, if I'm not mistaken. If this operation takes a long time, it is quite possible that another thread triggers the same codepath. This same codepath also uses the environment cached_attr "module"; I also discovered a thread race in this code that was fixed in 2.6.

Would it be possible to run passenger in a mode where it won't spawn more than one thread and see if the problem disappears?
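[The check-then-set race Brice describes is easy to picture in isolation. The sketch below is generic Ruby, not the actual cached_attr implementation: it shows why two threads entering an unsynchronized memoizing accessor at the same time can both miss the cache or see a value mid-update, and how a mutex closes that window.]

require 'thread'

# Generic illustration of a memoizing cache; not Puppet's cached_attr code.
class Cache
  def initialize
    @values = {}
    @lock   = Mutex.new
  end

  # Racy: between the include? check and the store, a second thread can run
  # the same expensive block, or read @values[name] while it is mid-update.
  def fetch_unsafe(name)
    @values[name] = yield unless @values.include?(name)
    @values[name]
  end

  # One way to close the window: serialize the check-then-set.
  def fetch(name)
    @lock.synchronize do
      @values[name] = yield unless @values.include?(name)
      @values[name]
    end
  end
end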
Jason Wright
2011-Jan-25 19:10 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Tue, Jan 25, 2011 at 10:48 AM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
>>> xmlrpc?
>>> Do you still have 0.24.x clients?

No. We're 0.25.5 across the board.

>>> You omitted one important piece of information which is the kind of
>>> exception along with the error message. Can you post it so that we can
>>> understand what happens?

No, I can't. As I originally stated, the actual exception isn't making it into the apache error log and since the stack traces aren't timestamped, I can't correlate to the puppetmasterd logs. I'd love to understand why this is so I can provide better information to you. I've received some passenger pointers from a coworker and am going to play with the logging options to see if I can affect this.

> Would it be possible to run passenger in a mode where it won't spawn
> more than one thread and see if the problem disappears?

You mean setting PassengerMaxPoolSize to 1? I can try that on one of the production VMs but I'll have to revert it immediately if it causes any other problems.

Jason
Brice Figureau
2011-Jan-25 19:26 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On 25/01/11 20:10, Jason Wright wrote:
> On Tue, Jan 25, 2011 at 10:48 AM, Brice Figureau
> <brice-puppet@daysofwonder.com> wrote:
>>>> xmlrpc?
>>>> Do you still have 0.24.x clients?
>
> No. We're 0.25.5 across the board.
>
>>>> You omitted one important piece of information which is the kind of
>>>> exception along with the error message. Can you post it so that we can
>>>> understand what happens?
>
> No, I can't. As I originally stated, the actual exception isn't
> making it into the apache error log and since the stack traces aren't
> timestamped, I can't correlate to the puppetmasterd logs. I'd love to
> understand why this is so I can provide better information to you.
> I've received some passenger pointers from a coworker and am going to
> play with the logging options to see if I can affect this.

OK, I missed this fact in the discussion, sorry.

>> Would it be possible to run passenger in a mode where it won't spawn
>> more than one thread and see if the problem disappears?
>
> You mean setting PassengerMaxPoolSize to 1? I can try that on one of
> the production VMs but I'll have to revert it immediately if it causes
> any other problems.

I really don't know anything about Passenger, but reading the documentation I don't think that's the correct setting. I couldn't find in the documentation how passenger or rack use ruby threads, so I'm unsure about what to do (one solution would be to add some sync on the puppet side).
Brice Figureau
2011-Jan-26 09:38 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Tue, 2011-01-25 at 20:26 +0100, Brice Figureau wrote:
> On 25/01/11 20:10, Jason Wright wrote:
>> On Tue, Jan 25, 2011 at 10:48 AM, Brice Figureau
>> <brice-puppet@daysofwonder.com> wrote:
> [...]
>>> Would it be possible to run passenger in a mode where it won't spawn
>>> more than one thread and see if the problem disappears?
>>
>> You mean setting PassengerMaxPoolSize to 1? I can try that on one of
>> the production VMs but I'll have to revert it immediately if it causes
>> any other problems.
>
> I really don't know anything about Passenger, but reading the
> documentation I don't think that's the correct setting.
> I couldn't find in the documentation how passenger or rack use ruby
> threads, so I'm unsure about what to do (one solution would be to add
> some sync on the puppet side).

My analysis might be completely wrong, because reading the passenger documentation, it looks like there should be only one thread entering a given master process at any given time.

Still there can be some tricky multi-process issues, like the one we fixed in 2.6:
http://projects.puppetlabs.com/issues/4923
Jason Wright
2011-Jan-26 19:42 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Wed, Jan 26, 2011 at 1:38 AM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> Still there can be some tricky multi-process issues, like the one we
> fixed in 2.6:
> http://projects.puppetlabs.com/issues/4923

My bugfix for the manifest issue that's blocking me from upgrading to 2.6.x has progressed far enough through our internal Ubuntu release process that I can resume testing for the upgrade, so that's good news. I don't think I'll be able to put production traffic onto a 2.6.x server until at least next week, and that's assuming that there are no showstoppers in the Ubuntu release.

Jason
Daniel Pittman
2011-Jan-26 21:17 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Wed, Jan 26, 2011 at 11:42, Jason Wright <jwright@google.com> wrote:
> On Wed, Jan 26, 2011 at 1:38 AM, Brice Figureau
> <brice-puppet@daysofwonder.com> wrote:
>> Still there can be some tricky multi-process issues, like the one we
>> fixed in 2.6:
>> http://projects.puppetlabs.com/issues/4923
>
> My bugfix for the manifest issue that's blocking me from upgrading to
> 2.6.x has progressed far enough through our internal Ubuntu release
> process that I can resume testing for the upgrade, so that's good
> news. I don't think I'll be able to put production traffic onto a
> 2.6.x server until at least next week, and that's assuming that there
> are no showstoppers in the Ubuntu release.

For what it is worth I have been looking at this quietly in the background, and come to the conclusion that to progress further I am going to have to either reproduce this myself (failed, so far), or get a bit of state instrumentation into that code to track down exactly what conditions are being hit to trigger the failure. Sadly, this is also being bitten by my need to get our current release out the door, and other company stuff, too.

Sounds like a good next step might be for y'all to let me know when you might look at being able to do that instrumentation, and I can try and send you a satisfactory patch to trial?

Regards, Daniel
Jason Wright
2011-Jan-26 21:56 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Wed, Jan 26, 2011 at 1:17 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
> For what it is worth I have been looking at this quietly in the
> background, and come to the conclusion that to progress further I am
> going to have to either reproduce this myself (failed, so far), or get
> a bit of state instrumentation into that code to track down exactly
> what conditions are being hit to trigger the failure.

I haven't been able to reproduce it either. So far, I've tried annexing a bunch of machines and running puppetd in a tight loop against an otherwise idle puppetmaster VM and I can get the rate of API calls and catalog compiles up to the correct level for one of our busy VMs, but no 500s (or even 400s) so far. If this fails, I have some code which fetches pluginsync metadata and then proceeds to make fileserver calls for every .rb listed. I'll start using that to generate traffic, since these are the sorts of operations which get the most errors.

> Sounds like a good next step might be for y'all to let me know when
> you might look at being able to do that instrumentation, and I can try
> and send you a satisfactory patch to trial?

What instrumentation would you be looking for?

Jason
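[A rough sketch of the kind of traffic generator Jason describes. The REST path, the Accept header, the certificate locations, and the plugin file names are all assumptions about a stock 0.25.x setup, not anything confirmed in this thread.]

#!/usr/bin/ruby
# Hedged sketch of a file_metadata traffic generator. Assumptions (unverified):
# the stock 0.25.x REST path /{environment}/file_metadata/plugins/<path>,
# YAML responses, and an already-signed agent certificate under the default
# ssldir. The master name, environment, and plugin paths are placeholders.
require 'net/https'

MASTER      = 'puppetmaster.example.com'
ENVIRONMENT = 'someenv'
SSLDIR      = '/var/lib/puppet/ssl'
HOSTNAME    = `hostname -f`.strip

# Fill in with .rb files known to live under your modules' plugins tree;
# these example names are made up.
PLUGIN_FILES = %w[
  facter/example_fact.rb
  puppet/parser/functions/example_function.rb
]

http = Net::HTTP.new(MASTER, 8140)
http.use_ssl     = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # good enough for load testing
http.cert = OpenSSL::X509::Certificate.new(File.read("#{SSLDIR}/certs/#{HOSTNAME}.pem"))
http.key  = OpenSSL::PKey::RSA.new(File.read("#{SSLDIR}/private_keys/#{HOSTNAME}.pem"))

loop do
  PLUGIN_FILES.each do |path|
    response = http.get("/#{ENVIRONMENT}/file_metadata/plugins/#{path}",
                        'Accept' => 'yaml')
    # Only report the interesting (non-200) answers.
    puts "#{Time.now} #{response.code} #{path}" if response.code != '200'
  end
end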
Daniel Pittman
2011-Jan-26 22:36 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Wed, Jan 26, 2011 at 13:56, Jason Wright <jwright@google.com> wrote:
> On Wed, Jan 26, 2011 at 1:17 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
>
>> For what it is worth I have been looking at this quietly in the
>> background, and come to the conclusion that to progress further I am
>> going to have to either reproduce this myself (failed, so far), or get
>> a bit of state instrumentation into that code to track down exactly
>> what conditions are being hit to trigger the failure.
>
> I haven't been able to reproduce it either. So far, I've tried
> annexing a bunch of machines and running puppetd in a tight loop
> against an otherwise idle puppetmaster VM and I can get the rate of
> API calls and catalog compiles up to the correct level for one of our
> busy VMs, but no 500s (or even 400s) so far. If this fails, I have
> some code which fetches pluginsync metadata and then proceeds to make
> fileserver calls for every .rb listed. I'll start using that to generate
> traffic, since these are the sorts of operations which get the most
> errors.
>
>> Sounds like a good next step might be for y'all to let me know when
>> you might look at being able to do that instrumentation, and I can try
>> and send you a satisfactory patch to trial?
>
> What instrumentation would you be looking for?

Specifically, around the "not mounted" fault, in the 'splitpath' method, identify what the value of 'mount' in the outer 'unless' is, and what @mounts and mount_name contain. My hope would be to use that to narrow down the possible causes, and either confirm or eliminate a thread race or something.

I doubt that will be the complete data set, but it should help move forward. Annoyingly, I don't have a super-solid picture of what the problem is at this stage, because it looks like it shouldn't be possible to hit the situation but, clearly, it is getting there...

Daniel
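[One possible shape for that instrumentation, written as a wrapper so it could be dropped onto a single canary master without editing the shipped file. It assumes only what the stack traces already show (a splitpath method on the XMLRPC fileserver handler and an @mounts hash); the rescue is deliberately generic rather than guessing Puppet's internal exception class names.]

# Hypothetical instrumentation sketch; not a Puppet Labs patch.
require 'puppet'
require 'puppet/network/handler/fileserver'

class Puppet::Network::Handler::FileServer
  alias_method :splitpath_without_logging, :splitpath

  # Log the arguments and the mount table whenever splitpath blows up,
  # then re-raise so behaviour is otherwise unchanged.
  def splitpath(*args)
    splitpath_without_logging(*args)
  rescue => detail
    mounts = @mounts ? @mounts.keys.sort : nil
    Puppet.info("splitpath#{args.inspect} failed " \
                "(known mounts: #{mounts.inspect}): " \
                "#{detail.class}: #{detail}")
    raise
  end
end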
Brice Figureau
2011-Jan-27 09:31 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Wed, 2011-01-26 at 13:56 -0800, Jason Wright wrote:
> On Wed, Jan 26, 2011 at 1:17 PM, Daniel Pittman <daniel@puppetlabs.com> wrote:
>> For what it is worth I have been looking at this quietly in the
>> background, and come to the conclusion that to progress further I am
>> going to have to either reproduce this myself (failed, so far), or get
>> a bit of state instrumentation into that code to track down exactly
>> what conditions are being hit to trigger the failure.
>
> I haven't been able to reproduce it either. So far, I've tried
> annexing a bunch of machines and running puppetd in a tight loop
> against an otherwise idle puppetmaster VM and I can get the rate of
> API calls and catalog compiles up to the correct level for one of our
> busy VMs, but no 500s (or even 400s) so far. If this fails, I have
> some code which fetches pluginsync metadata and then proceeds to make
> fileserver calls for every .rb listed. I'll start using that to generate
> traffic, since these are the sorts of operations which get the most
> errors.

There's no guarantee you're exercising exactly the same XMLRPC paths as you are seeing in production (especially if it's the filebucket handler). There's a good chance your test environment doesn't change, so the filebucket is almost never used.
Brice Figureau
2011-Jan-27 09:42 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Thu, 2011-01-27 at 10:31 +0100, Brice Figureau wrote:
> On Wed, 2011-01-26 at 13:56 -0800, Jason Wright wrote:
>> I haven't been able to reproduce it either. So far, I've tried
>> annexing a bunch of machines and running puppetd in a tight loop
>> against an otherwise idle puppetmaster VM and I can get the rate of
>> API calls and catalog compiles up to the correct level for one of our
>> busy VMs, but no 500s (or even 400s) so far. If this fails, I have
>> some code which fetches pluginsync metadata and then proceeds to make
>> fileserver calls for every .rb listed. I'll start using that to generate
>> traffic, since these are the sorts of operations which get the most
>> errors.
>
> There's no guarantee you're exercising exactly the same XMLRPC paths as
> you are seeing in production (especially if it's the filebucket handler).
> There's a good chance your test environment doesn't change, so the
> filebucket is almost never used.

From the stacktrace, it is highly improbable that it's the filebucket; it really looks like the XMLRPC fileserving API. But I did a quick check, and to my knowledge pluginsync, factsync and file serving all use the REST API, which doesn't call any of the XMLRPC handlers.

Which makes me think that either you still have 0.24.x clients or I missed some 0.25 client feature that uses XMLRPC file serving.
Brice Figureau
2011-Jan-27 09:49 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Wed, 2011-01-26 at 14:36 -0800, Daniel Pittman wrote:
> On Wed, Jan 26, 2011 at 13:56, Jason Wright <jwright@google.com> wrote:
>> What instrumentation would you be looking for?
>
> Specifically, around the "not mounted" fault, in the 'splitpath'
> method, identify what the value of 'mount' in the outer 'unless' is,
> and what @mounts and mount_name contain. My hope would be to use that
> to narrow down the possible causes, and either confirm or eliminate a
> thread race or something.

There are some thread races in this codepath:

* we currently know that all cached_attrs (and splitpath uses one through the module accessor of the environment) are subject to a thread race in 0.25.
* there is another one when reading fileserver.conf (in readconfig).

But since passenger should normally make sure there is only one thread in a given running puppet process, we should be immune.

> I doubt that will be the complete data set, but it should help move
> forward. Annoyingly, I don't have a super-solid picture of what the
> problem is at this stage, because it looks like it shouldn't be
> possible to hit the situation but, clearly, it is getting there...

Yes, so we're certainly missing something, and instrumenting this codepath will help understand the root cause.
Jason Wright
2011-Jan-27 19:40 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Thu, Jan 27, 2011 at 1:42 AM, Brice Figureau <brice-puppet@daysofwonder.com> wrote:
> Which makes me think that either you still have 0.24.x clients or I
> missed some 0.25 client feature that uses XMLRPC file serving.

All of our OS teams upgraded to 0.25.x clients months ago. I don't even think our manifests compile for 0.24.x clients any more and we do not support clients earlier than 0.25.x any longer. If we do have 0.24.x clients (and there probably are a few running around), they're broken machines. In any case, the nodes which are reporting the 500s which caused our Ubuntu team to open the bug in the first place are definitely on 0.25.5. I don't know that I've reported the correct stack traces, only some of the more common ones.

Jason
Brice Figureau
2011-Jan-27 21:46 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On 27/01/11 20:40, Jason Wright wrote:
> On Thu, Jan 27, 2011 at 1:42 AM, Brice Figureau
> <brice-puppet@daysofwonder.com> wrote:
>> Which makes me think that either you still have 0.24.x clients or I
>> missed some 0.25 client feature that uses XMLRPC file serving.
>
> All of our OS teams upgraded to 0.25.x clients months ago. I don't
> even think our manifests compile for 0.24.x clients any more and we do
> not support clients earlier than 0.25.x any longer. If we do have
> 0.24.x clients (and there probably are a few running around), they're
> broken machines. In any case, the nodes which are reporting the 500s
> which caused our Ubuntu team to open the bug in the first place are
> definitely on 0.25.5. I don't know that I've reported the correct
> stack traces, only some of the more common ones.

I believe you :)

Can you correlate the stacktrace to a given client through the access log? With that, maybe we can find what on the client triggers this. I did a quick grep in the 0.25.5 source code to find what could still use the xmlrpc fileserver, but couldn't find anything. Adding a couple of Puppet.info() calls to puppet/network/handler/fileserver.rb around line 232, to print both the client and the url, would maybe help you find who is triggering this.

Regarding the first stacktrace you posted in your first e-mail, I'm sure this is the writelock multiprocess issue we fixed in 2.6 and that I referred to in a previous e-mail. This is a single (ok maybe 2) line fix that is safe for you to backport to your 0.25.5 master.

HTH,
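[The same aliasing trick from the earlier instrumentation sketch would also work for the logging Brice suggests here. Again this is an illustration, not the 0.25.5 source: it assumes nothing beyond the method name shown in the stack traces, and logs whatever arguments convert actually receives rather than guessing its signature.]

# Hypothetical logging sketch for Brice's suggestion; not a Puppet Labs patch.
require 'puppet'
require 'puppet/network/handler/fileserver'

class Puppet::Network::Handler::FileServer
  alias_method :convert_without_logging, :convert

  # Record every call that reaches the XMLRPC fileserver's convert path,
  # so it can be correlated with the access log entry (and hence the client).
  def convert(*args)
    Puppet.info("xmlrpc fileserver convert called with #{args.inspect}")
    convert_without_logging(*args)
  end
end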
Jason Wright
2011-Jan-27 23:59 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Thu, Jan 27, 2011 at 1:46 PM, Brice Figureau
<brice-puppet@daysofwonder.com> wrote:
> Regarding the first stack trace you posted in your first e-mail, I'm sure
> this is the writelock multiprocess issue we fixed in 2.6 and that I
> referred to in a previous e-mail. This is a single-line (OK, maybe
> two-line) fix that is safe for you to backport to your 0.25.5 master.

You're referring to this patch, correct?

http://projects.puppetlabs.com/projects/puppet/repository/revisions/9ba0c8a22c6f9ca856851ef6c2d38242754a7a00/diff/lib/puppet/util/file_locking.rb

I've got that patched into our package and will try to get it out to our
canary puppetmasters tomorrow.

Jason
--
"His face disappeared. If someone has no face left, you know it's serious."
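The linked commit is the authoritative change; purely as an illustration
of the class of bug, the usual shape of such a fix is to take the
exclusive lock before truncating the file, so a concurrent reader or
writer never sees a half-written YAML cache. A standalone sketch,
independent of Puppet's internals:

    # Illustrative only, not the patch above. Opening with "w" truncates
    # the file before any lock is taken, which is the racy pattern; this
    # version opens without truncating, locks, and only then rewrites
    # the contents.
    def safe_writelock(path)
      File.open(path, File::RDWR | File::CREAT, 0600) do |f|
        f.flock(File::LOCK_EX)   # wait for exclusive access first
        f.rewind
        f.truncate(0)            # now it is safe to clear the old contents
        yield f
        f.flush
      end
    end

    # Example use: atomically rewrite a small cache file.
    safe_writelock("/tmp/example_cache.yaml") do |f|
      f.write("---\nexample: true\n")
    end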
Brice Figureau
2011-Jan-28 08:17 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Thu, 2011-01-27 at 15:59 -0800, Jason Wright wrote:
> On Thu, Jan 27, 2011 at 1:46 PM, Brice Figureau
> <brice-puppet@daysofwonder.com> wrote:
>
>> Regarding the first stack trace you posted in your first e-mail, I'm sure
>> this is the writelock multiprocess issue we fixed in 2.6 and that I
>> referred to in a previous e-mail. This is a single-line (OK, maybe
>> two-line) fix that is safe for you to backport to your 0.25.5 master.
>
> You're referring to this patch, correct?
>
> http://projects.puppetlabs.com/projects/puppet/repository/revisions/9ba0c8a22c6f9ca856851ef6c2d38242754a7a00/diff/lib/puppet/util/file_locking.rb
>
> I've got that patched into our package and will try to get it out to
> our canary puppetmasters tomorrow.

Yes, this is the patch I was talking about. I have great hopes it will
fix the filelock-related stack trace.
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
Jason Wright
2011-Jan-31 20:48 UTC
Re: [Puppet Users] high 500 error rate on file metadata operations
On Fri, Jan 28, 2011 at 12:17 AM, Brice Figureau
<brice-puppet@daysofwonder.com> wrote:
> Yes, this is the patch I was talking about. I have great hopes it will
> fix the filelock-related stack trace.

It's been rolled out to our canary servers. The rest of the fleet will
get it tomorrow, along with the debug patch that Daniel provided me. I
have traces enabled on one of the canaries, but it hasn't logged anything
yet from Daniel's patch.

I'm also taking Daniel's hint and adding puts()-based logging on this one
puppetmaster for every unique exception we're getting, which gives me the
missing details from the raise that aren't being logged.

Jason
--
"His face disappeared. If someone has no face left, you know it's serious."
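A sketch of the kind of ad-hoc exception logging described, wrapping
whichever call site is suspected; do_suspect_operation is a stand-in, not
a real Puppet method, and the output simply reaches the Rack/Apache error
log via puts.

    # Illustrative only: print the exception class, message, and backtrace
    # before re-raising, so a traced stack can be matched with the detail
    # message that is otherwise missing from the logs.
    def do_suspect_operation
      raise "example failure"   # placeholder so the sketch runs on its own
    end

    begin
      do_suspect_operation
    rescue => e
      puts "#{Time.now} #{e.class}: #{e.message}"
      puts e.backtrace.join("\n")
      raise
    end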
Jason Wright
2011-Feb-09 20:18 UTC
[Puppet Users] Re: high 500 error rate on file metadata operations
I rolled out 2.6.3 to four puppetmasters yesterday as part of a canary,
and while it's been less than 24 hours, there are already encouraging
signs. The servers participating in the canary logged just short of 600
HTTP 500 errors yesterday and have logged zero so far today. For
comparison, the remainder of our puppetmaster fleet has logged over 1,000
HTTP 500 errors in just twelve hours today.

Jason
Jason Wright
2011-Feb-11 17:06 UTC
[Puppet Users] Re: high 500 error rate on file metadata operations
I've completed rolling out 2.6.3 and it's completely resolved our issues.

Thanks for the help,
Jason

On Wed, Feb 9, 2011 at 12:18 PM, Jason Wright <jwright@google.com> wrote:
> I rolled out 2.6.3 to four puppetmasters yesterday as part of a canary,
> and while it's been less than 24 hours, there are already encouraging
> signs. The servers participating in the canary logged just short of 600
> HTTP 500 errors yesterday and have logged zero so far today. For
> comparison, the remainder of our puppetmaster fleet has logged over
> 1,000 HTTP 500 errors in just twelve hours today.
>
> Jason

--
"His face disappeared. If someone has no face left, you know it's serious."
Nigel Kersten
2011-Feb-11 18:18 UTC
Re: [Puppet Users] Re: high 500 error rate on file metadata operations
On Fri, Feb 11, 2011 at 9:06 AM, Jason Wright <jwright@google.com> wrote:
> I've completed rolling out 2.6.3 and it's completely resolved our issues.
>
> Thanks for the help,
> Jason

Wahoo! That's awesome to hear, Jason.

It might be worth starting to canary 2.6.5rc2 as well, since you're doing
report submissions and we've got a fix in there for larger
report-submission problems.
Brice Figureau
2011-Feb-11 18:37 UTC
Re: [Puppet Users] Re: high 500 error rate on file metadata operations
On 11/02/11 18:06, Jason Wright wrote:
> I've completed rolling out 2.6.3 and it's completely resolved our issues.

That's terrific! Thanks for sharing the good news.
--
Brice Figureau
My Blog: http://www.masterzen.fr/