Hello all,

We have a problem with puppet on certain machines in our farm (300+): those with the Supermicro X8SIE motherboard. Sometimes, when running puppet, the machine crashes. We lose access to it, logging in through IPMI doesn't show anything on the console, and the only thing we can do is a cold reboot. If we then run puppet again, nothing happens. If we run puppet again several days later, there may or may not be another crash; it is random.

I debugged the problem and concluded that the cause was running "facter": running it in an mpssh session caused 7 or 8 crashes on different machines.

Software versions:
OS: Ubuntu 8.04
facter 1.5.4-1ubuntu1
puppet 0.25.1-2

After upgrading to facter 1.6.11-1 (the latest .deb in the puppetlabs repository for hardy), the crashes continue.

-- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To view this discussion on the web visit https://groups.google.com/d/msg/puppet-users/-/HxwyenNunv4J. To post to this group, send email to puppet-users@googlegroups.com. To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Sorry, I sent that before I had finished.

I managed to get some traces by running facter under "strace", which I can paste if you think they would help.

Has anyone experienced something like this?
On Thursday, November 22, 2012 6:23:06 AM UTC-6, Mon wrote:
> Someone has experienced something like that?

For what it's worth, Facter itself is unlikely to be crashing your systems, but it runs a variety of commands that probe system details, and it's possible that one of those, or a combination of them, sometimes crashes the machine. If so, it should be possible to crash the systems by running the same commands from the shell.

If you have straces of facter sessions that resulted in crashes, then they might be illuminating. The key thing I would be looking for is what commands Facter was trying to run when the crashes occurred. Unfortunately, the nature of the problem precludes being certain that the last thing in the captured trace is actually what Facter was doing when the crash happened.

If there is a software bug, then it is probably in a separate tool or in the OS kernel. It might also be that you have a firmware (i.e. BIOS) bug on the affected systems, or even that the affected motherboard model has a design or fabrication flaw.

John
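One way to see which external commands Facter launches is to run it under strace with -f, so that forked children (such as the dmidecode run that Facter starts through a pipe) are traced too, and then pull the program paths out of the log. What follows is a minimal bash sketch, not something from this thread: the helper name executed_commands and the log path are hypothetical, and it assumes strace's usual -f output to a single file, where each line starts with a PID and failed calls end in "= -1 ERRNO".

```shell
#!/bin/bash
# Sketch: list the programs Facter executes, with strace following forks.
# Capture step (-f follows children, -tt adds timestamps, -o writes a log):
#   strace -f -tt -e trace=execve -o /var/tmp/facter.strace facter -p
# The log path /var/tmp/facter.strace is an assumption; any local path works.

# Print the program path of every execve() in an strace -f log that did not
# fail outright. Calls that strace splits into "<unfinished ...>" and
# "<... execve resumed>" lines are taken from the first half, so their
# result is not checked.
executed_commands() {
    grep 'execve("' "$1" \
        | grep -v '= -1 ' \
        | sed -E 's/.*execve\("([^"]*)".*/\1/'
}
```

For example, "executed_commands /var/tmp/facter.strace" prints one path per line in the order the programs were started. As John warns, a hard hang can lose whatever had not yet reached the disk, so the tail of such a log still may not name the real culprit.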
mseisdedos
2012-Nov-28 10:49 UTC
Re: [Puppet Users] Re: Executing puppet crash the machine
Hello John,

Thanks for your answer. I have opened an issue with my hardware manufacturer and will do the same with my OS vendor.
Anyway, here are the strace listings; maybe someone can shed light on them:

server1:

BIOS: American Megatrends Inc. 1.2
SYS: Supermicro X8SIE
CPU: Intel(R) Core(TM) i3 CPU 550 @ 3.20GHz [4 cores]
MEM:
SLOT0 2048 MB
SLOT1 2048 MB

open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
close(3) = 0
open("/usr/lib/ruby/1.8/facter/osfamily.rb", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=800, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7297000
read(3, "# Fact: osfamily\n#\n# Purpose: Re"..., 4096) = 800
......CRASH

server2:

BIOS: American Megatrends Inc. 1.2
SYS: Supermicro X8SIE
CPU: Intel(R) Core(TM) i3 CPU 560 @ 3.33GHz [4 cores]
MEM:
SLOT0 2048 MB
SLOT1 2048 MB

stat64("/usr/sbin/dmidecode", {st_mode=S_IFREG|0755, st_size=48408, ...}) = 0
pipe([3, 4]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb74e5ba8) = 8709
close(4) = 0
fcntl64(3, F_GETFL) = 0 (flags O_RDONLY)
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb725e000
_llseek(3, 0, 0xbf900930, SEEK_CUR) = -1 ESPIPE(Illegal seek)
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
read(3, "# dmidecode 2.9\nSMBIOS 2.6 prese"..., 1024) = 1024
read(3, "oot is supported\n\t\tBIOS boot spe"..., 1024) = 1024
read(3, "tate: Safe\n\tThermal State: Safe\n"..., 1024) = 1024
read(3, "Maximum Size: 128 KB\n\tSupported "..., 1024) = 1024
read(3, "e 5, 28 bytes\nMemory Controller "..., 1024) = 1024
read(3, " Installed\n\tError Status: OK\n\nHa"..., 1024) = 1024
read(3, " type 8, 9 bytes\nPort Connector "..., 1024) = 1024
read(3, "ternal Reference Designator: LPT"..., 1024) = 1024
read(3, "nal Reference Designator: Not Sp"..., 1024) = 1024
read(3, "nator: Not Specified\n\tExternal C"..., 1024) = 1024
read(3, "or Type: None\n\tPort Type: Other\n"..., 1024) = 1024
read(3, "ector Information\n\tInternal Refe"..., 1024) = 1024
read(3, "\tLength: Short\n\tID: 1\n\tCharacter"..., 1024) = 1024
read(3, "escriptor 5: POST error\n\tData Fo"..., 1024) = 1024
read(3, "ype 19, 15 bytes\nMemory Array Ma"..., 1024) = 1024
read(3, " Width: Unknown\n\tSize: No Module"..., 1024) = 1024
read(3, "ry Device Mapped Address\n\tStarti"..., 1024) = 1024
read(3, "on Handle: Not Provided\n\tTotal W"..., 1024) = 1024
--- SIGCHLD (Child exited) @ 0 (0) ---
read(3, "\n\nHandle 0x0039, DMI type 20, 19"..., 1024) = 1024
read(3, "on-recoverable Threshold: 6\n\nHan"..., 1024) = 1024
read(3, "UT OF SPEC>\n\tCooling Unit Group:"..., 1024) = 1024
read(3, "ed: Yes\n\tHot Replaceable: No\n\tCo"..., 1024) = 669
read(3, "", 1024) = 0
close(3) = 0
munmap(0xb725e000, 4096) = 0
rt_sigaction(SIGHUP, {SIG_IGN}, {0xb77388f0, [HUP], SA_RESTART}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb77388f0, [QUIT], SA_RESTART}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN}, {0xb77388f0, [INT], SA_RESTART}, 8) = 0
waitpid(8709, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 8709
rt_sigaction(SIGHUP, {0xb77388f0, [HUP], SA_RESTART}, {SIG_IGN}, 8) = 0
rt_sigaction(SIGQUIT, {0xb77388f0, [QUIT], SA_RESTART}, {SIG_IGN}, 8) = 0
rt_sigaction(SIGINT, {0xb77388f0, [INT], SA_RESTART}, {SIG_IGN}, 8) = 0
............
sigprocmask(SIG_SETMASK, [], NULL) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
.............
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
.........
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
.......CRASH

2012/11/26 jcbollinger <John.Bollinger@stjude.org>
> If you have straces of facter sessions that resulted in crashes then they
> might be illuminating.
jcbollinger
2012-Nov-28 15:32 UTC
Re: [Puppet Users] Re: Executing puppet crash the machine
On Wednesday, November 28, 2012 4:49:13 AM UTC-6, Mon wrote:
> Anyway I paste the strace listings so maybe someone can shed light on it:

I'm supposing that ".......CRASH" means "more of the same syscall, with similar results, until the trace ends on account of a system crash."

The second trace says nothing useful, as far as I can tell. The last thing it shows before all the signal mask handling is the successful completion of a fact evaluation: Facter reads dmidecode's output to EOF, and the child exits with status 0.

The first trace is not much more helpful. The last thing it shows is Facter reading the Ruby code for the 'osfamily' fact. That might indicate that the system crashed during evaluation of that fact, but reading a fact's source is too far removed from evaluating it for me to have any confidence in that.

My bet would be that the crash cuts off communication before its cause is reported in the trace, as I warned might be the case.

Here's another thing you could try: since facter doesn't always crash the system (if I understand correctly), you should be able to get a list of all the facts it evaluates (and their values) by running "facter -p" from the command line. Take that list and use it to stress test facter on each fact individually (i.e. run "facter -p <factname>" many times in a loop), in a way that guarantees you always know which fact is currently under test. In this way you may be able to identify one or more facts whose evaluation sometimes crashes the machine.

Note: don't neglect the "or more" above. It is conceivable that your problem is deeper than just one fact.

Once you know which facts the problem is associated with, we can investigate the commands facter runs for them, and thereby narrow down the cause of the crash.

John
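The per-fact stress test John describes could be scripted roughly as below. This is a hedged bash sketch, not code from the thread: the function names and the log path /var/tmp/facter-stress.log are hypothetical, and it assumes facter 1.x, whose plain "facter -p" output has one "name => value" line per fact. The fact name is written to the log and flushed with sync before each run, which is one way to always know which fact was under test when a machine hangs.

```shell
#!/bin/bash
# Hypothetical helpers for the per-fact stress test described above.
# Assumes facter 1.x: plain "facter -p" prints one "name => value" line per fact.

# Local log file (assumed path); it must be on a disk that survives a reboot.
STRESS_LOG="${STRESS_LOG:-/var/tmp/facter-stress.log}"

# Print the name of every fact reported by "facter -p".
list_facts() {
    facter -p | awk -F ' => ' 'NF > 1 { print $1 }'
}

# Evaluate a single fact RUNS times. The fact name is logged and flushed
# to disk with sync *before* each run, so the last log line written before
# a hard crash names the fact that was under test.
stress_fact() {
    local fact="$1" runs="$2" i=1
    while [ "$i" -le "$runs" ]; do
        echo "$(date '+%F %T') testing $fact (run $i of $runs)" >> "$STRESS_LOG"
        sync
        facter -p "$fact" > /dev/null 2>&1
        i=$((i + 1))
    done
}

# Stress every fact in turn, RUNS times each (default 50).
stress_all_facts() {
    local runs="${1:-50}" fact
    for fact in $(list_facts); do
        stress_fact "$fact" "$runs"
    done
    echo "$(date '+%F %T') all facts completed" >> "$STRESS_LOG"
}
```

For example, after sourcing the file, "stress_all_facts 100" evaluates each fact 100 times; after a cold reboot, "tail -n 1 /var/tmp/facter-stress.log" shows which fact was running. Logging before each run, rather than after, is the design choice that matters here, since a crash prevents anything written afterwards from reaching the disk.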
mseisdedos
2012-Nov-28 16:41 UTC
Re: [Puppet Users] Re: Executing puppet crash the machine
Hello John,

Your assumption is correct.
I cannot run the facter loop because we are in a production environment. Every time I run puppet on these machines, I make sure I can reach their IPMI interface so that I can reboot them within a few minutes.
Thanks for your help.
Regards.

2012/11/28 jcbollinger <John.Bollinger@stjude.org>
> I'm supposing that ".......CRASH" means "more of the same syscall, with
> similar results, until the trace ends on account of a system crash."
Montse Seisdedos
2013-Jun-20 15:03 UTC
Re: [Puppet Users] Re: Executing puppet crash the machine
Hello group:

We eventually performed the test John suggested and caught the "thief": virtual.rb.
We didn't even try to analyze why it hangs the machine. Since this fact is not used in our recipes, we simply dropped it.
Thanks for your help.

On Wednesday, November 28, 2012 5:41:00 PM UTC+1, Montse Seisdedos wrote:
> I can not do the facter loop because we are in a production environment.