I have a small problem. I have a development fileserver box running Solaris 11 Express. The rpool is mirrored between an SSD and a hard drive.

Today, the SSD developed a fault for some reason. While I was trying to diagnose the problem, the system panicked and rebooted. The SSD was the first boot drive, and every time it tried to boot it panicked and rebooted, ending up in a loop. I tried to change to the second rpool drive, but either I forgot to install grub on it or it has become corrupted (probably the first, I can be that stupid at times).

Can anyone give me any advice on how to get this system back? Can I trick grub, installed on the SSD, into booting from the HDD's rpool mirror? Is something more sinister going on?

By the way, whatever the error message is when booting, it disappears so quickly I can't read it, so I am only guessing that this is the reason.

PLEASE HELP!

Thanks
Karl
In message <CANxrvgW3oHstp7gtMxjbqU6u4KdvESvXZ_x1facpp=ozzGvG6w at mail.gmail.com>, Karl Wagner writes:
>The SSD was the first boot drive, and every time it tried to boot it
>panicked and rebooted, ending up in a loop. I tried to change to the second
>rpool drive, but either I forgot to install grub on it or it has become
>corrupted (probably the first, I can be that stupid at times).
>
>Can anyone give me any advice on how to get this system back? Can I trick
>grub, installed on the SSD, to boot from the HDD's rpool mirror? Is
>something more sinister going on?

Remove the broken drive, boot installation media, import the mirror drive.
If it imports, you will be able to installgrub(1M).

>By the way, whatever the error message is when booting, it disappears so
>quickly I can't read it, so I am only guessing that this is the reason.

Boot with kernel debugger so you can see the panic.

John
groenveld at acm.org
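A minimal sketch of the recovery outlined above, to be run from the installation media's shell with the faulted SSD removed. Several things here are assumptions rather than anything stated in the thread: the device name `c0t1d0s0` is hypothetical (check `format` for the real one), the alternate root `/a` is just a common choice, and the stage files are taken from the booted media's own `/boot/grub`. The `run` wrapper is purely illustrative: it prints each command and only executes it when `RUN=1` is set, so the sequence can be reviewed before anything touches the disk.

```shell
#!/bin/sh
# Sketch only. POOL, ALTROOT and DISK are assumptions; adjust before use.
POOL=${POOL:-rpool}
ALTROOT=${ALTROOT:-/a}
DISK=${DISK:-/dev/rdsk/c0t1d0s0}   # hypothetical slice holding the surviving mirror half

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

recover_boot() {
    # Import the surviving half of the mirror under an alternate root.
    run zpool import -f -R "$ALTROOT" "$POOL" || return 1
    # Confirm the pool is usable (it will show as degraded with one side missing).
    run zpool status "$POOL"
    # Write GRUB stage1/stage2 to the surviving disk so it can boot on its own.
    run installgrub /boot/grub/stage1 /boot/grub/stage2 "$DISK" || return 1
    # Export cleanly before rebooting from the hard drive.
    run zpool export "$POOL"
}

recover_boot
```

Run as-is, this only prints the four commands. Running it with `RUN=1 DISK=/dev/rdsk/<your slice>` would execute them; if the import itself fails, the sketch stops there, which matches the "if it imports" condition in the advice above.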
On 02/16/2013 09:49 PM, John D Groenveld wrote:
> Boot with kernel debugger so you can see the panic.

Sadly, though, without access to the source code, all he can do at that point is log a support ticket with Oracle (assuming he has paid his support fees) and hope it will get picked up by somebody there. People on this list have few, if any, ways of helping out.

Cheers,
--
Saso
On 17/02/13 06:54 AM, Sašo Kiselkov wrote:
> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>> Boot with kernel debugger so you can see the panic.
>
> Sadly, though, without access to the source code, all he can do at that
> point is log a support ticket with Oracle (assuming he has paid his
> support fees) and hope it will get picked up by somebody there. People
> on this list have few, if any, ways of helping out.

You're missing the point. Booting with kmdb enabled is The Way(tm) to get anything remotely resembling a paused screen so you can see what the message is.

Whether that message winds up being something you need to talk with Oracle about is an entirely different matter.

The OP mentioned that he was running S11 Express, for which, iirc, you can dig through source on a non-Oracle site and investigate.

Really, though, just adding

-k

to the kernel$ line in the grub menu prior to booting should be enough for him to make significant progress.

James C. McPherson
--
Oracle Systems / Solaris / Core
http://www.jmcpdotcom.com/blog
On 02/16/2013 10:47 PM, James C. McPherson wrote:
> On 17/02/13 06:54 AM, Sašo Kiselkov wrote:
>> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>>> Boot with kernel debugger so you can see the panic.
>>
>> Sadly, though, without access to the source code, all he can do at that
>> point is log a support ticket with Oracle (assuming he has paid his
>> support fees) and hope it will get picked up by somebody there. People
>> on this list have few, if any, ways of helping out.
>
> You're missing the point. Booting with kmdb enabled
> is The Way(tm) to get anything remotely resembling
> a paused screen so you can see what the message is.
>
> Whether that message winds up being something you need
> to talk with Oracle about is an entirely different matter.

He got a kernel panic on a completely legitimate operation (booting with one half of the root mirror faulted). There's a good chance that the only thing he'll see is something like BAD TRAP and a stack trace. Without source, that's where the investigation ends.

> The OP mentioned that he was running S11 Express, for
> which, iirc, you can dig through source on a non-Oracle
> site and investigate.

And once he's found the problem, what then? Can he build a new ZFS kernel module? Can he submit a patch?

> Really, though, just adding
>
> -k
>
> to the kernel$ line in the grub menu prior to booting
> should be enough for him to make significant progress.

If by "significant progress" you mean sending a stack trace to Oracle, then yes.

Look, I'm not accusing you or anybody else of not trying to help - there are some wonderful people around here who both care deeply for their users and are proud of their work. I fully applaud that stance. All I'm doing is pointing out the facts of the matter - take from that what you will.

Cheers,
--
Saso
On 17/02/13 08:48 AM, Sašo Kiselkov wrote:
> On 02/16/2013 10:47 PM, James C. McPherson wrote:
...
>> Whether that message winds up being something you need
>> to talk with Oracle about is an entirely different matter.
>
> He got a kernel panic on a completely legitimate operation (booting with
> one half of the root mirror faulted). There's a good chance that the
> only thing he'll see is something like BAD TRAP and a stack trace.
> Without source, that's where the investigation ends.

There is significant information provided in a panic message which does NOT require that you go and ask Oracle for help. As I pointed out, too, there is a non-Oracle source repo which does contain the code which went into the release and build which the OP is running.

He's running Solaris 11 Express, which we published/delivered as build snv_151b. One would hope that there are sufficient hints in the previous 2 sentences to enable debugging if that is required.

>> The OP mentioned that he was running S11 Express, for
>> which, iirc, you can dig through source on a non-Oracle
>> site and investigate.
>
> And once he's found the problem, what then? Can he build a new ZFS
> kernel module? Can he submit a patch?

You're assuming that he's found a bug which is unfixed, and not related to failed hardware. Big assumption.

>> Really, though, just adding
>>
>> -k
>>
>> to the kernel$ line in the grub menu prior to booting
>> should be enough for him to make significant progress.
>
> If by "significant progress" you mean sending a stack trace to Oracle,
> then yes.

I think you are insulting the OP by assuming that he has insufficient understanding of how to use a search engine.

> Look, I'm not accusing you or anybody else of not trying to help - there
> are some wonderful people around here who both care deeply for their
> users and are proud of their work. I fully applaud that stance.
> All I'm doing is pointing out the facts of the matter - take from
> that what you will.

Your opinion is no doubt coloured by the recent announcement re opensolaris.org.

I have corresponded privately with the OP on this matter. I will not respond further to this thread.

James C. McPherson
--
Oracle Systems / Solaris / Core
http://www.jmcpdotcom.com/blog
On 2013-02-16 21:49, John D Groenveld wrote:
>> By the way, whatever the error message is when booting, it disappears so
>> quickly I can't read it, so I am only guessing that this is the reason.
>
> Boot with kernel debugger so you can see the panic.

That would go like this:

1) In the boot loader (GRUB), edit the boot options (press "e", select the "kernel" line, press "e" again), and add "-kd" to the kernel boot options. Maybe also add "-v" for verbosity.
2) Press Enter to save the change and "b" to boot.
3) The kmdb prompt should pop up; enter ":c" to continue execution.

The bootup should start, throw the kernel panic and pause. It is likely that there will be so much info that it doesn't fit on screen - I can only suggest a serial console in this case. However, the end of the dump info should point you in the right direction.

For example, an error in "mount_vfs_root" is common, and usually means either corrupt media or simply an unexpected device name for the root pool (e.g. the disk is plugged into a different port, or the BIOS was switched between SATA and IDE modes, etc.)

The device name problem should go away if you can boot from anything that can import your rpool (livecd, installer cd, failsafe boot image) and just run "zpool import -f rpool; zpool export rpool" - this should clear the dependency on exact device names, and the next bootup should work.

And yes, I think it is a bug for such a fixable problem to behave so inconveniently - the official docs go as far as to suggest an OS reinstallation in this case.

//Jim
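As an illustration of step 1 above, the edited kernel line might look roughly like the following. The path and the `-B $ZFS-BOOTFS` part are assumptions based on a typical Solaris x86 GRUB entry and are not quoted from the OP's system; keep whatever the existing line already contains and only append the flags.

```
# before (typical form; yours may carry extra -B properties)
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
# after: -k loads kmdb, -d stops in the debugger early in boot, -v is verbose
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -kd -v
```

With "-d" the system stops at a kmdb prompt (usually shown as `[0]>`), where `:c` continues the boot as described in step 3. An edit made this way in the GRUB menu applies to the current boot only and is not written back to menu.lst.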
Sašo Kiselkov wrote:
> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>> Boot with kernel debugger so you can see the panic.
> Sadly, though, without access to the source code, all he can do at that
> point is log a support ticket with Oracle (assuming he has paid his
> support fees) and hope it will get picked up by somebody there. People
> on this list have few, if any, ways of helping out.

If he can boot from recent install media and import the pool, that's a pretty good indicator that the problem has been fixed. He can then upgrade to whatever he booted with (which could be OI or Solaris 11.1) and recover his data.

--
Ian.