thr3ads.net - zfs discuss - [zfs-discuss] HELP! RPool problem [Feb 2013]

Karl Wagner

2013-Feb-16 20:33 UTC

[zfs-discuss] HELP! RPool problem

I have a small problem.

I have a development fileserver box running Solaris 11 Express. The rpool
is mirrored between an SSD and a hard drive. Today, the SSD developed a
fault for some reason. While I was trying to diagnose the problem, the system
panicked and rebooted.

The SSD was the first boot drive, and every time it tried to boot it
panicked and rebooted, ending up in a loop. I tried to change to the second
rpool drive, but either I forgot to install grub on it or it has become
corrupted (probably the first, I can be that stupid at times).

Can anyone give me any advice on how to get this system back? Can I trick
grub, installed on the SSD, to boot from the HDD's rpool mirror? Is
something more sinister going on?

By the way, whatever the error message is when booting, it disappears so
quickly I can't read it, so I am only guessing that this is the reason.

PLEASE HELP!

Thanks
Karl

John D Groenveld

2013-Feb-16 20:49 UTC

In message <CANxrvgW3oHstp7gtMxjbqU6u4KdvESvXZ_x1facpp=ozzGvG6w at mail.gmail.com>,
Karl Wagner writes:
>The SSD was the first boot drive, and every time it tried to boot it
>panicked and rebooted, ending up in a loop. I tried to change to the second
>rpool drive, but either I forgot to install grub on it or it has become
>corrupted (probably the first, I can be that stupid at times).
>
>Can anyone give me any advice on how to get this system back? Can I trick
>grub, installed on the SSD, to boot from the HDD's rpool mirror? Is
>something more sinister going on?

Remove the broken drive, boot installation media, import the
mirror drive.
If it imports, you will be able to installgrub(1M).
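
As a rough sketch of that sequence: this assumes the installation media
provides zpool and installgrub along with GRUB stage files under /boot/grub,
and that the raw slice of the surviving disk is supplied by hand. The function
name is made up for illustration, and the device name in the example is a
placeholder, not the OP's actual disk.

```shell
#!/bin/sh
# Sketch only. Run from the installation media's shell with the failed SSD removed.

# $1: raw slice of the surviving mirror half (placeholder: /dev/rdsk/c0t1d0s0)
# $2: alternate root for the import (defaults to /a)
reinstall_grub_on_mirror() {
    disk="$1"
    altroot="${2:-/a}"
    if [ -z "$disk" ]; then
        echo "usage: reinstall_grub_on_mirror <raw-slice> [altroot]" >&2
        return 2
    fi

    # Import under an alternate root so the pool's mountpoints do not
    # collide with the filesystems of the running media environment.
    zpool import -f -R "$altroot" rpool || return 1

    # Write the GRUB stages (taken from the media) to the surviving disk.
    installgrub /boot/grub/stage1 /boot/grub/stage2 "$disk" || return 1

    # Export cleanly before rebooting from the repaired disk.
    zpool export rpool
}

# Example (device name is a placeholder):
# reinstall_grub_on_mirror /dev/rdsk/c0t1d0s0
```

If the import step fails, nothing is written to the disk, which matches the
"if it imports" condition above.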

>By the way, whatever the error message is when booting, it disappears so
>quickly I can't read it, so I am only guessing that this is the reason.

Boot with kernel debugger so you can see the panic.

John
groenveld at acm.org

Sašo Kiselkov

2013-Feb-16 20:54 UTC

On 02/16/2013 09:49 PM, John D Groenveld wrote:
> Boot with kernel debugger so you can see the panic.

Sadly, though, without access to the source code, all he can do at that
point is log a support ticket with Oracle (assuming he has paid his
support fees) and hope it will get picked up by somebody there. People
on this list have few, if any, ways of helping out.

Cheers,
--
Saso

James C. McPherson

2013-Feb-16 21:47 UTC

On 17/02/13 06:54 AM, Sašo Kiselkov wrote:
> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>> Boot with kernel debugger so you can see the panic.
>
> Sadly, though, without access to the source code, all he can do at that
> point is log a support ticket with Oracle (assuming he has paid his
> support fees) and hope it will get picked up by somebody there. People
> on this list have few, if any, ways of helping out.

You're missing the point. Booting with kmdb enabled
is The Way(tm) to get anything remotely resembling
a paused screen so you can see what the message is.

Whether that message winds up being something you need
to talk with Oracle about is entirely different.

The OP mentioned that he was running S11 Express, for
which, iirc, you can dig through source on a non-Oracle
site and investigate.

Really, though, just adding

-k

to the kernel$ line in the grub menu prior to booting
should be enough for him to make significant progress.
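
To illustrate what the edited line ends up looking like, the helper below
(a hypothetical name, not a Solaris tool) appends a flag to any kernel$ line
fed to it. The sample line is a typical Solaris x86 GRUB entry and is an
assumption; the real one on the OP's box may carry extra -B properties. At
the GRUB menu itself the flag is simply typed by hand.

```shell
# Append a boot flag to every kernel$ line read from stdin;
# all other lines pass through unchanged.
add_kernel_flag() {
    awk -v flag="$1" '$1 == "kernel$" { $0 = $0 " " flag } { print }'
}

# Assumed sample entry, single-quoted so the shell leaves $ISADIR alone.
printf '%s\n' 'kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS' |
    add_kernel_flag -k
# prints: kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -k
```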

James C. McPherson
--
Oracle
Systems / Solaris / Core
http://www.jmcpdotcom.com/blog

Sašo Kiselkov

2013-Feb-16 22:48 UTC

On 02/16/2013 10:47 PM, James C. McPherson wrote:
> On 17/02/13 06:54 AM, Sašo Kiselkov wrote:
>> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>>> Boot with kernel debugger so you can see the panic.
>>
>> Sadly, though, without access to the source code, all he can do at that
>> point is log a support ticket with Oracle (assuming he has paid his
>> support fees) and hope it will get picked up by somebody there. People
>> on this list have few, if any, ways of helping out.
>
> You're missing the point. Booting with kmdb enabled
> is The Way(tm) to get anything remotely resembling
> a paused screen so you can see what the message is.
>
> Whether that message winds up being something you need
> to talk with Oracle about is entirely different.

He got a kernel panic on a completely legitimate operation (booting with
one half of the root mirror faulted). There's a good chance that the
only thing he'll see is something like BAD TRAP and a stack trace.
Without source, that's where the investigation ends.

> The OP mentioned that he was running S11 Express, for
> which, iirc, you can dig through source on a non-Oracle
> site and investigate.

And once he's found the problem, what then? Can he build a new ZFS
kernel module? Can he submit a patch?

> Really, though, just adding
>
> -k
>
> to the kernel$ line in the grub menu prior to booting
> should be enough for him to make significant progress.

If by "significant progress" you mean sending a stack trace to Oracle,
then yes.

Look, I'm not accusing you or anybody else of not trying to help -
there are some wonderful people around here who both care deeply for their
users and are proud of their work. I fully applaud that stance.
All I'm doing is just pointing out the facts of the matter - take from
that what you will.

Cheers,
--
Saso

James C. McPherson

2013-Feb-16 22:54 UTC

On 17/02/13 08:48 AM, Sašo Kiselkov wrote:
> On 02/16/2013 10:47 PM, James C. McPherson wrote:
...
>> Whether that message winds up being something you need
>> to talk with Oracle about is entirely different.
>
> He got a kernel panic on a completely legitimate operation (booting with
> one half of the root mirror faulted). There's a good chance that the
> only thing he'll see is something like BAD TRAP and a stack trace.
> Without source, that's where the investigation ends.

There is significant information provided in a panic message
which does NOT require that you go and ask Oracle for help.

As I pointed out, too, there is a non-Oracle source repo which
does contain the code which went into the release and build
which the OP is running. He's running Solaris 11 Express, which
we published/delivered as build snv_151b. One would hope that
there are sufficient hints in the previous 2 sentences to enable
debugging if that is required.

>> The OP mentioned that he was running S11 Express, for
>> which, iirc, you can dig through source on a non-Oracle
>> site and investigate.
>
> And once he's found the problem, what then? Can he build a new ZFS
> kernel module? Can he submit a patch?

You're assuming that he's found a bug which is unfixed,
and not related to failed hardware. Big assumption.

>> Really, though, just adding
>>
>> -k
>>
>> to the kernel$ line in the grub menu prior to booting
>> should be enough for him to make significant progress.
>
> If by "significant progress" you mean sending a stack trace to Oracle,
> then yes.

I think you are insulting the OP by assuming that he has
insufficient understanding of how to use a search engine.

> Look, I'm not accusing you or anybody else of not trying to help -
> there are some wonderful people around here who both care deeply for their
> users and are proud of their work. I fully applaud that stance.
> All I'm doing is just pointing out the facts of the matter - take from
> that what you will.

Your opinion is no doubt coloured by the recent announcement
re opensolaris.org.

I have corresponded privately with the OP on this matter. I
will not respond further to this thread.


James C. McPherson
--
Oracle
Systems / Solaris / Core
http://www.jmcpdotcom.com/blog

Jim Klimov

2013-Feb-16 23:26 UTC

On 2013-02-16 21:49, John D Groenveld wrote:
>> By the way, whatever the error message is when booting, it disappears so
>> quickly I can't read it, so I am only guessing that this is the reason.
>
> Boot with kernel debugger so you can see the panic.

And that would be done like so:

1) In the boot loader (GRUB), edit the boot options (press "e",
    select the "kernel" line, press "e" again), and add "-kd" to the
    kernel bootup. Maybe also "-v" to add verbosity.

2) Press enter to save the change and "b" to boot.

3) The kmdb prompt should pop up; enter ":c" to continue execution.
    The bootup should start, throw the kernel panic and pause.
    It is likely that there would be so much info that it doesn't
    fit on screen - I can only suggest a serial console in this case.

    However, the end of the dump info should point you in the right
    direction. For example, an error in "mount_vfs_root" is popular,
    and usually means either corrupt media or simply an unexpected device
    name for the root pool (i.e. disk plugged into a different port, or
    BIOS changes between SATA-IDE modes, etc.)

The device name problem should go away if you can boot from anything
that can import your rpool (livecd, installer cd, failsafe boot image)
and just run "zpool import -f rpool; zpool export rpool" - this should
clear the dependency on exact device names, and the next bootup should
work.
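
The same two commands can be wrapped so that the export only runs if the
import succeeded, which is a minor deviation from the ";" form quoted above.
This is only a sketch: the function name is illustrative, and the pool name
defaults to rpool as in this thread.

```shell
# Import and then export the pool so that, per the description above,
# it no longer depends on the old device names.
# Run from a live CD, installer or failsafe environment, not from the broken system.
refresh_pool_paths() {
    pool="${1:-rpool}"
    # -f forces the import even though the pool was never cleanly exported.
    zpool import -f "$pool" && zpool export "$pool"
}

# Example:
# refresh_pool_paths rpool
```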

And yes, I think it is a bug for such a fixable problem to behave so
inconveniently - the official docs go as far as to suggest an OS
reinstallation in this case.

//Jim

Ian Collins

2013-Feb-17 04:16 UTC

Sašo Kiselkov wrote:
> On 02/16/2013 09:49 PM, John D Groenveld wrote:
>> Boot with kernel debugger so you can see the panic.
> Sadly, though, without access to the source code, all he can do at that
> point is log a support ticket with Oracle (assuming he has paid his
> support fees) and hope it will get picked up by somebody there. People
> on this list have few, if any, ways of helping out.

If he can boot from recent install media and import the pool, that's a
pretty good indicator that the problem has been fixed. He can then
upgrade to whatever he booted with (which could be OI or Solaris 11.1)
and recover his data.

-- 
Ian.
