thr3ads.net - freebsd stable - geom using 100% cpu with failed da5. How to calm it down without cam passdev? [Dec 2012]

Harald Schmalzbauer

2012-Dec-04 06:50 UTC

geom using 100% cpu with failed da5. How to calm it down without cam passdev?

Hello,

I have a failed disk in a remote server, which shouldn't actually be a
problem.
Just for info, here's the last shout:
    kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
0 0 0 length 0 SMID 256 command timeout cm 0xffffff8001c64800 ccb
0xfffffe0007329000
    kernel: mps0: mpssas_alloc_tm freezing simq
    kernel: mps0: timedout cm 0xffffff8001c64800 allocated tm
0xffffff8001c50148
    kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
0 0 0 length 0 SMID 256 completed timedout cm 0xffffff8001c64800 ccb
0xfffffe0007329000 during recovery ioc 8048 scsi 0 state c
xf(noperiph:mps0:0:5:0): SMID 1 abort TaskMID 256 status 0x4a code 0x0
count 1
    kernel: (noperiph:mps0:0:5:0): SMID 1 finished recovery after
aborting TaskMID 256
    kernel: mps0: mpssas_free_tm releasing simq
    kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
0 0 0
    kernel: (da5:mps0:0:5:0): CAM status: Command timeout
    kernel: (da5:mps0:0:5:0): Retrying command
    kernel: (da5:mps0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 length 0
SMID 981 terminated ioc 804b scsi 0 state 0 xfer 0
    kernel: mps0: mpssas_alloc_tm freezing simq
    kernel: mps0: mpssas_remove_complete on handle 0x000e, IOCStatus= 0x0
    kernel: mps0: mpssas_free_tm releasing simq
    kernel: (da5:mps0:0:(pass7:5:mps0:0:0): lost device - 4 outstanding,
2 refs
    kernel: 5:0): passdevgonecb: devfs entry is gone
    kernel: (da5:mps0:0:5:0): oustanding 3
    kernel: (da5:mps0:0:5:0): oustanding 2
    kernel: (da5:mps0:0:5:0): oustanding 1
    kernel: (da5:mps0:0:5:0): oustanding 0

After reboot, 'camcontrol devlist' doesn't show any da5,
but 'geom disk list' _does_ show da5!!!

My problem is that geom is now consuming 100% of one core!
top -S:
13 root        3  -8    -     0K    48K -       1 480:19 100.00% geom

Since there's no /dev/da5 I can't use camcontrol to stop anything, and
at the moment nobody can physically remove the failed drive.
How can I calm geom down?
How can I find out what "geom" is doing/trying to do?
I guess it's related to the failed da5, but how can I know?

Thanks,

-Harry



Fabian Keil

2012-Dec-04 10:44 UTC

geom using 100% cpu with failed da5. How to calm it down without cam passdev?

Harald Schmalzbauer <h.schmalzbauer at omnilan.de> wrote:
> I have a failed disk in a remote server, which shouldn't actually be a
> problem.
Welcome to geom ...
> Just for info, here's the last shout:
>     kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
> 0 0 0 length 0 SMID 256 command timeout cm 0xffffff8001c64800 ccb
> 0xfffffe0007329000
>     kernel: mps0: mpssas_alloc_tm freezing simq
>     kernel: mps0: timedout cm 0xffffff8001c64800 allocated tm
> 0xffffff8001c50148
>     kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
> 0 0 0 length 0 SMID 256 completed timedout cm 0xffffff8001c64800 ccb
> 0xfffffe0007329000 during recovery ioc 8048 scsi 0 state c
> xf(noperiph:mps0:0:5:0): SMID 1 abort TaskMID 256 status 0x4a code 0x0
> count 1
>     kernel: (noperiph:mps0:0:5:0): SMID 1 finished recovery after
> aborting TaskMID 256
>     kernel: mps0: mpssas_free_tm releasing simq
>     kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
> 0 0 0
>     kernel: (da5:mps0:0:5:0): CAM status: Command timeout
>     kernel: (da5:mps0:0:5:0): Retrying command
>     kernel: (da5:mps0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 length 0
> SMID 981 terminated ioc 804b scsi 0 state 0 xfer 0
>     kernel: mps0: mpssas_alloc_tm freezing simq
>     kernel: mps0: mpssas_remove_complete on handle 0x000e, IOCStatus= 0x0
>     kernel: mps0: mpssas_free_tm releasing simq
>     kernel: (da5:mps0:0:(pass7:5:mps0:0:0): lost device - 4 outstanding,
> 2 refs
>     kernel: 5:0): passdevgonecb: devfs entry is gone
>     kernel: (da5:mps0:0:5:0): oustanding 3
>     kernel: (da5:mps0:0:5:0): oustanding 2
>     kernel: (da5:mps0:0:5:0): oustanding 1
>     kernel: (da5:mps0:0:5:0): oustanding 0
> 
> After reboot, 'camcontrol devlist' doesn't show any da5,
> but 'geom disk list' _does_ show da5!!!
> 
> My problem is that geom is now consuming 100% of one core!
> top -S:
> 13 root        3  -8    -     0K    48K -       1 480:19 100.00% geom
> 
> Since there's no /dev/da5 I can't use camcontrol to stop anything, and
> at the moment nobody can physically remove the failed drive.
> How can I calm geom down?
I reported a similar problem in:
http://www.freebsd.org/cgi/query-pr.cgi?pr=171865

The PR contains a patch that I'm using as a workaround.
> How can I find out what "geom" is doing/trying to do?
> I guess it's related to the failed da5, but how can I know?
DTrace might help.

Fabian
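
One way the DTrace suggestion above might be tried is to sample the
kernel stacks of the busy process and see which functions show up most
often. The following is a minimal sketch, not something posted in the
thread. It assumes the kernel process is visible to DTrace under the
execname "geom" (as in the top -S output), that the DTrace kernel
modules are loaded (for example via kldload dtraceall), and that it is
run as root. The function names geom_profile_script and geom_profile
are made up for illustration.

```sh
#!/bin/sh
# Sketch: sample on-CPU kernel stacks of a kernel process with DTrace.
# Assumption: the process shows up with execname "geom".

# Print the D program text. $1 = process name, $2 = run time in seconds.
# profile-997 fires 997 times per second on every CPU; the predicate
# keeps only samples taken while the named process is running.
geom_profile_script() {
    name=${1:-geom}
    secs=${2:-10}
    printf 'profile-997 /execname == "%s"/ { @[stack()] = count(); } tick-%ss { exit(0); }\n' "$name" "$secs"
}

# Run the sampler if dtrace(1) is installed; the aggregated stacks are
# printed when the script exits, least frequent first.
geom_profile() {
    if ! command -v dtrace >/dev/null 2>&1; then
        echo "dtrace not found" >&2
        return 1
    fi
    dtrace -q -n "$(geom_profile_script "$@")"
}
```

Calling geom_profile with no arguments samples "geom" for ten seconds.
The stack printed last is the one hit most often, which should hint at
whether the spinning is in the disk removal path. A quicker, cruder
look is procstat -kk 13 (13 being the geom PID from the top output),
which prints the current kernel stack of each geom thread.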
