Harald Schmalzbauer
2012-Dec-04 06:50 UTC
geom using 100% cpu with failed da5. How to calm it down without cam passdev?
Hello,

I've a failed disk at a remote server, which shouldn't be a problem actually.
Just for info, here's the last shout:

kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 length 0 SMID 256 command timeout cm 0xffffff8001c64800 ccb 0xfffffe0007329000
kernel: mps0: mpssas_alloc_tm freezing simq
kernel: mps0: timedout cm 0xffffff8001c64800 allocated tm 0xffffff8001c50148
kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 length 0 SMID 256 completed timedout cm 0xffffff8001c64800 ccb 0xfffffe0007329000 during recovery ioc 8048 scsi 0 state c xf(noperiph:mps0:0:5:0): SMID 1 abort TaskMID 256 status 0x4a code 0x0 count 1
kernel: (noperiph:mps0:0:5:0): SMID 1 finished recovery after aborting TaskMID 256
kernel: mps0: mpssas_free_tm releasing simq
kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
kernel: (da5:mps0:0:5:0): CAM status: Command timeout
kernel: (da5:mps0:0:5:0): Retrying command
kernel: (da5:mps0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 length 0 SMID 981 terminated ioc 804b scsi 0 state 0 xfer 0
kernel: mps0: mpssas_alloc_tm freezing simq
kernel: mps0: mpssas_remove_complete on handle 0x000e, IOCStatus= 0x0
kernel: mps0: mpssas_free_tm releasing simq
kernel: (da5:mps0:0:(pass7:5:mps0:0:0): lost device - 4 outstanding, 2 refs
kernel: 5:0): passdevgonecb: devfs entry is gone
kernel: (da5:mps0:0:5:0): oustanding 3
kernel: (da5:mps0:0:5:0): oustanding 2
kernel: (da5:mps0:0:5:0): oustanding 1
kernel: (da5:mps0:0:5:0): oustanding 0

After reboot, 'camcontrol devlist' doesn't show any da5,
but 'geom disk list' _does_ show da5!!!

My problem is that geom is now consuming 100% of one core!
top -S:
13 root 3 -8 - 0K 48K - 1 480:19 100.00% geom

Since there's no /dev/da5 I can't use camcontrol to stop anything, and
at the moment nobody can physically remove the failed drive.
How can I calm geom down?

How can I find out what "geom" is doing/trying to do?
I guess it's related to the failed da5, but how can I know?

Thanks,

-Harry
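The mismatch described above (a disk that 'geom disk list' still reports while 'camcontrol devlist' no longer does) can be checked mechanically. The following is only a rough sketch, not a tested tool: the function names are made up for illustration, and it assumes that 'geom disk list' prints one "Geom name: <disk>" line per disk and that each 'camcontrol devlist' line ends with a parenthesised peripheral list such as "(da0,pass0)". Disks served by non-CAM drivers would show up as false positives.

```python
import re
import subprocess


def geom_disk_names(geom_output):
    """Collect disk names from the 'Geom name: <disk>' lines of `geom disk list`."""
    return set(re.findall(r"^Geom name:\s*(\S+)", geom_output, flags=re.MULTILINE))


def cam_device_names(devlist_output):
    """Collect peripheral names from the trailing '(da0,pass0)' group of each line."""
    names = set()
    for line in devlist_output.splitlines():
        match = re.search(r"\(([^()]*)\)\s*$", line)
        if match:
            names.update(n.strip() for n in match.group(1).split(",") if n.strip())
    return names


def stale_disks(geom_output, devlist_output):
    """Disks GEOM still lists although CAM no longer has a peripheral for them."""
    return sorted(geom_disk_names(geom_output) - cam_device_names(devlist_output))


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Both commands are the ones quoted in the message above.
    for name in stale_disks(run(["geom", "disk", "list"]),
                            run(["camcontrol", "devlist"])):
        print(name)
```

On a machine in the state described here, the expectation would be that da5 is among the names printed; the script only reports the discrepancy and does nothing to stop the geom thread.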
Fabian Keil
2012-Dec-04 10:44 UTC
geom using 100% cpu with failed da5. How to calm it down without cam passdev?
Harald Schmalzbauer <h.schmalzbauer at omnilan.de> wrote:

> I've a failed disk at a remote server, which shouldn't be a problem
> actually.

Welcome to geom ...

> Just for info, here's the last shout:
> kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
> 0 0 0 length 0 SMID 256 command timeout cm 0xffffff8001c64800 ccb
> 0xfffffe0007329000
> kernel: mps0: mpssas_alloc_tm freezing simq
> kernel: mps0: timedout cm 0xffffff8001c64800 allocated tm
> 0xffffff8001c50148
> kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
> 0 0 0 length 0 SMID 256 completed timedout cm 0xffffff8001c64800 ccb
> 0xfffffe0007329000 during recovery ioc 8048 scsi 0 state c
> xf(noperiph:mps0:0:5:0): SMID 1 abort TaskMID 256 status 0x4a code 0x0
> count 1
> kernel: (noperiph:mps0:0:5:0): SMID 1 finished recovery after
> aborting TaskMID 256
> kernel: mps0: mpssas_free_tm releasing simq
> kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
> 0 0 0
> kernel: (da5:mps0:0:5:0): CAM status: Command timeout
> kernel: (da5:mps0:0:5:0): Retrying command
> kernel: (da5:mps0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 length 0
> SMID 981 terminated ioc 804b scsi 0 state 0 xfer 0
> kernel: mps0: mpssas_alloc_tm freezing simq
> kernel: mps0: mpssas_remove_complete on handle 0x000e, IOCStatus= 0x0
> kernel: mps0: mpssas_free_tm releasing simq
> kernel: (da5:mps0:0:(pass7:5:mps0:0:0): lost device - 4 outstanding,
> 2 refs
> kernel: 5:0): passdevgonecb: devfs entry is gone
> kernel: (da5:mps0:0:5:0): oustanding 3
> kernel: (da5:mps0:0:5:0): oustanding 2
> kernel: (da5:mps0:0:5:0): oustanding 1
> kernel: (da5:mps0:0:5:0): oustanding 0
>
> After reboot, 'camcontrol devlist' doesn't show any da5,
> but 'geom disk list' _does_ show da5!!!
>
> My problem is that geom is now consuming 100% of one core!
> top -S:
> 13 root 3 -8 - 0K 48K - 1 480:19 100.00% geom
>
> Since there's no /dev/da5 I can't use camcontrol to stop anything, and
> at the moment nobody can physically remove the failed drive.
> How can I calm geom down?

I reported a similar problem in:
http://www.freebsd.org/cgi/query-pr.cgi?pr=171865

The PR contains a patch that I'm using as a workaround.

> How can I find out what "geom" is doing/trying to do?
> I guess it's related to the failed da5, but how can I know?

DTrace might help.

Fabian
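As a lighter-weight first step before DTrace, one could sample the kernel stacks of the geom process with procstat(1) and see which of its threads is the one burning CPU. This is a sketch that swaps in procstat for DTrace, not something taken from the thread: the helper names are hypothetical, pid 13 is simply the pid shown in the quoted top output, and the parsing assumes the column layout "PID TID COMM TDNAME KSTACK" with "<running>" printed in place of a stack for a thread that is currently on a CPU.

```python
import collections
import subprocess
import time


def running_threads(procstat_output):
    """Return the thread names whose stack column reads '<running>'."""
    names = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Assumed columns: PID TID COMM TDNAME KSTACK...
        # The header row is skipped because its first field is not numeric.
        if len(fields) >= 5 and fields[0].isdigit() and fields[4] == "<running>":
            names.append(fields[3])
    return names


def sample(pid, samples=20, interval=0.5):
    """Run `procstat -kk <pid>` repeatedly and count on-CPU sightings per thread."""
    counts = collections.Counter()
    for _ in range(samples):
        out = subprocess.run(["procstat", "-kk", str(pid)],
                             capture_output=True, text=True, check=True).stdout
        counts.update(running_threads(out))
        time.sleep(interval)
    return counts


if __name__ == "__main__":
    # 13 is the pid of "geom" in the top -S line quoted above; adjust as needed.
    for thread, hits in sample(13).most_common():
        print(thread, hits)
```

If one thread name dominates the counts, that narrows down which part of geom is spinning; the stacks of the threads that are not running, as printed by procstat itself, may add further hints. DTrace would still be the tool for seeing the actual functions being executed.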