Running Solaris 10 Update 3 on an X4500, I have found that it is possible
to reproducibly block all writes to a ZFS pool by running "chgrp -R"
on any large filesystem in that pool. As can be seen in the zpool
iostat output below, after about 10 seconds of running the chgrp command
all writes to the pool stop, and the pool starts exclusively running a
slow background task of 1kB reads.

At this point the chgrp -R command is not killable via root kill -9,
and in fact even the command "halt -d" does not do anything.

In at least one instance I have seen the chgrp command eventually
respond to the kill command after ~30 minutes, and the pool was
writable again. However, while waiting for this to happen the
kernel was generating "No more processes." when simple commands,
e.g., uname or uptime, were run in pre-existing shells.

# zpool iostat test 2
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
...
test        1.12T  19.2T      1  1.72K  11.2K   220M
test        1.12T  19.2T      0  3.10K      0   380M
test        1.12T  19.2T      0    335      0  41.9M
test        1.12T  19.2T      0  4.49K      0   559M
test        1.12T  19.2T      0      0      0      0
test        1.12T  19.2T      0  1.51K      0   193M
test        1.12T  19.2T      0  3.31K      0   408M
test        1.12T  19.2T      0      0      0      0
test        1.12T  19.2T      0  3.54K      0   453M
test        1.13T  19.2T    428  1.17K  1.82M   129M
*** Started chgrp -R ***
test        1.13T  19.2T  1.74K  2.21K  7.19M   282M
test        1.13T  19.2T    531  2.49K  2.34M   300M
test        1.13T  19.2T    549  1.67K  2.96M   213M
test        1.13T  19.2T    395  3.00K  2.38M   368M
test        1.13T  19.2T    343      0  1.66M      0
test        1.13T  19.2T    113      0   113K      0
test        1.13T  19.2T    132      0   132K      0
test        1.13T  19.2T    136      0   137K      0
test        1.13T  19.2T    132      0   132K      0
test        1.13T  19.2T    148      0   149K      0
test        1.13T  19.2T    137      0   138K      0
test        1.13T  19.2T    163      0   163K      0
test        1.13T  19.2T    152      0   153K      0
...
*** All writes to this pool are hung for some long period of time. ***

Here is the pool configuration:

# zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c8t7d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c8t6d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
        spares
          c8t1d0    AVAIL

errors: No known data errors

There is nothing in the output of dmesg, svcs -xv, or fmdump associated
with this event.

Is this a known issue or should I open a new case with Sun?

Thanks.

--
Stuart Anderson  anderson at ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

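The stall pattern in the iostat output above (write operations dropping to 0 while small reads continue) can be watched for automatically. What follows is only a minimal sketch, assuming the seven-column row layout shown in this post. The function name detect_write_stall and the default threshold of 5 consecutive samples are hypothetical choices for illustration, not part of any Solaris tool.

```shell
#!/bin/sh
# detect_write_stall POOL [LIMIT]
# Reads `zpool iostat POOL INTERVAL` rows on stdin and prints a message once
# LIMIT consecutive samples show 0 write operations while reads are non-zero.
# Samples with 0 reads and 0 writes are treated as idle, not as a stall.
# LIMIT defaults to 5 (an arbitrary, hypothetical threshold).
detect_write_stall() {
    pool=$1
    limit=${2:-5}
    awk -v pool="$pool" -v limit="$limit" '
        # Data rows: pool used avail read-ops write-ops read-bw write-bw
        $1 == pool && NF == 7 {
            sample++
            if ($5 == "0" && $4 != "0") {
                zero++
                if (zero == limit) {
                    printf "write stall: %d consecutive samples with 0 write ops (ending at sample %d)\n", limit, sample
                }
            } else {
                zero = 0
            }
        }
    '
}
```

A possible use is `zpool iostat test 2 | detect_write_stall test`. Header rows and annotation lines are skipped because their first field is not the pool name.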
Stuart Anderson wrote:
> Running Solaris 10 Update 3 on an X4500 I have found that it is possible
> to reproducibly block all writes to a ZFS pool by running "chgrp -R"
> on any large filesystem in that pool. As can be seen in the zpool
> iostat output below, after about 10-sec of running the chgrp command all
> writes to the pool stop, and the pool starts exclusively running a slow
> background task of 1kB reads.
>
> At this point the chgrp -R command is not killable via root kill -9,
> and in fact even the command "halt -d" does not do anything.
>
> In at least one instance I have seen the chgrp command eventually
> respond to the kill command after ~30 minutes, and the pool was
> writable again. However, while waiting for this to happen the
> kernel was generating "No more processes." when simple commands
> were attempted to be run in pre-existing shells, e.g., uname or uptime.
...
> There is nothing in the output of dmesg, svcs -xv, or fmdump associated
> with this event.
>
> Is this a known issue or should I open a new case with Sun?

Log a new case with Sun, and make sure you supply
a crash dump so people who know ZFS can analyze
the issue.

You can use <stop-A> sync, <break> sync, or

reboot -dq

cheers,
James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems

On Tue, Jul 17, 2007 at 02:49:08PM +1000, James C. McPherson wrote:
> Stuart Anderson wrote:
> >Running Solaris 10 Update 3 on an X4500 I have found that it is possible
> >to reproducibly block all writes to a ZFS pool by running "chgrp -R"
> >on any large filesystem in that pool. As can be seen in the zpool
> >iostat output below, after about 10-sec of running the chgrp command all
> >writes to the pool stop, and the pool starts exclusively running a slow
> >background task of 1kB reads.
> >
...
> >
> >Is this a known issue or should I open a new case with Sun?
>
> Log a new case with Sun, and make sure you supply
> a crash dump so people who know ZFS can analyze
> the issue.
>
> You can use <stop-A> sync, <break> sync, or
>
> reboot -dq
>

In previous attempts, neither "halt -d" nor reboot (with no arguments)
was able to shut down the machine. Is "reboot -dq" really a bigger hammer
than "halt -d"?

Sorry to be pedantic, but what is the exact key sequence on a Sun
USB keyboard one should use to force a kernel dump on Solx86?
Since there is no OBP on an X4500, where do I type the sync command?

Thanks.

--
Stuart Anderson  anderson at ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

Stuart Anderson wrote:
> On Tue, Jul 17, 2007 at 02:49:08PM +1000, James C. McPherson wrote:
>> Stuart Anderson wrote:
>>> Running Solaris 10 Update 3 on an X4500 I have found that it is possible
>>> to reproducibly block all writes to a ZFS pool by running "chgrp -R"
>>> on any large filesystem in that pool. As can be seen in the zpool
>>> iostat output below, after about 10-sec of running the chgrp command all
>>> writes to the pool stop, and the pool starts exclusively running a slow
>>> background task of 1kB reads.
>>> Is this a known issue or should I open a new case with Sun?
>> Log a new case with Sun, and make sure you supply
>> a crash dump so people who know ZFS can analyze
>> the issue.
>>
>> You can use <stop-A> sync, <break> sync, or
>>
>> reboot -dq
>>
>
> In previous attempts, neither "halt -d" nor reboot (with no arguments)
> were able to shut down the machine. Is "reboot -dq" really a bigger hammer
> than "halt -d"?

Kindasorta - the q option tells reboot to do its stuff
with all guns blazing, as it were.

> Sorry to be pedantic, but what is the exact key sequence on a Sun
> USB keyboard one should use to force a kernel dump on Solx86?
> Since there is no OBP on an X4500 where do I type the sync command?

First, either boot with "-k" or, shortly after you get to
multiuser, run "mdb -K" on the console (and hit :c <enter>).

Then you can use <F1>A to drop to kmdb, and then run

::systemdump

or

0>rip
:c
:c

or for 32bit mode

0>eip
:c
:c

cheers,
James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems

I found a very nice doc that describes the steps to create a kernel dump:

"The Solaris Operating System on x86 Platforms - Crashdump Analysis
Operating System Internals"
http://opensolaris.org/os/community/documentation/files/book.pdf

-> Section 7.2.2, "Forcing system crashdumps"

Rayson

On 7/17/07, James C. McPherson <James.McPherson at sun.com> wrote:
> first, either boot with "-k" or shortly after you get to
> multiuser, run "mdb -K" on the console (and hit :c <enter>).
>
> Then you can use <F1>A to drop to kmdb, and then run
>
> ::systemdump
>
> or
>
> 0>rip
> :c
> :c
>
> or for 32bit mode
>
> 0>eip
> :c
> :c
>
> cheers,
> James C. McPherson
> --
> Solaris kernel software engineer
> Sun Microsystems

Joshua.Goodall at editure.com
2007-Jul-17 07:49 UTC
[zfs-discuss] chgrp -R hangs all writes to pool
zfs-discuss-bounces at opensolaris.org wrote on 17/07/2007 02:36:06 PM:
> Running Solaris 10 Update 3 on an X4500 I have found that it is possible
> to reproducibly block all writes to a ZFS pool by running "chgrp -R"
> on any large filesystem in that pool. As can be seen in the zpool
> iostat output below, after about 10-sec of running the chgrp command all
> writes to the pool stop, and the pool starts exclusively running a slow
> background task of 1kB reads.

Related or not, I can hang all reads on a nv_65 zpool simply by dd'ing a
5GB server image to a zvol.

- JG

On Tue, Jul 17, 2007 at 03:08:44PM +1000, James C. McPherson wrote:
> >>Log a new case with Sun, and make sure you supply
> >>a crash dump so people who know ZFS can analyze
> >>the issue.
> >>
> >>You can use <stop-A> sync, <break> sync, or
> >>
> >>reboot -dq
> >>

That does appear to have caused a panic/kernel dump. However, I cannot
find the dump image after rebooting to Solaris even though savecore
appears to be configured,

# reboot -dq
Jul 17 12:27:35 x4500gc reboot: rebooted by root

panic[cpu2]/thread=ffffffff9823c460: forced crash dump initiated at user request

fffffe8000e18d60 genunix:kadmin+4b4 ()
fffffe8000e18ec0 genunix:uadmin+93 ()
fffffe8000e18f10 unix:sys_syscall32+101 ()

syncing file systems... 1 1 done
dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
100% done: 3268790 pages dumped, compression ratio 12.39, dump succeeded
rebooting...

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/md/dsk/d2 (swap)
Savecore directory: /var/crash/x4500gc
  Savecore enabled: yes

# ls -laR /var/crash/x4500gc/
/var/crash/x4500gc/:
total 2
drwx------   2 root     root         512 Jul 12 16:26 .
drwxr-xr-x   3 root     root         512 Jul 12 16:26 ..

Thanks.

--
Stuart Anderson  anderson at ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

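The manual check in the message above (read the savecore directory from dumpadm, then look inside it) can be scripted. This is a rough sketch only: both function names are hypothetical, and it assumes that savecore writes its image as vmcore.N (alongside unix.N) and that dumpadm prints the "Savecore directory:" label exactly as quoted in this thread.

```shell
#!/bin/sh
# savecore_dir_from_dumpadm: print the "Savecore directory" value found in
# `dumpadm` output supplied on stdin (label text as quoted in this thread).
savecore_dir_from_dumpadm() {
    awk -F': *' '/^ *Savecore directory:/ { print $2; exit }'
}

# has_saved_dump DIR: succeed if DIR holds at least one vmcore.N file.
# Assumption: savecore names the saved image vmcore.N (with a matching unix.N).
has_saved_dump() {
    for f in "$1"/vmcore.*; do
        # If nothing matches, the pattern stays literal and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}
```

For example, `dir=$(dumpadm | savecore_dir_from_dumpadm)` followed by `has_saved_dump "$dir" || echo "no saved dump in $dir"` would have flagged the empty /var/crash/x4500gc shown above.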
It looks like there is a problem dumping a kernel panic on an X4500.
During the self-induced panic, there were additional syslog messages
that indicate a problem writing to the two disks that make up
/dev/md/dsk/d2 in my case. It is as if the SATA controllers are being
reset during the crash dump.

At any rate I will send this all to Sun support.

Thanks.

Jul 17 12:27:35 x4500gc unix: [ID 836849 kern.notice]
Jul 17 12:27:35 x4500gc ^Mpanic[cpu2]/thread=ffffffff9823c460:
Jul 17 12:27:35 x4500gc genunix: [ID 156897 kern.notice] forced crash dump initiated at user request
Jul 17 12:27:35 x4500gc unix: [ID 100000 kern.notice]
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fffffe8000e18d60 genunix:kadmin+4b4 ()
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fffffe8000e18ec0 genunix:uadmin+93 ()
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fffffe8000e18f10 unix:sys_syscall32+101 ()
Jul 17 12:27:35 x4500gc unix: [ID 100000 kern.notice]
Jul 17 12:27:35 x4500gc genunix: [ID 672855 kern.notice] syncing file systems...
Jul 17 12:27:35 x4500gc genunix: [ID 733762 kern.notice] 1
Jul 17 12:27:37 x4500gc last message repeated 1 time
Jul 17 12:27:38 x4500gc genunix: [ID 904073 kern.notice] done
Jul 17 12:27:39 x4500gc genunix: [ID 111219 kern.notice] dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
Jul 17 12:27:39 x4500gc marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 0:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] device disconnected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] device connected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] SError interrupt
Jul 17 12:27:39 x4500gc marvell88sx: [ID 131198 kern.info] SErrors:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] Recovered communication error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] PHY ready change
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] Disparity error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 4:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] device disconnected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] device connected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] SError interrupt
Jul 17 12:27:39 x4500gc marvell88sx: [ID 131198 kern.info] SErrors:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] Recovered communication error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] PHY ready change
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info] Disparity error
Jul 17 12:28:39 x4500gc genunix: [ID 409368 kern.notice] ^M100% done: 3268790 pages dumped, compression ratio 12.39,
Jul 17 12:28:39 x4500gc genunix: [ID 851671 kern.notice] dump succeeded
Jul 17 12:30:38 x4500gc genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_125101-10 64-bit
Jul 17 12:30:38 x4500gc genunix: [ID 943907 kern.notice] Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

On Tue, Jul 17, 2007 at 12:40:16PM -0700, Stuart Anderson wrote:
> That does appear to have caused a panic/kernel dump. However, I cannot
> find the dump image after rebooting to Solaris even though savecore
> appears to be configured,

--
Stuart Anderson  anderson at ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

Hello Stuart,

It looks like the crash dump went OK. After the system has booted up
again, check the logs for a warning that there is not enough space in
/var/crash/x4500gc to save the crash dump. When using ZFS on a file
server, crash dumps will usually be almost the size of the server's
memory...

Alternatively, just run 'savecore path_to_dir', where path_to_dir is a
path to a directory with enough free space. This of course assumes you
haven't touched the swap device up to this time.

--
Best regards,
Robert Milkowski  mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com

Tuesday, July 17, 2007, 9:04:55 PM, you wrote:

SA> It looks like there is a problem dumping a kernel panic on an X4500.
SA> During the self induced panic, there were additional syslog messages
SA> that indicate a problem writing to the two disks that make up
SA> /dev/md/dsk/d2 in my case. It is as if the SATA controllers are being
SA> reset during the crash dump.

SA> At any rate I will send this all to Sun support.

SA> Thanks.

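To put numbers on the free-space concern, the figures in the panic message quoted in this thread (3268790 pages dumped, compression ratio 12.39) can be turned into sizes. This is a minimal sketch under two assumptions that are not stated in the thread: a 4096-byte base page on this x64 system, and a saved image that is stored uncompressed by savecore, so the directory needs roughly the uncompressed size. The function name estimate_dump_size is hypothetical.

```shell
#!/bin/sh
# estimate_dump_size PAGES RATIO [PAGESIZE]
# PAGES and RATIO come from the "pages dumped, compression ratio" panic line.
# PAGESIZE defaults to 4096 bytes (assumed x86/x64 base page size).
estimate_dump_size() {
    awk -v pages="$1" -v ratio="$2" -v pagesize="${3:-4096}" 'BEGIN {
        gib = 1024 * 1024 * 1024
        raw = pages * pagesize            # bytes of kernel memory dumped
        printf "uncompressed: %.2f GiB\n", raw / gib
        printf "on dump device: %.2f GiB\n", raw / ratio / gib
    }'
}
```

With the values from this thread, `estimate_dump_size 3268790 12.39` reports about 12.47 GiB uncompressed and about 1.01 GiB on the dump device, which fits the remark that dumps on a ZFS file server approach the size of memory. Comparing the first figure against `df -k` output for the target directory before running savecore would be the natural next step.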
On Mon, Jul 16, 2007 at 09:36:06PM -0700, Stuart Anderson wrote:
> Running Solaris 10 Update 3 on an X4500 I have found that it is possible
> to reproducibly block all writes to a ZFS pool by running "chgrp -R"
> on any large filesystem in that pool. As can be seen in the zpool
> iostat output below, after about 10-sec of running the chgrp command all
> writes to the pool stop, and the pool starts exclusively running a slow
> background task of 1kB reads.
>
> At this point the chgrp -R command is not killable via root kill -9,
> and in fact even the command "halt -d" does not do anything.
>

For posterity, this appears to have been fixed in S10U4; at least, I am
unable to reproduce the problem that was easy to trigger with S10U3.

Thanks.

--
Stuart Anderson  anderson at ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson