> My current zfs setup looks like this:
>
>   homepool                   3.63G  34.1G      8K  /homepool
>   homepool/db                61.6M  34.1G   8.50K  /var/db
>   homepool/db/pgsql          61.5M  34.1G   61.5M  /var/db/pgsql
>   homepool/home              3.57G  34.1G   10.0K  /users
>   homepool/home/carrie          8K  34.1G      8K  /users/carrie
>   homepool/home/posssumhaw      8K  34.1G      8K  /users/posssumhaw
>   homepool/home/weekleyj     3.57G  34.1G   3.57G  /users/weekleyj
>
>   NAME       SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
>   homepool   38.0G  3.63G  34.4G   9%  ONLINE  -
>
> I was copying over the b27 boot cd via "find . -print | cpio -pdmv
> /home/weekleyj/CD" and terminated it with a ^C. At that point the
> machine panicked with all sorts of ZFS info, and dropped core in
> /var/crash/fugly. So far, I haven't been able to reproduce.

Relevant info from /var/adm/messages:

Nov 19 09:09:34 fugly genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on /dev/dsk/c0d1 off 8504e8200: zio d53a5600 [L0 unallocated] vdev=0 offset=8500e8200 size=200L/200P/200A fletcher4 uncompressed LE contiguous birth=41837 fill=0 cksum=bcd3026a:5d6fdb3032:174cfede75e7:3e7752513e700): error 6
Nov 19 09:09:34 fugly unix: [ID 100000 kern.notice]
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5c7c zfs:zio_done+199 (d53a5600)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5c9c zfs:zio_next_stage+73 (d53a5600)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5cbc zfs:zio_wait_for_children+58 (d53a5600, 13, d53a5)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5cdc zfs:zio_wait_children_done+18 (d53a5600)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5cf8 zfs:zio_next_stage+73 (d53a5600)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5d2c zfs:zio_vdev_io_assess+c6 (d53a5600)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5d40 zfs:zio_next_stage+73 (d53a5600)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5d54 zfs:vdev_disk_io_done+2b (d53a5600, 0, d51e5d)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5d64 zfs:vdev_io_done+18 (d53a5600, fe964d5d,)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5d78 zfs:zio_vdev_io_done+e (d53a5600, 0, 0, 0, )
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5dc8 genunix:taskq_thread+16c (d45d51f0, 0)
Nov 19 09:09:34 fugly genunix: [ID 353471 kern.notice] d51e5dd8 unix:thread_start+8 ()
Nov 19 09:09:34 fugly unix: [ID 100000 kern.notice]
Nov 19 09:09:34 fugly genunix: [ID 672855 kern.notice] syncing file systems...
Nov 19 09:09:34 fugly genunix: [ID 904073 kern.notice] done
Nov 19 09:09:35 fugly genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0d0s1, offset 123863040, content: kernel
Nov 19 09:09:56 fugly genunix: [ID 409368 kern.notice] 100% done: 69052 pages dumped, compression ratio 1.58,
Nov 19 09:09:56 fugly genunix: [ID 851671 kern.notice] dump succeeded

I've got the core if anyone wants it.

Thanks,
John
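For anyone who grabs the dump, the panic stack and message buffer can be pulled back out of it with mdb. A minimal sketch, assuming savecore wrote unix.0/vmcore.0 into /var/crash/fugly (the numeric suffix may differ on your box):

  # cd /var/crash/fugly
  # mdb unix.0 vmcore.0
  > ::status     # panic string and dump summary
  > ::stack      # stack of the panicking thread
  > ::msgbuf     # kernel messages leading up to the panic
  > $q           # quit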
> I've got the core if anyone wants it.

Yes, I want it. Just let me know where I can get it. Also, any
information about the hardware would be helpful. This wouldn't be a
Tyan 2885 by any chance, would it?

Thanks,
Jeff
Hi Jeff,

Thanks for the reply. I'll tar them up and put them on my server for
you, and email you the account info separately.

The motherboard is an old Asus K7M, Athlon 750MHz, 384 MB RAM,
onboard ATA controller:

  Primary master:   Seagate 40G ATA disk
  Secondary master: Maxtor 40G ATA
  Secondary slave:  12x Lite-On DVD-RW

Ancient hardware, but Solaris is quite usable on this stuff!
Did you receive the login info to retrieve the core, Jeff?
John Weekley wrote:
> Did you receive the login info to retrieve the core, Jeff?

I've had the same thing happen to me; same stack trace. By any chance
were you plugging or unplugging any USB/FireWire disks when this
happened?

- Bart

--
Bart Smaalders                 Solaris Kernel Performance
barts at cyber.eng.sun.com     http://blogs.sun.com/barts
Nope, these are all internal ATA disks. I just killed a cpio from CD
to a ZFS filesystem with a ^C.
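For reference, the reproduction attempt boils down to this (the cpio
invocation is from the original report; the CD mount point is an
assumption):

  # cd /cdrom/cdrom0
  # find . -print | cpio -pdmv /home/weekleyj/CD
  ^C    # interrupt the copy mid-stream; the panic followed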
Ugh. I've managed to hit this issue twice today on oceana.central. I
hit it on 11/15 Nevada bits, so I upgraded to last night's build and
hit it again. The core is NFS exported...

(Yes, this is the flaky machine that is producing sporadic
uncorrectable errors on my raidz. But it's still rude of the box to
just panic instead of retrying.)
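An aside for anyone chasing similar symptoms: the per-vdev error
counters, and the FMA telemetry behind them, are visible without
waiting for a panic. A sketch (pool name here is hypothetical):

  # zpool status -v tank     # READ/WRITE/CKSUM error counts per vdev
  # fmdump -eV               # raw FMA error-report telemetry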
> Ugh. I've managed to hit this issue twice today on oceana.central.
> [...] it's still rude of the box to just panic instead of retrying.

Actually, ZFS does retry. The problem is that x86 disk I/O, across
the board, is very broken in 27a. For more details see these bugs:

  6354389 cmlb_partinfo can return bogus info for apparently any device
  6205971 ON bits shouldn't be using obsoleted DDI DMA interfaces

All it takes is either low memory or a hot-plug event to cause the
disk geometry information to just *disappear* and render the device
inaccessible. There's really nothing we can do about it in ZFS.
Both of these are P1 driver bugs with fixes in progress.

Jeff
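For anyone decoding the original panic message: the "error 6" is
ENXIO, which is consistent with the device vanishing underneath ZFS
as Jeff describes:

  $ grep -w ENXIO /usr/include/sys/errno.h
  #define ENXIO   6       /* No such device or address */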
I've offered the dump, but so far there haven't been any takers...
On Thu, Dec 01, 2005 at 10:15:43PM -0800, Jeff Bonwick wrote:
| Actually, ZFS does retry. The problem is that x86 disk I/O, across
| the board, is very broken in 27a. For more details see here:
|
|   6354389 cmlb_partinfo can return bogus info for apparently any device
|   6205971 ON bits shouldn't be using obsoleted DDI DMA interfaces
|
| All it takes is either low memory or a hot-plug event to cause the
| disk geometry information to just *disappear* and render the device
| inaccessible. There's really nothing we can do about it in ZFS.
| Both of these are P1 driver bugs with fixes in progress.

Jeff, thanks for the info; this looks like exactly my problem.
Certainly no hotplug here: I was burning CDs with the images on ZFS
using an ATA burner. This may just be a failure mode nobody else has
seen yet.

--
Eric Lowe   Solaris Kernel Development   Austin, Texas
Sun Microsystems. We make the net work.  x64155/+1(512)401-1155
Why are you asking about a Tyan 2885? Just asking because I have a
2881 (Thunder K8SR) and I experienced some similar problems (this
board complains about the 131 errata of the Opterons, but the latest
BIOS does not have a fix for that).

System setup:

  Tyan Thunder K8SR
  2 x Opteron 270
  4 GB RAM
  1 SCSI (73GB) boot & system disk (Adaptec 29160 SCSI adapter)
  4 x WD250 SATA (as raidz pool)

The first installation went quite well until I mounted a ZFS
filesystem on /export while that mount point was already occupied by
a SCSI partition. It complained, but the system ran until the next
reboot. Then it panicked constantly, complaining that it cannot mount
the ZFS filesystem on a directory which is not empty. I reinstalled,
because I was just playing around anyway.
(Note: a zone was configured on a ZFS filesystem.)

The next install went fine. Once everything was installed I started
copying my old fileserver stuff to the new raidz pool. After 4 hours
I came back home and the machine was rebooting constantly and
complaining about bad raidz blocks...
(Note: I was using a ZFS filesystem for the 2 zones I configured.)

Any ideas?
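One way out of the busy-mountpoint trap, sketched under the
assumption that the box still boots to single-user mode and the
dataset is called pool/export (your names will differ): ZFS refuses
to mount over a non-empty directory, so free the directory first.

  # zfs set mountpoint=none pool/export    # stop ZFS fighting over /export
  # umount /export                         # detach the slice mounted there
  # (move any leftover contents of /export aside so it is empty)
  # zfs set mountpoint=/export pool/export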
It's me again...

I installed Solaris 10 (01/06) and experienced the same problems
(total freezes), but it could be the X server, as I found out in
various posts on Yahoo groups. I stopped working on the desktop and
disabled dtlogin... no freezes so far.

Jens
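For reference, dtlogin can be kept off across reboots in either of
these ways (the SMF service name is my recollection for Solaris 10
and may need checking on your build):

  # /usr/dt/bin/dtconfig -d      # classic CDE way; takes effect at next boot
  # svcadm disable cde-login     # SMF way (svc:/application/graphical-login/cde-login)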