zfs discuss - Jan 2007 - hard-hang on snapshot rename

[Initial version of this message originally sent to zfs-interest by
mistake.  Sorry if this appears anywhere as a duplicate.]

I was noodling around with creating a backup script for my home
system, and I ran into a problem that I''m having a little trouble
diagnosing.  Has anyone seen anything like this or have any debug
advice?

I did a "zfs create -r" to set a snapshot on all of the members of a
given pool.  Later, for reasons that are probably obscure, I wanted to
rename that snapshot.  There''s no "zfs rename -r" function,
so I tried
to write a crude one on my own:

zfs list -rHo name -t filesystem pool |
while read name; do
	zfs rename $name at foo $name at bar
done

The results were disappointing.  The system was extremely busy for a
moment and then went completely catatonic.  Most network traffic
appeared to stop, though I _think_ network driver interrupts were
still working.  The keyboard and mouse (traditional PS/2 types; not
USB) went dead -- not even keyboard lights were working (nothing from
Caps Lock).  The disk light stopped flashing and went dark.  The CPU
temperature started to climb (as measured by an external sensor).  No
messages were written to /var/adm/messages or dmesg on reboot.

The system turned into an increasingly warm brick.  As all of my
inputs to the system were gone, I really had no good way immediately
available to debug the problem.  Thinking this was just a fluke or
perhaps something induced by hardware, I shut everything down, cooled
off, and tried again.  Three times.  The same thing happened each
time.

System details:

  - snv_55

  - Tyan 2885 motherboard with 4GB RAM (four 1GB modules) and one
    Opteron 246 (model 5 step 8).

  - AMI BIOS version 080010, dated 06/14/2005.  No tweaks applied,
    system is always on; no power management.

  - Silicon Image 3114 SATA controller configured for legacy (not
    RAID) mode.

  - Three SATA disks in the system, no IDE as they''ve gone to the
    great bit-bucket in the sky.  The SATA drives are one WDC
    WD740GD-32F (not part of this ZFS pool), and a pair of
    ST3250623NS.

  - The two Seagate drives are partitioned like this:

  0       root    wm       3 -   655        5.00GB    (653/0/0)    10490445
  1       swap    wm     656 -   916        2.00GB    (261/0/0)     4192965
  2     backup    wu       0 - 30397      232.86GB    (30398/0/0) 488343870
  3   reserved    wm     917 -   917        7.84MB    (1/0/0)         16065
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wu       0                0         (0/0/0)             0
  7       home    wm     918 - 30397      225.83GB    (29480/0/0) 473596200
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wm       1 -     2       15.69MB    (2/0/0)         32130

  - For both disks: slice 0 is for an SVM mirrored root, slice 1 has
    swap, slice 3 has the SVM metadata, and slice 7 is in the ZFS pool
    named "pool" as a mirror.  No, I''m not using whole-disk
or EFI.

  - Zpool status:

  pool: pool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4d0s7  ONLINE       0     0     0
            c4d1s7  ONLINE       0     0     0

  - ''zfs list -rt filesystem pool | wc -l'' says 37.

  - Iostat -E doesn''t show any errors of any kind on the drives.

  - I read through CR 6421427, but that seems to be SPARC-only.

Next step will probably be to set the ''snooping'' flag and
maybe hack
the bge driver to do an abort_sequence_enter() call on a magic packet
so that I can wrest control back.  Before I do something that drastic,
does anyone else have ideas?

-- 
James Carlson, Solaris Networking              <james.d.carlson at
sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

zfs discuss - Jan 2007 - hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

[zfs-discuss] hard-hang on snapshot rename

Possibly Parallel Threads