Hi Marion,
I'm not the right person to analyze your panic stack, but a quick
search suggests the "page_sub: bad arg(s): pp" panic string may be
associated with a faulty CPU or a page-locking problem.
I would recommend running CPU/memory diagnostics on this system.
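
If it helps, here is a rough sketch of where I would usually start on
Solaris; the exact tools available will depend on your platform and on
what is installed, so treat these as suggestions rather than a recipe:

   fmadm faulty        # any faults already diagnosed by the fault manager
   fmdump -eV | more   # raw error telemetry (CPU/memory ECC events, etc.)
   psrinfo -v          # CPU status
   prtdiag -v          # platform/memory configuration

SunVTS, if it is installed, can exercise CPU and memory more thoroughly.
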
Thanks,
Cindy
On 09/02/10 20:31, Marion Hakanson wrote:
> Folks,
>
> Has anyone seen a panic traceback like the following? This is Solaris-10u7
> on a Thumper, acting as an NFS server. The machine had been up for nearly a
> year when I added a dataset to an existing pool, set compression=on for the
> first time on this system, loaded some data in there (via "rsync"), and
> then mounted it on the NFS client.
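>
> For reference, the sequence was roughly the following (the pool and
> dataset names here are placeholders, not the real ones):
>
>    zfs create tank/newdata
>    zfs set compression=on tank/newdata     # first use of compression on this system
>    rsync -a /some/source/ /tank/newdata/   # initial data load
>    zfs set sharenfs=on tank/newdata        # then mounted from the NFS client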
>
> The first data was written by the client itself in a 10pm cron-job, and
> the system crashed at 10:02pm as below:
>
> panic[cpu2]/thread=fffffe8000f5cc60: page_sub: bad arg(s): pp
> ffffffff872b5610, *ppp 0
>
> fffffe8000f5c470 unix:mutex_exit_critical_size+20219 ()
> fffffe8000f5c4b0 unix:page_list_sub_pages+161 ()
> fffffe8000f5c510 unix:page_claim_contig_pages+190 ()
> fffffe8000f5c600 unix:page_geti_contig_pages+44b ()
> fffffe8000f5c660 unix:page_get_contig_pages+c2 ()
> fffffe8000f5c6f0 unix:page_get_freelist+1a4 ()
> fffffe8000f5c760 unix:page_create_get_something+95 ()
> fffffe8000f5c7f0 unix:page_create_va+2a1 ()
> fffffe8000f5c850 unix:segkmem_page_create+72 ()
> fffffe8000f5c8b0 unix:segkmem_xalloc+60 ()
> fffffe8000f5c8e0 unix:segkmem_alloc_vn+8a ()
> fffffe8000f5c8f0 unix:segkmem_alloc+10 ()
> fffffe8000f5c9c0 genunix:vmem_xalloc+315 ()
> fffffe8000f5ca20 genunix:vmem_alloc+155 ()
> fffffe8000f5ca90 genunix:kmem_slab_create+77 ()
> fffffe8000f5cac0 genunix:kmem_slab_alloc+107 ()
> fffffe8000f5caf0 genunix:kmem_cache_alloc+e9 ()
> fffffe8000f5cb00 zfs:zio_buf_alloc+1d ()
> fffffe8000f5cb50 zfs:zio_compress_data+ba ()
> fffffe8000f5cba0 zfs:zio_write_compress+78 ()
> fffffe8000f5cbc0 zfs:zio_execute+60 ()
> fffffe8000f5cc40 genunix:taskq_thread+bc ()
> fffffe8000f5cc50 unix:thread_start+8 ()
>
> syncing file systems... done
> . . .
>
> Unencumbered by more than a gut feeling, I disabled compression on
> the dataset, and we've gotten through two nightly runs of the same
> NFS client job without crashing, but of course we would technically
> have to wait for nearly a year before we've exactly replicated the
> original situation (:-).
>
> Unfortunately the dump slice was slightly too small; we were just short
> of enough space to capture the whole 10GB crash dump. I did get savecore
> to write something out, and I uploaded it to the Oracle support site, but it
> gives "scat" too much indigestion to be useful to the engineer I'm working
> with. They have not found any matching bugs so far, so I thought I'd ask a
> slightly wider audience here.
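>
> (In hindsight the dump device could have been checked and repointed at a
> larger slice or zvol ahead of time; something along these lines, with the
> device name purely illustrative:
>
>    dumpadm                                  # show current dump device and savecore directory
>    dumpadm -d /dev/zvol/dsk/rpool/bigdump   # repoint at a larger dedicated device
> )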
>
> Thanks and regards,
>
> Marion
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss