Philip Manuel wrote:
> We are running kernel 2.6.18-164.6.1.el5 and exporting three AoE-provided
> ext4 directories. For a couple of weeks a small number of users used the
> system with no issues; today we added 7 users, and the system crashed and
> has not performed correctly since.
>
> Nov 23 10:20:03 sulphur rpc.idmapd[5199]: nfsdcb: id '-2' too big!
> Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: Setting version failed:
> errno 16 (Device or resource busy)
> Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: unable to bind UPD socket:
> errno 98 (Address already in use)
> Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy():
> cache `nfsd4_files': Can't free all objects
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645efd>]
> :nfsd:nfsd4_free_slab+0x11/0x4d
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645f55>]
> :nfsd:nfsd4_free_slabs+0x1c/0x33
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>]
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>]
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>]
> :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: BUG: warning at
> fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G )
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645f55>]
> :nfsd:nfsd4_free_slabs+0x1c/0x33
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>]
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>]
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>]
> :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy():
> cache `nfsd4_delegations': Can't free all objects
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645efd>]
> :nfsd:nfsd4_free_slab+0x11/0x4d
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>]
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>]
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>]
> :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: BUG: warning at
> fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G )
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>]
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>]
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>]
> :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>]
> :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: nfsd: last server has
> exited
> Nov 23 10:42:26 sulphur kernel: nfsd: unexporting all
> filesystems
> Nov 23 10:42:44 sulphur kernel: kmem_cache_create: duplicate cache
> nfsd4_files
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88646f29>]
> :nfsd:nfs4_state_start+0x52/0x18f
> Nov 23 10:42:44 sulphur kernel: [<ffffffff886303ae>]
> :nfsd:nfsd_svc+0x6c/0x1e9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630f8e>]
> :nfsd:write_threads+0x0/0xa9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630ffd>]
> :nfsd:write_threads+0x6f/0xa9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630f8e>]
> :nfsd:write_threads+0x0/0xa9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630d59>]
> :nfsd:nfsctl_transaction_write+0x42/0x77
> Nov 23 10:42:44 sulphur nfsd[27369]: nfssvc: Cannot allocate memory
> Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: Setting version failed:
> errno 16 (Device or resource busy)
>
> Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: unable to bind UPD socket:
> errno 98 (Address already in use)
>
> The log above shows the original problem, followed by my attempts to
> restart nfsd; eventually I had to reboot the server. Since then it has
> been behaving oddly: nfsd runs for about 5 minutes and then stops, and
> after a restart it runs for a while and then stops again.
> Nov 23 11:04:46 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as
> the NFSv4 state recovery directory
> Nov 23 11:17:02 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:29:01 sulphur kernel: nfsd: last server has exited
> Nov 23 11:29:01 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:29:08 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as
> the NFSv4 state recovery directory
> Nov 23 11:29:08 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:32:03 sulphur kernel: nfsd: last server has exited
> Nov 23 11:32:03 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:32:34 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as
> the NFSv4 state recovery directory
> Nov 23 11:32:34 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:41:58 sulphur kernel: nfsd: last server has exited
> Nov 23 11:41:58 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:42:03 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as
> the NFSv4 state recovery directory
> Nov 23 11:42:03 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:47:20 sulphur kernel: nfsd: last server has exited
> Nov 23 11:47:20 sulphur kernel: nfsd: unexporting all filesystems
>
> I haven't found any reports of issues involving the "nfsdcb: id '-2' too
> big!" message, but equally I don't know what it means.
>
> On the console we are seeing many of these messages:
>
> kernel: NFSD: preprocess_seqid_op: magic stateid!
>
> Again, I don't know what this means or what its implications are.
>
> Any suggestions would be welcome.
>
> At the moment we are up and running, with two users migrated back to the
> old servers.
>
> Thanks
>
> Phil.
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>
Just a quick update: 4 hours later, the message "kernel: NFSD:
preprocess_seqid_op: magic stateid!" has stopped. Now to find out why.
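To put numbers on the "runs for a while and then stops" pattern, here is a minimal sketch, assuming the syslog excerpt is available one entry per line (as in /var/log/messages, not the wrapped copy in this mail). It pairs each "NFSD: Using /var/lib/nfs/v4recovery" start line with the next "nfsd: last server has exited" line and reports how long nfsd stayed up. The script name, the function name `nfsd_uptimes`, and the placeholder `year` are hypothetical; syslog does not record the year, so one is assumed only to make the timestamps parseable.

```python
import re
import sys
from datetime import datetime

# Marker strings copied from the syslog excerpt in this thread.
START = "NFSD: Using /var/lib/nfs/v4recovery"
STOP = "nfsd: last server has exited"
# Classic syslog timestamp, e.g. "Nov 23 11:04:46" (no year is logged).
TS = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def nfsd_uptimes(lines, year=2000):
    """Pair each nfsd start line with the next exit line.

    Returns a list of (start, stop, seconds) tuples. `year` is an assumed
    placeholder because syslog omits it; it does not change durations
    between two timestamps in the same year.
    """
    runs = []
    started = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip lines without a leading syslog timestamp
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if START in line:
            started = ts
        elif STOP in line and started is not None:
            runs.append((started, ts, (ts - started).total_seconds()))
            started = None
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python nfsd_uptimes.py /var/log/messages
    with open(sys.argv[1]) as f:
        for start, stop, secs in nfsd_uptimes(f):
            print(f"{start:%H:%M:%S} -> {stop:%H:%M:%S}  {secs / 60:.1f} min")
```

Applied to the post-reboot lines quoted above, the pairs come out at roughly 24, 3, 9 and 5 minutes, so the run length varies rather than being a fixed 5 minutes.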
Thanks