Marcis Lielturks
2011-Feb-10 14:57 UTC
[zfs-discuss] smbd becomes unresponsive on snv_151a
Hi! We are having trouble with CIFS. We have two servers (n20 and n30), configured identically and running Solaris 11 Express snv_151a. From time to time the CIFS service on n20 locks up: it becomes unresponsive and cannot be restarted, and even "kill -9 <pid_of_smbd>" can't kill it. Only a system reboot resolves the problem. Both servers had deduplication enabled on their ZFS datasets for some time; on n20 it was turned off only recently, so n20 still contains deduped data. The servers periodically create and rotate ZFS snapshots on the shared filesystems. We can't find any definite evidence of what is causing the trouble. Can somebody advise on how to debug this? We can provide more information if necessary, or even arrange remote access if there is interest. Thanks! -- This message posted from opensolaris.org
Are you running CIFS with AD integration, or is it functioning in workgroup mode? Do the lockups occur only when you transfer a lot of data, or will it also lock up when no machine is working on the CIFS share? How long is "time to time"?
Marcis Lielturks
2011-Feb-11 09:36 UTC
[zfs-discuss] smbd becomes unresponsive on snv_151a
It is integrated with AD, which has a lot of domain trusts. Lockups happen during working hours, when CIFS is quite busy with a lot of users (~216) and open files (~1723). I don't know the I/O load at the moment a lockup happens, but the server is usually not doing heavy I/O. "Time to time" varies: the first lockup happened ~13 days after the OS was installed, then the server ran for another 16 days and hung again. The third lockup (the last so far) happened 5 days after the second. cifs-discuss suggested that this might be "6996574 smbd intermittently hangs", which seems very likely. Thanks!
Roy Sigurd Karlsbakk
2011-Feb-12 12:37 UTC
[zfs-discuss] smbd becomes unresponsive on snv_151a
> Servers had deduplication turned on ZFS datasets for some time and on
> n20 it was only recently turned off and still contains deduped data.
> Servers periodically create and rotate ZFS snapshots on shared
> filesystems. We can't get any definite evidence on what is causing the
> trouble. Can somebody advise on how to debug this? We can provide more
> information if necessary or even organize remote access if there is
> interest in it.

How large is the deduped dataset? And how much memory/L2ARC do you have in the system?

Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
Marcis Lielturks
2011-Feb-15 10:25 UTC
[zfs-discuss] smbd becomes unresponsive on snv_151a
The deduped dataset is 2.1TB; there is no L2ARC and the server has 64GB of RAM. We have for now ruled out the possibility that this is related to dedup or ZFS, and are working to get a fix for "6996574 smbd intermittently hangs". Thanks!
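
For context on the dataset-size and memory question above, here is a rough back-of-envelope sketch of how much RAM the dedup table (DDT) might need for a dataset of this size. It is not from the thread: the figure of about 320 bytes of memory per DDT entry is a commonly quoted rule of thumb, the average block sizes are guesses, and "2.1TB" is read as TiB of unique data. Treat the result as an order-of-magnitude estimate only, not a diagnosis.

```python
# Back-of-envelope estimate of in-memory dedup table (DDT) size.
# ASSUMPTIONS (not from the thread):
#   - roughly 320 bytes of RAM per DDT entry (a commonly quoted rule of thumb)
#   - the average block sizes tried below
#   - every block is unique, i.e. one DDT entry per block (worst case)

BYTES_PER_DDT_ENTRY = 320  # assumed rule of thumb, not a measured value
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def estimate_ddt_bytes(dataset_bytes, avg_block_bytes,
                       bytes_per_entry=BYTES_PER_DDT_ENTRY):
    """Return the estimated DDT size in bytes for the given dataset size."""
    entries = dataset_bytes // avg_block_bytes  # one entry per unique block
    return entries * bytes_per_entry


if __name__ == "__main__":
    dataset = int(2.1 * TIB)  # "2.1TB" from the thread, read here as TiB
    for block_kib in (128, 64, 8):  # hypothetical average block sizes
        ddt = estimate_ddt_bytes(dataset, block_kib * KIB)
        print(f"avg block {block_kib:>3} KiB -> DDT ~ {ddt / GIB:.1f} GiB")
```

The estimate is very sensitive to the average block size, which is why the sketch tries several values. On a real pool, the actual block counts should be measured rather than assumed.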