I am having a problem that I am hoping someone might have some insight into. I am running an X4500 with Solaris 5.10 and a ZFS pool named nasPool. I am also running NetBackup on the box as well, server and client all in one. I have had this up and running for some time now, and recently ran into a problem where NetBackup, running as root, cannot write to a directory named backup, or its subdirectories, on the ZFS filesystem. The directory is owned by backup:backup and at this point also has permissions of 777 (I set that while trying to figure out this issue), yet NetBackup still cannot write to those directories.

Any insight into this would be greatly appreciated. Thank you in advance.

--
Matthew Arguin
Production Support
Jackpotrewards, Inc.
275 Grove St
Newton, MA 02466
617-795-2850 x 2325
www.jackpotrewards.com
Actually, the issue seems to be more than what I described below. I cannot issue any zfs or zpool commands apart from `zpool status -x`, which reports a 'healthy' status. If I run a plain `zpool status`, I get the following:

    root@ec1-nas1# zpool status
      pool: nasPool
     state: ONLINE
     scrub: none requested

and then it freezes there. This used to return fairly quickly. Where can I go to see what might be causing this? I see nothing in the message logs.

-thx
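One way to confirm that `zpool status` is genuinely hanging rather than just slow is to run it under a deadline. Solaris 10 has no timeout(1), so the sketch below uses a background watchdog in portable sh; this is a diagnostic aid I am suggesting, not anything from the ZFS tools themselves, and the `pstack` hint assumes you then inspect the stuck process by hand.

```shell
# run_with_deadline SECS CMD ARGS... : run CMD; if it has not finished
# after SECS seconds, kill it and report. Diagnostic sketch only.
run_with_deadline() {
    _t=$1; shift
    "$@" &
    _pid=$!
    # Watchdog: kill the command if it outlives the deadline. Killing
    # the watchdog later leaves its sleep orphaned, which is harmless.
    ( sleep "$_t"; kill "$_pid" 2>/dev/null ) &
    _watch=$!
    wait "$_pid"
    _rc=$?
    kill "$_watch" 2>/dev/null
    if [ "$_rc" -gt 128 ]; then
        echo "killed after ${_t}s (likely hung): $*" >&2
        echo "next time, inspect the stuck process with: pstack <pid>" >&2
    fi
    return "$_rc"
}

# e.g. on the box in question: run_with_deadline 30 zpool status nasPool
```

If the command gets killed, running `pstack` (or `truss -p`) against the hung pid before killing it usually shows whether it is blocked in the kernel waiting on device I/O.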
Hi Matthew,

I've seen similar behavior with X4500s on both update 4 and update 5. In both cases it was a failed disk, and any command that accessed a disk in the zpool triggered a hang.

I started with dmesg, then /var/adm/messages. In some cases I was able to run `hd` and see where the disk enumeration would stop; the next logical disk in the enumeration is likely your bad disk. Assuming you have your zpool in a RAID or mirror configuration, remove the drive that didn't show up in the `hd` output, then try to run a zpool command. You might also simply open the disk cover on the chassis and see if you have a blue eject LED lit, or a yellow fault LED.

..Matt Snow
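The first step Matt describes (checking dmesg and /var/adm/messages) is easy to script. A minimal sketch: grep a syslog-style file for common disk-error signatures. The pattern list here is my own assumption, not an official set, and on Solaris 10 you may need /usr/xpg4/bin/grep for the -E flag.

```shell
# scan_disk_errors FILE : print lines that look like disk trouble.
# On the X4500 you would point this at /var/adm/messages; the patterns
# (retryable/fatal errors, SCSI timeouts, unresponsive disks) are
# illustrative, not exhaustive.
scan_disk_errors() {
    grep -iE 'retryable|fatal|disk not responding|device error|scsi.*(error|timeout)' "$1"
}

# e.g. scan_disk_errors /var/adm/messages | tail -20
```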
Thanks for the reply, Matt. After some more digging around, here is what I found. I have about a half dozen rsync jobs that back up various data from various locations. `ps -ef | grep rsync` showed each of these jobs listed many times, so for some reason they were not finishing. I think this is potentially a cause rather than a symptom, but that is a bit of a guess right now. Anyway, a power cycle of the box resolves the access issue, but it creeps back quickly. I have disabled all my cron jobs and am testing them one by one; I can now issue the `zpool status` command and everything reports healthy.

What is the 'hd' command you speak of? Also, I do not see any drives in error via the ILOM interface; all green.

Thanks,
Matthew
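Regardless of the root cause, cron jobs that pile up because the previous run never finished can be fenced off with a lock. A sketch, using `mkdir` (which is atomic) since flock(1) is not available on Solaris 10; the job names and paths below are hypothetical, not anything rsync or NetBackup mandates.

```shell
# with_lock NAME CMD ARGS... : run CMD only if no other run of NAME
# currently holds the lock; otherwise skip this run and log it.
# A crash between mkdir and rmdir leaves a stale lock directory that
# has to be removed by hand.
with_lock() {
    _name=$1; shift
    _lock="/tmp/${_name}.lock"
    if mkdir "$_lock" 2>/dev/null; then
        "$@"
        _rc=$?
        rmdir "$_lock"
        return "$_rc"
    fi
    echo "with_lock: ${_name} is already running; skipping" >&2
    return 0
}

# crontab entry (hypothetical paths):
#   0 * * * * with_lock nightly-rsync rsync -a /data/ backuphost:/backup/
```

With this in place a slow or hung rsync simply causes later runs to skip, instead of stacking up dozens of copies in `ps -ef`.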
`hd` is a utility for the X4500/X4540 systems that shows you a bit more than cfgadm. Actually, a lot more. :)

http://docs.sun.com/source/819-4363-11/hd_util10.html

I use `hd -R` to view the S.M.A.R.T. values to determine if there is a bad disk. Run `hd -r` to view the long format and see the SMART IDs. IDs 5, 196, 197, 198, and 199 should be zero; if any of them is anything but zero, I replace the disk.

Download this file from Sun:

http://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/VerifyItem-Start/X4500_Tools_And_Drivers_solaris_42606a.tar.bz2?BundledLineItemUUID=ZldIBe.pxW0AAAEfRnkk4z3e&OrderID=9ENIBe.pyJEAAAEf.Xgk4z3e&ProductID=ldpIBe.o9ykAAAEctThSCJEY&FileName=/X4500_Tools_And_Drivers_solaris_42606a.tar.bz2

Extract and install solaris/tools/hdtool/SUNWhd-1.07.pkg.

..Matt
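The check Matt describes (IDs 5, 196, 197, 198, 199 should all be zero) can be automated. A hedged sketch: the function below reads simple "<id> <value>" pairs on stdin, which is an input format I am assuming for illustration; real `hd -r` output has its own layout and would need its own field extraction first.

```shell
# check_smart : read "<id> <value>" pairs on stdin and flag any of the
# critical SMART counters (5 reallocated sectors, 196-199 pending/
# offline/CRC counters) that are nonzero. Exits 1 if any disk-replace
# condition was found, 0 otherwise.
check_smart() {
    awk '
        $1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 || $1 == 199 {
            if ($2 + 0 != 0) { print "SMART ID " $1 " = " $2 ": replace this disk"; bad = 1 }
        }
        END { exit bad ? 1 : 0 }
    '
}
```

The exit status makes it easy to drop into a cron health check that only mails you when something is wrong.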