CHU, STEPHEN H (ATTSI)
2011-Jan-18 14:39 UTC
[Lustre-discuss] RHEL54 / Lustre 2.0.0.1 mds_getxattr -95 errors
Hi all,

I recently loaded the following on my testbed to try out Lustre 2.0:

* One MDS - RHEL 5.4, Lustre 2.0
  o mkfs.lustre --fsname=lufs --reformat --mgs --mdt --param lov.stripesize=25M --param lov.stripecount=1 /dev/sda2
  o /etc/fstab = /dev/sda2 /lustre1-mgs-mds lustre rw,noauto,_netdev 0 0
* One OSS - RHEL 5.4, Lustre 2.0
  o mkfs.lustre --fsname=lufs --reformat --ost --mgsnode=10.103.34.42@o2ib0 /dev/sdb
  o mkfs.lustre --fsname=lufs --reformat --ost --mgsnode=10.103.34.42@o2ib0 /dev/sdc
  o /etc/fstab = /dev/sdb /lustre1/ost1 lustre rw,noauto,_netdev 0 0
  o /dev/sdc /lustre1/ost2 lustre rw,noauto,_netdev 0 0
* One Client - RHEL 5.4, Lustre 2.0
  o /etc/fstab = 10.103.34.42@o2ib0:/lufs /lustre1_fifo lustre rw,noauto,_netdev 0 0

/lustre1-mgs-mds and /lustre1/ost1,2 mounted OK on the MDS and OSS. /lustre1_fifo mounted OK on the Client. So far so good.

On the Client, I did a "cd /" and performed an "ls -l". The following messages immediately showed up:

* On Client
  o Jan 18 14:04:19 bg8mo33sn kernel: LustreError: 11-0: an error occurred while communicating with 10.103.34.42@o2ib. The mds_getxattr operation failed with -95
* On MDS
  o Jan 18 14:04:19 bg8mo33lm kernel: LustreError: 20056:0:(ldlm_lib.c:2123:target_send_reply_msg()) @@@ processing error (-95) req@ffff81031427e050 x1357944885151537/t0(0) o49->af4d23bf-2d42-6e4d-6afc-353425e513af@NET_0x500000a672229_UUID:0/0 lens 448/328 e 0 to 0 dl 1295359465 ref 1 fl Interpret:/ffffffff/ffffffff rc -95/-1

All IB links between all nodes are alive and the nodes see each other with no problem.

The client also NFS exports /lustre1_fifo via:

* /etc/exports = /lustre1_fifo testhost1(rw,sync,no_root_squash)

After testhost1 mounted /lustre1_fifo and attempted a "mkdir test" under it, the same error messages as above showed up on the MDS and the client. "-95" is "Operation not supported on transport endpoint", but what does it mean in this context?

The same MDS/OSS/Client setup ran fine on RHEL 5.3 and Lustre 1.8.1.1 with no errors.
Appreciate any help/insight. Thanks.

Steve

Stephen Chu
AT&T Labs CSO
C5-3C03
200 Laurel Ave
Middletown, NJ
(732) 420-0588
stephenchu at att.com
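For readers decoding the return code in the client message: the following is a rough sketch, assuming the standard Linux errno numbering, that pulls the code out of the console line and maps it to a name. The errno_name helper and its short table are illustrative only and are not part of Lustre. On Linux, 95 is EOPNOTSUPP; the "transport endpoint" wording comes from the comment next to that constant in the kernel errno header and does not by itself point to a network fault.

```shell
#!/bin/sh
# Client console line from the report above.
line='LustreError: 11-0: an error occurred while communicating with 10.103.34.42@o2ib. The mds_getxattr operation failed with -95'

# Extract the digits that follow "failed with -".
rc=$(printf '%s\n' "$line" | sed -n 's/.*failed with -\([0-9][0-9]*\).*/\1/p')

# Hypothetical helper: a hand-written subset of Linux errno values.
errno_name() {
    case "$1" in
        2)  echo "ENOENT (No such file or directory)" ;;
        13) echo "EACCES (Permission denied)" ;;
        61) echo "ENODATA (No data available)" ;;
        95) echo "EOPNOTSUPP (Operation not supported)" ;;
        *)  echo "unknown" ;;
    esac
}

decoded=$(errno_name "$rc")
echo "rc=-$rc: $decoded"
```

For a getxattr call, this code generally means the requested extended attribute is not supported by the target, rather than that the request failed in transit.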
Fan Yong
2011-Jan-19 02:51 UTC
[Lustre-discuss] RHEL54 / Lustre 2.0.0.1 mds_getxattr -95 errors
I think you have hit the issue I described in:

http://jira.whamcloud.com/browse/ORNL-3

I have made a patch for it, and it is in testing.

--
Nasf
Fan Yong
2011-Jan-19 03:21 UTC
[Lustre-discuss] RHEL54 / Lustre 2.0.0.1 mds_getxattr -95 errors
I think you have hit the issue I described in:

http://jira.whamcloud.com/browse/ORNL-3

I have made a patch for it, and it is in testing. Until the patch lands, you can mount the MDT with "-o acl" or "-o noacl" explicitly to avoid these confusing messages.

--
Nasf
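A minimal sketch of that workaround, using the MDT device and mount point from the original report. The add_mount_opt helper is hypothetical, and the script only prints the resulting fstab line and mount command (a dry run) instead of touching the system; whether "acl" or "noacl" is the right choice depends on the site.

```shell
#!/bin/sh
# Hypothetical helper: append an option to the 4th (options) field
# of an fstab line. awk rebuilds the line with single spaces.
add_mount_opt() {
    printf '%s\n' "$1" | awk -v opt="$2" '{ $4 = $4 "," opt; print }'
}

# MDT entry as given in the original report.
old='/dev/sda2 /lustre1-mgs-mds lustre rw,noauto,_netdev 0 0'

# Same entry with ACL handling stated explicitly (use "noacl" to disable).
new=$(add_mount_opt "$old" acl)
echo "$new"

# Equivalent one-off mount on the MDS, printed rather than executed.
cmd='mount -t lustre -o acl /dev/sda2 /lustre1-mgs-mds'
echo "$cmd"
```

The MDT would need to be unmounted and mounted again for the new option to take effect.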
CHU, STEPHEN H (ATTSI)
2011-Jan-19 13:38 UTC
[Lustre-discuss] RHEL54 / Lustre 2.0.0.1 mds_getxattr -95 errors
Yong,

Thanks. I'll try the explicit mount option. I assume the patch will land in 2.1.

Steve