We've been running Lustre happily for a few months now, but we have one client that can be troublesome at times, and it happens to be the most important client: it's our "file server" client, as it runs NFS and Samba. I'm not sure where to start. I've seen this client disconnect from the Lustre nodes, but then recover and reconnect. There are hundreds of messages in dmesg about a few inodes. The big problem happened a few weeks ago, when this client was rebooted and never could reconnect; the client and the Lustre nodes simply kept saying HELLO to each other.

Anyway, as of right now this is what I see in dmesg:

nfsd: non-standard errno: -108
LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108
LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 2114 previous similar messages
LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) failure -108 inode 561619132
LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped 777 previous similar messages
LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) inode 18382976 mdc close failed: rc = -108
nfsd: non-standard errno: -108
LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) Skipped 17238 previous similar messages
nfsd: non-standard errno: -108
nfsd: non-standard errno: -108
nfsd: non-standard errno: -108
nfsd: non-standard errno: -108
nfsd: non-standard errno: -108
LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff81032da81800 x1360479978792199/t0 o35->lustre-MDT0000_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 19011 previous similar messages
nfsd: non-standard errno: -108

LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116
LustreError: 520:0:(file.c:116:ll_close_inode_openhandle()) inode 12094041 mdc close failed: rc = -116
LustreError: 30271:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 560111661

Any ideas?

--
Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray, Ghostbusters
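For reference, the negative numbers in these messages are Linux kernel errno values, which can be decoded with Python's standard errno tables; a minimal sketch:

```python
import errno
import os

# Decode the errno values seen in the dmesg output above.
# On Linux: 108 = ESHUTDOWN, 116 = ESTALE, 2 = ENOENT.
for code in (108, 116, 2):
    name = errno.errorcode.get(code, "?")
    print(f"-{code}: {name} ({os.strerror(code)})")
```

ESHUTDOWN ("cannot send after transport endpoint shutdown") and ESTALE ("stale file handle") are consistent with the client's import to the MDS having been invalidated, with nfsd then unable to resolve the file handles it is holding.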
David,

What kernel are you running on the file server? I've heard on the list that the stock Red Hat kernels are compiled with too small a stack size, and that NFS and Lustre will not behave well together on the same node. A minimum stack size of 8k is needed for this configuration.

-mb

On Mar 11, 2011, at 12:37 PM, David Noriega wrote:
> We've been running Lustre happily for a few months now, but we have
> one client that can be troublesome at times and it happens to be the
> most important client. It's our "file server" client as it runs NFS and
> Samba. [...]
>
> Any ideas?

--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------
Kernel version 2.6.18-194.3.1.el5_lustre.1.8.4, downloaded from the Lustre site and recompiled. How can I check the stack size, and how would I increase it?

On Fri, Mar 11, 2011 at 1:17 PM, Michael Barnes <Michael.Barnes@jlab.org> wrote:
> What kernel are you running on the file server? I've heard on the list
> that the stock Red Hat kernels are compiled with too small a stack
> size, and that NFS and Lustre will not behave well together on the
> same node. A minimum stack size of 8k is needed for this
> configuration. [...]
If you used the kernel .config file from the Lustre kernel, you should be fine. Red Hat has the option

# CONFIG_4KSTACKS

which is unset in all Lustre kernel configs.

cliffw

On Mar 11, 2011, at 2:10 PM, David Noriega wrote:
> Kernel version 2.6.18-194.3.1.el5_lustre.1.8.4, downloaded from the
> Lustre site and recompiled. How can I check the stack size, and how
> would I increase it? [...]

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
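The check Cliff describes can be scripted; a sketch, assuming the distro installs the kernel config under /boot in the usual RHEL location (the `stack_option` helper is hypothetical, for illustration only):

```python
import os

def stack_option(config_text):
    """Classify a kernel .config by its CONFIG_4KSTACKS setting.

    On i386 kernels of this era, CONFIG_4KSTACKS=y means 4 KB kernel
    stacks; '# CONFIG_4KSTACKS is not set', or the option being absent
    entirely (e.g. on x86_64), means the larger 8 KB stacks.
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return "4k"
    return "8k"

# Inspect the running kernel's config, if one is installed in /boot
path = "/boot/config-" + os.uname().release
if os.path.exists(path):
    with open(path) as f:
        print("kernel stack size:", stack_option(f.read()))
```

Raising the stack size is not a runtime tunable on these kernels; it requires rebuilding with CONFIG_4KSTACKS unset, which is why the config shipped with the Lustre kernel matters.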