Hi All,

I am using the following guide to understand how liblustre works:

http://wiki.lustre.org/index.php/LibLustre_How-To_Guide

But I am not able to run the "sanity" test. The reason may be that I am not passing the correct "profile_name" file while running the test.

I have created two directories on my client:

/mnt/lustre
/mnt/liblustre_client

My MDS IP address is 10.193.123.1, and I am using the following command from my client:

sanity --target 10.193.123.1:/mnt/liblustre_client

Is this correct? If not, what is the correct command? Please help.

J
On 02/05/2012 10:37 PM, Jack David wrote:
> My MDS IP address is 10.193.123.1, and I am using the following
> command from my client:
>
> sanity --target 10.193.123.1:/mnt/liblustre_client

sanity --target mgsnid:/your_fsname

Thanks
WangDi
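P.S. Concretely, with the address from your mail it would look something like this (a sketch: the @tcp network type and the fsname "lustre" are assumptions; the target is the name you gave to mkfs.lustre --fsname, not a client mount point):

  sanity --target 10.193.123.1@tcp:/lustre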
On 02/06/2012 03:50 AM, Jack David wrote:
>>> Thanks, I used the "fsname" in the sanity command, but I am now
>>> getting the following error, which says the mds_connect operation
>>> failed with -16:
>>> ====================================
>>> <root@niteshs /usr/src/lustre-release>$ lustre/liblustre/tests/sanity --target nanogon:/temp | head
>>>
>>> 1328512853.118823:23449:niteshs:(class_obd.c:492:init_obdclass()):
>>> Lustre: 23449-niteshs:(class_obd.c:492:init_obdclass()): Lustre: Build
>>> Version: 2.1.52-g48452fb-CHANGED-2.6.32-lustre-patched
>>> 1328512853.151641:23449:niteshs:(lov_obd.c:2892:lov_init()): Lustre:
>>> 23449-niteshs:(lov_obd.c:2892:lov_init()): Lustre LOV module
>>> (0x85d180).
>>> 1328512853.151687:23449:niteshs:(osc_request.c:4636:osc_init()):
>>> Lustre: 23449-niteshs:(osc_request.c:4636:osc_init()): Lustre OSC
>>> module (0x85da40).
>>> 1328512853.158760:23449:niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()):
>>> Lustre: 23449-niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()): import
>>> mgc_dev->10.193.186.112@tcp netid 20000: select flavor null
>>> 1328512853.175099:23449:niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()):
>>> Lustre: 23449-niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()): import
>>> temp-OST0000-osc-0x22c0670->10.193.184.135@tcp netid 20000: select
>>> flavor null
>>> 1328512853.175159:23449:niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()):
>>> Lustre: 23449-niteshs:(sec.c:1475:sptlrpc_import_sec_adapt()): import
>>> temp-MDT0000-mdc-0x22c0670->10.193.186.112@tcp netid 20000: select
>>> flavor null
>>> 1328512853.179231:23449:niteshs:(client.c:1141:ptlrpc_check_status()):
>>> LustreError: 23449-niteshs:(client.c:1141:ptlrpc_check_status()):
>>> 11-0: an error occurred while communicating with 10.193.186.112@tcp.
>>> The mds_connect operation failed with -16
>>> [the same mds_connect error repeated twice more]
>>> ====================================
>> It seems the MDS is stuck in a long recovery, so it cannot accept the
>> new connection. You might wait a bit, or just umount the MDS and
>> remount it with -o abort_recov.
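>> For example, something like this (a sketch; the MDT device and mount
>> point here are assumptions, so use whatever your MDS is actually
>> mounted with):
>>
>>   # on the MDS node: stop the MDT, then remount with recovery aborted
>>   umount /mnt/mds
>>   mount -t lustre -o abort_recov /dev/mdt_device /mnt/mds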
> I tried after some time, and the mds_connect failure error disappeared
> (as mentioned in my earlier email). But now I am stuck with a new
> problem: the sanity test does not work. I ran it under "gdb" and found
> that it fails in the first test itself (test t1, which does
> touch+unlink). The failure is in the "open" call, and I am not sure why
> it fails. Does the "sanity" test have any prerequisite, like Lustre
> being mounted on a specific path? I ask because I could see that the
> test_t1 file was created when I mounted the filesystem using the
> "mount" command.
>
> The following screen log shows that it now fails in the "unlink" call:
>
> ===== START t1: touch+unlink 1328528905 =====
> 1328528905.411406:28664:niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
> LustreError: 28664-niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
> obd_intent_lock: NULL export
> 1328528905.411425:28664:niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
> LustreError: 28664-niteshs:(/usr/src/lustre-release/lustre/include/obd_class.h:1980:md_intent_lock()):
> obd_intent_lock: NULL export
> unlink(/mnt/lustre/test_t1) error: No such device
>
> What am I missing?

Hmm, you can set the environment variable LIBLUSTRE_MOUNT_POINT to
indicate where Lustre is mounted.

What is your Lustre version?

Btw: why don't you try lustre/tests/liblustre.sh, which might make
things easier for you?

Thanks
WangDi
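P.S. For the environment variable, something along these lines (a sketch; /mnt/lustre is only the default used by lrun, so substitute wherever your client actually mounts the filesystem):

  export LIBLUSTRE_MOUNT_POINT=/mnt/lustre
  lustre/liblustre/tests/sanity --target nanogon:/temp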
Hi.

http://wiki.lustre.org/index.php/LibLustre_How-To_Guide

This document can help you set up a liblustre test. lustre/liblustre/tests/sanity is the more appropriate test for liblustre. In short:

On the server:

LOAD=YES ./lustre/tests/llmount.sh

On the client:

./lustre/liblustre/tests/sanity --target=SERVER_IP:/lustre

Best regards,
Artem Blagodarenko.

On 06.02.2012, at 23:00, lustre-discuss-request@lists.lustre.org wrote:
> But, I am not able to run the "sanity" test. The reason may be that I
> am not passing the correct "profile_name" file while running the test.
>
> sanity --target 10.193.123.1:/mnt/liblustre_client
>
> Is this correct? If not, what is the correct command? Please help.
On Mon, Feb 6, 2012 at 11:52 PM, wangdi <di.wang@whamcloud.com> wrote:
> Hmm, you can set the environment variable LIBLUSTRE_MOUNT_POINT to
> indicate where Lustre is mounted.
>
> What is your Lustre version?
>
> Btw: why don't you try lustre/tests/liblustre.sh, which might make
> things easier for you?

I set LIBLUSTRE_MOUNT_POINT as well, but it did not help.

My FSNAME is "temp", so I set LIBLUSTRE_MOUNT_POINT to "/mnt/temp", but
it didn't work. In the /usr/sbin/lrun file the default path is
/mnt/lustre, and I gave that a shot too, but no luck either.

I cloned the git tree (from Whamcloud) a couple of weeks back, so I am
not sure about the version. I do see a lustre-2.1.52.tar.gz file, so I
assume that is the version.

I tried running lustre/tests/liblustre.sh (after modifying the FSNAME,
MOUNT2, mds_HOST and ost_HOST variables in the client's local.sh), but
I get the same error.

Something is wrong.

-- 
J
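P.S. For reference, the kind of changes I mean in the client's lustre/tests/cfg/local.sh (a sketch; the values below are illustrative, taken from the setup described earlier in this thread, and the ost_HOST address is the OST import shown in my first log):

  # lustre/tests/cfg/local.sh (illustrative values)
  FSNAME=temp
  mds_HOST=nanogon
  ost_HOST=10.193.184.135
  MOUNT2=/mnt/temp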
On 02/06/2012 10:08 PM, Jack David wrote:
> I set LIBLUSTRE_MOUNT_POINT as well, but it did not help.
>
> I tried running lustre/tests/liblustre.sh (after modifying the FSNAME,
> MOUNT2, mds_HOST and ost_HOST variables in the client's local.sh), but
> I get the same error.
>
> Something is wrong.

Hmm, there is a recent liblustre bug (LU-703); you might want to retry
this after that is fixed.

Thanks
WangDi