Hello there,

I am trying to get my feet wet with the Lustre file system and I have run into some problems that I need help with. I have 3 physical servers, and this is what I have installed on all 3 of them. Let's call them S1, S2 and S3 for now.

S1

I have a dual-port IB card; here is the network configuration for each port:

ib0 - 192.168.100.100
ib1 - 172.16.100.100

Kernel-2.6.18-194.3.1.el5_lustre
Lustre-modules
Lustre-ldiskfs
Lustre-1.8.4-
E2fsprogs

Here is the /etc/modprobe.conf file:

options lnet forwarding="enabled"
options lnet accept=all
options lnet networks="o2ib0(ib0),o2ib1(ib1)"

On this server I have partitioned /dev/sda3 and /dev/sda4 as the MGS/MDT and OST file systems, respectively.

S2

I have a single-port IB card; here is the network configuration for that port. I have connected this port directly to ib0 of the S1 server.

ib0 - 192.168.100.101

Here is the /etc/modprobe.conf file:

options lnet networks="o2ib0(ib0)"
options lnet routes="o2ib1 192.168.100.100@o2ib0"

When I run cat /proc/sys/lnet/routers, I get the following output:

ref rtr_ref alive_cnt state last_ping ping_sent deadline down_ni router
3   1       4         up    4303108   1         NA       -2      192.168.100.100@o2ib

When I run lctl ping 192.168.100.100@o2ib0, I get the following output:

12345-0@lo
12345-192.168.100.100@o2ib
12345-172.16.100.100@o2ib1

S3

I have a single-port IB card; here is the network configuration for that port. I have connected this port directly to ib1 of the S1 server.

ib0 - 172.16.100.101

Here is the /etc/modprobe.conf file:

options lnet networks="o2ib1(ib0)"
options lnet routes="o2ib0 172.16.100.100@o2ib1"

When I run cat /proc/sys/lnet/routers, I get the following output:

ref rtr_ref alive_cnt state last_ping ping_sent deadline down_ni router
3   1       2         up    4297593   1         NA       -2      172.16.100.100@o2ib1

When I run lctl ping 172.16.100.100@o2ib1, I get the following output:

12345-0@lo
12345-192.168.100.100@o2ib
12345-172.16.100.100@o2ib1

Now my problem: I want to run some network tests from S2 --> S3 and S3 --> S2 to measure the bandwidth, but both S2 and S3 complain that the network is unreachable. What am I doing wrong?

Thanks
Nihir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101116/331f0527/attachment-0001.html
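For anyone who wants to script the router check shown in this post, the /proc/sys/lnet/routers table can be summarized with a short awk filter. This is a minimal sketch, assuming the Lustre 1.8 layout shown above (one header row, "state" in the 4th column, the gateway NID in the last column); router_status is a hypothetical helper name, not a Lustre tool.

```shell
#!/bin/bash
# Sketch: summarize the LNET gateway table from /proc/sys/lnet/routers.
# Assumes the Lustre 1.8 layout shown above: one header row, "state" in
# column 4 and the router NID in the last column. router_status is a
# hypothetical helper, not part of Lustre.

# router_status [FILE]
# Prints "<router-nid> <state>" for each gateway and returns 1 if any
# gateway is not reported as "up" (0 otherwise).
router_status() {
    local file=${1:-/proc/sys/lnet/routers}
    awk 'NR > 1 && NF >= 9 {
             print $NF, $4
             if ($4 != "up") bad = 1
         }
         END { exit (bad ? 1 : 0) }' "$file"
}
```

On S2 or S3, something like router_status || echo "a gateway is not up" would flag a dead route. Note that an "up" gateway only tells you LNET can use the router; it says nothing about IP-level tools.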
Hi,

On 2010-11-17, at 9:17, Nihir Parikh wrote:
> Now my problem is to run some network tests from S2 --> S3 and S3 --> S2 to measure the bandwidth but somehow both S2 and S3 complain that network is unreachable. What am I doing wrong?

Your configuration seems OK to me. Can S2 and S3 ping each other using 'lctl ping'?
What kind of network test did you run? Note that only Lustre LNET can do the routing.
There's a script in the Lustre test suite specifically for testing network connectivity: lnet-selftest.sh.

> Thanks
> Nihir
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101118/aad487ce/attachment.html
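The check Wang suggests has to go through LNET itself, because in this setup S1 forwards LNET traffic between o2ib0 and o2ib1 rather than IP traffic. A minimal sketch that runs lctl ping against a list of NIDs and reports each result; lnet_reach and the LCTL override variable are hypothetical names added for illustration (the override only exists so the function can be dry-run with a stub).

```shell
#!/bin/bash
# Sketch: check LNET-level reachability of a list of NIDs with "lctl ping".
# In this setup S1 forwards LNET traffic between o2ib0 and o2ib1, so the
# check must go through LNET rather than through IP tools.
# lnet_reach and the LCTL override are hypothetical helpers for illustration.

# lnet_reach NID...
# Prints "ok <nid>" or "FAIL <nid>" for each target and returns 1 if any
# target did not answer (0 means all were reachable).
lnet_reach() {
    local lctl=${LCTL:-lctl}    # set LCTL to a stub for a dry run
    local nid failed=0
    for nid in "$@"; do
        if "$lctl" ping "$nid" >/dev/null 2>&1; then
            echo "ok $nid"
        else
            echo "FAIL $nid"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, on S2: lnet_reach 192.168.100.100@o2ib0 172.16.100.100@o2ib1 172.16.100.101@o2ib1 checks the near side of the router, its far side, and S3.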
Hello Wang Yibin,

Thanks for getting back to me. Yes, S2 and S3 can ping each other using lctl ping. I was using the nuttcp test, and I also tried the IB tests that come with the IB utilities. I will try lnet-selftest.

My goal was to measure the bandwidth when traffic has to cross between different networks. Are there any such tests specific to Lustre?

Thanks
Nihir

________________________________
From: Wang Yibin [mailto:wang.yibin at oracle.com]
Sent: Thursday, November 18, 2010 6:32 AM
To: Nihir Parikh
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101119/cc0db850/attachment.html
On 2010-11-20, at 8:39, Nihir Parikh wrote:
> Hello Wang Yibin,
> Thanks for getting back to me. Yes, S2 and S3 can ping each other using lctl ping.

This indicates that your routing is working as expected.

> I was using nuttcp test and I also tried ib tests that comes with the IB utilities. I will lnet-selftest.

These utilities do not speak the LNET protocol, so their traffic is not forwarded by the LNET router and they won't work here.

> My goal was to measure the bandwidth when it has to reach across different network. Are there any such tests specific to lustre?

LNET has its own test suite, called LNET self-test. To measure the bandwidth, you can load the lnet_selftest module on your nodes and run lst in brw mode.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101122/77285f5c/attachment-0001.html
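The workflow Wang describes (load lnet_selftest, then run lst in brw mode) can be wrapped in a small function. This is a sketch only, modelled on the sample script in the LNET self-test section of the Lustre 1.8 manual; lst_brw, the group names servers/clients, the batch name bulk and the LST override variable are illustrative choices, not part of lst. It assumes modprobe lnet_selftest has been run on the node executing it and on both test nodes.

```shell
#!/bin/bash
# Sketch of one LNET self-test bulk (brw) run between two nodes on
# different LNET networks, following the lst command sequence from the
# Lustre 1.8 manual. Assumes "modprobe lnet_selftest" has been run on the
# node executing this function and on both test nodes.
# lst_brw, the group/batch names and the LST override are illustrative.

# lst_brw SERVER_NID CLIENT_NID read|write [SECONDS]
lst_brw() {
    local server=$1 client=$2 mode=$3 secs=${4:-30}
    local lst=${LST:-lst}    # set LST to a stub for a dry run
    local stat_pid

    case $mode in
        read|write) ;;
        *) echo "lst_brw: mode must be read or write" >&2; return 2 ;;
    esac

    export LST_SESSION=$$    # lst reads this to find the current session
    "$lst" new_session "brw_$mode" || return 1
    "$lst" add_group servers "$server"
    "$lst" add_group clients "$client"
    "$lst" add_batch bulk
    # clients issue 1 MiB bulk reads or writes against the servers group
    "$lst" add_test --batch bulk --from clients --to servers \
        brw "$mode" check=full size=1M
    "$lst" run bulk

    # "lst stat" keeps printing until killed: sample for $secs seconds
    "$lst" stat servers &
    stat_pid=$!
    sleep "$secs"
    kill "$stat_pid" 2>/dev/null || true
    wait "$stat_pid" 2>/dev/null || true

    "$lst" end_session
}
```

For example, from S2: lst_brw 192.168.100.101@o2ib0 172.16.100.101@o2ib1 write 30, and then the same with read. Because the two groups sit on o2ib0 and o2ib1, every bulk transfer has to be forwarded by S1.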
Hi,

We have Lustre client 1.6.6 installed on a node. It crashed and generated a vmcore file. The OS is RHEL 5.2. Here is part of the vmcore log. The error is:

ib_cm: req timeout_ms 34816 > 32768, decreasing
LustreError: 13790:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 13790:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Skipped 18 previous similar messages
LustreError: 13790:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
LustreError: 13790:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) Skipped 18 previous similar messages
LustreError: 13790:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 13790:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff81200f886400 umount complete
Lustre: Request x21503806 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
LustreError: 166-1: MGC172.31.65.49@o2ib: Connection to service MGS via nid 172.31.65.49@o2ib was lost; in progress operations using this service will fail.
Lustre: Request x21504004 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
Lustre: Request x21504206 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
Lustre: Request x21504405 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
Lustre: Request x21504583 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
Lustre: Request x21504796 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
Lustre: Request x21505000 sent from MGC172.31.65.49@o2ib to NID 172.31.65.49@o2ib 100s ago has timed out (limit 100s).
Lustre: 19021:0:(import.c:736:ptlrpc_connect_interpret()) MGS@MGC172.31.65.49@o2ib_0 changed server handle from 0x818c15f164eefdf6 to 0x818c15f1f24cadd8 but is still in recovery
Lustre: MGC172.31.65.49@o2ib: Reactivating import
Lustre: MGC172.31.65.49@o2ib: Connection restored to service MGS using nid 172.31.65.49@o2ib.
general protection fault: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:01.0/0000:03:00.0/0000:04:01.0/0000:07:00.0/0000:08:00.0/irq
CPU 3
Modules linked in: iptable_raw iptable_filter iptable_mangle iptable_nat ip_nat ip_conntrack nfnetlink ip_tables nfsd exportfs auth_rpcgss xt_tcpudp nfs lockd fscache nfs_acl x_tables ipmi_devintf ipmi_si ipmi_msghandler mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) hpilo(U) sunrpc rdma_ucm(U) rds(U) ib_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U) dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev mlx4_core(U) bnx2(U) ide_cd shpchp e1000e serio_raw cdrom pcspkr ata_piix libata cciss(U) sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 14147, comm: ldlm_cb_02 Tainted: G 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff88625121>] [<ffffffff88625121>] :ptlrpc:lock_res_and_lock+0x41/0xe0
RSP: 0018:ffff811ff7593ce0 EFLAGS: 00010206
RAX: ffff811d4538f800 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000001
RDX: ffffc2001053f790 RSI: 0000000000000000 RDI: ffff811d4538f800
RBP: ffff811db60e80c0 R08: 0000000000000000 R09: ffff812003815400
R10: 0000000000000000 R11: 0000000000000001 R12: ffff811d4538f800
R13: ffff810170bd3a00 R14: 000000004ce47162 R15: ffff811ff8a64c50
FS: 00002b2883022220(0000) GS:ffff81202ff1c640(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000c655f28 CR3: 000000201a1fb000 CR4: 00000000000006e0
Process ldlm_cb_02 (pid: 14147, threadinfo ffff811ff7592000, task ffff81202fa85100)
Stack: ffff811d4538f800 ffff811ff8a64c50 ffff811ff84ecc50 ffffffff88646ba8
 ffff810009040a80 ffff811ff8ea8820 ffff811ff7593d30 ffff811ff84ecbb8
 0000000000000010 ffffffff8866e0da ffff811ff7e04dc0 ffffffff8866530e
Call Trace:
 [<ffffffff88646ba8>] :ptlrpc:ldlm_callback_handler+0x10a8/0x1ae0
 [<ffffffff8866e0da>] :ptlrpc:ptlrpc_check_req+0x1a/0x110
 [<ffffffff8866530e>] :ptlrpc:lustre_msg_get_handle+0x2e/0xe0
 [<ffffffff886702c2>] :ptlrpc:ptlrpc_server_handle_request+0x992/0x1040
 [<ffffffff80062efb>] thread_return+0x0/0xdf
 [<ffffffff8006d7bf>] do_gettimeofday+0x50/0x92
 [<ffffffff88520466>] :libcfs:lcw_update_time+0x16/0x100
 [<ffffffff80089241>] __wake_up_common+0x3e/0x68
 [<ffffffff886732dc>] :ptlrpc:ptlrpc_main+0xe0c/0xf90
 [<ffffffff8008ac03>] default_wake_function+0x0/0xe
 [<ffffffff800b4326>] audit_syscall_exit+0x31b/0x336
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff886724d0>] :ptlrpc:ptlrpc_main+0x0/0xf90
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: f7 43 08 fc ff ff ff 74 26 b9 7f 01 00 00 48 c7 c2 40 ad 68
RIP [<ffffffff88625121>] :ptlrpc:lock_res_and_lock+0x41/0xe0
 RSP <ffff811ff7593ce0>

crash> sys
      KERNEL: ../vmlinux
    DUMPFILE: ./vmcore
        CPUS: 16
        DATE: Thu Nov 18 05:50:50 2010
      UPTIME: 21 days, 12:07:43
LOAD AVERAGE: 0.04, 0.03, 0.00
       TASKS: 379

Warm Regards,
Pankaj
On 2010-11-29, at 13:10, Pankaj Dorlikar wrote:
> we have lustre client intalled is 1.6.6 on node. It got crashed generating vmcore file. OS is RHEL 5.2. Some part of vmcore logs.

There were a couple of bugs fixed in the client related to core dumps. However, they were fixed so long ago that I don't remember the bug numbers. You could look in the lustre/ChangeLog file and/or search Bugzilla to find which Lustre version they were fixed in, but your best bet is simply to upgrade to a newer version of Lustre.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
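Following Andreas's suggestion to look in lustre/ChangeLog, a small helper can search a ChangeLog for the function names that appear in the oops above (for example lock_res_and_lock from the RIP line, or ldlm_callback_handler from the call trace). This is a sketch only: changelog_grep is a hypothetical helper, it assumes nothing about the ChangeLog's entry format, and entries do not necessarily name functions, so a hit is just a lead to follow up in Bugzilla.

```shell
#!/bin/bash
# Sketch: search a Lustre source tree's lustre/ChangeLog for symbols taken
# from an oops (e.g. the functions in the RIP line and call trace).
# Makes no assumption about the ChangeLog entry format; a hit is only a
# lead to follow up in Bugzilla. changelog_grep is a hypothetical helper.

# changelog_grep CHANGELOG SYMBOL...
# Prints matching lines as "<line number>:<text>", case-insensitively.
# Returns 0 on a match, 1 if nothing matched, 2 on a usage error.
changelog_grep() {
    local log=$1 sym
    local -a patterns=()
    if [ $# -lt 2 ]; then
        echo "usage: changelog_grep CHANGELOG SYMBOL..." >&2
        return 2
    fi
    shift
    for sym in "$@"; do
        patterns+=(-e "$sym")    # one fixed-string pattern per symbol
    done
    grep -n -i -F "${patterns[@]}" -- "$log"
}
```

For example, in a newer Lustre source tree: changelog_grep lustre/ChangeLog lock_res_and_lock ldlm_callback_handler.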
Hello Wang Yibin,

Can I please get step-by-step information on how to run lnet_selftest?

Thanks
Nihir

________________________________
From: Wang Yibin [mailto:wang.yibin at oracle.com]
Sent: Sunday, November 21, 2010 6:12 PM
To: Nihir Parikh
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101213/f974f7c2/attachment.html
On 2010-12-13, at 18:35, Nihir Parikh wrote:
> Hello Wang Yibin,
> Can I please get step by step information on how to run lnet_selftest?

This is described in the Lustre manual.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
LNET self-test is discussed in the Lustre manual:
http://wiki.lustre.org/manual/LustreManual18_HTML/LustreIOKit.html#50651262_pgfId-1290255

On 12/13/2010 6:35 PM, Nihir Parikh wrote:
>
> Hello Wang Yibin,
>
> Can I please get step by step information on how to run lnet_selftest?
>
> Thanks
>
> Nihir
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101214/5e543f5b/attachment-0001.html
Hello Sheila and Andreas,

Thanks for replying and pointing me in the right direction. I followed the manual and was able to run the read and write tests, but I am not sure how to analyze the performance based on the numbers. Can someone help me with that? Here is my configuration; I am attaching the read and write logs.

1) I have 3 physical servers: 1 is a router (routernode) and 2 are clients (Node 1 and Node 2). The network is an IB network. Node 1 is directly connected to routernode's 1st IB port, and Node 2 is directly connected to routernode's 2nd IB port.

2) I wrote the following script:

#!/bin/bash
# Run as "<script> write" or "<script> read"
MODE=${1:-write}
export LST_SESSION=$$
lst new_session --timeout 1000 $MODE
lst add_group servers 192.168.100.101@o2ib0
lst add_group clients 172.16.100.101@o2ib1
lst add_batch bulk_$MODE
lst add_test --batch bulk_$MODE --from clients --to servers brw $MODE check=full size=1024K
lst run bulk_$MODE
# print server statistics for 30 seconds, then stop "lst stat"
lst stat servers & sleep 30; kill $!
lst end_session

Thanks a lot for all your help
Nihir

________________________________
From: Sheila Barthel [mailto:sheila.barthel at oracle.com]
Sent: Tuesday, December 14, 2010 5:13 AM
To: Nihir Parikh
Cc: Wang Yibin; lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101214/73f1914d/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: write.log
Type: application/octet-stream
Size: 1640 bytes
Desc: write.log
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101214/73f1914d/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: read.log
Type: application/octet-stream
Size: 1674 bytes
Desc: read.log
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20101214/73f1914d/attachment-0003.obj