Sten Wolf
2013-Dec-17 14:10 UTC
Setting up a lustre zfs dual mgs/mdt over tcp - help requested
Hi all,
Here is the situation:
I have two nodes, MDS1 and MDS2 (10.0.0.22 and 10.0.0.23), that I wish to use
as a failover MGS and an active/active MDT pair on ZFS.
I have a JBOD shelf with 12 disks, seen by both nodes as DAS (the
shelf has two SAS ports, one connected to a SAS HBA on each node), and I
am running Lustre 2.4 on CentOS 6.4 x86_64.
I have created three ZFS pools:
1. mgs:
# zpool create -f -o ashift=12 -O canmount=off lustre-mgs mirror
/dev/disk/by-id/wwn-0x50000c0f012306fc
/dev/disk/by-id/wwn-0x50000c0f01233aec
# mkfs.lustre --mgs --servicenode=mds1@tcp0 --servicenode=mds2@tcp0
--param sys.timeout=5000 --backfstype=zfs lustre-mgs/mgs
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: zfs
Flags: 0x1064
(MGS first_time update no_primnode )
Persistent mount opts:
Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
sys.timeout=5000
2. mdt0:
# zpool create -f -o ashift=12 -O canmount=off lustre-mdt0 mirror
/dev/disk/by-id/wwn-0x50000c0f01d07a34
/dev/disk/by-id/wwn-0x50000c0f01d110c8
# mkfs.lustre --mdt --fsname=fs0 --servicenode=mds1@tcp0
--servicenode=mds2@tcp0 --param sys.timeout=5000 --backfstype=zfs
--mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt0/mdt0
warning: lustre-mdt0/mdt0: for Lustre 2.4 and later, the target
index must be specified with --index
Permanent disk data:
Target: fs0:MDT0000
Index: 0
Lustre FS: fs0
Mount type: zfs
Flags: 0x1061
(MDT first_time update no_primnode )
Persistent mount opts:
Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
checking for existing Lustre data: not found
mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt0/mdt0
Writing lustre-mdt0/mdt0 properties
lustre:version=1
lustre:flags=4193
lustre:index=0
lustre:fsname=fs0
lustre:svname=fs0:MDT0000
lustre:failover.node=10.0.0.22@tcp
lustre:failover.node=10.0.0.23@tcp
lustre:sys.timeout=5000
lustre:mgsnode=10.0.0.22@tcp
lustre:mgsnode=10.0.0.23@tcp
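Note the warning above: I did not pass --index when formatting mdt0, so
mkfs.lustre chose index 0 on its own. If that turns out to matter, re-creating
the target with the index given explicitly would look roughly like this (a
sketch only; --reformat wipes any existing Lustre data on the dataset):
# mkfs.lustre --reformat --mdt --fsname=fs0 --index=0 \
    --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 \
    --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 \
    --param sys.timeout=5000 --backfstype=zfs lustre-mdt0/mdt0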
3. mdt1:
# zpool create -f -o ashift=12 -O canmount=off lustre-mdt1 mirror
/dev/disk/by-id/wwn-0x50000c0f01d113e0
/dev/disk/by-id/wwn-0x50000c0f01d116fc
# mkfs.lustre --mdt --fsname=fs0 --servicenode=mds2@tcp0
--servicenode=mds1@tcp0 --param sys.timeout=5000 --backfstype=zfs
--index=1 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt1/mdt1
Permanent disk data:
Target: fs0:MDT0001
Index: 1
Lustre FS: fs0
Mount type: zfs
Flags: 0x1061
(MDT first_time update no_primnode )
Persistent mount opts:
Parameters: failover.node=10.0.0.23@tcp failover.node=10.0.0.22@tcp
sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
checking for existing Lustre data: not found
mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
Writing lustre-mdt1/mdt1 properties
lustre:version=1
lustre:flags=4193
lustre:index=1
lustre:fsname=fs0
lustre:svname=fs0:MDT0001
lustre:failover.node=10.0.0.23@tcp
lustre:failover.node=10.0.0.22@tcp
lustre:sys.timeout=5000
lustre:mgsnode=10.0.0.22@tcp
lustre:mgsnode=10.0.0.23@tcp
A few basic sanity checks:
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
lustre-mdt0 824K 3.57T 136K /lustre-mdt0
lustre-mdt0/mdt0 136K 3.57T 136K /lustre-mdt0/mdt0
lustre-mdt1 716K 3.57T 136K /lustre-mdt1
lustre-mdt1/mdt1 136K 3.57T 136K /lustre-mdt1/mdt1
lustre-mgs 4.78M 3.57T 136K /lustre-mgs
lustre-mgs/mgs 4.18M 3.57T 4.18M /lustre-mgs/mgs
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
lustre-mdt0 3.62T 1.00M 3.62T 0% 1.00x ONLINE -
lustre-mdt1 3.62T 800K 3.62T 0% 1.00x ONLINE -
lustre-mgs 3.62T 4.86M 3.62T 0% 1.00x ONLINE -
# zpool status
pool: lustre-mdt0
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
lustre-mdt0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-0x50000c0f01d07a34 ONLINE 0 0 0
wwn-0x50000c0f01d110c8 ONLINE 0 0 0
errors: No known data errors
pool: lustre-mdt1
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
lustre-mdt1 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-0x50000c0f01d113e0 ONLINE 0 0 0
wwn-0x50000c0f01d116fc ONLINE 0 0 0
errors: No known data errors
pool: lustre-mgs
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
lustre-mgs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-0x50000c0f012306fc ONLINE 0 0 0
wwn-0x50000c0f01233aec ONLINE 0 0 0
errors: No known data errors
# zfs get lustre:svname lustre-mgs/mgs
NAME PROPERTY VALUE SOURCE
lustre-mgs/mgs lustre:svname MGS local
# zfs get lustre:svname lustre-mdt0/mdt0
NAME PROPERTY VALUE SOURCE
lustre-mdt0/mdt0 lustre:svname fs0:MDT0000 local
# zfs get lustre:svname lustre-mdt1/mdt1
NAME PROPERTY VALUE SOURCE
lustre-mdt1/mdt1 lustre:svname fs0:MDT0001 local
So far, so good.
My /etc/ldev.conf:
mds1 mds2 MGS zfs:lustre-mgs/mgs
mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
My /etc/modprobe.d/lustre.conf:
# options lnet networks=tcp0(em1)
options lnet ip2nets="tcp0 10.0.0.[22,23]; tcp0 10.0.0.*;"
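For reference, the same targets can also be mounted by hand, bypassing the
init script (a sketch; the mount points are assumed to exist already):
# mkdir -p /mnt/lustre/local/MGS /mnt/lustre/local/fs0-MDT0000
# mount -t lustre lustre-mgs/mgs /mnt/lustre/local/MGS
# mount -t lustre lustre-mdt0/mdt0 /mnt/lustre/local/fs0-MDT0000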
-----------------------------------------------------------------------------
Now, when starting the services, I get strange errors:
# service lustre start local
Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
mount.lustre: mount lustre-mdt0/mdt0 at
/mnt/lustre/local/fs0-MDT0000 failed: Input/output error
Is the MGS running?
# service lustre status local
running
attached lctl-dk.local01
If I run the same command again, I get a different error:
# service lustre start local
Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
mount.lustre: according to /etc/mtab lustre-mgs/mgs is already
mounted on /mnt/lustre/local/MGS
Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
mount.lustre: mount lustre-mdt0/mdt0 at
/mnt/lustre/local/fs0-MDT0000 failed: File exists
attached lctl-dk.local02
What am I doing wrong?
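In case it helps, here are a few checks that can be run from either node
without touching the on-disk data (plain lctl/mount queries; outputs omitted):
# lctl list_nids            # NIDs this node advertises
# lctl ping 10.0.0.22@tcp   # LNet reachability of mds1
# lctl ping 10.0.0.23@tcp   # LNet reachability of mds2
# lctl dl                   # Lustre devices currently set up
# mount -t lustre           # Lustre targets currently mounted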
I have also run an LNet self-test, using the following script:
# cat lnet-selftest.sh
#!/bin/bash
export LST_SESSION=$$
lst new_session read/write
lst add_group servers 10.0.0.[22,23]@tcp
lst add_group readers 10.0.0.[22,23]@tcp
lst add_group writers 10.0.0.[22,23]@tcp
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from readers --to servers \
brw read check=simple size=1M
lst add_test --batch bulk_rw --from writers --to servers \
brw write check=full size=4K
# start running
lst run bulk_rw
# display server stats for 30 seconds
lst stat servers & sleep 30; kill $!
# tear down
lst end_session
and it seemed OK:
# modprobe lnet-selftest && ssh mds2 modprobe lnet-selftest
# ./lnet-selftest.sh
SESSION: read/write FEATURES: 0 TIMEOUT: 300 FORCE: No
10.0.0.[22,23]@tcp are added to session
10.0.0.[22,23]@tcp are added to session
10.0.0.[22,23]@tcp are added to session
Test was added successfully
Test was added successfully
bulk_rw is running now
[LNet Rates of servers]
[R] Avg: 19486 RPC/s Min: 19234 RPC/s Max: 19739 RPC/s
[W] Avg: 19486 RPC/s Min: 19234 RPC/s Max: 19738 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 1737.60 MB/s Min: 1680.70 MB/s Max: 1794.51 MB/s
[W] Avg: 1737.60 MB/s Min: 1680.70 MB/s Max: 1794.51 MB/s
[LNet Rates of servers]
[R] Avg: 19510 RPC/s Min: 19182 RPC/s Max: 19838 RPC/s
[W] Avg: 19510 RPC/s Min: 19182 RPC/s Max: 19838 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 1741.67 MB/s Min: 1679.51 MB/s Max: 1803.83 MB/s
[W] Avg: 1741.67 MB/s Min: 1679.51 MB/s Max: 1803.83 MB/s
[LNet Rates of servers]
[R] Avg: 19458 RPC/s Min: 19237 RPC/s Max: 19679 RPC/s
[W] Avg: 19458 RPC/s Min: 19237 RPC/s Max: 19679 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 1738.87 MB/s Min: 1687.28 MB/s Max: 1790.45 MB/s
[W] Avg: 1738.87 MB/s Min: 1687.28 MB/s Max: 1790.45 MB/s
[LNet Rates of servers]
[R] Avg: 19587 RPC/s Min: 19293 RPC/s Max: 19880 RPC/s
[W] Avg: 19586 RPC/s Min: 19293 RPC/s Max: 19880 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 1752.62 MB/s Min: 1695.38 MB/s Max: 1809.85 MB/s
[W] Avg: 1752.62 MB/s Min: 1695.38 MB/s Max: 1809.85 MB/s
[LNet Rates of servers]
[R] Avg: 19528 RPC/s Min: 19232 RPC/s Max: 19823 RPC/s
[W] Avg: 19528 RPC/s Min: 19232 RPC/s Max: 19824 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 1741.63 MB/s Min: 1682.29 MB/s Max: 1800.98 MB/s
[W] Avg: 1741.63 MB/s Min: 1682.29 MB/s Max: 1800.98 MB/s
session is ended
./lnet-selftest.sh: line 17:  8835 Terminated              lst stat servers
Sten Wolf
2013-Dec-17 15:29 UTC
Setting up a lustre zfs dual mgs/mdt over tcp - help requested
Addendum: I can start the MGS service on the second node, and then
start the mdt0 service on the local node:
# ssh mds2 service lustre start MGS
Mounting lustre-mgs/mgs on /mnt/lustre/foreign/MGS
# service lustre start fs0-MDT0000
Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
# service lustre status
unhealthy
# service lustre status local
running
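The "unhealthy" state presumably comes from the kernel health flag, which can
be read directly (a sketch; these paths are what I would expect on Lustre 2.4):
# lctl get_param -n health_check
# cat /proc/fs/lustre/health_check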
Mohr Jr, Richard Frank (Rick Mohr)
2013-Dec-17 17:52 UTC
Re: Setting up a lustre zfs dual mgs/mdt over tcp - help requested
On Dec 17, 2013, at 10:29 AM, Sten Wolf <sten-dX0jVuv5p8QybS5Ee8rs3A@public.gmane.org> wrote:
I'm afraid I don't have any suggested solutions to your problem, but I did
notice something about your lnet selftest script.
> lst add_group servers 10.0.0.[22,23]@tcp
> lst add_group readers 10.0.0.[22,23]@tcp
> lst add_group writers 10.0.0.[22,23]@tcp
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --from readers --to servers \
> brw read check=simple size=1M
> lst add_test --batch bulk_rw --from writers --to servers \
> brw write check=full size=4K
You may want to try swapping the order of the NIDs in the "servers" group.
If I recall correctly, the default distribution method for lnet selftest is
1:1. This means that your clients and servers will be paired like this:
10.0.0.22@tcp <--> 10.0.0.22@tcp
10.0.0.23@tcp <--> 10.0.0.23@tcp
So you are not testing any LNet traffic between nodes. (That being said, the
LNet connectivity between your nodes is still probably fine; otherwise the
lnet selftest would likely not have run at all.)
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
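To illustrate the point, here is one variant of the selftest script that forces
the traffic to cross between the nodes. It puts all clients on mds1 and all
servers on mds2 rather than literally reordering the NIDs, so it is a sketch of
the idea rather than Rick's exact suggestion; repeat it with the roles reversed
to exercise the other direction:
#!/bin/bash
export LST_SESSION=$$
lst new_session cross_node
# servers on mds2 only, clients on mds1 only, so every RPC crosses the wire
lst add_group servers 10.0.0.23@tcp
lst add_group readers 10.0.0.22@tcp
lst add_group writers 10.0.0.22@tcp
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from readers --to servers \
    brw read check=simple size=1M
lst add_test --batch bulk_rw --from writers --to servers \
    brw write check=full size=4K
lst run bulk_rw
# display server stats for 30 seconds, then tear down
lst stat servers & sleep 30; kill $!
lst end_session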