Sten Wolf
2013-Dec-17  14:10 UTC
Setting up a lustre zfs dual mgs/mdt over tcp - help requested
Hi all,
    Here is the situation:
    I have two nodes, MDS1 and MDS2 (10.0.0.22 and 10.0.0.23), which I
    wish to use as a failover MGS and active/active MDTs with ZFS.
    I have a JBOD shelf with 12 disks, seen by both nodes as DAS (the
    shelf has 2 SAS ports, connected to a SAS HBA on each node), and I
    am using Lustre 2.4 on CentOS 6.4 x64.
    I have created 3 zfs pools:
    1. mgs:
    # zpool create -f -o ashift=12 -O canmount=off lustre-mgs mirror
    /dev/disk/by-id/wwn-0x50000c0f012306fc
    /dev/disk/by-id/wwn-0x50000c0f01233aec
    # mkfs.lustre --mgs --servicenode=mds1@tcp0 --servicenode=mds2@tcp0
    --param sys.timeout=5000 --backfstype=zfs lustre-mgs/mgs
       Permanent disk data:
    Target:     MGS
    Index:      unassigned
    Lustre FS:  
    Mount type: zfs
    Flags:      0x1064
                  (MGS first_time update no_primnode )
    Persistent mount opts: 
    Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
    sys.timeout=5000
    2. mdt0:
    # zpool create -f -o ashift=12 -O canmount=off lustre-mdt0 mirror
    /dev/disk/by-id/wwn-0x50000c0f01d07a34
    /dev/disk/by-id/wwn-0x50000c0f01d110c8
    # mkfs.lustre --mdt --fsname=fs0 --servicenode=mds1@tcp0
    --servicenode=mds2@tcp0 --param sys.timeout=5000 --backfstype=zfs
    --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0  lustre-mdt0/mdt0
    warning: lustre-mdt0/mdt0: for Lustre 2.4 and later, the target
    index must be specified with --index
       Permanent disk data:
    Target:     fs0:MDT0000
    Index:      0
    Lustre FS:  fs0
    Mount type: zfs
    Flags:      0x1061
                  (MDT first_time update no_primnode )
    Persistent mount opts: 
    Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
    sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
    checking for existing Lustre data: not found
    mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt0/mdt0
    Writing lustre-mdt0/mdt0 properties
      lustre:version=1
      lustre:flags=4193
      lustre:index=0
      lustre:fsname=fs0
      lustre:svname=fs0:MDT0000
      lustre:failover.node=10.0.0.22@tcp
      lustre:failover.node=10.0.0.23@tcp
      lustre:sys.timeout=5000
      lustre:mgsnode=10.0.0.22@tcp
      lustre:mgsnode=10.0.0.23@tcp
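    (As the mkfs.lustre warning above notes, Lustre 2.4 wants the MDT index
    given explicitly. A hedged sketch of the same command with --index=0
    spelled out; this is illustrative only and was not run here, and
    --reformat would be needed to overwrite the properties already written:)
    # mkfs.lustre --reformat --mdt --fsname=fs0 --index=0 \
        --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 \
        --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 \
        --param sys.timeout=5000 --backfstype=zfs lustre-mdt0/mdt0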
    3. mdt1:
    # zpool create -f -o ashift=12 -O canmount=off lustre-mdt1 mirror
    /dev/disk/by-id/wwn-0x50000c0f01d113e0
    /dev/disk/by-id/wwn-0x50000c0f01d116fc
    # mkfs.lustre --mdt --fsname=fs0 --servicenode=mds2@tcp0
    --servicenode=mds1@tcp0 --param sys.timeout=5000 --backfstype=zfs
    --index=1 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0  lustre-mdt1/mdt1
       Permanent disk data:
    Target:     fs0:MDT0001
    Index:      1
    Lustre FS:  fs0
    Mount type: zfs
    Flags:      0x1061
                  (MDT first_time update no_primnode )
    Persistent mount opts: 
    Parameters: failover.node=10.0.0.23@tcp failover.node=10.0.0.22@tcp
    sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
    checking for existing Lustre data: not found
    mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
    Writing lustre-mdt1/mdt1 properties
      lustre:version=1
      lustre:flags=4193
      lustre:index=1
      lustre:fsname=fs0
      lustre:svname=fs0:MDT0001
      lustre:failover.node=10.0.0.23@tcp
      lustre:failover.node=10.0.0.22@tcp
      lustre:sys.timeout=5000
      lustre:mgsnode=10.0.0.22@tcp
      lustre:mgsnode=10.0.0.23@tcp
    A few basic sanity checks:
    # zfs list
    NAME               USED  AVAIL  REFER  MOUNTPOINT
    lustre-mdt0        824K  3.57T   136K  /lustre-mdt0
    lustre-mdt0/mdt0   136K  3.57T   136K  /lustre-mdt0/mdt0
    lustre-mdt1        716K  3.57T   136K  /lustre-mdt1
    lustre-mdt1/mdt1   136K  3.57T   136K  /lustre-mdt1/mdt1
    lustre-mgs        4.78M  3.57T   136K  /lustre-mgs
    lustre-mgs/mgs    4.18M  3.57T  4.18M  /lustre-mgs/mgs
    # zpool list
    NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
    lustre-mdt0  3.62T  1.00M  3.62T     0%  1.00x  ONLINE  -
    lustre-mdt1  3.62T   800K  3.62T     0%  1.00x  ONLINE  -
    lustre-mgs   3.62T  4.86M  3.62T     0%  1.00x  ONLINE  -
    # zpool status
      pool: lustre-mdt0
     state: ONLINE
      scan: none requested
    config:
        NAME                        STATE     READ WRITE CKSUM
        lustre-mdt0                 ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x50000c0f01d07a34  ONLINE       0     0     0
            wwn-0x50000c0f01d110c8  ONLINE       0     0     0
    errors: No known data errors
      pool: lustre-mdt1
     state: ONLINE
      scan: none requested
    config:
        NAME                        STATE     READ WRITE CKSUM
        lustre-mdt1                 ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x50000c0f01d113e0  ONLINE       0     0     0
            wwn-0x50000c0f01d116fc  ONLINE       0     0     0
    errors: No known data errors
      pool: lustre-mgs
     state: ONLINE
      scan: none requested
    config:
        NAME                        STATE     READ WRITE CKSUM
        lustre-mgs                  ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x50000c0f012306fc  ONLINE       0     0     0
            wwn-0x50000c0f01233aec  ONLINE       0     0     0
    errors: No known data errors
    # zfs get lustre:svname lustre-mgs/mgs
    NAME            PROPERTY       VALUE          SOURCE
    lustre-mgs/mgs  lustre:svname  MGS            local
    # zfs get lustre:svname lustre-mdt0/mdt0
    NAME              PROPERTY       VALUE          SOURCE
    lustre-mdt0/mdt0  lustre:svname  fs0:MDT0000    local
    # zfs get lustre:svname lustre-mdt1/mdt1
    NAME              PROPERTY       VALUE          SOURCE
    lustre-mdt1/mdt1  lustre:svname  fs0:MDT0001    local
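    (One more check that can be handy: listing every locally set property on a
    target shows all of the lustre:* values written by mkfs.lustre in one go.
    A hedged aside; output not captured here:)
    # zfs get -s local all lustre-mdt0/mdt0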
    So far, so good.
    My /etc/ldev.conf:
    mds1 mds2 MGS zfs:lustre-mgs/mgs
    mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
    mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
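    (For reference, those ldev.conf entries should correspond roughly to the
    following manual mounts on mds1; a hedged sketch, with the mount points
    assumed to match the init script's /mnt/lustre/local layout:)
    # mkdir -p /mnt/lustre/local/MGS /mnt/lustre/local/fs0-MDT0000
    # mount -t lustre lustre-mgs/mgs /mnt/lustre/local/MGS
    # mount -t lustre lustre-mdt0/mdt0 /mnt/lustre/local/fs0-MDT0000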
    My /etc/modprobe.d/lustre.conf:
    # options lnet networks=tcp0(em1)
    options lnet ip2nets="tcp0 10.0.0.[22,23]; tcp0 10.0.0.*;"
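    (A quick way to confirm which NID each node actually brought up with that
    ip2nets rule; a hedged aside, output not shown:)
    # lctl list_nids
    # ssh mds2 lctl list_nids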
-----------------------------------------------------------------------------
    Now, when starting the services, I get strange errors:
    # service lustre start local
    Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
    Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
    mount.lustre: mount lustre-mdt0/mdt0 at
    /mnt/lustre/local/fs0-MDT0000 failed: Input/output error
    Is the MGS running?
    # service lustre status local
    running
    attached lctl-dk.local01
    If I run the same command again, I get a different error:
    # service lustre start local
    Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
    mount.lustre: according to /etc/mtab lustre-mgs/mgs is already
    mounted on /mnt/lustre/local/MGS
    Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
    mount.lustre: mount lustre-mdt0/mdt0 at
    /mnt/lustre/local/fs0-MDT0000 failed: File exists
    attached lctl-dk.local02
    What am I doing wrong?
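    (To narrow down whether the MDT simply cannot reach the MGS at mount time,
    a few checks along these lines might help; a hedged sketch using standard
    lctl commands:)
    # lctl ping 10.0.0.22@tcp
    # lctl ping 10.0.0.23@tcp
    # lctl dl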
    I have also run the LNet self-test, using the following script:
    # cat lnet-selftest.sh
    #!/bin/bash
    export LST_SESSION=$$
    lst new_session read/write
    lst add_group servers 10.0.0.[22,23]@tcp
    lst add_group readers 10.0.0.[22,23]@tcp
    lst add_group writers 10.0.0.[22,23]@tcp
    lst add_batch bulk_rw
    lst add_test --batch bulk_rw --from readers --to servers \
    brw read check=simple size=1M
    lst add_test --batch bulk_rw --from writers --to servers \
    brw write check=full size=4K
    # start running
    lst run bulk_rw
    # display server stats for 30 seconds
    lst stat servers & sleep 30; kill $!
    # tear down
    lst end_session
    and it seemed OK:
    # modprobe lnet-selftest && ssh mds2 modprobe lnet-selftest
    # ./lnet-selftest.sh 
    SESSION: read/write FEATURES: 0 TIMEOUT: 300 FORCE: No
    10.0.0.[22,23]@tcp are added to session
    10.0.0.[22,23]@tcp are added to session
    10.0.0.[22,23]@tcp are added to session
    Test was added successfully
    Test was added successfully
    bulk_rw is running now
    [LNet Rates of servers]
    [R] Avg: 19486    RPC/s Min: 19234    RPC/s Max: 19739    RPC/s
    [W] Avg: 19486    RPC/s Min: 19234    RPC/s Max: 19738    RPC/s
    [LNet Bandwidth of servers]
    [R] Avg: 1737.60  MB/s  Min: 1680.70  MB/s  Max: 1794.51  MB/s
    [W] Avg: 1737.60  MB/s  Min: 1680.70  MB/s  Max: 1794.51  MB/s
    [LNet Rates of servers]
    [R] Avg: 19510    RPC/s Min: 19182    RPC/s Max: 19838    RPC/s
    [W] Avg: 19510    RPC/s Min: 19182    RPC/s Max: 19838    RPC/s
    [LNet Bandwidth of servers]
    [R] Avg: 1741.67  MB/s  Min: 1679.51  MB/s  Max: 1803.83  MB/s
    [W] Avg: 1741.67  MB/s  Min: 1679.51  MB/s  Max: 1803.83  MB/s
    [LNet Rates of servers]
    [R] Avg: 19458    RPC/s Min: 19237    RPC/s Max: 19679    RPC/s
    [W] Avg: 19458    RPC/s Min: 19237    RPC/s Max: 19679    RPC/s
    [LNet Bandwidth of servers]
    [R] Avg: 1738.87  MB/s  Min: 1687.28  MB/s  Max: 1790.45  MB/s
    [W] Avg: 1738.87  MB/s  Min: 1687.28  MB/s  Max: 1790.45  MB/s
    [LNet Rates of servers]
    [R] Avg: 19587    RPC/s Min: 19293    RPC/s Max: 19880    RPC/s
    [W] Avg: 19586    RPC/s Min: 19293    RPC/s Max: 19880    RPC/s
    [LNet Bandwidth of servers]
    [R] Avg: 1752.62  MB/s  Min: 1695.38  MB/s  Max: 1809.85  MB/s
    [W] Avg: 1752.62  MB/s  Min: 1695.38  MB/s  Max: 1809.85  MB/s
    [LNet Rates of servers]
    [R] Avg: 19528    RPC/s Min: 19232    RPC/s Max: 19823    RPC/s
    [W] Avg: 19528    RPC/s Min: 19232    RPC/s Max: 19824    RPC/s
    [LNet Bandwidth of servers]
    [R] Avg: 1741.63  MB/s  Min: 1682.29  MB/s  Max: 1800.98  MB/s
    [W] Avg: 1741.63  MB/s  Min: 1682.29  MB/s  Max: 1800.98  MB/s
    session is ended
    ./lnet-selftest.sh: line 17:  8835 Terminated              lst stat
    servers
Sten Wolf
2013-Dec-17  15:29 UTC
Setting up a lustre zfs dual mgs/mdt over tcp - help requested
    Addendum: I can start the MGS service on the 2nd node (mds2), and then
    start the mdt0 service on the local node:
    # ssh mds2 service lustre start MGS
    Mounting lustre-mgs/mgs on /mnt/lustre/foreign/MGS
    # service lustre start fs0-MDT0000
    Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
    # service lustre status
    unhealthy
    # service lustre status local
    running
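    (Given the "unhealthy" status, it may be worth looking at what the health
    parameter and the device list report at that point; a hedged sketch:)
    # lctl get_param -n health_check
    # lctl dl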
Mohr Jr, Richard Frank (Rick Mohr)
2013-Dec-17  17:52 UTC
Re: Setting up a lustre zfs dual mgs/mdt over tcp - help requested
On Dec 17, 2013, at 10:29 AM, Sten Wolf <sten-dX0jVuv5p8QybS5Ee8rs3A@public.gmane.org> wrote:

I'm afraid I don't have any suggested solutions to your problem, but I did notice something about your lnet selftest script.

> lst add_group servers 10.0.0.[22,23]@tcp
> lst add_group readers 10.0.0.[22,23]@tcp
> lst add_group writers 10.0.0.[22,23]@tcp
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --from readers --to servers \
> brw read check=simple size=1M
> lst add_test --batch bulk_rw --from writers --to servers \
> brw write check=full size=4K

You may want to try swapping the order of the nids in the "servers" group. If I recall correctly, the default distribution method for lnet selftest is 1:1. This means that your clients and servers will be paired like this:

10.0.0.22@tcp <--> 10.0.0.22@tcp
10.0.0.23@tcp <--> 10.0.0.23@tcp

So you are not testing any lnet traffic between nodes. (That being said, the lnet connectivity between your nodes is still probably fine; otherwise the lnet selftest would likely not have run at all.)

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
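(Following up on that suggestion: a minimal way to force the traffic across the wire is to put each node in a different group, so the 1:1 pairing has to match 10.0.0.22 with 10.0.0.23. A hedged sketch, not from the original thread; the session and group names here are made up:)

export LST_SESSION=$$
lst new_session cross_node
lst add_group servers 10.0.0.23@tcp
lst add_group clients 10.0.0.22@tcp
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from clients --to servers brw read check=simple size=1M
# start running
lst run bulk_rw
# display server stats for 30 seconds
lst stat servers & sleep 30; kill $!
# tear down
lst end_session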