On May 25, 2020 5:49:00 GMT+03:00, Olivier <Olivier.Nicole at
cs.ait.ac.th> wrote:
>Strahil Nikolov <hunter86_bg at yahoo.com> writes:
>
>> On May 23, 2020 7:29:23 AM GMT+03:00, Olivier
><Olivier.Nicole at cs.ait.ac.th> wrote:
>>>Hi,
>>>
>>>I have been struggling with NFS Ganesha: one gluster node with
>>>ganesha serving only one client could not handle the load when
>>>dealing with thousands of small files. Legacy gluster NFS works
>>>flawlessly with 5 or 6 clients.
>>>
>>>But the documentation for gNFS is scarce, I could not find where to
>>>configure the various authorizations, so any pointer is greatly
>>>welcome.
>>>
>>>Best regards,
>>>
>>>Olivier
>>
>> Hi Olivier,
>>
>> Can you give me a hint as to why you are using gluster with a single
>> node in the TSP serving only 1 client?
>> Usually, this is not a typical gluster workload.
>
>Hi Strahil,
>
>Of course I have more than one node, other nodes are supporting the
>bricks and the data. I am using a node with no data to solve this issue
>with NFS. But in my comparison between gNFS and Ganesha, I was using the
>same configuration, with one node with no brick accessing the other
>nodes for the data. So the only change between what is working and what
>was not is the NFS server. Besides, I have been using NFS for over 15
>years and know that given my data and type of activity, one single NFS
>server should be able to serve 5 to 10 clients without a problem, that
>is why I suspected Ganesha from the beginning.
You are not comparing apples to apples. Kernel NFS has been in UNIXes since
long before modern OSes; Linux has used it for decades and the kernel has
been optimized for it, while Ganesha is new technology and requires some
tuning.
You haven't mentioned what kind of issues you see - searching a directory,
reading a lot of files, writing a lot of small files, etc.
Usually a negative lookup (searching for/accessing a nonexistent object -
file, dir, etc.) causes serious performance degradation.
>If I cannot configure gNFS, I think I could glusterfs_mount the volume
>and use the native NFS server of Linux, but that would add overhead and
>leave some features behind, that is why my focus is primarily on
>configuring gNFS.
>
>>
>> Also can you specify:
>> - Brick block device type and details (raid type, lvm, vdo, etc )
>
>All nodes are VMware virtual machines, the RAID being at VMware level
Yeah, that's not very descriptive.
For a write-intensive, small-file workload the optimal RAID mode is RAID10
with at least 12 disks per node.
What is the I/O scheduler? Are you using thin or thick LVM? How many
snapshots do you have?
Are you using striping at the LVM level (if you use local storage, then most
probably no striping)?
What is the PE size of the VG?
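A quick way to collect most of that (a rough sketch; the device, volume and
snapshot names below are just placeholders, adjust to your layout):
cat /sys/block/sda/queue/scheduler        # active I/O scheduler for the brick disk
lvs -a -o +lv_layout,stripes,devices      # thin vs. thick LVs and LVM striping
vgs -o +vg_extent_size                    # PE size of the VG
gluster snapshot list gv0                 # number of gluster snapshots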
>> - xfs_info of the brick
What kind of FS are you using? You need to be sure that the inode size is at
least 512 bytes (1024 for Swift) in order to be supported.
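For example, on an XFS brick (the brick path here is taken from your volume
info below and may differ on each node):
xfs_info /gluster1/br | grep isize        # isize should be 512 or larger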
>> - mount options for the brick
>
>Bricks are not mounted
It is not good to share the OS and the Gluster brick on the same VMDK. You
can benefit from mount options like 'noatime,nodiratime,nobarrier,inode64'.
Note that 'nobarrier' is only safe on storage with a battery-backed write
cache.
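Purely as an illustration (the device path is an assumption, not your actual
layout), a brick entry in /etc/fstab with those options could look like:
/dev/mapper/gluster_vg-gluster_lv  /gluster1/br  xfs  noatime,nodiratime,nobarrier,inode64  0 2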
>> - SELINUX/APPARMOR status
>> - sysctl tunables (including tuned profile)
>
>All systems are vanilla Ubuntu with no tuning.
I have done some tests and you can benefit from the RHGS random-IO tuned
profile. The latest source rpm can be found at:
ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm
On top of that you need to modify it to disable LRO, as it is automatically
enabled for VMXNET NICs. LRO increases bandwidth at the cost of latency;
disabling it reduces latency, which is crucial when looking up thousands of
files/directories.
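Applying the profile and disabling LRO would look roughly like this (the
profile name comes from the rebuilt rpm and the interface name is just a
placeholder):
tuned-adm profile rhgs-random-io          # activate the random-IO profile
ethtool -K ens192 lro off                 # turn off LRO on the VMXNET3 NIC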
>> - gluster volume information and status
>
>sudo gluster volume info gv0
>
>Volume Name: gv0
>Type: Distributed-Replicate
>Volume ID: cc664830-1dd0-4dd4-9f1c-493578297e79
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: gluster3000:/gluster1/br
>Brick2: gluster5000:/gluster/br
>Brick3: gluster3000:/gluster2/br
>Brick4: gluster2000:/gluster/br
>Options Reconfigured:
>features.quota-deem-statfs: on
>features.inode-quota: on
>features.quota: on
>transport.address-family: inet
>nfs.disable: off
>features.cache-invalidation: on
>on at gluster3:~$ sudo gluster volume status gv0
>Status of volume: gv0
>Gluster process                              TCP Port  RDMA Port  Online  Pid
>------------------------------------------------------------------------------
>Brick gluster3000:/gluster1/br               49152     0          Y       1473
>Brick gluster5000:/gluster/br                49152     0          Y       724
>Brick gluster3000:/gluster2/br               49153     0          Y       1549
>Brick gluster2000:/gluster/br                49152     0          Y       723
>Self-heal Daemon on localhost                N/A       N/A        Y       1571
>NFS Server on localhost                      N/A       N/A        N       N/A
>Quota Daemon on localhost                    N/A       N/A        Y       1560
>Self-heal Daemon on gluster2000.cs.ait.ac.th N/A       N/A        Y       835
>NFS Server on gluster2000.cs.ait.ac.th       N/A       N/A        N       N/A
>Quota Daemon on gluster2000.cs.ait.ac.th     N/A       N/A        Y       735
>Self-heal Daemon on gluster5000.cs.ait.ac.th N/A       N/A        Y       829
>NFS Server on gluster5000.cs.ait.ac.th       N/A       N/A        N       N/A
>Quota Daemon on gluster5000.cs.ait.ac.th     N/A       N/A        Y       736
>Self-heal Daemon on fbsd3500                 N/A       N/A        Y       2584
>NFS Server on fbsd3500                       2049      0          Y       2671
>Quota Daemon on fbsd3500                     N/A       N/A        Y       2571
>
>Task Status of Volume gv0
>------------------------------------------------------------------------------
>Task : Rebalance
>ID : 53e7c649-27f0-4da0-90dc-af59f937d01f
>Status : completed
You don't have any tunings on the volume, despite the predefined ones
in /var/lib/glusterd/groups.
Both the metadata-cache and nl-cache groups bring some performance gain with
a small-file workload. You have to try them and check the results. Use a
real-world workload for testing, as synthetic benchmarks do not always show
the real picture.
To reset (revert) a setting you can use 'gluster volume reset gv0
<setting>'.
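For example (a sketch; the option in the reset line is only an example of the
syntax):
gluster volume set gv0 group metadata-cache   # apply the predefined metadata-cache group
gluster volume set gv0 group nl-cache         # apply the negative-lookup cache group
gluster volume reset gv0 performance.nl-cache # revert a single option if it does not help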
>> - ganesha settings
>
>MDCACHE
>{
>Attr_Expiration_Time = 600;
>Entries_HWMark = 50000;
>LRU_Run_Interval = 90;
>FD_HWMark_Percent = 60;
>FD_LWMark_Percent = 20;
>FD_Limit_Percent = 90;
>}
>EXPORT
>{
> Export_Id = 2;
> etc.
>}
>
>> - Network settings + MTU
>
>MTU 1500 (I think it is my switch that never worked with jumbo
>frames). I have a dedicated VLAN for NFS and gluster and a VLAN for
>users connection.
Verify that there is no fragmentation between the TSP nodes and between the
NFS server (Ganesha) and the cluster.
For example, if the MTU is 1500, use a payload size of 1500 - 28 (ICMP + IP
headers) = 1472:
ping -M do -s 1472 -c 4 -I <interface> <other gluster node>
Even the dumbest gigabit switches support jumbo frames of 9000 (anything
above that requires more expensive hardware), so I would recommend verifying
whether jumbo frames are possible at least between the TSP nodes and maybe
the NFS server.
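If you do move to jumbo frames, the same check with MTU 9000 becomes:
ping -M do -s 8972 -c 4 -I <interface> <other gluster node>   # 9000 - 28 bytes of ICMP/IP headers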
>I hope that helps.
>
>Best regards,
>
>Olivier
>
>>
>> Best Regards,
>> Strahil Nikolov
>>
As you can see, you are getting deeper and deeper into the stack, and we
haven't covered the storage layer yet, nor any Ganesha settings :)
Good luck!
Best Regards,
Strahil Nikolov