On May 25, 2020 5:49:00 GMT+03:00, Olivier <Olivier.Nicole at
cs.ait.ac.th> wrote:
>Strahil Nikolov <hunter86_bg at yahoo.com> writes:
>
>> On May 23, 2020 7:29:23 AM GMT+03:00, Olivier
><Olivier.Nicole at cs.ait.ac.th> wrote:
>>>Hi,
>>>
>>>I have been struggling with NFS Ganesha: one gluster node with
>>>ganesha serving only one client could not handle the load when
>>>dealing with thousands of small files. Legacy gluster NFS works
>>>flawlessly with 5 or 6 clients.
>>>
>>>But the documentation for gNFS is scarce, I could not find where to
>>>configure the various authorizations, so any pointer is greatly
>>>welcome.
>>>
>>>Best regards,
>>>
>>>Olivier
>>
>> Hi Olivier,
>>
>> Can you give me a hint as to why you are using gluster with a single
>> node in the TSP serving only 1 client?
>> Usually, this is not a typical gluster workload.
>
>Hi Strahil,
>
>Of course I have more than one node, other nodes are supporting the
>bricks and the data. I am using a node with no data to solve this issue
>with NFS. But in my comparison between gNFS and Ganesha, I was using the
>same configuration, with one node with no brick accessing the other
>nodes for the data. So the only change between what is working and what
>was not is the NFS server. Besides, I have been using NFS for over 15
>years and know that given my data and type of activity, one single NFS
>server should be able to serve 5 to 10 clients without a problem, that
>is why I suspected Ganesha from the beginning.
You are not comparing apples to apples. Kernel NFS has been in UNIXes since
long before modern OSes; Linux has used it for decades and the kernel has
been optimized for it, while Ganesha is new technology and requires some
tuning.
You haven't mentioned what kind of issues you see - searching a directory,
reading a lot of files, writing a lot of small files, etc.
Usually a negative lookup (searching for/accessing a nonexistent object -
file, dir, etc.) causes serious performance degradation.
>If I cannot configure gNFS, I think I could glusterfs_mount the volume
>and use the native NFS server of Linux, but that would add overhead and
>leave some features behind, that is why my focus is primarily on
>configuring gNFS.
>
>>
>> Also can you specify:
>> - Brick block device type and details (raid type, lvm, vdo, etc )
>
>All nodes are VMware virtual machines, the RAID being at VMware level
Yeah, that's not very descriptive.
For a write-intensive, small-file workload the optimal RAID mode is RAID10
with at least 12 disks per node.
What is the I/O scheduler? Are you using thin or thick LVM? How many
snapshots do you have?
Are you using striping at the LVM level (if you use local storage, then most
probably no striping)?
What is the PE size of the VG?
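A quick way to collect most of that (a rough sketch; the device, volume and
snapshot names below are just placeholders, adjust to your layout):
cat /sys/block/sda/queue/scheduler        # active I/O scheduler for the brick disk
lvs -a -o +lv_layout,stripes,devices      # thin vs. thick LVs and LVM striping
vgs -o +vg_extent_size                    # PE size of the VG
gluster snapshot list gv0                 # number of gluster snapshots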
>> - xfs_info of the brick
What kind of FS are you using? You need to be sure that the inode size is at
least 512 bytes (1024 for Swift) in order to be supported.
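For example, on an XFS brick (the brick path here is taken from your volume
info below and may differ on each node):
xfs_info /gluster1/br | grep isize        # isize should be 512 or larger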
>> - mount options for the brick
>
>Bricks are not mounted
It is not good to share the OS and the Gluster brick on the same VMDK. You
can benefit from mount options like 'noatime,nodiratime,nobarrier,inode64'.
Note that 'nobarrier' is only safe on storage with a battery-backed write
cache.
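Purely as an illustration (the device path is an assumption, not your actual
layout), a brick entry in /etc/fstab with those options could look like:
/dev/mapper/gluster_vg-gluster_lv  /gluster1/br  xfs  noatime,nodiratime,nobarrier,inode64  0 2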
>> - SELINUX/APPARMOR status
>> - sysctl tunables (including tuned profile)
>
>All systems are vanilla Ubuntu with no tuning.
I have done some tests and you can benefit from the RHGS random-IO tuned
profile. The latest source rpm can be found at:
ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm
On top of that you need to modify it to disable LRO, as it is automatically
enabled for VMXNET NICs. LRO increases bandwidth at the cost of latency;
disabling it reduces latency, which is crucial when looking up thousands of
files/directories.
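Applying the profile and disabling LRO would look roughly like this (the
profile name comes from the rebuilt rpm and the interface name is just a
placeholder):
tuned-adm profile rhgs-random-io          # activate the random-IO profile
ethtool -K ens192 lro off                 # turn off LRO on the VMXNET3 NIC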
>> - gluster volume information and status
>
>sudo gluster volume info gv0
>
>Volume Name: gv0
>Type: Distributed-Replicate
>Volume ID: cc664830-1dd0-4dd4-9f1c-493578297e79
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: gluster3000:/gluster1/br
>Brick2: gluster5000:/gluster/br
>Brick3: gluster3000:/gluster2/br
>Brick4: gluster2000:/gluster/br
>Options Reconfigured:
>features.quota-deem-statfs: on
>features.inode-quota: on
>features.quota: on
>transport.address-family: inet
>nfs.disable: off
>features.cache-invalidation: on
>on at gluster3:~$ sudo gluster volume status gv0
>Status of volume: gv0
>Gluster process                              TCP Port  RDMA Port  Online  Pid
>------------------------------------------------------------------------------
>Brick gluster3000:/gluster1/br               49152     0          Y       1473
>Brick gluster5000:/gluster/br                49152     0          Y       724
>Brick gluster3000:/gluster2/br               49153     0          Y       1549
>Brick gluster2000:/gluster/br                49152     0          Y       723
>Self-heal Daemon on localhost                N/A       N/A        Y       1571
>NFS Server on localhost                      N/A       N/A        N       N/A
>Quota Daemon on localhost                    N/A       N/A        Y       1560
>Self-heal Daemon on gluster2000.cs.ait.ac.th N/A       N/A        Y       835
>NFS Server on gluster2000.cs.ait.ac.th       N/A       N/A        N       N/A
>Quota Daemon on gluster2000.cs.ait.ac.th     N/A       N/A        Y       735
>Self-heal Daemon on gluster5000.cs.ait.ac.th N/A       N/A        Y       829
>NFS Server on gluster5000.cs.ait.ac.th       N/A       N/A        N       N/A
>Quota Daemon on gluster5000.cs.ait.ac.th     N/A       N/A        Y       736
>Self-heal Daemon on fbsd3500                 N/A       N/A        Y       2584
>NFS Server on fbsd3500                       2049      0          Y       2671
>Quota Daemon on fbsd3500                     N/A       N/A        Y       2571
>
>Task Status of Volume gv0
>------------------------------------------------------------------------------
>Task : Rebalance
>ID : 53e7c649-27f0-4da0-90dc-af59f937d01f
>Status : completed
You don't have any tunings on the volume, despite the predefined ones
in /var/lib/glusterd/groups.
Both the metadata-cache and nl-cache groups bring some performance gain with
a small-file workload. You have to try them and check the results. Use a
real-world workload for testing, as synthetic benchmarks do not always show
the real picture.
To reset (revert) a setting you can use 'gluster volume reset gv0
<setting>'.
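For example (a sketch; the option in the reset line is only an example of the
syntax):
gluster volume set gv0 group metadata-cache   # apply the predefined metadata-cache group
gluster volume set gv0 group nl-cache         # apply the negative-lookup cache group
gluster volume reset gv0 performance.nl-cache # revert a single option if it does not help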
>> - ganesha settings
>
>MDCACHE
>{
>Attr_Expiration_Time = 600;
>Entries_HWMark = 50000;
>LRU_Run_Interval = 90;
>FD_HWMark_Percent = 60;
>FD_LWMark_Percent = 20;
>FD_Limit_Percent = 90;
>}
>EXPORT
>{
> Export_Id = 2;
> etc.
>}
>
>> - Network settings + MTU
>
>MTU 1500 (I think it is my switch that never worked with jumbo
>frames). I have a dedicated VLAN for NFS and gluster and a VLAN for
>users connection.
Verify that there is no fragmentation between the TSP nodes and between the
NFS server (Ganesha) and the cluster.
For example, if the MTU is 1500, use a payload size of 1500 - 28 (ICMP + IP
headers) = 1472:
ping -M do -s 1472 -c 4 -I <interface> <other gluster node>
Even the dumbest gigabit switches support jumbo frames of 9000 (anything
above that requires more expensive hardware), so I would recommend verifying
whether jumbo frames are possible at least between the TSP nodes and maybe
the NFS server.
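If you do move to jumbo frames, the same check with MTU 9000 becomes:
ping -M do -s 8972 -c 4 -I <interface> <other gluster node>   # 9000 - 28 bytes of ICMP/IP headers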
>I hope that helps.
>
>Best regards,
>
>Olivier
>
>>
>> Best Regards,
>> Strahil Nikolov
>>
As you can see, you are getting deeper and deeper into the stack, and we
haven't covered the storage layer yet, nor any Ganesha settings :)
Good luck!
Best Regards,
Strahil Nikolov