Hi,

I have been conducting performance tests over the past day on our new hardware, where we plan to deploy a scalable file-system solution. I hope the results can be helpful to someone, and I would appreciate feedback on optimizations and the volume xlator setup. Volume profiling has been collected, should anyone need it.

Regards
Davide

The Clients
- number of clients: 6
- network: 10Gb
- client memory: 128GB
- client cores: 22
- OS: CentOS 7.5.1804, kernel 3.10.0-862.14.4.el7.x86_64

The Servers
- number of servers: 3
- network: 100Gb (node to node is 100Gb, to the clients 10Gb)
- server memory: 377GB
- server cores: 56 (Intel 5120 CPUs)
- storage: 4x 8TB NVMe
- OS: CentOS 7.5.1804, kernel 3.10.0-862.14.4.el7.x86_64

Gluster version
Both clients and servers are running glusterfs 4.1.5 (glusterd, not glusterd2).

Brick Setup
The bricks have been configured automatically by heketi at volume creation, resulting in:
- 1 VG per NVMe disk
- 1 thinpool with one LV
- 1 LV mapped to one brick
- a 1 x 3 = 3 replicated volume
(A rough sketch of the equivalent manual commands is at the end of this mail.)

The tests
The tests have been carried out using the smallfile utility:
https://github.com/distributed-system-analysis/smallfile
(An example invocation is sketched at the end of this mail.)

A set of comparative tests has been carried out between the following platforms; these tests include gluster volume profiling:
- proprietary appliance, NVMe over iSCSI, top of the range (1 client only)
- proprietary appliance, SSD served over NFS, top of the range
- Gluster, 3-node cluster with the specs above

A set of longer-running resilience tests has been carried out on Gluster only; for these tests the system graph metrics are available, and drives/nodes were physically unplugged during the runs.

Volume profiling
As shown by the 8-thread 4K test in the attached results, gluster volume profiling did not incur any performance degradation, so it has been left on for all the 5K-file tests. The volume profiling results are enclosed. (The profiling commands are sketched at the end of this mail.)

Gluster volume options
The following volume options have been configured based on previous experience. Some of these options have been tested default vs. custom, as shown by the 4-thread test. Other options have not been explicitly set since they are already enabled at their default values. (How they were applied is sketched at the end of this mail.)

Options reconfigured:
client.event-threads                  3
performance.cache-size                8GB
performance.io-thread-count           24
network.inode-lru-limit               1048576
performance.parallel-readdir          on
performance.cache-invalidation        on
performance.md-cache-timeout          600
features.cache-invalidation           on
features.cache-invalidation-timeout   600
performance.client-io-threads         on

The Cache
I did not notice any difference when re-mounting the share or dropping the caches via /proc, and in the real world I want to use any available cache as much as possible, so none of the tests cleared caches or re-mounted (the commands this refers to are at the end of this mail). Note, however, that performing two subsequent stat operations resulted in approximately 10x faster FOPS; this result is not recorded.

The results
Attached: glusterfs_smallfile.xlsx
<http://lists.gluster.org/pipermail/gluster-users/attachments/20181114/a22e4efa/attachment.xlsx>
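Command sketches (for reference)

For anyone who wants to reproduce the brick layout by hand instead of via heketi, the setup above corresponds roughly to the commands below, run per node. The VG/LV names, sizes, mount points and the volume name "testvol" are placeholders of mine, not what heketi actually generated:

  # one VG per NVMe disk, one thin pool, one thin LV per brick
  pvcreate /dev/nvme0n1
  vgcreate vg_nvme0 /dev/nvme0n1
  lvcreate -L 7T --thinpool tp_brick1 vg_nvme0
  lvcreate -V 7T --thin -n brick1 vg_nvme0/tp_brick1
  mkfs.xfs -i size=512 /dev/vg_nvme0/brick1
  mkdir -p /bricks/brick1
  mount /dev/vg_nvme0/brick1 /bricks/brick1

  # 1 x 3 = 3 replicated volume across the three nodes
  gluster volume create testvol replica 3 \
      server1:/bricks/brick1/data \
      server2:/bricks/brick1/data \
      server3:/bricks/brick1/data
  gluster volume start testvol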
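The smallfile runs had this general shape; the mount point, host list and the thread/file/size numbers below are illustrative (the 8-thread 4K case), not copied from my run scripts:

  # run from one client; smallfile drives the other clients over ssh via --host-set
  python smallfile_cli.py \
      --top /mnt/testvol/smallfile \
      --host-set client1,client2,client3,client4,client5,client6 \
      --threads 8 \
      --file-size 4 \
      --files 5000 \
      --operation create

  # note: --file-size is in KB and --files is per thread, so adjust --files
  # to match the intended total file count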
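The volume options were applied with the standard gluster CLI, and profiling was driven the usual way; again, "testvol" is a placeholder and the exact way the profile output was captured per test is not shown here:

  gluster volume set testvol client.event-threads 3
  gluster volume set testvol performance.cache-size 8GB
  gluster volume set testvol performance.io-thread-count 24
  gluster volume set testvol network.inode-lru-limit 1048576
  gluster volume set testvol performance.parallel-readdir on
  gluster volume set testvol performance.cache-invalidation on
  gluster volume set testvol performance.md-cache-timeout 600
  gluster volume set testvol features.cache-invalidation on
  gluster volume set testvol features.cache-invalidation-timeout 600
  gluster volume set testvol performance.client-io-threads on

  # profiling stayed on for the 5K-file tests; counters can be dumped after each run
  gluster volume profile testvol start
  gluster volume profile testvol info > profile_$(date +%Y%m%d_%H%M%S).txt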
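"Dropping the caches via /proc" above refers to the standard kernel cache drop, and re-mounting was a plain fuse mount cycle; the mount point and server name below are placeholders:

  sync; echo 3 > /proc/sys/vm/drop_caches    # drops page cache plus dentries/inodes
  umount /mnt/testvol
  mount -t glusterfs server1:/testvol /mnt/testvol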