Hi,
We recently decided to try out GlusterFS in our lab, as a lot of our processing
is IOPS-bound and our data sets are fairly large (the files we process are
broken up into 256 GB chunks). Our traditional storage is a 24-disk RAID-6
Synology NAS with an SSD cache. The NAS has a dual 10GbE card connected to 8
computers in our lab, which also have dual 10GbE operating in 802.3ad LACP. The
8 processing nodes each have a 1 TB NVMe SSD, and four of the nodes also have a
2 TB SATA SSD.
For testing, I tried creating both a distributed replicated volume and a plain
distributed volume. I also experimented with sharding enabled and tested
different shard sizes. For testing purposes, I created the bricks on the 8 NVMe
SSDs on the root partition, which is formatted as ext4. I know this is
considered bad practice, but I could not find documentation on what could go
wrong (we will create dedicated XFS partitions if we decide to migrate to
GlusterFS). The four 2 TB SATA SSDs are formatted with XFS. We are running
Ubuntu 16.04 with GlusterFS 3.8.7.
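For reference, the volumes were created with commands roughly like the
following; the hostnames, volume name, brick paths, and the shard size shown
are just placeholders for our actual layout (the plain distributed volume was
the same minus "replica 2"):

    # distributed replica 2 across the 8 NVMe bricks; "force" is needed
    # because the bricks live on the root partition
    gluster volume create labvol replica 2 \
        node1:/data/brick1 node2:/data/brick1 \
        node3:/data/brick1 node4:/data/brick1 \
        node5:/data/brick1 node6:/data/brick1 \
        node7:/data/brick1 node8:/data/brick1 force

    # sharding experiments (64MB is only one of the sizes I tried)
    gluster volume set labvol features.shard on
    gluster volume set labvol features.shard-block-size 64MB

    gluster volume start labvol
    mount -t glusterfs node1:/labvol /mnt/labvol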
When transferring a data chunk from the Synology NAS to a single NVMe SSD, we
get a sustained sequential transfer rate of around 1.0 GB/sec. When testing
with GlusterFS, I have not been able to get write performance above
180 MB/sec. The throughput is about the same whether I use a distributed
volume (1 copy) or a distributed replica 2 volume (twice the network
bandwidth). I hit the same performance ceiling whether copying from the NAS or
from a local NVMe SSD to the Gluster volume. I haven't done much testing once
the data makes it to the Gluster volume, as the current throughput for
uploading data to GlusterFS would make it a no-go for us.
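To put numbers on it, the write figures above come from simple sequential
copies of a chunk file onto the FUSE mount, along these lines (paths are
placeholders):

    # copy one 256 GB chunk from local NVMe to the Gluster mount;
    # conv=fdatasync makes dd flush before reporting the rate
    dd if=/mnt/nvme/chunk-000.dat of=/mnt/labvol/chunk-000.dat \
        bs=1M conv=fdatasync status=progress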
Does anyone have any ideas on what my bottleneck might be, or tips on
identifying and resolving it?
Thanks,
Zack