P.Gotwalt
2011-Jan-21 11:42 UTC
[Gluster-users] striping - only for big files and how to tune
Hi,

Using glusterfs 3.1.1 with a 4 node striped volume:

# gluster volume info

Volume Name: testvol
Type: Stripe
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: node20.storage.xx.nl:/data1
Brick2: node30.storage.xx.nl:/data1
Brick3: node40.storage.xx.nl:/data1
Brick4: node50.storage.xx.nl:/data1

To run a quick performance test I copied /usr to the gluster volume:

[root@drbd10.storage ~]# time rsync -avzx --quiet /usr /gluster

real    5m54.453s
user    2m1.026s
sys     0m9.979s
[root@drbd10.storage ~]#

To check whether this operation was successful, I counted the number of
files and the used blocks on the storage bricks. I expected these to be
the same on all bricks, because I use a striped configuration. The
results:

Number of files seen on the client:

[root@drbd10.storage ~]# find /gluster/usr -ls | wc -l
57517

Number of files seen on the storage bricks:

# mpssh -f s2345.txt 'find /data1/usr -ls | wc -l'
[*] read (4) hosts from the list
[*] executing "find /data1/usr -ls | wc -l" as user "root" on each
[*] spawning 4 parallel ssh sessions

node20 -> 57517
node30 -> 55875
node40 -> 55875
node50 -> 55875

Why does node20 have all the files, while the others seem to miss quite
a lot? (See the P.S. for a comparison I plan to run.)

The same, but now for the actually used storage blocks. On the client:

[root@drbd10.storage ~]# du -sk /gluster/usr
1229448 /gluster/usr

On the storage bricks:

# mpssh -f s2345.txt 'du -sk /data1/usr'
[*] read (4) hosts from the list
[*] executing "du -sk /data1/usr" as user "root" on each
[*] spawning 4 parallel ssh sessions

node20 -> 1067784  /data1/usr
node30 -> 535124   /data1/usr
node40 -> 437896   /data1/usr
node50 -> 405920   /data1/usr

In total: 2446724 KB

My conclusions:

- All data is written to the first brick. If a file is smaller than the
  stripe chunk size there is nothing left to stripe, so the first brick
  fills up with all the small files.
  Question: does the filesystem stop working once the first brick is
  full?

- When using striping, the overhead seems to be almost 50%: the bricks
  together use 2446724 KB for data that takes 1229448 KB on the client,
  i.e. nearly twice as much. This can get worse as the first node fills
  up.
  Question: what is the size of the stripe chunk, and can it be tuned
  to match the average file size? (See the P.S. for what I intend to
  try.)

All in all, glusterfs seems to be better suited for "big" files. Is
there an "average" file size above which glusterfs becomes the better
choice?

Greetings,

Peter Gotwalt
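
P.S. To find out which files are missing on the other bricks, I plan to
compare the file list of the client with that of one brick. Untested,
but something along these lines should work (node30 taken as an
example):

(cd /gluster && find usr | sort) > client.lst
ssh node30 'cd /data1 && find usr | sort' > node30.lst
comm -23 client.lst node30.lst | head

comm -23 should print the paths that exist on the client but not on
node30.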
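
To measure the stripe chunk size empirically, I also want to write one
file that is clearly larger than any plausible chunk and then look at
its pieces on the bricks. If striping works the way I think, ls -ls on
each brick should show a sparse file whose allocated blocks together
add up to the file size (chunktest is just a scratch file name):

dd if=/dev/zero of=/gluster/chunktest bs=1M count=4
mpssh -f s2345.txt 'ls -ls /data1/chunktest'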
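
As for tuning: the stripe translator documentation mentions a
block-size option that, if I understand it correctly, can be set per
volume, roughly like the line below. I have not tried this yet and am
not sure about the exact option name or value syntax in 3.1.1, so
corrections are welcome:

gluster volume set testvol cluster.stripe-block-size 256KB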