----- Original Message -----
> From: "Joe Julian" <joe at julianfamily.org>
> To: "Punit Dambiwal" <hypunit at gmail.com>, gluster-users at gluster.org,
>     "Humble Devassy Chirammal" <humble.devassy at gmail.com>
> Sent: Monday, February 16, 2015 3:32:31 PM
> Subject: Re: [Gluster-users] Gluster performance on the small files
>
>
> On 02/12/2015 10:58 PM, Punit Dambiwal wrote:
>
>
>
> Hi,
>
> I have seen that Gluster performance is dead slow on small files... even
> though I am using SSDs, the performance is too bad... I even get better
> performance on my SAN with normal SATA disks...
>
> I am using distributed-replicated GlusterFS with replica count = 2, and I
> have all SSD disks on the bricks...
>
>
>
> root at vm3:~# dd bs=64k count=4k if=/dev/zero of=test oflag=dsync
>
> 4096+0 records in
>
> 4096+0 records out
>
> 268435456 bytes (268 MB) copied, 57.3145 s, 4.7 MB/s
>
This seems pretty slow, even if you are using gigabit. Here is what I get:
[root at gqac031 smallfile]# dd bs=64k count=4k if=/dev/zero of=/gluster-emptyvol/test oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 10.5965 s, 25.3 MB/s
FYI, this is on my 2-node pure replica with spinning disks (RAID 6, which is
not set up for small-file workloads; for small-file testing I normally use
RAID 10) and 10G.
The single-threaded dd process is definitely a bottleneck here; the power of
distributed systems comes from doing things in parallel across clients and
threads. You may want to try smallfile:
http://www.gluster.org/community/documentation/index.php/Performance_Testing
Smallfile command used:
python /small-files/smallfile/smallfile_cli.py --operation create \
  --threads 8 --file-size 64 --files 10000 --top /gluster-emptyvol/ \
  --pause 1000 --host-set "client1, client2"
total threads = 16
total files = 157100
total data = 9.589 GB
98.19% of requested files processed, minimum is 70.00
41.271602 sec elapsed time
3806.491454 files/sec
3806.491454 IOPS
237.905716 MB/sec
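As a quick sanity check on those numbers, the reported throughput is just
files/sec times the 64 KiB file size (note the tool's "MB/sec" is really
MiB/sec):

```python
# Sanity-check the smallfile report: throughput = files/sec * file size.
# Figures are taken from the run above; file size was 64 KiB (--file-size 64).
files_per_sec = 3806.491454
file_size_kib = 64

throughput_mib = files_per_sec * file_size_kib / 1024  # KiB/s -> MiB/s
print(f"{throughput_mib:.6f} MiB/sec")  # matches the reported 237.905716
```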
If you wanted to do something similar with dd you could do:

#!/bin/bash
# myscript.sh -- start four dd writers in parallel, then wait for all of them
for i in $(seq 1 4)
do
    dd bs=64k count=4k if=/dev/zero of=/gluster-emptyvol/test$i oflag=dsync &
done
wait

# time ./myscript.sh
Then do the math to figure out the aggregate MB/sec of the system.
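That math is just total bytes written by all writers divided by the wall-clock
time of the whole script; a sketch, with a made-up elapsed time:

```python
# Aggregate throughput for N parallel dd writers timed as a group.
# The elapsed time below is hypothetical; substitute the wall-clock
# ("real") time reported by `time ./myscript.sh`.
num_writers = 4
bytes_per_writer = 64 * 1024 * 4096   # bs=64k count=4k -> 268435456 bytes
elapsed_sec = 20.0                    # hypothetical

total_mb = num_writers * bytes_per_writer / 1e6
print(f"{total_mb / elapsed_sec:.1f} MB/sec aggregate")
```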
-b
>
>
> root at vm3:~# dd bs=64k count=4k if=/dev/zero of=test conv=fdatasync
>
> 4096+0 records in
>
> 4096+0 records out
>
> 268435456 bytes (268 MB) copied, 1.80093 s, 149 MB/s
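The gap between 4.7 MB/s with oflag=dsync and 149 MB/s with conv=fdatasync
above is expected: dsync forces every 64k write to stable storage, while
fdatasync buffers all the writes and issues a single flush at the end. A
minimal sketch of the two modes (against a local temp file, not a Gluster
mount, and with a small write count so it runs quickly):

```python
import os
import tempfile
import time

BLOCK = b"\0" * 65536   # 64 KiB, like dd's bs=64k
COUNT = 16              # far fewer blocks than dd's count=4k

def write_dsync(path):
    # O_DSYNC: every write() returns only after the data is on stable storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC)
    for _ in range(COUNT):
        os.write(fd, BLOCK)
    os.close(fd)

def write_fdatasync(path):
    # Buffered writes, then a single fdatasync() before close.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    for _ in range(COUNT):
        os.write(fd, BLOCK)
    os.fdatasync(fd)
    os.close(fd)

with tempfile.TemporaryDirectory() as d:
    for fn in (write_dsync, write_fdatasync):
        t0 = time.time()
        fn(os.path.join(d, fn.__name__))
        print(f"{fn.__name__}: {time.time() - t0:.3f}s")
```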
>
>
>
> How small is your VM image? The image is the file that GlusterFS is
> serving, not the small files within it. Perhaps the filesystem you're
> using within your VM is inefficient with regard to how it handles disk
> writes.
>
> I believe your concept of "small file" performance is misunderstood, as
> is often the case with this phrase. The "small file" issue has to do with
> the overhead of looking up and checking the validity of any file; that
> overhead is roughly fixed per file, so with a small file the percentage
> of time spent doing those checks is proportionally greater. With your VM
> image, that file is already open. There are no self-heal checks or
> lookups happening in your tests, so that overhead is not the problem.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users