Hi Pranith,

Unfortunately, we don't have similar hardware for a small-scale test. All we have is our production hardware.

Pat

On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>
>     Hi Pranith,
>
>     Since we are mounting the partitions as the bricks, I tried the dd
>     test writing to
>     <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
>     results without oflag=sync were 1.6 GB/s (faster than gluster but
>     not as fast as I was expecting given the 1.2 GB/s to the
>     no-gluster area w/ fewer disks).
>
> Okay, then 1.6 GB/s is what we need to target, considering your
> volume is just distribute. Is there any way you can run tests on
> similar hardware but at a small scale, so we can run the workload
> and learn more about the bottlenecks in the system? We can probably
> try to get the speed to 1.2 GB/s on the /home partition you were
> telling me about yesterday. Let me know if that is something you are
> okay with.
>
>     Pat
>
>     On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>
>>     Hi Pranith,
>>
>>     Not entirely sure (this isn't my area of expertise). I'll
>>     run your answer by some other people who are more familiar
>>     with this.
>>
>>     I am also uncertain about how to interpret the results when
>>     we also add the dd tests writing to the /home area (no
>>     gluster, still on the same machine):
>>
>>       * dd test without oflag=sync (rough average of multiple tests)
>>           o gluster w/ fuse mount: 570 MB/s
>>           o gluster w/ nfs mount: 390 MB/s
>>           o nfs (no gluster): 1.2 GB/s
>>       * dd test with oflag=sync (rough average of multiple tests)
>>           o gluster w/ fuse mount: 5 MB/s
>>           o gluster w/ nfs mount: 200 MB/s
>>           o nfs (no gluster): 20 MB/s
>>
>>     Given that the non-gluster area is a RAID-6 of 4 disks while
>>     each brick of the gluster area is a RAID-6 of 32 disks, I
>>     would naively expect the writes to the gluster area to be
>>     roughly 8x faster than to the non-gluster.
>>
>> I think a better test is to write to a file using nfs without any
>> gluster, to a location that is not inside the brick but some other
>> location on the same disk(s). If you are mounting the partition as
>> the brick, then we can write to a file inside the .glusterfs
>> directory, something like
>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>
>>     I still think we have a speed issue, I can't tell if fuse vs
>>     nfs is part of the problem.
>>
>> I got interested in the post because I read that fuse speed is
>> lower than nfs speed, which is counter-intuitive to my
>> understanding, so I wanted clarifications. Now that I have my
>> clarification, with fuse outperforming nfs without sync, we can
>> resume testing as described above and try to find what it is.
>> Based on your email-id I am guessing you are in Boston, and I am
>> in Bangalore, so if you are okay with this debugging taking
>> multiple days because of timezones, I will be happy to help.
>> Please be a bit patient with me; I am under a release crunch, but
>> I am very curious about the problem you posted.
>>
>>     Was there anything useful in the profiles?
>>
>> Unfortunately the profiles didn't help me much. I think we are
>> collecting the profiles from an active volume, so they contain a
>> lot of information that does not pertain to dd, which makes it
>> difficult to find the contribution of dd. So I went through your
>> post again and found something I didn't pay much attention to
>> earlier, i.e. oflag=sync, so I did my own tests on my setup with
>> FUSE and sent that reply.
>>
>>     Pat
>>
>>     On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>> Okay good. At least this validates my doubts. Handling
>>> O_SYNC in gluster NFS and fuse is a bit different.
>>> When an application opens a file with O_SYNC on a fuse mount,
>>> each write syscall has to be written to disk as part of the
>>> syscall, whereas in the case of NFS there is no concept of
>>> open. NFS performs the write through a handle saying it needs
>>> to be a synchronous write, so the write() syscall is performed
>>> first and then fsync(). So a write on an fd with O_SYNC becomes
>>> write+fsync. My guess is that when multiple threads do this
>>> write+fsync() operation on the same file, multiple writes are
>>> batched together to be written to disk, so the throughput on
>>> the disk increases.
>>>
>>> Does it answer your doubts?
>>>
>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>     Without the oflag=sync and only a single test of each,
>>>     the FUSE is going faster than NFS:
>>>
>>>     FUSE:
>>>     mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>     4096+0 records in
>>>     4096+0 records out
>>>     4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>
>>>     NFS:
>>>     mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>     4096+0 records in
>>>     4096+0 records out
>>>     4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>
>>>     On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>> Could you let me know the speed without oflag=sync on
>>>> both the mounts? No need to collect profiles.
>>>>
>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>     Here is what I see now:
>>>>
>>>>     [root at mseas-data2 ~]# gluster volume info
>>>>
>>>>     Volume Name: data-volume
>>>>     Type: Distribute
>>>>     Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>     Status: Started
>>>>     Number of Bricks: 2
>>>>     Transport-type: tcp
>>>>     Bricks:
>>>>     Brick1: mseas-data2:/mnt/brick1
>>>>     Brick2: mseas-data2:/mnt/brick2
>>>>     Options Reconfigured:
>>>>     diagnostics.count-fop-hits: on
>>>>     diagnostics.latency-measurement: on
>>>>     nfs.exports-auth-enable: on
>>>>     diagnostics.brick-sys-log-level: WARNING
>>>>     performance.readdir-ahead: on
>>>>     nfs.disable: on
>>>>     nfs.export-volumes: off
>>>>
>>>>     On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>> Is this the volume info you have?
>>>>>
>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>> >
>>>>> > Volume Name: data-volume
>>>>> > Type: Distribute
>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>> > Status: Started
>>>>> > Number of Bricks: 2
>>>>> > Transport-type: tcp
>>>>> > Bricks:
>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>> > Options Reconfigured:
>>>>> > performance.readdir-ahead: on
>>>>> > nfs.disable: on
>>>>> > nfs.export-volumes: off
>>>>>
>>>>> I copied this from an old thread from 2016. This is a
>>>>> distribute volume. Did you change any of the options in
>>>>> between?
>
> --
> Pranith

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    web.mit.edu/phaley/www
77 Massachusetts Avenue
Cambridge, MA  02139-4301
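The O_SYNC discussion in this thread contrasts two ways of making each write durable: opening the file with O_SYNC, and doing a plain write() followed by fsync(). The following is a minimal sketch of those two patterns on an ordinary local file, written in Python for illustration. It is not gluster code and does not reproduce how the gluster NFS server or FUSE client behaves; the block size, block count, file names and function names are all assumptions chosen for the example.

```python
import os
import tempfile
import time

BLOCK = b"\0" * (1 << 20)   # 1 MiB of zeros, similar to dd if=/dev/zero bs=1048576
COUNT = 8                   # small block count (assumed) so the sketch runs quickly

# O_SYNC is not defined on every platform; fall back to 0 (no extra flag) there.
O_SYNC = getattr(os, "O_SYNC", 0)


def write_all(fd, data):
    """Write the whole buffer, looping in case os.write() writes only part of it."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_with_o_sync(path):
    """Open with O_SYNC so every write() is synchronous."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_SYNC, 0o600)
    try:
        for _ in range(COUNT):
            write_all(fd, BLOCK)
    finally:
        os.close(fd)


def write_then_fsync(path):
    """Plain open, then write() followed by fsync() for every block."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(COUNT):
            write_all(fd, BLOCK)
            os.fsync(fd)
    finally:
        os.close(fd)


def run(directory):
    """Run both patterns in `directory`; return {name: (file size, elapsed seconds)}."""
    results = {}
    writers = (("O_SYNC", write_with_o_sync), ("write+fsync", write_then_fsync))
    for name, writer in writers:
        path = os.path.join(directory, name.replace("+", "_") + ".bin")
        start = time.perf_counter()
        writer(path)
        elapsed = time.perf_counter() - start
        results[name] = (os.path.getsize(path), elapsed)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, (size, elapsed) in run(tmp).items():
            print(f"{name:12s} {size} bytes in {elapsed:.4f} s")
```

Timings from a sketch like this depend heavily on the filesystem and storage underneath, so they say nothing about the gluster numbers above; the point is only to show what "a write on an fd with O_SYNC becomes write+fsync" looks like at the syscall level.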
Pranith Kumar Karampuri
2017-May-11 15:32 UTC
[Gluster-users] Slow write times to gluster disk
On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:

> Hi Pranith,
>
> Unfortunately, we don't have similar hardware for a small-scale test. All
> we have is our production hardware.

You said something about the /home partition, which has fewer disks. We can create a plain distribute volume inside one of those directories. After we are done, we can remove the setup. What do you say?

> Pat

--
Pranith
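The dd figures reported in this thread (4294967296 bytes copied in 7.46961 s over FUSE and in 11.4264 s over NFS) can be cross-checked with a few lines of arithmetic. The sketch below is in Python and assumes dd's decimal convention of 1 MB = 10^6 bytes; the function and variable names are made up for the example.

```python
def dd_throughput_mb_s(total_bytes, seconds):
    """Throughput in MB/s, using decimal megabytes (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / 1e6


# count=4096 blocks of bs=1048576 bytes, as in the dd commands in the thread.
TOTAL_BYTES = 4096 * 1048576

fuse = dd_throughput_mb_s(TOTAL_BYTES, 7.46961)   # elapsed time on the FUSE mount
nfs = dd_throughput_mb_s(TOTAL_BYTES, 11.4264)    # elapsed time on the NFS mount

print(f"FUSE: {fuse:.0f} MB/s")
print(f"NFS:  {nfs:.0f} MB/s")
print(f"FUSE is {fuse / nfs:.2f}x the NFS rate")
```

The first two values round to the 575 MB/s and 376 MB/s that dd printed, which confirms that the rates in the thread are in megabytes per second rather than megabits.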