Hi Ben,

Sorry this took so long, but we had a real-time forecasting exercise
last week and I could only get to this now.

Backend Hardware/OS:

  * Much of the information on our back end system is included at the
    top of
    http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
  * The specific model of the hard disks is Seagate ENTERPRISE
    CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
  * Note: there is one physical server that hosts both the NFS and the
    GlusterFS areas

Latest tests

I have had time to run one of the dd tests you requested on the
underlying XFS FS. The median rate was 170 MB/s. The dd results and
iostat record are in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/

I'll add tests for the other brick and for the NFS area later.

Thanks

Pat

On 06/12/2017 06:06 PM, Ben Turner wrote:
> Ok you are correct, you have a pure distributed volume. IE no
> replication overhead. So normally for pure dist I use:
>
> throughput = slowest of disks / NIC * .6-.7
>
> In your case we have:
>
> 1200 * .6 = 720
>
> So you are seeing a little less throughput than I would expect in
> your configuration. What I like to do here is:
>
> -First tell me more about your back end storage, will it sustain
> 1200 MB / sec? What kind of HW? How many disks? What type and specs
> are the disks? What kind of RAID are you using?
>
> -Second can you refresh me on your workload? Are you doing reads /
> writes or both? If both what mix? Since we are using DD I assume you
> are working with large file sequential I/O, is this correct?
>
> -Run some DD tests on the back end XFS FS. I normally have
> /xfs-mount/gluster-brick, if you have something similar just mkdir
> on the XFS -> /xfs-mount/my-test-dir.
> Inside the test dir run:
>
> If you are focusing on a write workload run:
>
> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>
> If you are focusing on a read workload run:
>
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
>
> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>
> Run this in a loop similar to how you did in:
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Run this on both servers one at a time and if you are running on a
> SAN then run again on both at the same time. While this is running
> gather iostat for me:
>
> # iostat -c -m -x 1 > iostat-$(hostname).txt
>
> Let's see how the back end performs on both servers while capturing
> iostat, then see how the same workload / data looks on gluster.
>
> -Last thing, when you run your kernel NFS tests are you using the
> same filesystem / storage you are using for the gluster bricks? I
> want to be sure we have an apples to apples comparison here.
>
> -b
>
>
>
> ----- Original Message -----
>> From: "Pat Haley" <phaley at mit.edu>
>> To: "Ben Turner" <bturner at redhat.com>
>> Sent: Monday, June 12, 2017 5:18:07 PM
>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>
>>
>> Hi Ben,
>>
>> Here is the output:
>>
>> [root at mseas-data2 ~]# gluster volume info
>>
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Options Reconfigured:
>> nfs.exports-auth-enable: on
>> diagnostics.brick-sys-log-level: WARNING
>> performance.readdir-ahead: on
>> nfs.disable: on
>> nfs.export-volumes: off
>>
>>
>> On 06/12/2017 05:01 PM, Ben Turner wrote:
>>> What is the output of gluster v info? That will tell us more about
>>> your config.
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>> From: "Pat Haley" <phaley at mit.edu>
>>>> To: "Ben Turner" <bturner at redhat.com>
>>>> Sent: Monday, June 12, 2017 4:54:00 PM
>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>
>>>>
>>>> Hi Ben,
>>>>
>>>> I guess I'm confused about what you mean by replication. If I look
>>>> at the underlying bricks I only ever have a single copy of any
>>>> file. It either resides on one brick or the other (directories
>>>> exist on both bricks but not files). We are not using gluster for
>>>> redundancy (or at least that wasn't our intent). Is that what you
>>>> meant by replication or is it something else?
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>> On 06/12/2017 04:28 PM, Ben Turner wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Pat Haley" <phaley at mit.edu>
>>>>>> To: "Ben Turner" <bturner at redhat.com>, "Pranith Kumar Karampuri"
>>>>>> <pkarampu at redhat.com>
>>>>>> Cc: "Ravishankar N" <ravishankar at redhat.com>,
>>>>>> gluster-users at gluster.org,
>>>>>> "Steve Postma" <SPostma at ztechnet.com>
>>>>>> Sent: Monday, June 12, 2017 2:35:41 PM
>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>
>>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> I was wondering what our next steps should be to solve the slow
>>>>>> write times.
>>>>>>
>>>>>> Recently I was debugging a large code and writing a lot of output
>>>>>> at every time step. When I tried writing to our gluster disks, it
>>>>>> was taking over a day to do a single time step whereas if I had
>>>>>> the same program (same hardware, network) write to our nfs disk
>>>>>> the time per time-step was about 45 minutes. What we are shooting
>>>>>> for here would be to have similar times to either gluster or nfs.
>>>>> I can see in your test:
>>>>>
>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>
>>>>> You averaged ~600 MB / sec (expected for replica 2 with 10G,
>>>>> {~1200 MB / sec} / #replicas{2} = 600). Gluster does client side
>>>>> replication so with replica 2 you will only ever see 1/2 the speed
>>>>> of your slowest part of the stack (NW, disk, RAM, CPU). This is
>>>>> usually NW or disk and 600 is normally a best case. Now in your
>>>>> output I do see the instances where you went down to 200 MB / sec.
>>>>> I can only explain this in three ways:
>>>>>
>>>>> 1. You are not using conv=fdatasync and writes are actually going
>>>>> to page cache and then being flushed to disk. During the fsync the
>>>>> memory is not yet available and the disks are busy flushing dirty
>>>>> pages.
>>>>> 2. Your storage RAID group is shared across multiple LUNs (like in
>>>>> a SAN) and when write times are slow the RAID group is busy
>>>>> servicing other LUNs.
>>>>> 3. Gluster bug / config issue / some other unknown unknown.
>>>>>
>>>>> So I see 2 issues here:
>>>>>
>>>>> 1. NFS does in 45 minutes what gluster can do in 24 hours.
>>>>> 2. Sometimes your throughput drops dramatically.
>>>>>
>>>>> WRT #1 - have a look at my estimates above. My formula for
>>>>> guesstimating gluster perf is: throughput = NIC throughput or
>>>>> storage (whatever is slower) / # replicas * overhead (figure .7 or
>>>>> .8). Also the larger the record size the better for glusterfs
>>>>> mounts, I normally like to be at LEAST 64k up to 1024k:
>>>>>
>>>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
>>>>>
>>>>> WRT #2 - Again, I question your testing and your storage config.
>>>>> Try using conv=fdatasync for your DDs, use a larger record size,
>>>>> and make sure that your back end storage is not causing your
>>>>> slowdowns.
>>>>> Also remember that with replica 2 you will take ~50% hit on writes
>>>>> because the client uses 50% of its bandwidth to write to one
>>>>> replica and 50% to the other.
>>>>>
>>>>> -b
>>>>>
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>>>>> Are you sure using conv=sync is what you want? I normally use
>>>>>>> conv=fdatasync, I'll look up the difference between the two and
>>>>>>> see if it affects your test.
>>>>>>>
>>>>>>>
>>>>>>> -b
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Pat Haley" <phaley at mit.edu>
>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>> Cc: "Ravishankar N" <ravishankar at redhat.com>,
>>>>>>>> gluster-users at gluster.org,
>>>>>>>> "Steve Postma" <SPostma at ztechnet.com>,
>>>>>>>> "Ben Turner" <bturner at redhat.com>
>>>>>>>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> The "dd" command was:
>>>>>>>>
>>>>>>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>
>>>>>>>> There were 2 instances where dd reported 22 seconds. The output
>>>>>>>> from the dd tests are in
>>>>>>>>
>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>> Pat,
>>>>>>>>> What is the command you used? As per the following output, it
>>>>>>>>> seems like at least one write operation took 16 seconds. Which
>>>>>>>>> is really bad.
>>>>>>>>>
>>>>>>>>> 96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Hi Pranith,
>>>>>>>>>
>>>>>>>>>     I ran the same 'dd' test both in the gluster test volume
>>>>>>>>>     and in the .glusterfs directory of each brick. The median
>>>>>>>>>     results (12 dd trials in each test) are similar to before
>>>>>>>>>
>>>>>>>>>       * gluster test volume: 586.5 MB/s
>>>>>>>>>       * bricks (in .glusterfs): 1.4 GB/s
>>>>>>>>>
>>>>>>>>>     The profile for the gluster test-volume is in
>>>>>>>>>
>>>>>>>>>     http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>>>>>>>>
>>>>>>>>>     Thanks
>>>>>>>>>
>>>>>>>>>     Pat
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>> Let's start with the same 'dd' test we were testing with to
>>>>>>>>>> see what the numbers are. Please provide profile numbers for
>>>>>>>>>> the same. From there on we will start tuning the volume to
>>>>>>>>>> see what we can do.
>>>>>>>>>>
>>>>>>>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>
>>>>>>>>>>     Thanks for the tip. We now have the gluster volume
>>>>>>>>>>     mounted under /home. What tests do you recommend we run?
>>>>>>>>>>
>>>>>>>>>>     Thanks
>>>>>>>>>>
>>>>>>>>>>     Pat
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>
>>>>>>>>>>>     Sorry for the delay. I never received your reply (but I
>>>>>>>>>>>     did receive Ben Turner's follow-up to your reply). So we
>>>>>>>>>>>     tried to create a gluster volume under /home using
>>>>>>>>>>>     different variations of
>>>>>>>>>>>
>>>>>>>>>>>     gluster volume create test-volume
>>>>>>>>>>>     mseas-data2:/home/gbrick_test_1
>>>>>>>>>>>     mseas-data2:/home/gbrick_test_2 transport tcp
>>>>>>>>>>>
>>>>>>>>>>>     However we keep getting errors of the form
>>>>>>>>>>>
>>>>>>>>>>>     Wrong brick type: transport, use
>>>>>>>>>>>     <HOSTNAME>:<export-dir-abs-path>
>>>>>>>>>>>
>>>>>>>>>>>     Any thoughts on what we're doing wrong?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You should give transport tcp at the beginning I think.
>>>>>>>>>>> Anyways, transport tcp is the default, so no need to specify
>>>>>>>>>>> so remove those two words from the CLI.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Also do you have a list of the tests we should be
>>>>>>>>>>>     running once we get this volume created? Given the
>>>>>>>>>>>     time-zone difference it might help if we can run a small
>>>>>>>>>>>     battery of tests and post the results rather than
>>>>>>>>>>>     test-post-new test-post... .
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the first time I am doing performance analysis on
>>>>>>>>>>> users as far as I remember. In our team there are separate
>>>>>>>>>>> engineers who do these tests. Ben who replied earlier is one
>>>>>>>>>>> such engineer.
>>>>>>>>>>>
>>>>>>>>>>> Ben,
>>>>>>>>>>> Have any suggestions?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Thanks
>>>>>>>>>>>
>>>>>>>>>>>     Pat
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>>>>>>>>>> <phaley at mit.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>
>>>>>>>>>>>>     The /home partition is mounted as ext4
>>>>>>>>>>>>     /home ext4 defaults,usrquota,grpquota 1 2
>>>>>>>>>>>>
>>>>>>>>>>>>     The brick partitions are mounted as xfs
>>>>>>>>>>>>     /mnt/brick1 xfs defaults 0 0
>>>>>>>>>>>>     /mnt/brick2 xfs defaults 0 0
>>>>>>>>>>>>
>>>>>>>>>>>>     Will this cause a problem with creating a volume under
>>>>>>>>>>>>     /home?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the bottleneck is disk. You can do the same
>>>>>>>>>>>> tests you did on your new volume to confirm?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     Pat
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>>>>>>>>>> <phaley at mit.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Unfortunately, we don't have similar hardware for a
>>>>>>>>>>>>>     small scale test. All we have is our production
>>>>>>>>>>>>>     hardware.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You said something about /home partition which has lesser
>>>>>>>>>>>>> disks, we can create plain distribute volume inside one of
>>>>>>>>>>>>> those directories. After we are done, we can remove the
>>>>>>>>>>>>> setup. What do you say?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley
>>>>>>>>>>>>>> <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Since we are mounting the partitions as the bricks,
>>>>>>>>>>>>>>     I tried the dd test writing to
>>>>>>>>>>>>>>     <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>     The results without oflag=sync were 1.6 GB/s (faster
>>>>>>>>>>>>>>     than gluster but not as fast as I was expecting
>>>>>>>>>>>>>>     given the 1.2 GB/s to the no-gluster area w/ fewer
>>>>>>>>>>>>>>     disks).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay, then 1.6 GB/s is what we need to target for,
>>>>>>>>>>>>>> considering your volume is just distribute. Is there any
>>>>>>>>>>>>>> way you can do tests on similar hardware but at a small
>>>>>>>>>>>>>> scale? Just so we can run the workload to learn more
>>>>>>>>>>>>>> about the bottlenecks in the system? We can probably try
>>>>>>>>>>>>>> to get the speed to 1.2 GB/s on your /home partition you
>>>>>>>>>>>>>> were telling me yesterday. Let me know if that is
>>>>>>>>>>>>>> something you are okay to do.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley
>>>>>>>>>>>>>>> <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Not entirely sure (this isn't my area of
>>>>>>>>>>>>>>>     expertise). I'll run your answer by some other
>>>>>>>>>>>>>>>     people who are more familiar with this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I am also uncertain about how to interpret the
>>>>>>>>>>>>>>>     results when we also add the dd tests writing to
>>>>>>>>>>>>>>>     the /home area (no gluster, still on the same
>>>>>>>>>>>>>>>     machine)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       * dd test without oflag=sync (rough average of
>>>>>>>>>>>>>>>         multiple tests)
>>>>>>>>>>>>>>>           o gluster w/ fuse mount: 570 MB/s
>>>>>>>>>>>>>>>           o gluster w/ nfs mount:  390 MB/s
>>>>>>>>>>>>>>>           o nfs (no gluster):      1.2 GB/s
>>>>>>>>>>>>>>>       * dd test with oflag=sync (rough average of
>>>>>>>>>>>>>>>         multiple tests)
>>>>>>>>>>>>>>>           o gluster w/ fuse mount: 5 MB/s
>>>>>>>>>>>>>>>           o gluster w/ nfs mount:  200 MB/s
>>>>>>>>>>>>>>>           o nfs (no gluster):      20 MB/s
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Given that the non-gluster area is a RAID-6 of 4
>>>>>>>>>>>>>>>     disks while each brick of the gluster area is a
>>>>>>>>>>>>>>>     RAID-6 of 32 disks, I would naively expect the
>>>>>>>>>>>>>>>     writes to the gluster area to be roughly 8x faster
>>>>>>>>>>>>>>>     than to the non-gluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a better test is to try and write to a file
>>>>>>>>>>>>>>> using nfs without any gluster to a location that is not
>>>>>>>>>>>>>>> inside the brick but some other location that is on the
>>>>>>>>>>>>>>> same disk(s). If you are mounting the partition as the
>>>>>>>>>>>>>>> brick, then we can write to a file inside .glusterfs
>>>>>>>>>>>>>>> directory, something like
>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I still think we have a speed issue, I can't tell
>>>>>>>>>>>>>>>     if fuse vs nfs is part of the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I got interested in the post because I read that fuse
>>>>>>>>>>>>>>> speed is lesser than nfs speed which is
>>>>>>>>>>>>>>> counter-intuitive to my understanding. So wanted
>>>>>>>>>>>>>>> clarifications. Now that I got my clarifications where
>>>>>>>>>>>>>>> fuse outperformed nfs without sync, we can resume
>>>>>>>>>>>>>>> testing as described above and try to find what it is.
>>>>>>>>>>>>>>> Based on your email-id I am guessing you are from
>>>>>>>>>>>>>>> Boston and I am from Bangalore so if you are okay with
>>>>>>>>>>>>>>> doing this debugging for multiple days because of
>>>>>>>>>>>>>>> timezones, I will be happy to help. Please be a bit
>>>>>>>>>>>>>>> patient with me, I am under a release crunch but I am
>>>>>>>>>>>>>>> very curious with the problem you posted.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Was there anything useful in the profiles?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately profiles didn't help me much, I think we
>>>>>>>>>>>>>>> are collecting the profiles from an active volume, so
>>>>>>>>>>>>>>> it has a lot of information that is not pertaining to
>>>>>>>>>>>>>>> dd so it is difficult to find the contributions of dd.
>>>>>>>>>>>>>>> So I went through your post again and found something I
>>>>>>>>>>>>>>> didn't pay much attention to earlier i.e. oflag=sync,
>>>>>>>>>>>>>>> so did my own tests on my setup with FUSE so sent that
>>>>>>>>>>>>>>> reply.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     On 05/10/2017 12:15 PM, Pranith Kumar Karampuri
>>>>>>>>>>>>>>>     wrote:
>>>>>>>>>>>>>>>> Okay good. At least this validates my doubts. Handling
>>>>>>>>>>>>>>>> O_SYNC in gluster NFS and fuse is a bit different.
>>>>>>>>>>>>>>>> When application opens a file with O_SYNC on fuse
>>>>>>>>>>>>>>>> mount then each write syscall has to be written to
>>>>>>>>>>>>>>>> disk as part of the syscall whereas in case of NFS,
>>>>>>>>>>>>>>>> there is no concept of open. NFS performs write
>>>>>>>>>>>>>>>> through a handle saying it needs to be a synchronous
>>>>>>>>>>>>>>>> write, so write() syscall is performed first then it
>>>>>>>>>>>>>>>> performs fsync(). So a write on an fd with O_SYNC
>>>>>>>>>>>>>>>> becomes write+fsync.
>>>>>>>>>>>>>>>> I am suspecting that when multiple threads do this
>>>>>>>>>>>>>>>> write+fsync() operation on the same file, multiple
>>>>>>>>>>>>>>>> writes are batched together to be written to disk so
>>>>>>>>>>>>>>>> the throughput on the disk is increasing is my guess.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley
>>>>>>>>>>>>>>>> <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Without the oflag=sync and only a single test of
>>>>>>>>>>>>>>>>     each, the FUSE is going faster than NFS:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     FUSE:
>>>>>>>>>>>>>>>>     mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>     4096+0 records in
>>>>>>>>>>>>>>>>     4096+0 records out
>>>>>>>>>>>>>>>>     4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     NFS
>>>>>>>>>>>>>>>>     mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>     4096+0 records in
>>>>>>>>>>>>>>>>     4096+0 records out
>>>>>>>>>>>>>>>>     4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     On 05/10/2017 11:53 AM, Pranith Kumar Karampuri
>>>>>>>>>>>>>>>>     wrote:
>>>>>>>>>>>>>>>>> Could you let me know the speed without oflag=sync
>>>>>>>>>>>>>>>>> on both the mounts? No need to collect profiles.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley
>>>>>>>>>>>>>>>>> <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Here is what I see now:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Volume Name: data-volume
>>>>>>>>>>>>>>>>>     Type: Distribute
>>>>>>>>>>>>>>>>>     Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>     Status: Started
>>>>>>>>>>>>>>>>>     Number of Bricks: 2
>>>>>>>>>>>>>>>>>     Transport-type: tcp
>>>>>>>>>>>>>>>>>     Bricks:
>>>>>>>>>>>>>>>>>     Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>     Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>     Options Reconfigured:
>>>>>>>>>>>>>>>>>     diagnostics.count-fop-hits: on
>>>>>>>>>>>>>>>>>     diagnostics.latency-measurement: on
>>>>>>>>>>>>>>>>>     nfs.exports-auth-enable: on
>>>>>>>>>>>>>>>>>     diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>>>>>>>>>     performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>     nfs.disable: on
>>>>>>>>>>>>>>>>>     nfs.export-volumes: off
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     On 05/10/2017 11:44 AM, Pranith Kumar Karampuri
>>>>>>>>>>>>>>>>>     wrote:
>>>>>>>>>>>>>>>>>> Is this the volume info you have?
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >/[root at >>>>>>>>>>>>>>>>>> >mseas-data2 >>>>>>>>>>>>>>>>>> <http://www.gluster.org/mailman/listinfo/gluster-users> >>>>>>>>>>>>>>>>>> ~]# gluster volume >>>>>>>>>>>>>>>>>> info >>>>>>>>>>>>>>>>>> />//>/Volume Name: >>>>>>>>>>>>>>>>>> data-volume />/Type: >>>>>>>>>>>>>>>>>> Distribute />/Volume >>>>>>>>>>>>>>>>>> ID: >>>>>>>>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18 >>>>>>>>>>>>>>>>>> />/Status: Started >>>>>>>>>>>>>>>>>> />/Number >>>>>>>>>>>>>>>>>> of Bricks: 2 >>>>>>>>>>>>>>>>>> />/Transport-type: >>>>>>>>>>>>>>>>>> tcp >>>>>>>>>>>>>>>>>> />/Bricks: />/Brick1: >>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick1 >>>>>>>>>>>>>>>>>> />/Brick2: >>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick2 >>>>>>>>>>>>>>>>>> />/Options >>>>>>>>>>>>>>>>>> Reconfigured: >>>>>>>>>>>>>>>>>> />/performance.readdir-ahead: >>>>>>>>>>>>>>>>>> on />/nfs.disable: on >>>>>>>>>>>>>>>>>> />/nfs.export-volumes: >>>>>>>>>>>>>>>>>> off >>>>>>>>>>>>>>>>>> / >>>>>>>>>>>>>>>>>> ?I copied this from >>>>>>>>>>>>>>>>>> old >>>>>>>>>>>>>>>>>> thread from 2016. >>>>>>>>>>>>>>>>>> This >>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>> distribute volume. >>>>>>>>>>>>>>>>>> Did >>>>>>>>>>>>>>>>>> you change any of the >>>>>>>>>>>>>>>>>> options in between? >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >>>>>>>>>>>>>>>>> Pat Haley >>>>>>>>>>>>>>>>> Email:phaley at mit.edu >>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu> >>>>>>>>>>>>>>>>> Center for Ocean >>>>>>>>>>>>>>>>> Engineering >>>>>>>>>>>>>>>>> Phone: (617) 253-6824 >>>>>>>>>>>>>>>>> Dept. 
of Mechanical >>>>>>>>>>>>>>>>> Engineering >>>>>>>>>>>>>>>>> Fax: (617) 253-8125 >>>>>>>>>>>>>>>>> MIT, Room >>>>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/ >>>>>>>>>>>>>>>>> 77 Massachusetts >>>>>>>>>>>>>>>>> Avenue >>>>>>>>>>>>>>>>> Cambridge, MA >>>>>>>>>>>>>>>>> 02139-4301 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Pranith >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >>>>>>>>>>>>>>>> Pat Haley >>>>>>>>>>>>>>>> Email:phaley at mit.edu >>>>>>>>>>>>>>>> <mailto:phaley at mit.edu> >>>>>>>>>>>>>>>> Center for Ocean >>>>>>>>>>>>>>>> Engineering >>>>>>>>>>>>>>>> Phone: (617) 253-6824 >>>>>>>>>>>>>>>> Dept. of Mechanical >>>>>>>>>>>>>>>> Engineering >>>>>>>>>>>>>>>> Fax: (617) 253-8125 >>>>>>>>>>>>>>>> MIT, Room >>>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/ >>>>>>>>>>>>>>>> 77 Massachusetts Avenue >>>>>>>>>>>>>>>> Cambridge, MA 02139-4301 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Pranith >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >>>>>>>>>>>>>>> Pat Haley >>>>>>>>>>>>>>> Email:phaley at mit.edu >>>>>>>>>>>>>>> <mailto:phaley at mit.edu> >>>>>>>>>>>>>>> Center for Ocean Engineering >>>>>>>>>>>>>>> Phone: >>>>>>>>>>>>>>> (617) 253-6824 >>>>>>>>>>>>>>> Dept. of Mechanical Engineering >>>>>>>>>>>>>>> Fax: >>>>>>>>>>>>>>> (617) 253-8125 >>>>>>>>>>>>>>> MIT, Room >>>>>>>>>>>>>>> 5-213http://web.mit.edu/phaley/www/ >>>>>>>>>>>>>>> 77 Massachusetts Avenue >>>>>>>>>>>>>>> Cambridge, MA 02139-4301 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Pranith >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> >>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >>>>>>>>>>>>>> Pat Haley >>>>>>>>>>>>>> Email:phaley at mit.edu >>>>>>>>>>>>>> <mailto:phaley at mit.edu> >>>>>>>>>>>>>> Center for Ocean Engineering >>>>>>>>>>>>>> Phone: >>>>>>>>>>>>>> (617) 253-6824 >>>>>>>>>>>>>> Dept. 
of Mechanical Engineering Fax: (617) 253-8125
>>>>>>>>>>>>>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>>>>>>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>>>>>>> Cambridge, MA 02139-4301

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
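For anyone repeating these measurements, here is a minimal sketch of the two calculations used in this thread: the median of repeated dd trials, and the rule of thumb quoted above (throughput = slower of NIC and disk, divided by the number of replicas, times an overhead factor of roughly 0.6 to 0.8). The helper names (median, dd_trials, estimate_throughput) and the test file path are hypothetical, not part of gluster and not the scripts used for the posted results. It assumes GNU date, GNU dd and awk on Linux.

```shell
# Hypothetical helper: print the median of newline-separated numbers on stdin.
median() {
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) { exit 1 }
            if (NR % 2) { print v[(NR + 1) / 2] }
            else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2; print m }
        }'
}

# Hypothetical helper: dd_trials FILE COUNT_MIB TRIALS
# Writes COUNT_MIB blocks of 1 MiB to FILE with conv=fdatasync, TRIALS times,
# and prints one MB/s figure per trial (decimal megabytes, as dd reports).
# Assumes GNU date (for %N) and GNU dd.
dd_trials() {
    file=$1 count=$2 trials=$3
    i=0
    while [ "$i" -lt "$trials" ]; do
        start=$(date +%s.%N)
        dd if=/dev/zero of="$file" bs=1024k count="$count" conv=fdatasync 2>/dev/null
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" -v c="$count" \
            'BEGIN { printf "%.1f\n", c * 1048576 / (e - s) / 1e6 }'
        rm -f "$file"
        i=$((i + 1))
    done
}

# Hypothetical helper: estimate_throughput NIC_MBPS DISK_MBPS REPLICAS FACTOR
# Rule of thumb from the thread: slower of NIC and disk, divided by the
# number of replicas, times an overhead factor.
estimate_throughput() {
    awk -v nic="$1" -v disk="$2" -v rep="$3" -v f="$4" 'BEGIN {
        slow = (nic < disk) ? nic : disk
        printf "%.0f\n", slow / rep * f
    }'
}
```

A run such as `dd_trials /path/to/test-dir/ddtest.tmp 4096 12 | median` (path is a placeholder) would give a single median figure for 12 trials, which is less sensitive to a few slow runs than an average. `estimate_throughput 1200 1200 1 0.6` reproduces the 1200 * .6 = 720 figure quoted above. Comparing medians from several runs before and after each option change may help separate a real effect from background load on the server.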
Hi,

Today we experimented with some of the FUSE options that we found in the list.

Changing these options had no effect:

gluster volume set test-volume performance.cache-max-file-size 2MB
gluster volume set test-volume performance.cache-refresh-timeout 4
gluster volume set test-volume performance.cache-size 256MB
gluster volume set test-volume performance.write-behind-window-size 4MB
gluster volume set test-volume performance.write-behind-window-size 8MB

Changing the following option from its default value made the speed slower:

gluster volume set test-volume performance.write-behind off (on by default)

Changing the following options initially appeared to give a 10% increase in speed, but this vanished in subsequent tests (we think the apparent increase may have been due to a lighter workload on the computer from other users):

gluster volume set test-volume performance.stat-prefetch on
gluster volume set test-volume client.event-threads 4
gluster volume set test-volume server.event-threads 4

Can anything be gleaned from these observations? Are there other things we can try?

Thanks

Pat


On 06/20/2017 12:06 PM, Pat Haley wrote:
>
> Hi Ben,
>
> Sorry this took so long, but we had a real-time forecasting exercise
> last week and I could only get to this now.
>
> Backend Hardware/OS:
>
>   * Much of the information on our back end system is included at the
>     top of
>     http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
>   * The specific model of the hard disks is SeaGate ENTERPRISE
>     CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
>   * Note: there is one physical server that hosts both the NFS and the
>     GlusterFS areas
>
> Latest tests
>
> I have had time to run the tests for one of the dd tests you requested
> to the underlying XFS FS. The median rate was 170 MB/s. The dd
> results and iostat record are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
>
> I'll add tests for the other brick and to the NFS area later.
>
> Thanks
>
> Pat
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley at mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept.
--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
Pranith Kumar Karampuri
2017-Jun-23 03:40 UTC
[Gluster-users] Slow write times to gluster disk
On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:

> Hi,
>
> Today we experimented with some of the FUSE options that we found in the
> list.
>
> Changing these options had no effect:
>
> gluster volume set test-volume performance.cache-max-file-size 2MB
> gluster volume set test-volume performance.cache-refresh-timeout 4
> gluster volume set test-volume performance.cache-size 256MB
> gluster volume set test-volume performance.write-behind-window-size 4MB
> gluster volume set test-volume performance.write-behind-window-size 8MB

This is a good coincidence, I am meeting with write-behind maintainer(+Raghavendra G) today for the same doubt. I think we will have something by EOD IST. I will update you.

> Changing the following option from its default value made the speed slower
>
> gluster volume set test-volume performance.write-behind off (on by default)
>
> Changing the following options initially appeared to give a 10% increase
> in speed, but this vanished in subsequent tests (we think the apparent
> increase may have been to a lighter workload on the computer from other
> users)
>
> gluster volume set test-volume performance.stat-prefetch on
> gluster volume set test-volume client.event-threads 4
> gluster volume set test-volume server.event-threads 4
>
> Can anything be gleaned from these observations? Are there other things
> we can try?
>
> Thanks
>
> Pat
>
> On 06/20/2017 12:06 PM, Pat Haley wrote:
>
> Hi Ben,
>
> Sorry this took so long, but we had a real-time forecasting exercise last
> week and I could only get to this now.
>
> Backend Hardware/OS:
>
> - Much of the information on our back end system is included at the
>   top of http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
> - The specific model of the hard disks is SeaGate ENTERPRISE CAPACITY
>   V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
> - Note: there is one physical server that hosts both the NFS and the
>   GlusterFS areas
>
> Latest tests
>
> I have had time to run the tests for one of the dd tests you requested to
> the underlying XFS FS. The median rate was 170 MB/s. The dd results and
> iostat record are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
>
> I'll add tests for the other brick and to the NFS area later.
>
> Thanks
>
> Pat

--
Pranith
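
The 10% gain that vanished in later runs suggests comparing the median of several dd trials per option setting rather than single runs, as was done for the earlier 12-trial tests in this thread. Below is a minimal sketch of such a loop. It assumes GNU dd (for conv=fdatasync) and GNU date (for %N); the function names and the directory argument are hypothetical and not taken from the thread.

```shell
#!/bin/sh
# Sketch only: repeat a dd write test and report the median rate.
# Assumes GNU dd (conv=fdatasync) and GNU date (%N); the function names
# and the directory passed to run_trials are hypothetical.

# Print the median of the numbers given as arguments.
median() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Write $2 blocks of 1 MiB to file $1, flush to disk, print the rate in MB/s.
dd_trial() {
    start=$(date +%s.%N)
    dd if=/dev/zero of="$1" bs=1024k count="$2" conv=fdatasync 2>/dev/null
    end=$(date +%s.%N)
    rm -f "$1"
    awk -v s="$start" -v e="$end" -v n="$2" \
        'BEGIN { printf "%.1f\n", n * 1.048576 / (e - s) }'
}

# Run $2 trials of $3 MiB each in directory $1 and print the median MB/s.
run_trials() {
    rates=""
    i=0
    while [ "$i" -lt "$2" ]; do
        rates="$rates $(dd_trial "$1/dd_trial.tmp" "$3")"
        i=$((i + 1))
    done
    # Word splitting of $rates is intentional here.
    median $rates
}
```

For example, `run_trials /path/to/gluster-mount 12 4096` (the path is a placeholder) would run twelve 4 GiB writes and print one median figure. Running it once per `gluster volume set` change makes it easier to tell a real effect from load caused by other users.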