Hi All,

Decided to try another test of gluster mounted via FUSE vs gluster
mounted via NFS, this time using the software we run in production
(i.e. our ocean model writing a netCDF file).

gluster mounted via NFS:  the run took 2.3 hr
gluster mounted via FUSE: the run took 44.2 hr

The only problem with using gluster mounted via NFS is that it does not
respect the group write permissions, which we need.

We have an exercise coming up in a couple of weeks. It seems to me
that in order to improve our write times before then, it would be good
to solve the group write permissions for gluster mounted via NFS now.
We can then revisit gluster mounted via FUSE afterwards.

What information would you need to help us force gluster mounted via NFS
to respect the group write permissions?

Thanks

Pat


On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
>
>
> On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri
> <pkarampu at redhat.com> wrote:
>
>
>
>     On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:
>
>
>         Hi,
>
>         Today we experimented with some of the FUSE options that we
>         found in the list.
>
>         Changing these options had no effect:
>
>         gluster volume set test-volume performance.cache-max-file-size 2MB
>         gluster volume set test-volume performance.cache-refresh-timeout 4
>         gluster volume set test-volume performance.cache-size 256MB
>         gluster volume set test-volume performance.write-behind-window-size 4MB
>         gluster volume set test-volume performance.write-behind-window-size 8MB
>
>
>     This is a good coincidence, I am meeting with write-behind
>     maintainer(+Raghavendra G) today for the same doubt. I think we
>     will have something by EOD IST. I will update you.
>
>
> Sorry, forgot to update you. It seems like there is a bug in
> Write-behind and Facebook guys sent a patch
> http://review.gluster.org/16079 to fix the same. But even with that I
> am not seeing any improvement. Maybe I am doing something wrong. Will
> update you if I find anything more.
>
>         Changing the following option from its default value made the
>         speed slower
>
>         gluster volume set test-volume performance.write-behind off (on by default)
>
>         Changing the following options initially appeared to give a
>         10% increase in speed, but this vanished in subsequent tests
>         (we think the apparent increase may have been due to a lighter
>         workload on the computer from other users)
>
>         gluster volume set test-volume performance.stat-prefetch on
>         gluster volume set test-volume client.event-threads 4
>         gluster volume set test-volume server.event-threads 4
>
>
>         Can anything be gleaned from these observations? Are there
>         other things we can try?
>
>         Thanks
>
>         Pat
>
>
>
>         On 06/20/2017 12:06 PM, Pat Haley wrote:
>>
>>         Hi Ben,
>>
>>         Sorry this took so long, but we had a real-time forecasting
>>         exercise last week and I could only get to this now.
>>
>>         Backend Hardware/OS:
>>
>>           * Much of the information on our back end system is
>>             included at the top of
>>             http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
>>           * The specific model of the hard disks is Seagate
>>             ENTERPRISE CAPACITY V.4 6TB (ST6000NM0024). The rated
>>             speed is 6Gb/s.
>>           * Note: there is one physical server that hosts both the
>>             NFS and the GlusterFS areas
>>
>>         Latest tests
>>
>>         I have had time to run one of the dd tests you requested
>>         against the underlying XFS FS. The median rate was 170
>>         MB/s. The dd results and iostat record are in
>>
>>         http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
>>
>>         I'll add tests for the other brick and to the NFS area later.
>>
>>         Thanks
>>
>>         Pat
>>
>>
>>         On 06/12/2017 06:06 PM, Ben Turner wrote:
>>> Ok you are correct, you have a pure distributed volume. IE no replication overhead.
>>> So normally for pure dist I use:
>>>
>>> throughput = (slower of disks or NIC) * .6 to .7
>>>
>>> In your case we have:
>>>
>>> 1200 * .6 = 720
>>>
>>> So you are seeing a little less throughput than I would expect in your configuration. What I like to do here is:
>>>
>>> -First tell me more about your back end storage, will it sustain 1200 MB / sec? What kind of HW? How many disks? What type and specs are the disks? What kind of RAID are you using?
>>>
>>> -Second can you refresh me on your workload? Are you doing reads / writes or both? If both what mix? Since we are using DD I assume you are working with large file sequential I/O, is this correct?
>>>
>>> -Run some DD tests on the back end XFS FS. I normally have /xfs-mount/gluster-brick, if you have something similar just mkdir on the XFS -> /xfs-mount/my-test-dir. Inside the test dir run:
>>>
>>> If you are focusing on a write workload run:
>>>
>>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>>>
>>> If you are focusing on a read workload run:
>>>
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
>>>
>>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>>>
>>> Run this in a loop similar to how you did in:
>>>
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>
>>> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time. While this is running gather iostat for me:
>>>
>>> # iostat -c -m -x 1 > iostat-$(hostname).txt
>>>
>>> Let's see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
>>>
>>> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks? I want to be sure we have an apples to apples comparison here.
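The rule of thumb above can be turned into a quick estimate. This is only a sketch: estimate_throughput is a hypothetical helper name (not a gluster command), the overhead is passed as an integer percentage (60 for .6) so that plain shell integer arithmetic is enough, and the optional replica divisor comes from the earlier message quoted further down in this thread.

```shell
#!/bin/sh
# Rough gluster throughput estimate from the rule of thumb in this thread:
#   throughput = (slower of disks or NIC, in MB/s) * overhead / replicas
# estimate_throughput is a hypothetical helper, not part of gluster.
# Usage: estimate_throughput SLOWEST_MBPS OVERHEAD_PERCENT [REPLICAS]
estimate_throughput() {
    slowest_mbps=$1
    overhead_pct=$2
    replicas=${3:-1}    # 1 for a pure distribute volume
    echo $(( slowest_mbps * overhead_pct / 100 / replicas ))
}

estimate_throughput 1200 60    # prints 720
estimate_throughput 1200 70    # prints 840
```

With the 1200 MB/s figure used above, the expected range for a pure distribute volume is therefore roughly 720 to 840 MB/s.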
>>> >>> -b >>> >>> >>> >>> ----- Original Message ----- >>>> From: "Pat Haley"<phaley at mit.edu> <mailto:phaley at mit.edu> >>>> To: "Ben Turner"<bturner at redhat.com> <mailto:bturner at redhat.com> >>>> Sent: Monday, June 12, 2017 5:18:07 PM >>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>> >>>> >>>> Hi Ben, >>>> >>>> Here is the output: >>>> >>>> [root at mseas-data2 ~]# gluster volume info >>>> >>>> Volume Name: data-volume >>>> Type: Distribute >>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18 >>>> Status: Started >>>> Number of Bricks: 2 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: mseas-data2:/mnt/brick1 >>>> Brick2: mseas-data2:/mnt/brick2 >>>> Options Reconfigured: >>>> nfs.exports-auth-enable: on >>>> diagnostics.brick-sys-log-level: WARNING >>>> performance.readdir-ahead: on >>>> nfs.disable: on >>>> nfs.export-volumes: off >>>> >>>> >>>> On 06/12/2017 05:01 PM, Ben Turner wrote: >>>>> What is the output of gluster v info? That will tell us more about your >>>>> config. >>>>> >>>>> -b >>>>> >>>>> ----- Original Message ----- >>>>>> From: "Pat Haley"<phaley at mit.edu> <mailto:phaley at mit.edu> >>>>>> To: "Ben Turner"<bturner at redhat.com> <mailto:bturner at redhat.com> >>>>>> Sent: Monday, June 12, 2017 4:54:00 PM >>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>>>> >>>>>> >>>>>> Hi Ben, >>>>>> >>>>>> I guess I'm confused about what you mean by replication. If I look at >>>>>> the underlying bricks I only ever have a single copy of any file. It >>>>>> either resides on one brick or the other (directories exist on both >>>>>> bricks but not files). We are not using gluster for redundancy (or at >>>>>> least that wasn't our intent). Is that what you meant by replication >>>>>> or is it something else? 
>>>>>> >>>>>> Thanks >>>>>> >>>>>> Pat >>>>>> >>>>>> On 06/12/2017 04:28 PM, Ben Turner wrote: >>>>>>> ----- Original Message ----- >>>>>>>> From: "Pat Haley"<phaley at mit.edu> <mailto:phaley at mit.edu> >>>>>>>> To: "Ben Turner"<bturner at redhat.com> <mailto:bturner at redhat.com>, "Pranith Kumar Karampuri" >>>>>>>> <pkarampu at redhat.com> <mailto:pkarampu at redhat.com> >>>>>>>> Cc: "Ravishankar N"<ravishankar at redhat.com> <mailto:ravishankar at redhat.com>,gluster-users at gluster.org >>>>>>>> <mailto:gluster-users at gluster.org>, >>>>>>>> "Steve Postma"<SPostma at ztechnet.com> <mailto:SPostma at ztechnet.com> >>>>>>>> Sent: Monday, June 12, 2017 2:35:41 PM >>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>>>>>> >>>>>>>> >>>>>>>> Hi Guys, >>>>>>>> >>>>>>>> I was wondering what our next steps should be to solve the slow write >>>>>>>> times. >>>>>>>> >>>>>>>> Recently I was debugging a large code and writing a lot of output at >>>>>>>> every time step. When I tried writing to our gluster disks, it was >>>>>>>> taking over a day to do a single time step whereas if I had the same >>>>>>>> program (same hardware, network) write to our nfs disk the time per >>>>>>>> time-step was about 45 minutes. What we are shooting for here would be >>>>>>>> to have similar times to either gluster of nfs. >>>>>>> I can see in your test: >>>>>>> >>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt >>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt> >>>>>>> >>>>>>> You averaged ~600 MB / sec(expected for replica 2 with 10G, {~1200 MB / >>>>>>> sec} / #replicas{2} = 600). Gluster does client side replication so with >>>>>>> replica 2 you will only ever see 1/2 the speed of your slowest part of >>>>>>> the >>>>>>> stack(NW, disk, RAM, CPU). This is usually NW or disk and 600 is >>>>>>> normally >>>>>>> a best case. 
>>>>>>> Now in your output I do see the instances where you went
>>>>>>> down to 200 MB / sec. I can only explain this in three ways:
>>>>>>>
>>>>>>> 1. You are not using conv=fdatasync and writes are actually going to
>>>>>>> page cache and then being flushed to disk. During the fsync the memory
>>>>>>> is not yet available and the disks are busy flushing dirty pages.
>>>>>>> 2. Your storage RAID group is shared across multiple LUNs (like in a SAN)
>>>>>>> and when write times are slow the RAID group is busy servicing other
>>>>>>> LUNs.
>>>>>>> 3. Gluster bug / config issue / some other unknown unknown.
>>>>>>>
>>>>>>> So I see 2 issues here:
>>>>>>>
>>>>>>> 1. NFS does in 45 minutes what gluster can do in 24 hours.
>>>>>>> 2. Sometimes your throughput drops dramatically.
>>>>>>>
>>>>>>> WRT #1 - have a look at my estimates above. My formula for guesstimating
>>>>>>> gluster perf is: throughput = NIC throughput or storage (whichever is
>>>>>>> slower) / # replicas * overhead (figure .7 or .8). Also the larger the
>>>>>>> record size the better for glusterfs mounts, I normally like to be at
>>>>>>> LEAST 64k up to 1024k:
>>>>>>>
>>>>>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
>>>>>>>
>>>>>>> WRT #2 - Again, I question your testing and your storage config. Try
>>>>>>> using conv=fdatasync for your DDs, use a larger record size, and make
>>>>>>> sure that your back end storage is not causing your slowdowns. Also
>>>>>>> remember that with replica 2 you will take ~50% hit on writes because
>>>>>>> the client uses 50% of its bandwidth to write to one replica and 50%
>>>>>>> to the other.
>>>>>>>
>>>>>>> -b
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>>>>>>> Are you sure using conv=sync is what you want? I normally use
>>>>>>>>> conv=fdatasync, I'll look up the difference between the two and see if
>>>>>>>>> it affects your test.
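For reference, the two options do different things in dd: conv=sync pads each short input block with NULs up to the block size and does not flush anything to disk, whereas conv=fdatasync (a GNU dd option) makes dd physically write the output data before it exits. A small sketch of the padding behaviour follows; padded_size is a hypothetical helper and the 8-byte block size is chosen only for illustration.

```shell
#!/bin/sh
# padded_size CONV: feed a 3-byte input through dd with bs=8 and report
# how many bytes come out. CONV is passed straight to dd's conv= option.
# Hypothetical helper for illustration only.
padded_size() {
    printf 'abc' | dd bs=8 conv="$1" 2>/dev/null | wc -c | tr -d ' '
}

padded_size sync       # prints 8: the short block was padded with NULs
padded_size notrunc    # prints 3: no padding
```

Since /dev/zero always returns full blocks, conv=sync should have no effect in a test that reads from it, so such a run times writes into the page cache rather than to disk. A timed write that includes the flush would use conv=fdatasync, as in the dd command quoted above.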
>>>>>>>>> >>>>>>>>> >>>>>>>>> -b >>>>>>>>> >>>>>>>>> ----- Original Message ----- >>>>>>>>>> From: "Pat Haley"<phaley at mit.edu> <mailto:phaley at mit.edu> >>>>>>>>>> To: "Pranith Kumar Karampuri"<pkarampu at redhat.com> <mailto:pkarampu at redhat.com> >>>>>>>>>> Cc: "Ravishankar N"<ravishankar at redhat.com> <mailto:ravishankar at redhat.com>, >>>>>>>>>> gluster-users at gluster.org >>>>>>>>>> <mailto:gluster-users at gluster.org>, >>>>>>>>>> "Steve Postma"<SPostma at ztechnet.com> <mailto:SPostma at ztechnet.com>, "Ben >>>>>>>>>> Turner"<bturner at redhat.com> <mailto:bturner at redhat.com> >>>>>>>>>> Sent: Tuesday, May 30, 2017 9:40:34 PM >>>>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Pranith, >>>>>>>>>> >>>>>>>>>> The "dd" command was: >>>>>>>>>> >>>>>>>>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync >>>>>>>>>> >>>>>>>>>> There were 2 instances where dd reported 22 seconds. The output from >>>>>>>>>> the >>>>>>>>>> dd tests are in >>>>>>>>>> >>>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt >>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt> >>>>>>>>>> >>>>>>>>>> Pat >>>>>>>>>> >>>>>>>>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote: >>>>>>>>>>> Pat, >>>>>>>>>>> What is the command you used? As per the following output, >>>>>>>>>>> it >>>>>>>>>>> seems like at least one write operation took 16 seconds. Which is >>>>>>>>>>> really bad. 
>>>>>>>>>>> 96.39 1165.10 us 89.00 us*16487014.00 us* >>>>>>>>>>> 393212 >>>>>>>>>>> WRITE >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu <mailto:phaley at mit.edu> >>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Pranith, >>>>>>>>>>> >>>>>>>>>>> I ran the same 'dd' test both in the gluster test volume and >>>>>>>>>>> in >>>>>>>>>>> the .glusterfs directory of each brick. The median results >>>>>>>>>>> (12 >>>>>>>>>>> dd >>>>>>>>>>> trials in each test) are similar to before >>>>>>>>>>> >>>>>>>>>>> * gluster test volume: 586.5 MB/s >>>>>>>>>>> * bricks (in .glusterfs): 1.4 GB/s >>>>>>>>>>> >>>>>>>>>>> The profile for the gluster test-volume is in >>>>>>>>>>> >>>>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt >>>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> >>>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> >>>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Pat >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote: >>>>>>>>>>>> Let's start with the same 'dd' test we were testing with to >>>>>>>>>>>> see, >>>>>>>>>>>> what the numbers are. Please provide profile numbers for the >>>>>>>>>>>> same. From there on we will start tuning the volume to see >>>>>>>>>>>> what >>>>>>>>>>>> we can do. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu <mailto:phaley at mit.edu> >>>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the tip. We now have the gluster volume >>>>>>>>>>>> mounted >>>>>>>>>>>> under /home. 
What tests do you recommend we run? >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> Pat >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote: >>>>>>>>>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley >>>>>>>>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu> >>>>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry for the delay. I never saw received your >>>>>>>>>>>>> reply >>>>>>>>>>>>> (but I did receive Ben Turner's follow-up to your >>>>>>>>>>>>> reply). So we tried to create a gluster volume >>>>>>>>>>>>> under >>>>>>>>>>>>> /home using different variations of >>>>>>>>>>>>> >>>>>>>>>>>>> gluster volume create test-volume >>>>>>>>>>>>> mseas-data2:/home/gbrick_test_1 >>>>>>>>>>>>> mseas-data2:/home/gbrick_test_2 transport tcp >>>>>>>>>>>>> >>>>>>>>>>>>> However we keep getting errors of the form >>>>>>>>>>>>> >>>>>>>>>>>>> Wrong brick type: transport, use >>>>>>>>>>>>> <HOSTNAME>:<export-dir-abs-path> >>>>>>>>>>>>> >>>>>>>>>>>>> Any thoughts on what we're doing wrong? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> You should give transport tcp at the beginning I think. >>>>>>>>>>>>> Anyways, transport tcp is the default, so no need to >>>>>>>>>>>>> specify >>>>>>>>>>>>> so remove those two words from the CLI. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Also do you have a list of the test we should be >>>>>>>>>>>>> running >>>>>>>>>>>>> once we get this volume created? Given the >>>>>>>>>>>>> time-zone >>>>>>>>>>>>> difference it might help if we can run a small >>>>>>>>>>>>> battery >>>>>>>>>>>>> of tests and post the results rather than >>>>>>>>>>>>> test-post-new >>>>>>>>>>>>> test-post... . >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This is the first time I am doing performance analysis >>>>>>>>>>>>> on >>>>>>>>>>>>> users as far as I remember. 
In our team there are >>>>>>>>>>>>> separate >>>>>>>>>>>>> engineers who do these tests. Ben who replied earlier is >>>>>>>>>>>>> one >>>>>>>>>>>>> such engineer. >>>>>>>>>>>>> >>>>>>>>>>>>> Ben, >>>>>>>>>>>>> Have any suggestions? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>>> Pat >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley >>>>>>>>>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>>>> >>>>>>>>>>>>>> The /home partition is mounted as ext4 >>>>>>>>>>>>>> /home ext4 defaults,usrquota,grpquota 1 2 >>>>>>>>>>>>>> >>>>>>>>>>>>>> The brick partitions are mounted ax xfs >>>>>>>>>>>>>> /mnt/brick1 xfs defaults 0 0 >>>>>>>>>>>>>> /mnt/brick2 xfs defaults 0 0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Will this cause a problem with creating a >>>>>>>>>>>>>> volume >>>>>>>>>>>>>> under /home? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't think the bottleneck is disk. You can do >>>>>>>>>>>>>> the >>>>>>>>>>>>>> same tests you did on your new volume to confirm? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Pat >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley >>>>>>>>>>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Unfortunately, we don't have similar >>>>>>>>>>>>>>> hardware >>>>>>>>>>>>>>> for a small scale test. All we have is >>>>>>>>>>>>>>> our >>>>>>>>>>>>>>> production hardware. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You said something about /home partition which >>>>>>>>>>>>>>> has >>>>>>>>>>>>>>> lesser disks, we can create plain distribute >>>>>>>>>>>>>>> volume inside one of those directories. After >>>>>>>>>>>>>>> we >>>>>>>>>>>>>>> are done, we can remove the setup. What do you >>>>>>>>>>>>>>> say? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Pat >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar >>>>>>>>>>>>>>> Karampuri wrote: >>>>>>>>>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat >>>>>>>>>>>>>>>> Haley >>>>>>>>>>>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Since we are mounting the partitions >>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>> the bricks, I tried the dd test >>>>>>>>>>>>>>>> writing >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. >>>>>>>>>>>>>>>> The results without oflag=sync were >>>>>>>>>>>>>>>> 1.6 >>>>>>>>>>>>>>>> Gb/s (faster than gluster but not as >>>>>>>>>>>>>>>> fast >>>>>>>>>>>>>>>> as I was expecting given the 1.2 Gb/s >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> the no-gluster area w/ fewer disks). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Okay, then 1.6Gb/s is what we need to >>>>>>>>>>>>>>>> target >>>>>>>>>>>>>>>> for, considering your volume is just >>>>>>>>>>>>>>>> distribute. Is there any way you can do >>>>>>>>>>>>>>>> tests >>>>>>>>>>>>>>>> on similar hardware but at a small scale? >>>>>>>>>>>>>>>> Just so we can run the workload to learn >>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>> about the bottlenecks in the system? We >>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>> probably try to get the speed to 1.2Gb/s >>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>> your /home partition you were telling me >>>>>>>>>>>>>>>> yesterday. 
Let me know if that is >>>>>>>>>>>>>>>> something >>>>>>>>>>>>>>>> you are okay to do. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Pat >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar >>>>>>>>>>>>>>>> Karampuri wrote: >>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM, >>>>>>>>>>>>>>>>> Pat >>>>>>>>>>>>>>>>> Haley <phaley at mit.edu <mailto:phaley at mit.edu> >>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Not entirely sure (this isn't my >>>>>>>>>>>>>>>>> area of expertise). I'll run >>>>>>>>>>>>>>>>> your >>>>>>>>>>>>>>>>> answer by some other people who >>>>>>>>>>>>>>>>> are >>>>>>>>>>>>>>>>> more familiar with this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am also uncertain about how to >>>>>>>>>>>>>>>>> interpret the results when we >>>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>>> add the dd tests writing to the >>>>>>>>>>>>>>>>> /home area (no gluster, still on >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> same machine) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> * dd test without oflag=sync >>>>>>>>>>>>>>>>> (rough average of multiple >>>>>>>>>>>>>>>>> tests) >>>>>>>>>>>>>>>>> o gluster w/ fuse mount : >>>>>>>>>>>>>>>>> 570 >>>>>>>>>>>>>>>>> Mb/s >>>>>>>>>>>>>>>>> o gluster w/ nfs mount: >>>>>>>>>>>>>>>>> 390 >>>>>>>>>>>>>>>>> Mb/s >>>>>>>>>>>>>>>>> o nfs (no gluster): 1.2 >>>>>>>>>>>>>>>>> Gb/s >>>>>>>>>>>>>>>>> * dd test with oflag=sync >>>>>>>>>>>>>>>>> (rough >>>>>>>>>>>>>>>>> average of multiple tests) >>>>>>>>>>>>>>>>> o gluster w/ fuse mount: >>>>>>>>>>>>>>>>> 5 >>>>>>>>>>>>>>>>> Mb/s >>>>>>>>>>>>>>>>> o gluster w/ nfs mount: >>>>>>>>>>>>>>>>> 200 >>>>>>>>>>>>>>>>> Mb/s >>>>>>>>>>>>>>>>> o nfs (no gluster): 20 >>>>>>>>>>>>>>>>> Mb/s >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Given that the non-gluster area >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>> RAID-6 of 4 disks while 
each >>>>>>>>>>>>>>>>> brick >>>>>>>>>>>>>>>>> of the gluster area is a RAID-6 >>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> 32 disks, I would naively expect >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> writes to the gluster area to be >>>>>>>>>>>>>>>>> roughly 8x faster than to the >>>>>>>>>>>>>>>>> non-gluster. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think a better test is to try and >>>>>>>>>>>>>>>>> write to a file using nfs without >>>>>>>>>>>>>>>>> any >>>>>>>>>>>>>>>>> gluster to a location that is not >>>>>>>>>>>>>>>>> inside >>>>>>>>>>>>>>>>> the brick but someother location >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> on same disk(s). If you are mounting >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> partition as the brick, then we can >>>>>>>>>>>>>>>>> write to a file inside .glusterfs >>>>>>>>>>>>>>>>> directory, something like >>>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I still think we have a speed >>>>>>>>>>>>>>>>> issue, >>>>>>>>>>>>>>>>> I can't tell if fuse vs nfs is >>>>>>>>>>>>>>>>> part >>>>>>>>>>>>>>>>> of the problem. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I got interested in the post because >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> read that fuse speed is lesser than >>>>>>>>>>>>>>>>> nfs >>>>>>>>>>>>>>>>> speed which is counter-intuitive to >>>>>>>>>>>>>>>>> my >>>>>>>>>>>>>>>>> understanding. So wanted >>>>>>>>>>>>>>>>> clarifications. >>>>>>>>>>>>>>>>> Now that I got my clarifications >>>>>>>>>>>>>>>>> where >>>>>>>>>>>>>>>>> fuse outperformed nfs without sync, >>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>> can resume testing as described >>>>>>>>>>>>>>>>> above >>>>>>>>>>>>>>>>> and try to find what it is. 
Based on >>>>>>>>>>>>>>>>> your email-id I am guessing you are >>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>> Boston and I am from Bangalore so if >>>>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>> are okay with doing this debugging >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> multiple days because of timezones, >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> will be happy to help. Please be a >>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>>>> patient with me, I am under a >>>>>>>>>>>>>>>>> release >>>>>>>>>>>>>>>>> crunch but I am very curious with >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> problem you posted. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Was there anything useful in the >>>>>>>>>>>>>>>>> profiles? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Unfortunately profiles didn't help >>>>>>>>>>>>>>>>> me >>>>>>>>>>>>>>>>> much, I think we are collecting the >>>>>>>>>>>>>>>>> profiles from an active volume, so >>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>> has a lot of information that is not >>>>>>>>>>>>>>>>> pertaining to dd so it is difficult >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> find the contributions of dd. So I >>>>>>>>>>>>>>>>> went >>>>>>>>>>>>>>>>> through your post again and found >>>>>>>>>>>>>>>>> something I didn't pay much >>>>>>>>>>>>>>>>> attention >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> earlier i.e. oflag=sync, so did my >>>>>>>>>>>>>>>>> own >>>>>>>>>>>>>>>>> tests on my setup with FUSE so sent >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>> reply. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Pat >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 05/10/2017 12:15 PM, Pranith >>>>>>>>>>>>>>>>> Kumar Karampuri wrote: >>>>>>>>>>>>>>>>>> Okay good. At least this >>>>>>>>>>>>>>>>>> validates >>>>>>>>>>>>>>>>>> my doubts. Handling O_SYNC in >>>>>>>>>>>>>>>>>> gluster NFS and fuse is a bit >>>>>>>>>>>>>>>>>> different. 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> When an application opens a file with O_SYNC on a fuse mount,
>>>>>>>>>>>>>>>>>> each write syscall has to be written to disk as part of the
>>>>>>>>>>>>>>>>>> syscall, whereas in the case of NFS there is no concept of
>>>>>>>>>>>>>>>>>> open. NFS performs the write through a handle saying it needs
>>>>>>>>>>>>>>>>>> to be a synchronous write, so the write() syscall is performed
>>>>>>>>>>>>>>>>>> first and then it performs fsync(). So a write on an fd with
>>>>>>>>>>>>>>>>>> O_SYNC becomes write+fsync. My guess is that when multiple
>>>>>>>>>>>>>>>>>> threads do this write+fsync() operation on the same file,
>>>>>>>>>>>>>>>>>> multiple writes are batched together to be written to disk,
>>>>>>>>>>>>>>>>>> so the throughput on the disk increases.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Does it answer your doubts?
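The two write patterns described above can be approximated from the shell with GNU dd, as a rough sketch: oflag=sync opens the output with O_SYNC so every write is synchronous, while conv=fsync issues ordinary writes and a single fsync() before dd exits. This does not reproduce the per-write fsync batching that is guessed at above for NFS; it only shows the two ends of the spectrum. dd_sync_flag and timed_write are hypothetical helper names, and the path and size in the example are placeholders.

```shell
#!/bin/sh
# dd_sync_flag MODE: map a write pattern to the GNU dd option that
# approximates it. Hypothetical helper for illustration only.
dd_sync_flag() {
    case "$1" in
        per-write) echo "oflag=sync" ;;   # O_SYNC: each write() waits for the disk
        at-end)    echo "conv=fsync" ;;   # buffered writes, one fsync() at the end
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# timed_write MODE FILE COUNT_MB: write COUNT_MB blocks of 1 MiB of zeros
# to FILE using the chosen pattern; dd itself reports elapsed time and rate.
timed_write() {
    flag=$(dd_sync_flag "$1") || return 1
    dd if=/dev/zero of="$2" bs=1048576 count="$3" "$flag"
}

# Example (placeholder path; run it on the mount under test):
#   timed_write per-write /gluster-mount/zeros.txt 64
#   timed_write at-end    /gluster-mount/zeros.txt 64
```

Running both modes on the FUSE mount and on the NFS mount would give four numbers that can be compared directly with the oflag=sync results reported earlier in the thread.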
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:35 >>>>>>>>>>>>>>>>>> PM, >>>>>>>>>>>>>>>>>> Pat Haley <phaley at mit.edu <mailto:phaley at mit.edu> >>>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Without the oflag=sync and >>>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>> a single test of each, the >>>>>>>>>>>>>>>>>> FUSE >>>>>>>>>>>>>>>>>> is going faster than NFS: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> FUSE: >>>>>>>>>>>>>>>>>> mseas-data2(dri_nascar)% dd >>>>>>>>>>>>>>>>>> if=/dev/zero count=4096 >>>>>>>>>>>>>>>>>> bs=1048576 of=zeros.txt >>>>>>>>>>>>>>>>>> conv=sync >>>>>>>>>>>>>>>>>> 4096+0 records in >>>>>>>>>>>>>>>>>> 4096+0 records out >>>>>>>>>>>>>>>>>> 4294967296 bytes (4.3 GB) >>>>>>>>>>>>>>>>>> copied, 7.46961 s, 575 MB/s >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> NFS >>>>>>>>>>>>>>>>>> mseas-data2(HYCOM)% dd >>>>>>>>>>>>>>>>>> if=/dev/zero count=4096 >>>>>>>>>>>>>>>>>> bs=1048576 of=zeros.txt >>>>>>>>>>>>>>>>>> conv=sync >>>>>>>>>>>>>>>>>> 4096+0 records in >>>>>>>>>>>>>>>>>> 4096+0 records out >>>>>>>>>>>>>>>>>> 4294967296 bytes (4.3 GB) >>>>>>>>>>>>>>>>>> copied, 11.4264 s, 376 MB/s >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 05/10/2017 11:53 AM, >>>>>>>>>>>>>>>>>> Pranith >>>>>>>>>>>>>>>>>> Kumar Karampuri wrote: >>>>>>>>>>>>>>>>>>> Could you let me know the >>>>>>>>>>>>>>>>>>> speed without oflag=sync >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> both the mounts? No need >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> collect profiles. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at >>>>>>>>>>>>>>>>>>> 9:17 >>>>>>>>>>>>>>>>>>> PM, Pat Haley >>>>>>>>>>>>>>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu> >>>>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Here is what I see >>>>>>>>>>>>>>>>>>> now: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [root at mseas-data2 ~]# >>>>>>>>>>>>>>>>>>> gluster volume info >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Volume Name: >>>>>>>>>>>>>>>>>>> data-volume >>>>>>>>>>>>>>>>>>> Type: Distribute >>>>>>>>>>>>>>>>>>> Volume ID: >>>>>>>>>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18 >>>>>>>>>>>>>>>>>>> Status: Started >>>>>>>>>>>>>>>>>>> Number of Bricks: 2 >>>>>>>>>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>>>>>>>>> Bricks: >>>>>>>>>>>>>>>>>>> Brick1: >>>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick1 >>>>>>>>>>>>>>>>>>> Brick2: >>>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick2 >>>>>>>>>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>>>>>>>>> diagnostics.count-fop-hits: >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> diagnostics.latency-measurement: >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> nfs.exports-auth-enable: >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> diagnostics.brick-sys-log-level: >>>>>>>>>>>>>>>>>>> WARNING >>>>>>>>>>>>>>>>>>> performance.readdir-ahead: >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>>>>>>>>> nfs.export-volumes: >>>>>>>>>>>>>>>>>>> off >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 05/10/2017 11:44 >>>>>>>>>>>>>>>>>>> AM, >>>>>>>>>>>>>>>>>>> Pranith Kumar >>>>>>>>>>>>>>>>>>> Karampuri >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> Is this the volume >>>>>>>>>>>>>>>>>>>> info >>>>>>>>>>>>>>>>>>>> you have? 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >/[root at >>>>>>>>>>>>>>>>>>>> >mseas-data2 >>>>>>>>>>>>>>>>>>>> <http://www.gluster.org/mailman/listinfo/gluster-users> >>>>>>>>>>>>>>>>>>>> <http://www.gluster.org/mailman/listinfo/gluster-users> >>>>>>>>>>>>>>>>>>>> ~]# gluster volume >>>>>>>>>>>>>>>>>>>> info >>>>>>>>>>>>>>>>>>>> />//>/Volume Name: >>>>>>>>>>>>>>>>>>>> data-volume />/Type: >>>>>>>>>>>>>>>>>>>> Distribute />/Volume >>>>>>>>>>>>>>>>>>>> ID: >>>>>>>>>>>>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18 >>>>>>>>>>>>>>>>>>>> />/Status: Started >>>>>>>>>>>>>>>>>>>> />/Number >>>>>>>>>>>>>>>>>>>> of Bricks: 2 >>>>>>>>>>>>>>>>>>>> />/Transport-type: >>>>>>>>>>>>>>>>>>>> tcp >>>>>>>>>>>>>>>>>>>> />/Bricks: />/Brick1: >>>>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick1 >>>>>>>>>>>>>>>>>>>> />/Brick2: >>>>>>>>>>>>>>>>>>>> mseas-data2:/mnt/brick2 >>>>>>>>>>>>>>>>>>>> />/Options >>>>>>>>>>>>>>>>>>>> Reconfigured: >>>>>>>>>>>>>>>>>>>> />/performance.readdir-ahead: >>>>>>>>>>>>>>>>>>>> on />/nfs.disable: on >>>>>>>>>>>>>>>>>>>> />/nfs.export-volumes: >>>>>>>>>>>>>>>>>>>> off >>>>>>>>>>>>>>>>>>>> / >>>>>>>>>>>>>>>>>>>> ?I copied this from >>>>>>>>>>>>>>>>>>>> old >>>>>>>>>>>>>>>>>>>> thread from 2016. >>>>>>>>>>>>>>>>>>>> This >>>>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>> distribute volume. >>>>>>>>>>>>>>>>>>>> Did >>>>>>>>>>>>>>>>>>>> you change any of the >>>>>>>>>>>>>>>>>>>> options in between? >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >>>>>>>>>>>>>>>>>>> Pat Haley >>>>>>>>>>>>>>>>>>> Email:phaley at mit.edu >>>>>>>>>>>>>>>>>>> <mailto:Email:phaley at mit.edu> >>>>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu> <mailto:phaley at mit.edu> >>>>>>>>>>>>>>>>>>> Center for Ocean >>>>>>>>>>>>>>>>>>> Engineering >>>>>>>>>>>>>>>>>>> Phone: (617) 253-6824 >>>>>>>>>>>>>>>>>>> Dept. 
of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA  02139-4301
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Pranith

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
Pranith Kumar Karampuri
2017-Jun-27 04:47 UTC
[Gluster-users] Slow write times to gluster disk
On Mon, Jun 26, 2017 at 7:40 PM, Pat Haley <phaley at mit.edu> wrote:

>
> Hi All,
>
> Decided to try another test of gluster mounted via FUSE vs gluster
> mounted via NFS, this time using the software we run in production (i.e.
> our ocean model writing a netCDF file).
>
> gluster mounted via NFS: the run took 2.3 hr
>
> gluster mounted via FUSE: the run took 44.2 hr
>
> The only problem with using gluster mounted via NFS is that it does not
> respect the group write permissions which we need.
>
> We have an exercise coming up in a couple of weeks. It seems to me
> that in order to improve our write times before then, it would be good to
> solve the group write permissions for gluster mounted via NFS now. We can
> then revisit gluster mounted via FUSE afterwards.
>
> What information would you need to help us force gluster mounted via NFS
> to respect the group write permissions?
>

+Niels, +Jiffin

I added 2 more guys who work on NFS to check why this problem happens in
your environment. Let's see what information they may need to find the
problem and solve this issue.

>
> Thanks
>
> Pat
>
>
> On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
>
> On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>> On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> Today we experimented with some of the FUSE options that we found in the
>>> list.
>>>
>>> Changing these options had no effect:
>>>
>>> gluster volume set test-volume performance.cache-max-file-size 2MB
>>> gluster volume set test-volume performance.cache-refresh-timeout 4
>>> gluster volume set test-volume performance.cache-size 256MB
>>> gluster volume set test-volume performance.write-behind-window-size 4MB
>>> gluster volume set test-volume performance.write-behind-window-size 8MB
>>>
>> This is a good coincidence, I am meeting with write-behind
>> maintainer(+Raghavendra G) today for the same doubt.
>> I think we will have something by EOD IST. I will update you.
>>
>
> Sorry, forgot to update you. It seems like there is a bug in Write-behind
> and Facebook guys sent a patch http://review.gluster.org/16079 to fix the
> same. But even with that I am not seeing any improvement. Maybe I am doing
> something wrong. Will update you if I find anything more.
>
>>> Changing the following option from its default value made the speed slower
>>>
>>> gluster volume set test-volume performance.write-behind off (on by default)
>>>
>>> Changing the following options initially appeared to give a 10% increase
>>> in speed, but this vanished in subsequent tests (we think the apparent
>>> increase may have been due to a lighter workload on the computer from
>>> other users)
>>>
>>> gluster volume set test-volume performance.stat-prefetch on
>>> gluster volume set test-volume client.event-threads 4
>>> gluster volume set test-volume server.event-threads 4
>>>
>>> Can anything be gleaned from these observations? Are there other things
>>> we can try?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 06/20/2017 12:06 PM, Pat Haley wrote:
>>>
>>> Hi Ben,
>>>
>>> Sorry this took so long, but we had a real-time forecasting exercise
>>> last week and I could only get to this now.
>>>
>>> Backend Hardware/OS:
>>>
>>> - Much of the information on our back end system is included at the
>>>   top of http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
>>> - The specific model of the hard disks is SeaGate ENTERPRISE
>>>   CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
>>> - Note: there is one physical server that hosts both the NFS and the
>>>   GlusterFS areas
>>>
>>> Latest tests
>>>
>>> I have had time to run the tests for one of the dd tests you requested
>>> to the underlying XFS FS. The median rate was 170 MB/s.
>>> The dd results and iostat record are in
>>>
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
>>>
>>> I'll add tests for the other brick and to the NFS area later.
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 06/12/2017 06:06 PM, Ben Turner wrote:
>>>
>>> Ok you are correct, you have a pure distributed volume. IE no replication overhead. So normally for pure dist I use:
>>>
>>> throughput = slowest of disks / NIC * .6-.7
>>>
>>> In your case we have:
>>>
>>> 1200 * .6 = 720
>>>
>>> So you are seeing a little less throughput than I would expect in your configuration. What I like to do here is:
>>>
>>> -First tell me more about your back end storage, will it sustain 1200 MB / sec? What kind of HW? How many disks? What type and specs are the disks? What kind of RAID are you using?
>>>
>>> -Second can you refresh me on your workload? Are you doing reads / writes or both? If both what mix? Since we are using DD I assume you are working with large file sequential I/O, is this correct?
>>>
>>> -Run some DD tests on the back end XFS FS. I normally have /xfs-mount/gluster-brick, if you have something similar just mkdir on the XFS -> /xfs-mount/my-test-dir. Inside the test dir run:
>>>
>>> If you are focusing on a write workload run:
>>>
>>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>>>
>>> If you are focusing on a read workload run:
>>>
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
>>>
>>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>>>
>>> Run this in a loop similar to how you did in:
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>
>>> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time.
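The rule of thumb quoted above (slowest of disk or NIC, times an overhead factor of roughly .6 to .7, and divided by the replica count when there is replication) can be sketched as a small shell helper. This is only an illustration: `estimate_throughput` and its argument order are made up for this example and are not part of gluster; the 1200 MB/s and .6 figures are the ones from the message.

```shell
#!/bin/sh
# Rough estimate following the rule of thumb in the thread:
#   throughput = min(NIC, storage) / replicas * overhead
# estimate_throughput is a hypothetical helper name, not a gluster command.
estimate_throughput() {
    nic_mb_s=$1     # network limit in MB/s
    disk_mb_s=$2    # back-end storage limit in MB/s
    replicas=$3     # 1 for a pure distribute volume
    overhead=$4     # rule-of-thumb factor, e.g. 0.6
    awk -v n="$nic_mb_s" -v d="$disk_mb_s" -v r="$replicas" -v o="$overhead" \
        'BEGIN { slow = (n < d) ? n : d; printf "%.0f\n", slow / r * o }'
}

# Pure distribute volume, 1200 MB/s slowest component, .6 overhead
estimate_throughput 1200 1200 1 0.6    # prints 720
```

Treat the result as a ballpark figure to compare measured dd rates against, not as a guarantee.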
>>> While this is running gather iostat for me:
>>>
>>> # iostat -c -m -x 1 > iostat-$(hostname).txt
>>>
>>> Lets see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
>>>
>>> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks? I want to be sure we have an apples to apples comparison here.
>>>
>>> -b
>>>
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu>
>>> To: "Ben Turner" <bturner at redhat.com>
>>> Sent: Monday, June 12, 2017 5:18:07 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Ben,
>>>
>>> Here is the output:
>>>
>>> [root at mseas-data2 ~]# gluster volume info
>>>
>>> Volume Name: data-volume
>>> Type: Distribute
>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: mseas-data2:/mnt/brick1
>>> Brick2: mseas-data2:/mnt/brick2
>>> Options Reconfigured:
>>> nfs.exports-auth-enable: on
>>> diagnostics.brick-sys-log-level: WARNING
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: off
>>>
>>>
>>> On 06/12/2017 05:01 PM, Ben Turner wrote:
>>>
>>> What is the output of gluster v info? That will tell us more about your
>>> config.
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu>
>>> To: "Ben Turner" <bturner at redhat.com>
>>> Sent: Monday, June 12, 2017 4:54:00 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Ben,
>>>
>>> I guess I'm confused about what you mean by replication. If I look at
>>> the underlying bricks I only ever have a single copy of any file. It
>>> either resides on one brick or the other (directories exist on both
>>> bricks but not files).
>>> We are not using gluster for redundancy (or at least that wasn't our
>>> intent). Is that what you meant by replication or is it something else?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>> On 06/12/2017 04:28 PM, Ben Turner wrote:
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu>
>>> To: "Ben Turner" <bturner at redhat.com>, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org,
>>> "Steve Postma" <SPostma at ztechnet.com>
>>> Sent: Monday, June 12, 2017 2:35:41 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Guys,
>>>
>>> I was wondering what our next steps should be to solve the slow write
>>> times.
>>>
>>> Recently I was debugging a large code and writing a lot of output at
>>> every time step. When I tried writing to our gluster disks, it was
>>> taking over a day to do a single time step whereas if I had the same
>>> program (same hardware, network) write to our nfs disk the time per
>>> time-step was about 45 minutes. What we are shooting for here would be
>>> to have similar times to either gluster or nfs.
>>>
>>> I can see in your test:
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>
>>> You averaged ~600 MB / sec (expected for replica 2 with 10G, {~1200 MB /
>>> sec} / #replicas{2} = 600). Gluster does client side replication so with
>>> replica 2 you will only ever see 1/2 the speed of your slowest part of
>>> the stack (NW, disk, RAM, CPU). This is usually NW or disk and 600 is
>>> normally a best case. Now in your output I do see the instances where
>>> you went down to 200 MB / sec. I can only explain this in three ways:
>>>
>>> 1. You are not using conv=fdatasync and writes are actually going to
>>>    page cache and then being flushed to disk. During the fsync the
>>>    memory is not yet available and the disks are busy flushing dirty
>>>    pages.
>>> 2. Your storage RAID group is shared across multiple LUNs (like in a
>>>    SAN) and when write times are slow the RAID group is busy servicing
>>>    other LUNs.
>>> 3. Gluster bug / config issue / some other unknown unknown.
>>>
>>> So I see 2 issues here:
>>>
>>> 1. NFS does in 45 minutes what gluster can do in 24 hours.
>>> 2. Sometimes your throughput drops dramatically.
>>>
>>> WRT #1 - have a look at my estimates above. My formula for guestimating
>>> gluster perf is: throughput = NIC throughput or storage (whatever is
>>> slower) / # replicas * overhead (figure .7 or .8). Also the larger the
>>> record size the better for glusterfs mounts, I normally like to be at
>>> LEAST 64k up to 1024k:
>>>
>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
>>>
>>> WRT #2 - Again, I question your testing and your storage config. Try
>>> using conv=fdatasync for your DDs, use a larger record size, and make
>>> sure that your back end storage is not causing your slowdowns. Also
>>> remember that with replica 2 you will take ~50% hit on writes because
>>> the client uses 50% of its bandwidth to write to one replica and 50% to
>>> the other.
>>>
>>> -b
>>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>
>>> Are you sure using conv=sync is what you want? I normally use
>>> conv=fdatasync, I'll look up the difference between the two and see if
>>> it affects your test.
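On the conv=sync versus conv=fdatasync question just above: in GNU dd, conv=sync only pads short input blocks with NUL bytes up to the block size and does nothing about flushing, whereas conv=fdatasync makes dd call fdatasync() on the output file before it exits, so the reported rate includes the time to reach stable storage. A minimal sketch, assuming GNU coreutils dd; the function names and temp file are illustrative only:

```shell
#!/bin/sh
# conv=sync pads each short input block with NUL bytes up to the block
# size; it does not flush anything to disk.
padded_size() {
    printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c | tr -d ' '
}

# conv=fdatasync makes dd call fdatasync() on the output file before it
# exits. The target is a throwaway temp file chosen for illustration.
flushed_write_size() {
    target=$(mktemp)
    dd if=/dev/zero of="$target" bs=1024k count=4 conv=fdatasync 2>/dev/null
    wc -c < "$target" | tr -d ' '
    rm -f "$target"
}

padded_size           # 3 input bytes come out as one 8-byte block
flushed_write_size    # 4 blocks of 1024k, flushed before dd returns
```

With /dev/zero as input and full-sized blocks, conv=sync has nothing to pad, so a dd timing taken that way mostly measures writes into the page cache.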
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu>
>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org,
>>> "Steve Postma" <SPostma at ztechnet.com>, "Ben Turner" <bturner at redhat.com>
>>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Pranith,
>>>
>>> The "dd" command was:
>>>
>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>
>>> There were 2 instances where dd reported 22 seconds. The output from
>>> the dd tests are in
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>
>>> Pat
>>>
>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>> Pat,
>>> What is the command you used? As per the following output, it seems
>>> like at least one write operation took 16 seconds. Which is really bad.
>>>
>>> 96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
>>>
>>>
>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> I ran the same 'dd' test both in the gluster test volume and in the
>>> .glusterfs directory of each brick.
>>> The median results (12 dd trials in each test) are similar to before
>>>
>>> * gluster test volume: 586.5 MB/s
>>> * bricks (in .glusterfs): 1.4 GB/s
>>>
>>> The profile for the gluster test-volume is in
>>>
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>
>>> Let's start with the same 'dd' test we were testing with to see what
>>> the numbers are. Please provide profile numbers for the same. From
>>> there on we will start tuning the volume to see what we can do.
>>>
>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Thanks for the tip. We now have the gluster volume mounted under
>>> /home. What tests do you recommend we run?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Sorry for the delay. I never received your reply (but I did receive
>>> Ben Turner's follow-up to your reply). So we tried to create a gluster
>>> volume under /home using different variations of
>>>
>>> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>
>>> However we keep getting errors of the form
>>>
>>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>>
>>> Any thoughts on what we're doing wrong?
>>>
>>>
>>> You should give transport tcp at the beginning I think.
>>> Anyways, transport tcp is the default, so no need to specify it;
>>> remove those two words from the CLI.
>>>
>>>
>>> Also do you have a list of the tests we should be running once we get
>>> this volume created? Given the time-zone difference it might help if
>>> we can run a small battery of tests and post the results rather than
>>> test-post-new test-post... .
>>>
>>>
>>> This is the first time I am doing performance analysis on users as far
>>> as I remember. In our team there are separate engineers who do these
>>> tests. Ben who replied earlier is one such engineer.
>>>
>>> Ben,
>>> Have any suggestions?
>>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>
>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> The /home partition is mounted as ext4
>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>
>>> The brick partitions are mounted as xfs
>>> /mnt/brick1 xfs defaults 0 0
>>> /mnt/brick2 xfs defaults 0 0
>>>
>>> Will this cause a problem with creating a volume under /home?
>>>
>>>
>>> I don't think the bottleneck is disk. You can do the same tests you
>>> did on your new volume to confirm?
>>>
>>>
>>> Pat
>>>
>>>
>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>
>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Unfortunately, we don't have similar hardware for a small scale test.
>>> All we have is our production hardware.
>>>
>>>
>>> You said something about /home partition which has lesser disks, we
>>> can create plain distribute volume inside one of those directories.
>>> After we are done, we can remove the setup. What do you say?
>>>
>>>
>>> Pat
>>>
>>>
>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Since we are mounting the partitions as the bricks, I tried the dd
>>> test writing to
>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but
>>> not as fast as I was expecting given the 1.2 Gb/s to the no-gluster
>>> area w/ fewer disks).
>>>
>>>
>>> Okay, then 1.6Gb/s is what we need to target for, considering your
>>> volume is just distribute. Is there any way you can do tests on
>>> similar hardware but at a small scale? Just so we can run the workload
>>> to learn more about the bottlenecks in the system? We can probably try
>>> to get the speed to 1.2Gb/s on your /home partition you were telling
>>> me yesterday. Let me know if that is something you are okay to do.
>>>
>>>
>>> Pat
>>>
>>>
>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>> answer by some other people who are more familiar with this.
>>>
>>> I am also uncertain about how to interpret the results when we also
>>> add the dd tests writing to the /home area (no gluster, still on the
>>> same machine)
>>>
>>> * dd test without oflag=sync (rough average of multiple tests)
>>>     o gluster w/ fuse mount: 570 Mb/s
>>>     o gluster w/ nfs mount: 390 Mb/s
>>>     o nfs (no gluster): 1.2 Gb/s
>>> * dd test with oflag=sync (rough average of multiple tests)
>>>     o gluster w/ fuse mount: 5 Mb/s
>>>     o gluster w/ nfs mount: 200 Mb/s
>>>     o nfs (no gluster): 20 Mb/s
>>>
>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively
>>> expect the writes to the gluster area to be roughly 8x faster than to
>>> the non-gluster.
>>>
>>>
>>> I think a better test is to try and write to a file using nfs without
>>> any gluster to a location that is not inside the brick but someother
>>> location that is on same disk(s). If you are mounting the partition as
>>> the brick, then we can write to a file inside .glusterfs directory,
>>> something like
>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>
>>>
>>> I still think we have a speed issue, I can't tell if fuse vs nfs is
>>> part of the problem.
>>>
>>>
>>> I got interested in the post because I read that fuse speed is lesser
>>> than nfs speed which is counter-intuitive to my understanding. So
>>> wanted clarifications. Now that I got my clarifications where fuse
>>> outperformed nfs without sync, we can resume testing as described
>>> above and try to find what it is. Based on your email-id I am guessing
>>> you are from Boston and I am from Bangalore so if you are okay with
>>> doing this debugging for multiple days because of timezones, I will be
>>> happy to help.
>>> Please be a bit patient with me, I am under a release crunch but I am
>>> very curious with the problem you posted.
>>>
>>> Was there anything useful in the profiles?
>>>
>>>
>>> Unfortunately profiles didn't help me much, I think we are collecting
>>> the profiles from an active volume, so it has a lot of information
>>> that is not pertaining to dd so it is difficult to find the
>>> contributions of dd. So I went through your post again and found
>>> something I didn't pay much attention to earlier i.e. oflag=sync, so
>>> did my own tests on my setup with FUSE so sent that reply.
>>>
>>>
>>> Pat
>>>
>>>
>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>
>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>> gluster NFS and fuse is a bit different.
>>> When an application opens a file with O_SYNC on a fuse mount then each
>>> write syscall has to be written to disk as part of the syscall,
>>> whereas in case of NFS, there is no concept of open. NFS performs the
>>> write through a handle saying it needs to be a synchronous write, so
>>> the write() syscall is performed first and then it performs fsync().
>>> So a write on an fd with O_SYNC becomes write+fsync. I am suspecting
>>> that when multiple threads do this write+fsync() operation on the same
>>> file, multiple writes are batched together to be written to disk, so
>>> the throughput on the disk is increasing, is my guess.
>>>
>>> Does it answer your doubts?
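The two write patterns described above can be tried locally with GNU dd. This is only a sketch of the syscall patterns on a local file system, under the assumption that GNU coreutils dd is available; it does not reproduce what the gluster FUSE or NFS servers do internally, and the function names and temp paths are made up for the example. `oflag=sync` opens the output with O_SYNC so every write() is synchronous, while the loop appends one block at a time with `conv=fsync`, i.e. an ordinary write followed by fsync().

```shell
#!/bin/sh
# Throwaway working directory for the illustration; remove it with
# cleanup when finished.
workdir=$(mktemp -d)

# Pattern 1: open with O_SYNC, every write() waits for stable storage.
write_osync() {
    dd if=/dev/zero of="$workdir/osync.bin" bs=64k count=16 oflag=sync 2>/dev/null
    wc -c < "$workdir/osync.bin" | tr -d ' '
}

# Pattern 2: buffered write() followed by fsync(), repeated per block.
write_then_fsync() {
    : > "$workdir/fsync.bin"
    i=0
    while [ "$i" -lt 16 ]; do
        dd if=/dev/zero of="$workdir/fsync.bin" bs=64k count=1 \
           oflag=append conv=notrunc,fsync 2>/dev/null
        i=$((i + 1))
    done
    wc -c < "$workdir/fsync.bin" | tr -d ' '
}

cleanup() {
    rm -rf "$workdir"
}

write_osync         # size of the O_SYNC file in bytes
write_then_fsync    # size of the write+fsync file in bytes
```

Wrapping each call in `time` on the file system under test gives a rough feel for how the two patterns compare there; both functions write the same 16 blocks of 64k.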
>>>
>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Without the oflag=sync and only a single test of each, the FUSE is
>>> going faster than NFS:
>>>
>>> FUSE:
>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>> 4096+0 records in
>>> 4096+0 records out
>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>
>>>
>>> NFS
>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>> 4096+0 records in
>>> 4096+0 records out
>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>
>>>
>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>
>>> Could you let me know the speed without oflag=sync on both the mounts?
>>> No need to collect profiles.
>>>
>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Here is what I see now:
>>>
>>> [root at mseas-data2 ~]# gluster volume info
>>>
>>> Volume Name: data-volume
>>> Type: Distribute
>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: mseas-data2:/mnt/brick1
>>> Brick2: mseas-data2:/mnt/brick2
>>> Options Reconfigured:
>>> diagnostics.count-fop-hits: on
>>> diagnostics.latency-measurement: on
>>> nfs.exports-auth-enable: on
>>> diagnostics.brick-sys-log-level: WARNING
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: off
>>>
>>>
>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>
>>> Is this the volume info you have?
>>>
>>> > [root at mseas-data2 ~]# gluster volume info
>>> >
>>> > Volume Name: data-volume
>>> > Type: Distribute
>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> > Status: Started
>>> > Number of Bricks: 2
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: mseas-data2:/mnt/brick1
>>> > Brick2: mseas-data2:/mnt/brick2
>>> > Options Reconfigured:
>>> > performance.readdir-ahead: on
>>> > nfs.disable: on
>>> > nfs.export-volumes: off
>>>
>>> I copied this from the old thread from 2016. This is a distribute volume.
>>> Did you change any of the options in between?
>>>
>>> --
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley                          Email:  phaley at mit.edu
>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>> Dept.
of Mechanical Engineering    Fax:    (617) 253-8125
>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA  02139-4301
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Pranith
>
> --
> Pranith

--
Pranith
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170627/ecf5e496/attachment.html>
On 06/27/2017 10:17 AM, Pranith Kumar Karampuri wrote:
> The only problem with using gluster mounted via NFS is that it does not
> respect the group write permissions which we need.
>
> We have an exercise coming up in a couple of weeks. It seems to me
> that in order to improve our write times before then, it would be good
> to solve the group write permissions for gluster mounted via NFS now.
> We can then revisit gluster mounted via FUSE afterwards.
>
> What information would you need to help us force gluster mounted via NFS
> to respect the group write permissions?

Is it the owning group or one of the auxiliary groups whose write
permissions are not being honoured? AFAIK, the gNFS server does not do any
special permission checks compared to the gluster native client.

Could you please provide simple steps to reproduce the issue, and collect
a packet trace and the nfs/brick logs as well?

Thanks,
Soumya
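One way to answer the owning-versus-auxiliary question before collecting traces is to check how the affected user relates to the group of the file in question. The sketch below is only an illustration using Python's standard `os` and `pwd` modules: the helper names are hypothetical, and it reads "owning group" as the user's primary group, which is an interpretation. The 16-gid constant reflects the general limit of AUTH_SYS RPC credentials; whether that limit is actually the cause here is not established by this thread.

```python
import os
import pwd

# AUTH_SYS (AUTH_UNIX) RPC credentials carry at most 16 supplementary gids.
AUTH_SYS_MAX_GIDS = 16


def classify_group(file_gid, primary_gid, aux_gids):
    """Say how a user relates to the group that owns a file."""
    if file_gid == primary_gid:
        return "primary"
    if file_gid in aux_gids:
        return "auxiliary"
    return "none"


def exceeds_auth_sys_limit(aux_gids):
    """True if the user has more auxiliary groups than AUTH_SYS can carry."""
    return len(aux_gids) > AUTH_SYS_MAX_GIDS


def report(username, path):
    """Hypothetical helper: inspect a real user and file on this host."""
    pw = pwd.getpwnam(username)
    aux = [g for g in os.getgrouplist(username, pw.pw_gid) if g != pw.pw_gid]
    file_gid = os.stat(path).st_gid
    return {
        "file_gid": file_gid,
        "relation": classify_group(file_gid, pw.pw_gid, aux),
        "aux_group_count": len(aux),
        "over_auth_sys_limit": exceeds_auth_sys_limit(aux),
    }
```

Calling `report()` with the affected user name and a file that refuses group writes (both placeholders to fill in) gives the relation and the auxiliary group count to include in a reply.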
On Tue, Jun 27, 2017 at 10:17:40AM +0530, Pranith Kumar Karampuri wrote:
> On Mon, Jun 26, 2017 at 7:40 PM, Pat Haley <phaley at mit.edu> wrote:
>
> > Hi All,
> >
> > Decided to try another test of gluster mounted via FUSE vs gluster
> > mounted via NFS, this time using the software we run in production (i.e.
> > our ocean model writing a netCDF file).
> >
> > gluster mounted via NFS: the run took 2.3 hr
> >
> > gluster mounted via FUSE: the run took 44.2 hr
> >
> > The only problem with using gluster mounted via NFS is that it does not
> > respect the group write permissions which we need.
> >
> > We have an exercise coming up in a couple of weeks. It seems to me
> > that in order to improve our write times before then, it would be good to
> > solve the group write permissions for gluster mounted via NFS now. We can
> > then revisit gluster mounted via FUSE afterwards.
> >
> > What information would you need to help us force gluster mounted via NFS
> > to respect the group write permissions?
>
> +Niels, +Jiffin
>
> I added 2 more guys who work on NFS to check why this problem happens in
> your environment. Let's see what information they may need to find the
> problem and solve this issue.

Hi Pat,

depending on the number of groups that a user is part of, you may need to
change some volume options. A complete description of the limitations on
the number of groups can be found here:

https://github.com/gluster/glusterdocs/blob/master/Administrator%20Guide/Handling-of-users-with-many-groups.md

HTH,
Niels

> > Thanks
> >
> > Pat
> >
> > On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
> >
> > On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri <
> > pkarampu at redhat.com> wrote:
> >
> >> On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:
> >>
> >>> Hi,
> >>>
> >>> Today we experimented with some of the FUSE options that we found in the
> >>> list.
> >>> > >>> Changing these options had no effect: > >>> > >>> gluster volume set test-volume performance.cache-max-file-size 2MB > >>> gluster volume set test-volume performance.cache-refresh-timeout 4 > >>> gluster volume set test-volume performance.cache-size 256MB > >>> gluster volume set test-volume performance.write-behind-window-size 4MB > >>> gluster volume set test-volume performance.write-behind-window-size 8MB > >>> > >>> > >> This is a good coincidence, I am meeting with write-behind > >> maintainer(+Raghavendra G) today for the same doubt. I think we will have > >> something by EOD IST. I will update you. > >> > > > > Sorry, forgot to update you. It seems like there is a bug in Write-behind > > and Facebook guys sent a patch http://review.gluster.org/16079 to fix the > > same. But even with that I am not seeing any improvement. May be I am doing > > something wrong. Will update you if I find anything more. > > > >> Changing the following option from its default value made the speed slower > >>> > >>> gluster volume set test-volume performance.write-behind off (on by default) > >>> > >>> Changing the following options initially appeared to give a 10% increase > >>> in speed, but this vanished in subsequent tests (we think the apparent > >>> increase may have been to a lighter workload on the computer from other > >>> users) > >>> > >>> gluster volume set test-volume performance.stat-prefetch on > >>> gluster volume set test-volume client.event-threads 4 > >>> gluster volume set test-volume server.event-threads 4 > >>> > >>> Can anything be gleaned from these observations? Are there other things > >>> we can try? > >>> > >>> Thanks > >>> > >>> Pat > >>> > >>> > >>> > >>> On 06/20/2017 12:06 PM, Pat Haley wrote: > >>> > >>> > >>> Hi Ben, > >>> > >>> Sorry this took so long, but we had a real-time forecasting exercise > >>> last week and I could only get to this now. 
> >>>
> >>> Backend Hardware/OS:
> >>>
> >>> - Much of the information on our back end system is included at the
> >>>   top of
> >>>   http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
> >>> - The specific model of the hard disks is SeaGate ENTERPRISE
> >>>   CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
> >>> - Note: there is one physical server that hosts both the NFS and the
> >>>   GlusterFS areas
> >>>
> >>> Latest tests
> >>>
> >>> I have had time to run one of the dd tests you requested against the
> >>> underlying XFS FS. The median rate was 170 MB/s. The dd results
> >>> and iostat record are in
> >>>
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
> >>>
> >>> I'll add tests for the other brick and to the NFS area later.
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/12/2017 06:06 PM, Ben Turner wrote:
> >>>
> >>> Ok you are correct, you have a pure distributed volume, i.e. no
> >>> replication overhead. So normally for pure dist I use:
> >>>
> >>> throughput = slowest of (disks, NIC) * .6-.7
> >>>
> >>> In your case we have:
> >>>
> >>> 1200 * .6 = 720
> >>>
> >>> So you are seeing a little less throughput than I would expect in your
> >>> configuration. What I like to do here is:
> >>>
> >>> -First tell me more about your back end storage, will it sustain 1200
> >>> MB / sec? What kind of HW? How many disks? What type and specs are the
> >>> disks? What kind of RAID are you using?
> >>>
> >>> -Second can you refresh me on your workload? Are you doing reads /
> >>> writes or both? If both what mix? Since we are using DD I assume you
> >>> are working with large file sequential I/O, is this correct?
> >>>
> >>> -Run some DD tests on the back end XFS FS. I normally have
> >>> /xfs-mount/gluster-brick, if you have something similar just mkdir on
> >>> the XFS -> /xfs-mount/my-test-dir.
Inside the test dir run:
> >>>
> >>> If you are focusing on a write workload run:
> >>>
> >>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> If you are focusing on a read workload run:
> >>>
> >>> # echo 3 > /proc/sys/vm/drop_caches
> >>> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
> >>>
> >>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
> >>>
> >>> Run this in a loop similar to how you did in:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> Run this on both servers one at a time and if you are running on a SAN
> >>> then run again on both at the same time. While this is running gather
> >>> iostat for me:
> >>>
> >>> # iostat -c -m -x 1 > iostat-$(hostname).txt
> >>>
> >>> Let's see how the back end performs on both servers while capturing
> >>> iostat, then see how the same workload / data looks on gluster.
> >>>
> >>> -Last thing, when you run your kernel NFS tests are you using the same
> >>> filesystem / storage you are using for the gluster bricks? I want to be
> >>> sure we have an apples to apples comparison here.
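The rule of thumb quoted in this message can be written out as a small calculation. The sketch below is only an illustration: the function name is made up, the 1200 MB/s figures are the ones used in the thread (the storage figure is exactly what is being asked about, so treat it as an assumption), and the replica divisor is included because the same rule is stated with replicas elsewhere in the thread.

```python
def estimate_throughput(nic_mb_s, storage_mb_s, replicas=1, overhead=0.6):
    """Rule-of-thumb estimate from the thread:
    (slower of NIC and storage) / number of replicas * overhead factor."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return min(nic_mb_s, storage_mb_s) / replicas * overhead


# Pure distribute volume, 0.6 overhead factor: roughly 720 MB/s.
pure_distribute = estimate_throughput(1200, 1200, replicas=1, overhead=0.6)

# Replica 2 with no overhead factor applied: 600 MB/s.
replica_two = estimate_throughput(1200, 1200, replicas=2, overhead=1.0)
```

Swapping in a measured back-end rate for `storage_mb_s` (for example the median from the XFS dd runs) shows quickly whether the disks or the network would set the ceiling.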
> >>> > >>> -b > >>> > >>> > >>> > >>> ----- Original Message ----- > >>> > >>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu> > >>> To: "Ben Turner" <bturner at redhat.com> <bturner at redhat.com> > >>> Sent: Monday, June 12, 2017 5:18:07 PM > >>> Subject: Re: [Gluster-users] Slow write times to gluster disk > >>> > >>> > >>> Hi Ben, > >>> > >>> Here is the output: > >>> > >>> [root at mseas-data2 ~]# gluster volume info > >>> > >>> Volume Name: data-volume > >>> Type: Distribute > >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18 > >>> Status: Started > >>> Number of Bricks: 2 > >>> Transport-type: tcp > >>> Bricks: > >>> Brick1: mseas-data2:/mnt/brick1 > >>> Brick2: mseas-data2:/mnt/brick2 > >>> Options Reconfigured: > >>> nfs.exports-auth-enable: on > >>> diagnostics.brick-sys-log-level: WARNING > >>> performance.readdir-ahead: on > >>> nfs.disable: on > >>> nfs.export-volumes: off > >>> > >>> > >>> On 06/12/2017 05:01 PM, Ben Turner wrote: > >>> > >>> What is the output of gluster v info? That will tell us more about your > >>> config. > >>> > >>> -b > >>> > >>> ----- Original Message ----- > >>> > >>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu> > >>> To: "Ben Turner" <bturner at redhat.com> <bturner at redhat.com> > >>> Sent: Monday, June 12, 2017 4:54:00 PM > >>> Subject: Re: [Gluster-users] Slow write times to gluster disk > >>> > >>> > >>> Hi Ben, > >>> > >>> I guess I'm confused about what you mean by replication. If I look at > >>> the underlying bricks I only ever have a single copy of any file. It > >>> either resides on one brick or the other (directories exist on both > >>> bricks but not files). We are not using gluster for redundancy (or at > >>> least that wasn't our intent). Is that what you meant by replication > >>> or is it something else? 
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/12/2017 04:28 PM, Ben Turner wrote:
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley at mit.edu>
> >>> To: "Ben Turner" <bturner at redhat.com>, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org,
> >>> "Steve Postma" <SPostma at ztechnet.com>
> >>> Sent: Monday, June 12, 2017 2:35:41 PM
> >>> Subject: Re: [Gluster-users] Slow write times to gluster disk
> >>>
> >>> Hi Guys,
> >>>
> >>> I was wondering what our next steps should be to solve the slow write
> >>> times.
> >>>
> >>> Recently I was debugging a large code and writing a lot of output at
> >>> every time step. When I tried writing to our gluster disks, it was
> >>> taking over a day to do a single time step whereas if I had the same
> >>> program (same hardware, network) write to our nfs disk the time per
> >>> time-step was about 45 minutes. What we are shooting for here would be
> >>> to have similar times to either gluster or nfs.
> >>>
> >>> I can see in your test:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> You averaged ~600 MB / sec (expected for replica 2 with 10G: {~1200 MB /
> >>> sec} / #replicas{2} = 600). Gluster does client side replication so with
> >>> replica 2 you will only ever see 1/2 the speed of the slowest part of
> >>> the stack (NW, disk, RAM, CPU). This is usually NW or disk and 600 is
> >>> normally a best case. Now in your output I do see the instances where
> >>> you went down to 200 MB / sec. I can only explain this in three ways:
> >>>
> >>> 1. You are not using conv=fdatasync and writes are actually going to
> >>> page cache and then being flushed to disk. During the fsync the memory
> >>> is not yet available and the disks are busy flushing dirty pages.
> >>> 2. Your storage RAID group is shared across multiple LUNs (like in a SAN)
> >>> and when write times are slow the RAID group is busy servicing other
> >>> LUNs.
> >>> 3. Gluster bug / config issue / some other unknown unknown.
> >>>
> >>> So I see 2 issues here:
> >>>
> >>> 1. NFS does in 45 minutes what gluster can do in 24 hours.
> >>> 2. Sometimes your throughput drops dramatically.
> >>>
> >>> WRT #1 - have a look at my estimates above. My formula for guesstimating
> >>> gluster perf is: throughput = NIC throughput or storage (whichever is
> >>> slower) / # replicas * overhead (figure .7 or .8). Also the larger the
> >>> record size the better for glusterfs mounts, I normally like to be at
> >>> LEAST 64k up to 1024k:
> >>>
> >>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> WRT #2 - Again, I question your testing and your storage config. Try
> >>> using conv=fdatasync for your DDs, use a larger record size, and make
> >>> sure that your back end storage is not causing your slowdowns. Also
> >>> remember that with replica 2 you will take a ~50% hit on writes because
> >>> the client uses 50% of its bandwidth to write to one replica and 50% to
> >>> the other.
> >>>
> >>> -b
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/02/2017 01:07 AM, Ben Turner wrote:
> >>>
> >>> Are you sure using conv=sync is what you want? I normally use
> >>> conv=fdatasync, I'll look up the difference between the two and see if
> >>> it affects your test.
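On the conv=sync versus conv=fdatasync question: in GNU dd, conv=sync only pads each input block with NULs up to the input block size and does not force anything to disk, while conv=fdatasync physically writes the output file data before dd finishes, so the flush is included in the reported time. As a rough illustration of the latter, here is a sketch that mimics a `dd ... conv=fdatasync` style measurement in Python. It assumes Linux (for `os.fdatasync`); the function name, file name and sizes are hypothetical, and it is not a replacement for the dd runs requested above.

```python
import os
import tempfile
import time


def timed_write(path, block_size, count, fdatasync=True):
    """Write `count` zero-filled blocks, optionally flushing the file data
    to disk once at the end, and return (bytes_written, seconds)."""
    block = b"\0" * block_size
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        for _ in range(count):
            written += os.write(fd, block)
        if fdatasync:
            # The flush happens inside the timed region, as with conv=fdatasync.
            os.fdatasync(fd)
    finally:
        os.close(fd)
    return written, time.perf_counter() - start


def demo():
    # Hypothetical small run: 4 blocks of 1 MiB in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "zeros.bin")
        written, _elapsed = timed_write(path, 1024 * 1024, 4)
        return written, os.path.getsize(path)
```

With `fdatasync=False` the timer can stop while data is still sitting in the page cache, which is the first of the three explanations offered above for the uneven rates.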
> >>> > >>> > >>> -b > >>> > >>> ----- Original Message ----- > >>> > >>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu> > >>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com> <pkarampu at redhat.com> > >>> Cc: "Ravishankar N" <ravishankar at redhat.com> <ravishankar at redhat.com>,gluster-users at gluster.org, > >>> "Steve Postma" <SPostma at ztechnet.com> <SPostma at ztechnet.com>, "Ben > >>> Turner" <bturner at redhat.com> <bturner at redhat.com> > >>> Sent: Tuesday, May 30, 2017 9:40:34 PM > >>> Subject: Re: [Gluster-users] Slow write times to gluster disk > >>> > >>> > >>> Hi Pranith, > >>> > >>> The "dd" command was: > >>> > >>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync > >>> > >>> There were 2 instances where dd reported 22 seconds. The output from > >>> the > >>> dd tests are in > >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt > >>> > >>> Pat > >>> > >>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote: > >>> > >>> Pat, > >>> What is the command you used? As per the following output, > >>> it > >>> seems like at least one write operation took 16 seconds. Which is > >>> really bad. > >>> 96.39 1165.10 us 89.00 us*16487014.00 us* > >>> 393212 > >>> WRITE > >>> > >>> > >>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu<mailto:phaley at mit.edu> <phaley at mit.edu>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> I ran the same 'dd' test both in the gluster test volume and > >>> in > >>> the .glusterfs directory of each brick. 
The median results > >>> (12 > >>> dd > >>> trials in each test) are similar to before > >>> > >>> * gluster test volume: 586.5 MB/s > >>> * bricks (in .glusterfs): 1.4 GB/s > >>> > >>> The profile for the gluster test-volume is in > >>> > >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt > >>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> > >>> > >>> Thanks > >>> > >>> Pat > >>> > >>> > >>> > >>> > >>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote: > >>> > >>> Let's start with the same 'dd' test we were testing with to > >>> see, > >>> what the numbers are. Please provide profile numbers for the > >>> same. From there on we will start tuning the volume to see > >>> what > >>> we can do. > >>> > >>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu > >>> <mailto:phaley at mit.edu> <phaley at mit.edu>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> Thanks for the tip. We now have the gluster volume > >>> mounted > >>> under /home. What tests do you recommend we run? > >>> > >>> Thanks > >>> > >>> Pat > >>> > >>> > >>> > >>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote: > >>> > >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley > >>> <phaley at mit.edu > >>> <mailto:phaley at mit.edu> <phaley at mit.edu>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> Sorry for the delay. I never saw received your > >>> reply > >>> (but I did receive Ben Turner's follow-up to your > >>> reply). 
So we tried to create a gluster volume > >>> under > >>> /home using different variations of > >>> > >>> gluster volume create test-volume > >>> mseas-data2:/home/gbrick_test_1 > >>> mseas-data2:/home/gbrick_test_2 transport tcp > >>> > >>> However we keep getting errors of the form > >>> > >>> Wrong brick type: transport, use > >>> <HOSTNAME>:<export-dir-abs-path> > >>> > >>> Any thoughts on what we're doing wrong? > >>> > >>> > >>> You should give transport tcp at the beginning I think. > >>> Anyways, transport tcp is the default, so no need to > >>> specify > >>> so remove those two words from the CLI. > >>> > >>> > >>> Also do you have a list of the test we should be > >>> running > >>> once we get this volume created? Given the > >>> time-zone > >>> difference it might help if we can run a small > >>> battery > >>> of tests and post the results rather than > >>> test-post-new > >>> test-post... . > >>> > >>> > >>> This is the first time I am doing performance analysis > >>> on > >>> users as far as I remember. In our team there are > >>> separate > >>> engineers who do these tests. Ben who replied earlier is > >>> one > >>> such engineer. > >>> > >>> Ben, > >>> Have any suggestions? > >>> > >>> > >>> Thanks > >>> > >>> Pat > >>> > >>> > >>> > >>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri > >>> wrote: > >>> > >>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley > >>> <phaley at mit.edu <mailto:phaley at mit.edu> <phaley at mit.edu>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> The /home partition is mounted as ext4 > >>> /home ext4 defaults,usrquota,grpquota 1 2 > >>> > >>> The brick partitions are mounted ax xfs > >>> /mnt/brick1 xfs defaults 0 0 > >>> /mnt/brick2 xfs defaults 0 0 > >>> > >>> Will this cause a problem with creating a > >>> volume > >>> under /home? > >>> > >>> > >>> I don't think the bottleneck is disk. You can do > >>> the > >>> same tests you did on your new volume to confirm? 
> >>> > >>> > >>> Pat > >>> > >>> > >>> > >>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri > >>> wrote: > >>> > >>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley > >>> <phaley at mit.edu <mailto:phaley at mit.edu> <phaley at mit.edu>> > >>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> Unfortunately, we don't have similar > >>> hardware > >>> for a small scale test. All we have is > >>> our > >>> production hardware. > >>> > >>> > >>> You said something about /home partition which > >>> has > >>> lesser disks, we can create plain distribute > >>> volume inside one of those directories. After > >>> we > >>> are done, we can remove the setup. What do you > >>> say? > >>> > >>> > >>> Pat > >>> > >>> > >>> > >>> > >>> On 05/11/2017 07:05 AM, Pranith Kumar > >>> Karampuri wrote: > >>> > >>> On Thu, May 11, 2017 at 2:48 AM, Pat > >>> Haley > >>> <phaley at mit.edu <mailto:phaley at mit.edu> <phaley at mit.edu>> > >>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> Since we are mounting the partitions > >>> as > >>> the bricks, I tried the dd test > >>> writing > >>> to > >>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. > >>> The results without oflag=sync were > >>> 1.6 > >>> Gb/s (faster than gluster but not as > >>> fast > >>> as I was expecting given the 1.2 Gb/s > >>> to > >>> the no-gluster area w/ fewer disks). > >>> > >>> > >>> Okay, then 1.6Gb/s is what we need to > >>> target > >>> for, considering your volume is just > >>> distribute. Is there any way you can do > >>> tests > >>> on similar hardware but at a small scale? > >>> Just so we can run the workload to learn > >>> more > >>> about the bottlenecks in the system? We > >>> can > >>> probably try to get the speed to 1.2Gb/s > >>> on > >>> your /home partition you were telling me > >>> yesterday. Let me know if that is > >>> something > >>> you are okay to do. 
> >>> > >>> > >>> Pat > >>> > >>> > >>> > >>> On 05/10/2017 01:27 PM, Pranith Kumar > >>> Karampuri wrote: > >>> > >>> On Wed, May 10, 2017 at 10:15 PM, > >>> Pat > >>> Haley <phaley at mit.edu > >>> <mailto:phaley at mit.edu> <phaley at mit.edu>> wrote: > >>> > >>> > >>> Hi Pranith, > >>> > >>> Not entirely sure (this isn't my > >>> area of expertise). I'll run > >>> your > >>> answer by some other people who > >>> are > >>> more familiar with this. > >>> > >>> I am also uncertain about how to > >>> interpret the results when we > >>> also > >>> add the dd tests writing to the > >>> /home area (no gluster, still on > >>> the > >>> same machine) > >>> > >>> * dd test without oflag=sync > >>> (rough average of multiple > >>> tests) > >>> o gluster w/ fuse mount : > >>> 570 > >>> Mb/s > >>> o gluster w/ nfs mount: > >>> 390 > >>> Mb/s > >>> o nfs (no gluster): 1.2 > >>> Gb/s > >>> * dd test with oflag=sync > >>> (rough > >>> average of multiple tests) > >>> o gluster w/ fuse mount: > >>> 5 > >>> Mb/s > >>> o gluster w/ nfs mount: > >>> 200 > >>> Mb/s > >>> o nfs (no gluster): 20 > >>> Mb/s > >>> > >>> Given that the non-gluster area > >>> is > >>> a > >>> RAID-6 of 4 disks while each > >>> brick > >>> of the gluster area is a RAID-6 > >>> of > >>> 32 disks, I would naively expect > >>> the > >>> writes to the gluster area to be > >>> roughly 8x faster than to the > >>> non-gluster. > >>> > >>> > >>> I think a better test is to try and > >>> write to a file using nfs without > >>> any > >>> gluster to a location that is not > >>> inside > >>> the brick but someother location > >>> that > >>> is > >>> on same disk(s). If you are mounting > >>> the > >>> partition as the brick, then we can > >>> write to a file inside .glusterfs > >>> directory, something like > >>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. > >>> > >>> > >>> > >>> I still think we have a speed > >>> issue, > >>> I can't tell if fuse vs nfs is > >>> part > >>> of the problem. 
> >>> > >>> > >>> I got interested in the post because > >>> I > >>> read that fuse speed is lesser than > >>> nfs > >>> speed which is counter-intuitive to > >>> my > >>> understanding. So wanted > >>> clarifications. > >>> Now that I got my clarifications > >>> where > >>> fuse outperformed nfs without sync, > >>> we > >>> can resume testing as described > >>> above > >>> and try to find what it is. Based on > >>> your email-id I am guessing you are > >>> from > >>> Boston and I am from Bangalore so if > >>> you > >>> are okay with doing this debugging > >>> for > >>> multiple days because of timezones, > >>> I > >>> will be happy to help. Please be a > >>> bit > >>> patient with me, I am under a > >>> release > >>> crunch but I am very curious with > >>> the > >>> problem you posted. > >>> > >>> Was there anything useful in the > >>> profiles? > >>> > >>> > >>> Unfortunately profiles didn't help > >>> me > >>> much, I think we are collecting the > >>> profiles from an active volume, so > >>> it > >>> has a lot of information that is not > >>> pertaining to dd so it is difficult > >>> to > >>> find the contributions of dd. So I > >>> went > >>> through your post again and found > >>> something I didn't pay much > >>> attention > >>> to > >>> earlier i.e. oflag=sync, so did my > >>> own > >>> tests on my setup with FUSE so sent > >>> that > >>> reply. > >>> > >>> > >>> Pat > >>> > >>> > >>> > >>> On 05/10/2017 12:15 PM, Pranith > >>> Kumar Karampuri wrote: > >>> > >>> Okay good. At least this > >>> validates > >>> my doubts. Handling O_SYNC in > >>> gluster NFS and fuse is a bit > >>> different. > >>> When application opens a file > >>> with > >>> O_SYNC on fuse mount then each > >>> write syscall has to be written > >>> to > >>> disk as part of the syscall > >>> where > >>> as in case of NFS, there is no > >>> concept of open. 
NFS performs the write through a handle saying it needs to be a
> >>> synchronous write, so the write() syscall is performed first and then it
> >>> performs fsync(). So a write on an fd with O_SYNC becomes write+fsync.
> >>> My guess is that when multiple threads do this write+fsync() operation
> >>> on the same file, multiple writes are batched together to be written to
> >>> disk, so the throughput on the disk increases.
> >>>
> >>> Does it answer your doubts?
> >>>
> >>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>> Without the oflag=sync and only a single test of each, the FUSE is
> >>> going faster than NFS:
> >>>
> >>> FUSE:
> >>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>> 4096+0 records in
> >>> 4096+0 records out
> >>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
> >>>
> >>> NFS
> >>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>> 4096+0 records in
> >>> 4096+0 records out
> >>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
> >>>
> >>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Could you let me know the speed without oflag=sync on both the
> >>> mounts? No need to collect profiles.
> >>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>> Here is what I see now:
> >>>
> >>> [root at mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> diagnostics.count-fop-hits: on
> >>> diagnostics.latency-measurement: on
> >>> nfs.exports-auth-enable: on
> >>> diagnostics.brick-sys-log-level: WARNING
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Is this the volume info you have?
> >>>
> >>> > [root at mseas-data2 ~]# gluster volume info
> >>> >
> >>> > Volume Name: data-volume
> >>> > Type: Distribute
> >>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> > Status: Started
> >>> > Number of Bricks: 2
> >>> > Transport-type: tcp
> >>> > Bricks:
> >>> > Brick1: mseas-data2:/mnt/brick1
> >>> > Brick2: mseas-data2:/mnt/brick2
> >>> > Options Reconfigured:
> >>> > performance.readdir-ahead: on
> >>> > nfs.disable: on
> >>> > nfs.export-volumes: off
> >>>
> >>> I copied this from the old thread from 2016. This is a distribute
> >>> volume. Did you change any of the options in between?
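One way to answer the "did any options change" question mechanically is to diff the two "Options Reconfigured" lists. The sketch below copies both lists from the outputs quoted above; the file names and the temporary working directory are illustrative choices, not part of the thread.

```shell
#!/bin/sh
# Sketch: find options that appear only in the newer "gluster volume info".
workdir="$(mktemp -d)"

# Options from the 2016 output quoted by Pranith.
cat > "$workdir/options_old.txt" <<'EOF'
performance.readdir-ahead: on
nfs.disable: on
nfs.export-volumes: off
EOF

# Options from the output Pat posted on 10 May 2017.
cat > "$workdir/options_new.txt" <<'EOF'
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.exports-auth-enable: on
diagnostics.brick-sys-log-level: WARNING
performance.readdir-ahead: on
nfs.disable: on
nfs.export-volumes: off
EOF

# comm requires sorted input.
sort "$workdir/options_old.txt" > "$workdir/old.sorted"
sort "$workdir/options_new.txt" > "$workdir/new.sorted"

# comm -13 suppresses lines unique to the first file and lines common
# to both, leaving only lines unique to the second (newer) file.
added="$(comm -13 "$workdir/old.sorted" "$workdir/new.sorted")"
printf 'Options present only in the newer output:\n%s\n' "$added"
```

For the two lists quoted here, the options unique to the newer output are the three diagnostics.* settings and nfs.exports-auth-enable; the three options in the 2016 list appear unchanged.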
> >>> --
> >>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >>> Pat Haley                          Email:  phaley at mit.edu
> >>> Center for Ocean Engineering       Phone:  (617) 253-6824
> >>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> >>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> >>> 77 Massachusetts Avenue
> >>> Cambridge, MA  02139-4301
> >>>
> >>> --
> >>> Pranith