Hi Pranith, The "dd" command was: dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync There were 2 instances where dd reported 22 seconds. The output from the dd tests are in http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt Pat On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:> Pat, > What is the command you used? As per the following output, it > seems like at least one write operation took 16 seconds. Which is > really bad. > 96.39 1165.10 us 89.00 us*16487014.00 us* 393212 WRITE > > > On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu > <mailto:phaley at mit.edu>> wrote: > > > Hi Pranith, > > I ran the same 'dd' test both in the gluster test volume and in > the .glusterfs directory of each brick. The median results (12 dd > trials in each test) are similar to before > > * gluster test volume: 586.5 MB/s > * bricks (in .glusterfs): 1.4 GB/s > > The profile for the gluster test-volume is in > > http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt > <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> > > Thanks > > Pat > > > > > On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote: >> Let's start with the same 'dd' test we were testing with to see, >> what the numbers are. Please provide profile numbers for the >> same. From there on we will start tuning the volume to see what >> we can do. >> >> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu >> <mailto:phaley at mit.edu>> wrote: >> >> >> Hi Pranith, >> >> Thanks for the tip. We now have the gluster volume mounted >> under /home. What tests do you recommend we run? >> >> Thanks >> >> Pat >> >> >> >> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote: >>> >>> >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu >>> <mailto:phaley at mit.edu>> wrote: >>> >>> >>> Hi Pranith, >>> >>> Sorry for the delay. I never saw received your reply >>> (but I did receive Ben Turner's follow-up to your >>> reply). So we tried to create a gluster volume under >>> /home using different variations of >>> >>> gluster volume create test-volume >>> mseas-data2:/home/gbrick_test_1 >>> mseas-data2:/home/gbrick_test_2 transport tcp >>> >>> However we keep getting errors of the form >>> >>> Wrong brick type: transport, use >>> <HOSTNAME>:<export-dir-abs-path> >>> >>> Any thoughts on what we're doing wrong? >>> >>> >>> You should give transport tcp at the beginning I think. >>> Anyways, transport tcp is the default, so no need to specify >>> so remove those two words from the CLI. >>> >>> >>> Also do you have a list of the test we should be running >>> once we get this volume created? Given the time-zone >>> difference it might help if we can run a small battery >>> of tests and post the results rather than test-post-new >>> test-post... . >>> >>> >>> This is the first time I am doing performance analysis on >>> users as far as I remember. In our team there are separate >>> engineers who do these tests. Ben who replied earlier is one >>> such engineer. >>> >>> Ben, >>> Have any suggestions? 
-- 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
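For reference, the 12-trial median figures discussed in this thread could be gathered with a short wrapper along these lines (a minimal sketch; the target path and the cleanup between runs are assumptions, not details from the thread):

    #!/bin/sh
    # Repeat the dd test 12 times against one mount and print each
    # throughput summary line; the median can be read off the output.
    TARGET=/gluster-test-mount/zeros.txt   # placeholder path
    for i in $(seq 1 12); do
        dd if=/dev/zero count=4096 bs=1048576 of="$TARGET" conv=sync 2>&1 | tail -n 1
        rm -f "$TARGET"
    done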
Pranith Kumar Karampuri
2017-May-31 01:54 UTC
[Gluster-users] Slow write times to gluster disk
Thanks, this is good information. +Soumya

Soumya,
      We are trying to find out why kNFS performs so much better than plain distribute glusterfs+FUSE. What information do you think would help us compare the operations of kNFS vs gluster+FUSE? We already have the profile output from FUSE.

On Wed, May 31, 2017 at 7:10 AM, Pat Haley <phaley at mit.edu> wrote:

> Hi Pranith,
>
> The "dd" command was:
>
>     dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>
> There were 2 instances where dd reported 22 seconds. The output from the dd tests is in
>
>     http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Pat
>
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>
> Pat,
>       What is the command you used? As per the following output, it seems like at least one write operation took 16 seconds, which is really bad.
>
>      96.39    1165.10 us    89.00 us  *16487014.00 us*    393212    WRITE
>
> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
>
>> Hi Pranith,
>>
>> I ran the same 'dd' test both in the gluster test volume and in the .glusterfs directory of each brick. The median results (12 dd trials in each test) are similar to before:
>>
>>    - gluster test volume: 586.5 MB/s
>>    - bricks (in .glusterfs): 1.4 GB/s
>>
>> The profile for the gluster test-volume is in
>>
>>     http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>
>> Thanks
>>
>> Pat
>>
>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>
>> Let's start with the same 'dd' test we were testing with, to see what the numbers are. Please provide profile numbers for the same. From there on we will start tuning the volume to see what we can do.
>>
>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
>>
>>> Hi Pranith,
>>>
>>> Thanks for the tip. We now have the gluster volume mounted under /home. What tests do you recommend we run?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>> Hi Pranith,
>>>>
>>>> Sorry for the delay. I never received your reply (but I did receive Ben Turner's follow-up to your reply). So we tried to create a gluster volume under /home using different variations of
>>>>
>>>>     gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2 transport tcp
>>>>
>>>> However we keep getting errors of the form
>>>>
>>>>     Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>>>
>>>> Any thoughts on what we're doing wrong?
>>>
>>> I think you should give "transport tcp" at the beginning. In any case, transport tcp is the default, so there is no need to specify it; just remove those two words from the CLI.
>>>
>>>> Also, do you have a list of the tests we should be running once we get this volume created? Given the time-zone difference it might help if we can run a small battery of tests and post the results, rather than an iterative cycle of test, post, new test, post, ... .
>>>
>>> This is the first time I am doing performance analysis with users, as far as I remember. In our team there are separate engineers who do these tests; Ben, who replied earlier, is one such engineer.
>>>
>>> Ben,
>>>     Have any suggestions?
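To make that fix concrete, either of the following invocations should avoid the "Wrong brick type" error (same brick paths as in the quoted command; shown here as a sketch):

    # The transport keyword, when used, must precede the brick list:
    gluster volume create test-volume transport tcp \
        mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2

    # Or, since tcp is the default transport, omit it entirely:
    gluster volume create test-volume \
        mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2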
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> The /home partition is mounted as ext4
>>>>>     /home         ext4    defaults,usrquota,grpquota    1 2
>>>>>
>>>>> The brick partitions are mounted as xfs
>>>>>     /mnt/brick1   xfs     defaults    0 0
>>>>>     /mnt/brick2   xfs     defaults    0 0
>>>>>
>>>>> Will this cause a problem with creating a volume under /home?
>>>>
>>>> I don't think the bottleneck is the disk. Can you do the same tests you did on your new volume, to confirm?
>>>>
>>>>> Pat
>>>>>
>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Unfortunately, we don't have similar hardware for a small-scale test. All we have is our production hardware.
>>>>>
>>>>> You said something about the /home partition, which has fewer disks; we can create a plain distribute volume inside one of those directories. After we are done, we can remove the setup. What do you say?
>>>>>
>>>>>> Pat
>>>>>>
>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Since we are mounting the partitions as the bricks, I tried the dd test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The results without oflag=sync were 1.6 Gb/s (faster than gluster, but not as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer disks).
>>>>>>
>>>>>> Okay, then 1.6 Gb/s is what we need to target, considering your volume is just distribute. Is there any way you can do tests on similar hardware but at a small scale, just so we can run the workload and learn more about the bottlenecks in the system? We can probably try to get the speed to 1.2 Gb/s on the /home partition you were telling me about yesterday. Let me know if that is something you are okay to do.
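In practice, the brick-level baseline test described above might look like the following (the brick path matches the volume info quoted later in this thread; the scratch file name is purely illustrative):

    # Write directly inside the brick's .glusterfs directory, bypassing
    # gluster entirely, then remove the scratch file.
    dd if=/dev/zero count=4096 bs=1048576 \
       of=/mnt/brick1/.glusterfs/dd-scratch-remove-after-test conv=sync
    rm -f /mnt/brick1/.glusterfs/dd-scratch-remove-after-test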
>>>>>>> Pat
>>>>>>>
>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your answer by some other people who are more familiar with this.
>>>>>>>>
>>>>>>>> I am also uncertain about how to interpret the results when we also add the dd tests writing to the /home area (no gluster, still on the same machine):
>>>>>>>>
>>>>>>>>    - dd test without oflag=sync (rough average of multiple tests)
>>>>>>>>        - gluster w/ fuse mount: 570 Mb/s
>>>>>>>>        - gluster w/ nfs mount:  390 Mb/s
>>>>>>>>        - nfs (no gluster):      1.2 Gb/s
>>>>>>>>    - dd test with oflag=sync (rough average of multiple tests)
>>>>>>>>        - gluster w/ fuse mount: 5 Mb/s
>>>>>>>>        - gluster w/ nfs mount:  200 Mb/s
>>>>>>>>        - nfs (no gluster):      20 Mb/s
>>>>>>>>
>>>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each brick of the gluster area is a RAID-6 of 32 disks, I would naively expect the writes to the gluster area to be roughly 8x faster than to the non-gluster area.
>>>>>>>
>>>>>>> I think a better test is to try to write, using nfs without any gluster, to a location that is not inside the brick but some other location on the same disk(s). If you are mounting the partition as the brick, then we can write to a file inside the .glusterfs directory, something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>
>>>>>>>> I still think we have a speed issue; I can't tell if fuse vs nfs is part of the problem.
>>>>>>>
>>>>>>> I got interested in the post because I read that fuse speed is lower than nfs speed, which is counter-intuitive to my understanding, so I wanted clarifications. Now that I have my clarifications, where fuse outperformed nfs without sync, we can resume testing as described above and try to find what it is. Based on your email-id I am guessing you are from Boston and I am from Bangalore, so if you are okay with doing this debugging over multiple days because of the timezones, I will be happy to help. Please be a bit patient with me; I am under a release crunch, but I am very curious about the problem you posted.
>>>>>>>
>>>>>>>> Was there anything useful in the profiles?
>>>>>>>
>>>>>>> Unfortunately the profiles didn't help me much. I think we are collecting the profiles from an active volume, so they contain a lot of information that does not pertain to dd, and it is difficult to isolate dd's contribution. So I went through your post again and found something I hadn't paid much attention to earlier, i.e. oflag=sync, so I did my own tests on my setup with FUSE and sent that reply.
>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Okay, good. At least this validates my doubts. Handling O_SYNC in gluster NFS and FUSE is a bit different. When an application opens a file with O_SYNC on a FUSE mount, each write syscall has to be written to disk as part of the syscall, whereas in the case of NFS there is no concept of open. NFS performs the write through a handle that says it needs to be a synchronous write, so the write() syscall is performed first and then an fsync() is performed; a write on an fd with O_SYNC thus becomes write+fsync. My guess is that when multiple threads do this write+fsync() operation on the same file, multiple writes are batched together before being written to disk, so the throughput on the disk increases.
>>>>>>>>
>>>>>>>> Does that answer your doubts?
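To observe the two sync behaviours from the command line, dd itself can approximate them (a sketch; neither invocation reproduces gNFS's internal write-then-fsync-per-write sequence exactly):

    # oflag=sync opens the output file O_SYNC: every write() must reach
    # disk before the syscall returns (the slow FUSE case above).
    dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync

    # conv=fsync issues a single fsync() after all the writes, letting
    # the kernel batch them -- illustrating why batched write+fsync is faster.
    dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fsync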
>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>
>>>>>>>>> Without oflag=sync, and with only a single test of each, the FUSE mount is going faster than NFS:
>>>>>>>>>
>>>>>>>>> FUSE:
>>>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>
>>>>>>>>> NFS:
>>>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> Could you let me know the speed without oflag=sync on both of the mounts? No need to collect profiles.
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Here is what I see now:
>>>>>>>>>>
>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>
>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>> Is this the volume info you have?
>>>>>>>>>>
>>>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>> > Volume Name: data-volume
>>>>>>>>>> > Type: Distribute
>>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> > Status: Started
>>>>>>>>>> > Number of Bricks: 2
>>>>>>>>>> > Transport-type: tcp
>>>>>>>>>> > Bricks:
>>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> > Options Reconfigured:
>>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>>> > nfs.disable: on
>>>>>>>>>> > nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>> I copied this from an old thread from 2016. This is a distribute volume. Did you change any of the options in between?
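One way around the "active volume" noise mentioned earlier is to bracket a single dd run with a fresh profiling window, using only the standard profile start/info/stop subcommands (a sketch; the mount path is an assumption):

    # Restart profiling to reset the counters, run exactly one dd,
    # then capture the counters before other traffic accumulates.
    gluster volume profile data-volume stop
    gluster volume profile data-volume start
    dd if=/dev/zero count=4096 bs=1048576 of=/gluster-mount/zeros.txt conv=sync
    gluster volume profile data-volume info > profile_dd_only.txt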
-- 
Pranith