Hi Ben,
Sorry this took so long, but we had a real-time forecasting exercise 
last week and I could only get to this now.
Backend Hardware/OS:
  * Much of the information on our back end system is included at the
    top of
    http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
  * The specific model of the hard disks is Seagate ENTERPRISE CAPACITY
    V.4 6TB (ST6000NM0024). The rated interface speed is 6 Gb/s.
  * Note: there is one physical server that hosts both the NFS and the
    GlusterFS areas
Latest tests
I have had time to run one of the dd tests you requested against the
underlying XFS FS (so far for one brick).  The median rate was 170 MB/s.
The dd results and iostat record are in
http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
I'll add tests for the other brick and to the NFS area later.
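
For reference, a minimal sketch of how this kind of looped dd write test with an
iostat capture can be scripted (the brick test directory, file name and pass
count below are illustrative, not necessarily the exact ones used):

    # sample iostat once per second in the background while the writes run
    iostat -c -m -x 1 > iostat-$(hostname).txt &
    IOSTAT_PID=$!
    for i in $(seq 1 12); do
        dd if=/dev/zero of=/mnt/brick1/my-test-dir/zeros.txt bs=1024k count=10000 conv=fdatasync
        rm -f /mnt/brick1/my-test-dir/zeros.txt
    done
    kill $IOSTAT_PID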
Thanks
Pat
On 06/12/2017 06:06 PM, Ben Turner wrote:
> Ok, you are correct, you have a pure distributed volume, i.e. no replication
> overhead.  So normally for pure dist I use:
>
> throughput = slowest of disks / NIC * .6-.7
>
> In your case we have:
>
> 1200 * .6 = 720
>
> So you are seeing a little less throughput than I would expect in your
> configuration.  What I like to do here is:
>
> -First, tell me more about your back end storage: will it sustain 1200 MB /
> sec?  What kind of HW?  How many disks?  What type and specs are the disks?
> What kind of RAID are you using?
>
> -Second, can you refresh me on your workload?  Are you doing reads / writes
> or both?  If both, what mix?  Since we are using DD I assume you are working
> with large file sequential I/O, is this correct?
>
> -Run some DD tests on the back end XFS FS.  I normally have
> /xfs-mount/gluster-brick; if you have something similar, just mkdir on the
> XFS -> /xfs-mount/my-test-dir.  Inside the test dir run:
>
> If you are focusing on a write workload run:
>
> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>
> If you are focusing on a read workload run:
>
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
>
> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>
> Run this in a loop similar to how you did in:
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Run this on both servers one at a time, and if you are running on a SAN then
> run again on both at the same time.  While this is running, gather iostat for
> me:
>
> # iostat -c -m -x 1 > iostat-$(hostname).txt
>
> Let's see how the back end performs on both servers while capturing iostat,
> then see how the same workload / data looks on gluster.
>
> -Last thing: when you run your kernel NFS tests, are you using the same
> filesystem / storage you are using for the gluster bricks?  I want to be sure
> we have an apples to apples comparison here.
>
> -b
>
>
>
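
A corresponding sketch for the read side, dropping the page cache before every
pass as Ben stresses above (file path and pass count again illustrative):

    for i in $(seq 1 12); do
        # flush the page cache so each read really hits the disks
        echo 3 > /proc/sys/vm/drop_caches
        dd if=/xfs-mount/my-test-dir/file of=/dev/null bs=1024k count=10000
    done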
> ----- Original Message -----
>> From: "Pat Haley" <phaley at mit.edu>
>> To: "Ben Turner" <bturner at redhat.com>
>> Sent: Monday, June 12, 2017 5:18:07 PM
>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>
>>
>> Hi Ben,
>>
>> Here is the output:
>>
>> [root at mseas-data2 ~]# gluster volume info
>>
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Options Reconfigured:
>> nfs.exports-auth-enable: on
>> diagnostics.brick-sys-log-level: WARNING
>> performance.readdir-ahead: on
>> nfs.disable: on
>> nfs.export-volumes: off
>>
>>
>> On 06/12/2017 05:01 PM, Ben Turner wrote:
>>> What is the output of gluster v info?  That will tell us more about your
>>> config.
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>> From: "Pat Haley" <phaley at mit.edu>
>>>> To: "Ben Turner" <bturner at redhat.com>
>>>> Sent: Monday, June 12, 2017 4:54:00 PM
>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>
>>>>
>>>> Hi Ben,
>>>>
>>>> I guess I'm confused about what you mean by replication.  If I look at
>>>> the underlying bricks I only ever have a single copy of any file.  It
>>>> either resides on one brick or the other (directories exist on both
>>>> bricks but not files).  We are not using gluster for redundancy (or at
>>>> least that wasn't our intent).  Is that what you meant by replication
>>>> or is it something else?
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>> On 06/12/2017 04:28 PM, Ben Turner wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Pat Haley" <phaley at mit.edu>
>>>>>> To: "Ben Turner" <bturner at redhat.com>, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org, "Steve Postma" <SPostma at ztechnet.com>
>>>>>> Sent: Monday, June 12, 2017 2:35:41 PM
>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>
>>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> I was wondering what our next steps should be to solve the slow write
>>>>>> times.
>>>>>>
>>>>>> Recently I was debugging a large code and writing a lot of output at
>>>>>> every time step.  When I tried writing to our gluster disks, it was
>>>>>> taking over a day to do a single time step whereas if I had the same
>>>>>> program (same hardware, network) write to our nfs disk the time per
>>>>>> time-step was about 45 minutes.  What we are shooting for here would
>>>>>> be to have similar times to either gluster or nfs.
>>>>> I can see in your test:
>>>>>
>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>
>>>>> You averaged ~600 MB / sec (expected for replica 2 with 10G: {~1200 MB /
>>>>> sec} / #replicas{2} = 600).  Gluster does client side replication, so with
>>>>> replica 2 you will only ever see 1/2 the speed of the slowest part of the
>>>>> stack (NW, disk, RAM, CPU).  This is usually NW or disk, and 600 is
>>>>> normally a best case.  Now in your output I do see the instances where you
>>>>> went down to 200 MB / sec.  I can only explain this in three ways:
>>>>>
>>>>> 1.  You are not using conv=fdatasync and writes are actually going to page
>>>>> cache and then being flushed to disk.  During the fsync the memory is not
>>>>> yet available and the disks are busy flushing dirty pages.
>>>>> 2.  Your storage RAID group is shared across multiple LUNs (like in a SAN)
>>>>> and when write times are slow the RAID group is busy servicing other LUNs.
>>>>> 3.  Gluster bug / config issue / some other unknown unknown.
>>>>>
>>>>> So I see 2 issues here:
>>>>>
>>>>> 1.  NFS does in 45 minutes what gluster takes 24 hours to do.
>>>>> 2.  Sometimes your throughput drops dramatically.
>>>>>
>>>>> WRT #1 - have a look at my estimates above.  My formula for guesstimating
>>>>> gluster perf is: throughput = NIC throughput or storage (whichever is
>>>>> slower) / # replicas * overhead (figure .7 or .8).  Also, the larger the
>>>>> record size the better for glusterfs mounts; I normally like to be at
>>>>> LEAST 64k, up to 1024k:
>>>>>
>>>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
>>>>>
>>>>> WRT #2 - Again, I question your testing and your storage config.  Try
>>>>> using conv=fdatasync for your DDs, use a larger record size, and make sure
>>>>> that your back end storage is not causing your slowdowns.  Also remember
>>>>> that with replica 2 you will take a ~50% hit on writes because the client
>>>>> uses 50% of its bandwidth to write to one replica and 50% to the other.
>>>>>
>>>>> -b
>>>>>
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>>>>> Are you sure using conv=sync is what you want?  I normally use
>>>>>>> conv=fdatasync; I'll look up the difference between the two and see
>>>>>>> if it affects your test.
>>>>>>>
>>>>>>>
>>>>>>> -b
>>>>>>>
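
As an aside on Ben's conv=sync question above, the three dd variants that come
up in this thread ask for quite different things; a quick side-by-side (target
path illustrative):

    # conv=sync only pads short input blocks with zeros; it does not force data to disk
    dd if=/dev/zero of=/mnt/test/file bs=1024k count=10000 conv=sync
    # oflag=sync opens the output with O_SYNC, so every write is committed before the next
    dd if=/dev/zero of=/mnt/test/file bs=1024k count=10000 oflag=sync
    # conv=fdatasync issues one fdatasync() at the end, so the flush is included in the timing
    dd if=/dev/zero of=/mnt/test/file bs=1024k count=10000 conv=fdatasync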
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Pat Haley" <phaley at mit.edu>
>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org, "Steve Postma" <SPostma at ztechnet.com>, "Ben Turner" <bturner at redhat.com>
>>>>>>>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> The "dd" command was:
>>>>>>>>
>>>>>>>>     dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>
>>>>>>>> There were 2 instances where dd reported 22 seconds.  The output from
>>>>>>>> the dd tests is in
>>>>>>>>
>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>> Pat,
>>>>>>>>>     What is the command you used?  As per the following output, it
>>>>>>>>> seems like at least one write operation took 16 seconds, which is
>>>>>>>>> really bad.
>>>>>>>>>     96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu
>>>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>     Hi Pranith,
>>>>>>>>>
>>>>>>>>>     I ran the same 'dd' test both in the gluster test volume and in
>>>>>>>>>     the .glusterfs directory of each brick.  The median results (12
>>>>>>>>>     dd trials in each test) are similar to before:
>>>>>>>>>
>>>>>>>>>       * gluster test volume: 586.5 MB/s
>>>>>>>>>       * bricks (in .glusterfs): 1.4 GB/s
>>>>>>>>>
>>>>>>>>>     The profile for the gluster test-volume is in
>>>>>>>>>
>>>>>>>>>     http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>>>>>>>>
>>>>>>>>>     Thanks
>>>>>>>>>
>>>>>>>>>     Pat
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>     Let's start with the same 'dd' test we were testing with, to see
>>>>>>>>>>     what the numbers are.  Please provide profile numbers for the
>>>>>>>>>>     same.  From there on we will start tuning the volume to see what
>>>>>>>>>>     we can do.
>>>>>>>>>>
>>>>>>>>>>     On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>     <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>
>>>>>>>>>>         Hi Pranith,
>>>>>>>>>>
>>>>>>>>>>         Thanks for the tip.  We now have the gluster volume mounted
>>>>>>>>>>         under /home.  What tests do you recommend we run?
>>>>>>>>>>
>>>>>>>>>>         Thanks
>>>>>>>>>>
>>>>>>>>>>         Pat
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>         On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>         On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>         <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>             Hi Pranith,
>>>>>>>>>>>
>>>>>>>>>>>             Sorry for the delay.  I never received your reply (but
>>>>>>>>>>>             I did receive Ben Turner's follow-up to your reply).
>>>>>>>>>>>             So we tried to create a gluster volume under /home
>>>>>>>>>>>             using different variations of
>>>>>>>>>>>
>>>>>>>>>>>                 gluster volume create test-volume
>>>>>>>>>>>                 mseas-data2:/home/gbrick_test_1
>>>>>>>>>>>                 mseas-data2:/home/gbrick_test_2 transport tcp
>>>>>>>>>>>
>>>>>>>>>>>             However we keep getting errors of the form
>>>>>>>>>>>
>>>>>>>>>>>                 Wrong brick type: transport, use
>>>>>>>>>>>                 <HOSTNAME>:<export-dir-abs-path>
>>>>>>>>>>>
>>>>>>>>>>>             Any thoughts on what we're doing wrong?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>         You should give transport tcp at the beginning, I think.
>>>>>>>>>>>         Anyway, transport tcp is the default, so there is no need
>>>>>>>>>>>         to specify it; remove those two words from the CLI.
>>>>>>>>>>>
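
In other words, the create command would presumably just drop the trailing
"transport tcp" and look something like (a sketch, not a verified command):

    gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2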
>>>>>>>>>>>
>>>>>>>>>>>             Also, do you have a list of the tests we should be
>>>>>>>>>>>             running once we get this volume created?  Given the
>>>>>>>>>>>             time-zone difference it might help if we can run a
>>>>>>>>>>>             small battery of tests and post the results rather
>>>>>>>>>>>             than test-post-new test-post... .
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>         This is the first time I am doing performance analysis on
>>>>>>>>>>>         users as far as I remember.  In our team there are separate
>>>>>>>>>>>         engineers who do these tests.  Ben, who replied earlier, is
>>>>>>>>>>>         one such engineer.
>>>>>>>>>>>
>>>>>>>>>>>         Ben,
>>>>>>>>>>>             Have any suggestions?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 Thanks
>>>>>>>>>>>
>>>>>>>>>>>                 Pat
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>     On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>>     <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>         Hi Pranith,
>>>>>>>>>>>>
>>>>>>>>>>>>         The /home partition is mounted as ext4:
>>>>>>>>>>>>         /home ext4 defaults,usrquota,grpquota 1 2
>>>>>>>>>>>>
>>>>>>>>>>>>         The brick partitions are mounted as xfs:
>>>>>>>>>>>>         /mnt/brick1 xfs defaults 0 0
>>>>>>>>>>>>         /mnt/brick2 xfs defaults 0 0
>>>>>>>>>>>>
>>>>>>>>>>>>         Will this cause a problem with creating a volume under
>>>>>>>>>>>>         /home?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     I don't think the bottleneck is disk.  You can do the same
>>>>>>>>>>>>     tests you did on your new volume to confirm?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>         Pat
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>         On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>         On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>>>         <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>             Hi Pranith,
>>>>>>>>>>>>>
>>>>>>>>>>>>>             Unfortunately, we don't have similar hardware for a
>>>>>>>>>>>>>             small scale test.  All we have is our production
>>>>>>>>>>>>>             hardware.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>         You said something about the /home partition, which has
>>>>>>>>>>>>>         fewer disks; we can create a plain distribute volume
>>>>>>>>>>>>>         inside one of those directories.  After we are done, we
>>>>>>>>>>>>>         can remove the setup.  What do you say?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                         Pat
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>             On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>             On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>>>>             <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                 Hi Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                 Since we are mounting the partitions as the bricks, I
>>>>>>>>>>>>>>                 tried the dd test writing to
>>>>>>>>>>>>>>                 <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>                 The results without oflag=sync were 1.6 Gb/s (faster
>>>>>>>>>>>>>>                 than gluster but not as fast as I was expecting given
>>>>>>>>>>>>>>                 the 1.2 Gb/s to the no-gluster area w/ fewer disks).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             Okay, then 1.6 Gb/s is what we need to target for,
>>>>>>>>>>>>>>             considering your volume is just distribute.  Is there any
>>>>>>>>>>>>>>             way you can do tests on similar hardware but at a small
>>>>>>>>>>>>>>             scale?  Just so we can run the workload to learn more about
>>>>>>>>>>>>>>             the bottlenecks in the system?  We can probably try to get
>>>>>>>>>>>>>>             the speed to 1.2 Gb/s on your /home partition you were
>>>>>>>>>>>>>>             telling me yesterday.  Let me know if that is something you
>>>>>>>>>>>>>>             are okay to do.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                 Pat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Not entirely sure (this isn't my area of expertise).  I'll run your
>>>>>>>>>>>>>>>     answer by some other people who are more familiar with this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I am also uncertain about how to interpret the results when we also
>>>>>>>>>>>>>>>     add the dd tests writing to the /home area (no gluster, still on the
>>>>>>>>>>>>>>>     same machine):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       * dd test without oflag=sync (rough average of multiple tests)
>>>>>>>>>>>>>>>           o gluster w/ fuse mount: 570 Mb/s
>>>>>>>>>>>>>>>           o gluster w/ nfs mount: 390 Mb/s
>>>>>>>>>>>>>>>           o nfs (no gluster): 1.2 Gb/s
>>>>>>>>>>>>>>>       * dd test with oflag=sync (rough average of multiple tests)
>>>>>>>>>>>>>>>           o gluster w/ fuse mount: 5 Mb/s
>>>>>>>>>>>>>>>           o gluster w/ nfs mount: 200 Mb/s
>>>>>>>>>>>>>>>           o nfs (no gluster): 20 Mb/s
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>>>>>>>>>>>     brick of the gluster area is a RAID-6 of 32 disks, I would naively
>>>>>>>>>>>>>>>     expect the writes to the gluster area to be roughly 8x faster than
>>>>>>>>>>>>>>>     to the non-gluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a better test is to try and write to a file using nfs without
>>>>>>>>>>>>>>> any gluster, to a location that is not inside the brick but some other
>>>>>>>>>>>>>>> location that is on the same disk(s).  If you are mounting the
>>>>>>>>>>>>>>> partition as the brick, then we can write to a file inside the
>>>>>>>>>>>>>>> .glusterfs directory, something like
>>>>>>>>>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I still think we have a speed issue, I can't tell if fuse vs nfs is
>>>>>>>>>>>>>>>     part of the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I got interested in the post because I read that fuse speed is less
>>>>>>>>>>>>>>> than nfs speed, which is counter-intuitive to my understanding, so I
>>>>>>>>>>>>>>> wanted clarifications.  Now that I got my clarifications, where fuse
>>>>>>>>>>>>>>> outperformed nfs without sync, we can resume testing as described above
>>>>>>>>>>>>>>> and try to find what it is.  Based on your email-id I am guessing you
>>>>>>>>>>>>>>> are from Boston and I am from Bangalore, so if you are okay with doing
>>>>>>>>>>>>>>> this debugging over multiple days because of the timezones, I will be
>>>>>>>>>>>>>>> happy to help.  Please be a bit patient with me, I am under a release
>>>>>>>>>>>>>>> crunch but I am very curious about the problem you posted.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Was there anything useful in the profiles?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately the profiles didn't help me much; I think we are
>>>>>>>>>>>>>>> collecting the profiles from an active volume, so they have a lot of
>>>>>>>>>>>>>>> information that does not pertain to dd, and it is difficult to pick
>>>>>>>>>>>>>>> out dd's contribution.  So I went through your post again and found
>>>>>>>>>>>>>>> something I didn't pay much attention to earlier, i.e. oflag=sync, so I
>>>>>>>>>>>>>>> did my own tests on my setup with FUSE and sent that reply.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>> Okay, good.  At least this validates my doubts.  Handling O_SYNC in
>>>>>>>>>>>>>>>> gluster NFS and fuse is a bit different.  When an application opens a
>>>>>>>>>>>>>>>> file with O_SYNC on a fuse mount, each write syscall has to be written
>>>>>>>>>>>>>>>> to disk as part of the syscall, whereas in the case of NFS there is no
>>>>>>>>>>>>>>>> concept of open.  NFS performs the write through a handle saying it
>>>>>>>>>>>>>>>> needs to be a synchronous write, so the write() syscall is performed
>>>>>>>>>>>>>>>> first and then it performs fsync(); so a write on an fd with O_SYNC
>>>>>>>>>>>>>>>> becomes write+fsync.  I am suspecting that when multiple threads do
>>>>>>>>>>>>>>>> this write+fsync() operation on the same file, multiple writes are
>>>>>>>>>>>>>>>> batched together to be written to disk, so the throughput on the disk
>>>>>>>>>>>>>>>> increases, is my guess.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Without the oflag=sync and only a single test of each, the FUSE is
>>>>>>>>>>>>>>>>     going faster than NFS:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     FUSE:
>>>>>>>>>>>>>>>>     mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>     4096+0 records in
>>>>>>>>>>>>>>>>     4096+0 records out
>>>>>>>>>>>>>>>>     4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     NFS:
>>>>>>>>>>>>>>>>     mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>     4096+0 records in
>>>>>>>>>>>>>>>>     4096+0 records out
>>>>>>>>>>>>>>>>     4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>>>>>>>>>>>>> mounts?  No need to collect profiles.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu
>>>>>>>>>>>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Here is what I see now:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Volume Name: data-volume
>>>>>>>>>>>>>>>>>     Type: Distribute
>>>>>>>>>>>>>>>>>     Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>     Status: Started
>>>>>>>>>>>>>>>>>     Number of Bricks: 2
>>>>>>>>>>>>>>>>>     Transport-type: tcp
>>>>>>>>>>>>>>>>>     Bricks:
>>>>>>>>>>>>>>>>>     Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>     Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>     Options Reconfigured:
>>>>>>>>>>>>>>>>>     diagnostics.count-fop-hits: on
>>>>>>>>>>>>>>>>>     diagnostics.latency-measurement: on
>>>>>>>>>>>>>>>>>     nfs.exports-auth-enable: on
>>>>>>>>>>>>>>>>>     diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>>>>>>>>>     performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>     nfs.disable: on
>>>>>>>>>>>>>>>>>     nfs.export-volumes: off
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>>>     Is this the volume info you have?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>>>>>     > Volume Name: data-volume
>>>>>>>>>>>>>>>>>>     > Type: Distribute
>>>>>>>>>>>>>>>>>>     > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>>     > Status: Started
>>>>>>>>>>>>>>>>>>     > Number of Bricks: 2
>>>>>>>>>>>>>>>>>>     > Transport-type: tcp
>>>>>>>>>>>>>>>>>>     > Bricks:
>>>>>>>>>>>>>>>>>>     > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>>     > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>>     > Options Reconfigured:
>>>>>>>>>>>>>>>>>>     > performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>     > nfs.disable: on
>>>>>>>>>>>>>>>>>>     > nfs.export-volumes: off
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     I copied this from an old thread from 2016.  This is a distribute
>>>>>>>>>>>>>>>>>>     volume.  Did you change any of the options in between?
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
Hi, Today we experimented with some of the FUSE options that we found in the list. Changing these options had no effect: gluster volume set test-volume performance.cache-max-file-size 2MB gluster volume set test-volume performance.cache-refresh-timeout 4 gluster volume set test-volume performance.cache-size 256MB gluster volume set test-volume performance.write-behind-window-size 4MB gluster volume set test-volume performance.write-behind-window-size 8MB Changing the following option from its default value made the speed slower gluster volume set test-volume performance.write-behind off (on by default) Changing the following options initially appeared to give a 10% increase in speed, but this vanished in subsequent tests (we think the apparent increase may have been to a lighter workload on the computer from other users) gluster volume set test-volume performance.stat-prefetch on gluster volume set test-volume client.event-threads 4 gluster volume set test-volume server.event-threads 4 Can anything be gleaned from these observations? Are there other things we can try? Thanks Pat On 06/20/2017 12:06 PM, Pat Haley wrote:> > Hi Ben, > > Sorry this took so long, but we had a real-time forecasting exercise > last week and I could only get to this now. > > Backend Hardware/OS: > > * Much of the information on our back end system is included at the > top of > http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html > * The specific model of the hard disks is SeaGate ENTERPRISE > CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s. > * Note: there is one physical server that hosts both the NFS and the > GlusterFS areas > > Latest tests > > I have had time to run the tests for one of the dd tests you requested > to the underlying XFS FS. The median rate was 170 MB/s. The dd > results and iostat record are in > > http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/ > > I'll add tests for the other brick and to the NFS area later. > > Thanks > > Pat > > > On 06/12/2017 06:06 PM, Ben Turner wrote: >> Ok you are correct, you have a pure distributed volume. IE no replication overhead. So normally for pure dist I use: >> >> throughput = slowest of disks / NIC * .6-.7 >> >> In your case we have: >> >> 1200 * .6 = 720 >> >> So you are seeing a little less throughput than I would expect in your configuration. What I like to do here is: >> >> -First tell me more about your back end storage, will it sustain 1200 MB / sec? What kind of HW? How many disks? What type and specs are the disks? What kind of RAID are you using? >> >> -Second can you refresh me on your workload? Are you doing reads / writes or both? If both what mix? Since we are using DD I assume you are working iwth large file sequential I/O, is this correct? >> >> -Run some DD tests on the back end XFS FS. I normally have /xfs-mount/gluster-brick, if you have something similar just mkdir on the XFS -> /xfs-mount/my-test-dir. Inside the test dir run: >> >> If you are focusing on a write workload run: >> >> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync >> >> If you are focusing on a read workload run: >> >> # echo 3 > /proc/sys/vm/drop_caches >> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000 >> >> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! 
** >> >> Run this in a loop similar to how you did in: >> >> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt >> >> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time. While this is running gather iostat for me: >> >> # iostat -c -m -x 1 > iostat-$(hostname).txt >> >> Lets see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster. >> >> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks? I want to be sure we have an apples to apples comparison here. >> >> -b >> >> >> >> ----- Original Message ----- >>> From: "Pat Haley"<phaley at mit.edu> >>> To: "Ben Turner"<bturner at redhat.com> >>> Sent: Monday, June 12, 2017 5:18:07 PM >>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>> >>> >>> Hi Ben, >>> >>> Here is the output: >>> >>> [root at mseas-data2 ~]# gluster volume info >>> >>> Volume Name: data-volume >>> Type: Distribute >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18 >>> Status: Started >>> Number of Bricks: 2 >>> Transport-type: tcp >>> Bricks: >>> Brick1: mseas-data2:/mnt/brick1 >>> Brick2: mseas-data2:/mnt/brick2 >>> Options Reconfigured: >>> nfs.exports-auth-enable: on >>> diagnostics.brick-sys-log-level: WARNING >>> performance.readdir-ahead: on >>> nfs.disable: on >>> nfs.export-volumes: off >>> >>> >>> On 06/12/2017 05:01 PM, Ben Turner wrote: >>>> What is the output of gluster v info? That will tell us more about your >>>> config. >>>> >>>> -b >>>> >>>> ----- Original Message ----- >>>>> From: "Pat Haley"<phaley at mit.edu> >>>>> To: "Ben Turner"<bturner at redhat.com> >>>>> Sent: Monday, June 12, 2017 4:54:00 PM >>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>>> >>>>> >>>>> Hi Ben, >>>>> >>>>> I guess I'm confused about what you mean by replication. If I look at >>>>> the underlying bricks I only ever have a single copy of any file. It >>>>> either resides on one brick or the other (directories exist on both >>>>> bricks but not files). We are not using gluster for redundancy (or at >>>>> least that wasn't our intent). Is that what you meant by replication >>>>> or is it something else? >>>>> >>>>> Thanks >>>>> >>>>> Pat >>>>> >>>>> On 06/12/2017 04:28 PM, Ben Turner wrote: >>>>>> ----- Original Message ----- >>>>>>> From: "Pat Haley"<phaley at mit.edu> >>>>>>> To: "Ben Turner"<bturner at redhat.com>, "Pranith Kumar Karampuri" >>>>>>> <pkarampu at redhat.com> >>>>>>> Cc: "Ravishankar N"<ravishankar at redhat.com>,gluster-users at gluster.org, >>>>>>> "Steve Postma"<SPostma at ztechnet.com> >>>>>>> Sent: Monday, June 12, 2017 2:35:41 PM >>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>>>>> >>>>>>> >>>>>>> Hi Guys, >>>>>>> >>>>>>> I was wondering what our next steps should be to solve the slow write >>>>>>> times. >>>>>>> >>>>>>> Recently I was debugging a large code and writing a lot of output at >>>>>>> every time step. When I tried writing to our gluster disks, it was >>>>>>> taking over a day to do a single time step whereas if I had the same >>>>>>> program (same hardware, network) write to our nfs disk the time per >>>>>>> time-step was about 45 minutes. What we are shooting for here would be >>>>>>> to have similar times to either gluster of nfs. 
>>>>>> I can see in your test: >>>>>> >>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt >>>>>> >>>>>> You averaged ~600 MB / sec(expected for replica 2 with 10G, {~1200 MB / >>>>>> sec} / #replicas{2} = 600). Gluster does client side replication so with >>>>>> replica 2 you will only ever see 1/2 the speed of your slowest part of >>>>>> the >>>>>> stack(NW, disk, RAM, CPU). This is usually NW or disk and 600 is >>>>>> normally >>>>>> a best case. Now in your output I do see the instances where you went >>>>>> down to 200 MB / sec. I can only explain this in three ways: >>>>>> >>>>>> 1. You are not using conv=fdatasync and writes are actually going to >>>>>> page >>>>>> cache and then being flushed to disk. During the fsync the memory is not >>>>>> yet available and the disks are busy flushing dirty pages. >>>>>> 2. Your storage RAID group is shared across multiple LUNS(like in a SAN) >>>>>> and when write times are slow the RAID group is busy serviceing other >>>>>> LUNs. >>>>>> 3. Gluster bug / config issue / some other unknown unknown. >>>>>> >>>>>> So I see 2 issues here: >>>>>> >>>>>> 1. NFS does in 45 minutes what gluster can do in 24 hours. >>>>>> 2. Sometimes your throughput drops dramatically. >>>>>> >>>>>> WRT #1 - have a look at my estimates above. My formula for guestimating >>>>>> gluster perf is: throughput = NIC throughput or storage(whatever is >>>>>> slower) / # replicas * overhead(figure .7 or .8). Also the larger the >>>>>> record size the better for glusterfs mounts, I normally like to be at >>>>>> LEAST 64k up to 1024k: >>>>>> >>>>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 >>>>>> conv=fdatasync >>>>>> >>>>>> WRT #2 - Again, I question your testing and your storage config. Try >>>>>> using >>>>>> conv=fdatasync for your DDs, use a larger record size, and make sure that >>>>>> your back end storage is not causing your slowdowns. Also remember that >>>>>> with replica 2 you will take ~50% hit on writes because the client uses >>>>>> 50% of its bandwidth to write to one replica and 50% to the other. >>>>>> >>>>>> -b >>>>>> >>>>>> >>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Pat >>>>>>> >>>>>>> >>>>>>> On 06/02/2017 01:07 AM, Ben Turner wrote: >>>>>>>> Are you sure using conv=sync is what you want? I normally use >>>>>>>> conv=fdatasync, I'll look up the difference between the two and see if >>>>>>>> it >>>>>>>> affects your test. >>>>>>>> >>>>>>>> >>>>>>>> -b >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>>> From: "Pat Haley"<phaley at mit.edu> >>>>>>>>> To: "Pranith Kumar Karampuri"<pkarampu at redhat.com> >>>>>>>>> Cc: "Ravishankar N"<ravishankar at redhat.com>, >>>>>>>>> gluster-users at gluster.org, >>>>>>>>> "Steve Postma"<SPostma at ztechnet.com>, "Ben >>>>>>>>> Turner"<bturner at redhat.com> >>>>>>>>> Sent: Tuesday, May 30, 2017 9:40:34 PM >>>>>>>>> Subject: Re: [Gluster-users] Slow write times to gluster disk >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Pranith, >>>>>>>>> >>>>>>>>> The "dd" command was: >>>>>>>>> >>>>>>>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync >>>>>>>>> >>>>>>>>> There were 2 instances where dd reported 22 seconds. The output from >>>>>>>>> the >>>>>>>>> dd tests are in >>>>>>>>> >>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt >>>>>>>>> >>>>>>>>> Pat >>>>>>>>> >>>>>>>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote: >>>>>>>>>> Pat, >>>>>>>>>> What is the command you used? 
As per the following output, >>>>>>>>>> it >>>>>>>>>> seems like at least one write operation took 16 seconds. Which is >>>>>>>>>> really bad. >>>>>>>>>> 96.39 1165.10 us 89.00 us*16487014.00 us* >>>>>>>>>> 393212 >>>>>>>>>> WRITE >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu >>>>>>>>>> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Pranith, >>>>>>>>>> >>>>>>>>>> I ran the same 'dd' test both in the gluster test volume and >>>>>>>>>> in >>>>>>>>>> the .glusterfs directory of each brick. The median results >>>>>>>>>> (12 >>>>>>>>>> dd >>>>>>>>>> trials in each test) are similar to before >>>>>>>>>> >>>>>>>>>> * gluster test volume: 586.5 MB/s >>>>>>>>>> * bricks (in .glusterfs): 1.4 GB/s >>>>>>>>>> >>>>>>>>>> The profile for the gluster test-volume is in >>>>>>>>>> >>>>>>>>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt >>>>>>>>>> <http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt> >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> Pat >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote: >>>>>>>>>>> Let's start with the same 'dd' test we were testing with to >>>>>>>>>>> see, >>>>>>>>>>> what the numbers are. Please provide profile numbers for the >>>>>>>>>>> same. From there on we will start tuning the volume to see >>>>>>>>>>> what >>>>>>>>>>> we can do. >>>>>>>>>>> >>>>>>>>>>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu >>>>>>>>>>> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Pranith, >>>>>>>>>>> >>>>>>>>>>> Thanks for the tip. We now have the gluster volume >>>>>>>>>>> mounted >>>>>>>>>>> under /home. What tests do you recommend we run? >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Pat >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote: >>>>>>>>>>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley >>>>>>>>>>>> <phaley at mit.edu >>>>>>>>>>>> <mailto:phaley at mit.edu>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Pranith, >>>>>>>>>>>> >>>>>>>>>>>> Sorry for the delay. I never saw received your >>>>>>>>>>>> reply >>>>>>>>>>>> (but I did receive Ben Turner's follow-up to your >>>>>>>>>>>> reply). So we tried to create a gluster volume >>>>>>>>>>>> under >>>>>>>>>>>> /home using different variations of >>>>>>>>>>>> >>>>>>>>>>>> gluster volume create test-volume >>>>>>>>>>>> mseas-data2:/home/gbrick_test_1 >>>>>>>>>>>> mseas-data2:/home/gbrick_test_2 transport tcp >>>>>>>>>>>> >>>>>>>>>>>> However we keep getting errors of the form >>>>>>>>>>>> >>>>>>>>>>>> Wrong brick type: transport, use >>>>>>>>>>>> <HOSTNAME>:<export-dir-abs-path> >>>>>>>>>>>> >>>>>>>>>>>> Any thoughts on what we're doing wrong? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> You should give transport tcp at the beginning I think. >>>>>>>>>>>> Anyways, transport tcp is the default, so no need to >>>>>>>>>>>> specify >>>>>>>>>>>> so remove those two words from the CLI. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Also do you have a list of the test we should be >>>>>>>>>>>> running >>>>>>>>>>>> once we get this volume created? Given the >>>>>>>>>>>> time-zone >>>>>>>>>>>> difference it might help if we can run a small >>>>>>>>>>>> battery >>>>>>>>>>>> of tests and post the results rather than >>>>>>>>>>>> test-post-new >>>>>>>>>>>> test-post... . 
>>>>>>>>>>>>
>>>>>>>>>>>> This is the first time I am doing performance analysis for users, as far as I remember. In our team there are separate engineers who do these tests. Ben, who replied earlier, is one such engineer.
>>>>>>>>>>>>
>>>>>>>>>>>> Ben,
>>>>>>>>>>>> Have any suggestions?
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>     Pat
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     The /home partition is mounted as ext4:
>>>>>>>>>>>>>     /home  ext4  defaults,usrquota,grpquota  1 2
>>>>>>>>>>>>>
>>>>>>>>>>>>>     The brick partitions are mounted as xfs:
>>>>>>>>>>>>>     /mnt/brick1  xfs  defaults  0 0
>>>>>>>>>>>>>     /mnt/brick2  xfs  defaults  0 0
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Will this cause a problem with creating a volume under /home?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think the bottleneck is disk. You can do the same tests you did on your new volume to confirm?
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Unfortunately, we don't have similar hardware for a small-scale test. All we have is our production hardware.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You said something about the /home partition which has fewer disks; we can create a plain distribute volume inside one of those directories. After we are done, we can remove the setup. What do you say?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Since we are mounting the partitions as the bricks, I tried the dd test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The results without oflag=sync were 1.6 Gb/s (faster than gluster but not as fast as I was expecting given the 1.2 Gb/s to the no-gluster area w/ fewer disks).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Okay, then 1.6 Gb/s is what we need to target for, considering your volume is just distribute. Is there any way you can do tests on similar hardware but at a small scale?
>>>>>>>>>>>>>>> Just so we can run the workload to learn more about the bottlenecks in the system? We can probably try to get the speed to 1.2 Gb/s on the /home partition you were telling me about yesterday. Let me know if that is something you are okay to do.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Hi Pranith,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Not entirely sure (this isn't my area of expertise). I'll run your answer by some other people who are more familiar with this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     I am also uncertain about how to interpret the results when we also add the dd tests writing to the /home area (no gluster, still on the same machine):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       * dd test without oflag=sync (rough average of multiple tests)
>>>>>>>>>>>>>>>>           o gluster w/ fuse mount: 570 Mb/s
>>>>>>>>>>>>>>>>           o gluster w/ nfs mount: 390 Mb/s
>>>>>>>>>>>>>>>>           o nfs (no gluster): 1.2 Gb/s
>>>>>>>>>>>>>>>>       * dd test with oflag=sync (rough average of multiple tests)
>>>>>>>>>>>>>>>>           o gluster w/ fuse mount: 5 Mb/s
>>>>>>>>>>>>>>>>           o gluster w/ nfs mount: 200 Mb/s
>>>>>>>>>>>>>>>>           o nfs (no gluster): 20 Mb/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Given that the non-gluster area is a RAID-6 of 4 disks while each brick of the gluster area is a RAID-6 of 32 disks, I would naively expect the writes to the gluster area to be roughly 8x faster than to the non-gluster area.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think a better test is to try and write to a file using nfs without any gluster to a location that is not inside the brick but some other location that is on the same disk(s). If you are mounting the partition as the brick, then we can write to a file inside the .glusterfs directory, something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     I still think we have a speed issue; I can't tell if fuse vs nfs is part of the problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I got interested in the post because I read that fuse speed is lesser than nfs speed, which is counter-intuitive to my understanding, so I wanted clarifications. Now that I got my clarifications, where fuse outperformed nfs without sync, we can resume testing as described above and try to find what it is. Based on your email-id I am guessing you are from Boston and I am from Bangalore, so if you are okay with doing this debugging for multiple days because of timezones, I will be happy to help. Please be a bit patient with me; I am under a release crunch, but I am very curious about the problem you posted.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Was there anything useful in the profiles?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately the profiles didn't help me much. I think we are collecting the profiles from an active volume, so there is a lot of information that is not pertaining to dd and it is difficult to find the contributions of dd. So I went through your post again and found something I didn't pay much attention to earlier, i.e. oflag=sync, so I did my own tests on my setup with FUSE and sent that reply.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Pat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in gluster NFS and fuse is a bit different. When an application opens a file with O_SYNC on a fuse mount, each write syscall has to be written to disk as part of the syscall, whereas in the case of NFS there is no concept of open. NFS performs the write through a handle saying it needs to be a synchronous write, so the write() syscall is performed first and then it performs fsync(); so a write on an fd with O_SYNC becomes write+fsync. I suspect that when multiple threads do this write+fsync() operation on the same file, multiple writes are batched together to be written to disk, so the throughput on the disk increases, is my guess.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does it answer your doubts?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Without the oflag=sync and only a single test of each, the FUSE is going faster than NFS:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     FUSE:
>>>>>>>>>>>>>>>>>     mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>>     4096+0 records in
>>>>>>>>>>>>>>>>>     4096+0 records out
>>>>>>>>>>>>>>>>>     4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     NFS:
>>>>>>>>>>>>>>>>>     mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>>>>>>>>>>     4096+0 records in
>>>>>>>>>>>>>>>>>     4096+0 records out
>>>>>>>>>>>>>>>>>     4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>>> Could you let me know the speed without oflag=sync on both the mounts? No need to collect profiles.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     Here is what I see now:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     Volume Name: data-volume
>>>>>>>>>>>>>>>>>>     Type: Distribute
>>>>>>>>>>>>>>>>>>     Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>>     Status: Started
>>>>>>>>>>>>>>>>>>     Number of Bricks: 2
>>>>>>>>>>>>>>>>>>     Transport-type: tcp
>>>>>>>>>>>>>>>>>>     Bricks:
>>>>>>>>>>>>>>>>>>     Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>>     Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>>     Options Reconfigured:
>>>>>>>>>>>>>>>>>>     diagnostics.count-fop-hits: on
>>>>>>>>>>>>>>>>>>     diagnostics.latency-measurement: on
>>>>>>>>>>>>>>>>>>     nfs.exports-auth-enable: on
>>>>>>>>>>>>>>>>>>     diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>>>>>>>>>>     performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>     nfs.disable: on
>>>>>>>>>>>>>>>>>>     nfs.export-volumes: off
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>>>> Is this the volume info you have?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>>>>>>>>>>> Type: Distribute
>>>>>>>>>>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>>>>>>>>>>> Status: Started
>>>>>>>>>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I copied this from the old thread from 2016. This is a distribute volume. Did you change any of the options in between?

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
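A note on the dd variants used throughout this thread: conv=sync only pads short input blocks with zeros and does not force anything to disk, oflag=sync opens the output file with O_SYNC so every single write is synchronous, and conv=fdatasync issues one fdatasync() at the end so the flush is included in the measured time, which is why conv=fdatasync is the variant Ben asks for. Below is a minimal sketch of the repeated write test plus iostat capture described above; the target directory, run count, and file names are placeholders, and the script itself is not part of the original exchange:

  #!/bin/bash
  # Run the 4 GB write test RUNS times against TARGET and record iostat
  # alongside it. TARGET can be the FUSE mount, the NFS mount, or a brick's
  # .glusterfs directory; both values below are placeholders.
  TARGET=${1:-/gluster-mount/test-dir}
  RUNS=${2:-12}

  iostat -c -m -x 1 > "iostat-$(hostname).txt" &   # background iostat, 1 s samples
  IOSTAT_PID=$!

  for i in $(seq 1 "$RUNS"); do
      echo 3 > /proc/sys/vm/drop_caches             # drop page cache between runs
      dd if=/dev/zero of="$TARGET/ddfile.$i" bs=1024k count=4096 conv=fdatasync 2>&1 | tail -n 1
      rm -f "$TARGET/ddfile.$i"
  done

  kill "$IOSTAT_PID"

Running the same loop once per mount type keeps the medians comparable across FUSE, NFS, and the raw XFS bricks.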
Pranith Kumar Karampuri
2017-Jun-23  03:40 UTC
[Gluster-users] Slow write times to gluster disk
On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:

> Hi,
>
> Today we experimented with some of the FUSE options that we found in the list.
>
> Changing these options had no effect:
>
> gluster volume set test-volume performance.cache-max-file-size 2MB
> gluster volume set test-volume performance.cache-refresh-timeout 4
> gluster volume set test-volume performance.cache-size 256MB
> gluster volume set test-volume performance.write-behind-window-size 4MB
> gluster volume set test-volume performance.write-behind-window-size 8MB

This is a good coincidence. I am meeting with the write-behind maintainer (+Raghavendra G) today about the same doubt. I think we will have something by EOD IST. I will update you.

> Changing the following option from its default value made the speed slower:
>
> gluster volume set test-volume performance.write-behind off (on by default)
>
> Changing the following options initially appeared to give a 10% increase in speed, but this vanished in subsequent tests (we think the apparent increase may have been due to a lighter workload on the computer from other users):
>
> gluster volume set test-volume performance.stat-prefetch on
> gluster volume set test-volume client.event-threads 4
> gluster volume set test-volume server.event-threads 4
>
> Can anything be gleaned from these observations? Are there other things we can try?
>
> Thanks
>
> Pat
--
Pranith
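The option-by-option experiments above are hard to compare because each run also sees whatever else the server is doing, which is probably why the apparent 10% gain vanished. One way to tighten this up is to change a single option at a time, profile only that run, and reset the option before moving on. A rough sketch along those lines, assuming the test-volume name from Pat's mail, a placeholder FUSE mount point, and the option values Pat listed (again, not part of the original exchange):

  #!/bin/bash
  # Toggle one volume option at a time, repeat the dd write test on the FUSE
  # mount, save a per-option profile, then reset the option to its default.
  VOL=test-volume
  MNT=/gluster-mount            # assumed FUSE mount point of $VOL

  declare -A OPTS=(
    [performance.cache-size]=256MB
    [performance.write-behind-window-size]=4MB
    [performance.stat-prefetch]=on
    [client.event-threads]=4
    [server.event-threads]=4
  )

  for opt in "${!OPTS[@]}"; do
      gluster volume set "$VOL" "$opt" "${OPTS[$opt]}"
      gluster volume profile "$VOL" start
      for i in 1 2 3; do
          dd if=/dev/zero of="$MNT/ab-test.$i" bs=1024k count=4096 conv=fdatasync 2>&1 | tail -n 1
          rm -f "$MNT/ab-test.$i"
      done
      gluster volume profile "$VOL" info > "profile-$opt.txt"
      gluster volume profile "$VOL" stop
      gluster volume reset "$VOL" "$opt"    # back to the default before the next option
  done

Keeping each run's profile in its own file makes it easier to attribute a change in WRITE latency to a single option rather than to background load on the server.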