Pranith Kumar Karampuri
2017-May-31 01:54 UTC
[Gluster-users] Slow write times to gluster disk
Thanks, this is good information.
+Soumya
Soumya,
We are trying to find why kNFS is performing way better than plain
distribute glusterfs+fuse. What information do you think would help us
compare the operations with kNFS vs gluster+fuse? We already have profile
output from fuse.
On Wed, May 31, 2017 at 7:10 AM, Pat Haley <phaley at mit.edu> wrote:
>
> Hi Pranith,
>
> The "dd" command was:
>
> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>
> There were 2 instances where dd reported 22 seconds. The output from the
> dd tests are in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Pat
>
>
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>
> Pat,
> What is the command you used? As per the following output, it seems
> like at least one write operation took 16 seconds, which is really bad.
>
>  96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
>
>
>
> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> I ran the same 'dd' test both in the gluster test volume and in the
>> .glusterfs directory of each brick. The median results (12 dd trials in
>> each test) are similar to before
>>
>> - gluster test volume: 586.5 MB/s
>> - bricks (in .glusterfs): 1.4 GB/s
>>
>> The profile for the gluster test-volume is in
>>
>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>>
>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>
>> Let's start with the same 'dd' test we were testing with, to see what the
>> numbers are. Please provide profile numbers for the same. From there on we
>> will start tuning the volume to see what we can do.
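>>
>> A minimal sketch of collecting those profile numbers (the volume name is
>> assumed from the create command earlier in the thread):
>>
>>     gluster volume profile test-volume start
>>     # ... run the dd test on the mount ...
>>     gluster volume profile test-volume info    # per-fop latency, cumulative + interval
>>     gluster volume profile test-volume stop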
>>
>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Thanks for the tip. We now have the gluster volume mounted under
>>> /home. What tests do you recommend we run?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Sorry for the delay. I never received your reply (but I did receive
>>>> Ben Turner's follow-up to it). So we tried to create a gluster volume
>>>> under /home using different variations of
>>>>
>>>> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
>>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>>
>>>> However we keep getting errors of the form
>>>>
>>>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>>>
>>>> Any thoughts on what we're doing wrong?
>>>>
>>>
>>> You should give transport tcp at the beginning, I think. Anyway,
>>> transport tcp is the default, so there is no need to specify it; just
>>> remove those two words from the CLI.
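>>>
>>> A minimal sketch of the corrected command (either drop those two words or
>>> put the transport option before the brick list):
>>>
>>>     gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2
>>>     # or, specifying the transport explicitly:
>>>     gluster volume create test-volume transport tcp mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2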
>>>
>>>>
>>>> Also, do you have a list of the tests we should be running once we get
>>>> this volume created? Given the time-zone difference, it might help if we
>>>> can run a small battery of tests and post the results, rather than
>>>> test-post-new-test-post... .
>>>>
>>>
>>> This is the first time I am doing performance analysis for users, as far
>>> as I remember. In our team there are separate engineers who do these tests.
>>> Ben, who replied earlier, is one such engineer.
>>>
>>> Ben,
>>> Have any suggestions?
>>>
>>>
>>>>
>>>> Thanks
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> The /home partition is mounted as ext4
>>>>>     /home    ext4    defaults,usrquota,grpquota    1 2
>>>>>
>>>>> The brick partitions are mounted as xfs
>>>>>     /mnt/brick1    xfs    defaults    0 0
>>>>>     /mnt/brick2    xfs    defaults    0 0
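>>>>>
>>>>> For completeness, full fstab lines also carry the device field; a sketch
>>>>> with placeholder device names:
>>>>>
>>>>>     /dev/VG/home    /home          ext4    defaults,usrquota,grpquota    1 2
>>>>>     /dev/sdb1       /mnt/brick1    xfs     defaults                      0 0
>>>>>     /dev/sdc1       /mnt/brick2    xfs     defaults                      0 0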
>>>>>
>>>>> Will this cause a problem with creating a volume under /home?
>>>>>
>>>>
>>>> I don't think the bottleneck is the disk. Can you do the same tests you
>>>> did before on your new volume to confirm?
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Unfortunately, we don't have similar hardware for a small-scale
>>>>>> test. All we have is our production hardware.
>>>>>>
>>>>>
>>>>> You said something about the /home partition having fewer disks; we
>>>>> can create a plain distribute volume inside one of those directories.
>>>>> After we are done, we can remove the setup. What do you say?
>>>>>
>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Since we are mounting the partitions as the bricks, I tried the dd
>>>>>>> test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster, but
>>>>>>> not as fast as I was expecting given the 1.2 Gb/s to the no-gluster
>>>>>>> area w/ fewer disks).
>>>>>>>
>>>>>>
>>>>>> Okay, then 1.6 Gb/s is what we need to target, considering your
>>>>>> volume is just distribute. Is there any way you can do tests on similar
>>>>>> hardware but at a small scale, just so we can run the workload and learn
>>>>>> more about the bottlenecks in the system? We can probably try to get the
>>>>>> speed to 1.2 Gb/s on the /home partition you were telling me about
>>>>>> yesterday. Let me know if that is something you are okay with doing.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pranith,
>>>>>>>>
>>>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>>>>> answer by some other people who are more familiar with this.
>>>>>>>>
>>>>>>>> I am also uncertain about how to interpret the results when we also
>>>>>>>> add the dd tests writing to the /home area (no gluster, still on the
>>>>>>>> same machine):
>>>>>>>>
>>>>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>>>>>     - gluster w/ fuse mount: 570 Mb/s
>>>>>>>>     - gluster w/ nfs mount: 390 Mb/s
>>>>>>>>     - nfs (no gluster): 1.2 Gb/s
>>>>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>>>>>     - gluster w/ fuse mount: 5 Mb/s
>>>>>>>>     - gluster w/ nfs mount: 200 Mb/s
>>>>>>>>     - nfs (no gluster): 20 Mb/s
>>>>>>>>
>>>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively
>>>>>>>> expect the writes to the gluster area to be roughly 8x faster than to
>>>>>>>> the non-gluster area.
>>>>>>>>
>>>>>>>
>>>>>>> I think a better test is to try to write to a file using nfs
>>>>>>> without any gluster, to a location that is not inside the brick but
>>>>>>> some other location on the same disk(s). If you are mounting the
>>>>>>> partition as the brick, then we can write to a file inside the
>>>>>>> .glusterfs directory, something like
>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
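>>>>>>>
>>>>>>> A minimal sketch of that test, assuming the brick is mounted at
>>>>>>> /mnt/brick1 as in the volume info (the file name is a placeholder):
>>>>>>>
>>>>>>>     dd if=/dev/zero of=/mnt/brick1/.glusterfs/ddtest.tmp bs=1048576 count=4096 conv=sync
>>>>>>>     rm -f /mnt/brick1/.glusterfs/ddtest.tmp    # remove the test file afterwards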
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I still think we have a speed issue; I can't tell if fuse vs nfs is
>>>>>>>> part of the problem.
>>>>>>>>
>>>>>>>
>>>>>>> I got interested in the post because I read that fuse speed is
>>>>>>> lower than nfs speed, which is counter-intuitive to my understanding,
>>>>>>> so I wanted clarification. Now that I have my clarification, with fuse
>>>>>>> outperforming nfs without sync, we can resume testing as described
>>>>>>> above and try to find what it is. Based on your email-id I am guessing
>>>>>>> you are in Boston and I am in Bangalore, so if you are okay with doing
>>>>>>> this debugging over multiple days because of the timezones, I will be
>>>>>>> happy to help. Please be a bit patient with me; I am under a release
>>>>>>> crunch, but I am very curious about the problem you posted.
>>>>>>>
>>>>>>>> Was there anything useful in the profiles?
>>>>>>>
>>>>>>> Unfortunately the profiles didn't help me much. I think we are
>>>>>>> collecting the profiles from an active volume, so they contain a lot
>>>>>>> of information that does not pertain to dd, which makes it difficult
>>>>>>> to isolate dd's contributions. So I went through your post again and
>>>>>>> found something I hadn't paid much attention to earlier, i.e.
>>>>>>> oflag=sync, so I did my own tests on my setup with FUSE and sent that
>>>>>>> reply.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Okay good. At least this validates my doubts. Handling O_SYNC in
>>>>>>>> gluster NFS and fuse is a bit different.
>>>>>>>> When an application opens a file with O_SYNC on a fuse mount, each
>>>>>>>> write syscall has to be written to disk as part of the syscall,
>>>>>>>> whereas in the case of NFS there is no concept of open. NFS performs
>>>>>>>> the write through a handle, saying it needs to be a synchronous
>>>>>>>> write, so the write() syscall is performed first and then an fsync()
>>>>>>>> is performed; a write on an fd with O_SYNC thus becomes write+fsync.
>>>>>>>> My guess is that when multiple threads do this write+fsync()
>>>>>>>> operation on the same file, multiple writes are batched together to
>>>>>>>> be written to disk, so the throughput on the disk increases.
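>>>>>>>>
>>>>>>>> A rough way to see this difference from the command line (just a
>>>>>>>> sketch; dd's oflag=sync opens the output file with O_SYNC so every
>>>>>>>> write is synchronous, while conv=fsync writes everything and issues
>>>>>>>> a single fsync() at the end):
>>>>>>>>
>>>>>>>>     dd if=/dev/zero of=zeros.txt bs=1048576 count=4096 oflag=sync   # O_SYNC on each write
>>>>>>>>     dd if=/dev/zero of=zeros.txt bs=1048576 count=4096 conv=fsync   # one fsync at the end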
>>>>>>>>
>>>>>>>> Does it answer your doubts?
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>>>>> going faster than NFS:
>>>>>>>>>
>>>>>>>>> FUSE:
>>>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NFS:
>>>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>>> 4096+0 records in
>>>>>>>>> 4096+0 records out
>>>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>>>>> mounts? No need to collect profiles.
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is what I see now:
>>>>>>>>>>
>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>
>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>
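>>>>>>>>>> For reference, the diagnostics options shown above can be toggled
>>>>>>>>>> per volume with volume set (a sketch):
>>>>>>>>>>
>>>>>>>>>>     gluster volume set data-volume diagnostics.latency-measurement on
>>>>>>>>>>     gluster volume set data-volume diagnostics.count-fop-hits on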
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar
Karampuri wrote:
>>>>>>>>>>
>>>>>>>>>> Is this the volume info you have?
>>>>>>>>>>
>>>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>> >
>>>>>>>>>> > Volume Name: data-volume
>>>>>>>>>> > Type: Distribute
>>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> > Status: Started
>>>>>>>>>> > Number of Bricks: 2
>>>>>>>>>> > Transport-type: tcp
>>>>>>>>>> > Bricks:
>>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> > Options Reconfigured:
>>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>>> > nfs.disable: on
>>>>>>>>>> > nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>> I copied this from an old thread from 2016. This is a distribute
>>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley Email: phaley at mit.edu
> Center for Ocean Engineering Phone: (617) 253-6824
> Dept. of Mechanical Engineering Fax: (617) 253-8125
> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>
--
Pranith
Soumya
2017-May-31
[Gluster-users] Slow write times to gluster disk

On 05/31/2017 07:24 AM, Pranith Kumar Karampuri wrote:
> Soumya,
>      We are trying to find why kNFS is performing way better than plain
> distribute glusterfs+fuse. What information do you think would help us
> compare the operations with kNFS vs gluster+fuse? We already have profile
> output from fuse.

Could be because all the operations done by kNFS are local to the system.
The operations done by the FUSE mount over the network could be more in
number and more time-consuming than the ones sent by the NFS client.

We could compare and examine the pattern from tcpdump taken over the fuse
mount and the NFS mount. Also, nfsstat [1] may give some clue.

Sorry, I hadn't followed this mail from the beginning. But is this
comparison between a single-brick volume and kNFS exporting that brick?
Otherwise it's not a fair comparison if the volume is replicated or
distributed.

Thanks,
Soumya

[1] https://linux.die.net/man/8/nfsstat
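
A minimal sketch of such a capture and the nfsstat counters (the interface
name and the gluster port range are assumptions; adjust for the actual setup):

    # traffic from the FUSE mount: glusterd management plus brick ports
    tcpdump -i eth0 -w fuse-mount.pcap port 24007 or portrange 49152-49251
    # traffic from the NFS mount
    tcpdump -i eth0 -w nfs-mount.pcap port 2049

    nfsstat -c    # client-side RPC/NFS call counts
    nfsstat -s    # server-side counts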