Pranith Kumar Karampuri
2017-May-31 01:27 UTC
[Gluster-users] Slow write times to gluster disk
Pat,
What is the command you used? As per the following output, it seems
like at least one write operation took 16 seconds, which is really bad.

     96.39     1165.10 us      89.00 us   *16487014.00 us*   393212        WRITE
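For context, latency tables like the one above come from Gluster's built-in volume profiling. A hedged sketch of how such numbers are typically gathered (the volume name is taken from later in the thread, and `/gluster-mount` is a placeholder for wherever the volume is mounted; this requires a running gluster cluster and is not verified here):

```shell
# Enable profiling, run the workload, then dump per-brick FOP latency stats.
# Each row of the info output is: %-latency, avg, min, max latency, calls, fop.
gluster volume profile test-volume start
dd if=/dev/zero of=/gluster-mount/zeros.txt bs=1048576 count=4096 conv=sync
gluster volume profile test-volume info
gluster volume profile test-volume stop
```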
On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
>
> Hi Pranith,
>
> I ran the same 'dd' test both in the gluster test volume and in the
> .glusterfs directory of each brick. The median results (12 dd trials in
> each test) are similar to before
>
> - gluster test volume: 586.5 MB/s
> - bricks (in .glusterfs): 1.4 GB/s
>
> The profile for the gluster test-volume is in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>
> Thanks
>
> Pat
>
>
>
>
> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>
> Let's start with the same 'dd' test we were testing with to see what the
> numbers are. Please provide profile numbers for the same. From there on we
> will start tuning the volume to see what we can do.
>
> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
>
>>
>> Hi Pranith,
>>
>> Thanks for the tip. We now have the gluster volume mounted under
/home.
>> What tests do you recommend we run?
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
>>
>>>
>>> Hi Pranith,
>>>
>>> Sorry for the delay. I never received your reply (but I did receive
>>> Ben Turner's follow-up to it). So we tried to create a gluster
>>> volume under /home using different variations of
>>>
>>> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>
>>> However we keep getting errors of the form
>>>
>>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>>
>>> Any thoughts on what we're doing wrong?
>>>
>>
>> I think you should give 'transport tcp' before the brick list. In any
>> case, transport tcp is the default, so there is no need to specify it;
>> just remove those two words from the command.
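A hedged sketch of the corrected command, using the brick paths from the thread. Since tcp is the default transport, the `transport tcp` words are simply dropped (this must run on the gluster server and is not verified here):

```shell
# Plain distribute volume over the two brick directories under /home.
gluster volume create test-volume \
    mseas-data2:/home/gbrick_test_1 \
    mseas-data2:/home/gbrick_test_2
gluster volume start test-volume
```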
>>
>>>
>>> Also, do you have a list of the tests we should be running once we get
>>> this volume created? Given the time-zone difference it might help if we
>>> can run a small battery of tests and post the results rather than
>>> test-post-new-test-post... .
>>>
>>
>> This is the first time I am doing performance analysis with users, as far
>> as I remember. In our team there are separate engineers who do these
>> tests; Ben, who replied earlier, is one such engineer.
>>
>> Ben,
>> Have any suggestions?
>>
>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> The /home partition is mounted as ext4
>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>
>>>> The brick partitions are mounted as xfs
>>>> /mnt/brick1 xfs defaults 0 0
>>>> /mnt/brick2 xfs defaults 0 0
>>>>
>>>> Will this cause a problem with creating a volume under /home?
>>>>
>>>
>>> I don't think the bottleneck is the disk. Could you do the same tests
>>> you did on your new volume to confirm?
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Unfortunately, we don't have similar hardware for a small-scale test.
>>>>> All we have is our production hardware.
>>>>>
>>>>
>>>> You said something about the /home partition, which has fewer disks; we
>>>> can create a plain distribute volume inside one of those directories.
>>>> After we are done, we can remove the setup. What do you say?
>>>>
>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Since we are mounting the partitions as the bricks, I tried the dd
>>>>>> test writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>> The results without oflag=sync were 1.6 Gb/s (faster than gluster but
>>>>>> not as fast as I was expecting given the 1.2 Gb/s to the no-gluster
>>>>>> area w/ fewer disks).
>>>>>>
>>>>>
>>>>> Okay, then 1.6 Gb/s is what we need to target, considering your
>>>>> volume is just distribute. Is there any way you can do tests on similar
>>>>> hardware but at a small scale, just so we can run the workload and learn
>>>>> more about the bottlenecks in the system? We can probably try to get the
>>>>> speed to 1.2 Gb/s on the /home partition you were telling me about
>>>>> yesterday. Let me know if that is something you are okay to do.
>>>>>
>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Not entirely sure (this isn't my area of expertise). I'll run your
>>>>>>> answer by some other people who are more familiar with this.
>>>>>>>
>>>>>>> I am also uncertain about how to interpret the results when we also
>>>>>>> add the dd tests writing to the /home area (no gluster, still on the
>>>>>>> same machine):
>>>>>>>
>>>>>>> - dd test without oflag=sync (rough average of multiple tests)
>>>>>>>   - gluster w/ fuse mount: 570 Mb/s
>>>>>>>   - gluster w/ nfs mount: 390 Mb/s
>>>>>>>   - nfs (no gluster): 1.2 Gb/s
>>>>>>> - dd test with oflag=sync (rough average of multiple tests)
>>>>>>>   - gluster w/ fuse mount: 5 Mb/s
>>>>>>>   - gluster w/ nfs mount: 200 Mb/s
>>>>>>>   - nfs (no gluster): 20 Mb/s
>>>>>>>
>>>>>>> Given that the non-gluster area is a RAID-6 of 4 disks while each
>>>>>>> brick of the gluster area is a RAID-6 of 32 disks, I would naively
>>>>>>> expect the writes to the gluster area to be roughly 8x faster than to
>>>>>>> the non-gluster area.
>>>>>>>
>>>>>>
>>>>>> I think a better test is to write a file over nfs, without any
>>>>>> gluster, to a location that is not inside the brick but some other
>>>>>> location on the same disk(s). If you are mounting the partition as the
>>>>>> brick, then we can write to a file inside the .glusterfs directory,
>>>>>> something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>
>>>>>>
>>>>>>> I still think we have a speed issue; I can't tell if fuse vs nfs is
>>>>>>> part of the problem.
>>>>>>>
>>>>>>
>>>>>> I got interested in the post because I read that fuse speed is lower
>>>>>> than nfs speed, which is counter-intuitive to my understanding, so I
>>>>>> wanted clarifications. Now that I have my clarifications, where fuse
>>>>>> outperformed nfs without sync, we can resume testing as described above
>>>>>> and try to find what it is. Based on your email-id I am guessing you
>>>>>> are in Boston and I am in Bangalore, so if you are okay with doing this
>>>>>> debugging over multiple days because of the timezones, I will be happy
>>>>>> to help. Please be a bit patient with me; I am under a release crunch,
>>>>>> but I am very curious about the problem you posted.
>>>>>>
>>>>>>> Was there anything useful in the profiles?
>>>>>>>
>>>>>>
>>>>>> Unfortunately, the profiles didn't help me much. I think we are
>>>>>> collecting them from an active volume, so they contain a lot of
>>>>>> information that does not pertain to dd, which makes it difficult to
>>>>>> isolate dd's contribution. So I went through your post again and found
>>>>>> something I hadn't paid much attention to earlier, i.e. oflag=sync, so
>>>>>> I did my own tests on my setup with FUSE and sent that reply.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>>>
>>>>>>> Okay, good. At least this validates my doubts. Handling O_SYNC in
>>>>>>> gluster NFS and fuse is a bit different.
>>>>>>> When an application opens a file with O_SYNC on a fuse mount, each
>>>>>>> write syscall has to be written to disk as part of the syscall. In the
>>>>>>> case of NFS, there is no concept of open: NFS performs the write
>>>>>>> through a handle saying it needs to be a synchronous write, so the
>>>>>>> write() syscall is performed first and then it performs fsync(). So a
>>>>>>> write on an fd with O_SYNC becomes write+fsync. My guess is that when
>>>>>>> multiple threads do this write+fsync() operation on the same file,
>>>>>>> multiple writes are batched together before going to disk, so the
>>>>>>> throughput on the disk increases.
>>>>>>>
>>>>>>> Does that answer your doubts?
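The two write paths described above can be roughly mimicked with dd on any local scratch filesystem (the /tmp file names are arbitrary placeholders; this only imitates the syscall pattern, not gluster itself): oflag=sync opens the output with O_SYNC so every write blocks until it reaches stable storage, while conv=fdatasync does plain buffered writes followed by a single flush at the end.

```shell
# Per-write sync (the FUSE + O_SYNC pattern): output file opened with O_SYNC.
dd if=/dev/zero of=/tmp/osync.bin bs=1048576 count=16 oflag=sync 2>/dev/null

# Write, then flush (the write+fsync pattern): one fdatasync() before dd exits.
dd if=/dev/zero of=/tmp/fsync.bin bs=1048576 count=16 conv=fdatasync 2>/dev/null
```

On spinning disks the first form is usually dramatically slower per write, which is consistent with the large oflag=sync vs no-sync gap reported earlier in the thread.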
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Without the oflag=sync and only a single test of each, the FUSE is
>>>>>>>> going faster than NFS:
>>>>>>>>
>>>>>>>> FUSE:
>>>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>> NFS
>>>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
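As a quick sanity check, dd's reported rates follow directly from the byte count and the elapsed time (dd's "MB" is decimal, 10^6 bytes):

```shell
# Recompute the two dd throughputs from the transcripts above.
awk 'BEGIN {
    printf "FUSE: %.0f MB/s\n", 4294967296 / 7.46961 / 1000000
    printf "NFS:  %.0f MB/s\n", 4294967296 / 11.4264 / 1000000
}'
```

This reproduces the 575 MB/s and 376 MB/s figures dd printed.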
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>
>>>>>>>> Could you let me know the speed without oflag=sync on both the
>>>>>>>> mounts? No need to collect profiles.
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is what I see now:
>>>>>>>>>
>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>
>>>>>>>>> Volume Name: data-volume
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>> Options Reconfigured:
>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>> diagnostics.latency-measurement: on
>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> nfs.disable: on
>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>
>>>>>>>>> Is this the volume info you have?
>>>>>>>>>
>>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>> > Volume Name: data-volume
>>>>>>>>> > Type: Distribute
>>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> > Status: Started
>>>>>>>>> > Number of Bricks: 2
>>>>>>>>> > Transport-type: tcp
>>>>>>>>> > Bricks:
>>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>> > Options Reconfigured:
>>>>>>>>> > performance.readdir-ahead: on
>>>>>>>>> > nfs.disable: on
>>>>>>>>> > nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>> I copied this from an old thread from 2016. This is a distribute
>>>>>>>>> volume. Did you change any of the options in between?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>> Pat Haley                          Email:  phaley at mit.edu
>>>>>>>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>>>>>>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>>>>>>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>>>>>>>> 77 Massachusetts Avenue
>>>>>>>>> Cambridge, MA 02139-4301
>>>>>>>>> --
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>
--
Pranith
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20170531/0f728147/attachment.html>
Hi Pranith,
The "dd" command was:
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
There were 2 instances where dd reported 22 seconds. The output from the
dd tests is in
http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
Pat
On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> Pat,
>       What is the command you used? As per the following output, it
> seems like at least one write operation took 16 seconds, which is
> really bad.
>
>      96.39     1165.10 us      89.00 us   *16487014.00 us*   393212   WRITE
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley at mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
Are you sure using conv=sync is what you want? I normally use
conv=fdatasync; I'll look up the difference between the two and see if
it affects your test.

-b

----- Original Message -----
> From: "Pat Haley" <phaley at mit.edu>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org,
>     "Steve Postma" <SPostma at ztechnet.com>, "Ben Turner" <bturner at redhat.com>
> Sent: Tuesday, May 30, 2017 9:40:34 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
>
> Hi Pranith,
>
> The "dd" command was:
>
>     dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>
> There were 2 instances where dd reported 22 seconds. The output from the
> dd tests is in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>
> Pat
>
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> > Pat,
> >     What is the command you used? As per the following output, it
> > seems like at least one write operation took 16 seconds, which is
> > really bad.
> >
> >     96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
> >
> > On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
> >
> >     Hi Pranith,
> >
> >     I ran the same 'dd' test both in the gluster test volume and in
> >     the .glusterfs directory of each brick. The median results (12 dd
> >     trials in each test) are similar to before:
> >
> >       * gluster test volume: 586.5 MB/s
> >       * bricks (in .glusterfs): 1.4 GB/s
> >
> >     The profile for the gluster test-volume is in
> >
> >     http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> >
> >     Thanks
> >
> >     Pat
> >
> >     On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> >> Let's start with the same 'dd' test we were testing with, to see
> >> what the numbers are.
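[Archive editor's note: since Ben's question turns on what these dd flags
actually do, here is a quick sketch of the distinction. The /tmp path and
the 64 MB size are examples only, not the benchmark commands from this
thread.]

```shell
# conv=sync      pads each input block to the block size with NULs;
#                despite the name, it does NOT force data to disk.
# conv=fdatasync issues one fdatasync() after the final write, so the
#                reported time includes flushing the page cache.
# oflag=sync     opens the output O_SYNC, so every write() blocks until
#                the data reaches stable storage (slowest, most honest).
dd if=/dev/zero of=/tmp/dd_flag_test.bin bs=1M count=64 conv=fdatasync
# (remove /tmp/dd_flag_test.bin afterwards)
```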
> >> Please provide profile numbers for the same. From there on we will
> >> start tuning the volume to see what we can do.
> >>
> >> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
> >>
> >>     Hi Pranith,
> >>
> >>     Thanks for the tip. We now have the gluster volume mounted under
> >>     /home. What tests do you recommend we run?
> >>
> >>     Thanks
> >>
> >>     Pat
> >>
> >>     On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>     Hi Pranith,
> >>>
> >>>     Sorry for the delay. I never received your reply (but I did
> >>>     receive Ben Turner's follow-up to it). So we tried to create a
> >>>     gluster volume under /home using different variations of
> >>>
> >>>         gluster volume create test-volume mseas-data2:/home/gbrick_test_1
> >>>         mseas-data2:/home/gbrick_test_2 transport tcp
> >>>
> >>>     However we keep getting errors of the form
> >>>
> >>>         Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
> >>>
> >>>     Any thoughts on what we're doing wrong?
> >>>
> >>> You should give transport tcp at the beginning, I think. Anyway,
> >>> transport tcp is the default, so there is no need to specify it;
> >>> remove those two words from the CLI.
> >>>
> >>>     Also, do you have a list of the tests we should run once we get
> >>>     this volume created? Given the time-zone difference it might
> >>>     help if we can run a small battery of tests and post the
> >>>     results, rather than test-post, new-test-post, ... .
> >>>
> >>> This is the first time I am doing performance analysis with users,
> >>> as far as I remember. In our team there are separate engineers who
> >>> do these tests. Ben, who replied earlier, is one such engineer.
> >>>
> >>> Ben,
> >>>     Have any suggestions?
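[Archive editor's note: putting Pranith's advice in one place, the create
command without the trailing "transport tcp" would look like this. A
sketch only, using the same brick paths as above; not re-run here.]

```
gluster volume create test-volume \
    mseas-data2:/home/gbrick_test_1 \
    mseas-data2:/home/gbrick_test_2
```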
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
> >>>>
> >>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>>
> >>>>     Hi Pranith,
> >>>>
> >>>>     The /home partition is mounted as ext4:
> >>>>         /home    ext4    defaults,usrquota,grpquota    1 2
> >>>>
> >>>>     The brick partitions are mounted as xfs:
> >>>>         /mnt/brick1    xfs    defaults    0 0
> >>>>         /mnt/brick2    xfs    defaults    0 0
> >>>>
> >>>>     Will this cause a problem with creating a volume under /home?
> >>>>
> >>>> I don't think the bottleneck is disk. You can do the same tests you
> >>>> did on your new volume to confirm?
> >>>>
> >>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
> >>>>>
> >>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>>>
> >>>>>     Hi Pranith,
> >>>>>
> >>>>>     Unfortunately, we don't have similar hardware for a small-scale
> >>>>>     test. All we have is our production hardware.
> >>>>>
> >>>>> You said something about the /home partition, which has fewer
> >>>>> disks; we can create a plain distribute volume inside one of those
> >>>>> directories. After we are done, we can remove the setup. What do
> >>>>> you say?
> >>>>>
> >>>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
> >>>>>>
> >>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
> >>>>>>
> >>>>>>     Hi Pranith,
> >>>>>>
> >>>>>>     Since we are mounting the partitions as the bricks, I tried
> >>>>>>     the dd test writing to
> >>>>>>     <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>>>>>     The results without oflag=sync were 1.6 Gb/s (faster than
> >>>>>>     gluster but not as fast as I was expecting, given the 1.2
> >>>>>>     Gb/s to the no-gluster area with fewer disks).
> >>>>>>
> >>>>>> Okay, then 1.6 Gb/s is what we need to target, considering your
> >>>>>> volume is just distribute. Is there any way you can do tests on
> >>>>>> similar hardware but at a small scale, just so we can run the
> >>>>>> workload to learn more about the bottlenecks in the system? We
> >>>>>> can probably try to get the speed to 1.2 Gb/s on the /home
> >>>>>> partition you were telling me about yesterday. Let me know if
> >>>>>> that is something you are okay to do.
> >>>>>>
> >>>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
> >>>>>>>
> >>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>>>>>
> >>>>>>>     Hi Pranith,
> >>>>>>>
> >>>>>>>     Not entirely sure (this isn't my area of expertise). I'll
> >>>>>>>     run your answer by some other people who are more familiar
> >>>>>>>     with this.
> >>>>>>>     I am also uncertain about how to interpret the results when
> >>>>>>>     we also add the dd tests writing to the /home area (no
> >>>>>>>     gluster, still on the same machine):
> >>>>>>>
> >>>>>>>       * dd test without oflag=sync (rough average of multiple tests)
> >>>>>>>           o gluster w/ fuse mount: 570 Mb/s
> >>>>>>>           o gluster w/ nfs mount:  390 Mb/s
> >>>>>>>           o nfs (no gluster):      1.2 Gb/s
> >>>>>>>       * dd test with oflag=sync (rough average of multiple tests)
> >>>>>>>           o gluster w/ fuse mount: 5 Mb/s
> >>>>>>>           o gluster w/ nfs mount:  200 Mb/s
> >>>>>>>           o nfs (no gluster):      20 Mb/s
> >>>>>>>
> >>>>>>>     Given that the non-gluster area is a RAID-6 of 4 disks while
> >>>>>>>     each brick of the gluster area is a RAID-6 of 32 disks, I
> >>>>>>>     would naively expect the writes to the gluster area to be
> >>>>>>>     roughly 8x faster than to the non-gluster area.
> >>>>>>>
> >>>>>>> I think a better test is to try to write to a file using nfs
> >>>>>>> without any gluster, to a location that is not inside the brick
> >>>>>>> but some other location on the same disk(s). If you are mounting
> >>>>>>> the partition as the brick, then we can write to a file inside
> >>>>>>> the .glusterfs directory, something like
> >>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>>>>>>
> >>>>>>>     I still think we have a speed issue; I can't tell if fuse vs
> >>>>>>>     nfs is part of the problem.
> >>>>>>>
> >>>>>>> I got interested in the post because I read that fuse speed is
> >>>>>>> less than nfs speed, which is counter-intuitive to my
> >>>>>>> understanding, so I wanted clarification. Now that I have my
> >>>>>>> clarification, with fuse outperforming nfs without sync, we can
> >>>>>>> resume testing as described above and try to find what it is.
> >>>>>>> Based on your email-id I am guessing you are from Boston and I
> >>>>>>> am from Bangalore, so if you are okay with doing this debugging
> >>>>>>> over multiple days because of the timezones, I will be happy to
> >>>>>>> help. Please be a bit patient with me; I am under a release
> >>>>>>> crunch, but I am very curious about the problem you posted.
> >>>>>>>
> >>>>>>>     Was there anything useful in the profiles?
> >>>>>>>
> >>>>>>> Unfortunately the profiles didn't help me much. I think we are
> >>>>>>> collecting the profiles from an active volume, so they carry a
> >>>>>>> lot of information that does not pertain to dd, and it is
> >>>>>>> difficult to isolate dd's contribution. So I went through your
> >>>>>>> post again and found something I hadn't paid much attention to
> >>>>>>> earlier, i.e. oflag=sync, so I did my own tests on my setup with
> >>>>>>> FUSE and sent that reply.
> >>>>>>>
> >>>>>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
> >>>>>>>> Okay good. At least this validates my doubts. Handling O_SYNC
> >>>>>>>> in gluster NFS and fuse is a bit different. When an application
> >>>>>>>> opens a file with O_SYNC on a fuse mount, each write syscall
> >>>>>>>> has to be written to disk as part of the syscall, whereas in
> >>>>>>>> the case of NFS there is no concept of open. NFS performs the
> >>>>>>>> write through a handle, flagging it as a synchronous write, so
> >>>>>>>> the write() syscall is performed first and then an fsync() is
> >>>>>>>> performed. So a write on an fd with O_SYNC becomes write+fsync.
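[Archive editor's note: the two sync paths Pranith describes can be
sketched in a few lines of ordinary POSIX I/O. This is illustrative only,
not Gluster code, and the /tmp file names are made up.]

```python
import os

# FUSE path: the application opened the file with O_SYNC, so every
# write() must reach stable storage before it returns.
fd = os.open("/tmp/osync_demo.bin", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
os.write(fd, b"x" * 4096)   # one syscall, synced inline
os.close(fd)

# NFS path: there is no open-with-O_SYNC on the server side, so a
# synchronous write is issued as write() followed by fsync() -- the
# "write+fsync" pair described above.
fd = os.open("/tmp/nfs_demo.bin", os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"x" * 4096)
os.fsync(fd)                # flush explicitly afterwards
os.close(fd)
```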
> >>>>>>>> I am suspecting that when multiple threads do this
> >>>>>>>> write+fsync() operation on the same file, multiple writes are
> >>>>>>>> batched together to be written to disk, so the throughput on
> >>>>>>>> the disk increases, is my guess.
> >>>>>>>>
> >>>>>>>> Does that answer your doubts?
> >>>>>>>>
> >>>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>>>>>>
> >>>>>>>>     Without the oflag=sync and only a single test of each, the
> >>>>>>>>     FUSE is going faster than NFS:
> >>>>>>>>
> >>>>>>>>     FUSE:
> >>>>>>>>     mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>>>>>>     4096+0 records in
> >>>>>>>>     4096+0 records out
> >>>>>>>>     4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
> >>>>>>>>
> >>>>>>>>     NFS:
> >>>>>>>>     mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>>>>>>     4096+0 records in
> >>>>>>>>     4096+0 records out
> >>>>>>>>     4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
> >>>>>>>>
> >>>>>>>>     On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
> >>>>>>>>>     Could you let me know the speed without oflag=sync on
> >>>>>>>>>     both the mounts? No need to collect profiles.
> >>>>>>>>>
> >>>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>>>>>>>
> >>>>>>>>>     Here is what I see now:
> >>>>>>>>>
> >>>>>>>>>     [root at mseas-data2 ~]# gluster volume info
> >>>>>>>>>
> >>>>>>>>>     Volume Name: data-volume
> >>>>>>>>>     Type: Distribute
> >>>>>>>>>     Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>>>>>>>>     Status: Started
> >>>>>>>>>     Number of Bricks: 2
> >>>>>>>>>     Transport-type: tcp
> >>>>>>>>>     Bricks:
> >>>>>>>>>     Brick1: mseas-data2:/mnt/brick1
> >>>>>>>>>     Brick2: mseas-data2:/mnt/brick2
> >>>>>>>>>     Options Reconfigured:
> >>>>>>>>>     diagnostics.count-fop-hits: on
> >>>>>>>>>     diagnostics.latency-measurement: on
> >>>>>>>>>     nfs.exports-auth-enable: on
> >>>>>>>>>     diagnostics.brick-sys-log-level: WARNING
> >>>>>>>>>     performance.readdir-ahead: on
> >>>>>>>>>     nfs.disable: on
> >>>>>>>>>     nfs.export-volumes: off
> >>>>>>>>>
> >>>>>>>>>     On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
> >>>>>>>>>>     Is this the volume info you have?
> >>>>>>>>>>
> >>>>>>>>>>         [root at mseas-data2 ~]# gluster volume info
> >>>>>>>>>>
> >>>>>>>>>>         Volume Name: data-volume
> >>>>>>>>>>         Type: Distribute
> >>>>>>>>>>         Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>>>>>>>>>         Status: Started
> >>>>>>>>>>         Number of Bricks: 2
> >>>>>>>>>>         Transport-type: tcp
> >>>>>>>>>>         Bricks:
> >>>>>>>>>>         Brick1: mseas-data2:/mnt/brick1
> >>>>>>>>>>         Brick2: mseas-data2:/mnt/brick2
> >>>>>>>>>>         Options Reconfigured:
> >>>>>>>>>>         performance.readdir-ahead: on
> >>>>>>>>>>         nfs.disable: on
> >>>>>>>>>>         nfs.export-volumes: off
> >>>>>>>>>>
> >>>>>>>>>>     I copied this from an old thread from 2016. This is a
> >>>>>>>>>>     distribute volume.
> >>>>>>>>>>     Did you change any of the options in between?
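[Archive editor's note: for readers following along, the profile table
quoted in this thread (the "96.39 ... 16487014.00 us ... WRITE" line) comes
from Gluster's built-in profiler, which the diagnostics.* options in the
volume info above enable. The usual commands are roughly as follows; a
sketch, not re-run here, with "data-volume" taken from the output above.]

```
gluster volume profile data-volume start   # begin collecting per-fop latency stats
gluster volume profile data-volume info    # print per-brick tables (avg/min/max latency per fop)
gluster volume profile data-volume stop
```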