Hello all,

after playing around for some weeks we decided to run some real-world tests with glusterfs. We took an nfs client and mounted the very same data with glusterfs. The client does some logfile processing every 5 minutes and needs around 3.5 minutes runtime in the nfs setup.

We found out that it makes no sense to try this setup with gluster replicate as long as we do not get the same performance in a single-server setup with glusterfs. So for now we have one server mounted (halfway to a replicate setup) and would like to tune performance.

Does anyone have experience with a simple replacement like that? We had to find out that almost all performance options have exactly zero effect. The only thing that seems to make at least some difference is read-ahead on the server. We end up with around 4.5 - 5.5 minutes runtime for the scripts, which is on the edge, as we need something clearly below 5 minutes (just like nfs was).

Our goal is to maximise performance in this setup and then try a real replication setup with two servers. The load itself consists of around 100 scripts starting at the same time and processing their data.

Any ideas?

--
Regards,
Stephan
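P.S.: For reference, a single-server layout of this kind would look roughly like the sketch below, with read-ahead loaded on the server side. This is only an illustration: the directory, hostname and option values are placeholders, and the exact option names (page-count, thread-count, ...) differ between glusterfs releases, so check them against your version's documentation.

  # server.vol (sketch)
  volume posix
    type storage/posix
    option directory /data/export          # placeholder export path
  end-volume

  volume locks
    type features/locks
    subvolumes posix
  end-volume

  volume iothreads
    type performance/io-threads
    option thread-count 8                  # illustrative value
    subvolumes locks
  end-volume

  volume readahead
    type performance/read-ahead
    option page-count 4                    # illustrative value; name may vary by release
    subvolumes iothreads
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.readahead.allow *     # restrict to the client's address in production
    subvolumes readahead
  end-volume

  # client.vol (sketch)
  volume remote
    type protocol/client
    option transport-type tcp
    option remote-host gluster1.example.com   # placeholder server name
    option remote-subvolume readahead
  end-volume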
Stephan von Krawczynski wrote:
> Hello all,
>
> after playing around for some weeks we decided to run some real-world tests
> with glusterfs. We took an nfs client and mounted the very same data with
> glusterfs. The client does some logfile processing every 5 minutes and needs
> around 3.5 minutes runtime in the nfs setup.
> We found out that it makes no sense to try this setup with gluster replicate
> as long as we do not get the same performance in a single-server setup with
> glusterfs. So for now we have one server mounted (halfway to a replicate
> setup) and would like to tune performance.
> Does anyone have experience with a simple replacement like that? We had to
> find out that almost all performance options have exactly zero effect. The
> only thing that seems to make at least some difference is read-ahead on the
> server. We end up with around 4.5 - 5.5 minutes runtime for the scripts,
> which is on the edge, as we need something clearly below 5 minutes (just
> like nfs was).
> Our goal is to maximise performance in this setup and then try a real
> replication setup with two servers. The load itself consists of around 100
> scripts starting at the same time and processing their data.
>
> Any ideas?
>

What nfs server are you using? The in-kernel one?

You could try the unfs3booster server, which is the original unfs3 with our modifications for bug fixes and slight performance improvements. It should give better performance in certain cases since it avoids the FUSE bottleneck on the server.

For more info, do take a look at this page:
http://www.gluster.org/docs/index.php/Unfs3boosterConfiguration

When using unfs3booster, please use GlusterFS release 2.0.6, since that has the required changes to make booster work with NFS.

-Shehjar
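P.S. A quick way to tell whether the in-kernel nfsd is the server currently in use (just a sketch; availability of /proc/fs/nfsd depends on the distro):

  ps ax | grep '[n]fsd'               # the kernel server shows up as [nfsd] kernel threads
  rpcinfo -p localhost | grep nfs     # NFS versions registered with the portmapper
  cat /proc/fs/nfsd/versions          # only present when the kernel NFS server is loaded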
Stephan von Krawczynski
2009-Aug-31 18:02 UTC
[Gluster-users] NFS replacement, rest stopped
Hello all,

as told earlier we tried to replace an nfs-server/client combination in a semi-production environment with a trivial one-server gluster setup. We thought at first that this pretty simple setup would allow some more testing. Unfortunately we have to stop those tests, because it turns out that the client system has trouble with networking as soon as we start glusterfs.

The client has three network cards: the first is for internet use, the second is the connection to the glusterfs server, the third collects data from several other boxes. It turned out that the third interface ran into trouble soon after we started to work with glusterfs. We could not ping several hosts on the same lan, or packet delay was very high (up to 20 s). The effects were pretty weird and looked like a bad interface card. But after switching back to kernel-nfs everything went back to normal.

It really looks like the glusterfs client has some problems too, something like buffer re-use, memory thrashing, a pointer mix-up or the like. Interestingly, no problems were visible on the interface where glusterfs itself was running; I have no idea how something like this happens. Anyway, I suppose someone will tell me it is the kernel networking that has troubles, just like reiserfs has troubles, or ext3 :-(

To give you an idea what the ugly things look like:

Aug 31 08:20:16 heather kernel: ------------[ cut here ]------------
Aug 31 08:20:16 heather kernel: WARNING: at net/ipv4/tcp.c:1405 tcp_recvmsg+0x1c7/0x7b6()
Aug 31 08:20:16 heather kernel: Hardware name: empty
Aug 31 08:20:16 heather kernel: Modules linked in: nfs lockd nfs_acl sunrpc fuse loop i2c_i801 e100 i2c_core e1000e
Aug 31 08:20:16 heather kernel: Pid: 31500, comm: netcat Not tainted 2.6.30.5 #1
Aug 31 08:20:16 heather kernel: Call Trace:
Aug 31 08:20:16 heather kernel: [<ffffffff80431497>] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel: [<ffffffff80431497>] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel: [<ffffffff8023282d>] ? warn_slowpath_common+0x77/0xa3
Aug 31 08:20:16 heather kernel: [<ffffffff80431497>] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel: [<ffffffff80401340>] ? sock_common_recvmsg+0x30/0x45
Aug 31 08:20:16 heather kernel: [<ffffffff8029b3d8>] ? mnt_drop_write+0x25/0x12e
Aug 31 08:20:16 heather kernel: [<ffffffff803fee67>] ? sock_aio_read+0x109/0x11d
Aug 31 08:20:16 heather kernel: [<ffffffff80287131>] ? do_sync_read+0xce/0x113
Aug 31 08:20:16 heather kernel: [<ffffffff80244348>] ? autoremove_wake_function+0x0/0x2e
Aug 31 08:20:16 heather kernel: [<ffffffff80293243>] ? poll_select_copy_remaining+0xd0/0xf3
Aug 31 08:20:16 heather kernel: [<ffffffff80287b83>] ? vfs_read+0xbd/0x133
Aug 31 08:20:16 heather kernel: [<ffffffff80287cb5>] ? sys_read+0x45/0x6e
Aug 31 08:20:16 heather kernel: [<ffffffff8020ae6b>] ? system_call_fastpath+0x16/0x1b
Aug 31 08:20:16 heather kernel: ---[ end trace 31e61d5bab6e7cc0 ]---

Hopefully you would not claim that netcat has problems, would you? Hopefully we can agree that something nasty is going on inside this code, and someone with a better brain and more kernel knowledge than me should give it a very close look.

--
Regards,
Stephan
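P.S.: For anyone who wants to compare, a rough way to watch the third interface for the symptoms described above while the glusterfs client is running (the interface name and target address below are only placeholders):

  cat /proc/net/dev                         # per-interface packet, error and drop totals
  ethtool -S eth2 | grep -iE 'err|drop'     # NIC-level error/drop counters on the third card
  ping -c 60 -i 1 192.168.3.10              # latency to a lan host, with glusterfs mounted vs. kernel nfs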