thr3ads.net - Ocfs2 users - [Ocfs2-users] Please urgent help required

Lorenzo Milesi

2008-Dec-02 10:59 UTC

[Ocfs2-users] Please urgent help required - OCFS2 and VPN again

Hi all,

I wrote to the list before about a setup I have at a customer site:
DRBD8+OCFS2 running on two remote sites connected via VPN.
The suggestions I received improved the situation, but we are still
having serious trouble. We also replaced the old server with a new,
much more powerful one, but there was almost no improvement!
The setup can be summarized as:
SITE A: Dual Core 2GHz Pentium, 1GB RAM, 1 SATA HDD for /, 3 SATA HDDs in
software RAID5, DRBD on /dev/md0.
SITE B: Quad Core 2.4GHz Pentium, 2GB RAM, 3 SATA HDDs in software RAID5,
DRBD on /dev/md1.

The two sites are connected over two ADSL lines, with two bonded VPNs.

Both machines run fully updated Debian Etch, kernel 2.6.26-bpo.1-686 SMP
with the deadline scheduler, DRBD 8.0.13, OCFS2 1.4.1-1.
The shared data partition is 187G, of which 30G are used.

The recent upgrade to OCFS2 1.4 and kernel 2.6.26 didn't improve
performance as much as I expected.

The main problems we have are:
1. Very high load average: this was previously caused by very high
iowait percentages, but with the new server the load is high while top
says the machine is 99-100% idle!
2. Very slow directory browsing: Sunil pointed me to the user guide, where
he talks about inode stat. How can I increase the memory available to the
inode cache? I've searched several times without result... The server
currently uses less than 300MB of the 1GB of RAM installed...
3. Very long umount time: I often (not always) see an extremely long
umount time. While the umount is running, iftop shows heavy network
traffic. I suppose it's transferring file locks, but is it plausible that
it stays stuck for more than an hour and is still going?
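
A side note on item 1: on Linux the load average counts tasks in uninterruptible sleep (state D, typically waiting on disk or network I/O) as well as runnable ones, so a machine can show a high load while top reports the CPUs as idle. The following is a minimal sketch, not something taken from this thread, that counts tasks per state by reading /proc/<pid>/stat. The function names and the `proc_root` parameter are made up for illustration.

```python
import os


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the state is the first field after the LAST ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def count_states(proc_root="/proc"):
    """Count tasks per state by scanning numeric directories under proc_root."""
    counts = {}
    if not os.path.isdir(proc_root):
        return counts  # not a Linux-style /proc, nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or entry is unreadable
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    counts = count_states()
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible (D):", counts.get("D", 0))
```

If the D count stays high while the load is high, the tasks are most likely blocked waiting for I/O rather than competing for CPU.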

This is the configuration file of OCFS2. The quad-core is file-server-2.

#/etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 192.168.0.1
        number = 0
        name = file-server-1
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.2.31
        number = 1
        name = file-server-2
        cluster = ocfs2
cluster:
        node_count = 2
        name = ocfs2
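
For anyone comparing their own file against the one above, here is a rough sketch of a parser for this stanza layout ("node:" / "cluster:" headers followed by indented "key = value" lines). The consistency checks are assumptions about what a sane file looks like, not the rules the OCFS2 tools actually enforce, and the function names are hypothetical.

```python
SAMPLE = """\
node:
        ip_port = 7777
        ip_address = 192.168.0.1
        number = 0
        name = file-server-1
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.2.31
        number = 1
        name = file-server-2
        cluster = ocfs2
cluster:
        node_count = 2
        name = ocfs2
"""


def parse_cluster_conf(text):
    """Parse stanzas into a list of (kind, settings_dict) pairs."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and "=" not in line:
            stanzas.append((line[:-1], {}))  # new "node" or "cluster" stanza
        elif "=" in line and stanzas:
            key, value = line.split("=", 1)
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def check(stanzas):
    """Return a list of problems found (assumed sanity rules, see lead-in)."""
    problems = []
    nodes = [d for kind, d in stanzas if kind == "node"]
    clusters = {d.get("name"): d for kind, d in stanzas if kind == "cluster"}
    for name, cluster in clusters.items():
        members = [n for n in nodes if n.get("cluster") == name]
        if int(cluster.get("node_count", -1)) != len(members):
            problems.append(
                f"cluster {name}: node_count does not match "
                f"{len(members)} node stanzas"
            )
    numbers = [n.get("number") for n in nodes]
    if len(numbers) != len(set(numbers)):
        problems.append("duplicate node numbers")
    for n in nodes:
        if n.get("cluster") not in clusters:
            problems.append(f"node {n.get('name')}: unknown cluster")
    return problems


if __name__ == "__main__":
    print(check(parse_cluster_conf(SAMPLE)) or "no problems found")
```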


What stuns me is that on file-server-2 we run an rsync backup during the
night on a local machine on the network, and it takes less than 20 minutes!
Doing the same on the other server sends the load average through the roof!

We're in a critical situation because this solution was deployed a long
time ago and is still not working as expected.
If nobody has suggestions, we have no problem paying for qualified support
to solve these problems. In that case please contact me directly.
Sunil, can I get Oracle support for this?

Thank you.
-- 
Lorenzo Milesi - lorenzo.milesi at yetopen.it

YetOpen S.r.l. - http://www.yetopen.it/
C.so E. Filiberto, 74 23900 Lecco - ITALY -
Tel 0341 220 205 - Fax 178 607 8199

GPG/PGP Key-Id: 0xE704E230 - http://keyserver.linux.it

-------- D.Lgs. 196/2003 --------

Please note that all information contained in this message is
confidential and for the exclusive use of the recipient. If you have
received this message in error, please delete it without copying it, do
not forward it to third parties, and notify us as soon as possible.
Thank you.

Luis Freitas

2008-Dec-02 16:48 UTC

Lorenzo,

My 2 cents. This is pure speculation, since I have never worked with an
environment like yours.

Your configuration is very different from what is usual with OCFS2. The
filesystem is tested on systems with a fast network connection between
nodes, so it is probably not tuned for environments where network bandwidth
is low.

You might get some improvement by changing some of the VM tunables
(/proc/sys/vm/*). On 2.6 there are not many of them for the filesystem
cache, and some seem to have no effect.

vfs_cache_pressure could give you some control over the number of inodes
cached by the kernel. Try either increasing or decreasing it. Some of the
problems you describe, like the long umount time, might actually be caused
by keeping a large number of structures in memory: since the network is
slow, it takes a long time to clear them all.

swappiness controls how aggressively pages are swapped out. Since you don't
have swap on the OCFS2 filesystem, it should not have much impact. (You
don't have swap there, right?) But you may be able to force the kernel to
release memory so that OCFS2 can use it. Again, this could actually make
the problem worse.

There used to be a parameter, dcache_priority, that controlled how much of
the cache was used for inodes and how much for data blocks. But it is no
longer available on 2.6, and I could not find an equivalent.
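
Before changing anything it helps to record the current values. A minimal sketch, assuming a Linux /proc/sys/vm layout; the function name and the `base` parameter are made up for illustration, and tunables missing on a given kernel are reported as None:

```python
import os


def read_vm_tunables(names, base="/proc/sys/vm"):
    """Return {name: int or None} for each requested tunable file under base."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Missing on this kernel, unreadable, or not a single integer.
            values[name] = None
    return values


if __name__ == "__main__":
    wanted = ["vfs_cache_pressure", "swappiness"]
    for name, value in read_vm_tunables(wanted).items():
        print(name, "=", value)
```

Changing a value needs root, for example `sysctl -w vm.vfs_cache_pressure=50`. The kernel default for vfs_cache_pressure is 100; lower values make the kernel prefer to keep dentry and inode caches, higher values make it reclaim them more eagerly. Whether either direction helps in this setup is untested.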

Regards,
Luis

Sunil Mushran

2008-Dec-02 20:11 UTC

The interconnect is your bottleneck.

Lorenzo Milesi

2008-Dec-08 20:13 UTC

On Tue, 02/12/2008 at 12.11 -0800, Sunil Mushran wrote:
> The interconnect is your bottleneck.

Thanks for your answer. What would be the minimum bandwidth for OCFS2 to
work well?

Thanks

Lorenzo Milesi

2008-Dec-09 09:00 UTC

I suppose rsync takes ages reading the whole directory tree and the
directory listings, since it has to walk all of it!
This shouldn't be affected by DRBD "writes", should it? Is it due to high
latency?

Thanks!


On Mon, 08/12/2008 at 17.15 -0800, Sunil Mushran wrote:
> ocfs2 does not have a concept of "primary". While we do use the
> lower node number as a tie breaker in the dlm, that in itself
> should not affect the performance much.
>
> See what rsync is doing.
>
> Lorenzo Milesi wrote:
> > Thanks again.
> >
> > Another curiosity: as I said, we have an rsync backup running on one
> > node. If that node is set as "primary" (node number 0 in cluster.conf),
> > that rsync takes up to 4h to transfer very few files, while if I set it
> > as "secondary" it takes 20m!
> >
> > On Mon, 08/12/2008 at 12.33 -0800, Sunil Mushran wrote:
> >> ocfs2's interconnect usage is latency sensitive. It does not use
> >> much bandwidth. While it has small packets, it has lots of them.
> >> It has been written with a low latency gige private interconnect
> >> in mind.
> >>
> >> In your setup you are using drbd which is replicating all block
> >> writes across the interconnect. That is bandwidth sensitive.
> >>
> >> The performance of the ocfs2 file system (all fses actually) is
> >> partly dependent on the underlying block device. If the block device
> >> is slow, the overall performance of the fs will be slow.
> >>
> >> Lorenzo Milesi wrote:
> >>> On Tue, 02/12/2008 at 12.11 -0800, Sunil Mushran wrote:
> >>>> The interconnect is your bottleneck.
> >>> thanks for your answer. what could be the minimum bandwidth for
> >>> letting ocfs2 work fine?
> >>>
> >>> thanks
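
To put rough numbers on the latency point quoted above: if every file that rsync stats has to wait for at least one network round trip, the walk time grows linearly with the round-trip time, whatever the bandwidth. The sketch below is back-of-the-envelope arithmetic only. The file count, the round-trip times, and the one-round-trip-per-file figure are illustrative assumptions, not measurements from this setup.

```python
def walk_time_seconds(n_entries, rtt_ms, round_trips_per_entry=1.0):
    """Lower bound on the time to stat n_entries files when each stat
    waits for round_trips_per_entry network round trips of rtt_ms each."""
    return n_entries * round_trips_per_entry * rtt_ms / 1000.0


if __name__ == "__main__":
    n_files = 100_000  # assumed size of the directory tree
    # Assumed round-trip times: local gigabit LAN vs. two ADSL/VPN guesses.
    for rtt_ms in (0.2, 20.0, 60.0):
        minutes = walk_time_seconds(n_files, rtt_ms) / 60
        print(f"RTT {rtt_ms:5.1f} ms -> at least {minutes:6.1f} min")
```

Under these assumptions the same walk goes from seconds on a LAN to tens of minutes or more over the VPN, which is in the same range as the 20m versus 4h difference reported above.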