Hi,
we are using SLES10 Patchlevel 3 with 12 nodes hosting Tomcat application servers.
The cluster had been running for some time (about 200 days) without problems.

Recently we needed to shut down the cluster for maintenance and experienced very long umount times for the filesystem. It took something like 45 minutes per node and filesystem (12 x 45 minutes shutdown time).
As a result the planned downtime had to be extended ;-) .

Is there any tuning option or the like to make those umounts faster, or is this something we have to live with?

Thanks for your help.
If you need more information let me know.

Marc.

Some info on the configuration:
---------------------------X8-----------------------------------
# /sbin/modinfo ocfs2
filename:     /lib/modules/2.6.16.60-0.54.5-smp/kernel/fs/ocfs2/ocfs2.ko
license:      GPL
author:       Oracle
version:      1.4.1-1-SLES
description:  OCFS2 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008 (build f922955d99ef972235bd0c1fc236c5ddbb368611)
srcversion:   986DD1EE4F5ABD8A44FF925
depends:      ocfs2_dlm,jbd,ocfs2_nodemanager
supported:    yes
vermagic:     2.6.16.60-0.54.5-smp SMP gcc-4.1
atix at CAS12:~> /sbin/modinfo ocfs2_dlm
filename:     /lib/modules/2.6.16.60-0.54.5-smp/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
license:      GPL
author:       Oracle
version:      1.4.1-1-SLES
description:  OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008 (build f922955d99ef972235bd0c1fc236c5ddbb368611)
srcversion:   16FE87920EA41CA613E6609
depends:      ocfs2_nodemanager
supported:    yes
vermagic:     2.6.16.60-0.54.5-smp SMP gcc-4.1
parm:         dlm_purge_interval_ms:int
parm:         dlm_purge_locks_max:int
# rpm -qa ocfs2*
ocfs2-tools-1.4.0-0.9.9
ocfs2console-1.4.0-0.9.9
---------------------------X8-----------------------------------
The kernel version is 2.6.16.60-0.54.5-smp

______________________________________________________________________________

Marc Grimme

E-Mail: grimme at atix.de

ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
85716 Unterschleissheim | www.atix.de
Enterprise Linux einfach online kaufen: www.linux-subscriptions.com
Registergericht: Amtsgericht München, Registernummer: HRB 168930,
USt.-Id.: DE209485962 | Vorstand: Thomas Merz (Vors.), Marc Grimme,
Mark Hlawatschek, Jan R. Bergrath | Vorsitzender des Aufsichtsrats: Dr. Martin Buss
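On the tuning question: the modinfo output above lists two o2dlm module parameters, dlm_purge_interval_ms and dlm_purge_locks_max, which by their names control how often and how many unused lock resources the DLM purges per pass. Below is a minimal sketch of how one might inspect and set them; the sysfs path, the modprobe file and the example values are assumptions, and whether changing them actually shortens umount is not established in this thread.

# Show the current values, if the kernel exports them via sysfs
# (this depends on the parameter permissions compiled into the module).
for p in dlm_purge_interval_ms dlm_purge_locks_max; do
    f=/sys/module/ocfs2_dlm/parameters/$p
    if [ -r "$f" ]; then
        echo "$p = $(cat "$f")"
    else
        echo "$p is not exported via sysfs"
    fi
done

# Hypothetical load-time settings (SLES10 typically reads
# /etc/modprobe.conf.local); the values below are examples only and
# take effect when the ocfs2_dlm module is next loaded:
#   options ocfs2_dlm dlm_purge_interval_ms=5000 dlm_purge_locks_max=16384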
Sunil Mushran
2011-Jul-06 16:37 UTC
[Ocfs2-users] Slow umounts on SLES10 patchlevel 3 ocfs2
umount is a two-step process. First the fs frees the inodes. Then the o2dlm
takes stock of all active resources and migrates the ones that are still in
use. This typically takes some time, but I have never heard of it taking
45 minutes. I guess it could, though, if one has a lot of resources. Let's
start by getting a count.

This will dump the number of cluster locks held by the fs.
# for vol in /sys/kernel/debug/ocfs2/*
do
  count=$(wc -l ${vol}/locking_state | cut -f1 -d' ');
  echo "$(basename ${vol}): ${count} locks" ;
done;

This will dump the number of lock resources known to the dlm.
# for vol in /sys/kernel/debug/o2dlm/*
do
  count=$(grep -c "^NAME:" ${vol}/locking_state);
  echo "$(basename ${vol}): ${count} resources" ;
done;

The debugfs needs to be mounted for this to work.
mount -t debugfs none /sys/kernel/debug

Sunil

On 07/06/2011 08:20 AM, Marc Grimme wrote:
> Hi,
> we are using SLES10 Patchlevel 3 with 12 nodes hosting Tomcat application servers.
> The cluster had been running for some time (about 200 days) without problems.
>
> Recently we needed to shut down the cluster for maintenance and experienced very long umount times for the filesystem. It took something like 45 minutes per node and filesystem (12 x 45 minutes shutdown time).
> As a result the planned downtime had to be extended ;-) .
>
> Is there any tuning option or the like to make those umounts faster, or is this something we have to live with?
>
> Thanks for your help.
> If you need more information let me know.
>
> Marc.
>
> [configuration details snipped]
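For convenience, the two loops above can be wrapped into one small script that also mounts debugfs first if it is not mounted yet; this is only a repackaging of the commands Sunil posted:

#!/bin/sh
# Count ocfs2 cluster locks and o2dlm lock resources per mounted volume.

# Mount debugfs if it is not mounted already.
grep -q ' /sys/kernel/debug ' /proc/mounts || \
    mount -t debugfs none /sys/kernel/debug

# Cluster locks held by the filesystem.
for vol in /sys/kernel/debug/ocfs2/*; do
    [ -r "$vol/locking_state" ] || continue
    echo "$(basename "$vol"): $(wc -l < "$vol/locking_state") locks"
done

# Lock resources known to the dlm.
for vol in /sys/kernel/debug/o2dlm/*; do
    [ -r "$vol/locking_state" ] || continue
    echo "$(basename "$vol"): $(grep -c '^NAME:' "$vol/locking_state") resources"
done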
So I now have two figures from two different clusters. Both are quite slow
during restarts, and both have two filesystems mounted.

Cluster 1 (the one that took very long last time):

Cluster locks held by filesystem:
1788AD39151A4E76997420D62A778E65: 274258 locks
1EFA64C36FD54AB48B734A99E7F45A73: 576842 locks

Cluster resources held by filesystem:
1788AD39151A4E76997420D62A778E65: 214545 resources
1EFA64C36FD54AB48B734A99E7F45A73: 469319 resources

Second cluster (also takes quite long):

Cluster locks held by filesystem:
1EDBCFF0CAB24D0CAE91CB2DA241E8CA: 717186 locks
585462C2FA5A428D913A3CBDBC77E116: 68 locks

Cluster resources held by filesystem:
1EDBCFF0CAB24D0CAE91CB2DA241E8CA: 587471 resources
585462C2FA5A428D913A3CBDBC77E116: 20 resources

Let me know if you need more information.

Thanks
Marc.

----- "Sunil Mushran" <sunil.mushran at oracle.com> wrote:
> It was designed to run in prod envs.
>
> On 07/07/2011 12:21 AM, Marc Grimme wrote:
> > Sunil,
> > can I query those figures during runtime of a productive cluster?
> > Or might it influence the availability or performance in any way?
> >
> > Thanks for your help.
> > Marc.
> > ----- "Sunil Mushran" <sunil.mushran at oracle.com> wrote:
> >
> >> umount is a two-step process. First the fs frees the inodes. Then the
> >> o2dlm takes stock of all active resources and migrates the ones that
> >> are still in use. This typically takes some time, but I have never
> >> heard of it taking 45 minutes.
> >>
> >> [diagnostic commands and quoted original message snipped]
--
______________________________________________________________________________

Marc Grimme
Tel: +49 89 4523538-14
Fax: +49 89 9901766-0
E-Mail: grimme at atix.de
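To put counts like the above next to each node before a planned shutdown, the same o2dlm count can be collected over ssh; a minimal sketch, with hypothetical node names and assuming key-based ssh access and debugfs already mounted on every node:

#!/bin/sh
# Hypothetical member list; replace with the real cluster node names.
NODES="cas01 cas02 cas03"

for node in $NODES; do
    echo "=== $node ==="
    ssh "$node" '
        for vol in /sys/kernel/debug/o2dlm/*; do
            [ -r "$vol/locking_state" ] || continue
            echo "$(basename "$vol"): $(grep -c "^NAME:" "$vol/locking_state") resources"
        done
    '
done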