thr3ads.net - Ocfs2 users - [Ocfs2-users] ocfs2 file system hang during copy files [Jul 2007]


Zosen Wang

2007-Jul-18 21:05 UTC

[Ocfs2-users] ocfs2 file system hang during copy files

I am trying to copy a single 42 GB file from an ext3 file system to an ocfs2
file system on node 1. The ocfs2 file system hangs on all nodes during or
after the cp. /p0ebsdb/u13 is an ocfs2 mount point shared with 2 other nodes
(3-node RAC).

 

The following is the Unix copy command:

[root@b30svrxp-ebsdb1 migrate]# time cp aexp02.dmp /p0ebsdb/u13/junk

real    17m49.351s
user    0m0.392s
sys     1m49.065s

 

The following is dmesg on node1:

ocfs2_dlm: Nodes in domain ("A2AECED66891407D915CBF282A9E9299"): 0 1 2
o2net: connection to node b30svrxp-ebsdb2.ameripride.com (num 1) at 192.168.3.70:7777 has been idle for 10.0 seconds, shutting it down.
(0,3):o2net_idle_timer:1418 here are some times that might help debug the situation: (tmr 1184814613.883032 now 1184814623.882842 dr 1184814613.883028 adv 1184814613.883033:1184814613.883033 func (2b61f804:504) 1184814613.882900:1184814613.882904)
o2net: no longer connected to node b30svrxp-ebsdb2.ameripride.com (num 1) at 192.168.3.70:7777
(6047,3):dlm_send_proxy_ast_msg:459 ERROR: status = -107
(6047,3):dlm_flush_asts:600 ERROR: status = -107
(20810,0):dlm_do_master_request:1418 ERROR: link to 1 went down!
(20810,0):dlm_get_lock_resource:995 ERROR: status = -107

 

The following is dmesg on node2:

(26243,1):dlm_send_remote_convert_request:398 ERROR: status = -107
(26243,1):dlm_wait_for_node_death:365 9EA98E20F6E44FF7B7A89789976C1E32: waiting 5000ms for notification of death of node 0
(7427,0):dlm_send_remote_convert_request:398 ERROR: status = -107
(7427,0):dlm_wait_for_node_death:365 75990178D36942BFA473A2AE4149690C: waiting 5000ms for notification of death of node 0

 

The following is dmesg on node3:

mtrr: type mismatch for d8000000,2000000 old: uncachable new: write-combining
adl_trace[9860]: segfault at 000000000000000c rip 0000000040002462 rsp 0000007fbfffe3e0 error 4

 

Any clues? Thanks in advance.
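
For readers decoding the dmesg output above: the repeated "status = -107" looks like a negated Linux errno value, and on Linux 107 is ENOTCONN ("Transport endpoint is not connected"), which fits the o2net link between the nodes having been shut down. A minimal Python sketch of that lookup follows. The table and the describe_status helper are illustrative only, not part of any OCFS2 tool, and the numbers are the Linux ones (other platforms number errno values differently).

```python
# Hypothetical helper: map a negated status from the kernel log to a Linux errno name.
# The values below are the Linux x86 errno numbers for a few network-related errors.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    110: ("ETIMEDOUT", "Connection timed out"),
    112: ("EHOSTDOWN", "Host is down"),
}


def describe_status(status: int) -> tuple[str, str]:
    """Return (name, message) for a negative status such as -107."""
    code = -status if status < 0 else status
    return LINUX_ERRNO.get(code, ("UNKNOWN", f"errno {code} not in table"))


name, message = describe_status(-107)
print(f"status = -107 -> {name}: {message}")
```

Read this way, the dlm errors on node1 and node2 are consequences of the dropped connection rather than separate failures.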


Sunil Mushran

2007-Jul-19 10:46 UTC


[Ocfs2-users] ocfs2 file system hang during copy files

The default disk heartbeat timeouts are way too low. In short, the
buffered write flush is probably flooding the device and delaying
the heartbeat io.

For more, refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT
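
As I read that FAQ entry, the disk heartbeat timeout is controlled by O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb and is counted in 2-second heartbeat iterations: threshold = (timeout in seconds / 2) + 1. Below is a small Python sketch of that conversion. The function names are made up for illustration, and the formula should be checked against the FAQ for your release before changing anything.

```python
HEARTBEAT_INTERVAL_SECS = 2  # assumed interval between disk heartbeats


def threshold_for_timeout(timeout_secs: int) -> int:
    """Threshold value corresponding to a desired heartbeat timeout in seconds."""
    if timeout_secs <= 0 or timeout_secs % HEARTBEAT_INTERVAL_SECS:
        raise ValueError("timeout must be a positive multiple of the interval")
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


def timeout_for_threshold(threshold: int) -> int:
    """Heartbeat timeout in seconds implied by a given threshold value."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


for secs in (12, 60, 120):
    print(f"{secs} s timeout -> O2CB_HEARTBEAT_THRESHOLD={threshold_for_timeout(secs)}")
```

Under this formula a 60-second timeout corresponds to a threshold of 31, and a threshold of 7 corresponds to 12 seconds.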

If you are on 1.2.5, then also refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT
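
The "idle for 10.0 seconds" figure in the node1 o2net message can be checked against the timestamps the same log prints. A minimal sketch, assuming tmr is when the idle timer was last armed and now is when it fired (my reading of the field names, not something the log states). The idle_gap helper is hypothetical.

```python
import re
from decimal import Decimal

# The o2net_idle_timer line from node1, joined onto one logical line.
LOG_LINE = (
    "(0,3):o2net_idle_timer:1418 here are some times that might help debug "
    "the situation: (tmr 1184814613.883032 now 1184814623.882842 "
    "dr 1184814613.883028 adv 1184814613.883033:1184814613.883033 "
    "func (2b61f804:504) 1184814613.882900:1184814613.882904)"
)


def idle_gap(line: str) -> Decimal:
    """Seconds between the 'tmr' and 'now' timestamps in an idle-timer line."""
    match = re.search(r"\btmr (\d+\.\d+) now (\d+\.\d+)", line)
    if match is None:
        raise ValueError("no tmr/now timestamps found")
    tmr, now = (Decimal(group) for group in match.groups())
    return now - tmr


print(f"idle gap: {idle_gap(LOG_LINE)} seconds")
```

The gap comes out just under 10 seconds, consistent with the connection being dropped when a 10-second network idle timeout expired.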

