I am still having this weird problem with nodes hanging while I'm running OCFS. I'm using OCFS 1.0.9-12 and RHAS 2.1.

I've been working on tracking it down and here's what I've got so far:

1. I create a file from node 0. This succeeds; I can /bin/cat the file, append, edit, or whatever.
2. From node 1, I do an operation that accesses the DirNode (e.g. /bin/ls).
3. Node 0 immediately acquires an OCFS_DLM_EXCLUSIVE_LOCK on the DirNode itself (although I seem to still be able to *read* the DirNode from node 1).
4. I attempt to create a file from node 1... node 1 hangs, waiting for the exclusive lock on the DirNode to be released.
*** node 1 is now completely dysfunctional. OCFS is hung.
5. I delete the file I created in step 1 (from node 0).
6. The OCFS_DLM_EXCLUSIVE_LOCK is released.
7. Node 1 resumes, and creates a file.
8. I access the DirNode from node 0.
9. Node 1 immediately acquires an OCFS_DLM_EXCLUSIVE_LOCK on the DirNode itself... the whole process repeats, but with the nodes reversed.

This looks a lot like a bug to me. I've had a case open with Oracle Support for it since the end of February, but at the moment BDE is too busy investigating some message about the local hard drive controller to consider that it might be a bug (and honestly, it probably doesn't involve my local hard drive controller).

Anyone have any suggestions?

Jeremy
Lansing, MI
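For anyone trying to confirm the same symptom, a throwaway helper along these lines can tell a create that is merely slow apart from one that is blocked waiting on the DirNode lock. This is illustrative C of my own, not part of ocfs, and the 30-second timeout is an arbitrary choice; it forks a child to attempt the create and reports if the child has not returned in time, which still works when the blocked process is stuck in D state and cannot be interrupted:

/* creat_probe.c - attempt to create a file, report if it blocks.
 * Illustrative helper only; nothing here is ocfs-specific. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <new-file-on-ocfs-volume>\n", argv[0]);
        return 1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* child: the create that hangs on the starved node */
        int fd = open(argv[1], O_CREAT | O_WRONLY, 0644);
        _exit(fd < 0 ? 1 : 0);
    }

    /* parent: poll for up to 30 seconds instead of blocking forever */
    for (int i = 0; i < 30; i++) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid) {
            printf("create finished (child exit status %d)\n",
                   WEXITSTATUS(status));
            return 0;
        }
        sleep(1);
    }
    printf("create still blocked after 30s; DirNode lock probably not released\n");
    return 2;
}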
another note:

after I delete the file I created that caused the OCFS_DLM_EXCLUSIVE_LOCK to be held, the lock doesn't seem to actually be released (according to debugocfs) until the other node attempts to read the DirNode (e.g. /bin/ls or something).

Jeremy

>>> "Jeremy Schneider" <jer1887@asugroup.com> 03/10/2004 4:55:56 PM >>>
<<<<...>>>>
Ahh... you were exactly right about raw devices, Sunil. debugocfs was reading the DirNode from the buffered device. I mapped a raw device and tried again with much clearer results - the reason running 'ls' made the lock show up is that when I did an 'ls', ocfs forced the buffered device to refresh its buffer, and that's when the EXCLUSIVE_LOCK obtained by the opposite node showed up in debugocfs.

It seems that somehow the EXCLUSIVE_LOCK is not being released after ocfs creates a file. It doesn't matter whether or not I access the file in o_direct mode. If I use dd with the o_direct flag to create the file, the EXCLUSIVE_LOCK on the DirNode is still not released until I remove the file. I bet you could reproduce this on a lab machine (it seems like the kind of problem that should be reproducible) - if anyone's interested, email me and I can give you specific steps to reproduce.

I have not tried with the 1.0.10 tools yet; I'll do that. (It doesn't sound quite right, but I'm not exactly an ocfs guru <g> and it's the best suggestion I've received so far...) If anyone on the list has any other suggestions, please let me know. Also, if anyone over at Oracle would like to do me a favor and somehow straighten out the mess with the support case I have open (let them know it's not a hardware issue and get it out of BDE), I'd appreciate that too... just email me for details...

Thanks again,
Jeremy

>>> Sunil Mushran <Sunil.Mushran@oracle.com> 03/10/2004 5:49:58 PM >>>
> I hope that when you were reading the dirnode, etc. using debugocfs,
> you were accessing the volume via the raw device. If you weren't, do so.
> This is important because that's the only way to ensure directio. Else,
> you will be reading potentially stale data from the buffer cache.
>
> Coming to the issue at hand. ls does not take an EXCLUSIVE_LOCK.
> And all EXCLUSIVE_LOCKS are released when the operation is over.
> So am not sure what is happening. Using debugocfs correctly should help
> us understand the problem.
>
> Also, whenever you do your file operations (cat etc.) ensure those ops are
> o_direct. Now I am not sure why this would cause a problem, but do not
> do buffered operations. ocfs does not support shared mmap.
>
> If you download the 1.0.10 tools, you will not need to manually map the
> raw device. The tools do that automatically.
>
> So, upgrade to 1.0.10 module and tools. See if you can reproduce the
> problem.
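For reference, the same kind of direct I/O access can be driven from a small C program instead of dd. This is a minimal sketch of my own (not ocfs code); it assumes a 512-byte sector size for the O_DIRECT alignment, which may need adjusting for your device:

/* odirect_write.c - create a file and write one sector with O_DIRECT.
 * Build: gcc -o odirect_write odirect_write.c */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-on-ocfs-volume>\n", argv[0]);
        return 1;
    }

    /* O_DIRECT bypasses the buffer cache, so both nodes see the same
     * on-disk state instead of possibly stale cached blocks. */
    int fd = open(argv[1], O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires a sector-aligned buffer and transfer size;
     * 512 bytes is assumed here. */
    void *buf;
    if (posix_memalign(&buf, 512, 512) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        close(fd);
        return 1;
    }
    memset(buf, 'x', 512);

    if (write(fd, buf, 512) != 512)
        perror("write");

    free(buf);
    close(fd);
    return 0;
}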
Just noticed/thought of something new. /bin/rm acquires and releases an EXCLUSIVE_LOCK on the DirNode. If the node already *has* the EXCLUSIVE_LOCK, then of course ocfs just continues. Then, when the remove operation is complete, the EXCLUSIVE_LOCK is released.

... [5 minutes later] ...

In fact, I just confirmed that removing *any* file in the directory releases the EXCLUSIVE_LOCK. ;) Maybe the lock just somehow isn't getting released by ocfs_create_file() in Common/ocfsgencreate.c (don't ask me why though)... indeed, any other operation (e.g. removing a file) that acquires and releases the lock successfully would "fix" the hung server... like I said, I'm no ocfs guru, but that's my hunch (sketched after this message)...

Oh, the wonders of open source... (!)

Jeremy

>>> "Jeremy Schneider" <jer1887@asugroup.com> 03/11/2004 9:50:58 AM >>>
<<<<...>>>>
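To make that hunch concrete, the pattern one would expect around a create is roughly the following. This is schematic C with invented names (acquire_lock, insert_file_entry, release_lock), not the actual ocfs source; the point is only that if the final release is skipped, every other node blocks on its next exclusive request for that DirNode:

/* Schematic of the expected DirNode locking pattern around a create.
 * All names below are invented stand-ins, NOT ocfs functions. */
#include <stdio.h>

enum lock_type { DLM_NO_LOCK, DLM_EXCLUSIVE_LOCK };

struct dirnode { enum lock_type lock; };

/* stand-ins for the real DLM calls */
static int acquire_lock(struct dirnode *d, enum lock_type t) { d->lock = t; return 0; }
static void release_lock(struct dirnode *d, enum lock_type t) { d->lock = t; }
static int insert_file_entry(struct dirnode *d, const char *name)
{
    (void)d;
    printf("insert %s\n", name);
    return 0;
}

int create_file_in_dir(struct dirnode *lock_node, const char *name)
{
    /* 1. take the exclusive lock on the DirNode used for locking */
    int status = acquire_lock(lock_node, DLM_EXCLUSIVE_LOCK);
    if (status < 0)
        return status;

    /* 2. insert the new directory entry */
    status = insert_file_entry(lock_node, name);

    /* 3. always drop back to NO_LOCK, even on error; if this step is
     *    skipped (the suspected bug), other nodes hang on their next
     *    exclusive request for this DirNode */
    release_lock(lock_node, DLM_NO_LOCK);
    return status;
}

int main(void)
{
    struct dirnode dn = { DLM_NO_LOCK };
    return create_file_in_dir(&dn, "example.txt") < 0;
}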
FYI, I downloaded ocfs 1.0.10 from oss.oracle.com and tried it... couldn't even successfully create a filesystem. (?!)

[root@dc1node1 /]# mkfs.ocfs -V
mkfs.ocfs 1.0.10-PROD1 Fri Mar 5 14:35:32 PST 2004 (build 902cb33b89695a48f0dd6517b713f949)
[root@dc1node1 /]# mkfs.ocfs -b 128 -F -g 0 -L dc1:/u03 -m /u03 -p 755 -u 0 /dev/sda
Cleared volume header sectors
Cleared node config sectors
Cleared publish sectors
Cleared vote sectors
Cleared bitmap sectors
Cleared data block
Wrote volume header
[root@dc1node1 /]# fsck.ocfs /dev/sda
fsck.ocfs 1.0.10-PROD1 Fri Mar 5 14:35:41 PST 2004 (build b5602eb387c7409e9f814faf1d363b5b)
Checking Volume Header...
ERROR: structure failed verification, fsck.c, 384
ocfs_vol_disk_hdr
================================
minor_version: 2
major_version: 1
signature: OracleCFS
mount_point: /u03
serial_num: 0
device_size: 10737418240
start_off: 0
bitmap_off: 56320
publ_off: 23552
vote_off: 39936
root_bitmap_off: 0
data_start_off: 1368064
root_bitmap_size: 0
root_off: <INVALID VALUE> 0
root_size: 0
cluster_size: 131072
num_nodes: 32
num_clusters: 81905
dir_node_size: 0
file_node_size: 0
internal_off: <INVALID VALUE> 0
node_cfg_off: 4096
node_cfg_size: 17408
new_cfg_off: 21504
prot_bits: -rwxr-xr-x
uid: 0 (root)
gid: 0 (root)
excl_mount: OCFS_INVALID_NODE_NUM

ERROR: Volume header bad. Exiting, fsck.c, 669
/dev/sda: 2 errors, 0 objects, 0/81905 blocks
[root@dc1node1 /]#

>>> Sunil Mushran <Sunil.Mushran@oracle.com> 03/10/2004 5:49:58 PM >>>
<<<<...>>>>
> FYI, I downloaded ocfs 1.0.10 from oss.oracle.com and tried it...
> couldn't even successfully create a filesystem. (?!)

That is because you must mount it at least once before the file system is completely created. There is code in the OCFS module which does some filesystem initialization on the first mount. I believe that this is going to be transitioned out of the OCFS2 module and put into mkfs. I am not sure how this will affect OCFS1.

John

<<<<...>>>>
You're right. Sorry, that was a little dense on my part. It will be nice, of course, when fsck just says "this volume has not been mounted yet"... but it is perfectly functional the way it is. :)

Jeremy

>>> Sunil Mushran <Sunil.Mushran@oracle.com> 03/11/2004 1:37:32 PM >>>
It did create the filesystem. fsck is failing because the volume has never been mounted on any node. On the very first mount, we create the system files, which fsck does not find. Yes, we should create these system files in mkfs. It's on our todo list. Meanwhile, the next release of fsck will not fail. :-)

Jeremy Schneider wrote:
>FYI, I downloaded ocfs 1.0.10 from oss.oracle.com and tried it...
>couldn't even successfully create a filesystem. (?!)
<<<<...>>>>
Hey list...

Sorry for all the emails lately about this bug I'm working on. I finally got tired of waiting for support to track it down, so today I found it myself and fixed it. Of course I'm going through normal support channels to get an official [supported] fix; but if anyone's interested, here's a patch that'll do it. And you never know... posting this explanation might speed the process up a bit. I tested the patch (it only changes a few lines) and it successfully fixed the problem without any side-effects. I'll try to explain it as clearly as possible, but programmers aren't always the most brilliant communicators. ;)

VERSIONS AFFECTED:
  all that I'm aware of (including 1.0.10-1)

BUG:
  the EXCLUSIVE_LOCK on the DirNode is not being released after a file is created if the directory has multiple DirNodes and the last DirNode (not the one used for locking) has ENABLE_CACHE_LOCK set. ENABLE_CACHE_LOCK can end up set on the last DirNode because, when the 255th file is added while the locking DirNode holds that lock, ocfs copies the entire structure byte-for-byte (including locks) to the new DirNode it is creating and does not clear out the locks.

CAUSE:
  ocfs_insert_file() in Common/ocfsgendirnode.c (line 1594 in the code available through subversion at oss.oracle.com right now) looks at the incorrect variable (DirNode, the last one, instead of LockNode, the locking one) to check for ENABLE_CACHE_LOCK, and does not release the EXCLUSIVE_LOCK because it thinks the node is holding an ENABLE_CACHE_LOCK instead of an EXCLUSIVE_LOCK. I've attached a patch that fixes it if anyone cares to have a look...

STEPS TO REPRODUCE BUG:
  1. Create a new ocfs directory. When you initially create it, it will have ENABLE_CACHE_LOCK set.
  2. *BEFORE* accessing the directory from *ANY* other node (doing so would release the ENABLE_CACHE_LOCK), create at least 255 files in the directory. Optionally, check with debugocfs -D that the last (non-locking) DirNode has ENABLE_CACHE_LOCK set. (A throwaway helper for this step follows at the end of this message.)
  3. *NOW* access the directory from the other node (this changes the locking DirNode from ENABLE_CACHE_LOCK to NO_LOCK), then create a file from either node -- the EXCLUSIVE_LOCK will not be released.
  4. Attempting any operation from the opposite node that requires an EXCLUSIVE_LOCK will cause that node to hang until you do an operation that successfully releases the EXCLUSIVE_LOCK (e.g. deleting a file).

FIX:
  Of course the best solution is to fix ocfs_insert_file() to look at the correct DirNode for the lock. Another thought would be to update the code that creates a new DirNode and have it clear out locks after copying the DirNode, although that just seems extraneous to me.

PATCH:
  I have tested this patch (it is a minimal change) and it successfully fixes the bug. It releases the lock from the correct pointer (either the variable "DirNode" or "LockNode", depending on whether they refer to the same disk DirNode), and it now also checks the correct variable for ENABLE_CACHE_LOCK.

REQUESTED COMPENSATION:
  hehe... email me for the address where you can send pizza. I wonder if any oracle developers really do read this list...

Index: ocfsgendirnode.c
==================================================================
--- ocfsgendirnode.c	(revision 5)
+++ ocfsgendirnode.c	(working copy)
@@ -1591,17 +1591,26 @@
 		indexOffset = -1;
 	}
 
-	if (DISK_LOCK_FILE_LOCK (DirNode) != OCFS_DLM_ENABLE_CACHE_LOCK) {
-		/* This is an optimization... */
-		ocfs_acquire_lockres (LockResource);
-		LockResource->lock_type = OCFS_DLM_NO_LOCK;
-		ocfs_release_lockres (LockResource);
+	if (LockNode->node_disk_off == DirNode->node_disk_off) {
+		if (DISK_LOCK_FILE_LOCK (DirNode) != OCFS_DLM_ENABLE_CACHE_LOCK) {
+			/* This is an optimization... */
+			ocfs_acquire_lockres (LockResource);
+			LockResource->lock_type = OCFS_DLM_NO_LOCK;
+			ocfs_release_lockres (LockResource);
 
-		if (LockNode->node_disk_off == DirNode->node_disk_off)
+			/* Reset the lock on the disk */
+			DISK_LOCK_FILE_LOCK (DirNode) = OCFS_DLM_NO_LOCK;
+		}
+	} else {
+		if (DISK_LOCK_FILE_LOCK (LockNode) != OCFS_DLM_ENABLE_CACHE_LOCK) {
+			/* This is an optimization... */
+			ocfs_acquire_lockres (LockResource);
+			LockResource->lock_type = OCFS_DLM_NO_LOCK;
+			ocfs_release_lockres (LockResource);
+
 			/* Reset the lock on the disk */
-			DISK_LOCK_FILE_LOCK (DirNode) = OCFS_DLM_NO_LOCK;
-		else
 			DISK_LOCK_FILE_LOCK (LockNode) = OCFS_DLM_NO_LOCK;
+		}
 	}
 
 	status = ocfs_write_dir_node (osb, DirNode, indexOffset);

>>> Sunil Mushran <Sunil.Mushran@oracle.com> 03/10/2004 5:49:58 PM >>>
<<<<...>>>>
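If it saves anyone a few minutes on step 2 of the reproduction above, a throwaway helper like the following creates enough files to push a directory onto a second DirNode. This is my own test code, not part of ocfs; the count of 256 and the file-name pattern are arbitrary (the message above describes a 255-file threshold):

/* fill_dir.c - create 256 small files in a directory. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *dir = (argc > 1) ? argv[1] : ".";
    char path[512];

    /* create enough entries to force the directory past one DirNode */
    for (int i = 0; i < 256; i++) {
        snprintf(path, sizeof(path), "%s/filler.%03d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            perror(path);
            return 1;
        }
        close(fd);
    }
    return 0;
}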
Wim Coekaerts
2004-Mar-11 19:36 UTC
[Ocfs-devel] Re: [Ocfs-users] Lock contention issue with ocfs
sunil, kurt mark and myself do ocfs work so I think we read this ;)

On Thu, Mar 11, 2004 at 08:27:21PM -0500, Jeremy Schneider wrote:
> Hey list...
<<<<...>>>>
Wow... I am impressed. I still need to test it... but it looks good otherwise.

BTW, it's mainly the oracle developers who are responding on this list. :-)

Jeremy Schneider wrote:
>Hey list...
<<<<...>>>>