Andrew (Anything)
2009-Mar-24 12:46 UTC
[Ocfs2-users] mount ocfs2 after both nodes self fence
Hi All.

I'm currently running ocfs2 in a dual-node setup over dual-primary DRBD with a gigabit backend, for a webserver environment. Read performance is as expected, but write performance is absolutely terrible (i.e. 22 file modifications per second). The gigabit crossover reaches its full capacity easily, but has an average latency of 77ms. So I'm looking to change to InfiniBand with some hardware from eBay, and hopefully that'll solve the slowness. Do you think it will solve my bad write performance?

My next problem: if too many applications are queued to write to the partition, ocfs2 restarts the system (obviously because it hasn't communicated with the other node in quite a while; the timeout is currently configured for 60 seconds). And because I'm only running two nodes, the other one goes and kills itself too. (I'm in the process of setting up a third node via iSCSI, but haven't got there yet.)

When the two come back up and DRBD has finished syncing, I go to manually re-mount the partition on one of the servers. But when I do, it restarts itself again, and again, and again. All I see in messages/dmesg is something like this, then the server resets itself:

(3756,3):ocfs2_find_slot:502 slot 1 is already allocated to this node!
(3756,3):ocfs2_check_volume:1753 File system was not unmounted cleanly, recovering volume.

The slotmap has both nodes in it, even though neither has the volume mounted.

# echo "slotmap" | debugfs.ocfs2 -n /dev/drbd0
Slot#   Node#
    0       1
    1       0

Currently I'm fsck'ing the partition, which replayed the journals of both nodes (contrary to the error message you see above). Then, after a couple of failures (each one resetting one of the servers), I end up trying to mount with localflocks. About half the time localflocks works and the partition mounts; I can then unmount and remount normally, and it's smooth sailing. The other half of the time the system resets itself again.

I'm not sure how I'm supposed to remount the partition properly in this scenario. Can someone help me?
Btw: Linux 2.6.28, drbd 8.2.7, elevator=deadline

I hope I included enough relevant information.

Andrew.
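
For anyone wanting to script the slot map check from the message above before attempting a mount, here is a minimal sketch. It only parses text that has already been captured from the debugfs.ocfs2 command quoted in the post; it does not run the tool. The one-slot-per-line layout in SAMPLE is an assumption (the archived output is flattened onto one line), and parse_slotmap and stale_slots are hypothetical helper names, not part of ocfs2-tools. Other ocfs2-tools versions may print the slot map differently.

```python
def parse_slotmap(text):
    """Return {slot: node} parsed from captured slotmap output.

    Assumes a "Slot# Node#" header followed by one "slot node" pair per line.
    """
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows are exactly two integers; the header row is skipped.
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            slots[int(fields[0])] = int(fields[1])
    return slots


def stale_slots(slotmap, mounted_nodes):
    """Return the slots held by nodes that do not have the volume mounted."""
    return {s: n for s, n in slotmap.items() if n not in mounted_nodes}


# Slot map from the post, laid out one slot per line (assumed layout).
SAMPLE = """\
Slot#   Node#
    0       1
    1       0
"""

if __name__ == "__main__":
    slotmap = parse_slotmap(SAMPLE)
    # No node has the volume mounted, so every occupied slot is stale.
    print(stale_slots(slotmap, mounted_nodes=set()))
```

With neither node mounted, both slots are reported as stale, which matches the situation described in the post.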