Hi,
I'm getting system (and eventually cluster) crashes on intensive disk
writes on Ubuntu Server 10.04 with my OCFS2 file system.
I have an iSER (InfiniBand) backed shared disk array with OCFS2 on it.
There are 6 nodes in the cluster, and the heartbeat interface is over a
regular 1GigE connection. Originally, the problem presented itself while
I was doing performance testing and it's been reproducible ever since.
Running something like
'dd if=/dev/zero of=/<ocfs2 array>/zeroes bs=64k count=100000'
kills the node almost immediately, and then hangs the rest of the
cluster when other nodes try to unmount the array (for a restart or
any other reason). This happens regardless of how many nodes are
running; I've tried with a single node and it still happens.
I was lucky enough to capture some messages from stderr that weren't
being caught by syslog. I've attached them here as a screenshot, since
my management interface doesn't allow copying or pasting text directly.
Please take a look: http://img163.imageshack.us/img163/4771/screenshots.png
Note that no other nodes are started up, and I have no idea
how there could be another node "heartbeating" in the same slot.
I should also note that I originally had the heartbeat configured on the
same InfiniBand interface, so I thought the iSER traffic was crowding
out the heartbeat. However, moving the heartbeat to another interface
didn't solve the problem. I'm also fairly certain the iSER interface
itself isn't the cause, because I have formatted the array as ext4 and
successfully run read/write tests (from one node at a time, of course).
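In case it helps, here is the shape of my current /etc/ocfs2/cluster.conf
after moving the heartbeat network onto the 1GigE interface. The names,
addresses, and port below are placeholders standing in for my real values
(7777 is just the stock O2CB port), not the actual config:

```
cluster:
	node_count = 6
	name = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.10
	number = 0
	name = node0
	cluster = ocfs2
```

Five more node stanzas follow, one per cluster member, each with a unique
number and that node's 1GigE address.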
Thanks in advance for any replies,
Matt