Hello,
We have a 5 node OCFS2 volume backed by a Sun (Oracle) StorageTek 2540. Each
system is running OEL5.4 and OCFS2 1.4.2, using device-mapper-multipath to load
balance over 2 active paths. We are using the default multipath configuration
for our SAN. We are observing iowait between 60% and 90%, sustained at over
80% as I'm writing this, which is driving load averages above 25 during an
rsync-over-ssh session. We are copying 200GB via a gigabit ethernet switch, so
we are pushing at most 50-60MB/s. The volume we are writing to is a RAID 5
device backed by
5 7200rpm SATA drives. I've been muttering wtf to myself for a little while,
and I thought it might be a bit more productive to mutter wtf to the users
group. Is this expected? What would be an advisable course of action? Block
size and cluster size are both set to 4096, and we formatted with the
filesystem type set to 'mail'. In our current configuration, 2 of the 5 nodes
are offline, 2 are idle, and the remaining node is the one doing the rsync.
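For reference, those format parameters translate to roughly this mkfs.ocfs2
invocation (the device path is illustrative, not the actual one, and I'm
assuming 5 node slots to match the node count):

    mkfs.ocfs2 -b 4096 -C 4096 -T mail -N 5 /dev/mapper/mpath0

i.e. 4K block and cluster size, the 'mail' filesystem type, and 5 node slots.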
Previously we were not seeing high iowait during much more intensive IO tests.
I'm not seeing anything unusual in /var/log/messages on the "observer" nodes or
on the node actually pushing the data to the fs.
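If it would help, I can capture stats on the writing node while the rsync is
running and post them here, e.g.:

    iostat -xm 5     # per-device utilization and await (sysstat)
    multipath -ll    # confirm both paths are active and healthy
    vmstat 5         # blocked processes and overall wait picture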
Any help would be greatly appreciated. I'm inclined to think that this issue
is on the SAN side, but I was hoping to gather some opinions from others before
pursuing this with Sun (Oracle). Thanks!
-Daniel