On 2011-12-13, at 4:36 PM, Andrus, Brian Contractor wrote:
> A large Volume Group off a DDN connected via infiniband.
> This is broken into several Logical Volumes. Some are just regular ext3/4
> filesystems. Quite a few are partitioned out (in 4TB chunks) for OSTs.

Anything that is using partitions/LVs that are smaller than a full
RAID LUN (i.e. 8+2 RAID-6) is going to have serious performance loss.
Having multiple OSTs on the same disks is only going to increase the
contention on those disks, and doesn't provide any functional benefit.

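
One way to check for this is to list which LUNs back each LV (for example with `lvs -o lv_name,devices`) and look for LUNs that show up under more than one OST. The following is only a minimal sketch of that check: the LV and LUN names in `LV_TO_LUNS` are made up for illustration and do not describe the actual layout on this system.

```python
from collections import defaultdict

# Hypothetical LV -> backing LUN layout; every name here is invented
# for illustration and would be replaced with the real lvs output.
LV_TO_LUNS = {
    "work_ost0": ["/dev/mapper/lun0"],
    "work_ost1": ["/dev/mapper/lun0"],
    "work_ost2": ["/dev/mapper/lun1"],
    "home_ost0": ["/dev/mapper/lun1", "/dev/mapper/lun2"],
}


def shared_luns(lv_to_luns):
    """Return {lun: sorted LV names} for every LUN that backs more than one LV."""
    users = defaultdict(set)
    for lv, luns in lv_to_luns.items():
        for lun in luns:
            users[lun].add(lv)
    return {lun: sorted(lvs) for lun, lvs in users.items() if len(lvs) > 1}


if __name__ == "__main__":
    # Each line printed is a set of disks that several OSTs compete for.
    for lun, lvs in sorted(shared_luns(LV_TO_LUNS).items()):
        print(f"{lun}: shared by {', '.join(lvs)}")
```

Any LUN reported here is a set of spindles being seeked by more than one OST at once, which is the contention described above.
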
> I have 3 lustre filesystems: home, scratch and work.
> Home consists of a single OST
> Scratch consists of 2 OSTs
> Work consists of 10 OSTs
>
> Each filesystem has its own combined MGS/MGT
> Each OSS has 2 OSTs where possible
> Each MGS will also serve one OST
>
> I have 8 systems that are OSSes (The MGSes are also among those 8)
>
> Now, ONE of my nodes (an OSS that is only serving 2 OSTs) has a helluva
> load:
>
> [root@nas-0-3 ~]# uptime
> 15:34:06 up 77 days, 22:39, 1 user, load average: 352.59, 339.80, 318.11
>
> I see lots of:
> Lustre: work-OST0004: slow commitrw commit 91s due to heavy IO load
>
> And:
> Dec 13 15:32:48 nas-0-3 kernel: LustreError:
> 6413:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107)
> req@ffff8105c557ac00 x1381121762230130/t0 o400-><?>@<?>:0/0
> lens 192/0 e 0 to 0 dl 1323819184 ref 1 fl Interpret:H/0/0 rc -107/0
> Dec 13 15:32:48 nas-0-3 kernel: LustreError:
> 6413:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1900 previous similar
> messages
>
> Not sure what that one means, but it seems significant.
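
The rc in that message looks like a negated Linux errno, and 107 is ENOTCONN ("Transport endpoint is not connected"). As far as I know, the o400 field is the request opcode and 400 is OBD_PING, so this would be the server rejecting pings from clients it no longer considers connected, which would fit clients being evicted or timing out under the load. A minimal sketch for decoding such lines follows; the errno table is hard-coded with Linux values, and the opcode table is an assumption based on my reading of the Lustre headers.

```python
import re

# Linux errno values, hard-coded so the result does not depend on the host OS.
LINUX_ERRNO = {5: "EIO", 16: "EBUSY", 107: "ENOTCONN", 110: "ETIMEDOUT"}
# Assumed opcode name; only the one seen in the log above is listed.
OPCODES = {400: "OBD_PING"}

SAMPLE = ("LustreError: 6413:0:(ldlm_lib.c:1919:target_send_reply_msg()) "
          "@@@ processing error (-107) req@ffff8105c557ac00 "
          "x1381121762230130/t0 o400-><?>@<?>:0/0 lens 192/0 e 0 to 0 "
          "dl 1323819184 ref 1 fl Interpret:H/0/0 rc -107/0")


def decode(line):
    """Extract the opcode and return code from one LustreError request dump."""
    op = re.search(r"\bo(\d+)->", line)
    rc = re.search(r"\brc (-?\d+)/", line)
    if not (op and rc):
        return None
    opcode, code = int(op.group(1)), int(rc.group(1))
    return {
        "opcode": OPCODES.get(opcode, str(opcode)),
        "rc": code,
        "errno": LINUX_ERRNO.get(-code, "unknown"),
    }


if __name__ == "__main__":
    print(decode(SAMPLE))
```
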
>
> Things get VERY slow and start timing out. Users see it as the system
> "hanging".
>
> Could someone point me in the right direction for figuring out the culprit
> here?
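
As a starting point, it may help to count the "slow ... due to heavy IO load" messages per OST to see whether the load is concentrated on one target. This is a rough sketch only: the regular expression is modelled on the single message quoted above, the second and third sample lines are invented for illustration, and the log path in the usage note is an assumption.

```python
import re
from collections import defaultdict

# Modelled on: "Lustre: work-OST0004: slow commitrw commit 91s due to heavy IO load"
SLOW_RE = re.compile(r"Lustre: (\S+): slow (.+?) (\d+)s due to heavy IO load")


def tally_slow(lines):
    """Return {target: (message count, worst delay in seconds)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = SLOW_RE.search(line)
        if m:
            target, secs = m.group(1), int(m.group(3))
            stats[target][0] += 1
            stats[target][1] = max(stats[target][1], secs)
    return {target: tuple(v) for target, v in stats.items()}


SAMPLE_LINES = [
    "Lustre: work-OST0004: slow commitrw commit 91s due to heavy IO load",
    # The next two lines are hypothetical, added only to exercise the tally.
    "Lustre: work-OST0004: slow commitrw commit 40s due to heavy IO load",
    "Lustre: work-OST0005: slow commitrw commit 35s due to heavy IO load",
]

if __name__ == "__main__":
    for target, (count, worst) in sorted(tally_slow(SAMPLE_LINES).items()):
        print(f"{target}: {count} slow messages, worst {worst}s")
```

In practice the input would be the lines of the kernel log (on many systems `/var/log/messages`, though that path is an assumption here) rather than `SAMPLE_LINES`.
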
>
> Thanks in advance!
>
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
>
>
Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.