Evening all

I've been testing/evaluating lustre 1.6b5 and have been having some stability problems. The setup is 1 mds, 4 oss's and 3 clients. The oss's have 2 ost's each on raid5 (md) over 6 SATA-attached disks. All bar 1 client have dual gb nics (forcedeth). I've seen excellent bandwidth (~340GB/s aggregate), however there are some stability issues.

First, network interfaces occasionally drop out while running 6 instances of bonnie++ (3 per client). The mds has also crashed with a kernel panic after losing its network.

Second, I don't understand this error message:

Dec 14 17:14:09 oss001 kernel: LustreError: 4777:0:(filter_io.c:537:filter_preprw_write()) dugeo-OST0000: trying to BRW to non-existent file 619123

This appeared around 30 minutes after an interface dropped out on a client (and reconnected some minutes later).

Any ideas?

Stu.

--
Dr Stuart Midgley
sdm900@gmail.com
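For anyone trying to line up messages like the one above with the times the interfaces dropped, here is a minimal sketch that splits such a syslog line into fields. It assumes every line of interest has exactly the layout of the single example in this post; the function name `parse_lustre_error`, the field names, and the regular expression are illustrative and not part of Lustre. The meaning of the `4777:0` prefix is not confirmed here, so it is kept as an opaque string.

```python
import re

# Pattern built from the one log line quoted in this thread.
# Assumption: other LustreError lines follow the same layout.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"   # e.g. "Dec 14 17:14:09"
    r"(?P<host>\S+)\s+kernel:\s+LustreError:\s+"            # reporting host
    r"(?P<ids>\d+:\d+):"                                    # "4777:0", kept opaque
    r"\((?P<source>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"  # file:line:function()
    r"(?P<target>\S+):\s+"                                  # e.g. "dugeo-OST0000"
    r"(?P<message>.*)$"                                     # rest of the message
)


def parse_lustre_error(line):
    """Return a dict of fields for a matching line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # source line number as an integer
    return fields


if __name__ == "__main__":
    sample = (
        "Dec 14 17:14:09 oss001 kernel: LustreError: "
        "4777:0:(filter_io.c:537:filter_preprw_write()) dugeo-OST0000: "
        "trying to BRW to non-existent file 619123"
    )
    print(parse_lustre_error(sample))
```

Collecting the `timestamp`, `host` and `target` fields from each oss's log would make it easier to see whether the errors cluster after a particular client's interface drop.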
Eric - you recently fixed some ptllnd problems - could this be related?

Stu Midgley wrote:
> Evening all
>
> I've been testing/evaluating lustre 1.6b5 and been having some
> stability problems. The setup is 1 mds, 4 oss's and 3 clients. oss's
> have 2 ost's each on raid5 (md) over 6 sata attached disks. All bar 1
> client have dual gb nics (forcedeth). I've seen excellent bandwidth
> (~340GB/s aggregate) however there are some stability issues.
>
> First, network interfaces occasionally drop out while running 6
> instances of bonnie++ (3 per client). The mds has also crashed with a
> kernel panic after losing its network.
>
> Second, I don't understand this error message:
>
> Dec 14 17:14:09 oss001 kernel: LustreError:
> 4777:0:(filter_io.c:537:filter_preprw_write()) dugeo-OST0000: trying
> to BRW to non-existent file 619123
>
> This was around 30mins after an interface dropped out on a client (and
> reconnected some minutes later).
>
> Any ideas?
> Stu.