scjody@clusterfs.com
2007-Apr-23 11:57 UTC
[Lustre-devel] [Bug 12326] New: sanity test 78 undetected failure; fills OST
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=12326

Sanity test 78 now fails silently for me and fills up an OST, causing later failures. The root cause of this for me is likely that I recently increased the amount of memory available in my test environment, so now it's trying to write a much larger file than before.

== test 78: handle large O_DIRECT writes correctly ============= 13:37:25 (1177349845)
directio on /mnt/lustre/f78 for 125x1048576 bytes
Write error Success (rc = 89128960, len = 131072000)
lustre.fail_loc = 0
sanity.sh: FAIL: test_78 exit with rc=1
Debug log: 24266 lines, 24266 kept, 0 dropped.
lustre.fail_loc = 0
PASS (5s)

Here we can see the undetected failure. The write actually failed but the test shows as "PASS". But then later, other tests fail and:

# lfs df /mnt/lustre
UUID                 1K-blocks      Used Available  Use% Mounted on
lustre-MDT0000_UUID      34984      8344     26640   23% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      46856     45936       920   98% /mnt/lustre[OST:0]
lustre-OST0001_UUID      46856     46856         0  100% /mnt/lustre[OST:1]
lustre-OST0002_UUID      46856     44480      2376   94% /mnt/lustre[OST:2]

filesystem summary:     140568    137272      3296   97% /mnt/lustre

I have a full OST, which was why the write in test 78 failed.

So to fix, I suggest:

1. Detecting write failures and failing the test.
2. Writing a file small enough not to fill up any OST.
3. Unlinking the file at the end of the test.

I may find time to work on this issue this week but I'm not promising anything so I'll leave it unassigned.
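
The three suggestions above can be sketched roughly as follows. This is only an illustration in Python, not the actual sanity.sh or directio code: the helper names pick_size and checked_write are hypothetical, the write is an ordinary buffered write rather than O_DIRECT, and the free-space figure is assumed to be supplied by the caller (on Lustre it would presumably come from the smallest "Available" value that lfs df reports for an OST).

```python
import os
import tempfile

MIB = 1024 * 1024


def pick_size(free_bytes, wanted_bytes, fraction=0.5, block=MIB):
    """Cap the request at a fraction of the free space, in whole blocks.

    Hypothetical helper: free_bytes is whatever the caller measured as the
    free space on the tightest target; it is not queried here.
    """
    cap = int(free_bytes * fraction)
    size = min(wanted_bytes, cap)
    return (size // block) * block


def checked_write(path, total_bytes, block=MIB):
    """Write total_bytes to path, fail loudly on a short write, then unlink.

    Assumption: a plain buffered write stands in for the O_DIRECT write
    that the real test performs.
    """
    buf = b"\0" * block
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            chunk = min(block, total_bytes - written)
            n = os.write(fd, buf[:chunk])
            if n != chunk:
                # Suggestion 1: a short write is a test failure.
                raise OSError(
                    f"short write: {written + n} of {total_bytes} bytes"
                )
            written += n
    finally:
        os.close(fd)
        # Suggestion 3: remove the file whether or not the write succeeded.
        if os.path.exists(path):
            os.unlink(path)
    return written


if __name__ == "__main__":
    # Suggestion 2: size the file from free space instead of from memory.
    size = pick_size(free_bytes=10 * MIB, wanted_bytes=125 * MIB)
    target = os.path.join(tempfile.mkdtemp(), "f78")
    print(checked_write(target, size), "bytes written and unlinked")
```

With the numbers from the report, the request was 125 x 1048576 = 131072000 bytes and the write returned 89128960 (85 x 1048576), so a check like the one in checked_write would have raised instead of letting the test continue.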