Evan.Felix@pnl.gov
2007-Apr-24 12:27 UTC
[Lustre-devel] [Bug 12333] New: obdclass is limited by single OBD_ALLOC(idarray)
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=12333

While testing with a large number of OSTs, I ran across a bug that causes failures when you mount an OST with a large index. I can reproduce this in an x86_64 VMware session with the following script:

    mkfs.lustre --mdt --mgs --fsname=test1 --device-size=100000K --reformat /tmp/mgs
    mount -t lustre -o loop /tmp/mgs /mnt/mds
    mkfs.lustre --ost --mgsnode=10.4.0.128@tcp --fsname=test1 --index=4096 --device-size=1000000 --reformat /tmp/ost
    mount -t lustre -o loop /tmp/ost /mnt/ost1

Note the --index=4096 portion of the OST format line. This seems to be about the limit on my x86_64 box, but it is more like 2048 on my ia64 boxes. Doing this causes errors on the console like this:

    Lustre: Server test1-OST1000 on device /dev/loop7 has started
    Lustre: 3001:0:(quota_master.c:1105:mds_quota_recovery()) Not all osts are active, abort quota recovery
    LustreError: 3002:0:(llog_obd.c:324:llog_cat_initialize()) kmalloc of 'idarray' (131104 bytes) failed at /home/efelix/gits/lustre-1.5.97/lustre/obdclass/llog_obd.c:324
    LustreError: 3002:0:(llog_obd.c:324:llog_cat_initialize()) 6006335 total bytes allocated by Lustre, 302492 by Portals
    Lustre: test1-OST1000: received MDS connection from 0@lo
    LustreError: 3002:0:(lov_log.c:124:lov_llog_origin_connect()) error osc_llog_connect tgt 4096 (-107)
    LustreError: 3002:0:(mds_lov.c:665:__mds_lov_synchronize()) test1-MDT0000: failed at llog_origin_connect: -107

I guess this array needs to be managed differently for large OST counts. A simple workaround is to modify <kernel>include/linux/kmalloc_sizes.h and add a new cache size such as:

    CACHE(262144)

for 8k OSTs, or larger as needed.
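
For reference, the arithmetic behind the failed allocation can be sketched as follows. This is only a back-of-the-envelope check, not code from Lustre. It rests on two assumptions: that each idarray entry is 32 bytes (inferred from the 131104-byte request for indices 0 through 4096 in the log above), and that the largest stock kmalloc cache on the x86_64 kernel is 131072 bytes. The names ENTRY_BYTES, KMALLOC_MAX_BYTES, idarray_bytes and max_index_for_cache are made up for this illustration.

```python
# Assumption: 32 bytes per idarray entry, inferred from the log line
# (131104 bytes requested for OST indices 0..4096), not read from the source.
ENTRY_BYTES = 32

# Assumption: largest stock kmalloc cache on the x86_64 test kernel (128 KiB).
KMALLOC_MAX_BYTES = 131072


def idarray_bytes(max_ost_index: int) -> int:
    """Bytes needed for one entry per OST index from 0 to max_ost_index."""
    return (max_ost_index + 1) * ENTRY_BYTES


def max_index_for_cache(cache_bytes: int) -> int:
    """Highest OST index whose idarray still fits in a cache of cache_bytes."""
    return cache_bytes // ENTRY_BYTES - 1


if __name__ == "__main__":
    needed = idarray_bytes(4096)
    print("index 4096 needs", needed, "bytes")
    print("fits in largest cache:", needed <= KMALLOC_MAX_BYTES)
    print("highest index with CACHE(262144):", max_index_for_cache(262144))
```

Under these assumptions, index 4096 is the first one that no longer fits in a single 128 KiB allocation, and a 262144-byte cache would cover indices up to 8191, which is consistent with the suggested CACHE(262144) entry.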