gregoire.pichon at bull.net
2012-Jan-30 10:09 UTC
[Lustre-discuss] low performance maybe related to quota
Hi,

I would appreciate it if someone could have a look at this; I am not sure where to look next.

I am running a performance test (ES4) on a Lustre file system installed with Lustre 2.1 plus a few Bull patches, and I observe very low throughput compared to what I usually measure on the same hardware.

Write bandwidth varies between 150 MB/s and 500 MB/s when running as a standard user. With the exact same parameters and configuration, but running as root, I get around 2000 MB/s write bandwidth. This second value is what I usually observe.

Profiling of the Lustre client shows that more than 50% of the time is spent in the osc_quota_chkdq() routine. So this seems related to the quota subsystem, which would explain why the root user is not affected.

Quotas are disabled on the client:

    # lfs quota /b9
    user quotas are not enabled.
    group quotas are not enabled

There is no quota parameter stored on the MDT, nor on the 15 OSTs:

    # tunefs.lustre /dev/loop1
    checking for existing Lustre data: found CONFIGS/mountdata
    Reading CONFIGS/mountdata

       Read previous values:
    Target:     b9-MDT0000
    Index:      0
    Lustre FS:  b9
    Mount type: ldiskfs
    Flags:      0x1
                  (MDT )
    Persistent mount opts: user_xattr,errors=remount-ro
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 lov.stripecount=2 lov.stripesize=1048576 network=o2ib0

    # for dev in `mount -t lustre | cut -d' ' -f1`; do tunefs.lustre $dev | grep "^Parameters" | sort -u; done
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=60.64.0.37@o2ib failover.node=60.64.0.39@o2ib failover.node=60.64.0.36@o2ib network=o2ib0
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=60.64.0.37@o2ib failover.node=60.64.0.39@o2ib failover.node=60.64.0.36@o2ib network=o2ib0
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=61.64.0.36@o2ib2 failover.node=61.64.0.37@o2ib2 failover.node=61.64.0.39@o2ib2 network=o2ib2
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=61.64.0.36@o2ib2 failover.node=61.64.0.37@o2ib2 failover.node=61.64.0.39@o2ib2 network=o2ib2
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=160.64.0.39@o2ib1 failover.node=160.64.0.36@o2ib1 failover.node=160.64.0.37@o2ib1 network=o2ib1
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=161.64.0.36@o2ib3 failover.node=161.64.0.37@o2ib3 failover.node=161.64.0.39@o2ib3 network=o2ib3
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=160.64.0.37@o2ib1 failover.node=160.64.0.39@o2ib1 failover.node=160.64.0.36@o2ib1 network=o2ib1
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=60.64.0.39@o2ib failover.node=60.64.0.36@o2ib failover.node=60.64.0.37@o2ib network=o2ib0
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=160.64.0.36@o2ib1 failover.node=160.64.0.37@o2ib1 failover.node=160.64.0.39@o2ib1 network=o2ib1
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=61.64.0.36@o2ib2 failover.node=61.64.0.37@o2ib2 failover.node=61.64.0.39@o2ib2 network=o2ib2
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=160.64.0.37@o2ib1 failover.node=160.64.0.39@o2ib1 failover.node=160.64.0.36@o2ib1 network=o2ib1
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=161.64.0.36@o2ib3 failover.node=161.64.0.37@o2ib3 failover.node=161.64.0.39@o2ib3 network=o2ib3
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=161.64.0.37@o2ib3 failover.node=161.64.0.39@o2ib3 failover.node=161.64.0.36@o2ib3 network=o2ib3
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=161.64.0.39@o2ib3 failover.node=161.64.0.36@o2ib3 failover.node=161.64.0.37@o2ib3 network=o2ib3
    Parameters: mgsnode=60.64.2.84@o2ib,160.64.2.84@o2ib1,61.64.2.84@o2ib2,161.64.2.84@o2ib3 failover.node=61.64.0.39@o2ib2 failover.node=61.64.0.36@o2ib2 failover.node=61.64.0.37@o2ib2 network=o2ib2

Thanks in advance,
Grégoire.

-- 
Grégoire PICHON
Software Developer, Lustre - Extreme Computing R&D
Bull, Architect of an Open World
Phone: +33 4 76 29 70 63
http://www.bull.com
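
The per-target check for stored quota parameters described in this message could be scripted roughly as follows. This is only a sketch: `find_quota_params` is a hypothetical helper, not a Lustre tool, and it assumes that any quota setting would show up as a token containing "quota" on the "Parameters:" line printed by tunefs.lustre.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): reads tunefs.lustre output on
# stdin and prints every token from "Parameters:" lines that mentions
# "quota" (case-insensitive). Prints nothing when no such token is present.
find_quota_params() {
    awk '/^Parameters:/ {
        # Field 1 is the "Parameters:" label; the rest are key=value tokens.
        for (i = 2; i <= NF; i++)
            if (tolower($i) ~ /quota/) print $i
    }' | sort -u
}

# Possible use on a server node, mirroring the loop in the message
# (left commented out because it needs mounted Lustre targets):
#   for dev in $(mount -t lustre | cut -d' ' -f1); do
#       echo "== $dev"
#       tunefs.lustre "$dev" | find_quota_params
#   done
```

An empty result for every target would match what is reported in this message: no quota parameter is stored on the MDT or the OSTs, so the time spent in osc_quota_chkdq() would have to come from somewhere other than the on-disk configuration.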