I am having trouble getting quota to work on Lustre 1.6.7. Quota worked fine on Lustre 1.6.6, where I used the following settings on the MGS, MDT and OSTs:

  tunefs.lustre --erase-params --mgs --param lov.stripecount=1 --writeconf /dev/mapper/lustre1_volume-mgs_lv
  tunefs.lustre --erase-params --mdt --mgsnode=lustre1@tcp1 --param lov.stripecount=1 --writeconf --param mdt.quota_type=ug /dev/mapper/lustre1_volume-new_mds_lv
  tunefs.lustre --erase-params --ost --mgsnode=lustre1@tcp1 --param ost.quota_type=ug --writeconf /dev/sdc1

Once I upgraded from Lustre 1.6.6 to Lustre 1.6.7, the MDT crashes (kernel panic) instantaneously when I try to mount an OST that has quota enabled. I tried the latest Lustre-patched RHEL5 kernel as well as the kernel.org 2.6.22.14 kernel on the MDT server; that made no difference, and both kernels panicked instantly on OST mount.

To work around this, I had to remove the ost.quota_type=ug parameter on all my OSTs:

  tunefs.lustre --erase-params --ost --mgsnode=lustre1@tcp1 --writeconf /dev/sdc1

After removing quotas on the OSTs I am able to mount all of them and the file system is healthy, but quota is disabled. I get this error on the MDT server when I mount the various MGS, MDT and OST partitions:

  LustreError: 6372:0:(quota_master.c:1625:qmaster_recovery_main()) qmaster recovery failed! (id:120 type:1 rc:-3)

and the following error on the MDT server whenever I run "lfs quota" on a client node:

  lustre1 kernel: LustreError: 4591:0:(quota_ctl.c:288:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3

Is this a known problem with quotas on 1.6.7? Are there any patches available to fix it?

Thanks in advance.
Nirmal
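[For reference, the quota sequence normally run from a client once the quota_type parameters are in place on the MDT and OSTs looks roughly like this; the mount point /mnt/lustre and the user name are placeholders, and exact lfs option syntax can vary between 1.6.x releases. The last step is the one returning rc: -3 above.]

  # (re)build the quota files after quota_type has been set on the servers
  lfs quotacheck -ug /mnt/lustre
  # turn on enforcement for users and groups
  lfs quotaon -ug /mnt/lustre
  # verify that usage and limits are reported for a user
  lfs quota -u someuser /mnt/lustre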
On Wed, Apr 01, 2009 at 01:30:40PM -0500, Nirmal Seenu wrote:
> Once I upgraded from Lustre 1.6.6 to Lustre 1.6.7, the MDT crashes
> (kernel panic) instantaneously when I try to mount an OST that has
> quota enabled.

Could you please provide us with the console logs (panic message + stack trace)?

Cheers,
Johann
We didn't have anything in place to capture the console logs, and I won't be able to provide you any specific details of the kernel panic at this time. The kernel panic was easily reproducible with Lustre 1.6.7 on the Lustre-patched RHEL5 kernel as well as the 2.6.22.14 kernel.

In our configuration a separate machine hosts the MDT and MGS, each on its own partition. There are 2 OSSs, each OSS exports 6 OSTs, and each OST is 2.7TB in size. The MDT/MGS machine crashed consistently when I tried to mount the 5th or 6th (out of 12) OST.

These are the settings on the MGS, MDT and all the OSTs that produce the kernel panic:

  tunefs.lustre --erase-params --mgs --param lov.stripecount=1 --writeconf /dev/mapper/lustre1_volume-mgs_lv
  tunefs.lustre --erase-params --mdt --mgsnode=lustre1@tcp1 --param lov.stripecount=1 --writeconf --param mdt.quota_type=ug /dev/mapper/lustre1_volume-new_mds_lv
  tunefs.lustre --erase-params --ost --mgsnode=lustre1@tcp1 --param ost.quota_type=ug --writeconf /dev/sdc1

Our file system is in production use right now and I won't be able to take a downtime to reproduce this problem. Please let me know if I can provide you any other detail at this time.

Thanks
Nirmal

Johann Lombardi wrote:
> On Wed, Apr 01, 2009 at 01:30:40PM -0500, Nirmal Seenu wrote:
>> Once I upgraded from Lustre 1.6.6 to Lustre 1.6.7, the MDT crashes
>> (kernel panic) instantaneously when I try to mount an OST that has
>> quota enabled.
>
> Could you please provide us with the console logs (panic message +
> stack trace)?
>
> Cheers,
> Johann
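[If a serial console is not available on the MDT/MGS node, netconsole is one low-effort way to capture the panic message and stack trace the next time the crash can be reproduced. A minimal sketch follows; the interface name, IP address, port and MAC address are placeholders, not values from this thread.]

  # on the MDT/MGS node: make sure all kernel messages go to the console
  echo 8 > /proc/sys/kernel/printk
  # send console output over UDP to a log host
  # syntax: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
  modprobe netconsole netconsole=@/eth0,6666@192.168.1.10/00:11:22:33:44:55
  # on the log host: capture whatever arrives on that port
  # (listener flag syntax differs between netcat variants)
  nc -u -l 6666 | tee mdt-console.log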
Hi,

On Thu, Apr 02, 2009 at 12:37:42PM -0500, Nirmal Seenu wrote:
> We didn't have anything in place to capture the console logs, and I
> won't be able to provide you any specific details of the kernel panic
> at this time.
>
> The kernel panic was easily reproducible with Lustre 1.6.7 on the
> Lustre-patched RHEL5 kernel as well as the 2.6.22.14 kernel.

We test those kernels regularly and, afaik, we have never hit such a problem.

> Our file system is in production use right now and I won't be able to
> take a downtime to reproduce this problem. Please let me know if I can
> provide you any other detail at this time.

Unfortunately, there is not much we can do without the console logs.

Johann
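[As a complement to console logging, kdump on the RHEL5 MDT node would preserve a vmcore even when nothing is watching the console. A rough outline under assumptions about the machine (the crashkernel reservation size and the dump target in /etc/kdump.conf depend on the hardware):

  # reserve memory for the capture kernel: append to the kernel line in grub.conf, then reboot
  #   crashkernel=128M@16M
  # pick a dump target in /etc/kdump.conf (local disk under /var/crash by default)
  chkconfig kdump on
  service kdump start
  # after a panic, the stack trace can be extracted from the saved vmcore with the crash utility
]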