Hi all,
we have somehow lost our quota information: checking the quotas of a user gave me
> user quotas are not enabled
This system, running Lustre 1.6.5.1, was set up at the end of October, by which
time I had also enabled quotas on it. Of course I also ran some
tests back then, which showed that quotas were actually working. Since then,
neither the hardware nor the MGS-MDT-OST setup has changed, i.e. no new
OSTs have been added, nor anything similar.
So my question is: what could cause the quota information to get lost?
Admittedly, I had some problems with quotas for a particular user ID on
this system, which I reported to this list earlier. But surely these should
not lead to a complete loss three weeks later?
In addition, I can confirm the strong warning in the Lustre manual against
running "lfs setquota" on a live system: I did just that when I saw
that quotas were apparently not enabled.
However, I did not get as far as "inaccurate statistic
information", as described in the manual. Instead, it seems I caused a
delay and timeout on the connection between the active MDS and its
slave: we are running a HA+DRBD pair for the MGS/MDS, with a dedicated 1 Gbit
link for the DRBD mirroring. Some pings over this link were lost, which
triggered a STONITH and a takeover by the slave.
I have promised not to do such bad things again, but I would still like to
know: was this just a coincidence, or can an ill-timed "lfs
setquota" cause a temporary overload (or something similar) of the MGS, such
that the poor DRBD gets confused? (It was the DRBD ping to the MGS partition
that was lost.)
Many thanks,
Thomas