Nirmal Seenu
2009-Mar-06 13:26 UTC
[Lustre-discuss] Huge Sparse files in ROOT partition of MDT
While trying to figure out the reason for LVM2 snapshots failing on our MDT server, I found that there are a lot of sparse files on the MDT volume. The file size as seen in the ls output on the MDT is the same as the real file size. The tar runs for a few hours at this point, even if I use the --sparse option in the tar command.

The total MDT partition usage itself is about 500MB (as reported by df), and it used to take me less than 10 minutes to create an LVM2 snapshot and tar it up when I was running the servers on Lustre 1.6.5 with no quota enabled.

I recently upgraded my Lustre servers to 1.6.7 and tried to enable quota on the MDT and OST with the following commands:

tunefs.lustre --erase-params --mdt --mgsnode=iblustre1@tcp1 --param lov.stripecount=1 --writeconf --param mdt.quota_type=ug /dev/mapper/lustre1_volume-mds_lv
tunefs.lustre --erase-params --ost --mgsnode=iblustre1@tcp1 --param ost.quota_type=ug --writeconf /dev/sdc1

I was never able to run "lfs quotacheck" successfully due to LBUGs. I was able to create an LVM2 snapshot and look at the contents of the MDT. Every directory except ROOT seems to have the correct content. Some of the files under ROOT still have a 0 byte size, while others show a 35GB size (the file itself is sparse, as seen from the od output):

-rw-r--r-- 1 ***** *** 0 Jan 24 08:36 l48144f21b747m0036m018-Coul_000505
-rw-rw-r-- 1 ***** *** 36691775424 Feb 22 19:56 prop_WALL_pbc_m0.033_LS16_t0_002080

At this point I am curious to know whether this is the expected behaviour of the MDT or some corruption on the MDT file system. Do we have to live with the fact that the backup process using LVM2 snapshots takes a few hours to complete if quotas are enabled?

Thanks for your help in advance.

Nirmal
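P.S. For reference, the backup sequence I am describing is essentially the following; the snapshot size, snapshot name, mount point and backup path here are simplified/illustrative, not our exact configuration:

# create a snapshot of the MDT logical volume
lvcreate --snapshot --size 1G --name mds_snap /dev/lustre1_volume/mds_lv

# mount the snapshot read-only as ldiskfs
mkdir -p /mnt/mds_snap
mount -t ldiskfs -o ro /dev/lustre1_volume/mds_snap /mnt/mds_snap

# archive it; --sparse keeps the holes out of the archive
tar --sparse -czf /backup/mdt-backup.tar.gz -C /mnt/mds_snap .

# clean up
umount /mnt/mds_snap
lvremove -f /dev/lustre1_volume/mds_snap

The --sparse option keeps the holes out of the archive, but tar still has to read through every file to find them, which is presumably where the hours go.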
Andreas Dilger
2009-Mar-06 23:29 UTC
[Lustre-discuss] Huge Sparse files in ROOT partition of MDT
On Mar 06, 2009 07:26 -0600, Nirmal Seenu wrote:
> While trying to figure out the reason for LVM2 snapshots failing on our
> MDT server, I found that there are a lot of sparse files on the MDT
> volume. The file size as seen in the ls output on the MDT is the same
> as the real file size. The tar runs for a few hours at this point,
> even if I use the --sparse option in the tar command.

All of the files on the MDT are sparse. The data lives on the OSTs.

> The total MDT partition usage itself is about 500MB (as reported by df),
> and it used to take me less than 10 minutes to create an LVM2 snapshot
> and tar it up when I was running the servers on Lustre 1.6.5 with no
> quota enabled.
>
> I recently upgraded my Lustre servers to 1.6.7 and tried to enable quota
> on the MDT and OST with the following commands:

Also, the quota file is a huge sparse file, size proportional to the highest UID in use.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
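P.S. You can see this for yourself on a mounted snapshot of the MDT by comparing the apparent size with the blocks actually allocated, something like the following (the mount point name is just an example):

ls -ls /mnt/mds_snap/ROOT/prop_WALL_pbc_m0.033_LS16_t0_002080
du -k --apparent-size /mnt/mds_snap/ROOT/prop_WALL_pbc_m0.033_LS16_t0_002080
du -k /mnt/mds_snap/ROOT/prop_WALL_pbc_m0.033_LS16_t0_002080

The first column of ls -ls (and the plain du output) will show only a handful of KB actually allocated, while the size column reports the full ~36GB.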
Nirmal Seenu
2009-Mar-07 15:00 UTC
[Lustre-discuss] Huge Sparse files in ROOT partition of MDT
Thanks for that explanation. It would be nice if this detail were included in the Quota section of the Lustre Manual, rather than users getting surprised by the huge tar files in their archives.

Is there an easy way to roll back the changes made by the "lfs quotacheck" command, so that the huge sparse files are converted back into 0 byte files on the MDT?

Thanks
Nirmal

Andreas Dilger wrote:
> On Mar 06, 2009 07:26 -0600, Nirmal Seenu wrote:
>> While trying to figure out the reason for LVM2 snapshots failing on our
>> MDT server, I found that there are a lot of sparse files on the MDT
>> volume. The file size as seen in the ls output on the MDT is the same
>> as the real file size. The tar runs for a few hours at this point,
>> even if I use the --sparse option in the tar command.
>
> All of the files on the MDT are sparse. The data lives on the OSTs.
>
>> The total MDT partition usage itself is about 500MB (as reported by df),
>> and it used to take me less than 10 minutes to create an LVM2 snapshot
>> and tar it up when I was running the servers on Lustre 1.6.5 with no
>> quota enabled.
>>
>> I recently upgraded my Lustre servers to 1.6.7 and tried to enable quota
>> on the MDT and OST with the following commands:
>
> Also, the quota file is a huge sparse file, size proportional to the
> highest UID in use.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> On Mar 06, 2009 07:26 -0600, Nirmal Seenu wrote:
>
>> While trying to figure out the reason for LVM2 snapshots failing on our
>> MDT server, I found that there are a lot of sparse files on the MDT
>> volume. The file size as seen in the ls output on the MDT is the same
>> as the real file size. The tar runs for a few hours at this point,
>> even if I use the --sparse option in the tar command.
>
> All of the files on the MDT are sparse. The data lives on the OSTs.
>
>> The total MDT partition usage itself is about 500MB (as reported by df),
>> and it used to take me less than 10 minutes to create an LVM2 snapshot
>> and tar it up when I was running the servers on Lustre 1.6.5 with no
>> quota enabled.
>>
>> I recently upgraded my Lustre servers to 1.6.7 and tried to enable quota
>> on the MDT and OST with the following commands:
>
> Also, the quota file is a huge sparse file, size proportional to the
> highest UID in use.

Quota files are sparse files, but their size is not *only* proportional to the highest UID in use. It also depends on the distribution of uids/gids. For example:

1. If the system has only one user, with uid=1 or uid=33554432, the user quota file will have the same size in both cases.
2. If the system has only two users, with uid=(1, 2) or uid=(1, 33554432), the latter will take three more blocks.

This is quite like ext3's indirect blocks.

BTW, there are two kinds of quota files in Lustre:

1. admin quota files (exist only on the MDT)
2. operation quota files (exist on the MDT and OSTs)

For 1, the size is relative to the number of users who have a quota limit; for 2, the size is relative to the number of users who are in use in the filesystem.

Landen
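P.S. A generic illustration of the effect, using ordinary sparse files on an ext3-style filesystem (this is not the actual quota file layout, just the same block-allocation behaviour):

# write one 4KB block at offset 0, and one 4KB block at roughly offset 32GB
dd if=/dev/zero of=/tmp/sparse_a bs=4k count=1 seek=0 2>/dev/null
dd if=/dev/zero of=/tmp/sparse_b bs=4k count=1 seek=8388607 2>/dev/null

ls -ls /tmp/sparse_a /tmp/sparse_b
rm -f /tmp/sparse_a /tmp/sparse_b

Both files contain 4KB of data, but sparse_b reports a ~32GB apparent size, and on ext3 it also needs a few extra indirect blocks just to map the far-away offset; an entry for a very large uid far into a quota file causes the same kind of overhead.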