Richard W.M. Jones
2020-Feb-18 08:48 UTC
[Libguestfs] [PATCH] make-fs: Don't use du --apparent-size to estimate input size.
When calculating the initial size of the disk we must estimate
how much space is taken by the input.  This is quite difficult.

For directories we used ‘du --apparent-size -bs DIR’.  This is wrong
because ‘-b’ implies ‘--apparent-size --block-size=1’.  But also
‘--apparent-size’ causes du to count the file size rather than number
of blocks used by files.

If you have a directory containing many small files this usually
underestimates resulting in disk sizes which are far too small to
actually contain the files.

There's no really good answer here because du can't exactly do what we
want, but we can at least remove this flag.  This causes much larger
estimates and therefore much larger virtual disks.

Thanks: Nikolay Ivanets
---
 make-fs/make-fs.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/make-fs/make-fs.c b/make-fs/make-fs.c
index 5d8c3a385..386142280 100644
--- a/make-fs/make-fs.c
+++ b/make-fs/make-fs.c
@@ -393,7 +393,7 @@ static int
 estimate_input (const char *input, uint64_t *estimate_rtn, char **ifmt_rtn)
 {
   struct stat statbuf;
-  const char *argv[6];
+  const char *argv[5];
   CLEANUP_UNLINK_FREE char *tmpfile = NULL;
   CLEANUP_FCLOSE FILE *fp = NULL;
   char line[256];
@@ -424,11 +424,10 @@ estimate_input (const char *input, uint64_t *estimate_rtn, char **ifmt_rtn)
   }
 
   argv[0] = "du";
-  argv[1] = "--apparent-size";
-  argv[2] = "-b";
-  argv[3] = "-s";
-  argv[4] = input;
-  argv[5] = NULL;
+  argv[1] = "--block-size=1";
+  argv[2] = "-s";
+  argv[3] = input;
+  argv[4] = NULL;
 
   if (exec_command ((char **) argv, tmpfile) == -1)
     return -1;
-- 
2.24.1
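To illustrate why counting file sizes underestimates for many small files, here is a minimal sketch of the arithmetic. It is not code from make-fs.c. The function names and the 4096-byte block size are assumptions made for the example, and it ignores directory and metadata overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round one file's size up to a whole number
 * of blocks, which is roughly what a block-based count charges for
 * a file that stores its data in blocks.
 */
uint64_t
round_up_to_block (uint64_t size, uint64_t block_size)
{
  assert (block_size > 0);
  return (size + block_size - 1) / block_size * block_size;
}

/* Hypothetical helper: space needed by nr_files files of file_size
 * bytes each when every file occupies whole blocks.
 */
uint64_t
block_based_estimate (uint64_t nr_files, uint64_t file_size,
                      uint64_t block_size)
{
  return nr_files * round_up_to_block (file_size, block_size);
}
```

For 1000 files of 100 bytes each, the sum of file sizes is 100,000 bytes, while block_based_estimate (1000, 100, 4096) gives 4,096,000 bytes, about 40 times more.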
Nikolay Ivanets
2020-Feb-18 11:18 UTC
Re: [Libguestfs] [PATCH] make-fs: Don't use du --apparent-size to estimate input size.
On Tue, 18 Feb 2020 at 10:48, Richard W.M. Jones <rjones@redhat.com> wrote:
> When calculating the initial size of the disk we must estimate
> how much space is taken by the input.  This is quite difficult.
>
> For directories we used ‘du --apparent-size -bs DIR’.  This is wrong
> because ‘-b’ implies ‘--apparent-size --block-size=1’.  But also
> ‘--apparent-size’ causes du to count the file size rather than number
> of blocks used by files.
>
> If you have a directory containing many small files this usually
> underestimates resulting in disk sizes which are far too small to
> actually contain the files.
>
> There's no really good answer here because du can't exactly do what we
> want, but we can at least remove this flag.  This causes much larger
> estimates and therefore much larger virtual disks.
>
> [...]

That didn't help either.  With sparse files it even made things worse:
holes are no longer taken into account at all.

To repeat my earlier thoughts: this is a cat-and-mouse game.
We cannot properly handle all the possible cases.
I think we can agree that virt-make-fs MAY fail and that this is an
expected situation.  Just let the user know that they need to adjust
the size manually.

--
Mykola Ivanets
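The sparse-file point can be seen by comparing the two numbers that stat reports for a file with a hole. This is a minimal sketch and not part of virt-make-fs. The helper names are hypothetical, and the 512-byte unit for st_blocks is the Linux convention, assumed here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file and extend it to
 * 'size' bytes without writing any data, which leaves a hole on
 * filesystems that support sparse files.
 * Returns 0 on success, -1 on error.
 */
int
create_sparse_file (const char *path, off_t size)
{
  int fd;

  assert (path != NULL);
  fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd == -1)
    return -1;
  if (ftruncate (fd, size) == -1) {
    close (fd);
    return -1;
  }
  return close (fd);
}

/* Hypothetical helper: report the apparent size (st_size) and the
 * allocated size (st_blocks, assumed to be in 512-byte units as on
 * Linux).  Returns 0 on success, -1 on error.
 */
int
file_sizes (const char *path, uint64_t *apparent, uint64_t *allocated)
{
  struct stat statbuf;

  assert (path != NULL);
  if (stat (path, &statbuf) == -1)
    return -1;
  *apparent = (uint64_t) statbuf.st_size;
  *allocated = (uint64_t) statbuf.st_blocks * 512;
  return 0;
}
```

For a file made with create_sparse_file (path, 1048576), the apparent size is 1 MiB while the allocated size is typically far smaller. A block-based count therefore sees almost nothing, even though copying the file without preserving holes would need the full 1 MiB in the target filesystem.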
Richard W.M. Jones
2020-Feb-18 11:33 UTC
Re: [Libguestfs] [PATCH] make-fs: Don't use du --apparent-size to estimate input size.
On Tue, Feb 18, 2020 at 01:18:54PM +0200, Nikolay Ivanets wrote:
> On Tue, 18 Feb 2020 at 10:48, Richard W.M. Jones <rjones@redhat.com> wrote:
> > [...]
>
> That didn't help either.  With sparse files it even made things worse:
> holes are no longer taken into account at all.
>
> To repeat my earlier thoughts: this is a cat-and-mouse game.
> We cannot properly handle all the possible cases.
> I think we can agree that virt-make-fs MAY fail and that this is an
> expected situation.  Just let the user know that they need to adjust
> the size manually.

I think it's a difficult problem and one which won't be solved using
du.  For comparison we have some nbdkit plugins for creating
filesystems and they are far more sophisticated, especially this one:

https://github.com/libguestfs/nbdkit/tree/master/plugins/floppy

I would note that the reason the test failed had nothing to do with
free space.  It was actually running out of inodes.  With ext2/3/4 the
number of inodes in the metadata is fixed at creation and based on the
size of the disk (see /etc/mke2fs.conf inode_ratio and the mke2fs -i
and -N options).  Because of the small size of the disk, mke2fs was
choosing < 100 inodes, but because the test had 100+ files it failed
when creating the later files even though there was enough disk space.

There's no easy way to fix this.  I think it could even be considered
a bug in mke2fs.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
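The inode limit described above can be sketched as simple arithmetic. This is a rough illustration, not mke2fs's actual algorithm. The function names are hypothetical, the bytes-per-inode value of 16384 used in the example is an assumption, and the rounding and reserved inodes that the real tool applies are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: approximate number of inodes created for a
 * filesystem of fs_bytes when one inode is allocated per
 * bytes_per_inode bytes (the ratio set by inode_ratio or mke2fs -i).
 */
uint64_t
estimate_inode_count (uint64_t fs_bytes, uint64_t bytes_per_inode)
{
  assert (bytes_per_inode > 0);
  return fs_bytes / bytes_per_inode;
}

/* Hypothetical helper: smallest filesystem size (in bytes) that
 * would yield at least nr_files inodes under the same approximation.
 */
uint64_t
min_bytes_for_files (uint64_t nr_files, uint64_t bytes_per_inode)
{
  return nr_files * bytes_per_inode;
}
```

With an assumed ratio of 16384, a 1 MiB disk gets about estimate_inode_count (1048576, 16384) = 64 inodes, so 100 small files cannot be created no matter how little data they hold. Under the same assumption, min_bytes_for_files (100, 16384) = 1,638,400 bytes would be needed, or the inode count could be set explicitly with mke2fs -N.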