santi at usansolo.net
2008-Jun-09 17:33 UTC
2GB memory limit running fsck on a +6TB device
Dear Sirs,

That's the scenario: a 6TB+ device on a 3ware 9550SX RAID controller, running Debian Etch 32-bit with a 2.6.25.4 kernel and the default e2fsprogs version, "1.39+1.40-WIP-2006.11.14+dfsg-2etch1".

Running "tune2fs" reports that the filesystem is in the EXT3_ERROR_FS state, "clean with errors":

# tune2fs -l /dev/sda4
tune2fs 1.40.10 (21-May-2008)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          7701b70e-f776-417b-bf31-3693dba56f86
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              792576000
Block count:              1585146848

It's a backup storage server with more than 113 million files; this is the output of "df -i":

# df -i /backup/
Filesystem            Inodes     IUsed     IFree IUse% Mounted on
/dev/sda4          792576000 113385959 679190041   15% /backup

Running fsck.ext3 or fsck.ext2, I get:

# fsck.ext3 /dev/sda4
e2fsck 1.40.10 (21-May-2008)
Adding dirhash hint to filesystem.

/dev/sda4 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error allocating directory block array: Memory allocation failed
e2fsck: aborted

With some straces:

===============================================================================
gettimeofday({1213032482, 940738}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 16001}, ...}) = 0
write(1, "Pass 1: Checking ", 17Pass 1: Checking ) = 17
write(1, "inode", 5inode) = 5
write(1, "s, ", 3s, ) = 3
write(1, "block", 5block) = 5
write(1, "s, and sizes\n", 13s, and sizes
) = 13
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x404fa000
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x46376000
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4c1f2000
mmap2(NULL, 198148096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x5206e000
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x5dd66000
mmap2(NULL, 748892160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x63be2000
mmap2(NULL, 1866240000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x77488000) = 0x80ab000
mmap2(NULL, 1866375168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x90615000
munmap(0x90615000, 962560) = 0
munmap(0x90800000, 86016) = 0
mprotect(0x90700000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 1866240000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
===============================================================================

It appears that fsck is trying to use more than 2GB of memory to store the inode table relationships. The system has 4GB of physical RAM and 4GB of swap. Is there any way to limit the memory used by fsck, or any other way to check this filesystem?
Would running fsck from a 64-bit LiveCD solve the problem? I also tried the latest stable e2fsprogs release, 1.40.10, and got the same error :-/

Regards,
--
Santi Saez
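The mmap2() sizes in the strace above can be tallied to see how close e2fsck came to the limit. The Python sketch below is only an illustration, not part of any e2fsprogs tooling. It assumes the default 3G/1G kernel split on 32-bit x86, so that user space ends at 0xC0000000, and that these anonymous mappings were placed upward one after another, as the returned addresses suggest. It counts only the first seven mmap2() calls; malloc's retries and later bookkeeping calls are left out.

```python
import re

# The first seven mmap2() calls copied from the strace above.
STRACE = """\
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x404fa000
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x46376000
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4c1f2000
mmap2(NULL, 198148096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x5206e000
mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x5dd66000
mmap2(NULL, 748892160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x63be2000
mmap2(NULL, 1866240000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
"""

# Assumption: default 3G/1G split on 32-bit x86, so user space ends here.
USER_SPACE_END = 0xC0000000

# Captures the requested length and the return value ("0x..." or "-1").
MMAP_RE = re.compile(r"mmap2\(NULL, (\d+),.*\) = (\S+)")


def summarize(trace: str):
    """Return (bytes mapped, failed request sizes, end of highest mapping,
    bytes left between that end and USER_SPACE_END)."""
    granted, failed = [], []
    for m in MMAP_RE.finditer(trace):
        size, result = int(m.group(1)), m.group(2)
        if result.startswith("0x"):
            granted.append((int(result, 16), size))
        else:
            failed.append(size)
    mapped = sum(size for _, size in granted)
    top = max(addr + size for addr, size in granted)
    return mapped, failed, top, USER_SPACE_END - top


mapped, failed, top, gap = summarize(STRACE)
GiB = 1 << 30
print(f"already mapped: {mapped} bytes ({mapped / GiB:.2f} GiB)")
print(f"failed request: {failed[0]} bytes ({failed[0] / GiB:.2f} GiB)")
print(f"highest mapping ends at {top:#x}; {gap} bytes left below {USER_SPACE_END:#x}")
```

Under these assumptions, about 1.25 GiB was already mapped, and the highest mapping ends at 0x90615000 (where the later PROT_NONE mmap2() in the trace indeed lands). That leaves only about 800MB below 0xC0000000, much less than the single 1.8GB request, so the allocation fails regardless of how much RAM and swap are installed.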
On Mon, Jun 09, 2008 at 07:33:48PM +0200, santi at usansolo.net wrote:
> It's a backup storage server, with more than 113 million files, this's the
> output of "df -i":
>
> Appears that fsck is trying to use more than 2GB memory to store inode
> table relationship. System has 4GB of physical RAM and 4GB of swap, is
> there anyway to limit the memory used by fsck or any solution to check this
> filesystem?
> Running fsck with a 64bit LiveCD will solve the problem?

Yes, running with a 64-bit Live CD is one way to solve the problem.

If you are using e2fsprogs 1.40.10, there is another solution that may help. Create an /etc/e2fsck.conf file with the following contents:

[scratch_files]
    directory = /var/cache/e2fsck

...and then make sure /var/cache/e2fsck exists by running the command "mkdir /var/cache/e2fsck".

This will cause e2fsck to store certain data structures, which grow large on backup servers that have a vast number of hard-linked files, in /var/cache/e2fsck instead of in memory. This will slow down e2fsck by approximately 25%, but for large filesystems where you couldn't otherwise get e2fsck to complete because you're exhausting the 2GB VM per-process limitation for 32-bit systems, it should allow you to run through to completion.

- Ted
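Ted's two manual steps (write the [scratch_files] stanza, create the directory) can also be scripted. The Python sketch below is hypothetical: the helper name `enable_scratch_files` and its `root` parameter are illustrative only. It appends the stanza to e2fsck.conf instead of overwriting the file, so any existing settings survive. Per Ted, the setting applies to e2fsprogs 1.40.10.

```python
from pathlib import Path

# Values from the e2fsck.conf stanza in Ted's message.
SCRATCH_DIR = "/var/cache/e2fsck"
STANZA = "[scratch_files]\n    directory = " + SCRATCH_DIR + "\n"


def enable_scratch_files(root: str = "/") -> Path:
    """Hypothetical helper: create the scratch directory and add the
    [scratch_files] stanza to <root>/etc/e2fsck.conf.

    An existing e2fsck.conf is appended to rather than overwritten, and
    nothing is added if it already contains a [scratch_files] section.
    """
    base = Path(root)
    scratch = base / SCRATCH_DIR.lstrip("/")
    conf = base / "etc" / "e2fsck.conf"

    # Equivalent of "mkdir /var/cache/e2fsck" (no error if it exists).
    scratch.mkdir(parents=True, exist_ok=True)
    conf.parent.mkdir(parents=True, exist_ok=True)

    existing = conf.read_text() if conf.exists() else ""
    if "[scratch_files]" not in existing:
        with conf.open("a") as f:
            # Keep the new section on its own line.
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(STANZA)
    return conf
```

Called as root with no arguments, `enable_scratch_files()` acts on the live /etc and /var/cache. Passing a temporary directory as `root` is a safe way to inspect the result first.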
On Jun 09, 2008 19:33 +0200, santi at usansolo.net wrote:
> That's the scenario: +6TB device on a 3ware 9550SX RAID controller, running
> Debian Etch 32bits, with 2.6.25.4 kernel, and defaults e2fsprogs version,
> "1.39+1.40-WIP-2006.11.14+dfsg-2etch1".
>
> Running "tune2fs" returns that filesystem is in EXT3_ERROR_FS state, "clean
> with errors":
>
> # tune2fs -l /dev/sda4
> tune2fs 1.40.10 (21-May-2008)
> Filesystem volume name: <none>
> Last mounted on: <not available>
> Filesystem UUID: 7701b70e-f776-417b-bf31-3693dba56f86
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal dir_index filetype needs_recovery
> sparse_super large_file
> Default mount options: (none)
> Filesystem state: clean with errors
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 792576000
> Block count: 1585146848
>
> It's a backup storage server, with more than 113 million files, this's the
> output of "df -i":
>
> # df -i /backup/
> Filesystem Inodes IUsed IFree IUse% Mounted on
> /dev/sda4 792576000 113385959 679190041 15% /backup
>
>
> Running fsck.ext3 or fsck.ext2 I get:
>
> # fsck.ext3 /dev/sda4
> e2fsck 1.40.10 (21-May-2008)
> Adding dirhash hint to filesystem.
>
> /dev/sda4 contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes

I recall that e2fsck allocates on the order of 3 * block_count / 8 bytes and 5 * inode_count / 8 bytes, so in your case this is about:

(3 * 1585146848 + 5 * 792576000) / 8 = 1089790068 bytes = 1.1GB

at a minimum, but my estimates might be incorrect.

> mmap2(NULL, 99074048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x404fa000

Judging by the return values of these functions, this is a 32-bit system, and it is entirely possible that you are exceeding the per-process memory allocation limit.

> mmap2(NULL, 748892160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x63be2000
> mmap2(NULL, 1866240000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)

Hmm, it seems a bit excessive to allocate 1.8GB in a single chunk.

> Error allocating directory block array: Memory allocation failed
> e2fsck: aborted

This message is a bit tricky to nail down because it doesn't appear anywhere in the code directly. It is encoded using "e2fsck abbreviations", and the expanded text that normally appears in the corresponding comment is different. It is PR_1_ALLOCATE_DBCOUNT, returned from the call chain:

ext2fs_init_dblist->
  make_dblist->
    ext2fs_get_num_dirs()

which counts the number of directories in the filesystem and allocates two 12-byte array elements for each one. This implies you have 77M directories in your filesystem, or an average of only 10 files per directory?

> Appears that fsck is trying to use more than 2GB memory to store inode
> table relationship. System has 4GB of physical RAM and 4GB of swap, is
> there anyway to limit the memory used by fsck or any solution to check this
> filesystem?

I don't know offhand how important the dblist structure is, so I'm not sure whether there is a way to reduce its memory usage.
I believe that in low-memory situations it is possible to use tdb for the dblist in newer versions of e2fsck, but I don't know much about the details.

> Running fsck with a 64bit LiveCD will solve the problem?

Yes, I suspect that with a 64-bit kernel you could allocate the full 4GB of RAM to e2fsck and be able to check the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
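Andreas' back-of-the-envelope numbers can be reproduced from the tune2fs output. This Python sketch applies the per-bitmap rule of thumb as he states it (3 * block_count / 8 plus 5 * inode_count / 8 bytes) and his description of the dblist (two 12-byte entries per directory) to the 1866240000-byte request that failed. Both are his recollections rather than figures taken from the e2fsprogs source. The directory count is also approximate, because the mmap2() size includes malloc's overhead and page rounding.

```python
# Filesystem geometry from the tune2fs output in the first message.
INODE_COUNT = 792_576_000
BLOCK_COUNT = 1_585_146_848

# Size of the mmap2() request that failed with ENOMEM in the strace.
FAILED_REQUEST = 1_866_240_000

# Per Andreas' description: two 12-byte dblist entries per directory.
DB_ENTRY_SIZE = 12
ENTRIES_PER_DIR = 2


def bitmap_estimate(blocks: int, inodes: int) -> int:
    """Rule of thumb quoted above: 3 * block_count / 8 + 5 * inode_count / 8
    bytes, which works out to three block bitmaps and five inode bitmaps at
    one bit per block or inode."""
    return (3 * blocks + 5 * inodes) // 8


def directories_implied(request_bytes: int) -> int:
    """Approximate directory count behind a dblist allocation of this size."""
    return request_bytes // (ENTRIES_PER_DIR * DB_ENTRY_SIZE)


est = bitmap_estimate(BLOCK_COUNT, INODE_COUNT)
dirs = directories_implied(FAILED_REQUEST)
print(f"bitmap estimate: {est} bytes (~{est / 1e9:.2f} GB)")
print(f"directories implied by the failed dblist request: ~{dirs:,}")
```

This gives about 1.09GB for the bitmaps and roughly 77.76 million directories, in line with the 77M figure in Andreas' reply.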