calinb at comcast.net
2004-Jul-30 01:46 UTC
Large File Copy to Large ext3 RAID5 Array Often Stalls
I'm experiencing strange behavior from my ext3 RAID5 array on my Fedora Core 2 system. Before I go crazy varying all sorts of tuning parameters, I thought some list subscribers might provide me with useful advice.

The problematic array is:

  3x Promise Technology Ultra 100 TX2 PCI cards
  6x Maxtor 250GB IDE drives (one drive per cable)
  RAID level 5, 128KB chunk size
  EXT3: "mkfs -t ext3 -b 4096 -m 0 -R stride=16 /dev/md2"

I'm running Samba 3, and I first noticed this problem when 3 out of my 5 Windows clients (2 XP machines and 1 Server 2K3 machine) failed to copy any large files (~1GB) to a subdirectory on the server containing about 220 other such large files. Two other XP machines on my network have no problems whatsoever copying large files to the very same subdirectory on the server.

A failing file transfer begins at a reasonable data rate (~6 MB/sec) but grinds to a near standstill after about 30 seconds, and the copy continues to crawl until I cancel it (maybe 10 kB/sec--just a rough guess). The two well-behaved clients transfer the 1GB files in about 2-3 minutes, as expected.

Yup, Samba--that's what I thought at first, so I tried FTP and obtained the same results. I can't correlate the problem to anything on the five Windows clients, the NICs, the switch, etc. I can't find any configuration difference among the clients that correlates with the 3 failing or 2 fully functional clients.

However, I can successfully copy the large files across the network from all 5 clients to an empty or nearly empty subdirectory on the RAID5 array. Then I move the files down to the subdirectory as desired. That's my workaround for the 3 "bad" machines. (Yuck!)

Now, here's what happens from the server console: if I copy a large file from a different drive (a mirror pair) on the server to the RAID5 ext3 array, I have the same kind of problems that I have with the 3 networked clients. If I copy the large files from the mirror pair to an empty or nearly empty subdirectory on the RAID5 ext3 filesystem, then the performance varies widely (from about 10 MB/sec to 40 MB/sec). I'd probably never notice this problem with smaller files (under 100-200 MB or so) because the copy completes before the stall.

The subdirectory containing the 220 1GB files was originally populated by copying the directory structure and files from one of the "good" XP Samba clients across the network.

Any ideas or suggestions are greatly appreciated. I've tried data=journal, ordered, and writeback and, though there are performance differences, the problem remains in all three modes.

Thanks,
Cal Brabandt, Linux System Admin newbie
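
As a side check on the mkfs line quoted in this message, the stride value can be compared against the chunk size. This is only a minimal sketch, assuming stride means the RAID chunk size divided by the filesystem block size (as the mke2fs documentation describes it) and that the chunk size given above is 128 KiB; the function name ext3_stride is illustrative and not from the thread.

```python
def ext3_stride(chunk_size_kib: int, block_size_bytes: int) -> int:
    """Return the number of filesystem blocks per RAID chunk.

    Assumes the chunk size is given in KiB and the block size in bytes.
    """
    chunk_bytes = chunk_size_kib * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("chunk size is not a multiple of the block size")
    return chunk_bytes // block_size_bytes


# 128 KiB chunks with 4096-byte blocks
print(ext3_stride(128, 4096))  # 32
# The stride actually used in the mkfs command corresponds to 64 KiB chunks
print(ext3_stride(64, 4096))   # 16
```

Under these assumptions a 128 KiB chunk gives a stride of 32, whereas stride=16 matches a 64 KiB chunk. Whether that mismatch has anything to do with the stalls is not established anywhere in this thread.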
Calin Brabandt
2004-Jul-31 21:36 UTC
Large File Copy to Large ext3 RAID5 Array Often Stalls
Perhaps this IS a Samba problem. I created a new directory structure on my RAID5 array from the Linux console and moved the large files into the new subdirectory. Although I need to do more testing, the problem seems to have vanished--at least when copying new files from my network Samba clients to the new subdirectory.

Much of the subdirectory structure on the RAID5 array was originally created using Windows Explorer on a Windows XP client by simply dragging and dropping the subdirectory icons to the Samba share on the RAID5. Could Samba have screwed up this operation in a manner resulting in my symptoms? I'd sure like to better understand the problem so I can avoid it in the future.

Thanks,
Cal
--On 31 July 2004 14:36 -0700 Calin Brabandt <calinb at comcast.net> wrote:

> Perhaps this IS a Samba problem. I created a new directory structure on
> my RAID5 array from the Linux console and move its contents of large
> files to the new subdirectory. Although I need to do more testing, the
> problem seem to have vanished--at least when copying new files from my
> network Samba clients to the new subdirectory.

I am *FAR* from an expert, but I seem to remember that Samba isn't particularly efficient at scanning directories, not least because of case-insensitive filename matching.

You didn't say what kernel you were using, but if it is 2.4, then ext3 is also not efficient at holding directories with large numbers of files (which is what I think you said you had). It may be that the two problems compound to give you very slow performance.

You can try the htree patch for your kernel, then ensure the directory gets indexed. IIRC: mv the directory out of the way, create a new directory while htree is turned on (so that it is created with the relevant htree index), and mv each of the files in the old directory back into the new directory.

If you were using 2.4, but in the intervening period you've upgraded to 2.6 (which has the htree fixes in), I note you've just done effectively that, which may be why the problem has disappeared - just a guess.

Alex
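
The move-aside-and-recreate procedure described above could be scripted roughly as follows. This is a minimal sketch, assuming directory indexing is already enabled on the filesystem, that nothing else is writing to the directory while it runs, and that the directory is not itself a mount point; the function name rebuild_directory and the ".old" suffix are illustrative and not from the thread.

```python
import os


def rebuild_directory(path: str, suffix: str = ".old") -> int:
    """Recreate `path` as a fresh directory and move its entries back in.

    Returns the number of entries moved. Ownership of the new directory is
    NOT copied from the old one; only the permission bits are.
    """
    path = os.path.abspath(path)
    old = path + suffix
    if os.path.exists(old):
        raise FileExistsError(f"{old} already exists; refusing to overwrite")

    mode = os.stat(path).st_mode & 0o7777  # remember permission bits

    os.rename(path, old)   # mv the directory out of the way
    os.mkdir(path)         # new directory, created under current fs settings
    os.chmod(path, mode)

    moved = 0
    for name in os.listdir(old):
        # Same filesystem, so each rename only relinks the directory entry
        # and does not copy file data.
        os.rename(os.path.join(old, name), os.path.join(path, name))
        moved += 1

    os.rmdir(old)          # old directory is empty now
    return moved
```

Calling rebuild_directory on the directory holding the large files would mirror the manual mv steps. Because the old and new directories share a parent, the renames stay on one filesystem, so even ~220 files of 1GB each should move quickly.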