Hello,

I have some performance problems in a file server system. It is used as a Samba and NFS file server. I have some ideas about what might cause the problems, and I want to try them step by step. First I have to learn more about these areas.

First, I have some questions about tuning and sizing the ext3 journal.

The most extensive list I found on ext3 performance tuning is
<http://marc.info/?l=ext3-users&m=117943306605949&w=2> .

I learned that the ext3 journal is flushed when either the journal is full or the commit interval is over (set by the mount option "commit=<number of seconds>"). So I started trying these settings.

I didn't manage to determine the size of the journal of an already existing filesystem. tune2fs only tells me the inode:

~# tune2fs -l /dev/vg0/lvol0 | grep -i journal
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Journal inode: 8
Journal backup: inode blocks

Is there a way to get the size of the journal? And how do I find out how much of the journal is used? Or how often a journal flush actually happens? Or whether the journal flushes happen because the commit interval has elapsed or because the journal was full? This would give me hints for sizing the journal.

I also tried to increase the journal flush interval:

~# umount /data/
~# mount -o commit=30 /dev/vg0/lvol0 /data/
~# grep /data /proc/mounts
/dev/vg0/lvol0 /data ext3 rw,data=ordered 0 0
~#

Watching the disk activity LEDs makes me believe that this works, but I expected the mount option "commit=30" to be listed in /proc/mounts. Did I do something wrong, or is there another explanation?

As you can see above in /proc/mounts, I use data=ordered. The file server offers both NFS and Samba. "data=journal" might be better for NFS, but I believe that NFS is the smaller part of the file server load. Is there a way to measure or estimate how large the impact of NFS on the journal size and transfer rate is?
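The question of whether commits are triggered by the commit interval or by a full journal can be illustrated with a toy model. The sketch below is a deliberate simplification, not jbd's actual logic: the function name `simulate_commits`, the write stream, and the per-transaction limit `txn_limit_blocks` are all hypothetical. It replays a list of timed writes and labels each commit "timer" or "full", which is the kind of breakdown the jbd2 statistics patch mentioned later in this thread is meant to give on a real system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Commit:
    time_s: float
    blocks: int
    reason: str  # "timer" or "full"


def simulate_commits(writes, commit_interval_s, txn_limit_blocks):
    """Classify commits in a toy model of the two ext3 commit triggers.

    writes: list of (time_s, blocks) tuples sorted by time.
    A running transaction is committed when commit_interval_s has passed
    since it was opened ("timer"), or when the next write would push it
    past txn_limit_blocks ("full"). This is a simplification, not jbd code.
    """
    commits = []
    txn_start = None  # time the open transaction began, None if idle
    txn_blocks = 0
    for t, blocks in writes:
        # Timer trigger: the open transaction has aged past the interval.
        if txn_start is not None and t - txn_start >= commit_interval_s:
            commits.append(Commit(txn_start + commit_interval_s, txn_blocks, "timer"))
            txn_start, txn_blocks = None, 0
        if txn_start is None:
            txn_start = t
        # Space trigger: this write would overflow the transaction limit.
        if txn_blocks > 0 and txn_blocks + blocks > txn_limit_blocks:
            commits.append(Commit(t, txn_blocks, "full"))
            txn_start, txn_blocks = t, 0
        txn_blocks += blocks
    # Whatever is left is committed when its timer expires.
    if txn_blocks:
        commits.append(Commit(txn_start + commit_interval_s, txn_blocks, "timer"))
    return commits


if __name__ == "__main__":
    # Hypothetical write stream: (seconds, journal blocks).
    stream = [(0, 10), (1, 10), (2, 10), (6, 5)]
    result = simulate_commits(stream, commit_interval_s=5, txn_limit_blocks=25)
    print(Counter(c.reason for c in result))
```

In this model, a mostly "full" breakdown suggests a larger journal would help, while a mostly "timer" breakdown means the commit= interval is what paces the flushes.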
If I used "data=journal", I would need a larger journal and the journal data transfer rate would increase. I fear this might introduce a new bottleneck, but I have no idea how to measure or estimate this in advance.

Currently I have an internal journal, and the filesystem resides on RAID6. I guess this is another potential performance problem. When discussions on external journals appeared some years ago, it was mentioned that the external journal code was quite new (see <http://marc.info/?l=ext3-users&m=101466148203469&w=2>).

I think nowadays I have the option to use an external journal and place it on a dedicated RAID1. Has anyone seen performance advantages from doing this? Even while using "data=journal"?

That's all. Thanks for reading this far ;-)

Sven
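The data=journal sizing question can at least be framed as back-of-envelope arithmetic. The sketch below is a rough estimate under stated assumptions, not a measurement method: the write rates are inputs you would have to measure (for example with iostat) or guess, and `txn_fraction=0.25` is an assumption based on our reading that jbd forces a commit once a transaction reaches about a quarter of the journal; verify that against your kernel source. It also ignores that repeated updates to the same block within one transaction are logged only once, so real metadata traffic can be lower.

```python
def journal_estimate(data_mb_s, metadata_mb_s, commit_interval_s, txn_fraction=0.25):
    """Rough lower bound on journal size for data=journal.

    data_mb_s, metadata_mb_s: sustained write rates in MB/s; the metadata
        share is usually a guess.
    commit_interval_s: the commit= mount option (ext3 default is 5 s).
    txn_fraction: ASSUMED fraction of the journal one transaction may use
        before a commit is forced; check your kernel's jbd source.
    """
    # With data=journal, data blocks are logged alongside metadata.
    journaled_mb_s = data_mb_s + metadata_mb_s
    per_txn_mb = journaled_mb_s * commit_interval_s
    return {
        # Journal large enough that the timer, not space, triggers commits.
        "min_journal_mb": per_txn_mb / txn_fraction,
        # data=journal: data and metadata are each written twice
        # (once to the journal, once to their final location).
        "device_write_mb_s_journal": 2 * journaled_mb_s,
        # data=ordered: data written once, metadata twice.
        "device_write_mb_s_ordered": data_mb_s + 2 * metadata_mb_s,
    }


if __name__ == "__main__":
    # Hypothetical load: 10 MB/s of data, 2 MB/s of metadata, commit=5.
    for key, value in journal_estimate(10, 2, 5).items():
        print(f"{key}: {value}")
```

The difference between the two device write rates is roughly the extra transfer load that switching to data=journal would add, which is the potential new bottleneck.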
On Dec 11, 2007 13:29 +0100, Sven Rudolph wrote:
> I didnt manage to determine the size of the journal of an already
> existing filesystem. tunefs tells me the inode:
>
> ~# tune2fs -l /dev/vg0/lvol0 | grep -i journal
> Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
> Journal inode: 8
> Journal backup: inode blocks
>
> Is there a way to get the size of the journal?

debugfs -c -R "stat <8>" /dev/vg0/lvol0

> And how do I find out how much of the journal is used? Or how often a
> journal flush actually happens? Or whether the journal flushes happen
> because the commit interval has finished or because the journal was
> full? This would give me hints for the sizing of the journal.

There is a patch for jbd2 (part of the ext4 patch queue, based on a patch for jbd from Lustre) that records transaction and journal statistics.

> And I tried to increase the journal flush interval.
>
> ~# umount /data/
> ~# mount -o commit=30 /dev/vg0/lvol0 /data/
> ~# grep /data /proc/mounts
> /dev/vg0/lvol0 /data ext3 rw,data=ordered 0 0
> ~#
>
> Watching the disk activity LEDs makes me believe that this works, but
> I expected the mount option "commit=30" to be listed in
> /proc/mounts. Did I do something wrong, or is there another way to
> explain it?

No, /proc/mounts doesn't report all of the mount options correctly.

> As you see above in /proc/mounts I use data=ordered. The fileserver
> offers both NFS and Samba. "data=journal" might be better for NFS, but
> I believe that NFS is the smaller part of the fileserver load. Is
> there a way to measure or estimate how large the impact of NFS on the
> journal size and transfer rate is?
>
> If I used "data=journal" I would need a larger journal and the journal
> data transfer rate would increase. I fear this might induce a new
> bottleneck, but I have no idea how to measure this or how to estimate
> it in advance.

Increasing the journal size is a good idea for any metadata-heavy load.
We use a journal size of 400MB for Lustre metadata servers.

> Currently I have an internal journal, the filesystem resides on
> RAID6. I guess this is another potential performance problem.

For the journal this doesn't make much difference, since the journal IO is sequential writes. The RAID6 is bad for metadata performance because it has to do read-modify-write on the RAID stripes.

> When discussions on external journals appeared some years ago it was
> mentioned that the external journal code was quite new (see
> <http://marc.info/?l=ext3-users&m=101466148203469&w=2>).
>
> I think nowadays I have the option to use an external journal and
> place it on a dedicated RAID1. Did anyone experience performance
> advantages by doing this? Even while using "data=journal"?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
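Running `stat <8>` in debugfs prints the journal inode's metadata, and the journal size is that inode's size in bytes. The sketch below pulls that number out of the text. The helper names are hypothetical, the `SAMPLE` string is hand-written to resemble debugfs output rather than captured from a real system, and the exact field layout varies between e2fsprogs versions, so the regex only assumes that `Size:` follows `Group:` on the same line.

```python
import re
import subprocess

# The inode size appears on the "User: ... Group: ... Size:" line of
# debugfs "stat" output. Matching only that line skips the later
# "Fragment: ... Size: 0" line that some versions print.
_SIZE_RE = re.compile(r"Group:.*?\bSize:\s*(\d+)")


def journal_size_bytes(stat_output):
    """Extract the inode size in bytes from debugfs 'stat' output."""
    match = _SIZE_RE.search(stat_output)
    if match is None:
        raise ValueError("no 'Size:' field found on the 'Group:' line")
    return int(match.group(1))


def read_journal_stat(device, journal_inode=8):
    """Run debugfs in read-only catastrophic mode (-c); needs root."""
    result = subprocess.run(
        ["debugfs", "-c", "-R", f"stat <{journal_inode}>", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hand-written sample shaped like debugfs output (layout varies by version).
SAMPLE = """Inode: 8   Type: regular    Mode:  0600   Flags: 0x0
User:     0   Group:     0   Size: 33554432
Links: 1   Blockcount: 65664
Fragment:  Address: 0    Number: 0    Size: 0
"""

if __name__ == "__main__":
    print(journal_size_bytes(SAMPLE) // (1024 * 1024), "MiB")
```

On the server itself, `journal_size_bytes(read_journal_stat("/dev/vg0/lvol0"))` would apply the same parsing to the real output.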
Hi,

On Tue, 2007-12-11 at 13:29 +0100, Sven Rudolph wrote:
> I have some performance problems in a file server system. It is used
> as Samba and NFS file server. I have some ideas what might cause the
> problems, and I want to try step by step. First I have to learn more
> about these areas.
>
> First I have some questions about tuning/sizing the ext3 journal.
>
> The most extensive list I found on ext3 performance tuning is
> <http://marc.info/?l=ext3-users&m=117943306605949&w=2> .
>
>
> I learned that the ext3 journal is flushed when either the journal is
> full or the commit interval is over (set by the mount option
> "commit=<number of seconds>"). So started trying these settings.

Are your filesystems mounted with noatime? It makes a huge difference, especially if your workload is mostly reads rather than writes. Without noatime, every read of a file updates its access time (atime), which is a metadata write that goes into the journal and fills it.

If you are not using noatime, it is worth trying. See this thread for a thorough discussion of the topic:
http://thread.gmane.org/gmane.linux.kernel/565148

Hope that helps,
--
Brice Figureau
Days of Wonder
http://www.daysofwonder.com/
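Checking for noatime amounts to reading the options field of /proc/mounts. The sketch below, with a hypothetical helper `mount_options`, splits that field into a set; the sample is the /proc/mounts line from Sven's message. Keep Andreas's caveat in mind: on this kernel /proc/mounts does not report every option, so a missing entry (such as commit=30 here) does not prove the option is inactive, whereas a noatime that is listed can be trusted.

```python
def mount_options(proc_mounts_text, mount_point):
    r"""Return the option set for mount_point from /proc/mounts content.

    Spaces in paths are escaped as \040 in /proc/mounts; this sketch
    compares the raw field and does not decode such escapes.
    """
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    raise KeyError(f"{mount_point} not found")


if __name__ == "__main__":
    # The /proc/mounts line quoted in this thread.
    sample = "/dev/vg0/lvol0 /data ext3 rw,data=ordered 0 0\n"
    opts = mount_options(sample, "/data")
    print("noatime" in opts, "commit=30" in opts)
```

On a live system, pass the contents of `open("/proc/mounts").read()` instead of the sample string.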