Hi,

On 03/31/2016 01:52 PM, Graeme Donaldson wrote:
> On 31 March 2016 at 04:17, Eric Ren <zren at suse.com> wrote:
>
>> Hi,
>>
>>> How did you perform the testing? It really matters. If you write a file
>>> on the shared disk from one node and read that file from another node,
>>> with no or very little interval, the writing IO speed can decrease by
>>> ~20 times according to my previous testing (just as a reference). That
>>> is an extremely bad situation for a 2-node cluster, isn't it?
>>>
>>> But it's incredible that in your case the writing speed drops by >3000
>>> times!
>>
>> I simply used 'dd' to create a file with /dev/zero as a source. If there
>> is a better way to do this I am all ears.
>
> Alright, you just did local IO on ocfs2, so the performance shouldn't be
> that bad. I guess the ocfs2 volume is more than 60% used, or seriously
> fragmented? Please send the output of `df -h`, the super block dumped
> with debugfs.ocfs2, and the exact `dd` command you ran. Additionally,
> run `dd` on each node.
>
> You know, ocfs2 is a shared-disk fs, so the 3 basic test cases I can
> think of are:
> 1. only one node of the cluster does IO;
> 2. more than one node performs IO, but each node reads/writes only its
>    own file on the shared disk;
> 3. like 2), but some nodes read and some nodes write the same file on
>    the shared disk.
>
> The above model is very much simplified, though. Practical scenarios can
> be much more complicated, like the fragmentation issue that your case
> most likely is.
>
> Here is all the output requested: http://pastebin.com/raw/BnJAQv9T

Pasting directly into the email is fine; that way all the info gets
archived ;-)

Hah, your disk is too small for ocfs2. What is your usage scenario? It
really does matter. I guess the files on the ocfs2 volume are mostly small
ones? If so, a 4k block size and 4k cluster size may be a little big. For
more info on choosing optimal format parameters, please see the mkfs.ocfs2
manual, the user mailing list, or search for ocfs2 formatting topics. I'm
not an expert on it.

> It's interesting to me that you guessed the usage is over 60%. It is
> indeed sitting at 65%. Is the solution as simple as ensuring that an
> OCFS2 filesystem doesn't go over the 60% usage mark? Or am I getting
> ahead of myself a little?

Yes, so using ocfs2 on top of cLVM is a good idea if you want it to be
resilient. I'm not sure whether tunefs.ocfs2 can change things like the
block size offline. FWIW, fragmentation is always evil ;-)

Eric

> Thanks for your effort so far!
>
> Graeme.
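For readers who want to gather the same information Eric asks for, the
commands would look roughly like the following. The device path and
mountpoint are placeholders, not taken from the thread, and the exact dd
invocation used in the original test is not shown here, so the bs/count/flag
values are only an illustration:

    # Usage of the shared volume (mountpoint is a placeholder)
    df -h /mnt/ocfs2

    # Dump the ocfs2 superblock; "stats" prints block size, cluster size and features
    debugfs.ocfs2 -R "stats" /dev/sdb1

    # Simple local write test; run on each node, oflag=direct bypasses the page cache
    dd if=/dev/zero of=/mnt/ocfs2/ddtest-$(hostname) bs=1M count=1024 oflag=direct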
On 31 March 2016 at 11:44, Eric Ren <zren at suse.com> wrote:

> Hah, your disk is too small for ocfs2. What is your usage scenario? It
> really does matter. I guess the files on the ocfs2 volume are mostly
> small ones? If so, a 4k block size and 4k cluster size may be a little
> big. [...]
>
> Yes, so using ocfs2 on top of cLVM is a good idea if you want it to be
> resilient. I'm not sure whether tunefs.ocfs2 can change things like the
> block size offline. FWIW, fragmentation is always evil ;-)

The files are the code and images, etc. that make up the customer's website.
I ran something to show me the distribution of file sizes and only around
10% are under 4KB in size, so I wouldn't think that a 4K block/cluster ought
to be an issue. Perhaps it is just down to the size. We're going to see
whether re-creating the filesystem with a 1K block size (the cluster size
cannot be smaller than 4K) and making it larger makes the issue go away.

For interest's sake, this is the size distribution on the volume. The first
column is the size in bytes and the second column is the count of files that
fall in that size range, so there are 545 files of 0 bytes, 265 files
between 16 bytes and 32 bytes, etc.
        0      545
        1       12
        2        1
        8        9
       16       51
       32      265
       64      593
      128      899
      256     6902
      512     1247
     1024    10290
     2048    21719
     4096    46908
     8192    53438
    16384    42749
    32768    68509
    65536    62462
   131072    32245
   262144    13349
   524288     5458
  1048576     2193
  2097152      245
  4194304       66
  8388608       15
 67108864        3
268435456        1
536870912        1

Graeme
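Re-creating the volume as Graeme describes would be done with mkfs.ocfs2; a
rough sketch follows. The device, label and node-slot count are placeholders,
and mkfs.ocfs2 destroys all existing data, so this only illustrates where
the block-size and cluster-size knobs live, not a command recommended for
this particular cluster:

    # -b = block size, -C = cluster size (4K minimum), -N = node slots, -L = label
    # Placeholders throughout; back up and unmount on all nodes before reformatting.
    mkfs.ocfs2 -b 1K -C 8K -N 2 -L web-shared /dev/sdb1

Eric's later suggestion (keep 4K blocks, try a larger cluster size) would
simply change the -b and -C values.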
Hi,

>> Yes, so using ocfs2 on top of cLVM is a good idea if you want it to be
>> resilient. I'm not sure whether tunefs.ocfs2 can change things like the
>> block size offline. FWIW, fragmentation is always evil ;-)
>
> The files are the code and images, etc. that make up the customer's
> website. I ran something to show me the distribution of file sizes and
> only around 10% are under 4KB in size, so I wouldn't think that a 4K
> block/cluster ought to be an issue. Perhaps it is just down to the size.
> We're going to see whether re-creating the filesystem with a 1K block
> size (the cluster size cannot be smaller than 4K) and making it larger
> makes the issue go away.
>
> [size distribution table snipped]

Yes, you're right. Thanks for correcting me. The big idea is that the bigger
the allocation unit, the more space is wasted; and the smaller the cluster
size, the more easily the disk gets fragmented. So a 4KB block size is fine
because we have the inline-data feature; you should try a bigger cluster
size if disk space is not a big concern.

BTW, could you share how you got those statistics? It's cool!

Eric
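Graeme's method is not shown in the thread, but a power-of-two size
histogram like the one above can be produced with a find/awk pipeline along
these lines (the path is a placeholder):

    # Bucket every regular file's size into the next power of two, count per bucket
    find /mnt/ocfs2 -type f -printf '%s\n' |
      awk '{ if ($1 == 0) b = 0; else { b = 1; while (b < $1) b *= 2 }
             bucket[b]++ }
           END { for (s in bucket) print s, bucket[s] }' |
      sort -n

Note that -printf '%s\n' is GNU find specific; other systems would need a
stat-based equivalent.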