David Schüler
2009-Jan-20 09:45 UTC
[Ocfs2-users] ocfs2 slows down with more servers in a cluster
Hello to everybody on the list,

I have a problem regarding the file operations per second on an ocfs2 volume. I don't mean the read or write speeds, just the number of operations per second the filesystem can handle.

Let's start at the beginning; here's what I have and what I'm doing: I have a big fibre storage device with around 6TB of space, RAID6 on SATA-II drives. I have 8 servers in my cluster, connected to the storage via two fibre switches. Switches and HBAs are QLogic 4Gbps, and the storage is an Infortrend EONStor. The servers are connected through gigabit network switches. I don't think the hardware is what causes the problem.

I'm using Ubuntu Server 8.04.1 LTS on all the machines. I create one 6TB partition with parted using the gpt disklabel and format the partition with ocfs2. On all servers the o2cb service is running, configured with the same heartbeat, network and other timeout values, and with the same cluster.conf on every server as well.
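Roughly, the format and mount steps looked like this (a sketch rather than a verbatim transcript; the flag values shown are just one of the many combinations I tried, and device name and mount point are mine):

# one of the tried combinations: 4K blocks, 32K clusters, 8 node slots, "mail" type hint
mkfs.ocfs2 -b 4K -C 32K -N 8 -T mail -L daten /dev/sdb1
# bring the o2cb cluster stack online (same cluster.conf on every node)
/etc/init.d/o2cb online
# mount on each node
mount -t ocfs2 /dev/sdb1 /daten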
I mount the ocfs2 volume on one machine and everything works fine. I did some bonnie++ testing and got a write speed of 70MB/s and a read speed of 140MB/s. I mount the volume on all servers and everything still works well: concurrent reads and writes, no errors, no fencing. Nevertheless, everything feels slow. I started an rsync to bring 1.4TB of data onto the volume. With the volume mounted on one server this takes around a day and a half; with the volume mounted on all servers I stopped it after two days, with not more than 200GB synced.

This made me wonder, so I ran some more tests. bonnie++ with the volume mounted on all servers still reported a write speed of 70MB/s and a read speed of 130MB/s, so raw throughput does not seem to be the problem. Looking closer at the bonnie results, which also measure file operations per second, I saw that with the ocfs2 volume mounted on one server it reaches around 2,500 operations/s; with the volume mounted on all servers it drops to 16 operations/s. To do the maths: at a write speed of 70MB/s I should be able to write seventy 1MB files in one second, but I can't, because the metadata handles no more than 16 file creates per second. That would still be 16MB/s, but I don't have 1MB files; mine are around 100kB each, so my high-speed fibre storage is effectively slowed down to 1.6MB/s (or even less).

Some more testing showed that the more servers are in the cluster, the slower everything gets. After reading nearly everything about ocfs2 I could find, I did additional tests: I reduced the volume size to 500GB, no longer using a gpt disklabel. I tried different cluster and block sizes. I reduced the number of node slots. I used -T mail and -T datafiles. All these options in nearly every combination; nothing helped. I even switched to Ubuntu Server 8.10 because of the newer ocfs2 kernel module, but nothing changed.

I think I must be doing something wrong, because I have never read about such a problem before, and if it were caused by a bug I would expect more people to be reporting it.

Here are my bonnie++ tests.

Volume mounted on one server:

root@upload1:/daten# bonnie++ -d /daten -n 1 -u 0 -g 0
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
upload1          8G 40434  76 74376  27 42018  18 41361  76 137827  25 538.8   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
upload1,8G,40434,76,74376,27,42018,18,41361,76,137827,25,538.8,1,1,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Volume mounted on all 8 servers:

root@upload1:/daten# bonnie++ -d /daten -n 1 -u 0 -g 0
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
upload1          8G 39445  74 75634  29 42245  20 42198  76 128275  22 573.1   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1    16   1 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
upload1,8G,39445,74,75634,29,42245,20,42198,76,128275,22,573.1,0,1,16,1,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

It's this Create/sec figure that my problem with slow speeds seems to come from.

In some more tests I found that Create/sec goes up to around 2,500 with one server mounting the volume (the -n option has to be raised from 1 to 10 to see it). I formatted the volume with ext3 and got 70,000 Create/sec, so it seems that even with one server mounting the volume something is not right. With tunefs.ocfs2 I switched the ocfs2 volume to 'local' and got Create/sec up to 3,500, but this still seems very slow to me.

With this in mind I started testing on one machine with ocfs2 on a local drive, but I can't get more than 2,500 Create/sec, no matter which cluster and block sizes I use. I can't test this volume on more than one server because I can't export the local drive to the other machines, but I'm sure it would get slower with more servers mounting the volume.

I did one last test yesterday: I installed CentOS 5.2 with the latest ocfs2 module and tools from Oracle for EL5. I still can't get more than 2,500 Create/sec with one server mounting the volume. Next I'll do some testing with more than one CentOS server, but perhaps someone has a good idea for me, or a hint at what I'm doing wrong. I'm sure ocfs2 can perform much better than in my tests.

Oh, I forgot: I even tested on different hardware, a dual Xeon machine with 4GB RAM as well as a Core 2 Duo machine with 2GB RAM; no change. I used 32-bit and 64-bit versions of Ubuntu Server; again, no change.

I'm sorry for the long post, but I'm new to the list and I think every little piece of information could be helpful.

Kind regards,
David
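P.S. A quick way to get a comparable Create/sec number without bonnie++ would be something like the following (an untested sketch; the path and file count are arbitrary):

# create 10000 empty files in one directory and time it
mkdir -p /daten/createtest
time sh -c 'i=0; while [ $i -lt 10000 ]; do : > /daten/createtest/f$i; i=$((i+1)); done'

Dividing 10000 by the elapsed seconds gives a rough Create/sec figure.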
Sunil Mushran
2009-Jan-20 23:24 UTC
[Ocfs2-users] ocfs2 slows down with more servers in a cluster
In my run, bonnie created over 20 thousand files in one directory. This is problematic because ocfs2 currently lacks indexed directories, meaning that during each create it has to scan the entire directory to ensure there is no name clash. The good news is that we are in the process of addressing this shortcoming.

Sunil
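To make the cost concrete: with a linear scan, the k-th create has to be checked against the k entries already present, so creating N files in one directory costs on the order of N^2/2 name comparisons (roughly 200 million for the ~20,000 files bonnie creates), and each create presumably also holds the directory's cluster lock, which would explain why adding nodes makes it so much worse. Until indexed directories arrive, a common mitigation on any filesystem that is slow on large directories (a general technique, not one suggested in this thread) is to hash files into subdirectories so that no single directory grows large. A minimal sketch, with an arbitrary file name and bucket count:

# spread files over 256 hashed subdirectories (00..ff)
f=somefile.dat                              # hypothetical file name
d=$(printf '%s' "$f" | md5sum | cut -c1-2)  # first two hex chars pick the bucket
mkdir -p "/daten/$d"
mv "$f" "/daten/$d/$f"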