Hello all, new to the list here. Figured I'd hop in to get some insight/opinion on the hardware requirements for Gluster in our environment.

We've been testing Gluster as the shared storage technology for a new Cloud product we're going to be launching. The primary role of the Gluster infrastructure is to house several hundred (potentially thousands) of Xen sparse image files.

In our lab, we have the following servers/specs set up:

Bricks 1-4:
  Dual Xeon 5410 (8 cores)
  16G RAM
  4x1TB 7200RPM HDD @ 3ware RAID5
  Bond0 - 2Gbps

Cloud1 (client):
  Q8200 2.33GHz
  8G RAM
  1x1TB 7200RPM HDD
  Eth0 - 1Gbps

It's all primarily SuperMicro chassis and motherboards, Seagate drives, and standard RAM. The Gluster client is configured to distribute and mirror across both pairs of bricks - a typical setup for users like us who need both scalability and redundancy. (I've pasted a rough sketch of that client-side layout after my questions, below.)

Until yesterday, things were working great. We were getting line-speed writes (55MB/s, since Gluster has to write to two bricks) and amazing reads. However, after running 5 concurrent iozone benchmarks, we noticed the bricks becoming loaded, primarily on CPU, with some IO:

Cloud1 saw a peak of 92% CPU utilization, with pdflush (the buffer cache manager) reaching up to 12% CPU:

top - 13:53:12 up 21:44,  2 users,  load average: 1.72, 1.28, 0.72
Tasks: 102 total,   2 running, 100 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.7%us, 10.1%sy,  0.0%ni, 80.6%id,  0.0%wa,  0.0%hi,  1.2%si,  0.5%st
Mem:   8120388k total,  8078832k used,    41556k free,    30760k buffers
Swap:  4200988k total,      752k used,  4200236k free,  7606560k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2304 root      20   0  245m 7032 1388 R   91  0.1   8:27.03 glusterfs
 4409 root      20   0 46812  16m  172 S    1  0.2   0:01.86 iozone
 1159 root      15  -5     0    0    0 S    1  0.0   0:00.12 kjournald
 4410 root      20   0 46812  16m  172 S    1  0.2   0:02.22 iozone
 4411 root      20   0 46812  16m  172 S    1  0.2   0:01.32 iozone
 4412 root      20   0 46812  16m  172 S    1  0.2   0:01.96 iozone
 4413 root      20   0 46812  16m  172 S    1  0.2   0:02.00 iozone
    1 root      20   0 10312  752  620 S    0  0.0   0:00.16 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0

Brick1 saw a peak of 14% CPU utilization:

top - 13:52:39 up 8 days,  5:10,  1 user,  load average: 0.07, 0.05, 0.01
Tasks: 151 total,   2 running, 149 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.7%sy,  0.0%ni, 98.1%id,  0.0%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  16439672k total,  2705356k used, 13734316k free,   228624k buffers
Swap:  8388600k total,      156k used,  8388444k free,  2255200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3413 root      15   0  187m 9332 1172 R 10.0  0.1  25:28.51 glusterfsd
    1 root      15   0 10348  692  580 S  0.0  0.0   0:02.15 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0

Brick2 saw a peak of 37% CPU utilization:

top - 13:55:29 up 8 days,  5:54,  2 users,  load average: 0.21, 0.14, 0.05
Tasks: 152 total,   1 running, 151 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  1.1%sy,  0.0%ni, 97.2%id,  0.0%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  16439672k total,  2698776k used, 13740896k free,   257288k buffers
Swap:  8388600k total,      152k used,  8388448k free,  2251084k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3404 root      15   0  187m 4996 1156 S 16.6  0.0  36:50.78 glusterfsd
10547 root      15   0     0    0    0 S  1.0  0.0   0:11.93 pdflush
    1 root      15   0 10348  692  580 S  0.0  0.0   0:02.18 init

Brick3:

top - 21:44:22 up 7 days, 23:36,  3 users,  load average: 0.74, 0.39, 0.19
Tasks: 110 total,   2 running, 108 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.7%us,  4.1%sy,  0.0%ni, 89.0%id,  0.0%wa,  0.4%hi,  2.8%si,  0.0%st
Mem:  16439808k total, 14291352k used,  2148456k free,   191444k buffers
Swap:  8388600k total,      120k used,  8388480k free, 13721644k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2953 root      15   0  187m 8380 1176 R 34.3  0.1  12:46.74 glusterfsd
    1 root      15   0 10348  692  580 S  0.0  0.0   0:01.75 init

Brick4:

top - 13:56:29 up 8 days,  3:57,  2 users,  load average: 0.89, 0.62, 0.35
Tasks: 153 total,   1 running, 152 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  2.9%sy,  0.0%ni, 92.8%id,  0.0%wa,  0.3%hi,  1.7%si,  0.0%st
Mem:  16439672k total, 14405528k used,  2034144k free,   193340k buffers
Swap:  8388600k total,      144k used,  8388456k free, 13782972k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3318 root      15   0  253m 7812 1168 S 44.6  0.0  16:36.88 glusterfsd
30901 root      15   0 12740 1108  800 R  0.3  0.0   0:00.01 top
    1 root      15   0 10348  692  580 S  0.0  0.0   0:02.32 init

Command line used:

  /usr/sbin/iozone -R -l 5 -u 5 -r 4k -s 2048m -F /distributed/f1 /distributed/f2 /distributed/f3 /distributed/f4 /distributed/f5 -G

It's obviously very concerning that one client can induce this sort of load across our Gluster infrastructure. So, I'm thinking we get 4 of the following:

  Dual Xeon 5520
  32G RAM (DDR3)
  14x1TB 7200RPM HDD, 12 @ RAID10, 2 @ RAID1 (for the operating system)
  MaxIQ or CacheCade SSD add-on, depending on whether we go with Adaptec or LSI
  6x1GigE bonded to 6Gbps

Is this overkill? Here are some things to keep in mind:

* The majority of our customers will be hosting web applications, blogs, and forums - primarily database-driven, so lots of reads and some writes
* Each client server only has a 1Gbps connection to the brick(s)
* We're obviously trying to get the most bang for our buck, but we're not trying to spend $40k on something that's overpowered

Is that iozone benchmark too intense for what we're ACTUALLY going to be seeing from our Xen instances? Is there an advantage to using more bricks at half the horsepower? At 6Gbps to the bricks, does RAID10 make more sense because of its exceptional performance? Has anyone on the list set up something like this before? If so, mind sharing your wisdom?

Thanks in advance everyone :)
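As mentioned above, here's a rough sketch of what a distribute-over-replicate client volfile looks like for a four-brick layout like ours. The hostnames, volume names, and remote-subvolume names below are placeholders rather than a copy of our lab config, but the structure is the standard one:

  # one protocol/client volume per brick (server-side volfiles not shown)
  volume client-brick1
    type protocol/client
    option transport-type tcp
    option remote-host brick1          # placeholder hostname
    option remote-subvolume brick      # name of the export defined on the server
  end-volume

  volume client-brick2
    type protocol/client
    option transport-type tcp
    option remote-host brick2
    option remote-subvolume brick
  end-volume

  volume client-brick3
    type protocol/client
    option transport-type tcp
    option remote-host brick3
    option remote-subvolume brick
  end-volume

  volume client-brick4
    type protocol/client
    option transport-type tcp
    option remote-host brick4
    option remote-subvolume brick
  end-volume

  # mirror within each pair of bricks
  volume mirror-1
    type cluster/replicate
    subvolumes client-brick1 client-brick2
  end-volume

  volume mirror-2
    type cluster/replicate
    subvolumes client-brick3 client-brick4
  end-volume

  # distribute files across the two mirrored pairs
  volume distribute
    type cluster/distribute
    subvolumes mirror-1 mirror-2
  end-volume

Each file lands on one of the two mirror pairs, and every write goes to both bricks in that pair, which is why we only see about half of line speed on writes.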
Just for reference, here are the iozone benchmarking results:

Run began: Wed Jun 16 13:48:41 2010

  Excel chart generation enabled
  Record Size 4 KB
  File size set to 2097152 KB
  Using msync(MS_SYNC) on mmap files
  Command line used: /usr/sbin/iozone -R -l 5 -u 5 -r 4k -s 2048m -F /distributed/f1 /distributed/f2 /distributed/f3 /distributed/f4 /distributed/f5 -G
  Output is in Kbytes/sec
  Time Resolution = 0.000001 seconds.
  Processor cache size set to 1024 Kbytes.
  Processor cache line size set to 32 bytes.
  File stride size set to 17 * record size.
  Min process = 5
  Max process = 5
  Throughput test with 5 processes
  Each process writes a 2097152 Kbyte file in 4 Kbyte records

  Children see throughput for 5 initial writers  =   34936.65 KB/sec
  Parent sees throughput for 5 initial writers   =   31595.96 KB/sec
  Min throughput per process                     =    6713.67 KB/sec
  Max throughput per process                     =    7407.69 KB/sec
  Avg throughput per process                     =    6987.33 KB/sec
  Min xfer                                       = 1900672.00 KB

  Children see throughput for 5 rewriters        =   34336.49 KB/sec
  Parent sees throughput for 5 rewriters         =   33737.35 KB/sec
  Min throughput per process                     =    6506.53 KB/sec
  Max throughput per process                     =    7430.44 KB/sec
  Avg throughput per process                     =    6867.30 KB/sec
  Min xfer                                       = 1836416.00 KB

  Children see throughput for 5 readers          =  111977.31 KB/sec
  Parent sees throughput for 5 readers           =  111846.84 KB/sec
  Min throughput per process                     =   20259.48 KB/sec
  Max throughput per process                     =   25610.23 KB/sec
  Avg throughput per process                     =   22395.46 KB/sec
  Min xfer                                       = 1658992.00 KB

  Children see throughput for 5 re-readers       =  111582.38 KB/sec
  Parent sees throughput for 5 re-readers        =  111420.54 KB/sec
  Min throughput per process                     =   20841.22 KB/sec
  Max throughput per process                     =   25012.94 KB/sec
  Avg throughput per process                     =   22316.48 KB/sec
  Min xfer                                       = 1747440.00 KB

  Children see throughput for 5 reverse readers  =  110576.31 KB/sec
  Parent sees throughput for 5 reverse readers   =  110356.95 KB/sec
  Min throughput per process                     =   18543.04 KB/sec
  Max throughput per process                     =   23964.38 KB/sec
  Avg throughput per process                     =   22115.26 KB/sec
  Min xfer                                       = 1622784.00 KB

  Children see throughput for 5 stride readers   =    6513.98 KB/sec
  Parent sees throughput for 5 stride readers    =    6513.14 KB/sec
  Min throughput per process                     =    1189.43 KB/sec
  Max throughput per process                     =    1497.43 KB/sec
  Avg throughput per process                     =    1302.80 KB/sec
  Min xfer                                       = 1665796.00 KB

  Children see throughput for 5 random readers   =    3460.22 KB/sec
  Parent sees throughput for 5 random readers    =    3460.08 KB/sec
  Min throughput per process                     =     622.45 KB/sec
  Max throughput per process                     =     799.28 KB/sec
  Avg throughput per process                     =     692.04 KB/sec
  Min xfer                                       = 1633196.00 KB
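P.S. For the network side of the discussion: the 2Gbps on the current bricks (and the proposed 6Gbps) is just standard Linux bonding. On a RHEL/CentOS-style box it looks roughly like the sketch below - the interface names, address, and bonding mode here are placeholders (the right mode depends on what the switches support), not our exact config:

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=802.3ad miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=10.0.0.1
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeated for each slave NIC)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none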