All, I've been asked to share some rebuild times on my large Gluster cluster. I recently added more storage (bricks) and did a full ls -alR on the whole system to trigger the self-heal. I estimate we have around 50 million files and directories.

Gluster Server Hardware:
2 x Supermicro 4U chassis with 24 1.5TB SATA drives each, plus another 24 1.5TB SATA drives in an external drive array via SAS (96 drives total)
8-core 2.5GHz Xeon, 8GB RAM
3ware RAID controllers, 24 drives per RAID6 array, 4 arrays total, 2 arrays per server
CentOS 5.3 64-bit
XFS with the inode64 mount option
Gluster 2.0.9
Bonded gigabit ethernet

Clients:
20 or so Dell 1950 clients, a mixture of RedHat ES4 and CentOS 5
+ 20 Windows XP clients via Samba (these are VMs that handle the "have to run on Windows" jobs)
All clients on gigabit ethernet

I must say that the load on our Gluster servers is normally very high - "load average" on the boxes is anywhere from 7-10 at peak (although with decent service times) - so I'm sure the rebuild would have gone quicker on a more idle system. The system is at its highest load while writing a large amount of data at the peak of the day, so I try to schedule jobs around our peak times.

Anyhow... I started the job sometime January 16th and it JUST finished... 18 days later.

real    27229m56.894s
user    13m19.833s
sys     56m51.277s

Finish date was Wed Feb 3 23:33:12 PST 2010.

Now, I know some people have mentioned that Gluster is happier with many small bricks instead of the large RAID arrays I use, but either way I'd be stuck doing an ls -aglR, which takes forever. So I'd rather add a huge amount of space at once and keep the system set up the same way - and let my 3ware controllers deal with drive failures instead of having to do an ls -aglR each time I lose a drive. Replacing a drive with the 3ware controller takes 7 to 8 days in a 24-drive RAID6 array, but that's still better than 18 days for Gluster to do an ls -aglR. By comparison, our old 14-node Isilon 6000 cluster (6TB per node) did a node rebuild/resync in about a day or two - there's a big difference between block-level and filesystem-level replication!

We're still running Gluster 2.0.9, but I am looking to upgrade to 3.0 once a few more releases are out, and I'm hoping that the new checksum-based checks will speed up this whole process. Once I have some numbers on 3.0 I'll be sure to share.

thanks,
liam
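
P.S. For anyone wanting to reproduce the timing above, here is a minimal sketch of the kind of crawl being measured - the mount point (/mnt/glusterfs) and log file below are placeholder paths for illustration, not the actual ones from this setup:

    # recursive stat of every file from a client mount to trigger self-heal;
    # run it under time, discard the listing, and keep it alive after logout
    nohup bash -c 'time ls -aglR /mnt/glusterfs > /dev/null' \
        > /var/tmp/gluster-selfheal-crawl.log 2>&1 &

The timing summary (real/user/sys) ends up in the log file along with any errors, since time writes to stderr.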