Emile Heitor
2010-Sep-15 13:18 UTC
[Gluster-users] Shared web hosting with GlusterFS and inotify
Hi list,

For a couple of weeks, we have been experimenting with a web hosting system based on GlusterFS in order to share customers' documentroots between more than one machine.

The hardware and software involved are:

- Two servers, each with 2x Intel 5650 (i.e. 2x12 cores @ 2.6 GHz), 24 GB DDR3 RAM, 146 GB SAS disks in RAID 1
- Both servers run 64-bit Debian Lenny GNU/Linux with GlusterFS 3.0.5
- The web server is Apache 2.2; the application is a huge PHP/MySQL monster

For our first naive tests, we used the glusterfs mountpoint as Apache's documentroot. In short, performance was catastrophic. A single one of these servers, without GlusterFS, is capable of handling about 170 pages per second (PPS) with 100 concurrent users (CU). The same server, with the Apache documentroot being a gluster mountpoint, drops to 5 PPS for 20 CU and just stops responding for 40+.

We tried a lot of tips (quick-read, io-threads, io-cache, thread-count, timeouts...) that we read on this very mailing list and on various websites, as well as experiments of our own, but we never got better than 10 PPS for 20 users.

So we took another approach: instead of declaring the gluster mountpoint as the documentroot, we declared the local storage. Of course, without any further measure, this would lead to inconsistencies if by any chance Apache writes something (.htaccess, tmp file, log...).

And so enters inotify. Using inotify-tools' "inotifywait", we have this little script watching for modifications in the local documentroot and duplicating them to the glusterfs share. The infinite loop is avoided by an md5 comparison.

Here is a very early proof of concept:

    #!/bin/sh

    [ $# -lt 2 ] && echo "usage: $0 <source> <destination>" >&2 && exit 1

    PATH=${PATH}:/bin:/sbin:/usr/bin:/usr/sbin; export PATH

    SRC=$1
    DST=$2

    cd "${SRC}" || exit 1

    # no recursion (-d): sync only the directory the event came from;
    # the single quotes delay expansion until eval runs inside the loop
    RSYNC='rsync -dlptgoD --delete "${srcdir}" "${dstdir}/"'

    # the exclude pattern is quoted so the shell passes the regex through intact
    inotifywait -mr \
        --exclude '\..*\.sw.*' \
        -e close_write -e create -e delete_self -e delete . | \
    while read -r dir action file
    do
        srcdir="${SRC}/${dir}"
        dstdir="${DST}/${dir}"

        # skip directories that live on tmpfs
        [ -d "${srcdir}" ] && \
            [ -n "$(df -T "${srcdir}" | grep tmpfs)" ] \
            && continue

        # debug
        echo "${dir} ${action} ${file}"

        case "${action}" in
        CLOSE_WRITE,CLOSE)
            # new file on the destination side: copy it and move on
            [ ! -f "${dstdir}/${file}" ] && eval "${RSYNC}" && continue
            md5src="$(md5sum "${srcdir}/${file}" | cut -d' ' -f1)"
            md5dst="$(md5sum "${dstdir}/${file}" | cut -d' ' -f1)"
            # "!=" rather than "==", which is not portable in /bin/sh
            [ "${md5src}" != "${md5dst}" ] && eval "${RSYNC}"
            ;;
        CREATE,ISDIR)
            [ ! -d "${dstdir}/${file}" ] && eval "${RSYNC}"
            ;;
        DELETE|DELETE,ISDIR)
            eval "${RSYNC}"
            ;;
        esac
    done

As a gluster mountpoint is for now barely usable as an Apache DocumentRoot for us (and yes, with htaccess disabled), I'd like to have the list's point of view on this approach. Do you see any terrible glitch?

Thanks in advance,

--
Emile Heitor, Operations Manager
---
www.nbs-system.com, 140 Bd Haussmann, 75008 Paris
Tel: 01.58.56.60.80 / Fax: 01.58.56.60.81
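The md5 comparison that the script relies on to avoid the infinite loop can be isolated in a small helper, which makes it easier to test on its own. The following is only a minimal sketch, assuming md5sum from GNU coreutils is available; the function name needs_sync is hypothetical and does not appear in the original script.

```shell
#!/bin/sh
# needs_sync SRC_FILE DST_FILE
# Hypothetical helper (not part of the original script).
# Returns 0 when DST_FILE is missing or its MD5 differs from SRC_FILE,
# meaning a copy is needed. Returns 1 when both files already have the
# same checksum, which is the case that stops a change from being
# copied back and forth forever.
needs_sync() {
    src_file=$1
    dst_file=$2

    # A missing destination always needs a copy.
    [ -f "${dst_file}" ] || return 0

    # Read through stdin so md5sum prints only the hash and "-".
    md5src=$(md5sum < "${src_file}" | cut -d' ' -f1)
    md5dst=$(md5sum < "${dst_file}" | cut -d' ' -f1)

    [ "${md5src}" != "${md5dst}" ]
}
```

With such a helper, the CLOSE_WRITE,CLOSE branch of the loop could shrink to a single test on needs_sync "${srcdir}/${file}" "${dstdir}/${file}" followed by the rsync call. Note that hashing both files on every write event costs a full read on each side, including one through the gluster mount.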