gnukix@alltel.blackberry.com
2009-Nov-02 03:15 UTC
Performance issues with 8.0 ZFS and sendfile/lighttpd
I can send in more documentation later, but I am seeing severe ZFS performance issues with lighttpd. The same machine using UFS will push 1 Gbit/s or more, but with the same content and traffic load ZFS cannot hit 200 Mbit/s. UFS does around 3 MB/s of disk I/O at 800 Mbit/s of network traffic, but ZFS pushes the disks into the ground with 50+ MB/s of disk I/O. With no compression, no atime, and no checksums on ZFS, I still see the same I/O levels; the UFS setup has soft updates and atime on. That is orders of magnitude more disk I/O, as if ZFS isn't using its cache, isn't coalescing disk reads, or both.

Has anyone else seen this, or does anyone have any recommendations? FYI, the lighttpd config remains exactly the same as well; the only difference is UFS vs. ZFS.

Sent from my BlackBerry Smartphone provided by Alltel
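The ZFS dataset settings described above (compression, atime, and checksums all disabled) would look roughly like the following on a ZFS host. This is a sketch for reference, not from the original message; the dataset name tank/www is a placeholder:

```
# Placeholder dataset name (tank/www) -- substitute your own.
zfs set compression=off tank/www
zfs set atime=off tank/www
zfs set checksum=off tank/www

# Confirm the properties took effect:
zfs get compression,atime,checksum tank/www
```

Note that checksum=off disables data integrity verification, so it is normally only used for throwaway benchmarking like this.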
gnukix@alltel.blackberry.com wrote:
> I can send in more documentation later, but I am seeing severe ZFS
> performance issues with lighttpd. [...] Has anyone else seen this, or
> does anyone have any recommendations? The lighttpd config remains
> exactly the same as well; the only difference is UFS vs. ZFS.

AFAIK, ZFS is currently incompatible with some advanced VM operations (such as mmap(), and I think sendfile() relies on the same mechanism as mmap()), so that could be a cause of the slowdown. Still, I'm surprised you can only get 200 Mbit/s: that's 25 MB/s, and I think that even with data being memcpy-ed around the kernel multiple times you should be able to get hundreds of MB/s on newer hardware (which normally really can achieve tens of GB/s of sustained memory access).

What else can you observe on your system? Do you have exceedingly high sys times and load numbers? I'm also interested in what 10 seconds of running 'vmstat 1' looks like on your system. Is it a bare-metal machine or a virtual machine?
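To make the sendfile()/mmap() point concrete, here is a minimal sketch (names are mine, not from the thread) of the two transmission paths a server like lighttpd can take: the zero-copy sendfile(2) path, and a plain userspace read/write loop, which is roughly what lighttpd's "writev" backend falls back to. It assumes os.sendfile is available (FreeBSD, Linux):

```python
import os

def send_via_sendfile(sock, path):
    """Zero-copy path: the kernel sends file pages to the socket directly,
    without copying them through a userspace buffer."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        sent = 0
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected EOF
                break
            sent += n
    return sent

def send_via_buffered_copy(sock, path, bufsize=64 * 1024):
    """Userspace path: read into a buffer, then write to the socket.
    Costs extra copies, but avoids the VM machinery sendfile relies on."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent
```

If ZFS interacts badly with the page-mapping path that sendfile uses, the "slower" buffered variant can end up faster in practice, which matches the writev results reported later in this thread.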
gnukix@alltel.blackberry.com wrote:
> I can send in more documentation later, but I am seeing severe ZFS
> performance issues with lighttpd. [...] Has anyone else seen this, or
> does anyone have any recommendations? The lighttpd config remains
> exactly the same as well; the only difference is UFS vs. ZFS.

Hi,

I can confirm this on FreeBSD 7.2-STABLE.

First run:

:~# wget -O /dev/null http://192.168.1.1/1000m.bin
--2009-11-04 15:36:15--  http://192.168.1.1/1000m.bin
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `/dev/null'

100%[====================================>] 1,048,576,000 17.0M/s   in 81s

2009-11-04 15:37:36 (12.3 MB/s) - `/dev/null' saved [1048576000/1048576000]

The second run is even slower; I could not wait for it to finish:

:~# wget -O /dev/null http://192.168.1.1/1000m.bin
--2009-11-04 15:40:00--  http://192.168.1.1/1000m.bin
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `/dev/null'

71% [==========================>           ] 752,173,056  2.10M/s  eta 2m 0s
^C

After changing server.network-backend = "writev" in lighttpd.conf, the first run:

:~# wget -O /dev/null http://192.168.1.1/1000m.bin
--2009-11-04 15:47:51--  http://192.168.1.1/1000m.bin
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `/dev/null'

100%[====================================>] 1,048,576,000 44.1M/s   in 27s

2009-11-04 15:48:18 (37.2 MB/s) - `/dev/null' saved [1048576000/1048576000]

Second & third run:

:~# wget -O /dev/null http://192.168.1.1/1000m.bin
--2009-11-04 15:48:20--  http://192.168.1.1/1000m.bin
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `/dev/null'

100%[====================================>] 1,048,576,000 788M/s   in 1.3s

2009-11-04 15:48:21 (788 MB/s) - `/dev/null' saved [1048576000/1048576000]

:~# wget -O /dev/null http://192.168.1.1/1000m.bin
--2009-11-04 15:48:24--  http://192.168.1.1/1000m.bin
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `/dev/null'

100%[====================================>] 1,048,576,000 910M/s   in 1.1s

2009-11-04 15:48:25 (910 MB/s) - `/dev/null' saved [1048576000/1048576000]

I have Intel 10GbE PCI-Express cards directly connected between the two servers.

--
Urmas Lett
Tel:    +(372) 7 302 110
Fax:    +(372) 7 302 111
E-Mail: urmas.lett@eenet.ee
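As a sanity check on those transcripts, the average rates follow directly from the 1,048,576,000-byte file size and the elapsed times wget reports. This quick calculation (mine, not from the thread) reproduces the reported figures; small differences come from wget rounding the elapsed time it displays:

```python
size = 1_048_576_000  # the 1000 MiB test file, in bytes

# Elapsed times, in seconds, from the four wget runs above.
for elapsed in (81, 27, 1.3, 1.1):
    rate = size / elapsed / 2**20  # average throughput in MiB/s
    print(f"{elapsed:>5}s  ->  {rate:6.1f} MiB/s")
```

The second and third writev runs (roughly 770-910 MiB/s) are far above what the disks deliver, so those are served from cache; the interesting comparison is the cold-cache sendfile run (12.3 MB/s) against the cold-cache writev run (37.2 MB/s).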