We are seeing really bad performance from a Java app, and it boils down to
poor performance of 1-byte reads from a Lustre file system.  After a
detailed strace of the application running, I have generated the following
code snippet, which demonstrates the problem:

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {
        int fd = open("5GB_file", O_RDONLY, 0666);
        int i, j;
        char b[1];

        for (i = 0; i < 10000; i++) {
                lseek(fd, i * 500, SEEK_SET);
                for (j = 0; j < 100; j++) {
                        read(fd, b, 1);
                }
        }
        close(fd);
}

5GB_file is just a "dd if=/dev/zero of=5GB_file bs=1024k count=5000".

Anyway, this code runs in <1s on local disk, ~5s on NFS and >30s on
Lustre... I was disappointed to see Lustre slower than NFS.  I was hoping
that Lustre's read-ahead would have been triggered by this code, but it
doesn't appear to be.  Is there any way I can tune Lustre to work better
with this code?  (I know, change the code, but it isn't that easy - this
is a C reconstruction of an strace of a Java app, and changing the
original Java app isn't so easy.)

Thanks
Stu.

--
Dr Stuart Midgley
sdm900@gmail.com
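The pattern above issues roughly a million syscalls (an lseek plus 100
one-byte reads per offset) to fetch about 1 MB of data, so per-read
overhead dominates on a network filesystem.  As a point of comparison,
here is a minimal sketch of the kind of change hinted at by "I know,
change the code": the same offsets, but with each group of 100 single-byte
reads collapsed into one pread().  The file name and offsets come from the
test program above; everything else is an assumption, not the original
application's code.

#define _XOPEN_SOURCE 500
/* Sketch only: batch each group of 100 single-byte reads at offset i*500
 * into a single pread(), so the kernel sees ~10,000 syscalls instead of
 * ~1,000,000. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("5GB_file", O_RDONLY);
        char buf[100];
        int i;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        for (i = 0; i < 10000; i++) {
                /* one syscall per offset instead of lseek() + 100 x read() */
                if (pread(fd, buf, sizeof(buf), (off_t)i * 500) < 0) {
                        perror("pread");
                        break;
                }
        }
        close(fd);
        return 0;
}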
Additionally, I zeroed /proc/fs/lustre/llite/fs0/read_ahead_stats, re-ran
the test and checked the read-ahead stats afterward... and they were all
zero.  So I assume this means that Lustre isn't doing any read-ahead in
this case.

On 17/07/2007, at 9:49 AM, Stuart Midgley wrote:

> We are seeing really bad performance from a Java app, and it boils down
> to poor performance of 1-byte reads from a Lustre file system.
> [...]

--
Dr Stuart Midgley
sdm900@gmail.com
Sorry to be obnoxious, but why is an app doing 1-byte reads the
filesystem's problem?  This seems like something that really should be the
responsibility of the application, not the FS.  Why isn't the Java runtime
doing some read-ahead of its own?

On Tue, Jul 17, 2007 at 07:26:26PM +0800, Stuart Midgley wrote:
> Additionally, I zeroed /proc/fs/lustre/llite/fs0/read_ahead_stats, re-ran
> the test and checked the read-ahead stats afterward... and they were all
> zero.  So I assume this means that Lustre isn't doing any read-ahead in
> this case.
> [...]

--
Troy Benjegerdes                'da hozer'                hozer@hozed.org

Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job.  Charles Schulz had the best answer:

"Why do musicians compose symphonies and poets write poems?  They do it
because life wouldn't have any meaning for them if they didn't.  That's
why I draw cartoons.  It's my life."  -- Charles Schulz
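The read-ahead Troy is asking about would normally come from buffering in
the application or runtime; on the Java side this is roughly what
java.io.BufferedInputStream does for sequential reads, though a seeking
workload like this one may need its own buffering layer.  As an
illustration, here is a minimal C sketch of that idea applied to the test
program from the start of the thread: the 1-byte reads are served from a
small userspace buffer that is refilled with one pread() per miss.  The
buffer size and helper names are arbitrary assumptions, not anything from
the original app.

#define _XOPEN_SOURCE 500
/* Sketch: serve the 1-byte reads from a small userspace buffer, refilled
 * with one pread() on a miss, instead of issuing one syscall per byte.
 * This approximates what a buffered reader in the application/runtime
 * would do; BUF_SIZE is an arbitrary assumption. */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096

struct buf_reader {
        int fd;
        char buf[BUF_SIZE];
        off_t start;    /* file offset of buf[0] */
        ssize_t len;    /* valid bytes in buf, or 0 if empty */
};

/* Return one byte at 'off', refilling the buffer only on a miss. */
static int buffered_read_byte(struct buf_reader *r, off_t off, char *out)
{
        if (r->len <= 0 || off < r->start || off >= r->start + r->len) {
                r->len = pread(r->fd, r->buf, BUF_SIZE, off);
                if (r->len <= 0)
                        return -1;
                r->start = off;
        }
        *out = r->buf[off - r->start];
        return 0;
}

int main(void)
{
        struct buf_reader r = { .fd = open("5GB_file", O_RDONLY), .len = 0 };
        char b;
        int i, j;

        if (r.fd < 0) {
                perror("open");
                return 1;
        }
        /* same access pattern as the original test: 10000 offsets,
         * 100 bytes read one at a time from each */
        for (i = 0; i < 10000; i++)
                for (j = 0; j < 100; j++)
                        buffered_read_byte(&r, (off_t)i * 500 + j, &b);
        close(r.fd);
        return 0;
}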
On Jul 17, 2007 09:49 +0800, Stuart Midgley wrote:
> We are seeing really bad performance from a Java app, and it boils down
> to poor performance of 1-byte reads from a Lustre file system.
> [...]
> Is there any way I can tune Lustre to work better with this code?

Two likely reasons:
- Lustre has DLM overhead for each read() syscall that local filesystems
  and NFS do not have
- the default debug level for Lustre is punishing for small reads.  Try
  setting "sysctl -w lnet.debug=0" to test this.  The default debug level
  will be changing in Lustre 1.6.1.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
There is a reason you're a superstar.

# sysctl -w lnet.debug=0
lnet.debug = 0

> time ~/tmp/a.out
0.052u 4.937s 0:04.98 100.0%    0+0k 0+0io 0pf+0w

So MUCH better.  While still not as good as NFS, it is definitely
acceptable.

Thanks
Stu.

> Two likely reasons:
> - Lustre has DLM overhead for each read() syscall that local filesystems
>   and NFS do not have
> - the default debug level for Lustre is punishing for small reads.
> [...]

--
Dr Stuart Midgley
sdm900@gmail.com
Stuart Midgley wrote:
> There is a reason you're a superstar.
>
> # sysctl -w lnet.debug=0
> lnet.debug = 0
>
> > time ~/tmp/a.out
> 0.052u 4.937s 0:04.98 100.0%    0+0k 0+0io 0pf+0w
>
> So MUCH better.  While still not as good as NFS, it is definitely
> acceptable.

Turn off the debugging on the server too, if you have access.

You also might want to try increasing the number of RPCs in flight on the
client:

  /proc/fs/lustre/osc/*/max_rpcs_in_flight
  /proc/fs/lustre/mdc/*/max_rpcs_in_flight

and/or increase your readahead limits:

  /proc/fs/lustre/llite/lustre-c6cfd238/max_read_ahead_whole_mb
  /proc/fs/lustre/llite/lustre-c6cfd238/max_read_ahead_mb
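For completeness: these tunables are ordinary files under /proc, so they
can be inspected and changed with cat/echo or programmatically.  Below is
a minimal sketch, not from the thread, that reads and then raises
max_read_ahead_mb.  The "lustre-c6cfd238" path component is
instance-specific, the 64 MB value is an arbitrary assumption, and writing
requires root.

/* Sketch: read the current max_read_ahead_mb and set a new value.
 * Equivalent to a cat/echo pair on the proc file; path and value are
 * assumptions for illustration only. */
#include <stdio.h>

int main(void)
{
        const char *path =
                "/proc/fs/lustre/llite/lustre-c6cfd238/max_read_ahead_mb";
        FILE *f = fopen(path, "r");
        unsigned int mb = 0;

        if (!f || fscanf(f, "%u", &mb) != 1) {
                perror("read max_read_ahead_mb");
                return 1;
        }
        fclose(f);
        printf("current max_read_ahead_mb = %u\n", mb);

        f = fopen(path, "w");
        if (!f || fprintf(f, "64\n") < 0) {     /* arbitrary new value */
                perror("write max_read_ahead_mb");
                return 1;
        }
        fclose(f);
        return 0;
}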
Morning.

I have turned off debugging for our entire environment.  Increasing the
maximum number of RPCs in flight didn't help performance at all for this
issue, nor did increasing the read-ahead.  But thanks for the suggestions.

Stu.

> Turn off the debugging on the server too, if you have access.
>
> You also might want to try increasing the number of RPCs in flight on
> the client:
>   /proc/fs/lustre/osc/*/max_rpcs_in_flight
>   /proc/fs/lustre/mdc/*/max_rpcs_in_flight
> and/or increase your readahead limits:
>   /proc/fs/lustre/llite/lustre-c6cfd238/max_read_ahead_whole_mb
>   /proc/fs/lustre/llite/lustre-c6cfd238/max_read_ahead_mb

--
Dr Stuart Midgley
sdm900@gmail.com