We recently upgraded from 1.4.10.1 to 1.6.5.1 (clients and servers) and now we are seeing errors like Aug 27 07:49:54 oss025 kernel: LustreError: 3738:0:(ost_handler.c: 1163:ost_brw_write()) client csum 2dbc1696, server csum 9d081697 Aug 27 07:49:54 oss025 kernel: LustreError: 168-f: p1-OST0018: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-172.16.4.93 at tcp inum 24522277/426969871 object 12021/0 extent [10485760-11534335] Aug 27 07:49:55 oss025 kernel: LustreError: 3738:0:(ost_handler.c: 1225:ost_brw_write()) client csum 2dbc1696, original server csum 9d081697, server csum now 9d081697 always from the same cluster node... Should we be worried? I suspect this means we shouldn''t turn check summing off? I assume these are rejected and resent from the client? -- Dr Stuart Midgley sdm900 at gmail.com
for completeness, here are the logs from 172.16.4.93 Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE CHECKSUM: changed on the client after we checksummed it - likely false positive due to mmap IO (bug 11742): from 172.16.0.25 at tcp inum 24522277/1605841060 object 12021/0 extent [10485760-11534335] Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: 1162:check_write_checksum()) original client csum 2dbc1696 (type 2), server csum 9d081697 (type 2), client csum now 9d081697 Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: 1372:osc_brw_redo_request()) @@@ redo for recoverable error req at ffff81012c434600 x4720217/t820873 o4->p1- OST0018_UUID at 172.16.0.25@tcp:6/4 lens 384/480 e 0 to 100 dl 1219794694 ref 2 fl Interpret:R/0/0 rc 0/0 -- Dr Stuart Midgley sdm900 at gmail.com On 28/08/2008, at 11:57 AM, Stuart Midgley wrote:> We recently upgraded from 1.4.10.1 to 1.6.5.1 (clients and servers) > and now we are seeing errors like > > > Aug 27 07:49:54 oss025 kernel: LustreError: 3738:0:(ost_handler.c: > 1163:ost_brw_write()) client csum 2dbc1696, server csum 9d081697 > Aug 27 07:49:54 oss025 kernel: LustreError: 168-f: p1-OST0018: BAD > WRITE CHECKSUM: changed in transit before arrival at OST from > 12345-172.16.4.93 at tcp inum 24522277/426969871 object 12021/0 extent > [10485760-11534335] > Aug 27 07:49:55 oss025 kernel: LustreError: 3738:0:(ost_handler.c: > 1225:ost_brw_write()) client csum 2dbc1696, original server csum > 9d081697, server csum now 9d081697 > > > always from the same cluster node... Should we be worried? I > suspect this means we shouldn''t turn check summing off? I assume > these are rejected and resent from the client? > > > -- > Dr Stuart Midgley > sdm900 at gmail.com > > >
On Aug 28, 2008 21:49 +0800, Stuart Midgley wrote:> for completeness, here are the logs from 172.16.4.93 > > Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE > CHECKSUM: changed on the client after we checksummed it - likely false > positive due to mmap IO (bug 11742): from 172.16.0.25 at tcp inumThis is the important part to note - if you are doing mmap writes then the VM doesn''t protect the pages from being modified while they are being checksummed and sent over the wire.> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: > 1162:check_write_checksum()) original client csum 2dbc1696 (type 2), > server csum 9d081697 (type 2), client csum now 9d081697This means the data changed after the initial checksum was computed, but now it has settled down. In some cases the "client csum now" can have changed again, depending on whether the process is rewriting the same file repeatedly.> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: > 1372:osc_brw_redo_request()) @@@ redo for recoverable error > req at ffff81012c434600 x4720217/t820873 o4->p1- > OST0018_UUID at 172.16.0.25@tcp:6/4 lens 384/480 e 0 to 100 dl 1219794694 > ref 2 fl Interpret:R/0/0 rc 0/0Here it tells you it is resending the RPC.> > always from the same cluster node... Should we be worried? I > > suspect this means we shouldn''t turn check summing off? I assume > > these are rejected and resent from the client?If you are NOT doing mmap IO (just normal read/write) then it is possible your node has memory corruption. There is an extra check that can be done to checksum the pages while they are in memory, instead of just over the wire. More overhead of course, but can help isolate the problem. echo 1> /proc/fs/lustre/llite/*/checksum_pages This will also enable on-wire checksumming, which is already on by default. One caveat is that turning off checksum_pages will also turn off the on-wire checksums (which can be re-enabled via /proc/fs/lustre/osc/*/checksums)... Blame Phil. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
Thanks for the information, greatly appreciated. We are keeping an eye on the client causing these errors and doing a few tests. The mmap issue is interesting. The code producing these errors is running across the entire cluster, so I assume if it was mmap-ing we would be seeing these sort of random errors from more than just one node. If the problems persist, we will turn on the extra debugging and see where we are at. -- Dr Stuart Midgley sdm900 at gmail.com On 29/08/2008, at 1:25 AM, Andreas Dilger wrote:> On Aug 28, 2008 21:49 +0800, Stuart Midgley wrote: >> for completeness, here are the logs from 172.16.4.93 >> >> Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE >> CHECKSUM: changed on the client after we checksummed it - likely >> false >> positive due to mmap IO (bug 11742): from 172.16.0.25 at tcp inum > > This is the important part to note - if you are doing mmap writes then > the VM doesn''t protect the pages from being modified while they are > being checksummed and sent over the wire. > >> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: >> 1162:check_write_checksum()) original client csum 2dbc1696 (type 2), >> server csum 9d081697 (type 2), client csum now 9d081697 > > This means the data changed after the initial checksum was computed, > but now it has settled down. In some cases the "client csum now" can > have changed again, depending on whether the process is rewriting the > same file repeatedly. > >> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c: >> 1372:osc_brw_redo_request()) @@@ redo for recoverable error >> req at ffff81012c434600 x4720217/t820873 o4->p1- >> OST0018_UUID at 172.16.0.25@tcp:6/4 lens 384/480 e 0 to 100 dl >> 1219794694 >> ref 2 fl Interpret:R/0/0 rc 0/0 > > Here it tells you it is resending the RPC. > >>> always from the same cluster node... Should we be worried? I >>> suspect this means we shouldn''t turn check summing off? I assume >>> these are rejected and resent from the client? > > If you are NOT doing mmap IO (just normal read/write) then it is > possible > your node has memory corruption. There is an extra check that can be > done to checksum the pages while they are in memory, instead of just > over the wire. More overhead of course, but can help isolate the > problem. > > echo 1> /proc/fs/lustre/llite/*/checksum_pages > > This will also enable on-wire checksumming, which is already on > by default. One caveat is that turning off checksum_pages will > also turn off the on-wire checksums (which can be re-enabled via > /proc/fs/lustre/osc/*/checksums)... Blame Phil. > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. >