eeb@clusterfs.com
2007-Feb-08 09:03 UTC
[Lustre-devel] [Bug 11548] add lnet router traceability for debug purposes
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11548

Created an attachment (id=9543)
 --> (https://bugzilla.lustre.org/attachment.cgi?id=9543&action=view)
patch to print router NID on checksum failures

This patch adds a 'sender' field to the LNET completion event and the Lustre bulk descriptor. This records the NID of the peer that the node actually received a message from, in addition to the message initiator. If the two differ, then 'sender' is the NID of the last router that forwarded the message.

If 'sender' != initiator.nid when a bulk checksum fails (WRITEs on the server, READs on the client), this patch includes "via <router NID>" in the error message so that the last router that forwarded the bulk data can be identified.

Please note that this patch has had minimal testing: I've checked that 'sender' is being set correctly, but I've not forced checksum errors etc.
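A minimal C sketch of the "via <router NID>" decision, for illustration only. All names here (`sketch_nid_t`, `struct sketch_bulk_desc`, `sketch_format_cksum_error`) are hypothetical stand-ins rather than the patch's actual LNET/ptlrpc types, and NIDs are printed as raw hex instead of Lustre's usual string form.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an LNET NID (printed as hex in this sketch). */
typedef uint64_t sketch_nid_t;

/* Hypothetical bulk descriptor recording both ends of the last hop. */
struct sketch_bulk_desc {
        sketch_nid_t initiator; /* node that originated the bulk transfer */
        sketch_nid_t sender;    /* peer the data was received from (last hop) */
};

/*
 * Format a bulk checksum error message. When the last hop differs from
 * the initiator, a router forwarded the data, so append "via <router>".
 * Returns the snprintf() result (length of the full message).
 */
static int sketch_format_cksum_error(char *buf, size_t len,
                                     const struct sketch_bulk_desc *desc)
{
        if (desc->sender != desc->initiator)
                return snprintf(buf, len,
                                "bulk checksum error from %#" PRIx64
                                " via %#" PRIx64,
                                desc->initiator, desc->sender);

        return snprintf(buf, len, "bulk checksum error from %#" PRIx64,
                        desc->initiator);
}
```

With initiator 0x10 and sender 0x20, the message becomes "bulk checksum error from 0x10 via 0x20"; when sender equals initiator (no router in the path), the "via" suffix is omitted.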