Hi,

On a few of the HPC cluster nodes, I am seeing a new Lustre error, pasted below. The volumes are working fine and there is nothing to report on the OSS and MDS.

LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) data3-OST0000_UUID at 192.168.2.98@tcp changed handle from 0xfe51139158c64fae to 0xfe511392a35878b3; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) data4-OST0000_UUID at 192.168.2.98@tcp changed handle from 0xfe51139158c6502c to 0xfe511392a35878c1; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) scratch2-OST0003_UUID at 192.168.2.99@tcp changed handle from 0x9ee58a75fddf2834 to 0x9ee58a761d190470; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) scratch1-OST0003_UUID at 192.168.2.99@tcp changed handle from 0x9ee58a75fddf2754 to 0x9ee58a761d190462; copying, but this may foreshadow disaster

Any help in interpreting these error messages is much appreciated. The two Lustre OSS/MDS/MGS servers have been running fine, with an uptime of over a month, since the ldiskfs patch was applied as mentioned here: bugzilla.lustre.org/show_bug.cgi?id=13438

The version of Lustre is 1.6.3.

Thanks
Balagopal
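For anyone triaging the same messages across many client nodes, here is a minimal sketch that pulls the target name, server NID, and old/new connection handles out of each line and groups the affected targets by server. The regular expression is an assumption based only on the four sample lines above; other Lustre versions may word the message differently, and the list archive may have rewritten an "@" as " at ", so the pattern accepts both. The function names are hypothetical helpers, not part of any Lustre tool.

```python
import re
from collections import defaultdict

# Pattern assumed from the sample messages in this thread only.
# Accepts both "_UUID at <nid>" (as shown in the archive) and "_UUID@<nid>".
HANDLE_CHANGE = re.compile(
    r"LustreError:\s+(?P<pid>\d+):\d+:"
    r"\(import\.c:\d+:ptlrpc_connect_interpret\(\)\)\s+"
    r"(?P<target>\S+?)_UUID(?:@| at )(?P<nid>\S+)\s+changed handle from\s+"
    r"(?P<old>0x[0-9a-f]+)\s+to\s+(?P<new>0x[0-9a-f]+)"
)


def parse_handle_changes(log_text):
    """Return one dict per 'changed handle' message found in log_text."""
    return [m.groupdict() for m in HANDLE_CHANGE.finditer(log_text)]


def targets_by_server(events):
    """Group affected target names by the server NID they connect to."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["nid"]].append(event["target"])
    return dict(grouped)


# Two of the messages quoted above, used as sample input.
SAMPLE = (
    "LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) "
    "data3-OST0000_UUID at 192.168.2.98@tcp changed handle from "
    "0xfe51139158c64fae to 0xfe511392a35878b3; copying, but this may "
    "foreshadow disaster\n"
    "LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) "
    "scratch2-OST0003_UUID at 192.168.2.99@tcp changed handle from "
    "0x9ee58a75fddf2834 to 0x9ee58a761d190470; copying, but this may "
    "foreshadow disaster\n"
)

if __name__ == "__main__":
    for nid, targets in targets_by_server(parse_handle_changes(SAMPLE)).items():
        print(nid, ", ".join(targets))
```

Run against the client syslog, this only shows which servers and targets the handle changes cluster on; it does not say anything about the cause.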
Hello!

On Mar 6, 2008, at 10:57 AM, Balagopal Pillai wrote:
> On a few of the hpc cluster nodes, i am seeing a new lustre
> error that is pasted below. The volumes are working fine and there
> is nothing on the oss and mds to report.
> LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret())
> data3-OST0000_UUID at 192.168.2.98@tcp changed handle from
> 0xfe51139158c64fae to 0xfe511392a35878b3; copying, but this may
> foreshadow disaster

This is a serious condition. Please see bug 14775 (bugzilla.lustre.org/show_bug.cgi?id=14775) for the patch and an explanation.

Bye,
Oleg
The Bugzilla entry mentions that the patch was to go into 1.6.4. Which version are you running? The current stable release, if I am not mistaken, is 1.6.4-2. Are there any bugs in it that we should patch before we use it in production?

Michael

-----Original Message-----
From: Balagopal Pillai [mailto:pillai at mathstat.dal.ca]
Sent: Thursday, March 06, 2008 07:57 AM Pacific Standard Time
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] strange lustre errors

Hi,

On a few of the hpc cluster nodes, i am seeing a new lustre error that is pasted below. The volumes are working fine and there is nothing on the oss and mds to report.

LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) data3-OST0000_UUID at 192.168.2.98@tcp changed handle from 0xfe51139158c64fae to 0xfe511392a35878b3; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) data4-OST0000_UUID at 192.168.2.98@tcp changed handle from 0xfe51139158c6502c to 0xfe511392a35878c1; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) scratch2-OST0003_UUID at 192.168.2.99@tcp changed handle from 0x9ee58a75fddf2834 to 0x9ee58a761d190470; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) scratch1-OST0003_UUID at 192.168.2.99@tcp changed handle from 0x9ee58a75fddf2754 to 0x9ee58a761d190462; copying, but this may foreshadow disaster

Any help in interpreting these error messages is much appreciated. The two lustre oss/mds/mgs servers have been running fine with an uptime for over a month now after the ldiskfs patch is applied as mentioned here - bugzilla.lustre.org/show_bug.cgi?id=13438

The version of Lustre is 1.6.3.
Thanks
Balagopal