nic@cray.com
2007-Jan-09  16:13 UTC
[Lustre-devel] [Bug 11283] 1.4.9.pre: SEGV in libcfs_debug_vmsg2: format1 == NULL
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11283
So -- there is one major issue with the patch and a few small nits:
The 2 changes to lustre/include/linux/lustre_net.h re-add the annoying newlines
to the debug messages.  Ick :)
The big one is the change from "buf[256]" to "buf[4047]". To start, 4047 is an
awfully strange value to just pick; at minimum, CFS needs to add a comment
explaining it. The worst part is that lputs() has a hard limit of 256 chars
that it will actually print to the console, so the extra room in the bigger
buffer never reaches the console anyway.
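A minimal sketch of the "at minimum, comment it" point, assuming the 256-char console limit is the number that actually matters: give the buffer size a name that says where it comes from, and let vsnprintf() report truncation instead of silently losing the tail. `CONSOLE_LINE_MAX` and `format_console_msg()` are hypothetical names, not anything in libcfs, and the assertion on `fmt` is only a precondition check, not a fix for the SEGV this bug is about.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical constant: the most chars the console prints per lputs()
 * call, per the test program below. Naming it (instead of a bare 256 or
 * 4047) documents why the buffer has the size it does. */
#define CONSOLE_LINE_MAX 256

/* Format a debug message into buf, which must hold CONSOLE_LINE_MAX bytes.
 * Returns 1 if the whole message fit, 0 if vsnprintf() had to truncate it
 * (so the caller can flag or split it), or -1 on an encoding error. */
int format_console_msg(char buf[CONSOLE_LINE_MAX], const char *fmt, ...)
{
        va_list ap;
        int n;

        assert(buf != NULL && fmt != NULL);
        va_start(ap, fmt);
        n = vsnprintf(buf, CONSOLE_LINE_MAX, fmt, ap);
        va_end(ap);

        if (n < 0)
                return -1;
        /* n is the length the full message would have had, without the NUL */
        return n < CONSOLE_LINE_MAX;
}
```

For example, `format_console_msg(buf, "rc = %d", -5)` returns 1 and leaves "rc = -5" in `buf`, while a 300-char argument returns 0 with only the first 255 chars kept.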
Here is a short example:
nic@guppy1:~> cat lputs_test.c
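#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strlen() */

/* lputs() is not standard C: it is supplied by the compute-node (qk)
 * environment this test runs under and writes a string to the console. */
int main(int argc, char *argv[]) {
        int size = 0;
        int i;
        char *buf = NULL;
        if (argc < 2) {
                fprintf(stderr, "usage: %s size\n", argv[0]);
                exit(1);
        }
        size = atoi(argv[1]);
        buf = (char *)malloc( (sizeof(char) * size) + 1);
        if (buf == NULL) {
                perror("malloc");
                exit(1);
        }
        /* Fill with 'a'..'z' repeated so any truncation is easy to spot. */
        for (i=0; i < size; i++) {
                buf[i] = 'a' + (i % 26);
        }
        buf[size] = '\0';
        printf("len: %d buf: %s\n", (int)strlen(buf), buf);
        lputs(buf);
        free(buf);
        return 0;
}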
I'll run it twice:
nic@guppy1:~> yod -np 1 ./qk_lputs_test 252
len: 252 buf: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
nic@guppy1:~> yod -np 1 ./qk_lputs_test 253
len: 253 buf: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs
From the console log -- notice that the leading "0- " takes up 4 of the 256
possible chars, leaving 252 for the message itself.
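The arithmetic behind the two runs can be modelled in a few lines, assuming (as stated above) a 256-char console line of which the prefix uses 4. `CONSOLE_MAX`, `CONSOLE_PREFIX`, `console_visible_len()` and `fill_pattern()` are illustrative names for this sketch only, not Catamount or Lustre code.

```c
#include <assert.h>
#include <string.h>

/* Assumed limits, taken from the test runs in this report rather than from
 * any console source code. */
#define CONSOLE_MAX    256   /* chars per console line, prefix included */
#define CONSOLE_PREFIX 4     /* chars consumed by the leading "0- " */

/* How many chars of msg survive on the console under that model. */
size_t console_visible_len(const char *msg)
{
        size_t room = CONSOLE_MAX - CONSOLE_PREFIX;   /* 252 */
        size_t len = strlen(msg);

        return len < room ? len : room;
}

/* Fill buf with n chars of 'a'..'z' repeated, like lputs_test.c does.
 * buf must have room for n + 1 chars. */
void fill_pattern(char *buf, size_t n)
{
        assert(buf != NULL);
        for (size_t i = 0; i < n; i++)
                buf[i] = (char)('a' + (i % 26));
        buf[n] = '\0';
}
```

With a 253-char pattern, `console_visible_len()` returns 252, and the dropped char, `buf[252]`, is exactly the "s" missing from the second run.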
[2007-01-09 16:56:05][c0-0c0s3n2]0- ******* _cstart2(), yod_pid=30159 rank=0 lognid=0 physnid=0xe pid=5
[2007-01-09 16:56:05][c0-0c0s3n2]0- abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
[2007-01-09 16:56:06][c0-0c0s3n2]0- received final app termination, pid=5
[2007-01-09 16:56:08][c0-0c0s3n2]0- ******* _cstart2(), yod_pid=30162 rank=0 lognid=0 physnid=0xe pid=2
[2007-01-09 16:56:08][c0-0c0s3n2]0- abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
On the second run, the console line is missing the very last "s".
So -- at best we are going to get badly truncated error messages from Lustre
(anything past 252 chars never makes it to the console), which is enough for me
to light this on fire & hand back to you folks :)
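One possible workaround, sketched for illustration only (it is not part of the patch under review): split long messages into pieces that each fit the 252 chars of payload left after the prefix, instead of handing the console one oversized string. `emit_in_chunks()`, the `console_emit_fn` callback type and the `record_chunk()` test sink are hypothetical; the callback stands in for lputs() so the sketch does not assume its exact prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed payload room per console line: 256 minus the 4 chars the
 * "0- " prefix consumes, per the test runs in this report. */
#define CONSOLE_PAYLOAD_MAX 252

/* Hypothetical stand-in for the real console routine. */
typedef void (*console_emit_fn)(const char *piece, void *ctx);

/* Send msg to emit in pieces of at most CONSOLE_PAYLOAD_MAX chars each.
 * Returns the number of pieces emitted (0 for an empty message). */
size_t emit_in_chunks(const char *msg, console_emit_fn emit, void *ctx)
{
        char piece[CONSOLE_PAYLOAD_MAX + 1];
        size_t len, off, count = 0;

        assert(msg != NULL && emit != NULL);
        len = strlen(msg);
        for (off = 0; off < len; off += CONSOLE_PAYLOAD_MAX) {
                size_t n = len - off;

                if (n > CONSOLE_PAYLOAD_MAX)
                        n = CONSOLE_PAYLOAD_MAX;
                memcpy(piece, msg + off, n);
                piece[n] = '\0';
                emit(piece, ctx);
                count++;
        }
        return count;
}

/* Example sink for testing: totals the chars emitted and tracks the
 * longest single piece. */
struct chunk_stats {
        size_t total;
        size_t longest;
};

void record_chunk(const char *piece, void *ctx)
{
        struct chunk_stats *st = ctx;
        size_t n = strlen(piece);

        st->total += n;
        if (n > st->longest)
                st->longest = n;
}
```

Under these assumptions a 600-char message comes out as three pieces of 252, 252 and 96 chars, so nothing is dropped at the console.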