nic@cray.com
2007-Jan-09 16:13 UTC
[Lustre-devel] [Bug 11283] 1.4.9.pre: SEGV in libcfs_debug_vmsg2: format1 == NULL
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11283

So -- There is one major issue with the patch and a few small nits:

The 2 changes to lustre/include/linux/lustre_net.h add the annoying newlines back into the debug messages. Ick :)

The big one is the change from "buf[256]" to "buf[4047]". To start, 4047 is an awfully strange value to just pick; at minimum that needs to be commented by CFS. The worst part is that lputs() has a hard limit of 256 chars that it will actually print to the console. Here is a short example:

nic@guppy1:~> cat lputs_test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int size = 0;
    int i;
    char *buf = NULL;

    if (argc < 2) {
        fprintf(stderr, "usage: %s size\n", argv[0]);
        exit(1);
    }

    size = atoi(argv[1]);
    buf = (char *)malloc((sizeof(char) * size) + 1);

    /* fill buf with a repeating a..z pattern */
    for (i = 0; i < size; i++) {
        sprintf(buf + i, "%c", 'a' + (i % 26));
    }
    buf[size] = '\0';

    printf("len: %d buf: %s\n", (int)strlen(buf), buf);

    /* send the same buffer to the console */
    lputs(buf);

    return 0;
}

I'll run it twice:

nic@guppy1:~> yod -np 1 ./qk_lputs_test 252
len: 252 buf: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcde
jklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzab
ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr

nic@guppy1:~> yod -np 1 ./qk_lputs_test 253
len: 253 buf: abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcde
jklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzab
ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs

From console log - notice that the leading "0- " takes up 4 of the possible chars.

[2007-01-09 16:56:05][c0-0c0s3n2]0- ******* _cstart2(), yod_pid=30159 rank=0 lognid=0 physnid=0xe pid=5
[2007-01-09 16:56:05][c0-0c0s3n2]0- abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr
[2007-01-09 16:56:06][c0-0c0s3n2]0- received final app termination, pid=5
[2007-01-09 16:56:08][c0-0c0s3n2]0- ******* _cstart2(), yod_pid=30162 rank=0 lognid=0 physnid=0xe pid=2
[2007-01-09 16:56:08][c0-0c0s3n2]0- abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr

On the second run we are missing the very last "s". So -- at best we are going to get quite truncated error messages from Lustre, which is enough for me to light this on fire & hand back to you folks :)