My GNOME desktop hung completely, and out of frustration
I just pushed the reset button.
Now I get a memory corruption error when trying to start
up slapd:
*** glibc detected *** malloc(): memory corruption: 0x08176080 ***
slapd-neo/start-slapd: line 33: 7560 Aborted ./ns-slapd -D /opt/fedora-ds/slapd-neo -i /opt/fedora-ds/slapd-neo/logs/pid -w $STARTPIDFILE "$@"
I couldn't find anything about cleaning up corrupted data
in the admin guide. Could someone tell me what this
error is about?
Now I wonder: if I can get this kind of corruption that
easily, how do people handle it in a real production
environment? If I get a sudden power outage, or the
cleaning guy trips on the power cord, then boom,
the server won't start. That's not pretty, is it?
thanks
sz
David Boreham
2005-Nov-11 03:02 UTC
Re: [Fedora-directory-users] help with memory corruption
speedy zinc wrote:

> *** glibc detected *** malloc(): memory corruption: 0x08176080 ***
> slapd-neo/start-slapd: line 33: 7560 Aborted ./ns-slapd -D /opt/fedora-ds/slapd-neo -i /opt/fedora-ds/slapd-neo/logs/pid -w $STARTPIDFILE "$@"
>
> Couldn't find anything about cleaning up corrupted data
> in the admin guide. Could someone tell what's this
> error about?

I'm not sure. First let's figure out which process is causing this error. I'm not 100% sure that it's the server itself. What do you see in the errors log? Do you get the server startup banner? You could also try running under strace (make a copy of start-slapd, edit it to add 'strace -f -o /tmp/straceout' to the line mentioned above). The output from strace will give us some idea of what's happening. Running in gdb with a breakpoint on the heap error function you see called above would be cool if you can manage that. Post the stack trace here if so.

> Now I wonder, if I can get this kind of corruption that
> easily, how would people handle it in real production
> environment? If I get a sudden power outage, or the
> cleaning guy just trips on the power cord, and boom,
> the server won't start. That's not pretty, isn't it?

Let's hold judgement until we figure out the problem. I've been working on this code for 9 years and this is the first time I've seen something like this in released server code, if indeed that's what is triggering the error here.

This must be code you built, right? (I don't believe the FDS binaries are built with the debug heap.) It's also possible that this is a build or compiler issue. Again, I don't believe the product has been extensively tested when built with gcc4.
speedy zinc
2005-Nov-11 03:44 UTC
Re: [Fedora-directory-users] help with memory corruption
--- David Boreham wrote:

> I'm not sure. First let's figure out which process
> is causing this error.
> I'm not 100% that it's the server itself. What do
> you see in the errors
> log ? Do you get the server startup banner ? You
> could also try running
> under strace (make a copy of start-slapd, edit it to
> add 'strace -f -o /tmp/straceout' to the
> line mentioned above). The output from strace will
> give us some idea of
> what's happening. Running in gdb with a breakpoint
> on the heap error
> function you see caled above would be cool if you
> can manage that.
> Post the stack trace here if so.

Let me see if I understand your instructions here, and whether I can get anything out of it. And no, I don't get any banner; it never gets that far.

> Let's hold judgement until we figure out the
> problem.
> I've been working on this code for 9 years and this
> is the first time
> I've seen something like this in released server
> code, if indeed
> that's what is triggering the error here.

Sorry if that sounded like flaming. I didn't do anything for the whole, was working on a paper, and had Eclipse open on some test code (which has nothing to do with FDS). Eclipse started to hang, then GNOME, and then nothing worked except the cursor. But before I reset, I forgot that I had FDS running.

> This must be code you built, right ? (I don't
> believe the FDS binaries
> are built with the debug heap). It's also possible
> that this is a
> build or compiler issue. Again I don't believe the
> product has been
> extensively tested when built with gcc4.

No, this is the package I downloaded. It's in rpm format; I used alien to convert it to a deb package and installed that. I've never been able to build it myself; I have been banging my head and asking for help on this list for a long time. Maybe I should just install a Fedora distro or something...

thanks
sz
David Boreham
2005-Nov-11 03:56 UTC
Re: [Fedora-directory-users] help with memory corruption
> for the whole, was working on a paper, and have Eclipse
> open on some test codes (which has nothing to do with
> FDS). Eclipse started to hang, and then Gnome, and then
> nothing works, except the cursor.

Well, I doubt this has anything to do with FDS. It sounds like the machine, or at least the window manager, has some serious problems.

> But before I reset, I forgot that I have fds running.

A reset with the server running is absolutely OK. If the sudden shutdown had triggered the problem, it would show up in database recovery. The fact that you have no output in the errors file indicates that we never got to recovery.

> No, this is the package I downloaded. It's in rpm
> format, I use alien to convert it to a deb package,
> and installed.

OK, I guess it must be linked with the debug heap. Strange, because I suspect that would seriously affect performance.
speedy zinc
2005-Nov-11 04:04 UTC
Re: [Fedora-directory-users] help with memory corruption
BTW, this is my home machine, and I have not worked on this for 2 days. The last access (from the timestamp in the access file) dates back to 11/09. And the error file is empty.

sz
Chen Shaopeng
2005-Nov-11 04:18 UTC
Re: [Fedora-directory-users] help with memory corruption
speedy zinc wrote:

> *** glibc detected *** malloc(): memory corruption: 0x08176080 ***
> slapd-neo/start-slapd: line 33: 7560 Aborted ./ns-slapd -D /opt/fedora-ds/slapd-neo -i /opt/fedora-ds/slapd-neo/logs/pid -w $STARTPIDFILE "$@"

Hmm, not sure if this is the same problem, but it looks very similar.

Take a look at your dse.ldif, and see if you have any plugin config which points to a non-existing .so file.

I had a similar problem some time ago, when I moved my plugin .so file to another location, and forgot to update the dse.ldif file.

csp
--
Chen Shaopeng
http://www.idsignet.com
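The check suggested above can be scripted. What follows is only a rough sketch, not a tool shipped with FDS: it assumes plugin entries in dse.ldif carry their library location in an attribute named nsslapd-pluginPath and that the value is an absolute path. The helper names `unfold` and `missing_plugin_paths` are made up for this illustration.

```python
import os


def unfold(lines):
    """Join LDIF continuation lines (they start with one space) onto the previous line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" ") and out:
            out[-1] += line[1:]
        else:
            out.append(line)
    return out


def missing_plugin_paths(ldif_text, attr="nsslapd-pluginPath", exists=os.path.exists):
    """Return (dn, path) pairs whose plugin path is absolute but not present on disk.

    The attribute name is an assumption about the dse.ldif layout.
    Base64-encoded values ("attr:: ...") and relative paths are skipped.
    """
    missing = []
    dn = None
    for line in unfold(ldif_text.splitlines()):
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        key = name.strip().lower()
        value = value.strip()
        if key == "dn":
            dn = value
        elif key == attr.lower() and os.path.isabs(value) and not exists(value):
            missing.append((dn, value))
    return missing
```

To try it, read the instance's dse.ldif (for the instance in this thread it is presumably somewhere under /opt/fedora-ds/slapd-neo, but that location is a guess) and pass its contents to `missing_plugin_paths`; each returned pair names a plugin entry and the library file that could not be found.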
speedy zinc
2005-Nov-15 01:27 UTC
Re: [Fedora-directory-users] help with memory corruption
Sorry for the long delay, I was preparing for my exams.

--- Chen Shaopeng <chen_shaopeng@idsignet.com> wrote:

> Hmm, not sure if this is the same problem, but it
> looks very similar.
>
> Take a look at your dse.ldif, and see if you have
> any plugin config which points to a non-existing .so file.
>
> I had a similar problem some time ago, when I moved
> my plugin .so file to another location, and forgot to update
> the dse.ldif file.

That was the problem. I had put the example plugin in, and then removed its folder when my Eclipse hung and refused to start. It was easy to find out with David's instructions. Here's part of the strace output:

9067 close(6) = 0
9067 open("/home/chris/workspace/examples/Debug/libexamples.so", O_RDONLY) = -1 ENOENT (No such file or directory)
9067 fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
9067 time([1131682123]) = 1131682123
9067 time(NULL) = 1131682123
9067 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=165, ...}) = 0
9067 open("/dev/tty", O_RDWR|O_NONBLOCK|O_NOCTTY) = 6
9067 writev(6, [{"*** glibc detected *** ", 23}, {"malloc(): memory corruption", 27}, {": 0x", 4}, {"08176080", 8}, {" ***\n", 5}], 5) = 67
9067 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
9067 tgkill(9067, 9067, SIGABRT) = 0
9067 --- SIGABRT (Aborted) @ 0 (0) ---
9067 +++ killed by SIGABRT +++

I removed the entry from the dse.ldif file, and everything works again. Thanks a lot.

sz
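For anyone hitting the same symptom, the interesting line in an strace log like this one is the failed open() of a shared library. A small sketch of how such lines could be picked out of /tmp/straceout follows; it assumes the "strace -f -o" line format seen in this thread (PID first, then the call), and `failed_opens` is just an illustrative name.

```python
import re

# Matches lines such as:
#   9067 open("/path/lib.so", O_RDONLY) = -1 ENOENT (No such file or directory)
# and the openat(AT_FDCWD, "...", ...) form that newer systems tend to emit.
FAILED_OPEN = re.compile(
    r'^\s*(?:(\d+)\s+)?open(?:at)?\((?:[^,"]*,\s*)?"([^"]+)"[^)]*\)\s*=\s*-1\s+(E[A-Z]+)'
)


def failed_opens(strace_text, suffix=".so"):
    """Return (pid, path, errno_name) for each failed open of a path ending in suffix."""
    hits = []
    for line in strace_text.splitlines():
        m = FAILED_OPEN.match(line)
        if m and m.group(2).endswith(suffix):
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits
```

Not every hit is a problem: the dynamic linker normally probes several directories and fails on most of them before finding a library. A failed open for a plugin path that also appears in dse.ldif, shortly before the abort, is the one worth a closer look.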
David Boreham
2005-Nov-15 01:34 UTC
Re: [Fedora-directory-users] help with memory corruption
> That was the problem, I had put the example plugin in,
> and remove the folder when my eclipse hangs and
> refuses to start.

Hmm. We shouldn't be aborting with no error message when a plugin .so file is missing: that's not right. I'm not sure if this has always been broken (seems unlikely) or is some issue with the underlying shared library opening code in NSPR that has changed underneath us. Either way, I think this should be fixed sometime. Worst case, we can stat the file first.
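The "stat the file first" idea can be illustrated outside the server. The real code is C on top of NSPR, so the Python below is only an analogy under that caveat: `load_plugin` and `PluginLoadError` are hypothetical names, and `ctypes.CDLL` stands in for whatever library-loading call the server actually uses.

```python
import ctypes
import os


class PluginLoadError(Exception):
    """Raised with a readable message when a plugin library cannot be loaded."""


def load_plugin(path):
    """Stat the library before loading it, so a missing file yields a clear error."""
    try:
        os.stat(path)
    except OSError as exc:
        # Report which file is missing instead of failing later in the loader.
        raise PluginLoadError(f"cannot load plugin {path!r}: {exc.strerror}") from exc
    try:
        return ctypes.CDLL(path)
    except OSError as exc:
        # The file exists but is not loadable (wrong format, unresolved symbols, ...).
        raise PluginLoadError(f"cannot load plugin {path!r}: {exc}") from exc
```

The point of the up-front check is only the error message: a stale path then produces a line naming the missing file rather than an abort with nothing in the errors log.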