Posted on 2006-03-07 21:11:37-08 by yani
Trying to track down a glibc bug
I have been encountering a particularly intractable bug that occurs on only one page, and only when the form on that page is submitted without errors.

The following line shows up in the httpd error log:
*** glibc detected *** corrupted double-linked list: 0x0c6e82a8 ***
[Fri Mar 3 16:19:48 2006] [notice] child pid 15175 exit signal Aborted (6)
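For context on that first message: glibc's malloc keeps free chunks in doubly linked lists. Before unlinking a chunk, it checks that the chunk's neighbours still point back at it. The sketch below is a simplified model of that check, not glibc's actual code (`struct chunk` and `safe_unlink` are hypothetical names). It illustrates why the message usually means that some earlier code wrote into freed or out-of-bounds heap memory, rather than that the bug is at the point where the abort happens.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified model of a free chunk in a doubly linked
   free list. glibc's real malloc_chunk has more fields. */
struct chunk {
    struct chunk *fd;   /* forward pointer  */
    struct chunk *bk;   /* backward pointer */
};

/* Remove p from its list, but only if its neighbours still point back
   at it. Returns 0 on success, or -1 if the links are inconsistent,
   which is the condition glibc reports as "corrupted double-linked list". */
static int safe_unlink(struct chunk *p)
{
    struct chunk *fd = p->fd;
    struct chunk *bk = p->bk;

    if (fd->bk != p || bk->fd != p) {
        /* The real allocator prints its message here and, as in the log
           above, the process can then be aborted (signal 6). */
        fprintf(stderr, "corrupted double-linked list: %p\n", (void *)p);
        return -1;
    }
    fd->bk = bk;        /* splice p out of the list */
    bk->fd = fd;
    return 0;
}
```

The corruption that trips this check can happen long before the check runs. That timing gap would fit with the error "bouncing around" the code.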

Oh, but sometimes it does this instead:
[Fri Mar 3 16:37:36 2006] [notice] child pid 15238 exit signal Segmentation fault (11)

A logical place to look, then, is the code that runs once the form submission gets past all the error checks. However, by sticking warns all over the place I found that the error does not happen at a fixed position in the code. Even after I commented out numerous things that seemed like possible causes, the error still happens (and still bounces around).

So, coming at it from the other direction, I ran httpd under gdb, and a backtrace showed that the line raising the error is line 288 of Want.xs (which is what brought me here). That line is the closing curly brace of a block that starts like this:
for (o = start; o; p = o, o = o->op_sibling, ++cn) {
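For reference, the loop step `p = o, o = o->op_sibling, ++cn` runs at the end of each iteration, and debuggers often attribute it to the closing brace. The sketch below walks the same sibling chain over a hypothetical, stripped-down `struct op`; Perl's real OP has many more fields, and `walk_siblings` is an illustrative name. It shows that the only memory the loop header itself reads is `o->op_sibling`. One plausible reading of the backtrace, then, is that an op in the chain had a dangling or overwritten sibling pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's OP struct: only the sibling link
   matters for this sketch. */
struct op {
    int        type;        /* hypothetical payload */
    struct op *op_sibling;  /* next op at the same level, or NULL */
};

/* Walks the sibling chain the same way as the quoted Want.xs loop:
   o is the current op, p trails one behind, and cn counts steps.
   Returns the number of ops visited and stores the last one in *last.
   If an op_sibling pointer is garbage (freed or overwritten memory),
   the read of o->op_sibling in the step expression is what misbehaves. */
static int walk_siblings(struct op *start, struct op **last)
{
    struct op *o, *p = NULL;
    int cn = 0;

    for (o = start; o; p = o, o = o->op_sibling, ++cn) {
        /* per-op work would go here */
    }
    if (last)
        *last = p;
    return cn;
}
```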

So, somehow a badly formed something is getting there. I just need to figure out what and how, and I'm kind of at a loss for what to try next. If I comment out the "use Want" line (which is in the .pm files that mod_perl loads when httpd starts up), Apache does start, but then none of the pages (which are parsed by the parsing engine in those .pm files) can even be viewed.

Oh, and to make it more interesting, the same code (parsing engine, pages...) is running on the production server, and it works there. However, most of the rest of the stack (OS, glibc, perl, Want and Apache) is a different version.

The (development) machine that has the bug:

uname -a
Linux 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386 GNU/Linux

cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant)

rpm -qa | grep glibc
glibc-common-2.3.4-2
glibc-headers-2.3.4-2
glibc-kernheaders-2.4-9.1.87
glibc-2.3.4-2
glibc-devel-2.3.4-2

and has perl 5.8.5, Want 0.09, apache_1.3.34, mod_perl_1.29

The machine that does not have the bug:

uname -a
Linux 2.4.21-15.0.4.ELsmp #1 SMP Sat Jul 31 01:25:25 EDT 2004 i686 i686 i386 GNU/Linux

cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon Update 2)

rpm -qa | grep glibc
glibc-common-2.3.2-95.20
glibc-profile-2.3.2-95.20
glibc-kernheaders-2.4-8.34
glibc-devel-2.3.2-95.20
glibc-2.3.2-95.20
glibc-utils-2.3.2-95.20
glibc-headers-2.3.2-95.20

and has perl 5.8.4, Want 0.07, apache_1.3.31 and mod_perl_1.29

Does anybody have any ideas what I should try next?

(If I wanted to try reverting Want on the broken machine to 0.07, how could I do that? I don't see the 0.07 source on CPAN, and the production machine only has the compiled Want.so, not the source.)

Yani