Posted on 2006-03-07 21:11:37-08 by yani
Trying to track down a glibc bug
I have been encountering a particularly intractable bug that occurs on only one page, and only when the form on that page is submitted without errors.

The following line shows up in the httpd error log:
*** glibc detected *** corrupted double-linked list: 0x0c6e82a8 ***
[Fri Mar 3 16:19:48 2006] [notice] child pid 15175 exit signal Aborted (6)

Oh, but sometimes it does this instead:
[Fri Mar 3 16:37:36 2006] [notice] child pid 15238 exit signal Segmentation fault (11)
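For context, as far as I understand glibc's malloc, the "corrupted double-linked list" message comes from a consistency check it performs when unlinking a free chunk from one of its doubly-linked bins. Before splicing the chunk out, it checks that the chunk's neighbours still point back at it. If they don't, it prints the diagnostic and calls abort(), which would match the "Aborted (6)" in the log. The check only detects the damage; whatever actually overwrote the heap may have run much earlier. A minimal sketch of the same idea on a toy list follows. The node type and the safe_unlink name are made up for illustration; this is not glibc's code.

```c
#include <assert.h>
#include <stdio.h>

/* Toy doubly-linked node, a stand-in for a free malloc chunk (hypothetical type). */
struct node {
    struct node *fd; /* forward (next) pointer */
    struct node *bk; /* backward (previous) pointer */
};

/* Unlink p from a circular doubly-linked list, but first verify that its
 * neighbours still point back at it. Returns 0 on success, or -1 if the
 * links are inconsistent (the point where glibc would report and abort()). */
static int safe_unlink(struct node *p)
{
    assert(p != NULL && p->fd != NULL && p->bk != NULL); /* precondition */

    struct node *fd = p->fd;
    struct node *bk = p->bk;

    if (fd->bk != p || bk->fd != p) {
        fprintf(stderr, "corrupted double-linked list: %p\n", (void *)p);
        return -1;
    }
    fd->bk = bk; /* next node now points back past p */
    bk->fd = fd; /* previous node now points forward past p */
    return 0;
}
```

Note that the check fires at the next unlink that touches the damaged links, not at the write that damaged them, so the place where the abort shows up need not be where the bug is.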

The logical place to look, then, is the code that runs once the form submission gets past all the error checks. However, by sticking warns all over the place I found that the error does not happen at a fixed position in the code, and even after I commented out numerous things that seemed like possible causes, the error still happens (and still bounces around).

So, coming at it from the other direction, I ran httpd under gdb, and a backtrace revealed that the error is thrown at line 288 of Want.xs (which is what brought me here). That line is the closing curly brace of a block that starts like this:
for (o = start; o; p = o, o = o->op_sibling, ++cn) {
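The loop header above walks a chain of sibling ops: `o` is the current op, `p` trails one step behind it, and `cn` counts how many ops have been visited. The three comma-separated updates run after each pass through the body. Below is a minimal, self-contained sketch of that loop shape. The `struct op` here is a hypothetical simplification that models only the `op_sibling` link (Perl's real OP has many more fields), and `count_siblings` is a made-up helper, not a function from Want.xs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Perl's OP: only the sibling link. */
struct op {
    int id;
    struct op *op_sibling;
};

/* Walk the sibling chain starting at `start`, using the same loop shape as
 * the Want.xs line: after each pass, p takes the current op, o advances to
 * its sibling, and cn is incremented. Returns the number of ops visited and
 * stores the last op (or NULL for an empty chain) in *last. */
static int count_siblings(struct op *start, struct op **last)
{
    struct op *o, *p = NULL;
    int cn = 0;

    for (o = start; o; p = o, o = o->op_sibling, ++cn) {
        /* Invariant: p is the previous op, and its sibling link led to o. */
        assert(p == NULL || p->op_sibling == o);
    }
    if (last)
        *last = p;
    return cn;
}
```

The loop trusts every `op_sibling` pointer it follows, so a damaged op tree would be walked here without any complaint of its own.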

So somehow something badly formed is getting there; I just need to figure out what and how, and I'm kind of at a loss for what to try next. If I comment out the "use Want" line (which is in the .pm files that mod_perl loads when httpd starts up), Apache does start, but then none of the pages (which are parsed by the parsing engine in those .pm files) can be viewed at all.

Oh, and to make it more interesting, the same code (parsing engine, pages...) runs on the production server and works there, but nearly everything else is a different version.

The (development) machine that has the bug:

uname -a
Linux 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386 GNU/Linux

cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant)

rpm -qa | grep glibc
glibc-common-2.3.4-2
glibc-headers-2.3.4-2
glibc-kernheaders-2.4-9.1.87
glibc-2.3.4-2
glibc-devel-2.3.4-2

and has perl 5.8.5, Want 0.09, apache_1.3.34, mod_perl_1.29

The machine that does not have the bug:

uname -a
Linux 2.4.21-15.0.4.ELsmp #1 SMP Sat Jul 31 01:25:25 EDT 2004 i686 i686 i386 GNU/Linux

cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon Update 2)

rpm -qa | grep glibc
glibc-common-2.3.2-95.20
glibc-profile-2.3.2-95.20
glibc-kernheaders-2.4-8.34
glibc-devel-2.3.2-95.20
glibc-2.3.2-95.20
glibc-utils-2.3.2-95.20
glibc-headers-2.3.2-95.20

and has perl 5.8.4, Want 0.07, apache_1.3.31 and mod_perl_1.29

Does anybody have any ideas what I should try next?

(If I wanted to try reverting Want on the broken machine to 0.07, how could I do that? I don't see the 0.07 source on CPAN, and I don't have the source on the production machine, just the Want.so.)

Yani
Posted on 2006-03-30 17:12:16-08 by dogbreath in response to 1909
Re: Trying to track down a glibc bug
I'm having a similar problem, but with a PHP app, so I suspect a bug in glibc or in kernel 2.6.9-5. Like you, I have dev and production servers (both RHEL4), and the glibc error only occurs on the one running kernel 2.6.9-5.ELsmp. In my case, the same glibc errors turn up in the httpd error_log for Apache 2.0.52. To compare environments:
--prod server (glibc errors)--

# uname -a
Linux asgweb02.ucns.uga.edu 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386 GNU/Linux

# cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 3)

# rpm -qa | grep glibc
glibc-kernheaders-2.4-9.1.98.EL
glibc-common-2.3.4-2.19
glibc-headers-2.3.4-2.19
glibc-2.3.4-2.19
glibc-devel-2.3.4-2.19
glibc-2.3.4-2.13

# /usr/sbin/httpd -v
Server version: Apache/2.0.52

--dev server (no glibc errors)--

# uname -a
Linux asg-rh-webdev 2.6.9-11.ELsmp #1 SMP Fri May 20 18:26:27 EDT 2005 i686 i686 i386 GNU/Linux

# cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 3)

# rpm -qa | grep glibc
glibc-2.3.4-2.13
glibc-2.3.4-2.19
glibc-headers-2.3.4-2.19
glibc-kernheaders-2.4-9.1.98.EL
glibc-devel-2.3.4-2.19
glibc-common-2.3.4-2.19

# /usr/sbin/httpd -v
Server version: Apache/2.0.52
Posted on 2006-03-30 21:21:13-08 by yani in response to 2084
Re: Trying to track down a glibc bug
Interesting. I remember a kernel issue being one of the possible causes on my list (I forget what exactly led me to consider it), but I never got as far as trying kernel upgrades on this box.
I fixed the issue I was having by downgrading to Want-0.07.
Robin helpfully provided me with a link:
http://backpan.perl.org/authors/id/R/RO/ROBIN/Want-0.07.tar.gz

I have also now tried the newer Want-0.10 and it has the same issue as Want-0.09:

*** glibc detected *** corrupted double-linked list: 0x0aed9788 ***
[Thu Mar 30 10:49:36 2006] [notice] child pid 19739 exit signal Aborted (6)