[Bug 63328] New: Apparent race condition causes undeserved 500 / connection reset by peer errors

Bugzilla from bugzilla@apache.org

            Bug ID: 63328
           Summary: Apparent race condition causes undeserved 500 /
                    connection reset by peer errors
           Product: Apache httpd-2
           Version: 2.4.25
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: mod_fcgid
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

There appears to be a race condition in mod_fcgid.

Here's what I see:

A Perl (CGI::Fast) application decides (actually is told) to exit.

In response to a POST, it issues a (303) redirect to its status page and
exit()s - expecting that a new instance will be started to service the
GET.

HTTPD reports:

[Sun Apr 07 13:24:13.991499 2019] [fcgid:warn] [pid 17236] (104)Connection reset by peer: [client ...] mod_fcgid: error reading data from FastCGI server, referer: ...
[Sun Apr 07 13:24:13.994622 2019] [core:error] [pid 17236] [...] End of script output before headers: notice.fcgi, referer: https://.../fastbrowser

Modsec's helpful audit log says:
Apache-Error: [file "fcgid_proc_unix.c"] [line 627] [level 4] [status 104] mod_fcgid: error reading data from FastCGI server
Apache-Error: [file "util_script.c"] [line 500] [level 3] %s: %s
Apache-Handler: fcgid-script

And the web browser gets a 500 error from httpd.

What's interesting is that if the server generates a 200 response, the error
doesn't happen.

Further, if the application generates the 303 without doing anything else, the
500 isn't generated; the redirect works.

The failure seems to be timing-sensitive.  My working theory is that:

The (Perl application) server exits, closing the FCGI server connection.  If a
200 is provided before the exit, all goes as expected.  A redirect takes time -
the browser sends the GET some time later.  If it's much later, it hits a new
server instance.  If it's at just the right time, it starts to get sent to the
(now exiting) server; the connection close is noticed, and the request is lost
to the 500.
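
The following is a minimal sketch of one way the "(104) Connection reset by peer" in the log could arise under this theory. It is not mod_fcgid's code and not a reproducer of the bug. It assumes a Unix-domain stream socket between the two sides, and the names $httpd and $app are purely illustrative. One end writes a request, the other end closes without ever reading it, and the first end then tries to read a response. On Linux I would expect the read to fail with ECONNRESET; other platforms may report a plain end-of-file instead.

```perl
use strict;
use warnings;
use Socket;

# A connected pair of Unix-domain stream sockets stands in for the
# httpd <-> FastCGI application connection (an assumption of this sketch).
socketpair( my $httpd, my $app, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";

# The "httpd" side dispatches a request to the application.
defined( syswrite( $httpd, "GET /status\n" ) ) or die "syswrite: $!";

# The "application" side exits without ever reading that request.
close( $app ) or die "close: $!";

# The "httpd" side now waits for response headers that will never come.
my $got = sysread( $httpd, my $buf, 1024 );
my $outcome;
if( !defined $got ) {
    $outcome = $!{ECONNRESET} ? 'reset' : "error: $!";
} elsif( $got == 0 ) {
    $outcome = 'eof';
} else {
    $outcome = 'data';
}
print "httpd side saw: $outcome\n";
```

Either outcome leaves the web server with no headers to return, which would be consistent with the "End of script output before headers" message above.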

This reproduces consistently with a real application.  I've tried to cut it
down to a reproducer, but failed.

I tried various ways to prevent this - including sending 'LastCall' - none work
in the real application.

httpd 2.4.25, mod_fcgid 2.3.9.  CGI::Fast 2.15 FCGI 0.78

Here is my attempt at a small reproducer.  While I haven't found the right
magic to reproduce the problem, it clearly illustrates the failing application
structure. (For simplicity, this is all done with GET, but that shouldn't
matter.)


Set up shutdown.fcgi to run as a script at, say, /test.fcgi.

Browse to /test.fcgi - hit refresh, and you will see the Requests served counter
increase.

Now browse to /test.fcgi/shutdown - the server issues a redirect and exits.
You will see that the response has a new PID, the requests served goes back to
1, and the URL in the address bar is now /test.fcgi/LoopExit.

Or (change the if( 01 ) to if( 0 )), it invokes LastCall - which tells the library
explicitly not to send more requests - then falls out of the loop synchronously
to exit.  The GET invoked by the redirect should start a new server; instead
you get the 500 error.  In the real application, the 500 errors are 100%
reproducible.  I haven't found the right timing to make the reproducer fail -
and if I did, I suspect that timing would not be portable to other machines.

What I expect is that once the server exits (and especially with LastCall
invoked), mod_fcgid will pass incoming requests to another server instance,
starting a new one if necessary.  (In the real app, it is guaranteed that there
is only one server at this time.)  If one can't be found/started, the response
should be something like "no servers available", not "Internal error" with
logging that blames the server.

Here's the (very small) almost-reproducer.  The structure is the same as the
real application.


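use warnings;
use strict;

require CGI::Fast;

my $n;
my $q;
while( ( $q = CGI::Fast->new ) ) {
    # Variable work here
    $n++;    # [assumed: missing from the archived copy] count requests served

    # Treat a missing PATH_INFO as empty to avoid an "uninitialized" warning.
    if( ( $ENV{PATH_INFO} // '' ) eq '/shutdown' ) {
        if( 01 ) {
            # Path 1: send the redirect, then exit immediately.
            print( <<"xx" );
Status: 303 See other
Location: /test.fcgi/LoopExit

Server $$ shutdown after $n requests
xx
            exit;    # [assumed: missing from the archived copy]
        }
        # Path 2 (change the 01 above to 0): send a redirect, then invoke
        # LastCall so that the loop ends at the next accept.
        # [assumed: this print, the LastCall call and the "next" are missing
        # from the archived copy and are reconstructed from the description.]
        print( <<"xx" );
Status: 303 See other
Location: /test.fcgi/LoopExit

Server $$ shutdown after $n requests
xx
        no warnings 'once';
        $CGI::Fast::Ext_Request->LastCall;
        next;
    }

    print( <<"XX" );
Status: 200 OK
Content-Type: text/plain

Server $$, Requests served: $n
XX
}
# Here when CGI::Fast returns undef to shut down.
print STDERR ( "ERR: Server $$ shutdown after $n requests\n" ) if( 0 );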


Finally, my workaround is to send a buffer page - it waits 15 seconds and then
does a JavaScript redirect.  This works every time - but is a horrible user
experience.