[Bug 60487] New: Core dumps in mpm_event during graceful restart

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[Bug 60487] New: Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

            Bug ID: 60487
           Summary: Core dumps in mpm_event during graceful restart
           Product: Apache httpd-2
           Version: 2.4-HEAD
          Hardware: PC
                OS: FreeBSD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mpm_event
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 34528
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34528&action=edit
Don't dereference retained if it isn't set yet.

It seems like if signals are sent to an httpd process in a particular
order/speed, Apache will segfault in mpm_event's ap_start_restart() function on
the line:

retained->is_graceful = graceful;

This behavior has been observed on a number of different systems at different
times.

When explored from gdb:

#0  0x00000008019dd2cc in ap_start_restart (graceful=1) at event.c:696
696         retained->is_graceful = graceful;
[New Thread 802006400 (LWP 101541/<unknown>)]
Current language:  auto; currently minimal
(gdb) print retained
$1 = (event_retained_data *) 0x0

The "retained" static variable is never explicitly set to NULL in the code.  It
is only directly assigned in two places, both in event_pre_config():

    retained = ap_retained_data_get(userdata_key);
    if (!retained) {
        retained = ap_retained_data_create(userdata_key, sizeof(*retained));

The processes that this happens to frequently crash after a couple of days, and
they are the top-level run-as-root parent process, so it is not a case where
the signal is coming in before the pointer has been allocated.

This doesn't appear to be related to receiving a restart signal while shutting
down; gdb reports shutdown_pending is not set:

(gdb) print shutdown_pending
$1 = 0

According to the call stack, this is happening from inside config parsing on
the LoadModule directive for mpm_event:

#0  0x00000008019dd2cc in ap_start_restart (graceful=1) at event.c:696
#1  0x00000008019dd27f in restart (sig=30) at event.c:706
#2  0x0000000801408b4a in pthread_sigmask () from /lib/libthr.so.3
#3  0x0000000801407c08 in pthread_getspecific () from /lib/libthr.so.3
#4  0x0000000801407abd in pthread_getspecific () from /lib/libthr.so.3
#5  0x000000080140cbd7 in pthread_timedjoin_np () from /lib/libthr.so.3
#6  0x00000008006c79fb in r_debug_state () from /libexec/ld-elf.so.1
#7  0x00000008006cc437 in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#8  0x00000008006c8ea0 in dlopen () from /libexec/ld-elf.so.1
#9  0x0000000800fc0b00 in apr_dso_load () from /usr/local/lib/libapr-1.so.0
#10 0x0000000000493c40 in dso_load (cmd=0x7fffffffcfd0,
    modhandlep=0x7fffffffca88,
    filename=0x802095138 "libexec/mod_mpm_event.so",
    used_filename=0x7fffffffca70) at mod_so.c:162
#11 0x0000000000493705 in load_module (cmd=0x7fffffffcfd0,
    dummy=0x7fffffffce80, modname=0x802095120 "mpm_event_module",
    filename=0x802095138 "libexec/mod_mpm_event.so") at mod_so.c:263
#12 0x0000000000478ef3 in invoke_cmd (cmd=0x4b5130, parms=0x7fffffffcfd0,
    mconfig=0x7fffffffce80, args=0x80207b445 "") at config.c:923
#13 0x00000000004799c0 in execute_now (cmd_line=0x80207b4e0 "LoadModule",
    args=0x80207b41b "mpm_event_module libexec/mod_mpm_event.so",
    parms=0x7fffffffcfd0, p=0x802021028, ptemp=0x80207b028,
    sub_tree=0x7fffffffce80, parent=0x0) at config.c:1688

This makes me think that what is happening is that two restart signals are
arriving in rapid succession.  The first initiates a restart, and then second
requests a restart after the previous restart has begun but before
event_pre_config() has initialized the retained variable in the newly-loaded
mod_mpm_event.so.

If that's the case, then it may be sufficient simply to check retained before
writing to it and just return if it is NULL (similar to what's done if
restart_pending is already set).  If so, the (trivial) attached patch
accomplishes that.

However, if a NULL value for retained indicates that the server hasn't finished
a previous restart, perhaps the check should be one line higher (above
"restart_pending = 1;") to short-circuit the second restart completely.

It's also entirely possible that there's much more going on here and the NULL
value for retained is indicative of a deeper problem.

If the simple solution is not the correct one, this is something I'm happy to
look into further and work to fix if someone would be willing to shove me in
the right direction.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

Jacob P <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #1 from Jacob P <[hidden email]> ---
A lot of our customers are starting to see this on their systems. Is there a
chance we can get this looked at for 2.4.26?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #2 from Eric Covener <[hidden email]> ---
Thanks for the report and patch.

Did you take some explicit efforts to get a SIGHUP/WINCH sent during the load
of the module?  I am wondering if there is something else more likely to delay
the startup in practice, perhaps in Jacob's env?

IIUC the core issue is that despite the one weird comment that is 15+ years
old, the signals for restart are not ignored while one is being processed / or
when the MPM is unloaded (a much later invention for MPMS)

I do think it's more likely you have back-to-back graceful signals and just
ignoring the second is better.  Maybe a good sanity check would be to look for
mpm_state == AP_MPMQ_STARTING when retained == NULL and no-op'ing it
altogether.  An even safer option might be to only handle the graceful flavors
w/ the new block, to avoid the chance a hard stop or hard restart is ignored if
we're missing something?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #3 from Jeff Wheelhouse <[hidden email]> ---
"Did you take some explicit efforts to get a SIGHUP/WINCH sent during the load
of the module?"

No, unfortunately, it rears its ugly head all on its own.

"I am wondering if there is something else more likely to delay the startup in
practice, perhaps in Jacob's env?"

There's no overlap between Jacob P and I, so I can't speak for him, but our
environments definitely don't overlap.  (I.e. there is no cPanel in my
environment.)

"I do think it's more likely you have back-to-back graceful signals and just
ignoring the second is better.  Maybe a good sanity check would be to look for
mpm_state == AP_MPMQ_STARTING when retained == NULL and no-op'ing it
altogether.  An even safer option might be to only handle the graceful flavors
w/ the new block, to avoid the chance a hard stop or hard restart is ignored if
we're missing something?"

These are good points.

If I understand this correctly, basically ap_start_restart() is getting called
at the wrong time.  It expects to be called after "retained" is initialized,
but it's getting called before that.  That happens because the second restart
happens before the first restart has finished initializing it.

In other words, ap_start_restart() is getting called when the server hasn't
started.  So the patch I provided might really be addressing the effect, not
the cause.

It seems like a "correct" fix would cause ap_start_restart() not to be called
when the server hasn't been started, either by deferring the 2nd restart until
the first one is done or by ignoring it completely.

So, the types of safer/sanity checks you're talking about... would they
possibly be best-positioned higher up in the code?  If so, would that be in the
core, or is there something specific to mpm_event that isn't re-restarting
properly?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #4 from Jacob P <[hidden email]> ---
Howdy,

On our customers servers, both backtraces involve a signal to gracefully
restart interrupting Apache while it is processing the configuration file in
various places.

We've been able to replicate this when gracefully restarting under mpm_worker
multiple times in a row with an extremely large httpd configuration (10k-20k
vhosts+).

I hope this helps!

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #5 from Jeff Wheelhouse <[hidden email]> ---
Jacob, is it alright if I contact you directly to discuss the issues you're
seeing?  We're having two restart-related segfault problems, this is one and we
haven't reported the other yet due to insufficient information -- it's much
squirrelier.

From what you describe, I have a hunch you're also seeing the other problem,
and I'd like to compare notes without derailing this report if (as I suspect)
it turns out to be a second, unrelated issue.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #6 from Jacob P <[hidden email]> ---
Hi Jeff,

Absolutely. [hidden email], or Perk on #freenode. I might be
available a little later this afternoon, or tomorrow.

Thanks!

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #7 from Yann Ylavic <[hidden email]> ---
(In reply to Eric Covener from comment #2)
>
> Maybe a good sanity check would be to look
> for mpm_state == AP_MPMQ_STARTING when retained == NULL and no-op'ing it
> altogether.  An even safer option might be to only handle the graceful
> flavors w/ the new block, to avoid the chance a hard stop or hard restart is
> ignored if we're missing something?

I think we should avoid touching anything scoped/static to the mpm module (dso)
in a signal handler, otherwise any signal delivered while the module is
unloaded might raise the fault.

That is: mpm_state, shutdown_pending, restart_pending and retained.

How about allocating them (in a struct) from ap_pglobal?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #8 from Eric Covener <[hidden email]> ---
(In reply to Yann Ylavic from comment #7)

> (In reply to Eric Covener from comment #2)
> >
> > Maybe a good sanity check would be to look
> > for mpm_state == AP_MPMQ_STARTING when retained == NULL and no-op'ing it
> > altogether.  An even safer option might be to only handle the graceful
> > flavors w/ the new block, to avoid the chance a hard stop or hard restart is
> > ignored if we're missing something?
>
> I think we should avoid touching anything scoped/static to the mpm module
> (dso) in a signal handler, otherwise any signal delivered while the module
> is unloaded might raise the fault.
>
> That is: mpm_state, shutdown_pending, restart_pending and retained.
>
> How about allocating them (in a struct) from ap_pglobal?

I assumed we wouldn't be able to enter the static functions that touch the
static variables, but I guess better safe than sorry.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #9 from Eric Covener <[hidden email]> ---
(In reply to Eric Covener from comment #8)

> (In reply to Yann Ylavic from comment #7)
> > (In reply to Eric Covener from comment #2)
> > >
> > > Maybe a good sanity check would be to look
> > > for mpm_state == AP_MPMQ_STARTING when retained == NULL and no-op'ing it
> > > altogether.  An even safer option might be to only handle the graceful
> > > flavors w/ the new block, to avoid the chance a hard stop or hard restart is
> > > ignored if we're missing something?
> >
> > I think we should avoid touching anything scoped/static to the mpm module
> > (dso) in a signal handler, otherwise any signal delivered while the module
> > is unloaded might raise the fault.
> >
> > That is: mpm_state, shutdown_pending, restart_pending and retained.
> >
> > How about allocating them (in a struct) from ap_pglobal?
>
> I assumed we wouldn't be able to enter the static functions that touch the
> static variables, but I guess better safe than sorry.

although safer still might be ignoring those signals until an MPM can come back
up?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #10 from Yann Ylavic <[hidden email]> ---
(In reply to Eric Covener from comment #9)
> > I assumed we wouldn't be able to enter the static functions that touch the
> > static variables, but I guess better safe than sorry.

Right, though the signals handlers (which look very similar if not exactly the
same for all *nix MPMs) and the associated globals could possibly live in a
static (non DSO) place too.

>
> although safer still might be ignoring those signals until an MPM can come
> back up?

We might miss a shutdown/restart while restarting (this is case case already
currently since the _pending variables are reset in pre_config), but I think we
could avoid that too.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #11 from Jeff W <[hidden email]> ---
After reading the source code at some length, I think I'm getting it now.  Just
to see if I'm following along...

- httpd goes through the config file twice in its main loop, once
(ap_read_config) is a pre-scan to build up a list of modules, hooks, the config
tree, etc, and a second time (ap_process_config_tree) to actually configure.

- Registered pre_config hooks are called after ap_read_config finishes and
before ap_process_config_tree.

- The MPM's "retained" variable is initialized in its pre_config hook.

- If a restart signal arrives anytime after the MPM has been loaded in
ap_read_config and before the MPM's pre_config hook is called, its "retained"
variable will never have been initialized.

- The uninitialized "retained" variables is a "lucky" NULL (rather than purely
undefined) due to being a static variable a dlopen()'d module.

- The NULL value leads to this crash.

- Jacob's case is exercising the crash because it involves a huge config file
and the MPM directive is probably very near the top, maximizing the window
during which signals will have this effect, possibly to many seconds.

- My case is exercising the crash frequently because my environment causes two
restart signals to arrive unusually close together under certain circumstances.

Is all of that correct?

Thanks!

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #12 from Yann Ylavic <[hidden email]> ---
Created attachment 34764
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34764&action=edit
Common/static signal handling on *nix MPMs

This patch seems to work for me.

I could reproduce the issue (with it applied) by adding a sleep() in the main()
loop.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

Yann Ylavic <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #34764|Common/static signal        |Static (common) signal
        description|handling on *nix MPMs       |handling for *nix MPMs

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #13 from Eric Covener <[hidden email]> ---
(In reply to Yann Ylavic from comment #12)
> Created attachment 34764 [details]
> Static (common) signal handling for *nix MPMs
>
> This patch seems to work for me.
>
> I could reproduce the issue (with it applied) by adding a sleep() in the
> main() loop.

Looks good, one caveat
static unixd: sles12 and ubuntu
shared unixd: RHEL7, my developer builds (and the commercial distro I maintain)

So w/o the removal of the signal handlers during reload I think we still need
to be more careful about retained_data access

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #14 from Yann Ylavic <[hidden email]> ---
(In reply to Eric Covener from comment #13)
>
> Looks good, one caveat
> static unixd: sles12 and ubuntu
> shared unixd: RHEL7, my developer builds (and the commercial distro I
> maintain)

Hmm, this is about mod_unixd right?
My patch's helpers/handlers are in "os/unix/unixd.c", hence statically loaded
no?

(In reply to Yann Ylavic from comment #12)
>
> I could reproduce the issue (with it applied) by adding a sleep()

I meant *without* the patch applied, i.e. no crash with the patch (even with
dynamic mod_unixd).


PS: I've got a v2 already (if v1 is acceptable), we could set handlers only
once at startup now. I still need to test restarts with different MPMs (e.g.
event<=>worker<=>prefork<=>motorz should work normally).

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

--- Comment #15 from Eric Covener <[hidden email]> ---
> Hmm, this is about mod_unixd right?
> My patch's helpers/handlers are in "os/unix/unixd.c", hence statically
> loaded no?

my mistake. Seems like a good plan to me w/ the de-duplication

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

Yann Ylavic <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |FixedInTrunk

--- Comment #16 from Yann Ylavic <[hidden email]> ---
Committed to trunk (r1783849).

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 60487] Core dumps in mpm_event during graceful restart

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=60487

Yann Ylavic <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #34764|0                           |1
        is obsolete|                            |

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

12