connection state handling and KeepAliveTimeout

connection state handling and KeepAliveTimeout

Stefan Eissing
People with insight into your connection state handling, please review:

https://bz.apache.org/bugzilla/show_bug.cgi?id=63534

describes an issue with HTTP/2 and KeepAliveTimeout. Basically, HTTP/2 writes response DATA, gets blocked on HTTP/2 flow control,
and makes a blocking READ on the connection, since the only thing that can move things forward is a packet from the client.

This triggers the KeepAlive behaviour of our connections, which is not what should apply here. The alternatives I can see are:
1. non-blocking reads, and therefore continue to block a worker
2. twiddle the server->keepalive setting; the question is, will that influence the mpm?

Thanks for your help in this,

Stefan

Re: connection state handling and KeepAliveTimeout

Eric Covener
On Tue, Jul 2, 2019 at 4:51 AM Stefan Eissing
<[hidden email]> wrote:

>
> People with insight into your connection state handling, please review:
>
> https://bz.apache.org/bugzilla/show_bug.cgi?id=63534
>
> describes an issue with HTTP/2 and KeepAliveTimeout. Basically, HTTP/2 writes response DATA, gets blocked on HTTP/2 flow control,
> and makes a blocking READ on the connection. Since the only thing that can move things forward is a packet from the client.
>
> This triggers the KeepAlive behaviour of our connections, which is not what should apply here. The alternatives I can see are:
> 1. non-blocking reads and therefore continue to block a worker
> 2. twiddle the server->keepalive setting, question is, will that influence the mpm?

Is it actually keepalive behavior/processing, or just the server's
keepalive timeout explicitly set before the outstanding read on the
master connection?

I don't exactly follow how, but it seems like w/ event, mod_h2 already
knows how to suspend/resume based on CONN_STATE_WRITE_COMPLETION when
the session thread knows it's been idle too long.
But maybe there is an issue in how the connection handler returns, and
event goes into lingering close instead?

(sorry if I am totally misunderstanding)

Re: connection state handling and KeepAliveTimeout

Stefan Eissing


> Am 02.07.2019 um 13:35 schrieb Eric Covener <[hidden email]>:
>
> On Tue, Jul 2, 2019 at 4:51 AM Stefan Eissing
> <[hidden email]> wrote:
>>
>> People with insight into your connection state handling, please review:
>>
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=63534
>>
>> describes an issue with HTTP/2 and KeepAliveTimeout. Basically, HTTP/2 writes response DATA, gets blocked on HTTP/2 flow control,
>> and makes a blocking READ on the connection. Since the only thing that can move things forward is a packet from the client.
>>
>> This triggers the KeepAlive behaviour of our connections, which is not what should apply here. The alternatives I can see are:
>> 1. non-blocking reads and therefore continue to block a worker
>> 2. twiddle the server->keepalive setting, question is, will that influence the mpm?
>
> Is it actually keepalive behavior/processing or just the servers
> keepalive timeout explicitly set before the outstanding read on the
> master connection?
>
> I don't exactly follow how, but it seems like w/ event mod_h2 already
> knows how to suspend/resume based on CONN_STATE_WRITE_COMPLETION when
> the session thread knows it's idle too long.
> But maybe there is an issue in how the connection handler returns and
> event goes into lingering close instead?

What I think is happening is that the connection state is CONN_STATE_WRITE_COMPLETION and h2 does a BLOCKing read on the main connection. After the write has completed, the keepalive handling applies to the connection - and pre-close()s/linger()s/closes it.

But the HTTP/2 state is not suitable for keepalive, just Timeout. Is there a way to avoid that?

>
> (sorry if I am totally misunderstanding)


Re: connection state handling and KeepAliveTimeout

Eric Covener
On Tue, Jul 2, 2019 at 7:38 AM Stefan Eissing
<[hidden email]> wrote:

>
>
>
> > Am 02.07.2019 um 13:35 schrieb Eric Covener <[hidden email]>:
> >
> > On Tue, Jul 2, 2019 at 4:51 AM Stefan Eissing
> > <[hidden email]> wrote:
> >>
> >> People with insight into your connection state handling, please review:
> >>
> >> https://bz.apache.org/bugzilla/show_bug.cgi?id=63534
> >>
> >> describes an issue with HTTP/2 and KeepAliveTimeout. Basically, HTTP/2 writes response DATA, gets blocked on HTTP/2 flow control,
> >> and makes a blocking READ on the connection. Since the only thing that can move things forward is a packet from the client.
> >>
> >> This triggers the KeepAlive behaviour of our connections, which is not what should apply here. The alternatives I can see are:
> >> 1. non-blocking reads and therefore continue to block a worker
> >> 2. twiddle the server->keepalive setting, question is, will that influence the mpm?
> >
> > Is it actually keepalive behavior/processing or just the servers
> > keepalive timeout explicitly set before the outstanding read on the
> > master connection?
> >
> > I don't exactly follow how, but it seems like w/ event mod_h2 already
> > knows how to suspend/resume based on CONN_STATE_WRITE_COMPLETION when
> > the session thread knows it's idle too long.
> > But maybe there is an issue in how the connection handler returns and
> > event goes into lingering close instead?
>
> What I think is happening is that connection state is CONN_STATE_WRITE_COMPLETION and h2 does a BLOCKing read on the main connection. After the write has been completed, the keepalive handling applies to the connection - and pre-close()/linger()/closes it.
>
> But the HTTP/2 state is not suitable for keepalive, just timeout. Is there a way to avoid that?

I think a successful exit of CONN_STATE_WRITE_COMPLETION would
conceptually lead to the keepalive state, so that would be plausible.  And
a failed one would go into lingering close.
This touches back to where I didn't really understand how h2 gets
control back after explicitly yielding via
CONN_STATE_WRITE_COMPLETION.  Do you know if h2 currently/ever hops
mpm threads this way successfully?

Another option is the pre-existing timed callback,
ap_mpm_register_timed_callback(), once you get into this stall where
you really want to give a thread back but still have hope it might
progress.

An option in this area is the trunk-only PT_USER callback in event, which
frees you from shoehorning into the states and just lets you get
called back when a socket is readable.
This is what's used in mod_proxy_wstunnel in the opt-in async mode.




--
Eric Covener
[hidden email]

Re: connection state handling and KeepAliveTimeout

Stefan Eissing


> Am 02.07.2019 um 14:00 schrieb Eric Covener <[hidden email]>:
>
> On Tue, Jul 2, 2019 at 7:38 AM Stefan Eissing
> <[hidden email]> wrote:
>>
>>
>>
>>> Am 02.07.2019 um 13:35 schrieb Eric Covener <[hidden email]>:
>>>
>>> On Tue, Jul 2, 2019 at 4:51 AM Stefan Eissing
>>> <[hidden email]> wrote:
>>>>
>>>> People with insight into your connection state handling, please review:
>>>>
>>>> https://bz.apache.org/bugzilla/show_bug.cgi?id=63534
>>>>
>>>> describes an issue with HTTP/2 and KeepAliveTimeout. Basically, HTTP/2 writes response DATA, gets blocked on HTTP/2 flow control,
>>>> and makes a blocking READ on the connection. Since the only thing that can move things forward is a packet from the client.
>>>>
>>>> This triggers the KeepAlive behaviour of our connections, which is not what should apply here. The alternatives I can see are:
>>>> 1. non-blocking reads and therefore continue to block a worker
>>>> 2. twiddle the server->keepalive setting, question is, will that influence the mpm?
>>>
>>> Is it actually keepalive behavior/processing or just the servers
>>> keepalive timeout explicitly set before the outstanding read on the
>>> master connection?
>>>
>>> I don't exactly follow how, but it seems like w/ event mod_h2 already
>>> knows how to suspend/resume based on CONN_STATE_WRITE_COMPLETION when
>>> the session thread knows it's idle too long.
>>> But maybe there is an issue in how the connection handler returns and
>>> event goes into lingering close instead?
>>
>> What I think is happening is that connection state is CONN_STATE_WRITE_COMPLETION and h2 does a BLOCKing read on the main connection. After the write has been completed, the keepalive handling applies to the connection - and pre-close()/linger()/closes it.
>>
>> But the HTTP/2 state is not suitable for keepalive, just timeout. Is there a way to avoid that?
>

Thanks for taking the time to discuss this. :)

> I think a successful exit of CONN_STATE_WRITE_COMPLETION would
> conceptually lead to keepalive state so that would be plausible.  And
> a failed one would go into lingering close.

That is also how I understand it.

> This touches back to where I didn't really understand how h2 gets
> control back after explicitly yielding via
> CONN_STATE_WRITE_COMPLETION.  Do you know if h2 currently/ever hops
> mpm threads this way successfully?

With the current mpm architecture, there are only two situations where
connection processing by h2 returns to the mpm:
1. when all streams (requests) have been handled and the connection really is in KeepAlive
2. when the flow-control windows of existing streams are exhausted. The only thing that can unlock this state is new frames from the client.

My understanding is that the mpm either times out the connection or receives client data, which then makes it invoke connection processing again. Here, the KeepAliveTimeout seems to always apply, which is wrong for situation 2 described above.

> Another option is the pre-existing timed callback
> ap_mpm_register_timed_callback() once you get into this stall where
> you really want to give a thread back but still have hope it might
> progress.

The problem is not a stall, but that the mpm shuts down the connection too early. Or did I misunderstand your point...

> An option in this area is trunk-only PT_USER callback in event which
> frees you from shoehorning into the states and just lets you get
> called back when a socket is readable.
> This is what's used in mod_proxy_wstunnel in the opt-in async mode.

Hmm, I need to look at that. Thanks for the pointer.

Cheers, Stefan

> --
> Eric Covener
> [hidden email]


Re: connection state handling and KeepAliveTimeout

Yann Ylavic
On Wed, Jul 3, 2019 at 9:23 AM Stefan Eissing
<[hidden email]> wrote:
>
> > Am 02.07.2019 um 14:00 schrieb Eric Covener <[hidden email]>:
> >
> > I think a successful exit of CONN_STATE_WRITE_COMPLETION would
> > conceptually lead to keepalive state so that would be plausible.  And
> > a failed one would go into lingering close.
>
> That is also how I understand it.

I think WRITE_COMPLETION, as its name does _not_ indicate, can also
handle _read_ completion by using/setting CONN_SENSE_WANT_READ.

> With the current mpm architecture, there are only two situations where
> connection processing by h2 returns to the mpm:
> 1. when all streams(request) have been handled and the connection is really in KeepAlive
> 2. when the flow-control windows of existing streams are exhausted. The only thing that can unlock this state are new frames from the client.
>
> My understanding is that the mpm either times out the connection or receives client data which it then makes invoke connection processing again. Here, the KeepAliveTimeout seems to alway apply which is wrong for the situation 2 described above.

So you'd want Timeout, right?

> > An option in this area is trunk-only PT_USER callback in event which
> > frees you from shoehorning into the states and just lets you get
> > called back when a socket is readable.
> > This is what's used in mod_proxy_wstunnel in the opt-in async mode.
>
> Hmm, I need to look at that. Thanks for the pointer.

That (but trunk only and quite huge change for 2.4.x), or as said
above CONN_SENSE_WANT_READ.
However the event MPM's queues (write_completion_q, keepalive_q, ...)
handle _fixed_ timeouts only (using fast/dumb APR_RINGs in O(1)), so
CONN_SENSE_WANT_READ could work only if h2 needs Timeout instead of
KeepAliveTimeout.
If a "dynamic" timeout is needed, PT_USER is the only option, I think.

So possibly something along the lines of the attached patch would be
enough, modulo CONN_SENSE_WANT_READ which you know better where/how to
set ;)


Regards,
Yann.

Attachment: CONN_SENSE_WANT_READ.diff (2K)

Re: connection state handling and KeepAliveTimeout

Stefan Eissing
HE IS ALIVE! \o/

Great! I will try this in trunk later today. That there are two "queues", one with Timeout and one with KeepAliveTimeout, fixed for all connections, was my read too, but I am not that deep into mpm_event. Thanks for confirming, and for the patch.

Will report back.

Cheers, Stefan

> Am 03.07.2019 um 13:44 schrieb Yann Ylavic <[hidden email]>:
>
> On Wed, Jul 3, 2019 at 9:23 AM Stefan Eissing
> <[hidden email]> wrote:
>>
>>> Am 02.07.2019 um 14:00 schrieb Eric Covener <[hidden email]>:
>>>
>>> I think a successful exit of CONN_STATE_WRITE_COMPLETION would
>>> conceptually lead to keepalive state so that would be plausible.  And
>>> a failed one would go into lingering close.
>>
>> That is also how I understand it.
>
> I think WRITE_COMPLETION, as its name does _not_ indicate, can also
> handle _read_ completion by using/setting CONN_SENSE_WANT_READ.
>
>> With the current mpm architecture, there are only two situations where
>> connection processing by h2 returns to the mpm:
>> 1. when all streams(request) have been handled and the connection is really in KeepAlive
>> 2. when the flow-control windows of existing streams are exhausted. The only thing that can unlock this state are new frames from the client.
>>
>> My understanding is that the mpm either times out the connection or receives client data which it then makes invoke connection processing again. Here, the KeepAliveTimeout seems to alway apply which is wrong for the situation 2 described above.
>
> So you'd want Timeout, right?
>
>>> An option in this area is trunk-only PT_USER callback in event which
>>> frees you from shoehorning into the states and just lets you get
>>> called back when a socket is readable.
>>> This is what's used in mod_proxy_wstunnel in the opt-in async mode.
>>
>> Hmm, I need to look at that. Thanks for the pointer.
>
> That (but trunk only and quite huge change for 2.4.x), or as said
> above CONN_SENSE_WANT_READ.
> However the MPM event's queues (write_completion_q, keepalive_q, ...)
> handle _fixed_ timeouts only (using fast/dumb APR_RINGs in O(1)), so
> CONN_SENSE_WANT_READ could work only if h2 needs Timeout instead of
> KeepAliveTimeout.
> If a "dynamic" timeout is needed, PT_USER is the only option I think.
>
> So possibly something along the lines of the attached patch would be
> enough, modulo CONN_SENSE_WANT_READ which you know better where/how to
> set ;)
>
>
> Regards,
> Yann.
> <CONN_SENSE_WANT_READ.diff>



Re: connection state handling and KeepAliveTimeout

Yann Ylavic
On Wed, Jul 3, 2019 at 1:50 PM Stefan Eissing
<[hidden email]> wrote:
>
> HE IS ALIVE! \o/

Yes :)) still a bit overwhelmed at day $job though :/

Re: connection state handling and KeepAliveTimeout

Stefan Eissing


> Am 03.07.2019 um 13:57 schrieb Yann Ylavic <[hidden email]>:
>
> On Wed, Jul 3, 2019 at 1:50 PM Stefan Eissing
> <[hidden email]> wrote:
>>
>> HE IS ALIVE! \o/
>
> Yes :)) still a bit overwhelmed at day $job though :/

It's a shabby world where open source devs need to pay rent and food+drinks in restaurants...

Re: connection state handling and KeepAliveTimeout

Yann Ylavic
On Wed, Jul 3, 2019 at 2:03 PM Stefan Eissing
<[hidden email]> wrote:

>
> > Am 03.07.2019 um 13:57 schrieb Yann Ylavic <[hidden email]>:
> >
> > On Wed, Jul 3, 2019 at 1:50 PM Stefan Eissing
> > <[hidden email]> wrote:
> >>
> >> HE IS ALIVE! \o/
> >
> > Yes :)) still a bit overwhelmed at day $job though :/
>
> It's a shabby world where open source devs need to pay rent and food+drinks in restaurants...

You're telling me!

Re: connection state handling and KeepAliveTimeout

Stefan Eissing
The change works nicely in my tests. I have some tests where I trigger timeout and keepalive, and I see the effect.

For 2.4.x I tested as well and needed a slight adjustment to event.c. Could you look if that's ok? Then I would propose that for backport.

Cheers, Stefan



> Am 03.07.2019 um 14:08 schrieb Yann Ylavic <[hidden email]>:
>
> On Wed, Jul 3, 2019 at 2:03 PM Stefan Eissing
> <[hidden email]> wrote:
>>
>>> Am 03.07.2019 um 13:57 schrieb Yann Ylavic <[hidden email]>:
>>>
>>> On Wed, Jul 3, 2019 at 1:50 PM Stefan Eissing
>>> <[hidden email]> wrote:
>>>>
>>>> HE IS ALIVE! \o/
>>>
>>> Yes :)) still a bit overwhelmed at day $job though :/
>>
>> It's a shabby world where open source devs need to pay rent and food+drinks in restaurants...
>
> You're telling me!


Attachment: h2-keepalive-yann-v2.patch (2K)