Broken: apache/httpd#1091 (trunk - 74d32d7)

Broken: apache/httpd#1091 (trunk - 74d32d7)

Travis CI

apache / httpd, branch: trunk
Build time: 12 mins and 2 secs

Christophe Jaillet:

Fix a few warnings on 64 bits windows compilation

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1881303 13f79535-47bb-0310-9956-ffa450edef68


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Christophe JAILLET
On 29/08/2020 22:43, Travis CI wrote:

> [...]

Hi,

Spurious failure, unlikely to be related to the corresponding commit.
Never seen that before.

Any idea?

CJ


Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):

#0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  <signal handler called>
#2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790,
key=key@entry=0x560cb949006a "Connection", val=val@entry=0x560cb9488f86
"close") at tables/apr_tables.c:750
#5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at
http_protocol.c:309
#6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0,
b=0x560cba65db80) at http_filters.c:1369
#7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0,
b=0x560cba65db80) at protocol.c:2040

Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Ruediger Pluem


On 8/30/20 7:22 AM, Christophe JAILLET wrote:

> On 29/08/2020 22:43, Travis CI wrote:
>> [...]
>
> Hi,
>
> spurious failure, unlikely related to the corresponding commit.
> Never seen that before.
>
> Any idea?
>
> CJ
>
>
> Thread 66 (Thread 0x7f13a6525700 (LWP 7436)):
>
> #0  0x00007f13dd61a187 in kill () from /lib/x86_64-linux-gnu/libc.so.6
> #1  <signal handler called>
> #2  0x00007f13dd619e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007f13dd61b801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x00007f13ddc05f62 in apr_table_mergen (t=0x560cba82d790, key=key@entry=0x560cb949006a "Connection",
> val=val@entry=0x560cb9488f86 "close") at tables/apr_tables.c:750
> #5  0x0000560cb9434fcf in ap_set_keepalive (r=r@entry=0x560cba7c9c60) at http_protocol.c:309
> #6  0x0000560cb943a475 in ap_http_header_filter (f=0x560cba74f8e0, b=0x560cba65db80) at http_filters.c:1369
> #7  0x0000560cb9458091 in ap_content_length_filter (f=0x560cba1d95c0, b=0x560cba65db80) at protocol.c:2040
>

My only guess is some sort of pool corruption. The key is a literal and as such its memory address shouldn't be in the scope of
any memory managed by a pool.

Regards

Rüdiger



Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Ruediger Pluem


On 9/1/20 10:18 AM, Joe Orton wrote:

> On Mon, Aug 31, 2020 at 08:37:55AM +0200, Ruediger Pluem wrote:
>> On 8/30/20 7:22 AM, Christophe JAILLET wrote:
>>> [...]
>>
>> My only guess is some sort of pool corruption. The key is a literal
>> and as such its memory address shouldn't be in the scope of any memory
>> managed by a pool.
>
> I agree.  This is triggered by running TLSv1.2 tests with memcache
> configured as the session cache (a relatively new test).  The memcache
> socache provider does *not* use AP_SOCACHE_FLAG_NOTMPSAFE so is used w/o
> locks held across multiple threads so that might be a good place to
> investigate.
>

A pool is used for the retrieval of sessions, but it is the pool of the associated connection (conn_rec), which seems unproblematic as
we should not have two threads working on the same connection at the same time.

OTOH the pconf pool is used to init the session cache, and the APR memcache code creates subpools of it and uses these for allocations.

I guess ms_find_conn in the APR code should be inspected. It is called by apr_memcache_getp, and it uses these subpools (tp in this
case) for the following:

    balloc = apr_bucket_alloc_create((*conn)->tp);
    (*conn)->bb = apr_brigade_create((*conn)->tp, balloc);
    (*conn)->tb = apr_brigade_create((*conn)->tp, balloc);


I think that tp can be used by multiple threads at the same time.


BTW: How often does this test fail?


Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Joe Orton
On Tue, Sep 01, 2020 at 01:04:14PM +0200, Ruediger Pluem wrote:

> [...]
>
> A pool is used for the retrieval of sessions, but it is the pool of the associated connection (conn_rec) which seems uncritical as
> we should not have two threads working with the same connection at the same time.
>
> OTOH the pconf pool is used to init the session cache and the memcache apr code creates subpools of it and uses these for allocations.
>
> I guess ms_find_conn in apr code should be inspected. It is called by apr_memcache_getp and it uses these subpools (tp in this
> case) for the below:
>
>     balloc = apr_bucket_alloc_create((*conn)->tp);
>     (*conn)->bb = apr_brigade_create((*conn)->tp, balloc);
>     (*conn)->tb = apr_brigade_create((*conn)->tp, balloc);
>
> I think that tp can be used by multiple threads at the same time.

The subpools here are specific to the apr_memcache_conn_t, access to
which is managed via apr_reslist_* in ms_find_conn() - so it *should* be
thread-safe, I think?  I think the subpool creation within
ms_conn_construct() is also safe, but I may be missing the bigger picture.

> BTW: How often does this test fail?

I don't remember seeing it fail before.  This test combination has only
been running since r1879594 (July 7th).

We could try adding an ab run or something to see if it's easier to
trigger; it'd generally be a good idea for session cache testing anyway.

Regards, Joe


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Ruediger Pluem


On 9/1/20 1:43 PM, Joe Orton wrote:

> On Tue, Sep 01, 2020 at 01:04:14PM +0200, Ruediger Pluem wrote:
>> [...]
>
> The subpools here are specific to the apr_memcache_conn_t, access to
> which is managed via apr_reslist_* in ms_find_conn() - so *should* be
> thread-safe I think?  I think the subpool creation within
> ms_conn_construct() is also safe but may be missing the bigger picture.

Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
If yes, I agree that this is the case, that I missed it, and hence my point is moot.

Regards

Rüdiger

Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Joe Orton
On Tue, Sep 01, 2020 at 03:11:59PM +0200, Ruediger Pluem wrote:
> Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
> If yes, I agree that this is the case, that I missed it, and hence my point is moot.

I think that's right.  I don't want to disappear too far down the rabbit
hole for this, given that the *only* symptom we are seeing is the APR
pool check failing.

Is there any way in which APR is going to guarantee that the internal
pool state is always consistent across threads if one thread is
simultaneously inside e.g. apr_palloc()/apr_palloc_debug() and another
calls apr_pool_find()?  I can't see it.

Regards, Joe


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Ruediger Pluem


On 9/2/20 9:10 AM, Joe Orton wrote:

> On Tue, Sep 01, 2020 at 03:11:59PM +0200, Ruediger Pluem wrote:
>> Your point is that there is no case where multiple threads work on the same apr_memcache_conn_t at the same time, correct?
>> If yes, I agree that this is the case, that I missed it, and hence my point is moot.
>
> I think that's right.  I don't want to disappear too far down the rabbit
> hole for this, given that the *only* symptom we are seeing is the APR
> pool check failing.
>
> Is there any way in which APR is going to guarantee that the internal
> pool state is always consistent across threads if one thread is
> simultaneously inside e.g. apr_palloc()/apr_palloc_debug() and another
> calls apr_pool_find()?  I can't see it.

I can't see it either. It looks like apr_pool_find / apr_pool_walk_tree uses a mutex,
but this mutex is not taken by apr_palloc_debug. I am not sure how many races we would
have with apr_palloc_debug, though. I think it could race when we add a new node to the top of the linked list, but
I think the worst thing that could happen in this case is that it fails to find a pool for a piece of memory even though it was
allocated from a pool.

Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Joe Orton
On Wed, Sep 02, 2020 at 10:11:39AM +0200, Ruediger Pluem wrote:

> [...]
>
> I can't see it as well. Looks like that apr_pool_find / apr_pool_walk_tree uses a mutex,
> but this mutex is not used by apr_palloc_debug. But I am not sure how many races we would
> have with apr_palloc_debug. I think it could race when we add a new node to the top of the linked list, but
> I think the worst thing that could happen in this case is that it does not find a pool for a piece of memory even if it was
> allocated by a pool.

In this case:

https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1798

    node->beginp[node->index] = mem;
    node->endp[node->index] = (char *)mem + size;
    node->index++;

without any memory ordering/consistency guarantee, is it not possible
for another thread to see

    node->endp[node->index] = (char *)mem + size;
    node->index++;

without the ->beginp update, and hence the node would appear to
"contain" memory in the range [0, endp) since the node struct is
zero-initialized.

...or am I overthinking this?

We could add a "yes, this is consistent" atomic int to the debug_node_t
and make pool-debug mode *even slower* to avoid possible races!?


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Ruediger Pluem


On 9/2/20 10:47 AM, Joe Orton wrote:

> On Wed, Sep 02, 2020 at 10:11:39AM +0200, Ruediger Pluem wrote:
>> [...]
>
> In this case:
>
> https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1798
>
>     node->beginp[node->index] = mem;
>     node->endp[node->index] = (char *)mem + size;
>     node->index++;
>
> without any memory ordering/consistency guarantee, is it not possible
> for another thread to see
>
>     node->endp[node->index] = (char *)mem + size;
>     node->index++;
>
> without the ->beginp update, and hence the node would appear to
> "contain" memory in the range [0, endp) since the node struct is
> zero-initialized.

I think this can only happen for fresh nodes as

node->index++;

happens after

node->beginp[node->index] = mem;
node->endp[node->index] = (char *)mem + size;

and even in this case I think

&& node->endp[index] > *pmem) {

(https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l2275)

will always fail since node->endp[index] should be zero. Hence it should not find a pool. Or am I off now?



Regards

Rüdiger


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Joe Orton
On Wed, Sep 02, 2020 at 12:36:26PM +0200, Ruediger Pluem wrote:
> will always fail since node->endp[index] should be zero. Hence it should not find a pool. Or am I off now?

I think we get very few ordering constraints without memory
synchronisation; regardless of the order of the C statements, another
thread can see a different/inconsistent memory state. But I'm not really
sure.

I tried reproducing locally and can trigger crashes in these code paths
quite easily by running ab against a pool-debug/event build with
memcache or shmcb configured as the session cache.

In this case one thread is walking the pool hierarchy and trying to lock
a pool mutex which has already been deleted, which triggers a libc
assert() instead.  The mutex is in ptrans, I assume from another event
thread.

If we see this regularly I'd conclude the apr_pool_find() stuff is
simply unsafe and should be turned off.

(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f842445f895 in __GI_abort () at abort.c:79
#2  0x00007f842445f769 in __assert_fail_base (fmt=0x7f84245cd0b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)",
    file=0x7f8424652104 "tpp.c", line=82, function=<optimized out>) at assert.c:92
#3  0x00007f842446ee76 in __GI___assert_fail (
    assertion=assertion@entry=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)",
    file=file@entry=0x7f8424652104 "tpp.c", line=line@entry=82,
    function=function@entry=0x7f84246521c0 <__PRETTY_FUNCTION__.0> "__pthread_tpp_change_priority") at assert.c:101
#4  0x00007f842464f849 in __pthread_tpp_change_priority (previous_prio=previous_prio@entry=-1, new_prio=new_prio@entry=2088)
    at tpp.c:82
#5  0x00007f84246458a0 in __pthread_mutex_lock_full (mutex=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:545
#6  0x00007f8424645e05 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:73
#7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at locks/unix/thread_mutex.c:127
#8  0x00007f84246f437e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x7f839000d680)
    at memory/unix/apr_pools.c:1505
#9  apr_pool_walk_tree (pool=0x7f839000d680, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
#10 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x1482e30)
    at memory/unix/apr_pools.c:1511
#11 apr_pool_walk_tree (pool=0x1482e30, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
#12 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14828b0)
    at memory/unix/apr_pools.c:1511
#13 apr_pool_walk_tree (pool=0x14828b0, fn=0x7f84246f4190 <pool_find>, data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
#14 0x00007f84246f54d8 in apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14822d0)
    at memory/unix/apr_pools.c:1511
#15 apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, pool=0x14822d0) at memory/unix/apr_pools.c:1492
#16 apr_pool_find (mem=mem@entry=0x7f83a4023f06) at memory/unix/apr_pools.c:2291
#17 0x00007f84246e72cd in apr_table_addn (t=0x7f83a4023ab0, key=key@entry=0x7f83a4023f00 "Host",
    val=val@entry=0x7f83a4023f06 "localhost:8532") at tables/apr_tables.c:823
#18 0x0000000000455bb4 in ap_get_mime_headers_core (r=r@entry=0x7f83a4002370, bb=bb@entry=0x7f83a40230a0) at protocol.c:1353
#19 0x0000000000458a3f in ap_read_request (conn=conn@entry=0x7f83a401b340) at protocol.c:1513
#20 0x0000000000434f6f in ap_process_http_async_connection (c=0x7f83a401b340) at http_core.c:148
#21 ap_process_http_connection (c=0x7f83a401b340) at http_core.c:252
#22 0x000000000044cb00 in ap_run_process_connection (c=c@entry=0x7f83a401b340) at connection.c:42
#23 0x00007f842488c8f1 in process_socket (thd=thd@entry=0x7f8408004920, p=<optimized out>, sock=0x7f8390001480,
    cs=0x7f83a4015ef0, my_child_num=my_child_num@entry=3, my_thread_num=my_thread_num@entry=23) at event.c:1106
#24 0x00007f842488cfd8 in worker_thread (thd=0x7f8408004920, dummy=<optimized out>) at event.c:2396
#25 0x00007f84246ff57e in dummy_worker (opaque=0x7f8408004920) at threadproc/unix/thread.c:145
#26 0x00007f8424643432 in start_thread (arg=<optimized out>) at pthread_create.c:477
#27 0x00007f842453b913 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) up 7
#7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at locks/unix/thread_mutex.c:127
127        rv = pthread_mutex_lock(&mutex->mutex);
(gdb) print *mutex
$1 = {pool = 0x0, mutex = {__data = {__lock = 1094795585, __count = 1094795585, __owner = 1094795585, __nusers = 1094795585,
      __kind = 1094795585, __spins = 16705, __elision = 16705, __list = {__prev = 0x4141414141414141,
        __next = 0x4141414141414141}}, __size = 'A' <repeats 40 times>, __align = 4702111234474983745},
  cond = 0x4141414141414141, locked = 1094795585, num_waiters = 1094795585}


Re: Broken: apache/httpd#1091 (trunk - 74d32d7)

Ruediger Pluem


On 9/2/20 4:51 PM, Joe Orton wrote:
> On Wed, Sep 02, 2020 at 12:36:26PM +0200, Ruediger Pluem wrote:
>> will always fail since node->endp[index] should be zero. Hence it should not find a pool. Or am I off now?
>
> I think we get very little ordering constraints without memory
> synchronisation, regardless of the order of the C statements another
> thread can see a different/inconsistent memory state, but not really
> sure.

You mean due to CPU caches and because some of the values might be stored in registers? Would volatile on these vars help?

>
> I tried reproducing locally and can trigger crashes in these code paths
> quite easily with running ab against pool-debug/event build with
> memcache or shmcb configured as session cache.
>
> In this case one thread is walking the pool heirarchy and trying to lock
> a pool mutex which has already been deleted, which triggers a libc
> assert() instead.  The mutex is in ptrans, I assume from another event
> thread.

It looks like there is some mutex protection in apr_pool_clear_debug and apr_pool_destroy_debug, and
comments that point to apr_pool_walk_tree (e.g.
https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1916). Maybe the issue is that
pool_destroy_debug doesn't take the lock before calling pool_clear_debug.
Can you provide a gdb print of the pool parameters passed to apr_pool_walk_tree in frames 15 through 8?
I am not sure how pool became zero in the mutex struct you dumped, as I see no such action in the mutex cleanup.

>
> If we see this regularly I'd conclude the apr_pool_find() stuff is
> simply unsafe and should be turned off.

As the crashes seem to be caused by an APR problem, shall we continue this discussion on dev@apr, or should we work this out in more
detail here?


Regards

Rüdiger