[for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit

Message ID 20230716170150.22398-1-richard.henderson@linaro.org
State New
Series [for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit

Commit Message

Richard Henderson July 16, 2023, 5:01 p.m. UTC
For user-only, the probe for page writability may race with another
thread's mprotect.  Take the mmap_lock around the operation.  This
is still faster than the start/end_exclusive fallback.

Remove the write probe in load_atomic8_or_exit.  There we don't have
the same machinery for testing the existence of an 8-byte cmpxchg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
 1 file changed, 24 insertions(+), 30 deletions(-)
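
To make the race concrete, here is a minimal sketch (illustrative only, not QEMU code; page_is_writable is a hypothetical stand-in for the page_check_range(..., PAGE_WRITE_ORG) query): one thread toggles a page's protection with mprotect while another runs the unlocked check-then-access sequence that this patch replaces with an mmap_lock-protected one.

#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static uint64_t *page;

/* Hypothetical stand-in for page_check_range(h2g(p), 8, PAGE_WRITE_ORG). */
static int page_is_writable(void *p)
{
    (void)p;
    return 1;   /* imagine a lookup in the emulator's page-flag table */
}

static void *toggle_prot(void *arg)
{
    (void)arg;
    for (;;) {
        mprotect(page, getpagesize(), PROT_READ);
        mprotect(page, getpagesize(), PROT_READ | PROT_WRITE);
    }
    return NULL;
}

int main(void)
{
    pthread_t thr;

    page = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    pthread_create(&thr, NULL, toggle_prot, NULL);

    for (;;) {
        if (page_is_writable(page)) {
            /* The protection may have changed since the check, so this
             * cmpxchg-based load (a write from the hardware's point of
             * view) can fault even though the check passed.  Holding
             * mmap_lock across check + access closes the window. */
            __sync_val_compare_and_swap(page, 0, 0);
        }
    }
}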

Comments

Peter Maydell July 17, 2023, 10:12 a.m. UTC | #1
On Sun, 16 Jul 2023 at 18:03, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> For user-only, the probe for page writability may race with another
> thread's mprotect.  Take the mmap_lock around the operation.  This
> is still faster than the start/end_exclusive fallback.
>
> Remove the write probe in load_atomic8_or_exit.  There we don't have
> the same machinery for testing the existance of an 8-byte cmpxchg.

"existence"

>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 30 deletions(-)
>
> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
> index 4de0a80492..e7170f8ba2 100644
> --- a/accel/tcg/ldst_atomicity.c.inc
> +++ b/accel/tcg/ldst_atomicity.c.inc
> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
>          return load_atomic8(pv);
>      }
>
> -#ifdef CONFIG_USER_ONLY
> -    /*
> -     * If the page is not writable, then assume the value is immutable
> -     * and requires no locking.  This ignores the case of MAP_SHARED with
> -     * another process, because the fallback start_exclusive solution
> -     * provides no protection across processes.
> -     */
> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
> -        return *p;
> -    }
> -#endif

I don't really understand the comment in the commit message:
why would it be wrong to wrap this "test writeability and
do the operation" in the mmap-lock, the same way we do for the
16-byte case?

thanks
-- PMM
Alex Bennée July 17, 2023, 10:40 a.m. UTC | #2
Richard Henderson <richard.henderson@linaro.org> writes:

> For user-only, the probe for page writability may race with another
> thread's mprotect.  Take the mmap_lock around the operation.  This
> is still faster than the start/end_exclusive fallback.

Did we have a bug report or replication for this? 

>
> Remove the write probe in load_atomic8_or_exit.  There we don't have
> the same machinery for testing the existance of an 8-byte cmpxchg.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Otherwise makes sense to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Richard Henderson July 17, 2023, 3:07 p.m. UTC | #3
On 7/17/23 11:12, Peter Maydell wrote:
> On Sun, 16 Jul 2023 at 18:03, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> For user-only, the probe for page writability may race with another
>> thread's mprotect.  Take the mmap_lock around the operation.  This
>> is still faster than the start/end_exclusive fallback.
>>
>> Remove the write probe in load_atomic8_or_exit.  There we don't have
>> the same machinery for testing the existance of an 8-byte cmpxchg.
> 
> "existence"
> 
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
>>   1 file changed, 24 insertions(+), 30 deletions(-)
>>
>> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
>> index 4de0a80492..e7170f8ba2 100644
>> --- a/accel/tcg/ldst_atomicity.c.inc
>> +++ b/accel/tcg/ldst_atomicity.c.inc
>> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
>>           return load_atomic8(pv);
>>       }
>>
>> -#ifdef CONFIG_USER_ONLY
>> -    /*
>> -     * If the page is not writable, then assume the value is immutable
>> -     * and requires no locking.  This ignores the case of MAP_SHARED with
>> -     * another process, because the fallback start_exclusive solution
>> -     * provides no protection across processes.
>> -     */
>> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
>> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
>> -        return *p;
>> -    }
>> -#endif
> 
> I don't really understand the comment in the commit message:
> why would it be wrong to wrap this "test writeability and
> do the operation" in the mmap-lock, the same way we do for the
> 16-byte case?

It would not be wrong.  I was just thinking of the cmpxchg8 part, for which we do not have 
a configure probe, and for which I *think* there's no call, because there are no 32-bit 
hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64.


r~
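
For reference, the cmpxchg-as-load idiom under discussion reduces to something like this (a sketch using the GCC __sync builtin, not QEMU's actual helper):

#include <stdint.h>

/* Atomically read *p via compare-and-swap: if *p != 0 the CAS fails and
 * returns the current value without storing; if *p == 0 it stores 0,
 * which is a no-op.  Either way the hardware treats the access as a
 * write, so the page must be host-writable -- hence the probe. */
static inline uint64_t load8_via_cmpxchg(uint64_t *p)
{
    return __sync_val_compare_and_swap(p, 0, 0);
}

The point above is that, in practice, any host with an 8-byte cmpxchg also has full 8-byte atomics (CONFIG_ATOMIC64), so a dedicated configure probe for this path would have no users.
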
Richard Henderson July 17, 2023, 3:07 p.m. UTC | #4
On 7/17/23 11:40, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> For user-only, the probe for page writability may race with another
>> thread's mprotect.  Take the mmap_lock around the operation.  This
>> is still faster than the start/end_exclusive fallback.
> 
> Did we have a bug report or replication for this?

See the comment:

> +         * We must take mmap_lock so that the query remains valid until
> +         * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
> +         * is an example that can race.


r~
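
The cited test boils down to roughly this shape (a paraphrase for illustration, not the actual tests/tcg/multiarch/munmap-pthread.c): each thread repeatedly maps a private page, touches it, and unmaps it.  The guest program itself is race-free, but under QEMU user-mode emulation the page-flag bookkeeping for these short-lived mappings is shared between threads, so an emulator-internal "query flags, then access" sequence not covered by mmap_lock can act on stale information.

#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void *map_unmap_loop(void *arg)
{
    (void)arg;
    for (;;) {
        long *p = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            abort();
        }
        *p = 42;            /* force the emulator to touch the page */
        munmap(p, getpagesize());
    }
    return NULL;
}

int main(void)
{
    pthread_t thr[4];

    for (int i = 0; i < 4; i++) {
        pthread_create(&thr[i], NULL, map_unmap_loop, NULL);
    }
    pthread_join(thr[0], NULL);   /* threads loop forever */
    return 0;
}
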
Alex Bennée July 17, 2023, 6:57 p.m. UTC | #5
Richard Henderson <richard.henderson@linaro.org> writes:

> On 7/17/23 11:40, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> For user-only, the probe for page writability may race with another
>>> thread's mprotect.  Take the mmap_lock around the operation.  This
>>> is still faster than the start/end_exclusive fallback.
>> Did we have a bug report or replication for this?
>
> See the comment:
>
>> +         * We must take mmap_lock so that the query remains valid until
>> +         * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
>> +         * is an example that can race.

doh...

>
>
> r~
Peter Maydell July 20, 2023, 12:17 p.m. UTC | #6
On Mon, 17 Jul 2023 at 19:25, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 7/17/23 11:12, Peter Maydell wrote:
> > On Sun, 16 Jul 2023 at 18:03, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> For user-only, the probe for page writability may race with another
> >> thread's mprotect.  Take the mmap_lock around the operation.  This
> >> is still faster than the start/end_exclusive fallback.
> >>
> >> Remove the write probe in load_atomic8_or_exit.  There we don't have
> >> the same machinery for testing the existance of an 8-byte cmpxchg.
> >
> > "existence"
> >
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>   accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++-------------------
> >>   1 file changed, 24 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
> >> index 4de0a80492..e7170f8ba2 100644
> >> --- a/accel/tcg/ldst_atomicity.c.inc
> >> +++ b/accel/tcg/ldst_atomicity.c.inc
> >> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
> >>           return load_atomic8(pv);
> >>       }
> >>
> >> -#ifdef CONFIG_USER_ONLY
> >> -    /*
> >> -     * If the page is not writable, then assume the value is immutable
> >> -     * and requires no locking.  This ignores the case of MAP_SHARED with
> >> -     * another process, because the fallback start_exclusive solution
> >> -     * provides no protection across processes.
> >> -     */
> >> -    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
> >> -        uint64_t *p = __builtin_assume_aligned(pv, 8);
> >> -        return *p;
> >> -    }
> >> -#endif
> >
> > I don't really understand the comment in the commit message:
> > why would it be wrong to wrap this "test writeability and
> > do the operation" in the mmap-lock, the same way we do for the
> > 16-byte case?
>
> It would not be wrong.  I was just thinking of the cmpxchg8 part, for which we do not have
> a configure probe, and for which I *think* there's no call, because there are no 32-bit
> hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64.

But this piece of code the patch deletes isn't doing a
cmpxchg8; it just does a plain load. If "take the lock
and do the operation" is faster than always using the
fallback code for the atomic16 case, why don't we make
the same tradeoff choice in atomic8?

thanks
-- PMM

Patch

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 4de0a80492..e7170f8ba2 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
         return load_atomic8(pv);
     }
 
-#ifdef CONFIG_USER_ONLY
-    /*
-     * If the page is not writable, then assume the value is immutable
-     * and requires no locking.  This ignores the case of MAP_SHARED with
-     * another process, because the fallback start_exclusive solution
-     * provides no protection across processes.
-     */
-    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
-        uint64_t *p = __builtin_assume_aligned(pv, 8);
-        return *p;
-    }
-#endif
-
     /* Ultimate fallback: re-execute in serial context. */
     cpu_loop_exit_atomic(env_cpu(env), ra);
 }
@@ -186,25 +173,32 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
         return atomic16_read_ro(p);
     }
 
-#ifdef CONFIG_USER_ONLY
-    /*
-     * We can only use cmpxchg to emulate a load if the page is writable.
-     * If the page is not writable, then assume the value is immutable
-     * and requires no locking.  This ignores the case of MAP_SHARED with
-     * another process, because the fallback start_exclusive solution
-     * provides no protection across processes.
-     */
-    if (page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
-        return *p;
-    }
-#endif
-
-    /*
-     * In system mode all guest pages are writable, and for user-only
-     * we have just checked writability.  Try cmpxchg.
-     */
+    /* We can only use cmpxchg to emulate a load if the page is writable. */
     if (HAVE_ATOMIC128_RW) {
+#ifdef CONFIG_USER_ONLY
+        /*
+         * If the page is not writable, then assume the value is immutable
+         * and requires no locking.  This ignores the case of MAP_SHARED with
+         * another process, because the fallback start_exclusive solution
+         * provides no protection across processes.
+         * We must take mmap_lock so that the query remains valid until
+         * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
+         * is an example that can race.
+         */
+        Int128 r;
+
+        mmap_lock();
+        if (page_get_flags(h2g(p)) & PAGE_WRITE_ORG) {
+            r = atomic16_read_rw(p);
+        } else {
+            r = *p;
+        }
+        mmap_unlock();
+        return r;
+#else
+        /* In system mode all guest pages are host writable. */
         return atomic16_read_rw(p);
+#endif
     }
 
     /* Ultimate fallback: re-execute in serial context. */