Message ID | 20230810154802.16663-1-richard.henderson@linaro.org |
---|---|
State | Superseded |
Series | [for-8.1] accel/tcg: Avoid reading too much in load_atom_{2,4} |
On 10/8/23 17:48, Richard Henderson wrote:
> When load_atom_extract_al16_or_al8 is inexpensive, we want to use
> it early, in order to avoid the overhead of required_atomicity.
> However, we must not read past the end of the page.
>
> Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
> index e5c590a499..5d92485a49 100644
> --- a/accel/tcg/ldst_atomicity.c.inc
> +++ b/accel/tcg/ldst_atomicity.c.inc
> @@ -404,7 +404,10 @@ static uint16_t load_atom_2(CPUArchState *env, uintptr_t ra,
>          return load_atomic2(pv);
>      }
>      if (HAVE_ATOMIC128_RO) {
> -        return load_atom_extract_al16_or_al8(pv, 2);
> +        intptr_t left_in_page = pi | TARGET_PAGE_MASK;
> +        if (likely(left_in_page <= -16)) {
> +            return load_atom_extract_al16_or_al8(pv, 2);
> +        }
>      }
>
>      atmax = required_atomicity(env, pi, memop);
> @@ -443,7 +446,10 @@ static uint32_t load_atom_4(CPUArchState *env, uintptr_t ra,
>          return load_atomic4(pv);
>      }
>      if (HAVE_ATOMIC128_RO) {
> -        return load_atom_extract_al16_or_al8(pv, 4);
> +        intptr_t left_in_page = pi | TARGET_PAGE_MASK;
> +        if (likely(left_in_page <= -16)) {
> +            return load_atom_extract_al16_or_al8(pv, 4);
> +        }
>      }

Makes sense, so to the best of my knowledge:

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
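[Editor's note: a minimal standalone sketch of why the unguarded fast path is a problem near a page boundary. This is not QEMU code; it assumes a 4 KiB page, a made-up address, and models load_atom_extract_al16_or_al8 as a 16-byte read starting at the enclosing 8-byte-aligned address.]

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uintptr_t page_size = 4096;                  /* assumed page size */
    uintptr_t page      = 0x7f0000000000;        /* hypothetical page base */
    uintptr_t pi        = page + page_size - 2;  /* 2-byte load at the page's last 2 bytes */

    /* Model: the extract helper reads 16 bytes starting at the 8-aligned address below pi. */
    uintptr_t read_lo = pi & ~(uintptr_t)7;
    uintptr_t read_hi = read_lo + 16;

    printf("2-byte load at %#lx -> 16-byte read [%#lx, %#lx)\n",
           (unsigned long)pi, (unsigned long)read_lo, (unsigned long)read_hi);
    printf("%s\n", read_hi > page + page_size
           ? "reads past the end of the page"
           : "stays within the page");
    return 0;
}
```

With these numbers the wide read extends 8 bytes into the following page, which is exactly the case the new `left_in_page <= -16` guard excludes.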
On Thu, 10 Aug 2023 at 16:49, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> When load_atom_extract_al16_or_al8 is inexpensive, we want to use
> it early, in order to avoid the overhead of required_atomicity.
> However, we must not read past the end of the page.
>
> Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>      if (HAVE_ATOMIC128_RO) {
> -        return load_atom_extract_al16_or_al8(pv, 2);
> +        intptr_t left_in_page = pi | TARGET_PAGE_MASK;

Isn't left_in_page actually -(pi | TARGET_PAGE_MASK)? I feel like that
would be clearer than leaving it as the negative of the number of bytes
left in the page and comparing against -16 (and assuming the compiler
generates equivalent code). (I always have trouble with expressions that
combine boolean operations and 2s-complement arithmetic, though.)

> +        if (likely(left_in_page <= -16)) {
> +            return load_atom_extract_al16_or_al8(pv, 2);
> +        }
>      }

Either way
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
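[Editor's note: a worked example of the arithmetic Peter is asking about. This is a standalone sketch, not QEMU code; it assumes a 4 KiB target page, so the stand-in for TARGET_PAGE_MASK is -4096, and uses a hypothetical address.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_MASK ((intptr_t)-4096)   /* stand-in for TARGET_PAGE_MASK, 4 KiB pages */

int main(void)
{
    /* An address 6 bytes before the end of its page. */
    intptr_t pi = 0x10000000 + 4096 - 6;

    /* OR-ing in the page mask keeps only the in-page offset, biased negative:
     * the result is (offset - page_size), i.e. minus the bytes left in the page. */
    intptr_t left_in_page = pi | PAGE_MASK;
    assert(left_in_page == -6);

    /* Peter's positive formulation: the number of bytes remaining in the page. */
    intptr_t bytes_left = -(pi | PAGE_MASK);
    assert(bytes_left == 6);

    /* The patch's guard: take the 16-byte path only if at least 16 bytes remain. */
    printf("16-byte read safe: %s\n", left_in_page <= -16 ? "yes" : "no");
    return 0;
}
```

So the two forms are equivalent; the patch keeps the negative form and compares against -16, while the suggested rewrite would negate first and compare against 16.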
diff --git a/tests/tcg/aarch64/lse2-fault.c b/tests/tcg/aarch64/lse2-fault.c
new file mode 100644
index 0000000000..2187219a08
--- /dev/null
+++ b/tests/tcg/aarch64/lse2-fault.c
@@ -0,0 +1,38 @@
+#include <sys/mman.h>
+#include <sys/shm.h>
+#include <unistd.h>
+#include <stdio.h>
+
+int main()
+{
+    int psize = getpagesize();
+    int id;
+    void *p;
+
+    /*
+     * We need a shared mapping to enter CF_PARALLEL mode.
+     * The easiest way to get that is shmat.
+     */
+    id = shmget(IPC_PRIVATE, 2 * psize, IPC_CREAT | 0600);
+    if (id < 0) {
+        perror("shmget");
+        return 2;
+    }
+    p = shmat(id, NULL, 0);
+    if (p == MAP_FAILED) {
+        perror("shmat");
+        return 2;
+    }
+
+    /* Protect the second page. */
+    if (mprotect(p + psize, psize, PROT_NONE) < 0) {
+        perror("mprotect");
+        return 2;
+    }
+
+    /*
+     * Load 4 bytes, 6 bytes from the end of the page.
+     * On success this will load 0 from the newly allocated shm.
+     */
+    return *(int *)(p + psize - 6);
+}
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index e5c590a499..5d92485a49 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -404,7 +404,10 @@ static uint16_t load_atom_2(CPUArchState *env, uintptr_t ra,
         return load_atomic2(pv);
     }
     if (HAVE_ATOMIC128_RO) {
-        return load_atom_extract_al16_or_al8(pv, 2);
+        intptr_t left_in_page = pi | TARGET_PAGE_MASK;
+        if (likely(left_in_page <= -16)) {
+            return load_atom_extract_al16_or_al8(pv, 2);
+        }
     }

     atmax = required_atomicity(env, pi, memop);
@@ -443,7 +446,10 @@ static uint32_t load_atom_4(CPUArchState *env, uintptr_t ra,
         return load_atomic4(pv);
     }
     if (HAVE_ATOMIC128_RO) {
-        return load_atom_extract_al16_or_al8(pv, 4);
+        intptr_t left_in_page = pi | TARGET_PAGE_MASK;
+        if (likely(left_in_page <= -16)) {
+            return load_atom_extract_al16_or_al8(pv, 4);
+        }
     }

     atmax = required_atomicity(env, pi, memop);
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 617f821613..681dfa077c 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -9,7 +9,7 @@ AARCH64_SRC=$(SRC_PATH)/tests/tcg/aarch64
 VPATH += $(AARCH64_SRC)

 # Base architecture tests
-AARCH64_TESTS=fcvt pcalign-a64
+AARCH64_TESTS=fcvt pcalign-a64 lse2-fault

 fcvt: LDFLAGS+=-lm
When load_atom_extract_al16_or_al8 is inexpensive, we want to use
it early, in order to avoid the overhead of required_atomicity.
However, we must not read past the end of the page.

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
This should solve the problem that Mark reported for m68k. In his
system-mode scenario, we would need a tlb fault on the final 7 bytes
of the final page of system ram.

With aarch64 FEAT_LSE2 I can create an internal alignment fault that
leads to the same code path -- the test case fails before the fix.

r~
---
 tests/tcg/aarch64/lse2-fault.c    | 38 +++++++++++++++++++++++++++++++
 accel/tcg/ldst_atomicity.c.inc    | 10 ++++++--
 tests/tcg/aarch64/Makefile.target |  2 +-
 3 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/aarch64/lse2-fault.c
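[Editor's note: to connect the cover letter to the new test, here is the test's address arithmetic as a standalone sketch. It is illustrative only, assuming 4 KiB pages and a hypothetical shmat() address; it is not part of the patch.]

```c
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uintptr_t psize = 4096;            /* assumed page size */
    uintptr_t p     = 0x7f0000000000;  /* hypothetical shmat() result */
    uintptr_t lo    = p + psize - 6;   /* start of the test's 4-byte load */

    /* The 4-byte access itself ends inside the readable first page,
     * so on success the test reads 0 from the zero-initialized shm. */
    assert(lo + 4 <= p + psize);

    /* But fewer than 16 bytes remain before the PROT_NONE page, so the
     * patched guard (left_in_page <= -16) rejects the 16-byte fast path
     * here; the unguarded path would have faulted on the second page. */
    intptr_t left_in_page = (intptr_t)lo | -(intptr_t)psize;   /* == -6 */
    assert(left_in_page > -16);
    return 0;
}
```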