Message ID | 20230909102747.346522-11-mjt@tls.msk.ru |
---|---|
State | New |
Headers | show |
Series | None | expand |
09.09.2023 13:27, Michael Tokarev wrote: > From: Richard Henderson <richard.henderson@linaro.org> > > Acked-by: Alex Bennée <alex.bennee@linaro.org> > Suggested-by: Alex Bennée <alex.bennee@linaro.org> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > (cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc) > Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> > > diff --git a/softmmu/physmem.c b/softmmu/physmem.c > index 3df73542e1..7597dc1c39 100644 > --- a/softmmu/physmem.c > +++ b/softmmu/physmem.c > @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu, > int asidx = cpu_asidx_from_attrs(cpu, attrs); > CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx]; > AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch); > - MemoryRegionSection *sections = d->map.sections; > + int section_index = index & ~TARGET_PAGE_MASK; > + MemoryRegionSection *ret; > + > + assert(section_index < d->map.sections_nb); This assert now triggers on staging-8.1 https://ci.debian.net/data/autopkgtest/testing/amd64/d/dropbear/37993610/log.gz https://ci.debian.net/data/autopkgtest/testing/amd64/c/cryptsetup/37993606/log.gz > + ret = d->map.sections + section_index; > + assert(ret->mr); > + assert(ret->mr->ops); > > - return §ions[index & ~TARGET_PAGE_MASK]; > + return ret; > } > > static void io_mem_init(void) In this upload I removed softmmu-Use-async_run_on_cpu-in-tcg_commit.patch (0d58c660689f6da1), and the test run uses tcg and -smp 4, which is the configuration which 0d58c6606 was supposed to fix. qemu-system-x86_64 -no-user-config -nodefaults -name autopkgtest-cryptsetup-cryptroot-sysvinit \ -machine type=q35,graphics=off -cpu qemu64,-svm,-vmx -smp cpus=4 -m size=2G \ -vga none -display none -object rng.... I wonder if I should keep 0d58c6606 for 8.1.1 (the deadline is tomorrow).. Thanks, /mjt
18.09.2023 12:19, Michael Tokarev wrote: > 09.09.2023 13:27, Michael Tokarev wrote: >> From: Richard Henderson <richard.henderson@linaro.org> >> >> Acked-by: Alex Bennée <alex.bennee@linaro.org> >> Suggested-by: Alex Bennée <alex.bennee@linaro.org> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> (cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc) >> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> >> >> diff --git a/softmmu/physmem.c b/softmmu/physmem.c >> index 3df73542e1..7597dc1c39 100644 >> --- a/softmmu/physmem.c >> +++ b/softmmu/physmem.c >> @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu, >> int asidx = cpu_asidx_from_attrs(cpu, attrs); >> CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx]; >> AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch); >> - MemoryRegionSection *sections = d->map.sections; >> + int section_index = index & ~TARGET_PAGE_MASK; >> + MemoryRegionSection *ret; >> + >> + assert(section_index < d->map.sections_nb); > > This assert now triggers on staging-8.1 > > https://ci.debian.net/data/autopkgtest/testing/amd64/d/dropbear/37993610/log.gz > https://ci.debian.net/data/autopkgtest/testing/amd64/c/cryptsetup/37993606/log.gz > >> + ret = d->map.sections + section_index; >> + assert(ret->mr); >> + assert(ret->mr->ops); >> - return §ions[index & ~TARGET_PAGE_MASK]; >> + return ret; >> } >> static void io_mem_init(void) > > In this upload I removed softmmu-Use-async_run_on_cpu-in-tcg_commit.patch (0d58c660689f6da1), > and the test run uses tcg and -smp 4, which is the configuration which 0d58c6606 > was supposed to fix. So, should this change not be in 8.1.1 too (together with 0d58c6606), or is it just the "messenger"? Or both should go? Today is the deadline day for 8.1.1. Thanks! /mjt
Michael Tokarev <mjt@tls.msk.ru> writes: > 09.09.2023 13:27, Michael Tokarev wrote: >> From: Richard Henderson <richard.henderson@linaro.org> >> Acked-by: Alex Bennée <alex.bennee@linaro.org> >> Suggested-by: Alex Bennée <alex.bennee@linaro.org> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> (cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc) >> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> >> diff --git a/softmmu/physmem.c b/softmmu/physmem.c >> index 3df73542e1..7597dc1c39 100644 >> --- a/softmmu/physmem.c >> +++ b/softmmu/physmem.c >> @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu, >> int asidx = cpu_asidx_from_attrs(cpu, attrs); >> CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx]; >> AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch); >> - MemoryRegionSection *sections = d->map.sections; >> + int section_index = index & ~TARGET_PAGE_MASK; >> + MemoryRegionSection *ret; >> + >> + assert(section_index < d->map.sections_nb); > > This assert now triggers on staging-8.1 > > https://ci.debian.net/data/autopkgtest/testing/amd64/d/dropbear/37993610/log.gz > https://ci.debian.net/data/autopkgtest/testing/amd64/c/cryptsetup/37993606/log.gz The asserts are basically there to detect when we attempt to do a MR lookup and it is not fully committed. If they are firing its because things have gone wrong (which we know because we still have patches in flight for the full fix). The main benefit of the assert is we fail early rather than later on in various cryptic ways (evidenced by the fact we raised 3 different bugs which exhibited slightly different symptoms that where all fundamentally caused by getting bogus memory region data). > >> + ret = d->map.sections + section_index; >> + assert(ret->mr); >> + assert(ret->mr->ops); >> - return §ions[index & ~TARGET_PAGE_MASK]; >> + return ret; >> } >> static void io_mem_init(void) > > In this upload I removed softmmu-Use-async_run_on_cpu-in-tcg_commit.patch (0d58c660689f6da1), > and the test run uses tcg and -smp 4, which is the configuration which 0d58c6606 > was supposed to fix. > > qemu-system-x86_64 -no-user-config -nodefaults -name autopkgtest-cryptsetup-cryptroot-sysvinit \ > -machine type=q35,graphics=off -cpu qemu64,-svm,-vmx -smp cpus=4 -m size=2G \ > -vga none -display none -object rng.... > > I wonder if I should keep 0d58c6606 for 8.1.1 (the deadline is > tomorrow).. Unfortunately 0d58c is not the full fix, it papered over one crack but revealed others. It might be leading to a false sense of security. So I would argue: - keep the assert - better to fail early than to fail later in a hard to understand way - toss a coin for the 0d58c66 fix, if we include it we may end up reverting later once we have the "complete" fix but at least its slightly better for x86 while definitely breaking MIPS > > Thanks, > > /mjt
Michael Tokarev <mjt@tls.msk.ru> writes: > 18.09.2023 12:19, Michael Tokarev wrote: >> 09.09.2023 13:27, Michael Tokarev wrote: >>> From: Richard Henderson <richard.henderson@linaro.org> >>> >>> Acked-by: Alex Bennée <alex.bennee@linaro.org> >>> Suggested-by: Alex Bennée <alex.bennee@linaro.org> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> (cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc) >>> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> >>> >>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c >>> index 3df73542e1..7597dc1c39 100644 >>> --- a/softmmu/physmem.c >>> +++ b/softmmu/physmem.c >>> @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu, >>> int asidx = cpu_asidx_from_attrs(cpu, attrs); >>> CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx]; >>> AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch); >>> - MemoryRegionSection *sections = d->map.sections; >>> + int section_index = index & ~TARGET_PAGE_MASK; >>> + MemoryRegionSection *ret; >>> + >>> + assert(section_index < d->map.sections_nb); >> This assert now triggers on staging-8.1 >> https://ci.debian.net/data/autopkgtest/testing/amd64/d/dropbear/37993610/log.gz >> https://ci.debian.net/data/autopkgtest/testing/amd64/c/cryptsetup/37993606/log.gz >> >>> + ret = d->map.sections + section_index; >>> + assert(ret->mr); >>> + assert(ret->mr->ops); >>> - return §ions[index & ~TARGET_PAGE_MASK]; >>> + return ret; >>> } >>> static void io_mem_init(void) >> In this upload I removed >> softmmu-Use-async_run_on_cpu-in-tcg_commit.patch (0d58c660689f6da1), >> and the test run uses tcg and -smp 4, which is the configuration which 0d58c6606 >> was supposed to fix. > > So, should this change not be in 8.1.1 too (together with 0d58c6606), > or is it just the "messenger"? Sorry my previous reply was eaten by my MUA. The main purpose of the asserts is to catch corruption to the Memory Regions early so we don't see weird failures later on (c.f. the 3 separate bugs for crashes in slightly different places). IOW is we are crashing on the asserts in this patch but it's booting without it we are just getting lucky. > > Or both should go? > > Today is the deadline day for 8.1.1. > > Thanks! > > /mjt
20.09.2023 18:04, Alex Bennée wrote: .. > Sorry my previous reply was eaten by my MUA. That happens.. especially when MUA becomes hungry :) > The main purpose of the asserts is to catch corruption to the Memory > Regions early so we don't see weird failures later on (c.f. the 3 > separate bugs for crashes in slightly different places). > > IOW is we are crashing on the asserts in this patch but it's booting > without it we are just getting lucky. Thank you very much for this summary. Yeah, this confirms my thought, but I wanted to be sure. I left it in. /mjt
20.09.2023 12:23, Alex Bennée: .. >> I wonder if I should keep 0d58c6606 for 8.1.1 (the deadline is >> tomorrow).. > > Unfortunately 0d58c is not the full fix, it papered over one crack but > revealed others. It might be leading to a false sense of security. So I > would argue: > > - keep the assert - better to fail early than to fail later in a hard > to understand way > - toss a coin for the 0d58c66 fix, if we include it we may end up > reverting later once we have the "complete" fix but at least its > slightly better for x86 while definitely breaking MIPS Heh. I've read this email just now, way after 8.1.1 has been tagged and the announcement sent. I haven't included 0d58c66 for now, without tossing coins - just to be on-par with 8.1.0, or else it is confusing at best (which stable releases brings with new issues). This whole thing is definitely worth a 8.1.2 once the fix is in. Meanwhile I pushed qemu with 0d58c66 and the "always require can_do_io" patchset to debian, - this one fixed all regressions so far. https://salsa.debian.org/qemu-team/qemu/-/tree/debian/1%258.1.0+ds-6/debian/patches/always-can-do-io-1866 https://gitlab.com/mjt0k/qemu/-/commits/staging-8.1-always-require-can_do_io/ Thank you for the thoughts, much apprecated! /mjt
diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 3df73542e1..7597dc1c39 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu, int asidx = cpu_asidx_from_attrs(cpu, attrs); CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx]; AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch); - MemoryRegionSection *sections = d->map.sections; + int section_index = index & ~TARGET_PAGE_MASK; + MemoryRegionSection *ret; + + assert(section_index < d->map.sections_nb); + ret = d->map.sections + section_index; + assert(ret->mr); + assert(ret->mr->ops); - return §ions[index & ~TARGET_PAGE_MASK]; + return ret; } static void io_mem_init(void)