Message ID | 20240305135237.3111642-18-peter.maydell@linaro.org |
---|---|
State | Accepted |
Commit | 59754f85ed35cbd5f4bf2663ca2136c78d5b2413 |
Headers | show |
Series | [PULL,01/20] hw/i2c: Implement Broadcom Serial Controller (BSC) | expand |
Am 5. März 2024 13:52:34 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: >From: Richard Henderson <richard.henderson@linaro.org> > >If translation is disabled, the default memory type is Device, which >requires alignment checking. This is more optimally done early via >the MemOp given to the TCG memory operation. > >Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> >Reported-by: Idan Horowitz <idan.horowitz@gmail.com> >Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >Message-id: 20240301204110.656742-6-richard.henderson@linaro.org >Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 >Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Hi, This change causes an old 4.14.40 Linux kernel to panic on boot using the sabrelite machine: [snip] Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001 Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801 Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001 Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001 scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered imx-sgtl5000 sound: snd_soc_register_card failed (-517) Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001 Alignment trap: not handling instruction f8dd5015 at [<76eac95a>] Unhandled fault: alignment exception (0x001) at 0x7e849b05 pgd = 9c59c000 [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 As you can see, some alignment exceptions are handled by the kernel, the last one isn't. I added some additional printk()'s and traced it down to this location in the kernel: <https://github.com/torvalds/linux/blob/v4.14/arch/arm/mm/alignment.c#L762> which claims that ARMv6++ CPUs can handle up to word-sized unaligned accesses, thus no fixup is needed. I hope that this will be sufficient for a fix. Let me know if you need any additional information. Best regards, Bernhard >--- > target/arm/tcg/hflags.c | 34 ++++++++++++++++++++++++++++++++-- > 1 file changed, 32 insertions(+), 2 deletions(-) > >diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c >index 8e5d35d9227..5da1b0fc1d4 100644 >--- a/target/arm/tcg/hflags.c >+++ b/target/arm/tcg/hflags.c >@@ -26,6 +26,35 @@ static inline bool fgt_svc(CPUARMState *env, int el) > FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL1); > } > >+/* Return true if memory alignment should be enforced. */ >+static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr) >+{ >+#ifdef CONFIG_USER_ONLY >+ return false; >+#else >+ /* Check the alignment enable bit. */ >+ if (sctlr & SCTLR_A) { >+ return true; >+ } >+ >+ /* >+ * If translation is disabled, then the default memory type is >+ * Device(-nGnRnE) instead of Normal, which requires that alignment >+ * be enforced. Since this affects all ram, it is most efficient >+ * to handle this during translation. >+ */ >+ if (sctlr & SCTLR_M) { >+ /* Translation enabled: memory type in PTE via MAIR_ELx. */ >+ return false; >+ } >+ if (el < 2 && (arm_hcr_el2_eff(env) & (HCR_DC | HCR_VM))) { >+ /* Stage 2 translation enabled: memory type in PTE. */ >+ return false; >+ } >+ return true; >+#endif >+} >+ > static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el, > ARMMMUIdx mmu_idx, > CPUARMTBFlags flags) >@@ -121,8 +150,9 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el, > { > CPUARMTBFlags flags = {}; > int el = arm_current_el(env); >+ uint64_t sctlr = arm_sctlr(env, el); > >- if (arm_sctlr(env, el) & SCTLR_A) { >+ if (aprofile_require_alignment(env, el, sctlr)) { > DP_TBFLAG_ANY(flags, ALIGN_MEM, 1); > } > >@@ -223,7 +253,7 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, > > sctlr = regime_sctlr(env, stage1); > >- if (sctlr & SCTLR_A) { >+ if (aprofile_require_alignment(env, el, sctlr)) { > DP_TBFLAG_ANY(flags, ALIGN_MEM, 1); > } >
Am 25. Mai 2024 13:41:54 UTC schrieb Bernhard Beschow <shentey@gmail.com>: > > >Am 5. März 2024 13:52:34 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: >>From: Richard Henderson <richard.henderson@linaro.org> >> >>If translation is disabled, the default memory type is Device, which >>requires alignment checking. This is more optimally done early via >>the MemOp given to the TCG memory operation. >> >>Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> >>Reported-by: Idan Horowitz <idan.horowitz@gmail.com> >>Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>Message-id: 20240301204110.656742-6-richard.henderson@linaro.org >>Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 >>Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > >Hi, > >This change causes an old 4.14.40 Linux kernel to panic on boot using the sabrelite machine: > >[snip] >Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001 >Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801 >Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001 >Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001 >scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 >fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered >imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered >imx-sgtl5000 sound: snd_soc_register_card failed (-517) >Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001 >Alignment trap: not handling instruction f8dd5015 at [<76eac95a>] >Unhandled fault: alignment exception (0x001) at 0x7e849b05 >pgd = 9c59c000 >[7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f >Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 > >---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 > >As you can see, some alignment exceptions are handled by the kernel, the last one isn't. I added some additional printk()'s and traced it down to this location in the kernel: <https://github.com/torvalds/linux/blob/v4.14/arch/arm/mm/alignment.c#L762> which claims that ARMv6++ CPUs can handle up to word-sized unaligned accesses, thus no fixup is needed. > >I hope that this will be sufficient for a fix. Let me know if you need any additional information. I'm performing a direct kernel boot. On real hardware, a bootloader is involved which probably enables unaligned access. This may explain why it works there but not in QEMU any longer. To fix direct kernel boot, it seems as if the "built-in bootloader" would need to be adapted/extended [1]. Any ideas? Best regards, Bernhard [1] https://stackoverflow.com/questions/68949890/how-does-qemu-emulate-a-kernel-without-a-bootloader > >Best regards, >Bernhard > >>--- >> target/arm/tcg/hflags.c | 34 ++++++++++++++++++++++++++++++++-- >> 1 file changed, 32 insertions(+), 2 deletions(-) >> >>diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c >>index 8e5d35d9227..5da1b0fc1d4 100644 >>--- a/target/arm/tcg/hflags.c >>+++ b/target/arm/tcg/hflags.c >>@@ -26,6 +26,35 @@ static inline bool fgt_svc(CPUARMState *env, int el) >> FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL1); >> } >> >>+/* Return true if memory alignment should be enforced. */ >>+static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr) >>+{ >>+#ifdef CONFIG_USER_ONLY >>+ return false; >>+#else >>+ /* Check the alignment enable bit. */ >>+ if (sctlr & SCTLR_A) { >>+ return true; >>+ } >>+ >>+ /* >>+ * If translation is disabled, then the default memory type is >>+ * Device(-nGnRnE) instead of Normal, which requires that alignment >>+ * be enforced. Since this affects all ram, it is most efficient >>+ * to handle this during translation. >>+ */ >>+ if (sctlr & SCTLR_M) { >>+ /* Translation enabled: memory type in PTE via MAIR_ELx. */ >>+ return false; >>+ } >>+ if (el < 2 && (arm_hcr_el2_eff(env) & (HCR_DC | HCR_VM))) { >>+ /* Stage 2 translation enabled: memory type in PTE. */ >>+ return false; >>+ } >>+ return true; >>+#endif >>+} >>+ >> static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el, >> ARMMMUIdx mmu_idx, >> CPUARMTBFlags flags) >>@@ -121,8 +150,9 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el, >> { >> CPUARMTBFlags flags = {}; >> int el = arm_current_el(env); >>+ uint64_t sctlr = arm_sctlr(env, el); >> >>- if (arm_sctlr(env, el) & SCTLR_A) { >>+ if (aprofile_require_alignment(env, el, sctlr)) { >> DP_TBFLAG_ANY(flags, ALIGN_MEM, 1); >> } >> >>@@ -223,7 +253,7 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, >> >> sctlr = regime_sctlr(env, stage1); >> >>- if (sctlr & SCTLR_A) { >>+ if (aprofile_require_alignment(env, el, sctlr)) { >> DP_TBFLAG_ANY(flags, ALIGN_MEM, 1); >> } >>
On 5/25/24 13:50, Bernhard Beschow wrote: > > > Am 25. Mai 2024 13:41:54 UTC schrieb Bernhard Beschow <shentey@gmail.com>: >> >> >> Am 5. März 2024 13:52:34 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: >>> From: Richard Henderson <richard.henderson@linaro.org> >>> >>> If translation is disabled, the default memory type is Device, which >>> requires alignment checking. This is more optimally done early via >>> the MemOp given to the TCG memory operation. >>> >>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> >>> Reported-by: Idan Horowitz <idan.horowitz@gmail.com> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> Message-id: 20240301204110.656742-6-richard.henderson@linaro.org >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> >> >> Hi, >> >> This change causes an old 4.14.40 Linux kernel to panic on boot using the sabrelite machine: >> >> [snip] >> Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001 >> Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801 >> Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001 >> Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001 >> scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 >> fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered >> imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered >> imx-sgtl5000 sound: snd_soc_register_card failed (-517) >> Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001 >> Alignment trap: not handling instruction f8dd5015 at [<76eac95a>] >> Unhandled fault: alignment exception (0x001) at 0x7e849b05 >> pgd = 9c59c000 >> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 >> >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 >> >> As you can see, some alignment exceptions are handled by the kernel, the last one isn't. I added some additional printk()'s and traced it down to this location in the kernel: <https://github.com/torvalds/linux/blob/v4.14/arch/arm/mm/alignment.c#L762> which claims that ARMv6++ CPUs can handle up to word-sized unaligned accesses, thus no fixup is needed. >> >> I hope that this will be sufficient for a fix. Let me know if you need any additional information. > > I'm performing a direct kernel boot. On real hardware, a bootloader is involved which probably enables unaligned access. This may explain why it works there but not in QEMU any longer. > > To fix direct kernel boot, it seems as if the "built-in bootloader" would need to be adapted/extended [1]. Any ideas? I strongly suspect a kernel bug. Either mmu disabled or attempting unaligned access on pages mapped as Device instead of Normal. r~
On Mon, 27 May 2024 at 03:36, Richard Henderson <richard.henderson@linaro.org> wrote: > > On 5/25/24 13:50, Bernhard Beschow wrote: > > > > > > Am 25. Mai 2024 13:41:54 UTC schrieb Bernhard Beschow <shentey@gmail.com>: > >> > >> > >> Am 5. März 2024 13:52:34 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: > >>> From: Richard Henderson <richard.henderson@linaro.org> > >>> > >>> If translation is disabled, the default memory type is Device, which > >>> requires alignment checking. This is more optimally done early via > >>> the MemOp given to the TCG memory operation. > >>> > >>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > >>> Reported-by: Idan Horowitz <idan.horowitz@gmail.com> > >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >>> Message-id: 20240301204110.656742-6-richard.henderson@linaro.org > >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 > >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > >> > >> Hi, > >> > >> This change causes an old 4.14.40 Linux kernel to panic on boot using the sabrelite machine: > >> > >> [snip] > >> Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001 > >> Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801 > >> Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001 > >> Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001 > >> scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > >> fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered > >> imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered > >> imx-sgtl5000 sound: snd_soc_register_card failed (-517) > >> Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001 > >> Alignment trap: not handling instruction f8dd5015 at [<76eac95a>] > >> Unhandled fault: alignment exception (0x001) at 0x7e849b05 > >> pgd = 9c59c000 > >> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f > >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 > >> > >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 > >> > >> As you can see, some alignment exceptions are handled by the kernel, the last one isn't. I added some additional printk()'s and traced it down to this location in the kernel: <https://github.com/torvalds/linux/blob/v4.14/arch/arm/mm/alignment.c#L762> which claims that ARMv6++ CPUs can handle up to word-sized unaligned accesses, thus no fixup is needed. > >> > >> I hope that this will be sufficient for a fix. Let me know if you need any additional information. > > > > I'm performing a direct kernel boot. On real hardware, a bootloader is involved which probably enables unaligned access. This may explain why it works there but not in QEMU any longer. > > > > To fix direct kernel boot, it seems as if the "built-in bootloader" would need to be adapted/extended [1]. Any ideas? > > I strongly suspect a kernel bug. Either mmu disabled or attempting unaligned access on > pages mapped as Device instead of Normal. The MMU surely must be enabled by this point in guest boot. This change doesn't affect whether we do alignment checks based on SCTLR.A being set, so it's not a simple "the bootloader was supposed to clear that and it didn't" (besides, A=0 means no checks, so that's the default anyway). So the failure is kind of weird. -- PMM
Am 27. Mai 2024 10:58:54 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: >On Mon, 27 May 2024 at 03:36, Richard Henderson ><richard.henderson@linaro.org> wrote: >> >> On 5/25/24 13:50, Bernhard Beschow wrote: >> > >> > >> > Am 25. Mai 2024 13:41:54 UTC schrieb Bernhard Beschow <shentey@gmail.com>: >> >> >> >> >> >> Am 5. März 2024 13:52:34 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: >> >>> From: Richard Henderson <richard.henderson@linaro.org> >> >>> >> >>> If translation is disabled, the default memory type is Device, which >> >>> requires alignment checking. This is more optimally done early via >> >>> the MemOp given to the TCG memory operation. >> >>> >> >>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> >> >>> Reported-by: Idan Horowitz <idan.horowitz@gmail.com> >> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> >>> Message-id: 20240301204110.656742-6-richard.henderson@linaro.org >> >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 >> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> >>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> >> >> >> >> Hi, >> >> >> >> This change causes an old 4.14.40 Linux kernel to panic on boot using the sabrelite machine: >> >> >> >> [snip] >> >> Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001 >> >> Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801 >> >> Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001 >> >> Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001 >> >> scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 >> >> fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered >> >> imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered >> >> imx-sgtl5000 sound: snd_soc_register_card failed (-517) >> >> Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001 >> >> Alignment trap: not handling instruction f8dd5015 at [<76eac95a>] >> >> Unhandled fault: alignment exception (0x001) at 0x7e849b05 >> >> pgd = 9c59c000 >> >> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f >> >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 >> >> >> >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 >> >> >> >> As you can see, some alignment exceptions are handled by the kernel, the last one isn't. I added some additional printk()'s and traced it down to this location in the kernel: <https://github.com/torvalds/linux/blob/v4.14/arch/arm/mm/alignment.c#L762> which claims that ARMv6++ CPUs can handle up to word-sized unaligned accesses, thus no fixup is needed. >> >> >> >> I hope that this will be sufficient for a fix. Let me know if you need any additional information. >> > >> > I'm performing a direct kernel boot. On real hardware, a bootloader is involved which probably enables unaligned access. This may explain why it works there but not in QEMU any longer. >> > >> > To fix direct kernel boot, it seems as if the "built-in bootloader" would need to be adapted/extended [1]. Any ideas? >> >> I strongly suspect a kernel bug. Either mmu disabled or attempting unaligned access on >> pages mapped as Device instead of Normal. > >The MMU surely must be enabled by this point in guest boot. >This change doesn't affect whether we do alignment checks based >on SCTLR.A being set, so it's not a simple "the bootloader was >supposed to clear that and it didn't" (besides, A=0 means no >checks, so that's the default anyway). So the failure is kind >of weird. I think the kernel's output indicates that the MMU is active: [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f AFAIU, the value in brackets is a virtual address while the pte's are physical ones. Furthermore, the `info mtree` QMP command tells that the physical addresses are RAM addresses: 0000000010000000-000000002fffffff (prio 0, ram): sabrelite.ram So I think we can conclude this to be "normal memory" to speak in ARM terms. Regarding the Linux kernel, it seems to me that it expects the unaligned accesses (up to word size) to be resolved by the hardware. On ARMv7 it can assume this, because the SCTLR.U bit is always set to 1 [1]. It then seems to only deal with cases which the hardware can't handle. In the case above, the unhandled instruction is (output from execlog plugin): 0, 0x76ecc95a, 0x5015f8dd, "ldr.w r5, [sp, #0x15]" Note that the correct order of the machine code is 0xf8dd5015. This is not a pattern handled by the kernel, presumably because it expects it to be handled in hardware, hence the "not handling instruction xy" output. I have the impression that real hardware only traps when the hardware can't handle the unaligned access, and only when the SCTLR.A bit is set. I'm not an ARM expert, so take this with a grain of salt. Best regards, Bernhard [1] https://developer.arm.com/documentation/ddi0406/cb/Appendixes/ARMv6-Differences/Application-level-memory-support/Alignment > >-- PMM
On 5/27/24 08:29, Bernhard Beschow wrote: > I think the kernel's output indicates that the MMU is active: > > [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f > > AFAIU, the value in brackets is a virtual address while the pte's are physical ones. Furthermore, the `info mtree` QMP command tells that the physical addresses are RAM addresses: > > 0000000010000000-000000002fffffff (prio 0, ram): sabrelite.ram > > So I think we can conclude this to be "normal memory" to speak in ARM terms. Normal and Device are attributes on the page table entry. See section G5.7 Memory region attributes in the Arm ARM. But it's unlikely that the Linux kernel has messed this up, even back in 4.x days. If you want to make any progress, you'll have to share a test case. r~
Am 27. Mai 2024 16:20:44 UTC schrieb Richard Henderson <richard.henderson@linaro.org>: >On 5/27/24 08:29, Bernhard Beschow wrote: >> I think the kernel's output indicates that the MMU is active: >> >> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f >> >> AFAIU, the value in brackets is a virtual address while the pte's are physical ones. Furthermore, the `info mtree` QMP command tells that the physical addresses are RAM addresses: >> >> 0000000010000000-000000002fffffff (prio 0, ram): sabrelite.ram >> >> So I think we can conclude this to be "normal memory" to speak in ARM terms. > >Normal and Device are attributes on the page table entry. >See section G5.7 Memory region attributes in the Arm ARM. > >But it's unlikely that the Linux kernel has messed this up, even back in 4.x days. > >If you want to make any progress, you'll have to share a test case. It's a proprietary guest, so I need to strip it down first. This may take some time. Thanks for yor feedbak so far! Best regards, Bernhard > > >r~
Am 27. Mai 2024 17:49:26 UTC schrieb Bernhard Beschow <shentey@gmail.com>: > > >Am 27. Mai 2024 16:20:44 UTC schrieb Richard Henderson <richard.henderson@linaro.org>: >>On 5/27/24 08:29, Bernhard Beschow wrote: >>> I think the kernel's output indicates that the MMU is active: >>> >>> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f >>> >>> AFAIU, the value in brackets is a virtual address while the pte's are physical ones. Furthermore, the `info mtree` QMP command tells that the physical addresses are RAM addresses: >>> >>> 0000000010000000-000000002fffffff (prio 0, ram): sabrelite.ram >>> >>> So I think we can conclude this to be "normal memory" to speak in ARM terms. >> >>Normal and Device are attributes on the page table entry. >>See section G5.7 Memory region attributes in the Arm ARM. >> >>But it's unlikely that the Linux kernel has messed this up, even back in 4.x days. >> >>If you want to make any progress, you'll have to share a test case. > >It's a proprietary guest, so I need to strip it down first. This may take some time. Thanks for yor feedbak so far! I finally had some time to look deeper into it. While this patch triggered alignment issues, it is not the culprit. The culprit is that the sabrelite board sets arm_boot_info::secure_boot = true which causes the Linux kernel to run in EL3 mode where hardware alignment fixing is apparently not performed. Setting it to false fixes all problems and the guest boots just fine. Question: Does it make sense to ignore the secure_boot flag on direct kernel boot? If not, what do you suggest? Thanks, Bernhard > >Best regards, >Bernhard >> >> >>r~
On Fri, 5 Jul 2024 at 12:46, Bernhard Beschow <shentey@gmail.com> wrote: > Am 27. Mai 2024 17:49:26 UTC schrieb Bernhard Beschow <shentey@gmail.com>: > >Am 27. Mai 2024 16:20:44 UTC schrieb Richard Henderson <richard.henderson@linaro.org>: > >>On 5/27/24 08:29, Bernhard Beschow wrote: > >>> I think the kernel's output indicates that the MMU is active: > >>> > >>> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f > >>> > >>> AFAIU, the value in brackets is a virtual address while the pte's are physical ones. Furthermore, the `info mtree` QMP command tells that the physical addresses are RAM addresses: > >>> > >>> 0000000010000000-000000002fffffff (prio 0, ram): sabrelite.ram > >>> > >>> So I think we can conclude this to be "normal memory" to speak in ARM terms. > >> > >>Normal and Device are attributes on the page table entry. > >>See section G5.7 Memory region attributes in the Arm ARM. > >> > >>But it's unlikely that the Linux kernel has messed this up, even back in 4.x days. > >> > >>If you want to make any progress, you'll have to share a test case. > > > >It's a proprietary guest, so I need to strip it down first. This may take some time. Thanks for yor feedbak so far! > > I finally had some time to look deeper into it. While this patch triggered alignment issues, it is not the culprit. The culprit is that the sabrelite board sets arm_boot_info::secure_boot = true which causes the Linux kernel to run in EL3 mode where hardware alignment fixing is apparently not performed. Setting it to false fixes all problems and the guest boots just fine. > > Question: Does it make sense to ignore the secure_boot flag on direct kernel boot? If not, what do you suggest? The secure_boot flag specifically means "when direct booting a kernel, do it in Secure SVC, not NonSecure Hyp or NonSecure SVC"; it has no effect on the boot of bios/baremetal images. It's only supposed to be set on the (very few) boards where that's how kernels are booted, and kernels that boot on those boards are supposed to be able to handle being booted that way... thanks -- PMM
On Sat, 25 May 2024 at 14:41, Bernhard Beschow <shentey@gmail.com> wrote: > > > > Am 5. März 2024 13:52:34 UTC schrieb Peter Maydell <peter.maydell@linaro.org>: > >From: Richard Henderson <richard.henderson@linaro.org> > > > >If translation is disabled, the default memory type is Device, which > >requires alignment checking. This is more optimally done early via > >the MemOp given to the TCG memory operation. > > > >Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > >Reported-by: Idan Horowitz <idan.horowitz@gmail.com> > >Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >Message-id: 20240301204110.656742-6-richard.henderson@linaro.org > >Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 > >Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > > Hi, > > This change causes an old 4.14.40 Linux kernel to panic on boot using the sabrelite machine: > > [snip] > Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001 > Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801 > Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001 > Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001 > scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered > imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered > imx-sgtl5000 sound: snd_soc_register_card failed (-517) > Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001 > Alignment trap: not handling instruction f8dd5015 at [<76eac95a>] > Unhandled fault: alignment exception (0x001) at 0x7e849b05 > pgd = 9c59c000 > [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 I think this is the same bug as reported in https://gitlab.com/qemu-project/qemu/-/issues/2326 and which I've just sent a patchset for: https://patchew.org/QEMU/20240809160430.1144805-1-peter.maydell@linaro.org/ (The problem was that we were looking at the wrong banked SCTLR when running at Secure EL0.) thanks -- PMM
05.03.2024 16:52, Peter Maydell wrote: > From: Richard Henderson <richard.henderson@linaro.org> > > If translation is disabled, the default memory type is Device, which > requires alignment checking. This is more optimally done early via > the MemOp given to the TCG memory operation. > > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > Reported-by: Idan Horowitz <idan.horowitz@gmail.com> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > Message-id: 20240301204110.656742-6-richard.henderson@linaro.org > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Hi! Apparently this change also breaks picolibc testsuite (between 8.2 and 9.0, bisect points to this commit). For example: ./qemu-system-arm \ -m 1G \ -chardev stdio,mux=on,id=stdio0 \ -semihosting-config enable=on,chardev=stdio0,arg=program-name \ -monitor none \ -serial none \ -machine none,accel=tcg \ -cpu cortex-a8 \ -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \ -nographic (yes, this testsuite uses qemu-system as a substitute of qemu-user, sort of, (ab)using -device loader) Before this change: hello world 1 checking floating point checking pos args checking long long checking c99 formats (exit code = 0) After this change: hello world 1 checking floating point checking pos args ARM fault: undef R0: 0x00000002 R1: 0x00005c90 R2: 0x201ffeac R3: 0x20200000 R4: 0x00000000 R5: 0x20000004 R6: 0x201ffec4 PC: 0x00000364 Another test from the same picolibc: timeout 1s ./qemu-system-arm \ -m 1G \ -chardev stdio,mux=on,id=stdio0 \ -semihosting-config enable=on,chardev=stdio0,arg=program-name \ -monitor none \ -serial none \ -machine none,accel=tcg \ -cpu cortex-a7 \ -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/newlib/testsuite/newlib.string/tstring_thumb_v7_nofp,cpu-num=0 \ -nographic This one succeeds immediately before this change, and just times out (qemu is basically doing nothing, according to strace) after this commit. Exactly the same happens up to current qemu master (ie, 9.1-tobe). So is not https://gitlab.com/qemu-project/qemu/-/issues/2326 and is not fixed by 4c2c0474693229c1f533239bb983495c5427784d "target/arm: Fix usage of MMU indexes when EL3 is AArch32". picolibc is built this way: picolibc-1.8.6$ meson setup . arm-none-eabi \ --prefix=/usr \ -Dc_args='-Wdate-time' \ -Dtests=true \ --cross-file scripts/cross-arm-none-eabi.txt \ -Dspecsdir=/usr/lib/picolibc/arm-none-eabi \ -Dincludedir=lib/picolibc/arm-none-eabi/include \ -Dlibdir=lib/picolibc/arm-none-eabi/lib Thanks, /mjt
On 8/28/24 17:22, Michael Tokarev wrote: > 05.03.2024 16:52, Peter Maydell wrote: >> From: Richard Henderson <richard.henderson@linaro.org> >> >> If translation is disabled, the default memory type is Device, which >> requires alignment checking. This is more optimally done early via >> the MemOp given to the TCG memory operation. >> >> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> >> Reported-by: Idan Horowitz <idan.horowitz@gmail.com> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> Message-id: 20240301204110.656742-6-richard.henderson@linaro.org >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > > Hi! > > Apparently this change also breaks picolibc testsuite (between > 8.2 and 9.0, bisect points to this commit). > > For example: > > ./qemu-system-arm \ > -m 1G \ > -chardev stdio,mux=on,id=stdio0 \ > -semihosting-config enable=on,chardev=stdio0,arg=program-name \ > -monitor none \ > -serial none \ > -machine none,accel=tcg \ > -cpu cortex-a8 \ > -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/ > printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \ > -nographic Almost certainly a duplicate of #2326, fixed in master by 4c2c0474693229c1f533239bb983495c5427784d. r~
28.08.2024 14:07, Richard Henderson wrote: > On 8/28/24 17:22, Michael Tokarev wrote: >> 05.03.2024 16:52, Peter Maydell wrote: >>> From: Richard Henderson <richard.henderson@linaro.org> >>> >>> If translation is disabled, the default memory type is Device, which >>> requires alignment checking. This is more optimally done early via >>> the MemOp given to the TCG memory operation. >>> >>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> >>> Reported-by: Idan Horowitz <idan.horowitz@gmail.com> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> Message-id: 20240301204110.656742-6-richard.henderson@linaro.org >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> >> >> Hi! >> >> Apparently this change also breaks picolibc testsuite (between >> 8.2 and 9.0, bisect points to this commit). >> >> For example: >> >> ./qemu-system-arm \ >> -m 1G \ >> -chardev stdio,mux=on,id=stdio0 \ >> -semihosting-config enable=on,chardev=stdio0,arg=program-name \ >> -monitor none \ >> -serial none \ >> -machine none,accel=tcg \ >> -cpu cortex-a8 \ >> -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/ printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \ >> -nographic > > Almost certainly a duplicate of #2326, fixed in master by 4c2c0474693229c1f533239bb983495c5427784d. Hi Richard! You can read my email to the end, where I mentioned that this problem is NOT fixed in current master and by this commit in particular. Thanks, /mjt
On Wed, 28 Aug 2024 at 08:22, Michael Tokarev <mjt@tls.msk.ru> wrote: > > 05.03.2024 16:52, Peter Maydell wrote: > > From: Richard Henderson <richard.henderson@linaro.org> > > > > If translation is disabled, the default memory type is Device, which > > requires alignment checking. This is more optimally done early via > > the MemOp given to the TCG memory operation. > > > > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > > Reported-by: Idan Horowitz <idan.horowitz@gmail.com> > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > Message-id: 20240301204110.656742-6-richard.henderson@linaro.org > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > > Hi! > > Apparently this change also breaks picolibc testsuite (between > 8.2 and 9.0, bisect points to this commit). > > For example: > > ./qemu-system-arm \ > -m 1G \ > -chardev stdio,mux=on,id=stdio0 \ > -semihosting-config enable=on,chardev=stdio0,arg=program-name \ > -monitor none \ > -serial none \ > -machine none,accel=tcg \ > -cpu cortex-a8 \ > -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \ > -nographic > > (yes, this testsuite uses qemu-system as a substitute of > qemu-user, sort of, (ab)using -device loader) My immediate guess is that this code won't run on real hardware either -- i.e. that is bare-metal code that was only ever tested and run on QEMU and was previously relying on the incorrect behaviour that we didn't enforce the alignment checks that we're supposed to do when the MMU is off. We'd need to look at the picolibc test harness and build system to be sure, but it needs to do one of: * tell the compiler never to generate nonaligned accesses * set up at least a simple 1:1 set of page tables and turn on the MMU before jumping to C code and my first move would be to check whether it is trying to do either of those things. thanks -- PMM
On Wed, 28 Aug 2024 at 16:51, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Wed, 28 Aug 2024 at 08:22, Michael Tokarev <mjt@tls.msk.ru> wrote: > > > > 05.03.2024 16:52, Peter Maydell wrote: > > > From: Richard Henderson <richard.henderson@linaro.org> > > > > > > If translation is disabled, the default memory type is Device, which > > > requires alignment checking. This is more optimally done early via > > > the MemOp given to the TCG memory operation. > > > > > > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > > > Reported-by: Idan Horowitz <idan.horowitz@gmail.com> > > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > > Message-id: 20240301204110.656742-6-richard.henderson@linaro.org > > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204 > > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > > > > Hi! > > > > Apparently this change also breaks picolibc testsuite (between > > 8.2 and 9.0, bisect points to this commit). > > > > For example: > > > > ./qemu-system-arm \ > > -m 1G \ > > -chardev stdio,mux=on,id=stdio0 \ > > -semihosting-config enable=on,chardev=stdio0,arg=program-name \ > > -monitor none \ > > -serial none \ > > -machine none,accel=tcg \ > > -cpu cortex-a8 \ > > -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \ > > -nographic > > > > (yes, this testsuite uses qemu-system as a substitute of > > qemu-user, sort of, (ab)using -device loader) > > My immediate guess is that this code won't run on real hardware > either -- i.e. that is bare-metal code that was only ever tested > and run on QEMU and was previously relying on the incorrect > behaviour that we didn't enforce the alignment checks that we're > supposed to do when the MMU is off. I had a look at the test binary you kindly provided in https://gitlab.com/qemu-project/qemu/-/issues/2542 and that confirmed my guess. This binary would never have worked on real hardware, and it only worked on older QEMU because we weren't correctly emulating this corner of the architecture. You need to either use a new enough picolibc that you can turn on its _PICOCRT_ENABLE_MMU option, or else make sure everything is built with gcc's -mstrict-align or similar option to avoid any unaligned loads. thanks -- PMM
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c index 8e5d35d9227..5da1b0fc1d4 100644 --- a/target/arm/tcg/hflags.c +++ b/target/arm/tcg/hflags.c @@ -26,6 +26,35 @@ static inline bool fgt_svc(CPUARMState *env, int el) FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL1); } +/* Return true if memory alignment should be enforced. */ +static bool aprofile_require_alignment(CPUARMState *env, int el, uint64_t sctlr) +{ +#ifdef CONFIG_USER_ONLY + return false; +#else + /* Check the alignment enable bit. */ + if (sctlr & SCTLR_A) { + return true; + } + + /* + * If translation is disabled, then the default memory type is + * Device(-nGnRnE) instead of Normal, which requires that alignment + * be enforced. Since this affects all ram, it is most efficient + * to handle this during translation. + */ + if (sctlr & SCTLR_M) { + /* Translation enabled: memory type in PTE via MAIR_ELx. */ + return false; + } + if (el < 2 && (arm_hcr_el2_eff(env) & (HCR_DC | HCR_VM))) { + /* Stage 2 translation enabled: memory type in PTE. */ + return false; + } + return true; +#endif +} + static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el, ARMMMUIdx mmu_idx, CPUARMTBFlags flags) @@ -121,8 +150,9 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el, { CPUARMTBFlags flags = {}; int el = arm_current_el(env); + uint64_t sctlr = arm_sctlr(env, el); - if (arm_sctlr(env, el) & SCTLR_A) { + if (aprofile_require_alignment(env, el, sctlr)) { DP_TBFLAG_ANY(flags, ALIGN_MEM, 1); } @@ -223,7 +253,7 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, sctlr = regime_sctlr(env, stage1); - if (sctlr & SCTLR_A) { + if (aprofile_require_alignment(env, el, sctlr)) { DP_TBFLAG_ANY(flags, ALIGN_MEM, 1); }