| Message ID | 161821396263.48361.2796709239866588652.stgit@jupiter |
|---|---|
| State | Accepted |
| Commit | 5ae5bc12d0728db60a0aa9b62160ffc038875f1a |
| Series | [v2] powerpc/eeh: Fix EEH handling for hugepages in ioremap space. |
On 2021-04-13 10:53:39 Tue, Oliver O'Halloran wrote:
> On Mon, Apr 12, 2021 at 5:52 PM Mahesh Salgaonkar <mahesh@linux.ibm.com> wrote:
> >
> > During the EEH MMIO error checking, the current implementation fails to map
> > the (virtual) MMIO address back to the pci device on radix with hugepage
> > mappings for I/O. This results into failure to dispatch EEH event with no
> > recovery even when EEH capability has been enabled on the device.
> >
> > eeh_check_failure(token)        # token = virtual MMIO address
> >   addr = eeh_token_to_phys(token);
> >   edev = eeh_addr_cache_get_dev(addr);
> >   if (!edev)
> >     return 0;
> >   eeh_dev_check_failure(edev);  <= Dispatch the EEH event
> >
> > In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys
> > translation that results in wrong physical address, which is then passed to
> > eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges
> > to get to a PCI device. Hence, it fails to find a match and the EEH event
> > never gets dispatched leaving the device in failed state.
> >
> > The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space")
> > introduced following logic to translate virt to phys for hugepage mappings:
> >
> > eeh_token_to_phys():
> > +	pa = pte_pfn(*ptep);
> > +
> > +	/* On radix we can do hugepage mappings for io, so handle that */
> > +	if (hugepage_shift) {
> > +		pa <<= hugepage_shift;	<= This is wrong
> > +		pa |= token & ((1ul << hugepage_shift) - 1);
> > +	}
>
> I think I vaguely remember thinking "is this right?" at the time.
> Apparently not!
>
> Reviewed-by: Oliver O'Halloran <oohall@gmail.com>

Thanks for the review.

> It would probably be a good idea to add a debugfs interface to help
> with testing the address translation.
> Maybe something like:
>
> echo <mmio addr> > /sys/kernel/debug/powerpc/eeh_addr_check
>
> Then in the kernel:
>
> struct resource *r = lookup_resource(mmio_addr);
> void *virt = ioremap_resource(r);
> ret = eeh_check_failure(virt);
> iounmap(virt)
>
> return ret;
>
> A little tedious, but then you can write a selftest :)

Sure, will give a try.

Thanks,
-Mahesh.

--
Mahesh J Salgaonkar
On Mon, 12 Apr 2021 13:22:50 +0530, Mahesh Salgaonkar wrote:
> During the EEH MMIO error checking, the current implementation fails to map
> the (virtual) MMIO address back to the pci device on radix with hugepage
> mappings for I/O. This results into failure to dispatch EEH event with no
> recovery even when EEH capability has been enabled on the device.
>
> eeh_check_failure(token)        # token = virtual MMIO address
>   addr = eeh_token_to_phys(token);
>   edev = eeh_addr_cache_get_dev(addr);
>   if (!edev)
>     return 0;
>   eeh_dev_check_failure(edev);  <= Dispatch the EEH event
>
> [...]

Applied to powerpc/next.

[1/1] powerpc/eeh: Fix EEH handling for hugepages in ioremap space.
      https://git.kernel.org/powerpc/c/5ae5bc12d0728db60a0aa9b62160ffc038875f1a

cheers
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index cd60bc1c87011..7040e430a1249 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -362,14 +362,11 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 
 	pa = pte_pfn(*ptep);
 
 	/* On radix we can do hugepage mappings for io, so handle that */
-	if (hugepage_shift) {
-		pa <<= hugepage_shift;
-		pa |= token & ((1ul << hugepage_shift) - 1);
-	} else {
-		pa <<= PAGE_SHIFT;
-		pa |= token & (PAGE_SIZE - 1);
-	}
+	if (!hugepage_shift)
+		hugepage_shift = PAGE_SHIFT;
+	pa <<= PAGE_SHIFT;
+	pa |= token & ((1ul << hugepage_shift) - 1);
 
 	return pa;
 }