Hi! First of all, apologies if this is the wrong place to post a problem report. I figured since I was going to reference a particular commit anyway I might as well reply to the patch series that (seemed to have) introduced the problem. > From: Lu Baolu <baolu.lu@linux.intel.com> > > [ Upstream commit a8ce9ebbecdfda3322bbcece6b3b25888217f8e3 ] > > The Access/Dirty bits in the first level page table entry will be set > whenever a page table entry was used for address translation or write > permission was successfully translated. This is always true when using > the first-level page table for kernel IOVA. Instead of wasting hardware > cycles to update the certain bits, it's better to set them up at the > beginning. This commit seems to trigger a kernel panic very early in boot for me in 5.10.37 (36 is fine): Call Trace: domain_mapping+0x16/0x90 __iommu_map+0xcd/0x120 iommu_create_device_direct_mappins.isra.0+0x175/0x210 bus_iommu_probe+0x15a/0x290 bus_set_iommu+0x7e/0xd0 intel_iommu_init+0xf84/0x112b ? e820__memblock_setup+0x76/0x76 pci_iommu_init+0x11/0x3a do_one_initcall+0x5a/0x190 kernel_init_freeable+0x140/0x185 ? rest_init+0xa4/0xa4 kernel_init+0x5/0xfc ret_from_fork+0x22/0x30 Modules linked in: CR2: 0000000000000000 ---[ end trace 0904a2a0169baf8a ]-- RIP: 0010:__domain_mapping+0xa1/0x3a0 Code: 02 4d 63 ff 0f 85 1b 02 00 00 4c 89 d3 48 c1 e3 0c 4c 09 fb 4d 85 c0 0f 84 2e 01 00 00 45 31 e4 31 ed 45 31 c9 4d 85 e4 75 58 <49> 8b 5d 00 41 8b 45 08 41 8b 4d 0c 48 83 e3 fc 48 2b 1d 28 8f b8 RSP: 0000:ffffafc54002bc00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000dd7e4003 RCX: 000000000000002d RDX: 0000000000000000 RSI: 00000000000dd7e4 RDI: ffff8b260108ac00 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000dd7e4 R11: ffff8b260108ac00 R12: 0000000000000000 R13: 0000000000000000 R14: 00000000000dd7e4 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8b290ec80000(0000) knlGS: 000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 00000000080050033 CR2: 000000000000000 CR3: 0000000009700c001 CR4: 00000000001706e0 Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 I managed to build the kernel with debug info, and could trace the problem to drivers/iommu/intel/iommu.c: (gdb) l *__domain_mapping+0xa1 0xffffffff8146b171 is in __domain_mapping (drivers/iommu/intel/iommu.c:2381). I then had a look at the commits for v5.10.36..v5.10.37 that touched that file, and on a complete hunch reverted this one (there were only 4, and this one looked the most suspect to my eyes). I could successfully boot into the system again after that. I'm unsure what other information about my system to include, please advise. Something to note is that I am compiling the 5.10 series with GCC 11 for which I'm manually pulling in the following commits: 1e860048c53ee77ee9870dcce94847a28544b753 67a5a68013056cbcf0a647e36cb6f4622fb6a470 I have not yet tried building 5.10.37 with GCC 10 because I already cleaned the old compiler from my system. I don't think the compiler is to blame here, however. Thanks a lot, -- Wolfgang
Hi Wolfgang, On 5/15/21 9:28 PM, Wolfgang Müller wrote: > Hi! > > First of all, apologies if this is the wrong place to post a problem > report. I figured since I was going to reference a particular commit > anyway I might as well reply to the patch series that (seemed to have) > introduced the problem. > >> From: Lu Baolu <baolu.lu@linux.intel.com> >> >> [ Upstream commit a8ce9ebbecdfda3322bbcece6b3b25888217f8e3 ] >> >> The Access/Dirty bits in the first level page table entry will be set >> whenever a page table entry was used for address translation or write >> permission was successfully translated. This is always true when using >> the first-level page table for kernel IOVA. Instead of wasting hardware >> cycles to update the certain bits, it's better to set them up at the >> beginning. > > This commit seems to trigger a kernel panic very early in boot for me in > 5.10.37 (36 is fine): It seems due to the back-ported patch: - if (!sg) { - sg_res = nr_pages; - pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; + if (domain->domain.type == IOMMU_DOMAIN_DMA) { + attr |= DMA_FL_PTE_ACCESS; + if (prot & DMA_PTE_WRITE) + attr |= DMA_FL_PTE_DIRTY; + } } + pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; Greg, do you want me to rework this patch, or submit an incremental fix? Best regards, baolu
On Mon, May 17, 2021 at 10:38:42AM +0800, Lu Baolu wrote: >Hi Wolfgang, > >On 5/15/21 9:28 PM, Wolfgang Müller wrote: >>Hi! >> >>First of all, apologies if this is the wrong place to post a problem >>report. I figured since I was going to reference a particular commit >>anyway I might as well reply to the patch series that (seemed to have) >>introduced the problem. >> >>>From: Lu Baolu <baolu.lu@linux.intel.com> >>> >>>[ Upstream commit a8ce9ebbecdfda3322bbcece6b3b25888217f8e3 ] >>> >>>The Access/Dirty bits in the first level page table entry will be set >>>whenever a page table entry was used for address translation or write >>>permission was successfully translated. This is always true when using >>>the first-level page table for kernel IOVA. Instead of wasting hardware >>>cycles to update the certain bits, it's better to set them up at the >>>beginning. >> >>This commit seems to trigger a kernel panic very early in boot for me in >>5.10.37 (36 is fine): > >It seems due to the back-ported patch: > >- if (!sg) { >- sg_res = nr_pages; >- pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; >+ if (domain->domain.type == IOMMU_DOMAIN_DMA) { >+ attr |= DMA_FL_PTE_ACCESS; >+ if (prot & DMA_PTE_WRITE) >+ attr |= DMA_FL_PTE_DIRTY; >+ } > } > >+ pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; > >Greg, do you want me to rework this patch, or submit an incremental fix? Could you send a reworked patch please? -- Thanks, Sasha
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 3295e5e162a4..8010c3895f8c 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1028,8 +1028,11 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain, domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE); pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE; - if (domain_use_first_level(domain)) + if (domain_use_first_level(domain)) { pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US; + if (domain->domain.type == IOMMU_DOMAIN_DMA) + pteval |= DMA_FL_PTE_ACCESS; + } if (cmpxchg64(&pte->val, 0ULL, pteval)) /* Someone else set it while we were thinking; use theirs. */ free_pgtable_page(tmp_page); @@ -2354,14 +2357,18 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, return -EINVAL; attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP); - if (domain_use_first_level(domain)) + if (domain_use_first_level(domain)) { attr |= DMA_FL_PTE_PRESENT | DMA_FL_PTE_XD | DMA_FL_PTE_US; - if (!sg) { - sg_res = nr_pages; - pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; + if (domain->domain.type == IOMMU_DOMAIN_DMA) { + attr |= DMA_FL_PTE_ACCESS; + if (prot & DMA_PTE_WRITE) + attr |= DMA_FL_PTE_DIRTY; + } } + pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; + while (nr_pages > 0) { uint64_t tmp; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 94522685a0d9..ccaa057faf8c 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -42,6 +42,8 @@ #define DMA_FL_PTE_PRESENT BIT_ULL(0) #define DMA_FL_PTE_US BIT_ULL(2) +#define DMA_FL_PTE_ACCESS BIT_ULL(5) +#define DMA_FL_PTE_DIRTY BIT_ULL(6) #define DMA_FL_PTE_XD BIT_ULL(63) #define ADDR_WIDTH_5LEVEL (57)