Message ID | 1454077035-23872-1-git-send-email-ard.biesheuvel@linaro.org |
---|---|
State | New |
On 29 January 2016 at 15:17, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Instead of iterating over the PCI config window and performing individual
> ioremap() calls on all the adjacent slices, perform a single ioremap() to
> map the entire region, and divvy it up later. This not only prevents
> leaving some of it mapped if we fail half way through, it also ensures that
> archs that support huge-vmap can use section mappings to perform the
> mapping.
>
> On my Seattle A0 box, this transforms 128 separate 1 MB mappings that are
> mapped down to 4 KB pages into a single 128 MB mapping using 2 MB sections,
> saving 512 KB worth of page tables.
>

OK, this math is slightly off: 4 KB for each 2 MB section == 64 * 4 == 128 KB

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> [... rest of the patch snipped ...]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 29 January 2016 at 15:19, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 29 January 2016 at 15:17, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> [...]
>> On my Seattle A0 box, this transforms 128 separate 1 MB mappings that are
>> mapped down to 4 KB pages into a single 128 MB mapping using 2 MB sections,
>> saving 512 KB worth of page tables.
>>
>
> OK, this math is slightly off: 4 KB for each 2 MB section == 64 * 4 == 128 KB
>

Sigh. 64 * 4 == 256 KB
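The corrected figure can be checked against the page-table geometry. This is an illustrative sketch only. It assumes an arm64 kernel with a 4 KB translation granule and 8-byte descriptors, so one 4 KB last-level table holds 512 entries and covers exactly one 2 MB section. It also assumes a 2 MB-aligned window. `pte_table_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES        4096ULL  /* assumed 4 KB translation granule */
#define DESC_BYTES        8ULL     /* one 64-bit table descriptor */
#define ENTRIES_PER_TABLE (PAGE_BYTES / DESC_BYTES)        /* 512 entries */
#define SECTION_BYTES     (ENTRIES_PER_TABLE * PAGE_BYTES) /* 2 MB per table */

/*
 * Bytes of last-level tables needed to map window_bytes with 4 KB pages.
 * A 2 MB section mapping needs no last-level table, so when huge-vmap can
 * use sections for the whole window, this is also the amount saved.
 */
static uint64_t pte_table_bytes(uint64_t window_bytes)
{
	assert(window_bytes % SECTION_BYTES == 0); /* assumed 2 MB aligned */
	return (window_bytes / SECTION_BYTES) * PAGE_BYTES;
}
```

`pte_table_bytes(128ULL << 20)` evaluates to 262144 bytes. That is 64 tables of 4 KB each, the 256 KB given in the second correction.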
Hi Ard,

On Fri, Jan 29, 2016 at 03:17:15PM +0100, Ard Biesheuvel wrote:
> Instead of iterating over the PCI config window and performing individual
> ioremap() calls on all the adjacent slices, perform a single ioremap() to
> map the entire region, and divvy it up later. This not only prevents
> leaving some of it mapped if we fail half way through, it also ensures that
> archs that support huge-vmap can use section mappings to perform the
> mapping.
> [...]

The code was written this way in response to feedback during driver review
that we couldn't necessarily grab that much contiguous vmalloc space on
32-bit ARM. Unless that's changed, we probably want to predicate this
change on having a 64-bit arch.

Will
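The single mapping requested by the patch grows with the bus range. That growth is why contiguous vmalloc space becomes a concern on 32-bit. The rough sketch below uses a hypothetical helper, `cfg_window_size()`. It assumes `bus_shift` is 20 for ECAM (1 MB of config space per bus) and 16 for the older CAM layout, matching the two `bus_shift` values used by pci-host-generic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes covered by one contiguous mapping of buses [bus_start, bus_end],
 * i.e. "(bus_range->end - bus_range->start + 1) * sz" from the patch,
 * with sz = 1 << bus_shift. Computed in 64 bits to avoid overflow.
 */
static uint64_t cfg_window_size(unsigned int bus_start, unsigned int bus_end,
				unsigned int bus_shift)
{
	assert(bus_start <= bus_end);
	return ((uint64_t)bus_end - bus_start + 1) << bus_shift;
}
```

In the Seattle example (128 buses, ECAM), this gives 128 MB. A full 0-255 ECAM bus range would need 256 MB of virtual address space in one piece. The per-bus loop, by contrast, only ever asks `devm_ioremap()` for 1 MB at a time.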
On 29 January 2016 at 15:28, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> On Fri, Jan 29, 2016 at 03:17:15PM +0100, Ard Biesheuvel wrote:
>> [...]
>
> The code was written this way in response to feedback during driver review
> that we couldn't necessarily grab that much contiguous vmalloc space on
> 32-bit ARM. Unless that's changed, we probably want to predicate this
> change on having a 64-bit arch.
>

Ah right. How about testing for the HAVE_ARCH_HUGE_VMAP Kconfig symbol
explicitly?
On Friday 29 January 2016 15:32:01 Ard Biesheuvel wrote:
> On 29 January 2016 at 15:28, Will Deacon <will.deacon@arm.com> wrote:
> > [...]
> > The code was written this way in response to feedback during driver review
> > that we couldn't necessarily grab that much contiguous vmalloc space on
> > 32-bit ARM. Unless that's changed, we probably want to predicate this
> > change on having a 64-bit arch.
> >
>
> Ah right. How about testing for the HAVE_ARCH_HUGE_VMAP Kconfig symbol
> explicitly?
>

Testing for 64BIT should be sufficient.

Arnd
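Below is one way to predicate the change on a 64-bit build. It is a user-space model, not driver code. `have_64bit()` approximates `IS_ENABLED(CONFIG_64BIT)` by checking the pointer width, `malloc()` stands in for `devm_ioremap()`, and `map_cfg_windows()` is a hypothetical name. The single-window path carves one allocation into per-bus slices in the same way the patch does (`window + idx * sz`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for IS_ENABLED(CONFIG_64BIT): true when pointers are 64-bit. */
static bool have_64bit(void)
{
	return UINTPTR_MAX > 0xffffffffu;
}

/*
 * Fill win[0..nr_buses-1] with per-bus config windows of sz bytes each.
 * single_window: one contiguous mapping, divided into slices.
 * otherwise:     one separate mapping per bus (the pre-patch behaviour).
 * Returns the number of mappings created, or -1 on failure.
 */
static int map_cfg_windows(char **win, unsigned int nr_buses, size_t sz,
			   bool single_window)
{
	assert(nr_buses > 0);

	if (single_window) {
		char *window = malloc(nr_buses * sz);

		if (!window)
			return -1;
		for (unsigned int idx = 0; idx < nr_buses; idx++)
			win[idx] = window + idx * sz;
		return 1;
	}

	for (unsigned int idx = 0; idx < nr_buses; idx++) {
		win[idx] = malloc(sz);
		/*
		 * This model leaks earlier slices on failure; in the driver,
		 * devm_ioremap() mappings are released when probe fails.
		 */
		if (!win[idx])
			return -1;
	}
	return (int)nr_buses;
}
```

A caller would pass `have_64bit()` as `single_window`. That keeps the per-bus mappings, and their smaller contiguous footprint, on 32-bit builds.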
On 29 January 2016 at 15:52, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 29 January 2016 15:32:01 Ard Biesheuvel wrote:
>> [...]
>> Ah right. How about testing for the HAVE_ARCH_HUGE_VMAP Kconfig symbol
>> explicitly?
>>
>
> Testing for 64BIT should be sufficient.
>

Does it make sense to spin a v2 for this patch? Given the discussion
we had regarding allocating only the config regions for busses that
are populated, perhaps there is a better approach here?
On Friday 05 February 2016 11:48:54 Ard Biesheuvel wrote:
> On 29 January 2016 at 15:52, Arnd Bergmann <arnd@arndb.de> wrote:
> > [...]
> > Testing for 64BIT should be sufficient.
> >
>
> Does it make sense to spin a v2 for this patch? Given the discussion
> we had regarding allocating only the config regions for busses that
> are populated, perhaps there is a better approach here?

Allocating only the config regions that are actually used would
be ideal, the problem is that you need to access the config space
in order to know which ones are, so this is certainly a bit tricky.

Are there any downsides to the x86 approach of using fixmap to
map each patch separately during the access? It's probably a bit
slower per access, but we don't do a lot of those accesses after
the system is booted.

Arnd
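Arnd's fixmap suggestion amounts to mapping only the 4 KB page that holds the register being accessed, and only for the duration of that access. What follows is a user-space model under stated assumptions, not kernel code:

- The config space uses the ECAM layout (1 MB per bus, 4 KB per function).
- A plain byte array stands in for physical config space.
- `set_slot()` is a hypothetical stand-in for pointing a fixmap entry at that page.
- `memcpy()` replaces an MMIO read such as `readl()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_PAGE_SIZE 4096u

/*
 * ECAM offset of a config register: bus in bits 27:20, devfn
 * (device << 3 | function) in bits 19:12, register offset in bits 11:0.
 */
static uint32_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t where)
{
	assert(where < CFG_PAGE_SIZE);
	return ((uint32_t)bus << 20) | ((uint32_t)devfn << 12) | where;
}

/* The single reusable mapping slot (a fixmap entry in the kernel). */
static const uint8_t *slot;

/* Hypothetical stand-in for pointing the slot at one config page. */
static void set_slot(const uint8_t *cfg_space, uint32_t page_base)
{
	slot = cfg_space + page_base;
}

/* Read a 32-bit config register by mapping only the page that holds it. */
static uint32_t cfg_read32(const uint8_t *cfg_space, uint8_t bus,
			   uint8_t devfn, uint16_t where)
{
	uint32_t off, val;

	assert(where % 4 == 0);                          /* aligned dword */
	off = ecam_offset(bus, devfn, where);
	set_slot(cfg_space, off & ~(CFG_PAGE_SIZE - 1)); /* "map" the page */
	memcpy(&val, slot + (off & (CFG_PAGE_SIZE - 1)), sizeof(val));
	slot = NULL;                                     /* "unmap" it */
	return val;
}
```

Every access reuses one slot, so a real implementation would presumably also need a lock around the map/access/unmap sequence. The trade-off is one extra mapping operation per access, instead of keeping tens to hundreds of MB of config space mapped permanently.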
On 5 February 2016 at 14:37, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 05 February 2016 11:48:54 Ard Biesheuvel wrote:
>> [...]
>
> Allocating only the config regions that are actually used would
> be ideal, the problem is that you need to access the config space
> in order to know which ones are, so this is certainly a bit tricky.
>
> Are there any downsides to the x86 approach of using fixmap to
> map each patch separately during the access? It's probably a bit
> slower per access, but we don't do a lot of those accesses after
> the system is booted.
>

Yes, I guess that makes a lot more sense than ioremapping 10s to 100s
of MBs and hardly ever using them.
```diff
diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c
index 1652bc70b145..3251cd779278 100644
--- a/drivers/pci/host/pci-host-generic.c
+++ b/drivers/pci/host/pci-host-generic.c
@@ -161,6 +161,7 @@ static int gen_pci_parse_map_cfg_windows(struct gen_pci *pci)
 	struct device *dev = pci->host.dev.parent;
 	struct device_node *np = dev->of_node;
 	u32 sz = 1 << pci->cfg.ops->bus_shift;
+	void *window;
 
 	err = of_address_to_resource(np, 0, &pci->cfg.res);
 	if (err) {
@@ -186,14 +187,15 @@ static int gen_pci_parse_map_cfg_windows(struct gen_pci *pci)
 		return -ENOMEM;
 
 	bus_range = pci->cfg.bus_range;
+	window = devm_ioremap(dev, pci->cfg.res.start,
+			      (bus_range->end - bus_range->start + 1) * sz);
+	if (!window)
+		return -ENOMEM;
+
 	for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
 		u32 idx = busn - bus_range->start;
 
-		pci->cfg.win[idx] = devm_ioremap(dev,
-						 pci->cfg.res.start + idx * sz,
-						 sz);
-		if (!pci->cfg.win[idx])
-			return -ENOMEM;
+		pci->cfg.win[idx] = window + idx * sz;
 	}
 
 	return 0;
```
Instead of iterating over the PCI config window and performing individual
ioremap() calls on all the adjacent slices, perform a single ioremap() to
map the entire region, and divvy it up later. This not only prevents
leaving some of it mapped if we fail half way through, it also ensures that
archs that support huge-vmap can use section mappings to perform the
mapping.

On my Seattle A0 box, this transforms 128 separate 1 MB mappings that are
mapped down to 4 KB pages into a single 128 MB mapping using 2 MB sections,
saving 512 KB worth of page tables.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---

huge-vmap for arm64 proposed here:
http://article.gmane.org/gmane.linux.kernel.hardened.devel/1661

 drivers/pci/host/pci-host-generic.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

-- 
2.5.0