mbox series

[00/30] Make a new API for drivers to use to get their FW

Message ID 0-v1-f82a05539a64+5042-iommu_fwspec_p2_jgg@nvidia.com
Headers show
Series Make a new API for drivers to use to get their FW | expand

Message

Jason Gunthorpe Nov. 30, 2023, 1:10 a.m. UTC
This series improves the way drivers get their FW data. It introduces a
series of driver facing APIs and converts every driver to use them and
replaces fwspec with this mechanism.

The new API allows drivers to do less stuff in alot of cases, and permits
all drivers to sequence their operations in a logical way with correct
error handling. It doesn't force the idea of pre or post parse on the
driver, though I choose to implement post-parse here.

Aside from the API change this takes on a few other goals:

 1) New API for drivers to simplify drivers and make them more robust
 2) Remove all FW parsing from _dma_configure()
 3) Make the global lock back into a static
 4) Organize things to reduce the probe time memory allocations from
     fw_spec, N*realloc, dev_iommu, device-private
    To just allocating device-private. The last step to merge dev_iommu
    and device-private would be a follow on couple of patches.
 5) Get everything setup to allow all devices to probe during bus scan,
    not during _dma_configure.
 6) Better layering between the ACPI and IOMMU code, less
    iommu related things in the ACPI directory.

The drivers break down into two broad approaches for accessing FW in a
driver's probe_device() function. AMD, Intel, mtk_v1, omap, S390,
smmu-legacy and tegra can "self probe" using code in their probe_device()
functions to parse their associated FW descriptions.

The rest (DART, smmu-modern, qcom, smmu-v3, exynos, ipmmu, msm, mtk,
rockchip, sprd, sun50i, virtio) use the iommu_fwspec system and rely on
the core code to do the FW parsing for them before invoking
probe_device().

To understand the philosophy of this design it is worth to take a moment
and review the 2009 LWN article on Kernel Design
patterns (https://lwn.net/Articles/336262/) exploring the "midlayer
mistake". While I do not entirely agree that midlayers are unconditionally
bad, the discussion does nicely frame the architectural choice I've made
in this series.

Organize the FW parsing so the drivers have an toolbox style API to
extract the information they require from the FW. Create a set of APIs for
the drivers to call that do exactly what the current set of drivers need
in an efficient and understandable way. The API is designed to
significantly clean and simplify the drivers.

The API transforms the iommu_fwspec from a long-term allocated struct into
a short-term on-stack struct iommu_probe_info. Using a short-term
structure means we no longer have to be memory efficient and can store
temporary details about the ongoing parsing in the structure which is the
key to allowing the new API.

Further, philosophically, the API changes to make the iommu driver
responsible to provide per-device storage for any parsed FW data. It is
stored in whatever format the driver finds appropriate. The API provides
concise helpers to make this easy for the standard 'u32 id[]' cases. The
unusual cases were already using unique per-driver storage and continue to
do so in a cleaner, less convoluted way.

The design allows the API to be implemented either as a 'parse on demand'
or 'parse and cache in advance' (more like fwspec) approach. I've
implemented the 'parse on demand' version here, but I'm not fixed on
it. Let's discuss.

The choice is deliberately not baked into the driver facing API,
iommu_probe_info combined with the generic *_iommu_for_each_id() lower
layer provides a lot of flexibility to do whatever organization we want.

The specific commit messages provide details, but the drivers break down
into three basic idiomatic sequences for their probe_device with the new
API.

Single iommu with no ID list:
  iommu_driver = iommu_of_get_single_iommu(pinf, ops, num_cells,
                                iommu_driver_struct, iommu_member);
  per_driver = kzalloc(..)
  per_driver->iommu = iommu_driver;
  dev_iommu_priv_set(dev, per_driver);
  return &iommu_driver->iommu_member;

Single iommu with a simple u32 ID list:
  iommu_driver = iommu_of_get_single_iommu(pinf, ops, num_cells,
                                iommu_driver_struct, iommu_member);
  per_driver = iommu_fw_alloc_per_device_ids(pinf, per_driver);
  per_driver->iommu = iommu_driver;
  dev_iommu_priv_set(dev, per_driver);
  return &iommu_driver->iommu_member;

The iommu_fw_alloc_per_device_ids() helper allocates a correctly sized
per-driver struct and places a u32 ID list into the flex array
per_driver->ids[]. This removes the need for a fw_spec allocation.

Complex multi-iommu or complex ID list:
  static int driver_of_xlate(struct iommu_device *iommu,
			 struct of_phandle_args *args, void *priv);

  per_driver = kzalloc(...);
  iommu_of_xlate(pinf, &ops, num_cells, &driver_of_xlate, per_driver);

  dev_iommu_priv_set(dev, per_driver);
  return &per_driver->iommu; // The first iommu_device parsed

The driver will process the given (iommu, args) tuples and store them into
the per_driver struct in its own way (eg DART encodes things into a 2D
array). Allocating the per_driver struct before running the parse cleans
all the tortured logic.

The VIOT and IORT ACPI parsers provide the same API. Drivers supporting
ACPI will call the ACPI helper (ie iommu_iort_get_single_iommu()) which
understands how to parse OF if no ACPI is present. Since the SMMU drivers
directly invoke the IORT parser they also get back the extra IORT ACPI
data in a "struct iort_params", which eliminates the need to create SW
properties or store SMMU only stuff in the fwspec.

The bulk of the series is converting each driver to this toolbox API. The
design of the toolbox functions are lightweight from a driver perspective
and generally replace existing steps that the drivers already had to do. A
significant amount of inefficient boiler plate is removed from all drivers
related to how the driver obtains the iommu_device pointers.

The implementation of the three flavours of FW parsers (OF/IORT/VIOT) are
all structured the same. The lower FW level provides
a *_iommu_for_each_id() function which parses and iterates over the actual
FW table and calls a function pointer for each (instance, id). The iommu
layer then uses this for_each primitive to do the IOMMU specific stuff and
create the above APIs.

To allow the iommu_of_get_single_iommu()/iommu_fw_alloc_per_device_ids()
split API the get_single will count all of the IDs and then alloc will
size the flex array. The iommu_probe_info has scratch space for some ids
which can then be memcpy'd without a reparse. Reparse is kept as a backup
so we can handle arbitary complexity without burdening the typical fast
path.

The removal of all FW parsing from _dma_configure leaves only the
initialization of the iommu_probe_info behind. My plan is to add a new bus
op 'init iommu probe info' that will do this step and then we can fully
execute probe outside the *_dma_configure() context.

The big win here is the extensive cleaning of the driver's probe
paths. There is alot of "creative" stuff there, this cleans almost all of
it away.

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_fwspec

Cc: Hector Martin <marcan@marcan.st>
Cc: André Draszik <andre.draszik@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (30):
  iommu/of: Make a of_iommu_for_each_id()
  ACPI: VIOT: Make a viot_iommu_for_each_id()
  ACPI: IORT: Make a iort_iommu_for_each_id()
  ACPI: IORT: Remove fwspec from the reserved region code
  iommu: Add iommu_probe_info
  iommu: Make iommu_ops_from_fwnode() return the iommu_device
  iommu/of: Call of_iommu_get_resv_regions() directly
  iommu/of: Add iommu_of_get_single_iommu()
  iommu/rockchip: Move to iommu_of_get_single_iommu()
  iommu/sprd: Move to iommu_of_get_single_iommu()
  iommu/sun50i: Move to iommu_of_get_single_iommu()
  iommu/of: Add iommu_of_xlate()
  iommu/dart: Move to iommu_of_xlate()
  iommu/exynos: Move to iommu_of_xlate()
  iommu/msm: Move to iommu_of_xlate()
  iommu/tegra: Route tegra_dev_iommu_get_stream_id() through an op
  iommu: Add iommu_fw_alloc_per_device_ids()
  iommu/tegra: Move to iommu_fw_alloc_per_device_ids()
  iommu/mtk: Move to iommu_fw_alloc_per_device_ids()
  iommu/ipmmu-vmsa: Move to iommu_fw_alloc_per_device_ids()
  iommu/mtk_v1: Move to iommu_fw_alloc_per_device_ids()
  iommu/qcom: Move to iommu_fw_alloc_per_device_ids()
  iommu/viot: Add iommu_viot_get_single_iommu()
  iommu/virtio: Move to iommu_fw_alloc_per_device_ids()
  iommu/iort: Add iommu_iort_get_single_iommu()
  iommu/arm-smmu-v3: Move to iommu_fw_alloc_per_device_ids()
  iommu/arm-smmu: Move to iommu_of_xlate()
  iommu: Call all drivers if there is no fwspec
  iommu: Check for EPROBE_DEFER using the new FW parsers
  iommu: Remove fwspec and related

 drivers/acpi/arm64/iort.c                   | 233 +++++++---------
 drivers/acpi/scan.c                         |  58 +---
 drivers/acpi/viot.c                         |  67 ++---
 drivers/iommu/Makefile                      |   2 +
 drivers/iommu/apple-dart.c                  |  62 +++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  73 ++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   4 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |   6 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 209 +++++++-------
 drivers/iommu/arm/arm-smmu/arm-smmu.h       |  13 +-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     | 160 +++++------
 drivers/iommu/dma-iommu.c                   |  22 --
 drivers/iommu/dma-iommu.h                   |   6 -
 drivers/iommu/exynos-iommu.c                |  78 +++---
 drivers/iommu/iommu.c                       | 271 ++++++++++++------
 drivers/iommu/iort_iommu.c                  |  99 +++++++
 drivers/iommu/ipmmu-vmsa.c                  |  95 +++----
 drivers/iommu/msm_iommu.c                   |  92 +++----
 drivers/iommu/mtk_iommu.c                   | 115 ++++----
 drivers/iommu/mtk_iommu_v1.c                | 162 +++++------
 drivers/iommu/of_iommu.c                    | 288 ++++++++++++++------
 drivers/iommu/rockchip-iommu.c              |  73 ++---
 drivers/iommu/sprd-iommu.c                  |  31 +--
 drivers/iommu/sun50i-iommu.c                |  59 ++--
 drivers/iommu/tegra-smmu.c                  | 158 ++++-------
 drivers/iommu/viot_iommu.c                  |  71 +++++
 drivers/iommu/virtio-iommu.c                |  71 ++---
 include/acpi/acpi_bus.h                     |   3 -
 include/linux/acpi_iort.h                   |  24 +-
 include/linux/acpi_viot.h                   |  16 +-
 include/linux/iommu-driver.h                | 250 +++++++++++++++++
 include/linux/iommu.h                       | 101 +------
 include/linux/of_iommu.h                    |   8 -
 33 files changed, 1649 insertions(+), 1331 deletions(-)
 create mode 100644 drivers/iommu/iort_iommu.c
 create mode 100644 drivers/iommu/viot_iommu.c
 create mode 100644 include/linux/iommu-driver.h


base-commit: 68ec454bc1514f557686b5895dd9719e18d31705

Comments

Rafael J. Wysocki Nov. 30, 2023, 1:22 p.m. UTC | #1
On Thu, Nov 30, 2023 at 2:11 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> Similar to of_iommu_for_each_id() this parses the VIOT ACPI description
> and invokes a function over each entry in the table.
>
> Have viot_iommu_configure() use the new function to call
> viot_dev_iommu_init().
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/viot.c       | 54 +++++++++++++++++++++++----------------
>  include/linux/acpi_viot.h | 11 ++++++++
>  2 files changed, 43 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> index c8025921c129b2..7ab35ef05c84e0 100644
> --- a/drivers/acpi/viot.c
> +++ b/drivers/acpi/viot.c
> @@ -25,13 +25,6 @@
>  #include <linux/pci.h>
>  #include <linux/platform_device.h>
>
> -struct viot_iommu {
> -       /* Node offset within the table */
> -       unsigned int                    offset;
> -       struct fwnode_handle            *fwnode;
> -       struct list_head                list;
> -};
> -
>  struct viot_endpoint {
>         union {
>                 /* PCI range */
> @@ -304,10 +297,10 @@ void __init acpi_viot_init(void)
>         acpi_put_table(hdr);
>  }
>
> -static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
> -                              u32 epid)
> +static int viot_dev_iommu_init(struct viot_iommu *viommu, u32 epid, void *info)
>  {
>         const struct iommu_ops *ops;
> +       struct device *dev = info;
>
>         if (!viommu)
>                 return -ENODEV;
> @@ -324,11 +317,17 @@ static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
>         return acpi_iommu_fwspec_init(dev, epid, viommu->fwnode, ops);
>  }
>
> -static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
> +struct viot_pci_iommu_alias_info {
> +       struct device *dev;
> +       viot_for_each_fn fn;
> +       void *info;
> +};
> +
> +static int __for_each_pci_alias(struct pci_dev *pdev, u16 dev_id, void *data)
>  {
>         u32 epid;
>         struct viot_endpoint *ep;
> -       struct device *aliased_dev = data;
> +       struct viot_pci_iommu_alias_info *info = data;
>         u32 domain_nr = pci_domain_nr(pdev->bus);
>
>         list_for_each_entry(ep, &viot_pci_ranges, list) {
> @@ -339,14 +338,14 @@ static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
>                         epid = ((domain_nr - ep->segment_start) << 16) +
>                                 dev_id - ep->bdf_start + ep->endpoint_id;
>
> -                       return viot_dev_iommu_init(aliased_dev, ep->viommu,
> -                                                  epid);
> +                       return info->fn(ep->viommu, epid, info->info);
>                 }
>         }
>         return -ENODEV;
>  }
>
> -static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
> +static int __for_each_platform(struct platform_device *pdev,
> +                              viot_for_each_fn fn, void *info)
>  {
>         struct resource *mem;
>         struct viot_endpoint *ep;
> @@ -357,12 +356,28 @@ static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
>
>         list_for_each_entry(ep, &viot_mmio_endpoints, list) {
>                 if (ep->address == mem->start)
> -                       return viot_dev_iommu_init(&pdev->dev, ep->viommu,
> -                                                  ep->endpoint_id);
> +                       return fn(ep->viommu, ep->endpoint_id, info);
>         }
>         return -ENODEV;
>  }
>
> +int viot_iommu_for_each_id(struct device *dev, viot_for_each_fn fn, void *info)
> +{
> +       if (dev_is_pci(dev)) {
> +               struct viot_pci_iommu_alias_info pci_info = {
> +                       .dev = dev,
> +                       .fn = fn,
> +                       .info = info,
> +               };
> +               return pci_for_each_dma_alias(to_pci_dev(dev),
> +                                             __for_each_pci_alias, &pci_info);
> +       }
> +
> +       if (dev_is_platform(dev))
> +               return __for_each_platform(to_platform_device(dev), fn, info);
> +       return -ENODEV;
> +}
> +
>  /**
>   * viot_iommu_configure - Setup IOMMU ops for an endpoint described by VIOT
>   * @dev: the endpoint
> @@ -371,10 +386,5 @@ static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
>   */
>  int viot_iommu_configure(struct device *dev)
>  {
> -       if (dev_is_pci(dev))
> -               return pci_for_each_dma_alias(to_pci_dev(dev),
> -                                             viot_pci_dev_iommu_init, dev);
> -       else if (dev_is_platform(dev))
> -               return viot_mmio_dev_iommu_init(to_platform_device(dev));
> -       return -ENODEV;
> +       return viot_iommu_for_each_id(dev, viot_dev_iommu_init, dev);
>  }
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> index a5a12243156377..fce4eefcae4aad 100644
> --- a/include/linux/acpi_viot.h
> +++ b/include/linux/acpi_viot.h
> @@ -5,6 +5,17 @@
>
>  #include <linux/acpi.h>
>
> +struct viot_iommu {
> +       /* Node offset within the table */
> +       unsigned int                    offset;
> +       struct fwnode_handle            *fwnode;
> +       struct list_head                list;
> +};
> +
> +typedef int (*viot_for_each_fn)(struct viot_iommu *viommu, u32 epid,
> +                               void *info);
> +int viot_iommu_for_each_id(struct device *dev, viot_for_each_fn fn, void *info);
> +
>  #ifdef CONFIG_ACPI_VIOT
>  void __init acpi_viot_early_init(void);
>  void __init acpi_viot_init(void);
> --
> 2.42.0
>