mbox series

[RFC,00/13] Generic way of dealing with broken 64-bit buses

Message ID 20210226140305.26356-1-nsaenzjulienne@suse.de
Headers show
Series Generic way of dealing with broken 64-bit buses | expand

Message

Nicolas Saenz Julienne Feb. 26, 2021, 2:02 p.m. UTC
BCM2711, Raspberry Pi 4's arm64 system on chip, contains a PCIe bus that can't
handle 64-bit accesses to its MMIO address space. The issue has already been
discussed here[1], and it turns out BCM2711 isn't the only broken device in the
wild.

In most cases, the solution to this issue is to convert writeq/readq() to into
their lo_hi/hi_lo variants and the eventual introduction of some amount of
locking. But that's not good enough for every device. For example, on some
arm's SMMU configurations atomic 64-bit accesses are mandatory. This series
tries to introduce a mechanism for drivers to be able to ascertain whether or
not they are allowed to perform 64-bit accesses.

The big question is the amount of granularity needed to deal with this
(think here of distro images):

- Build-time: if a broken platform included in the image, disallow any 64-bit
  access. Drivers that need 64-bit accesses could simply bypass the check and
  hope for the best. Imposes a performance penalty on otherwise well behaving
  platforms, and features that depend on 64bit access might be disabled
  unnecessarily. It's simple to implement, yet not very generic/future proof.

- Run-time: allow/disallow 64-bit accesses based on boot time checks (i.e.
  check which platform the kernel is running on). Gets rid of all the negative
  aspects imposed to well-behaving platforms. Well-behaving buses can't coexist
  with broken ones while using all features.

- Per-device: each device has its MMIO access properties and can take decisions
  based on its local bus. That said, I'm not aware of a system that absolutely
  needs this ATM.

This series implements the third option mainly as a proof of concept.
It's my personal preference on how to deal with this. That said, my main
aim ATM is to settle on a general approach.

Regards,
Nicolas

[1] https://lore.kernel.org/linux-arm-kernel/c188698ca0de3ed6c56a0cf7880e1578aa753077.camel@suse.de/

---

Nicolas Saenz Julienne (13):
  dt-bindings: Introduce 64bit-mmio-broken
  driver core: Introduce MMIO configuration
  of: device: Introduce of_mmio_configure()
  driver core: plafrom: Introduce platform_mmio_configure()
  pci: Introduce pci_mmio_configure()
  device core: Introduce dev_64bit_mmio_supported()
  arm64: Mark ARCH_MVEBU as needing broken 64bit MMIO support
  arm64: dts: marvell: armada-ap80x: Mark config-space bus as
    64bit-mmio-broken
  iommu/arm-smmu: Make use of dev_64bit_mmio_supported()
  iommu/arm-smmu-impl: Get rid of Marvell's implementation details
  arm64: Mark ARCH_BCM2835 as needing broken 64bit MMIO support
  ARM: dts: bcm2711: Mark PCIe bus as 64bit-mmio-broken
  scsi: megaraid: Make use of dev_64bit_mmio_supported()

 .../devicetree/bindings/common-properties.txt | 15 +++++++++++
 arch/Kconfig                                  |  8 ++++++
 arch/arm/boot/dts/bcm2711.dtsi                |  1 +
 arch/arm64/Kconfig.platforms                  |  2 ++
 arch/arm64/boot/dts/marvell/armada-ap80x.dtsi |  1 +
 drivers/base/dd.c                             |  6 +++++
 drivers/base/platform.c                       |  9 +++++++
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c    | 21 ---------------
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |  9 +++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h         |  9 +++++--
 drivers/of/device.c                           | 19 ++++++++++++++
 drivers/pci/pci-driver.c                      | 26 +++++++++++++++++++
 drivers/scsi/megaraid/megaraid_sas_fusion.c   | 23 ++++++++--------
 include/linux/device.h                        | 20 ++++++++++++++
 include/linux/device/bus.h                    |  3 +++
 include/linux/of_device.h                     |  8 ++++++
 16 files changed, 145 insertions(+), 35 deletions(-)

Comments

Arnd Bergmann Feb. 26, 2021, 6:06 p.m. UTC | #1
On Fri, Feb 26, 2021 at 3:30 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Fri, Feb 26, 2021 at 3:03 PM Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
>
> >         unsigned long flags;
> > -       spin_lock_irqsave(&instance->hba_lock, flags);
> > -       writel(le32_to_cpu(req_desc->u.low),
> > -               &instance->reg_set->inbound_low_queue_port);
> > -       writel(le32_to_cpu(req_desc->u.high),
> > -               &instance->reg_set->inbound_high_queue_port);
> > -       spin_unlock_irqrestore(&instance->hba_lock, flags);
>
> > +
> > +       if (dev_64bit_mmio_supported(&instance->pdev->dev)) {
> > +               writeq(req_data, &instance->reg_set->inbound_low_queue_port);
> > +       } else {
> > +               spin_lock_irqsave(&instance->hba_lock, flags);
> > +               lo_hi_writeq(req_data, &instance->reg_set->inbound_low_queue_port);
> > +               spin_unlock_irqrestore(&instance->hba_lock, flags);
> > +       }
>
> I see your patch changes the code to the lo_hi_writeq() accessor,
> and it also fixes the endianness bug (double byteswap on big-endian),
> but it does not fix the spinlock bug (writel on pci leaks out of the lock
> unless it's followed by a read).

On second look, it seems your patch breaks the byteorder logic,
rather than fixing it. It would seem better to leave it unchanged
then, or to send a separate rework of the endianness conversion if
you think it is wrong.

       Arnd
Arnd Bergmann March 2, 2021, 9:32 a.m. UTC | #2
On Fri, Feb 26, 2021 at 3:03 PM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:

>         if (smmu->impl && unlikely(smmu->impl->write_reg))
>                 smmu->impl->write_reg(smmu, page, offset, val);
> -       else
> +       else if (dev_64bit_mmio_supported(smmu->dev))
>                 writel_relaxed(val, arm_smmu_page(smmu, page) + offset);
> +       else
> +               hi_lo_writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);
>  }

This is a writel_relaxed(), not a writeq_relaxed(), so I suppose you don't
have to change it at all.

> +       else if (dev_64bit_mmio_supported(smmu->dev))
> +               return readq_relaxed(arm_smmu_page(smmu, page) + offset);
> +       else
> +               return hi_lo_readq_relaxed(arm_smmu_page(smmu, page) + offset);
> }


I see this pattern repeat across multiple drivers. I think Christoph
had originally
suggested folding the if/else logic into the writel_relaxed() that is defined in
include/linux/io-64-nonatomic-hi-lo.h, but of course that doesn't work if you
need to pass a device pointer.

It might still make sense to have another wrapper in that same file though,
something like

static inline hi_lo_writeq_relaxed_if_possible(struct device *dev, __u64 val,
                    volatile void __iomem *addr)
{
       if (dev_64bit_mmio_supported(smmu->dev)) {
              readq_relaxed(arm_smmu_page(smmu, page) + offset);
       } else {
               writel_relaxed(val >> 32, addr + 4);
               writel_relaxed(val, addr);
       }
}

         Arnd
Robin Murphy March 2, 2021, 11:07 a.m. UTC | #3
On 2021-02-26 14:03, Nicolas Saenz Julienne wrote:
> Some arm SMMU implementations might sit on a bus that doesn't support
> 64bit memory accesses. In that case default to using hi_lo_{readq,
> writeq}() and BUG if such platform tries to use AArch64 formats as they
> rely on writeq()'s atomicity.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++++++++
>   drivers/iommu/arm/arm-smmu/arm-smmu.h | 9 +++++++--
>   2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index d8c6bfde6a61..239ff42b20c3 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1889,6 +1889,15 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
>   			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
>   	}
>   
> +	/*
> +	 * 64bit accesses not possible through the interconnect, AArch64
> +	 * formats depend on it.
> +	 */
> +	BUG_ON(!dev_64bit_mmio_supported(smmu->dev) &&
> +	       smmu->features & (ARM_SMMU_FEAT_FMT_AARCH64_4K |
> +				ARM_SMMU_FEAT_FMT_AARCH64_16K |
> +				ARM_SMMU_FEAT_FMT_AARCH64_64K));

No. Crashing the kernel in a probe routine which is free to fail is 
unacceptable either way, but guaranteeing failure in the case that the 
workaround *would* be required is doubly so.

Basically, this logic is backwards - if you really wanted to handle it 
generically, this would be the point at which you'd need to actively 
suppress all the detected hardware features which depend on 64-bit 
atomicity, not complain about them.

> +
>   	if (smmu->impl && smmu->impl->cfg_probe) {
>   		ret = smmu->impl->cfg_probe(smmu);
>   		if (ret)
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index d2a2d1bc58ba..997d13a21717 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -477,15 +477,20 @@ static inline void arm_smmu_writel(struct arm_smmu_device *smmu, int page,
>   {
>   	if (smmu->impl && unlikely(smmu->impl->write_reg))
>   		smmu->impl->write_reg(smmu, page, offset, val);
> -	else
> +	else if (dev_64bit_mmio_supported(smmu->dev))
>   		writel_relaxed(val, arm_smmu_page(smmu, page) + offset);
> +	else
> +		hi_lo_writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);

As Arnd pointed out, this is in completely the wrong place. Also, in 
general it doesn't work if the implementation already needs a hook to 
filter or override register accesses for any other reason. TBH I'm not 
convinced that this isn't *more* of a mess than handling it on a 
SoC-specific basis...

Robin.

>   }
>   
>   static inline u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset)
>   {
>   	if (smmu->impl && unlikely(smmu->impl->read_reg64))
>   		return smmu->impl->read_reg64(smmu, page, offset);
> -	return readq_relaxed(arm_smmu_page(smmu, page) + offset);
> +	else if (dev_64bit_mmio_supported(smmu->dev))
> +		return readq_relaxed(arm_smmu_page(smmu, page) + offset);
> +	else
> +		return hi_lo_readq_relaxed(arm_smmu_page(smmu, page) + offset);
>   }
>   
>   static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
>
Robin Murphy March 2, 2021, 11:29 a.m. UTC | #4
On 2021-02-26 14:02, Nicolas Saenz Julienne wrote:
> Some devices might inadvertently sit on buses that don't support 64bit
> MMIO access, and need a mechanism to query these limitations without
> prejudice to other buses in the system (i.e. defaulting to 32bit access
> system wide isn't an option).
> 
> Introduce a new bus callback, 'mmio_configure(),' which will take care
> of populating the relevant device properties based on the bus'
> limitations.

Devil's advocate: there already exist workarounds for 8-bit and/or 
16-bit accesses not working in various places, does it make sense for a 
64-bit workaround to be so wildly different and disjoint?

> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>   arch/Kconfig               | 8 ++++++++
>   drivers/base/dd.c          | 6 ++++++
>   include/linux/device.h     | 3 +++
>   include/linux/device/bus.h | 3 +++
>   4 files changed, 20 insertions(+)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 2bb30673d8e6..ba7f246b6b9d 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1191,6 +1191,14 @@ config ARCH_SPLIT_ARG64
>   config ARCH_HAS_ELFCORE_COMPAT
>   	bool
>   
> +config ARCH_HAS_64BIT_MMIO_BROKEN
> +	bool
> +	depends on 64BIT

As mentioned previously, 32-bit systems may not need the overrides for 
kernel I/O accessors, but they could still need the same workarounds for 
the memory-mapping implications (if this is to be a proper generic 
mechanism).

> +	default n

Tip: it is always redundant to state that.

Robin.

> +	help
> +	   Arch might contain busses unable to perform 64bit mmio accessses on
> +	   an otherwise 64bit system.
> +
>   source "kernel/gcov/Kconfig"
>   
>   source "scripts/gcc-plugins/Kconfig"
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9179825ff646..8086ce8f17a6 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -538,6 +538,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>   			goto probe_failed;
>   	}
>   
> +	if (dev->bus->mmio_configure) {
> +		ret = dev->bus->mmio_configure(dev);
> +		if (ret)
> +			goto probe_failed;
> +	}
> +
>   	if (driver_sysfs_add(dev)) {
>   		pr_err("%s: driver_sysfs_add(%s) failed\n",
>   		       __func__, dev_name(dev));
> diff --git a/include/linux/device.h b/include/linux/device.h
> index ba660731bd25..bd94aa0cbd72 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -553,6 +553,9 @@ struct device {
>   #ifdef CONFIG_DMA_OPS_BYPASS
>   	bool			dma_ops_bypass : 1;
>   #endif
> +#if defined(CONFIG_ARCH_HAS_64BIT_MMIO_BROKEN)
> +	bool			mmio_64bit_broken:1;
> +#endif
>   };
>   
>   /**
> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> index 1ea5e1d1545b..680dfc3b4744 100644
> --- a/include/linux/device/bus.h
> +++ b/include/linux/device/bus.h
> @@ -59,6 +59,8 @@ struct fwnode_handle;
>    *		bus supports.
>    * @dma_configure:	Called to setup DMA configuration on a device on
>    *			this bus.
> + * @mmio_configure:	Called to setup MMIO configuration on a device on
> + *			this bus.
>    * @pm:		Power management operations of this bus, callback the specific
>    *		device driver's pm-ops.
>    * @iommu_ops:  IOMMU specific operations for this bus, used to attach IOMMU
> @@ -103,6 +105,7 @@ struct bus_type {
>   	int (*num_vf)(struct device *dev);
>   
>   	int (*dma_configure)(struct device *dev);
> +	int (*mmio_configure)(struct device *dev);
>   
>   	const struct dev_pm_ops *pm;
>   
>
Nicolas Saenz Julienne March 2, 2021, 12:42 p.m. UTC | #5
Hi Arnd, thanks for the reviews!

On Tue, 2021-03-02 at 10:32 +0100, Arnd Bergmann wrote:
> On Fri, Feb 26, 2021 at 3:03 PM Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
> 
> >         if (smmu->impl && unlikely(smmu->impl->write_reg))
> >                 smmu->impl->write_reg(smmu, page, offset, val);
> > -       else
> > +       else if (dev_64bit_mmio_supported(smmu->dev))
> >                 writel_relaxed(val, arm_smmu_page(smmu, page) + offset);
> > +       else
> > +               hi_lo_writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);
> >  }
> 
> This is a writel_relaxed(), not a writeq_relaxed(), so I suppose you don't
> have to change it at all.

Yes, that was silly of me. I was worrying about the semantics of the whole
thing, and missed basic stuff like this.

> > +       else if (dev_64bit_mmio_supported(smmu->dev))
> > +               return readq_relaxed(arm_smmu_page(smmu, page) + offset);
> > +       else
> > +               return hi_lo_readq_relaxed(arm_smmu_page(smmu, page) + offset);
> > }
> 
> 
> I see this pattern repeat across multiple drivers. I think Christoph
> had originally
> suggested folding the if/else logic into the writel_relaxed() that is defined in
> include/linux/io-64-nonatomic-hi-lo.h, but of course that doesn't work if you
> need to pass a device pointer.
> 
> It might still make sense to have another wrapper in that same file though,
> something like
> 
> static inline hi_lo_writeq_relaxed_if_possible(struct device *dev, __u64 val,
>                     volatile void __iomem *addr)
> {
>        if (dev_64bit_mmio_supported(smmu->dev)) {
>               readq_relaxed(arm_smmu_page(smmu, page) + offset);
>        } else {
>                writel_relaxed(val >> 32, addr + 4);
>                writel_relaxed(val, addr);
>        }
> }

I like the idea. I'll try to integrate it into the next revision.

Regards,
Nicolas
Nicolas Saenz Julienne March 2, 2021, 1:38 p.m. UTC | #6
Hi Robin, thanks for taking the time to look at this.

On Tue, 2021-03-02 at 11:07 +0000, Robin Murphy wrote:
> On 2021-02-26 14:03, Nicolas Saenz Julienne wrote:
> > Some arm SMMU implementations might sit on a bus that doesn't support
> > 64bit memory accesses. In that case default to using hi_lo_{readq,
> > writeq}() and BUG if such platform tries to use AArch64 formats as they
> > rely on writeq()'s atomicity.
> > 
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> > ---
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++++++++
> >   drivers/iommu/arm/arm-smmu/arm-smmu.h | 9 +++++++--
> >   2 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index d8c6bfde6a61..239ff42b20c3 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -1889,6 +1889,15 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
> >   			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
> >   	}
> >   
> > 
> > +	/*
> > +	 * 64bit accesses not possible through the interconnect, AArch64
> > +	 * formats depend on it.
> > +	 */
> > +	BUG_ON(!dev_64bit_mmio_supported(smmu->dev) &&
> > +	       smmu->features & (ARM_SMMU_FEAT_FMT_AARCH64_4K |
> > +				ARM_SMMU_FEAT_FMT_AARCH64_16K |
> > +				ARM_SMMU_FEAT_FMT_AARCH64_64K));
> 
> No. Crashing the kernel in a probe routine which is free to fail is 
> unacceptable either way, but guaranteeing failure in the case that the 
> workaround *would* be required is doubly so.
> 
> Basically, this logic is backwards - if you really wanted to handle it 
> generically, this would be the point at which you'd need to actively 
> suppress all the detected hardware features which depend on 64-bit 
> atomicity, not complain about them.

Understood.

> > +
> >   	if (smmu->impl && smmu->impl->cfg_probe) {
> >   		ret = smmu->impl->cfg_probe(smmu);
> >   		if (ret)
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > index d2a2d1bc58ba..997d13a21717 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > @@ -477,15 +477,20 @@ static inline void arm_smmu_writel(struct arm_smmu_device *smmu, int page,
> >   {
> >   	if (smmu->impl && unlikely(smmu->impl->write_reg))
> >   		smmu->impl->write_reg(smmu, page, offset, val);
> > -	else
> > +	else if (dev_64bit_mmio_supported(smmu->dev))
> >   		writel_relaxed(val, arm_smmu_page(smmu, page) + offset);
> > +	else
> > +		hi_lo_writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);
> 
> As Arnd pointed out, this is in completely the wrong place. Also, in 

Yes, sorry for that, not too proud of it.

> general it doesn't work if the implementation already needs a hook to 
> filter or override register accesses for any other reason. TBH I'm not 

I'm not sure I get your point here, 'smmu->impl' has precedence over the MMIO
capability check. Custom implementations would still get their callbacks.

> convinced that this isn't *more* of a mess than handling it on a 
> SoC-specific basis...

I see your point.

Just to explain why I went to these lengths: my understanding is that the
specifics of how to perform 32bit accesses to SMMU's 64bit registers is defined
in spec. So it made sense to move it into the non implementation dependent side
of the driver.

All in all, I'll think of something simpler.

Regards,
Nicolas
Nicolas Saenz Julienne March 2, 2021, 2:09 p.m. UTC | #7
Hi Robin,

On Tue, 2021-03-02 at 11:29 +0000, Robin Murphy wrote:
> On 2021-02-26 14:02, Nicolas Saenz Julienne wrote:
> > Some devices might inadvertently sit on buses that don't support 64bit
> > MMIO access, and need a mechanism to query these limitations without
> > prejudice to other buses in the system (i.e. defaulting to 32bit access
> > system wide isn't an option).
> > 
> > Introduce a new bus callback, 'mmio_configure(),' which will take care
> > of populating the relevant device properties based on the bus'
> > limitations.
> 
> Devil's advocate: there already exist workarounds for 8-bit and/or 
> 16-bit accesses not working in various places, does it make sense for a 
> 64-bit workaround to be so wildly different and disjoint?

Can you point out an example of the workarounds?

> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> > ---
> >   arch/Kconfig               | 8 ++++++++
> >   drivers/base/dd.c          | 6 ++++++
> >   include/linux/device.h     | 3 +++
> >   include/linux/device/bus.h | 3 +++
> >   4 files changed, 20 insertions(+)
> > 
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 2bb30673d8e6..ba7f246b6b9d 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1191,6 +1191,14 @@ config ARCH_SPLIT_ARG64
> >   config ARCH_HAS_ELFCORE_COMPAT
> >   	bool
> >   
> > 
> > +config ARCH_HAS_64BIT_MMIO_BROKEN
> > +	bool
> > +	depends on 64BIT
> 
> As mentioned previously, 32-bit systems may not need the overrides for 
> kernel I/O accessors, but they could still need the same workarounds for 
> the memory-mapping implications (if this is to be a proper generic 
> mechanism).

I'll keep it in mind.

> > +	default n
>
> Tip: it is always redundant to state that.

Noted!

Regards,
Nicolas
Arnd Bergmann March 3, 2021, 8:44 a.m. UTC | #8
On Tue, Mar 2, 2021 at 12:07 PM Robin Murphy <robin.murphy@arm.com> wrote:
> On 2021-02-26 14:03, Nicolas Saenz Julienne wrote:


> > index d2a2d1bc58ba..997d13a21717 100644

> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h

> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h

> > @@ -477,15 +477,20 @@ static inline void arm_smmu_writel(struct arm_smmu_device *smmu, int page,

> >   {

> >       if (smmu->impl && unlikely(smmu->impl->write_reg))

> >               smmu->impl->write_reg(smmu, page, offset, val);

> > -     else

> > +     else if (dev_64bit_mmio_supported(smmu->dev))

> >               writel_relaxed(val, arm_smmu_page(smmu, page) + offset);

> > +     else

> > +             hi_lo_writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);

>

> As Arnd pointed out, this is in completely the wrong place. Also, in

> general it doesn't work if the implementation already needs a hook to

> filter or override register accesses for any other reason. TBH I'm not

> convinced that this isn't *more* of a mess than handling it on a

> SoC-specific basis...


I think the main problem for handling it in a SoC specific way is that there is
no device-independent way to do a 64-bit store as two 32-bit stores:

- some devices need hi_lo_writeq_relaxed(), others need lo_hi_writeq_relaxed(),
  and some absolutely require 64-bit stores and cannot work at all behind a
  broken PCI bus.

- if the driver requires the store to be atomic, it needs to use a spinlock
  around the two writel(), but if the register is on a PCI bus or mapped
  with page attributes that allow posted writes (like arm64 ioremap), then
  you may need to read back the register before spin_unlock to serialize
  them properly. However, reading back an mmio register is slow and can
  have side-effects, so you can't put that in driver-independent code either.

        Arnd