
[v4,3/3] iommu: dart: Add DART iommu driver

Message ID 20210627143405.77298-4-sven@svenpeter.dev
State New
Series Apple M1 DART IOMMU driver

Commit Message

Sven Peter June 27, 2021, 2:34 p.m. UTC
Apple's new SoCs use iommus for almost all peripherals. These Device
Address Resolution Tables must be setup before these peripherals can
act as DMA masters.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 MAINTAINERS                      |    1 +
 drivers/iommu/Kconfig            |   15 +
 drivers/iommu/Makefile           |    1 +
 drivers/iommu/apple-dart-iommu.c | 1058 ++++++++++++++++++++++++++++++
 4 files changed, 1075 insertions(+)
 create mode 100644 drivers/iommu/apple-dart-iommu.c

Comments

Alyssa Rosenzweig June 30, 2021, 1:49 p.m. UTC | #1
Looks really good! Just a few minor comments. With them addressed,

	Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>


> +	  Say Y here if you are using an Apple SoC with a DART IOMMU.


Nit: Do we need to spell out "with a DART IOMMU"? Don't all the apple
socs need DART?

> +/*
> + * This structure is used to identify a single stream attached to a domain.
> + * It's used as a list inside that domain to be able to attach multiple
> + * streams to a single domain. Since multiple devices can use a single stream
> + * it additionally keeps track of how many devices are represented by this
> + * stream. Once that number reaches zero it is detached from the IOMMU domain
> + * and all translations from this stream are disabled.
> + *
> + * @dart: DART instance to which this stream belongs
> + * @sid: stream id within the DART instance
> + * @num_devices: count of devices attached to this stream
> + * @stream_head: list head for the next stream
> + */
> +struct apple_dart_stream {
> +	struct apple_dart *dart;
> +	u32 sid;
> +
> +	u32 num_devices;
> +
> +	struct list_head stream_head;
> +};


It wasn't obvious to me why we can get away without reference counting.
Looking ahead it looks like we assert locks in each case. Maybe add
that to the comment?

```
> +static void apple_dart_hw_set_ttbr(struct apple_dart *dart, u16 sid, u16 idx,
> +				   phys_addr_t paddr)
> +{
> +	writel(DART_TTBR_VALID | (paddr >> DART_TTBR_SHIFT),
> +	       dart->regs + DART_TTBR(sid, idx));
> +}
```

Should we be checking alignment here? Something like

    BUG_ON(paddr & ((1 << DART_TTBR_SHIFT) - 1));
Sven Peter July 12, 2021, 11:02 a.m. UTC | #2
Hi,


On Wed, Jun 30, 2021, at 15:49, Alyssa Rosenzweig wrote:
> Looks really good! Just a few minor comments. With them addressed,
>
> 	Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>


Thanks!

>
> > +	  Say Y here if you are using an Apple SoC with a DART IOMMU.
>
> Nit: Do we need to spell out "with a DART IOMMU"? Don't all the apple
> socs need DART?


Good point, I'll remove it.

>
> > +/*
> > + * This structure is used to identify a single stream attached to a domain.
> > + * It's used as a list inside that domain to be able to attach multiple
> > + * streams to a single domain. Since multiple devices can use a single stream
> > + * it additionally keeps track of how many devices are represented by this
> > + * stream. Once that number reaches zero it is detached from the IOMMU domain
> > + * and all translations from this stream are disabled.
> > + *
> > + * @dart: DART instance to which this stream belongs
> > + * @sid: stream id within the DART instance
> > + * @num_devices: count of devices attached to this stream
> > + * @stream_head: list head for the next stream
> > + */
> > +struct apple_dart_stream {
> > +	struct apple_dart *dart;
> > +	u32 sid;
> > +
> > +	u32 num_devices;
> > +
> > +	struct list_head stream_head;
> > +};
>
> It wasn't obvious to me why we can get away without reference counting.
> Looking ahead it looks like we assert locks in each case. Maybe add
> that to the comment?


Sure, I'll add that to the comment.

>
> ```
> > +static void apple_dart_hw_set_ttbr(struct apple_dart *dart, u16 sid, u16 idx,
> > +				   phys_addr_t paddr)
> > +{
> > +	writel(DART_TTBR_VALID | (paddr >> DART_TTBR_SHIFT),
> > +	       dart->regs + DART_TTBR(sid, idx));
> > +}
> ```
>
> Should we be checking alignment here? Something like
>
>     BUG_ON(paddr & ((1 << DART_TTBR_SHIFT) - 1));
>


Sure, right now paddr will always be aligned but adding that
BUG_ON doesn't hurt :)



Best,

Sven
Alyssa Rosenzweig July 12, 2021, 1:53 p.m. UTC | #3
> > Should we be checking alignment here? Something like
> >
> >     BUG_ON(paddr & ((1 << DART_TTBR_SHIFT) - 1));
> >
>
> Sure, right now paddr will always be aligned but adding that
> BUG_ON doesn't hurt :)


Probably should have suggested WARN_ON instead of BUG_ON but yes.
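
A minimal userspace sketch of the check discussed above, assuming the
DART_TTBR_SHIFT and DART_TTBR_VALID values from the patch. The helper
dart_ttbr_encode() and its error-return convention are hypothetical; in
the driver the same condition would sit inside WARN_ON() rather than a
plain if.

```c
#include <errno.h>
#include <stdint.h>

/* Values copied from the patch under review. */
#define DART_TTBR_SHIFT 12
#define DART_TTBR_VALID (UINT32_C(1) << 31)

/*
 * Hypothetical helper: compute the value that would be written to a TTBR
 * register. A misaligned table address is rejected instead of having its
 * low bits silently dropped by the shift.
 */
static int dart_ttbr_encode(uint64_t paddr, uint32_t *val)
{
	const uint64_t align_mask = (UINT64_C(1) << DART_TTBR_SHIFT) - 1;

	if (paddr & align_mask)
		return -EINVAL; /* the driver would WARN_ON() here */

	*val = DART_TTBR_VALID | (uint32_t)(paddr >> DART_TTBR_SHIFT);
	return 0;
}
```

Returning an error keeps the system running on a bad address, which is
the practical difference between the WARN_ON and BUG_ON variants.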
Robin Murphy July 13, 2021, 11:23 p.m. UTC | #4
^^ Nit: the subsystem style for the subject format should be 
"iommu/dart: Add..." - similarly on patch #1, which I just realised I 
missed (sorry!)

On 2021-06-27 15:34, Sven Peter wrote:
> Apple's new SoCs use iommus for almost all peripherals. These Device

> Address Resolution Tables must be setup before these peripherals can

> act as DMA masters.

> 

> Signed-off-by: Sven Peter <sven@svenpeter.dev>

> ---

>   MAINTAINERS                      |    1 +

>   drivers/iommu/Kconfig            |   15 +

>   drivers/iommu/Makefile           |    1 +

>   drivers/iommu/apple-dart-iommu.c | 1058 ++++++++++++++++++++++++++++++


I'd be inclined to drop "-iommu" from the filename, unless there's some 
other "apple-dart" functionality that might lead to a module name clash 
in future?

>   4 files changed, 1075 insertions(+)

>   create mode 100644 drivers/iommu/apple-dart-iommu.c

> 

> diff --git a/MAINTAINERS b/MAINTAINERS

> index 29e5541c8f21..c1ffaa56b5f9 100644

> --- a/MAINTAINERS

> +++ b/MAINTAINERS

> @@ -1245,6 +1245,7 @@ M:	Sven Peter <sven@svenpeter.dev>

>   L:	iommu@lists.linux-foundation.org

>   S:	Maintained

>   F:	Documentation/devicetree/bindings/iommu/apple,dart.yaml

> +F:	drivers/iommu/apple-dart-iommu.c

>   

>   APPLE SMC DRIVER

>   M:	Henrik Rydberg <rydberg@bitmath.org>

> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig

> index 1f111b399bca..87882c628b46 100644

> --- a/drivers/iommu/Kconfig

> +++ b/drivers/iommu/Kconfig

> @@ -249,6 +249,21 @@ config SPAPR_TCE_IOMMU

>   	  Enables bits of IOMMU API required by VFIO. The iommu_ops

>   	  is not implemented as it is not necessary for VFIO.

>   

> +config IOMMU_APPLE_DART

> +	tristate "Apple DART IOMMU Support"

> +	depends on ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)

> +	select IOMMU_API

> +	select IOMMU_IO_PGTABLE


This is redundant - the individual formats already select it.

> +	select IOMMU_IO_PGTABLE_LPAE

> +	default ARCH_APPLE

> +	help

> +	  Support for Apple DART (Device Address Resolution Table) IOMMUs

> +	  found in Apple ARM SoCs like the M1.

> +	  This IOMMU is required for most peripherals using DMA to access

> +	  the main memory.

> +

> +	  Say Y here if you are using an Apple SoC with a DART IOMMU.

> +

>   # ARM IOMMU support

>   config ARM_SMMU

>   	tristate "ARM Ltd. System MMU (SMMU) Support"

> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile

> index c0fb0ba88143..8c813f0ebc54 100644

> --- a/drivers/iommu/Makefile

> +++ b/drivers/iommu/Makefile

> @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o

>   obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o

>   obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o

>   obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o

> +obj-$(CONFIG_IOMMU_APPLE_DART) += apple-dart-iommu.o

> diff --git a/drivers/iommu/apple-dart-iommu.c b/drivers/iommu/apple-dart-iommu.c

> new file mode 100644

> index 000000000000..637ba6e7cef9

> --- /dev/null

> +++ b/drivers/iommu/apple-dart-iommu.c

> @@ -0,0 +1,1058 @@

> +// SPDX-License-Identifier: GPL-2.0-only

> +/*

> + * Apple DART (Device Address Resolution Table) IOMMU driver

> + *

> + * Copyright (C) 2021 The Asahi Linux Contributors

> + *

> + * Based on arm/arm-smmu/arm-ssmu.c and arm/arm-smmu-v3/arm-smmu-v3.c

> + *  Copyright (C) 2013 ARM Limited

> + *  Copyright (C) 2015 ARM Limited

> + * and on exynos-iommu.c

> + *  Copyright (c) 2011,2016 Samsung Electronics Co., Ltd.

> + */

> +

> +#include <linux/bitfield.h>

> +#include <linux/clk.h>

> +#include <linux/dma-direct.h>


Oh, right, bus_dma_region. Fair enough :)

> +#include <linux/dma-iommu.h>

> +#include <linux/dma-mapping.h>

> +#include <linux/err.h>

> +#include <linux/interrupt.h>

> +#include <linux/io-pgtable.h>

> +#include <linux/iopoll.h>

> +#include <linux/list.h>

> +#include <linux/lockdep.h>

> +#include <linux/module.h>

> +#include <linux/of.h>

> +#include <linux/of_address.h>

> +#include <linux/of_iommu.h>

> +#include <linux/of_platform.h>

> +#include <linux/pci.h>

> +#include <linux/platform_device.h>

> +#include <linux/ratelimit.h>

> +#include <linux/slab.h>

> +#include <linux/pci.h>


Redundant duplicate

> +#define DART_MAX_STREAMS 16

> +#define DART_MAX_TTBR 4

> +

> +#define DART_STREAM_ALL 0xffff

> +

> +#define DART_PARAMS1 0x00

> +#define DART_PARAMS_PAGE_SHIFT GENMASK(27, 24)

> +

> +#define DART_PARAMS2 0x04

> +#define DART_PARAMS_BYPASS_SUPPORT BIT(0)

> +

> +#define DART_STREAM_COMMAND 0x20

> +#define DART_STREAM_COMMAND_BUSY BIT(2)

> +#define DART_STREAM_COMMAND_INVALIDATE BIT(20)

> +

> +#define DART_STREAM_SELECT 0x34

> +

> +#define DART_ERROR 0x40

> +#define DART_ERROR_STREAM GENMASK(27, 24)

> +#define DART_ERROR_CODE GENMASK(23, 0)

> +#define DART_ERROR_FLAG BIT(31)

> +#define DART_ERROR_READ_FAULT BIT(4)

> +#define DART_ERROR_WRITE_FAULT BIT(3)

> +#define DART_ERROR_NO_PTE BIT(2)

> +#define DART_ERROR_NO_PMD BIT(1)

> +#define DART_ERROR_NO_TTBR BIT(0)

> +

> +#define DART_CONFIG 0x60

> +#define DART_CONFIG_LOCK BIT(15)

> +

> +#define DART_STREAM_COMMAND_BUSY_TIMEOUT 100

> +

> +#define DART_STREAM_REMAP 0x80

> +

> +#define DART_ERROR_ADDR_HI 0x54

> +#define DART_ERROR_ADDR_LO 0x50

> +

> +#define DART_TCR(sid) (0x100 + 4 * (sid))

> +#define DART_TCR_TRANSLATE_ENABLE BIT(7)

> +#define DART_TCR_BYPASS0_ENABLE BIT(8)

> +#define DART_TCR_BYPASS1_ENABLE BIT(12)

> +

> +#define DART_TTBR(sid, idx) (0x200 + 16 * (sid) + 4 * (idx))

> +#define DART_TTBR_VALID BIT(31)

> +#define DART_TTBR_SHIFT 12

> +

> +/*

> + * Private structure associated with each DART device.

> + *

> + * @dev: device struct

> + * @regs: mapped MMIO region

> + * @irq: interrupt number, can be shared with other DARTs

> + * @clks: clocks associated with this DART

> + * @num_clks: number of @clks

> + * @lock: lock for @used_sids and hardware operations involving this dart

> + * @used_sids: bitmap of streams attached to a domain

> + * @pgsize: pagesize supported by this DART

> + * @supports_bypass: indicates if this DART supports bypass mode

> + * @force_bypass: force bypass mode due to pagesize mismatch?

> + * @sw_bypass_cpu_start: offset into cpu address space in software bypass mode

> + * @sw_bypass_dma_start: offset into dma address space in software bypass mode

> + * @sw_bypass_len: length of iova space in software bypass mode

> + * @iommu: iommu core device

> + */

> +struct apple_dart {

> +	struct device *dev;

> +

> +	void __iomem *regs;

> +

> +	int irq;

> +	struct clk_bulk_data *clks;

> +	int num_clks;

> +

> +	spinlock_t lock;

> +

> +	u32 used_sids;

> +	u32 pgsize;

> +

> +	u32 supports_bypass : 1;

> +	u32 force_bypass : 1;

> +

> +	u64 sw_bypass_cpu_start;

> +	u64 sw_bypass_dma_start;

> +	u64 sw_bypass_len;

> +

> +	struct iommu_device iommu;

> +};

> +

> +/*

> + * This structure is used to identify a single stream attached to a domain.

> + * It's used as a list inside that domain to be able to attach multiple

> + * streams to a single domain. Since multiple devices can use a single stream

> + * it additionally keeps track of how many devices are represented by this

> + * stream. Once that number reaches zero it is detached from the IOMMU domain

> + * and all translations from this stream are disabled.


That sounds a lot like something you should be doing properly with groups.

> + * @dart: DART instance to which this stream belongs

> + * @sid: stream id within the DART instance

> + * @num_devices: count of devices attached to this stream

> + * @stream_head: list head for the next stream

> + */

> +struct apple_dart_stream {

> +	struct apple_dart *dart;

> +	u32 sid;


What are the actual SID values like? If they're large and sparse then 
maybe a list makes sense, but if they're small and dense then an array 
hanging off the apple_dart structure itself might be more efficient. 
Given DART_MAX_STREAMS, I'm thinking the latter, and considerably so.

The impression I'm getting so far is that this seems conceptually a bit 
like arm-smmu with stream indexing.
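
To make the array idea concrete, here is a minimal userspace sketch. It
assumes stream IDs are small and dense (0 to DART_MAX_STREAMS - 1), as
the constant in the patch suggests. The dart_sketch type and both
helpers are hypothetical, and locking is left out entirely.

```c
#include <errno.h>
#include <stdint.h>

#define DART_MAX_STREAMS 16 /* from the patch */

/* Hypothetical stand-in for the per-DART state; not the driver's struct. */
struct dart_sketch {
	/* number of devices attached per stream id, replacing the list */
	uint32_t num_devices[DART_MAX_STREAMS];
};

/* Returns 1 if this was the first device on the stream, 0 otherwise. */
static int dart_stream_get(struct dart_sketch *dart, uint32_t sid)
{
	if (sid >= DART_MAX_STREAMS)
		return -EINVAL;
	return dart->num_devices[sid]++ == 0;
}

/* Returns 1 if the last device went away and the stream can be disabled. */
static int dart_stream_put(struct dart_sketch *dart, uint32_t sid)
{
	if (sid >= DART_MAX_STREAMS || dart->num_devices[sid] == 0)
		return -EINVAL;
	return --dart->num_devices[sid] == 0;
}
```

Lookup becomes a bounds check plus an index, and nothing has to be
allocated on the attach path.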

> +	u32 num_devices;

> +

> +	struct list_head stream_head;

> +};

> +

> +/*

> + * This structure is attached to each iommu domain handled by a DART.

> + * A single domain is used to represent a single virtual address space.

> + * It is always allocated together with a page table.

> + *

> + * Streams are the smallest units the DART hardware can differentiate.

> + * These are pointed to the page table of a domain whenever a device is

> + * attached to it. A single stream can only be assigned to a single domain.

> + *

> + * Devices are assigned to at least a single and sometimes multiple individual

> + * streams (using the iommus property in the device tree). Multiple devices

> + * can theoretically be represented by the same stream, though this is usually

> + * not the case.

> + *

> + * We only keep track of streams here and just count how many devices are

> + * represented by each stream. When the last device is removed the whole stream

> + * is removed from the domain.

> + *

> + * @dart: pointer to the DART instance

> + * @pgtbl_ops: pagetable ops allocated by io-pgtable

> + * @type: domain type IOMMU_DOMAIN_IDENTITY_{IDENTITY,DMA,UNMANAGED,BLOCKED}

> + * @sw_bypass_cpu_start: offset into cpu address space in software bypass mode

> + * @sw_bypass_dma_start: offset into dma address space in software bypass mode

> + * @sw_bypass_len: length of iova space in software bypass mode

> + * @streams: list of streams attached to this domain

> + * @lock: spinlock for operations involving the list of streams

> + * @domain: core iommu domain pointer

> + */

> +struct apple_dart_domain {

> +	struct apple_dart *dart;

> +	struct io_pgtable_ops *pgtbl_ops;

> +

> +	unsigned int type;


Given that this is assigned from domain->type it appears to be redundant.

> +	u64 sw_bypass_cpu_start;

> +	u64 sw_bypass_dma_start;

> +	u64 sw_bypass_len;

> +

> +	struct list_head streams;


I'm starting to think this could just be a bitmap, in a u16 even.
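
A minimal sketch of the bitmap variant, assuming a domain only has to
remember which stream IDs of a single DART are attached to it. The
domain_sketch type and helper names are hypothetical. Since
apple_dart_hw_stream_command() in the patch already takes a stream
bitmap, a mask like this could presumably be handed to it directly for
a per-domain invalidate.

```c
#include <stdbool.h>
#include <stdint.h>

#define DART_MAX_STREAMS 16 /* from the patch */

/* Hypothetical stand-in for the per-domain stream tracking. */
struct domain_sketch {
	uint16_t sid_mask; /* bit n set: stream n is attached */
};

/* Mark a stream as attached; rejects ids that do not fit the mask. */
static bool domain_attach_sid(struct domain_sketch *dom, unsigned int sid)
{
	if (sid >= DART_MAX_STREAMS)
		return false;
	dom->sid_mask |= (uint16_t)(1u << sid);
	return true;
}

/* Clear a stream's bit; out-of-range ids are ignored. */
static void domain_detach_sid(struct domain_sketch *dom, unsigned int sid)
{
	if (sid < DART_MAX_STREAMS)
		dom->sid_mask &= (uint16_t)~(1u << sid);
}
```

Note that a plain bitmap drops the per-stream device count, so it only
works if that count lives elsewhere or is not needed.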

> +

> +	spinlock_t lock;

> +

> +	struct iommu_domain domain;

> +};

> +

> +/*

> + * This structure is attached to devices with dev_iommu_priv_set() on of_xlate

> + * and contains a list of streams bound to this device as defined in the

> + * device tree. Multiple DART instances can be attached to a single device

> + * and each stream is identified by its stream id.

> + * It's usually reference by a pointer called *cfg.

> + *

> + * A dynamic array instead of a linked list is used here since in almost

> + * all cases a device will just be attached to a single stream and streams

> + * are never removed after they have been added.

> + *

> + * @num_streams: number of streams attached

> + * @streams: array of structs to identify attached streams and the device link

> + *           to the iommu

> + */

> +struct apple_dart_master_cfg {

> +	int num_streams;

> +	struct {

> +		struct apple_dart *dart;

> +		u32 sid;


Can't you use the fwspec for this?

> +		struct device_link *link;


Is it necessary to use stateless links, or could you use 
DL_FLAG_AUTOREMOVE_SUPPLIER and not have to keep track of them manually?

> +	} streams[];

> +};

> +

> +static struct platform_driver apple_dart_driver;

> +static const struct iommu_ops apple_dart_iommu_ops;

> +static const struct iommu_flush_ops apple_dart_tlb_ops;

> +

> +static struct apple_dart_domain *to_dart_domain(struct iommu_domain *dom)

> +{

> +	return container_of(dom, struct apple_dart_domain, domain);

> +}

> +

> +static void apple_dart_hw_enable_translation(struct apple_dart *dart, u16 sid)

> +{

> +	writel(DART_TCR_TRANSLATE_ENABLE, dart->regs + DART_TCR(sid));

> +}

> +

> +static void apple_dart_hw_disable_dma(struct apple_dart *dart, u16 sid)

> +{

> +	writel(0, dart->regs + DART_TCR(sid));

> +}

> +

> +static void apple_dart_hw_enable_bypass(struct apple_dart *dart, u16 sid)

> +{

> +	WARN_ON(!dart->supports_bypass);

> +	writel(DART_TCR_BYPASS0_ENABLE | DART_TCR_BYPASS1_ENABLE,

> +	       dart->regs + DART_TCR(sid));

> +}

> +

> +static void apple_dart_hw_set_ttbr(struct apple_dart *dart, u16 sid, u16 idx,

> +				   phys_addr_t paddr)

> +{

> +	writel(DART_TTBR_VALID | (paddr >> DART_TTBR_SHIFT),

> +	       dart->regs + DART_TTBR(sid, idx));

> +}

> +

> +static void apple_dart_hw_clear_ttbr(struct apple_dart *dart, u16 sid, u16 idx)

> +{

> +	writel(0, dart->regs + DART_TTBR(sid, idx));

> +}

> +

> +static void apple_dart_hw_clear_all_ttbrs(struct apple_dart *dart, u16 sid)

> +{

> +	int i;

> +

> +	for (i = 0; i < 4; ++i)

> +		apple_dart_hw_clear_ttbr(dart, sid, i);

> +}

> +

> +static int apple_dart_hw_stream_command(struct apple_dart *dart, u16 sid_bitmap,

> +					u32 command)

> +{

> +	unsigned long flags;

> +	int ret;

> +	u32 command_reg;

> +

> +	spin_lock_irqsave(&dart->lock, flags);

> +

> +	writel(sid_bitmap, dart->regs + DART_STREAM_SELECT);

> +	writel(command, dart->regs + DART_STREAM_COMMAND);

> +

> +	ret = readl_poll_timeout_atomic(

> +		dart->regs + DART_STREAM_COMMAND, command_reg,

> +		!(command_reg & DART_STREAM_COMMAND_BUSY), 1,

> +		DART_STREAM_COMMAND_BUSY_TIMEOUT);

> +

> +	spin_unlock_irqrestore(&dart->lock, flags);

> +

> +	if (ret) {

> +		dev_err(dart->dev,

> +			"busy bit did not clear after command %x for streams %x\n",

> +			command, sid_bitmap);

> +		return ret;

> +	}

> +

> +	return 0;

> +}

> +

> +static int apple_dart_hw_invalidate_tlb_global(struct apple_dart *dart)

> +{

> +	return apple_dart_hw_stream_command(dart, DART_STREAM_ALL,

> +					    DART_STREAM_COMMAND_INVALIDATE);

> +}

> +

> +static int apple_dart_hw_invalidate_tlb_stream(struct apple_dart *dart, u16 sid)

> +{

> +	return apple_dart_hw_stream_command(dart, 1 << sid,

> +					    DART_STREAM_COMMAND_INVALIDATE);

> +}

> +

> +static int apple_dart_hw_reset(struct apple_dart *dart)

> +{

> +	int sid;

> +	u32 config;

> +

> +	config = readl(dart->regs + DART_CONFIG);

> +	if (config & DART_CONFIG_LOCK) {

> +		dev_err(dart->dev, "DART is locked down until reboot: %08x\n",

> +			config);

> +		return -EINVAL;

> +	}

> +

> +	for (sid = 0; sid < DART_MAX_STREAMS; ++sid) {

> +		apple_dart_hw_disable_dma(dart, sid);

> +		apple_dart_hw_clear_all_ttbrs(dart, sid);

> +	}

> +

> +	/* restore stream identity map */

> +	writel(0x03020100, dart->regs + DART_STREAM_REMAP);

> +	writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4);

> +	writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8);

> +	writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12);


Any hint of what the magic numbers mean?
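
One possible reading of these constants, offered as an assumption rather
than documented hardware behaviour: each 32-bit remap register holds
four one-byte entries, and entry n names the stream that stream n is
mapped to. Under that assumption the four writes are simply the
identity map, which the sketch below generates. The helper name is
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the identity value for remap register `reg`,
 * assuming four one-byte entries per register with the lowest stream in
 * the lowest byte.
 */
static uint32_t dart_remap_identity_word(unsigned int reg)
{
	uint32_t val = 0;
	unsigned int i;

	for (i = 0; i < 4; i++)
		val |= (uint32_t)(4 * reg + i) << (8 * i);

	return val;
}
```

If that reading is right, a loop over DART_MAX_STREAMS / 4 registers
would replace the four literal writes and document itself.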

> +	/* clear any pending errors before the interrupt is unmasked */

> +	writel(readl(dart->regs + DART_ERROR), dart->regs + DART_ERROR);

> +

> +	return apple_dart_hw_invalidate_tlb_global(dart);

> +}

> +

> +static void apple_dart_domain_flush_tlb(struct apple_dart_domain *domain)

> +{

> +	unsigned long flags;

> +	struct apple_dart_stream *stream;

> +	struct apple_dart *dart = domain->dart;

> +

> +	if (!dart)

> +		return;


Can that happen? Feels like it's probably a bug elsewhere if it could :/

> +	spin_lock_irqsave(&domain->lock, flags);

> +	list_for_each_entry(stream, &domain->streams, stream_head) {

> +		apple_dart_hw_invalidate_tlb_stream(stream->dart, stream->sid);

> +	}

> +	spin_unlock_irqrestore(&domain->lock, flags);

> +}

> +

> +static void apple_dart_flush_iotlb_all(struct iommu_domain *domain)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +

> +	apple_dart_domain_flush_tlb(dart_domain);

> +}

> +

> +static void apple_dart_iotlb_sync(struct iommu_domain *domain,

> +				  struct iommu_iotlb_gather *gather)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +

> +	apple_dart_domain_flush_tlb(dart_domain);

> +}

> +

> +static void apple_dart_iotlb_sync_map(struct iommu_domain *domain,

> +				      unsigned long iova, size_t size)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +

> +	apple_dart_domain_flush_tlb(dart_domain);

> +}

> +

> +static void apple_dart_tlb_flush_all(void *cookie)

> +{

> +	struct apple_dart_domain *domain = cookie;

> +

> +	apple_dart_domain_flush_tlb(domain);

> +}

> +

> +static void apple_dart_tlb_flush_walk(unsigned long iova, size_t size,

> +				      size_t granule, void *cookie)

> +{

> +	struct apple_dart_domain *domain = cookie;

> +

> +	apple_dart_domain_flush_tlb(domain);

> +}

> +

> +static const struct iommu_flush_ops apple_dart_tlb_ops = {

> +	.tlb_flush_all = apple_dart_tlb_flush_all,

> +	.tlb_flush_walk = apple_dart_tlb_flush_walk,

> +	.tlb_add_page = NULL,

> +};

> +

> +static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,

> +					   dma_addr_t iova)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;

> +

> +	if (domain->type == IOMMU_DOMAIN_IDENTITY &&

> +	    dart_domain->dart->supports_bypass)


That second check seems redundant - if you don't support bypass surely 
you shouldn't have allowed attaching an identity domain in the first 
place? And even if you fake one with a pagetable you shouldn't need to 
walk it, for obvious reasons ;)

TBH, dealing with identity domains in iova_to_phys at all irks me - it's 
largely due to dubious hacks in networking drivers which hopefully you 
should never have to deal with on M1 anyway, and either way it's not 
like they can't check the domain type themselves and save a pointless 
indirect call altogether :(

> +		return iova;

> +	if (!ops)

> +		return -ENODEV;

> +

> +	return ops->iova_to_phys(ops, iova);

> +}

> +

> +static int apple_dart_map(struct iommu_domain *domain, unsigned long iova,

> +			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;

> +

> +	if (!ops)

> +		return -ENODEV;

> +	if (prot & IOMMU_MMIO)

> +		return -EINVAL;

> +	if (prot & IOMMU_NOEXEC)

> +		return -EINVAL;


Hmm, I guess the usual expectation is just to ignore any prot flags you 
can't enforce - after all, some IOMMUs don't even have a notion of read 
or write permissions.
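
A minimal sketch of the "ignore what you can't enforce" approach. The
flag macros are local stand-ins for the kernel's IOMMU_* prot bits with
illustrative values, and dart_sanitize_prot() is a hypothetical helper;
which flags the DART page table format can really enforce is an
assumption here.

```c
/* Local stand-ins for the kernel's IOMMU_* prot flags; values are illustrative. */
#define SKETCH_IOMMU_READ   (1 << 0)
#define SKETCH_IOMMU_WRITE  (1 << 1)
#define SKETCH_IOMMU_NOEXEC (1 << 3)
#define SKETCH_IOMMU_MMIO   (1 << 4)

/* Flags this hypothetical page table format can actually enforce. */
#define SKETCH_PROT_SUPPORTED (SKETCH_IOMMU_READ | SKETCH_IOMMU_WRITE)

/* Drop prot bits that cannot be enforced instead of failing the map. */
static int dart_sanitize_prot(int prot)
{
	return prot & SKETCH_PROT_SUPPORTED;
}
```

With this, a request carrying NOEXEC or MMIO is mapped with the
remaining permissions instead of being rejected with -EINVAL.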

> +	return ops->map(ops, iova, paddr, size, prot, gfp);

> +}

> +

> +static size_t apple_dart_unmap(struct iommu_domain *domain, unsigned long iova,

> +			       size_t size, struct iommu_iotlb_gather *gather)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;

> +

> +	if (!ops)

> +		return 0;


That should never legitimately happen, since no previous mapping could 
have succeeded either.

> +	return ops->unmap(ops, iova, size, gather);

> +}

> +

> +static int apple_dart_prepare_sw_bypass(struct apple_dart *dart,

> +					struct apple_dart_domain *dart_domain,

> +					struct device *dev)

> +{

> +	lockdep_assert_held(&dart_domain->lock);

> +

> +	if (dart->supports_bypass)

> +		return 0;

> +	if (dart_domain->type != IOMMU_DOMAIN_IDENTITY)

> +		return 0;

> +

> +	// use the bus region from the first attached dev for the bypass range

> +	if (!dart->sw_bypass_len) {

> +		const struct bus_dma_region *dma_rgn = dev->dma_range_map;

> +

> +		if (!dma_rgn)

> +			return -EINVAL;

> +

> +		dart->sw_bypass_len = dma_rgn->size;

> +		dart->sw_bypass_cpu_start = dma_rgn->cpu_start;

> +		dart->sw_bypass_dma_start = dma_rgn->dma_start;

> +	}

> +

> +	// ensure that we don't mix different bypass setups

> +	if (dart_domain->sw_bypass_len) {

> +		if (dart->sw_bypass_len != dart_domain->sw_bypass_len)

> +			return -EINVAL;

> +		if (dart->sw_bypass_cpu_start !=

> +		    dart_domain->sw_bypass_cpu_start)

> +			return -EINVAL;

> +		if (dart->sw_bypass_dma_start !=

> +		    dart_domain->sw_bypass_dma_start)

> +			return -EINVAL;

> +	} else {

> +		dart_domain->sw_bypass_len = dart->sw_bypass_len;

> +		dart_domain->sw_bypass_cpu_start = dart->sw_bypass_cpu_start;

> +		dart_domain->sw_bypass_dma_start = dart->sw_bypass_dma_start;

> +	}

> +

> +	return 0;

> +}

> +

> +static int apple_dart_domain_needs_pgtbl_ops(struct apple_dart *dart,

> +					     struct iommu_domain *domain)

> +{

> +	if (domain->type == IOMMU_DOMAIN_DMA)

> +		return 1;

> +	if (domain->type == IOMMU_DOMAIN_UNMANAGED)

> +		return 1;

> +	if (!dart->supports_bypass && domain->type == IOMMU_DOMAIN_IDENTITY)

> +		return 1;

> +	return 0;

> +}

> +

> +static int apple_dart_finalize_domain(struct iommu_domain *domain)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +	struct apple_dart *dart = dart_domain->dart;

> +	struct io_pgtable_cfg pgtbl_cfg;

> +

> +	lockdep_assert_held(&dart_domain->lock);

> +

> +	if (dart_domain->pgtbl_ops)

> +		return 0;

> +	if (!apple_dart_domain_needs_pgtbl_ops(dart, domain))

> +		return 0;

> +

> +	pgtbl_cfg = (struct io_pgtable_cfg){

> +		.pgsize_bitmap = dart->pgsize,

> +		.ias = 32,

> +		.oas = 36,

> +		.coherent_walk = 1,

> +		.tlb = &apple_dart_tlb_ops,

> +		.iommu_dev = dart->dev,

> +	};

> +

> +	dart_domain->pgtbl_ops =

> +		alloc_io_pgtable_ops(ARM_APPLE_DART, &pgtbl_cfg, domain);

> +	if (!dart_domain->pgtbl_ops)

> +		return -ENOMEM;

> +

> +	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;

> +	domain->geometry.aperture_start = 0;

> +	domain->geometry.aperture_end = DMA_BIT_MASK(32);

> +	domain->geometry.force_aperture = true;

> +

> +	/*

> +	 * Some DARTs come without hardware bypass support but we may still

> +	 * be forced to use bypass mode (to e.g. allow kernels with 4K pages to

> +	 * boot). If we reach this point with an identity domain we have to setup

> +	 * bypass mode in software. This is done by creating a static pagetable

> +	 * for a linear map specified by dma-ranges in the device tree.

> +	 */

> +	if (domain->type == IOMMU_DOMAIN_IDENTITY) {

> +		u64 offset;

> +		int ret;

> +

> +		for (offset = 0; offset < dart_domain->sw_bypass_len;

> +		     offset += dart->pgsize) {

> +			ret = dart_domain->pgtbl_ops->map(

> +				dart_domain->pgtbl_ops,

> +				dart_domain->sw_bypass_dma_start + offset,

> +				dart_domain->sw_bypass_cpu_start + offset,

> +				dart->pgsize, IOMMU_READ | IOMMU_WRITE,

> +				GFP_ATOMIC);

> +			if (ret < 0) {

> +				free_io_pgtable_ops(dart_domain->pgtbl_ops);

> +				dart_domain->pgtbl_ops = NULL;

> +				return -EINVAL;

> +			}

> +		}


Could you set up a single per-DART pagetable in prepare_sw_bypass (or 
even better at probe time if you think you're likely to need it) and 
just share that between all fake identity domains? That could be a 
follow-up optimisation, though.

> +	}

> +

> +	return 0;

> +}

> +

> +static void

> +apple_dart_stream_setup_translation(struct apple_dart_domain *domain,

> +				    struct apple_dart *dart, u32 sid)

> +{

> +	int i;

> +	struct io_pgtable_cfg *pgtbl_cfg =

> +		&io_pgtable_ops_to_pgtable(domain->pgtbl_ops)->cfg;

> +

> +	for (i = 0; i < pgtbl_cfg->apple_dart_cfg.n_ttbrs; ++i)

> +		apple_dart_hw_set_ttbr(dart, sid, i,

> +				       pgtbl_cfg->apple_dart_cfg.ttbr[i]);

> +	for (; i < DART_MAX_TTBR; ++i)

> +		apple_dart_hw_clear_ttbr(dart, sid, i);

> +

> +	apple_dart_hw_enable_translation(dart, sid);

> +	apple_dart_hw_invalidate_tlb_stream(dart, sid);

> +}

> +

> +static int apple_dart_attach_stream(struct apple_dart_domain *domain,

> +				    struct apple_dart *dart, u32 sid)

> +{

> +	unsigned long flags;

> +	struct apple_dart_stream *stream;

> +	int ret;

> +

> +	lockdep_assert_held(&domain->lock);

> +

> +	if (WARN_ON(dart->force_bypass &&

> +		    domain->type != IOMMU_DOMAIN_IDENTITY))

> +		return -EINVAL;


Ideally you shouldn't allow that to happen, but I guess if you have 
mixed capabilities across different instances then in principle an 
unmanaged domain could still slip through. But then again a user of an 
unmanaged domain might be OK with using larger pages anyway. Either way 
I'm not sure it's worthy of a WARN (similarly below) since it doesn't 
represent a "this should never happen" condition if the user has got 
their hands on a VFIO driver and is mucking about, it's just a normal 
failure because you can't support the attachment.

> +	/*

> +	 * we can't mix and match DARTs that support bypass mode with those who don't

> +	 * because the iova space in fake bypass mode generally has an offset

> +	 */


Erm, something doesn't sound right there... IOMMU_DOMAIN_IDENTITY should 
be exactly what it says, regardless of how it's implemented. If you 
can't provide a true identity mapping then you're probably better off 
not pretending to support them in the first place.

> +	if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY &&

> +		    (domain->dart->supports_bypass != dart->supports_bypass)))

> +		return -EINVAL;

> +

> +	list_for_each_entry(stream, &domain->streams, stream_head) {

> +		if (stream->dart == dart && stream->sid == sid) {

> +			stream->num_devices++;

> +			return 0;

> +		}

> +	}

> +

> +	spin_lock_irqsave(&dart->lock, flags);

> +

> +	if (WARN_ON(dart->used_sids & BIT(sid))) {

> +		ret = -EINVAL;

> +		goto error;

> +	}

> +

> +	stream = kzalloc(sizeof(*stream), GFP_ATOMIC);

> +	if (!stream) {

> +		ret = -ENOMEM;

> +		goto error;

> +	}


Couldn't you do this outside the lock? (If, calling back to other 
comments, it can't get refactored out of existence anyway)

> +	stream->dart = dart;

> +	stream->sid = sid;

> +	stream->num_devices = 1;

> +	list_add(&stream->stream_head, &domain->streams);

> +

> +	dart->used_sids |= BIT(sid);

> +	spin_unlock_irqrestore(&dart->lock, flags);

> +

> +	apple_dart_hw_clear_all_ttbrs(stream->dart, stream->sid);

> +

> +	switch (domain->type) {

> +	case IOMMU_DOMAIN_IDENTITY:

> +		if (stream->dart->supports_bypass)

> +			apple_dart_hw_enable_bypass(stream->dart, stream->sid);

> +		else

> +			apple_dart_stream_setup_translation(

> +				domain, stream->dart, stream->sid);

> +		break;

> +	case IOMMU_DOMAIN_BLOCKED:

> +		apple_dart_hw_disable_dma(stream->dart, stream->sid);

> +		break;

> +	case IOMMU_DOMAIN_UNMANAGED:

> +	case IOMMU_DOMAIN_DMA:

> +		apple_dart_stream_setup_translation(domain, stream->dart,

> +						    stream->sid);

> +		break;

> +	}

> +

> +	return 0;

> +

> +error:

> +	spin_unlock_irqrestore(&dart->lock, flags);

> +	return ret;

> +}

> +

> +static void apple_dart_disable_stream(struct apple_dart *dart, u32 sid)

> +{

> +	unsigned long flags;

> +

> +	apple_dart_hw_disable_dma(dart, sid);

> +	apple_dart_hw_clear_all_ttbrs(dart, sid);

> +	apple_dart_hw_invalidate_tlb_stream(dart, sid);

> +

> +	spin_lock_irqsave(&dart->lock, flags);

> +	dart->used_sids &= ~BIT(sid);

> +	spin_unlock_irqrestore(&dart->lock, flags);

> +}

> +

> +static void apple_dart_detach_stream(struct apple_dart_domain *domain,

> +				     struct apple_dart *dart, u32 sid)

> +{

> +	struct apple_dart_stream *stream;

> +

> +	lockdep_assert_held(&domain->lock);

> +

> +	list_for_each_entry(stream, &domain->streams, stream_head) {

> +		if (stream->dart == dart && stream->sid == sid) {

> +			stream->num_devices--;

> +

> +			if (stream->num_devices == 0) {

> +				apple_dart_disable_stream(dart, sid);

> +				list_del(&stream->stream_head);

> +				kfree(stream);

> +			}

> +			return;

> +		}

> +	}

> +}

> +

> +static int apple_dart_attach_dev(struct iommu_domain *domain,

> +				 struct device *dev)

> +{

> +	int ret;

> +	int i, j;

> +	unsigned long flags;

> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +	struct apple_dart *dart = cfg->streams[0].dart;

> +

> +	if (WARN_ON(dart->force_bypass &&

> +		    dart_domain->type != IOMMU_DOMAIN_IDENTITY)) {

> +		dev_warn(

> +			dev,

> +			"IOMMU must be in bypass mode but trying to attach to translated domain.\n");

> +		return -EINVAL;

> +	}


Again, a bit excessive with the warnings. In fact, transpose my comment 
from apple_dart_attach_stream() to here, because this means the 
equivalent warning there is literally unreachable :/

> +	spin_lock_irqsave(&dart_domain->lock, flags);

> +

> +	ret = apple_dart_prepare_sw_bypass(dart, dart_domain, dev);

> +	if (ret)

> +		goto out;

> +

> +	if (!dart_domain->dart)

> +		dart_domain->dart = dart;

> +

> +	ret = apple_dart_finalize_domain(domain);

> +	if (ret)

> +		goto out;

> +

> +	for (i = 0; i < cfg->num_streams; ++i) {

> +		ret = apple_dart_attach_stream(

> +			dart_domain, cfg->streams[i].dart, cfg->streams[i].sid);

> +		if (ret) {

> +			/* try to undo what we did before returning */

> +			for (j = 0; j < i; ++j)

> +				apple_dart_detach_stream(dart_domain,

> +							 cfg->streams[j].dart,

> +							 cfg->streams[j].sid);

> +

> +			goto out;

> +		}

> +	}

> +

> +	ret = 0;

> +

> +out:

> +	spin_unlock_irqrestore(&dart_domain->lock, flags);

> +	return ret;

> +}

> +

> +static void apple_dart_detach_dev(struct iommu_domain *domain,

> +				  struct device *dev)

> +{

> +	int i;

> +	unsigned long flags;

> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +

> +	spin_lock_irqsave(&dart_domain->lock, flags);

> +

> +	for (i = 0; i < cfg->num_streams; ++i)

> +		apple_dart_detach_stream(dart_domain, cfg->streams[i].dart,

> +					 cfg->streams[i].sid);

> +

> +	spin_unlock_irqrestore(&dart_domain->lock, flags);

> +}

> +

> +static struct iommu_device *apple_dart_probe_device(struct device *dev)

> +{

> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> +	int i;

> +

> +	if (!cfg)

> +		return ERR_PTR(-ENODEV);

> +

> +	for (i = 0; i < cfg->num_streams; ++i) {

> +		cfg->streams[i].link =

> +			device_link_add(dev, cfg->streams[i].dart->dev,

> +					DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);

> +	}

> +

> +	return &cfg->streams[0].dart->iommu;

> +}

> +

> +static void apple_dart_release_device(struct device *dev)

> +{

> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> +	int i;

> +

> +	if (!cfg)

> +		return;


Shouldn't happen - if it's disappeared since probe_device succeeded 
you've got bigger problems anyway.

> +

> +	for (i = 0; i < cfg->num_streams; ++i)

> +		device_link_del(cfg->streams[i].link);

> +

> +	dev_iommu_priv_set(dev, NULL);

> +	kfree(cfg);

> +}

> +

> +static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)

> +{

> +	struct apple_dart_domain *dart_domain;

> +

> +	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&

> +	    type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED)

> +		return NULL;


I want to say there's not much point in that, but then I realise I've 
spent the last couple of days writing patches to add a new domain type :)

> +	dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL);

> +	if (!dart_domain)

> +		return NULL;

> +

> +	INIT_LIST_HEAD(&dart_domain->streams);

> +	spin_lock_init(&dart_domain->lock);

> +	iommu_get_dma_cookie(&dart_domain->domain);

> +	dart_domain->type = type;


Yeah, this is "useful" for a handful of CPU cycles until we return and 
iommu_domain_alloc() sets dart_domain->domain->type to the same thing, 
all before *its* caller even knows the domain exists.

> +	return &dart_domain->domain;

> +}

> +

> +static void apple_dart_domain_free(struct iommu_domain *domain)

> +{

> +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> +

> +	WARN_ON(!list_empty(&dart_domain->streams));


Why? This code is perfectly legal API usage:

	d = iommu_domain_alloc(bus)
	if (d)
		iommu_domain_free(d);

Sure it looks pointless, but it's the kind of thing that can 
legitimately happen (with a lot more going on in between) if an 
unmanaged domain user tears itself down before it gets round to 
attaching, due to probe deferral or some other error condition.

> +	kfree(dart_domain);

> +}

> +

> +static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)

> +{

> +	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);

> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> +	unsigned int num_streams = cfg ? cfg->num_streams : 0;

> +	struct apple_dart_master_cfg *cfg_new;

> +	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);

> +

> +	if (args->args_count != 1)

> +		return -EINVAL;

> +

> +	cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1),

> +			   GFP_KERNEL);

> +	if (!cfg_new)

> +		return -ENOMEM;

> +

> +	cfg = cfg_new;

> +	dev_iommu_priv_set(dev, cfg);

> +

> +	cfg->num_streams = num_streams;

> +	cfg->streams[cfg->num_streams].dart = dart;

> +	cfg->streams[cfg->num_streams].sid = args->args[0];

> +	cfg->num_streams++;


Yeah, this is way too reminiscent of the fwspec code for comfort. Even 
if you can't use autoremove links for some reason, an array of 16 
device_link pointers hung off apple_dart still wins over these little 
pointer-heavy structures if you need more than a few of them.

> +	return 0;

> +}

> +

> +static struct iommu_group *apple_dart_device_group(struct device *dev)

> +{

> +#ifdef CONFIG_PCI

> +	struct iommu_group *group;

> +

> +	if (dev_is_pci(dev))

> +		group = pci_device_group(dev);

> +	else

> +		group = generic_device_group(dev);


...and this is where it gets bad :(

If you can have multiple devices behind the same stream such that the 
IOMMU can't tell them apart, you *have* to ensure they get put in the 
same group, so that the IOMMU core knows the topology (and reflects it 
correctly to userspace) and doesn't try to do things that then 
unexpectedly fail. This is the point where you need to check if a stream 
is already known, and return the existing group if so, and then you 
won't need to check and refcount all the time in attach/detach because 
the IOMMU core will do the right thing for you.

Many drivers only run on systems where devices don't alias at the IOMMU 
level (aliasing at the PCI level is already taken care of), or use a 
single group because effectively everything aliases, so it's not the 
most common scenario, but as I mentioned before arm-smmu is one that 
does - take a look at the flow through that in the "!smmu->smrs" cases 
for the closest example.
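
To make the "return the existing group" idea concrete, here is a rough userspace model, assuming stream IDs are small integers that can index a per-DART table. group_model, dart_model and device_group_for_sid are hypothetical names; in the kernel the reference would presumably be taken with iommu_group_ref_get() rather than a bare counter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STREAMS 16 /* mirrors DART_MAX_STREAMS from the patch */

/* Hypothetical stand-in for struct iommu_group. */
struct group_model {
	int refcount;
};

/* Hypothetical stand-in for struct apple_dart with a per-stream table. */
struct dart_model {
	struct group_model *sid2group[MAX_STREAMS];
};

/*
 * Return the group for a stream, creating it on first use. Devices that
 * alias onto the same stream ID end up sharing one group.
 */
static struct group_model *device_group_for_sid(struct dart_model *dart,
						unsigned int sid)
{
	struct group_model *g;

	assert(sid < MAX_STREAMS);

	g = dart->sid2group[sid];
	if (g) {
		g->refcount++; /* stream already known: reuse its group */
		return g;
	}

	g = calloc(1, sizeof(*g));
	if (!g)
		return NULL;

	g->refcount = 1;
	dart->sid2group[sid] = g;
	return g;
}
```

With the aliasing resolved at group creation, attach and detach no longer need their own per-stream device counting.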

> +

> +	return group;

> +#else

> +	return generic_device_group(dev);

> +#endif

> +}

> +

> +static int apple_dart_def_domain_type(struct device *dev)

> +{

> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> +	struct apple_dart *dart = cfg->streams[0].dart;

> +

> +	if (dart->force_bypass)

> +		return IOMMU_DOMAIN_IDENTITY;

> +	if (!dart->supports_bypass)

> +		return IOMMU_DOMAIN_DMA;

> +

> +	return 0;

> +}

> +

> +static const struct iommu_ops apple_dart_iommu_ops = {

> +	.domain_alloc = apple_dart_domain_alloc,

> +	.domain_free = apple_dart_domain_free,

> +	.attach_dev = apple_dart_attach_dev,

> +	.detach_dev = apple_dart_detach_dev,

> +	.map = apple_dart_map,

> +	.unmap = apple_dart_unmap,

> +	.flush_iotlb_all = apple_dart_flush_iotlb_all,

> +	.iotlb_sync = apple_dart_iotlb_sync,

> +	.iotlb_sync_map = apple_dart_iotlb_sync_map,

> +	.iova_to_phys = apple_dart_iova_to_phys,

> +	.probe_device = apple_dart_probe_device,

> +	.release_device = apple_dart_release_device,

> +	.device_group = apple_dart_device_group,

> +	.of_xlate = apple_dart_of_xlate,

> +	.def_domain_type = apple_dart_def_domain_type,

> +	.pgsize_bitmap = -1UL, /* Restricted during dart probe */

> +};

> +

> +static irqreturn_t apple_dart_irq(int irq, void *dev)

> +{

> +	struct apple_dart *dart = dev;

> +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,

> +				      DEFAULT_RATELIMIT_BURST);

> +	const char *fault_name = NULL;

> +	u32 error = readl(dart->regs + DART_ERROR);

> +	u32 error_code = FIELD_GET(DART_ERROR_CODE, error);

> +	u32 addr_lo = readl(dart->regs + DART_ERROR_ADDR_LO);

> +	u32 addr_hi = readl(dart->regs + DART_ERROR_ADDR_HI);

> +	u64 addr = addr_lo | (((u64)addr_hi) << 32);

> +	u8 stream_idx = FIELD_GET(DART_ERROR_STREAM, error);

> +

> +	if (!(error & DART_ERROR_FLAG))

> +		return IRQ_NONE;

> +

> +	if (error_code & DART_ERROR_READ_FAULT)

> +		fault_name = "READ FAULT";

> +	else if (error_code & DART_ERROR_WRITE_FAULT)

> +		fault_name = "WRITE FAULT";

> +	else if (error_code & DART_ERROR_NO_PTE)

> +		fault_name = "NO PTE FOR IOVA";

> +	else if (error_code & DART_ERROR_NO_PMD)

> +		fault_name = "NO PMD FOR IOVA";

> +	else if (error_code & DART_ERROR_NO_TTBR)

> +		fault_name = "NO TTBR FOR IOVA";


Can multiple bits be set at once or is there a strict precedence?
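
If more than one bit can be set, the else-if chain above reports only the first match. A table-driven decode that names every set bit could look roughly like this. The bit positions are made up for illustration, since the real DART_ERROR_* values are not quoted in this thread, and decode_faults is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit positions only; not the real DART_ERROR_* values. */
#define MODEL_ERR_READ_FAULT  (1u << 0)
#define MODEL_ERR_WRITE_FAULT (1u << 1)
#define MODEL_ERR_NO_PTE      (1u << 2)
#define MODEL_ERR_NO_PMD      (1u << 3)
#define MODEL_ERR_NO_TTBR     (1u << 4)

/* Same order as the else-if chain in the patch. */
static const struct {
	uint32_t bit;
	const char *name;
} fault_names[] = {
	{ MODEL_ERR_READ_FAULT,  "READ FAULT" },
	{ MODEL_ERR_WRITE_FAULT, "WRITE FAULT" },
	{ MODEL_ERR_NO_PTE,      "NO PTE FOR IOVA" },
	{ MODEL_ERR_NO_PMD,      "NO PMD FOR IOVA" },
	{ MODEL_ERR_NO_TTBR,     "NO TTBR FOR IOVA" },
};

/*
 * Write a comma-separated list of every known fault bit set in code
 * into buf. Returns the number of names written in full; 0 means no
 * known bit was set and buf holds an empty string.
 */
static int decode_faults(uint32_t code, char *buf, size_t len)
{
	size_t i, used = 0;
	int n = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(fault_names) / sizeof(fault_names[0]); i++) {
		int w;

		if (!(code & fault_names[i].bit))
			continue;

		w = snprintf(buf + used, len - used, "%s%s",
			     n ? ", " : "", fault_names[i].name);
		if (w < 0 || (size_t)w >= len - used)
			break; /* out of space: stop here */

		used += (size_t)w;
		n++;
	}

	return n;
}
```

A return value of 0 covers the "unknown" case without needing a separate NULL check on the name.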

> +	if (WARN_ON(fault_name == NULL))


You're already logging a clear and attributable message below; I can 
guarantee that a big noisy backtrace showing that you got here from 
el0_irq() or el1_irq() is not useful over and above that.

> +		fault_name = "unknown";

> +

> +	if (__ratelimit(&rs)) {


Just use dev_err_ratelimited() to hide the guts if you're not doing 
anything tricky.

> +		dev_err(dart->dev,

> +			"translation fault: status:0x%x stream:%d code:0x%x (%s) at 0x%llx",

> +			error, stream_idx, error_code, fault_name, addr);

> +	}

> +

> +	writel(error, dart->regs + DART_ERROR);

> +	return IRQ_HANDLED;

> +}

> +

> +static int apple_dart_probe(struct platform_device *pdev)

> +{

> +	int ret;

> +	u32 dart_params[2];

> +	struct resource *res;

> +	struct apple_dart *dart;

> +	struct device *dev = &pdev->dev;

> +

> +	dart = devm_kzalloc(dev, sizeof(*dart), GFP_KERNEL);

> +	if (!dart)

> +		return -ENOMEM;

> +

> +	dart->dev = dev;

> +	spin_lock_init(&dart->lock);

> +

> +	if (pdev->num_resources < 1)

> +		return -ENODEV;


But you have 2 resources (one MEM and one IRQ)? And anyway their 
respective absences would hardly go unnoticed below.

> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

> +	if (resource_size(res) < 0x4000) {

> +		dev_err(dev, "MMIO region too small (%pr)\n", res);

> +		return -EINVAL;

> +	}

> +

> +	dart->regs = devm_ioremap_resource(dev, res);

> +	if (IS_ERR(dart->regs))

> +		return PTR_ERR(dart->regs);

> +

> +	ret = devm_clk_bulk_get_all(dev, &dart->clks);

> +	if (ret < 0)

> +		return ret;

> +	dart->num_clks = ret;

> +

> +	ret = clk_bulk_prepare_enable(dart->num_clks, dart->clks);

> +	if (ret)

> +		return ret;

> +

> +	ret = apple_dart_hw_reset(dart);

> +	if (ret)

> +		goto err_clk_disable;

> +

> +	dart_params[0] = readl(dart->regs + DART_PARAMS1);

> +	dart_params[1] = readl(dart->regs + DART_PARAMS2);

> +	dart->pgsize = 1 << FIELD_GET(DART_PARAMS_PAGE_SHIFT, dart_params[0]);

> +	dart->supports_bypass = dart_params[1] & DART_PARAMS_BYPASS_SUPPORT;

> +	dart->force_bypass = dart->pgsize > PAGE_SIZE;

> +

> +	dart->irq = platform_get_irq(pdev, 0);

> +	if (dart->irq < 0) {

> +		ret = -ENODEV;

> +		goto err_clk_disable;

> +	}


FWIW I'd get the IRQ earlier when there's still nothing to clean up on 
failure - it's only the request which needs to wait until you've 
actually set up enough to be able to handle it if it does fire.

> +	ret = devm_request_irq(dart->dev, dart->irq, apple_dart_irq,

> +			       IRQF_SHARED, "apple-dart fault handler", dart);


Be very careful with this pattern of mixing devres-managed IRQs with 
explicitly-managed clocks, especially when IRQF_SHARED is in play. In 
the failure path here, and in remove, you have a period where the clocks 
have been disabled but the IRQ is still live - try CONFIG_DEBUG_SHIRQ 
and don't be surprised if you deadlock trying to read an unclocked register.

If you can't also offload the clock management to devres to guarantee 
ordering relative to the IRQ (I think I saw some patches recently), it's 
probably safest to manually manage the latter.
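
The ordering hazard can be modelled in userspace with a LIFO cleanup stack, on the understanding that devres releases resources in the reverse order of registration. Everything below (cleanup_stack, hw_model, the hazard flag) is a hypothetical model rather than kernel API; in a driver the clock teardown would more likely be registered through something like devm_add_action_or_reset():

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*cleanup_fn)(void *ctx);

/* Hypothetical model of a devres-style list: released last-in, first-out. */
struct cleanup_stack {
	struct {
		cleanup_fn fn;
		void *ctx;
	} action[MAX_ACTIONS];
	size_t count;
};

static int cleanup_push(struct cleanup_stack *s, cleanup_fn fn, void *ctx)
{
	if (s->count == MAX_ACTIONS)
		return -1;
	s->action[s->count].fn = fn;
	s->action[s->count].ctx = ctx;
	s->count++;
	return 0;
}

/* Run every registered action, most recently registered first. */
static void cleanup_run(struct cleanup_stack *s)
{
	while (s->count) {
		s->count--;
		s->action[s->count].fn(s->action[s->count].ctx);
	}
}

/* Hypothetical hardware state; hazard records "IRQ live, clocks off". */
struct hw_model {
	int clocks_on;
	int irq_live;
	int hazard;
};

static void disable_clocks(void *ctx)
{
	struct hw_model *hw = ctx;

	if (hw->irq_live)
		hw->hazard = 1; /* a shared IRQ could now hit an unclocked register */
	hw->clocks_on = 0;
}

static void release_irq(void *ctx)
{
	struct hw_model *hw = ctx;

	hw->irq_live = 0;
}
```

Registering the clock teardown before requesting the IRQ means the IRQ is released first on the way out, which is the ordering guarantee being asked for.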

> +	if (ret)

> +		goto err_clk_disable;

> +

> +	platform_set_drvdata(pdev, dart);

> +

> +	ret = iommu_device_sysfs_add(&dart->iommu, dev, NULL, "apple-dart.%s",

> +				     dev_name(&pdev->dev));

> +	if (ret)

> +		goto err_clk_disable;

> +

> +	ret = iommu_device_register(&dart->iommu, &apple_dart_iommu_ops, dev);

> +	if (ret)

> +		goto err_clk_disable;

> +

> +	if (dev->bus->iommu_ops != &apple_dart_iommu_ops) {

> +		ret = bus_set_iommu(dev->bus, &apple_dart_iommu_ops);

> +		if (ret)

> +			goto err_clk_disable;

> +	}

> +#ifdef CONFIG_PCI

> +	if (dev->bus->iommu_ops != pci_bus_type.iommu_ops) {


But it's still a platform device, not a PCI device?

> +		ret = bus_set_iommu(&pci_bus_type, &apple_dart_iommu_ops);

> +		if (ret)

> +			goto err_clk_disable;


And the platform bus ops?

> +	}

> +#endif

> +

> +	dev_info(

> +		&pdev->dev,

> +		"DART [pagesize %x, bypass support: %d, bypass forced: %d] initialized\n",

> +		dart->pgsize, dart->supports_bypass, dart->force_bypass);

> +	return 0;

> +

> +err_clk_disable:

> +	clk_bulk_disable(dart->num_clks, dart->clks);

> +	clk_bulk_unprepare(dart->num_clks, dart->clks);


No need to open-code clk_bulk_disable_unprepare() ;)

> +	return ret;

> +}

> +

> +static int apple_dart_remove(struct platform_device *pdev)

> +{

> +	struct apple_dart *dart = platform_get_drvdata(pdev);

> +

> +	devm_free_irq(dart->dev, dart->irq, dart);

> +

> +	iommu_device_unregister(&dart->iommu);

> +	iommu_device_sysfs_remove(&dart->iommu);

> +

> +	clk_bulk_disable(dart->num_clks, dart->clks);

> +	clk_bulk_unprepare(dart->num_clks, dart->clks);


Ditto.

And again the bus ops are still installed - that'll get really fun if 
this is a module unload...

> +	return 0;

> +}

> +

> +static void apple_dart_shutdown(struct platform_device *pdev)

> +{

> +	apple_dart_remove(pdev);


The main reason for doing something on shutdown is in the case of kexec, 
to put the hardware back into a disabled or otherwise sane state so as 
not to trip up whatever the subsequent payload is. If you're not doing 
that (which may be legitimate if the expectation is that software must 
always fully reset and initialise a DART before I/O can work) then 
there's not much point in doing anything, really. Stuff like tidying up 
sysfs is a complete waste of time when the world's about to end ;)

Robin.

> +}

> +

> +static const struct of_device_id apple_dart_of_match[] = {

> +	{ .compatible = "apple,t8103-dart", .data = NULL },

> +	{},

> +};

> +MODULE_DEVICE_TABLE(of, apple_dart_of_match);

> +

> +static struct platform_driver apple_dart_driver = {

> +	.driver	= {

> +		.name			= "apple-dart",

> +		.of_match_table		= apple_dart_of_match,

> +	},

> +	.probe	= apple_dart_probe,

> +	.remove	= apple_dart_remove,

> +	.shutdown = apple_dart_shutdown,

> +};

> +module_platform_driver(apple_dart_driver);

> +

> +MODULE_DESCRIPTION("IOMMU API for Apple's DART");

> +MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>");

> +MODULE_LICENSE("GPL v2");

>
Sven Peter July 15, 2021, 4:41 p.m. UTC | #5
Hi,

Awesome, thanks a lot for the detailed review!


On Wed, Jul 14, 2021, at 01:23, Robin Murphy wrote:
> ^^ Nit: the subsystem style for the subject format should be 

> "iommu/dart: Add..." - similarly on patch #1, which I just realised I 

> missed (sorry!)


Sure!

> 

> On 2021-06-27 15:34, Sven Peter wrote:

> > Apple's new SoCs use iommus for almost all peripherals. These Device

> > Address Resolution Tables must be setup before these peripherals can

> > act as DMA masters.

> > 

> > Signed-off-by: Sven Peter <sven@svenpeter.dev>

> > ---

> >   MAINTAINERS                      |    1 +

> >   drivers/iommu/Kconfig            |   15 +

> >   drivers/iommu/Makefile           |    1 +

> >   drivers/iommu/apple-dart-iommu.c | 1058 ++++++++++++++++++++++++++++++

> 

> I'd be inclined to drop "-iommu" from the filename, unless there's some 

> other "apple-dart" functionality that might lead to a module name clash 

> in future?


Sure, DART should only be an iommu.

> 

> >   4 files changed, 1075 insertions(+)

> >   create mode 100644 drivers/iommu/apple-dart-iommu.c

> > 

> > diff --git a/MAINTAINERS b/MAINTAINERS

> > index 29e5541c8f21..c1ffaa56b5f9 100644

> > --- a/MAINTAINERS

> > +++ b/MAINTAINERS

> > @@ -1245,6 +1245,7 @@ M:	Sven Peter <sven@svenpeter.dev>

> >   L:	iommu@lists.linux-foundation.org

> >   S:	Maintained

> >   F:	Documentation/devicetree/bindings/iommu/apple,dart.yaml

> > +F:	drivers/iommu/apple-dart-iommu.c

> >   

> >   APPLE SMC DRIVER

> >   M:	Henrik Rydberg <rydberg@bitmath.org>

> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig

> > index 1f111b399bca..87882c628b46 100644

> > --- a/drivers/iommu/Kconfig

> > +++ b/drivers/iommu/Kconfig

> > @@ -249,6 +249,21 @@ config SPAPR_TCE_IOMMU

> >   	  Enables bits of IOMMU API required by VFIO. The iommu_ops

> >   	  is not implemented as it is not necessary for VFIO.

> >   

> > +config IOMMU_APPLE_DART

> > +	tristate "Apple DART IOMMU Support"

> > +	depends on ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)

> > +	select IOMMU_API

> > +	select IOMMU_IO_PGTABLE

> 

> This is redundant - the individual formats already select it.


Removed for the next version.

> 

[...]
> > +#include <linux/pci.h>

> 

> Redundant duplicate


Whoops, removed for the next version as well.

> 

> > +#define DART_MAX_STREAMS 16

[...]
> > +

> > +/*

> > + * This structure is used to identify a single stream attached to a domain.

> > + * It's used as a list inside that domain to be able to attach multiple

> > + * streams to a single domain. Since multiple devices can use a single stream

> > + * it additionally keeps track of how many devices are represented by this

> > + * stream. Once that number reaches zero it is detached from the IOMMU domain

> > + * and all translations from this stream are disabled.

> 

> That sounds a lot like something you should be doing properly with groups.


The hint to look at arm-smmu for a similar flow was very helpful, thanks!
Now that I understand how these groups work I completely agree that this
needs to be reworked and done properly.


> 

> > + * @dart: DART instance to which this stream belongs

> > + * @sid: stream id within the DART instance

> > + * @num_devices: count of devices attached to this stream

> > + * @stream_head: list head for the next stream

> > + */

> > +struct apple_dart_stream {

> > +	struct apple_dart *dart;

> > +	u32 sid;

> 

> What are the actual SID values like? If they're large and sparse then 

> maybe a list makes sense, but if they're small and dense then an array 

> hanging off the apple_dart structure itself might be more efficient. 

> Given DART_MAX_STREAMS, I'm thinking the latter, and considerably so.

> 

> The impression I'm getting so far is that this seems conceptually a bit 

> like arm-smmu with stream indexing.


There are two (very similar) types of DARTs.
The one supported with this series has up to 16 stream ids which will be
integers <16. There's another variant used for Thunderbolt for which I will
add support in a follow-up that supports up to 64 stream ids then. 
So at worst this is an array with 64 entries if this structure won't
disappear completely.

And yes, this is conceptually a bit like arm-smmu's stream indexing I think.


> 

> > +	u32 num_devices;

> > +

> > +	struct list_head stream_head;

> > +};

> > +

> > +/*

> > + * This structure is attached to each iommu domain handled by a DART.

> > + * A single domain is used to represent a single virtual address space.

> > + * It is always allocated together with a page table.

> > + *

> > + * Streams are the smallest units the DART hardware can differentiate.

> > + * These are pointed to the page table of a domain whenever a device is

> > + * attached to it. A single stream can only be assigned to a single domain.

> > + *

> > + * Devices are assigned to at least a single and sometimes multiple individual

> > + * streams (using the iommus property in the device tree). Multiple devices

> > + * can theoretically be represented by the same stream, though this is usually

> > + * not the case.

> > + *

> > + * We only keep track of streams here and just count how many devices are

> > + * represented by each stream. When the last device is removed the whole stream

> > + * is removed from the domain.

> > + *

> > + * @dart: pointer to the DART instance

> > + * @pgtbl_ops: pagetable ops allocated by io-pgtable

> > + * @type: domain type IOMMU_DOMAIN_IDENTITY_{IDENTITY,DMA,UNMANAGED,BLOCKED}

> > + * @sw_bypass_cpu_start: offset into cpu address space in software bypass mode

> > + * @sw_bypass_dma_start: offset into dma address space in software bypass mode

> > + * @sw_bypass_len: length of iova space in software bypass mode

> > + * @streams: list of streams attached to this domain

> > + * @lock: spinlock for operations involving the list of streams

> > + * @domain: core iommu domain pointer

> > + */

> > +struct apple_dart_domain {

> > +	struct apple_dart *dart;

> > +	struct io_pgtable_ops *pgtbl_ops;

> > +

> > +	unsigned int type;

> 

> Given that this is assigned from domain->type it appears to be redundant.


Yup, removed.

> 

> > +	u64 sw_bypass_cpu_start;

> > +	u64 sw_bypass_dma_start;

> > +	u64 sw_bypass_len;

> > +

> > +	struct list_head streams;

> 

> I'm starting to think this could just be a bitmap, in a u16 even.


The problem is that these streams may come from two different
DART instances. That is required for e.g. the dwc3 controller which
has a weird quirk where DMA transactions go through two separate
DARTs with no clear pattern (e.g. some xhci control structures use the
first dart while other structures use the second one).
Both of them need to point to the same pagetable.
In the device tree the node will have an entry like this:

dwc3_0: usb@382280000{
   ...
   iommus = <&dwc3_0_dart_0 0>, <&dwc3_0_dart_1 1>;
};

There's no need for a linked list though once I do this properly with
groups. I can just use an array allocated when the first device is
attached, which just contains apple_dart* and streamid values.
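
Roughly what I have in mind, as a userspace sketch. The names (stream_ref, domain_model, domain_add_stream) and the fixed capacity are placeholders; the real version would size the array when the first device is attached:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_STREAM_REFS 4 /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for struct apple_dart. */
struct dart_model {
	int id;
};

/* One (DART instance, stream ID) pair, as in the iommus property. */
struct stream_ref {
	struct dart_model *dart;
	unsigned int sid;
};

/* Hypothetical stand-in for the stream bookkeeping in apple_dart_domain. */
struct domain_model {
	struct stream_ref streams[MODEL_MAX_STREAM_REFS];
	size_t num_streams;
};

/*
 * Record a (dart, sid) pair for the domain. Adding the same pair twice
 * is a no-op. Returns 0 on success and -1 if the array is full.
 */
static int domain_add_stream(struct domain_model *dom,
			     struct dart_model *dart, unsigned int sid)
{
	size_t i;

	assert(dom != NULL && dart != NULL);

	for (i = 0; i < dom->num_streams; i++) {
		if (dom->streams[i].dart == dart && dom->streams[i].sid == sid)
			return 0;
	}

	if (dom->num_streams == MODEL_MAX_STREAM_REFS)
		return -1;

	dom->streams[dom->num_streams].dart = dart;
	dom->streams[dom->num_streams].sid = sid;
	dom->num_streams++;
	return 0;
}
```

For the dwc3 node above this ends up with two entries, one per DART, both pointing at the same pagetable.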


> 

> > +

> > +	spinlock_t lock;

> > +

> > +	struct iommu_domain domain;

> > +};

> > +

> > +/*

> > + * This structure is attached to devices with dev_iommu_priv_set() on of_xlate

> > + * and contains a list of streams bound to this device as defined in the

> > + * device tree. Multiple DART instances can be attached to a single device

> > + * and each stream is identified by its stream id.

> > + * It's usually reference by a pointer called *cfg.

> > + *

> > + * A dynamic array instead of a linked list is used here since in almost

> > + * all cases a device will just be attached to a single stream and streams

> > + * are never removed after they have been added.

> > + *

> > + * @num_streams: number of streams attached

> > + * @streams: array of structs to identify attached streams and the device link

> > + *           to the iommu

> > + */

> > +struct apple_dart_master_cfg {

> > +	int num_streams;

> > +	struct {

> > +		struct apple_dart *dart;

> > +		u32 sid;

> 

> Can't you use the fwspec for this?



I'd be happy to use the fwspec code if that's somehow possible.
I'm not sure how though since I need to store both the reference to the DART
_and_ to the stream id. As far as I can tell the fwspec code would only allow
to store the stream ids.
(see also the previous comment regarding the dwc3 node which requires stream
ids from two separate DART instances)

> 

> > +		struct device_link *link;

> 

> Is it necessary to use stateless links, or could you use 

> DL_FLAG_AUTOREMOVE_SUPPLIER and not have to keep track of them manually?


I'll just use DL_FLAG_AUTOREMOVE_SUPPLIER. No idea why I went for stateless links.

>

[...]
> > +	/* restore stream identity map */

> > +	writel(0x03020100, dart->regs + DART_STREAM_REMAP);

> > +	writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4);

> > +	writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8);

> > +	writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12);

> 

> Any hint of what the magic numbers mean?


Yes, it's just 0,1,2,3...,0xe,0xf but I can't do 8bit writes to the bus
and 32 bit writes then require these slightly awkward "swapped" numbers.
I'll add a comment since it's not obvious at first glance.
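
For illustration, the four constants can be derived instead of hard-coded, assuming byte j of remap register n holds the target for stream 4*n + j. identity_remap_word is a hypothetical helper and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

#define STREAMS_PER_REMAP_REG 4 /* one byte per stream, four per 32-bit register */

/*
 * Build the 32-bit value that maps each of the four streams covered by
 * remap register reg onto itself (stream n -> n).
 */
static uint32_t identity_remap_word(unsigned int reg)
{
	uint32_t val = 0;
	unsigned int j;

	/* each stream index must fit in one byte */
	assert(reg < 256 / STREAMS_PER_REMAP_REG);

	for (j = 0; j < STREAMS_PER_REMAP_REG; j++)
		val |= (uint32_t)(STREAMS_PER_REMAP_REG * reg + j) << (8 * j);

	return val;
}
```

Calling it for registers 0 through 3 gives 0x03020100, 0x07060504, 0x0b0a0908 and 0x0f0e0d0c, the same values written above.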

> 

> > +	/* clear any pending errors before the interrupt is unmasked */

> > +	writel(readl(dart->regs + DART_ERROR), dart->regs + DART_ERROR);

> > +

> > +	return apple_dart_hw_invalidate_tlb_global(dart);

> > +}

> > +

> > +static void apple_dart_domain_flush_tlb(struct apple_dart_domain *domain)

> > +{

> > +	unsigned long flags;

> > +	struct apple_dart_stream *stream;

> > +	struct apple_dart *dart = domain->dart;

> > +

> > +	if (!dart)

> > +		return;

> 

> Can that happen? Feels like it's probably a bug elsewhere if it could :/


No, this can't happen. I'll remove it.

> 

> > +	spin_lock_irqsave(&domain->lock, flags);

> > +	list_for_each_entry(stream, &domain->streams, stream_head) {

> > +		apple_dart_hw_invalidate_tlb_stream(stream->dart, stream->sid);

> > +	}

> > +	spin_unlock_irqrestore(&domain->lock, flags);

> > +}

> > +

> > +static void apple_dart_flush_iotlb_all(struct iommu_domain *domain)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +

> > +	apple_dart_domain_flush_tlb(dart_domain);

> > +}

> > +

> > +static void apple_dart_iotlb_sync(struct iommu_domain *domain,

> > +				  struct iommu_iotlb_gather *gather)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +

> > +	apple_dart_domain_flush_tlb(dart_domain);

> > +}

> > +

> > +static void apple_dart_iotlb_sync_map(struct iommu_domain *domain,

> > +				      unsigned long iova, size_t size)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +

> > +	apple_dart_domain_flush_tlb(dart_domain);

> > +}

> > +

> > +static void apple_dart_tlb_flush_all(void *cookie)

> > +{

> > +	struct apple_dart_domain *domain = cookie;

> > +

> > +	apple_dart_domain_flush_tlb(domain);

> > +}

> > +

> > +static void apple_dart_tlb_flush_walk(unsigned long iova, size_t size,

> > +				      size_t granule, void *cookie)

> > +{

> > +	struct apple_dart_domain *domain = cookie;

> > +

> > +	apple_dart_domain_flush_tlb(domain);

> > +}

> > +

> > +static const struct iommu_flush_ops apple_dart_tlb_ops = {

> > +	.tlb_flush_all = apple_dart_tlb_flush_all,

> > +	.tlb_flush_walk = apple_dart_tlb_flush_walk,

> > +	.tlb_add_page = NULL,

> > +};

> > +

> > +static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,

> > +					   dma_addr_t iova)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;

> > +

> > +	if (domain->type == IOMMU_DOMAIN_IDENTITY &&

> > +	    dart_domain->dart->supports_bypass)

> 

> That second check seems redundant - if you don't support bypass surely 

> you shouldn't have allowed attaching an identity domain in the first 

> place? And even if you fake one with a pagetable you shouldn't need to 

> walk it, for obvious reasons ;)


True, and with the patch you sent I don't need this here either way.

> 

> TBH, dealing with identity domains in iova_to_phys at all irks me - it's 

> largely due to dubious hacks in networking drivers which hopefully you 

> should never have to deal with on M1 anyway, and either way it's not 

> like they can't check the domain type themselves and save a pointless 

> indirect call altogether :(

> 

> > +		return iova;

> > +	if (!ops)

> > +		return -ENODEV;

> > +

> > +	return ops->iova_to_phys(ops, iova);

> > +}

> > +

> > +static int apple_dart_map(struct iommu_domain *domain, unsigned long iova,

> > +			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;

> > +

> > +	if (!ops)

> > +		return -ENODEV;

> > +	if (prot & IOMMU_MMIO)

> > +		return -EINVAL;

> > +	if (prot & IOMMU_NOEXEC)

> > +		return -EINVAL;

> 

> Hmm, I guess the usual expectation is just to ignore any prot flags you 

> can't enforce - after all, some IOMMUs don't even have a notion of read 

> or write permissions.


Sure, I'll just remove those checks.
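
If the flags ever did need to be dropped explicitly instead of just passed through, masking would be enough. A tiny sketch with made-up flag values standing in for the IOMMU_* prot bits (the MODEL_* names and enforceable_prot are hypothetical):

```c
#include <assert.h>

/* Made-up values standing in for the IOMMU_* prot flags. */
#define MODEL_PROT_READ   (1 << 0)
#define MODEL_PROT_WRITE  (1 << 1)
#define MODEL_PROT_NOEXEC (1 << 3)
#define MODEL_PROT_MMIO   (1 << 4)

/* Assumption for this sketch: the hardware enforces read and write only. */
#define MODEL_PROT_ENFORCEABLE (MODEL_PROT_READ | MODEL_PROT_WRITE)

/* Drop any prot flags the hardware cannot enforce instead of failing. */
static int enforceable_prot(int prot)
{
	assert(prot >= 0);
	return prot & MODEL_PROT_ENFORCEABLE;
}
```

A request for read, write and noexec then maps as plain read/write and no error is returned.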

> 

> > +	return ops->map(ops, iova, paddr, size, prot, gfp);

> > +}

> > +

> > +static size_t apple_dart_unmap(struct iommu_domain *domain, unsigned long iova,

> > +			       size_t size, struct iommu_iotlb_gather *gather)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;

> > +

> > +	if (!ops)

> > +		return 0;

> 

> That should never legitimately happen, since no previous mapping could 

> have succeeded either.


Ack, removed.

> 

> > +	return ops->unmap(ops, iova, size, gather);

> > +}

> > +

> > +static int apple_dart_prepare_sw_bypass(struct apple_dart *dart,

> > +					struct apple_dart_domain *dart_domain,

> > +					struct device *dev)

> > +{

> > +	lockdep_assert_held(&dart_domain->lock);

> > +

> > +	if (dart->supports_bypass)

> > +		return 0;

> > +	if (dart_domain->type != IOMMU_DOMAIN_IDENTITY)

> > +		return 0;

> > +

> > +	// use the bus region from the first attached dev for the bypass range

> > +	if (!dart->sw_bypass_len) {

> > +		const struct bus_dma_region *dma_rgn = dev->dma_range_map;

> > +

> > +		if (!dma_rgn)

> > +			return -EINVAL;

> > +

> > +		dart->sw_bypass_len = dma_rgn->size;

> > +		dart->sw_bypass_cpu_start = dma_rgn->cpu_start;

> > +		dart->sw_bypass_dma_start = dma_rgn->dma_start;

> > +	}

> > +

> > +	// ensure that we don't mix different bypass setups

> > +	if (dart_domain->sw_bypass_len) {

> > +		if (dart->sw_bypass_len != dart_domain->sw_bypass_len)

> > +			return -EINVAL;

> > +		if (dart->sw_bypass_cpu_start !=

> > +		    dart_domain->sw_bypass_cpu_start)

> > +			return -EINVAL;

> > +		if (dart->sw_bypass_dma_start !=

> > +		    dart_domain->sw_bypass_dma_start)

> > +			return -EINVAL;

> > +	} else {

> > +		dart_domain->sw_bypass_len = dart->sw_bypass_len;

> > +		dart_domain->sw_bypass_cpu_start = dart->sw_bypass_cpu_start;

> > +		dart_domain->sw_bypass_dma_start = dart->sw_bypass_dma_start;

> > +	}

> > +

> > +	return 0;

> > +}

> > +

> > +static int apple_dart_domain_needs_pgtbl_ops(struct apple_dart *dart,

> > +					     struct iommu_domain *domain)

> > +{

> > +	if (domain->type == IOMMU_DOMAIN_DMA)

> > +		return 1;

> > +	if (domain->type == IOMMU_DOMAIN_UNMANAGED)

> > +		return 1;

> > +	if (!dart->supports_bypass && domain->type == IOMMU_DOMAIN_IDENTITY)

> > +		return 1;

> > +	return 0;

> > +}

> > +

> > +static int apple_dart_finalize_domain(struct iommu_domain *domain)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +	struct apple_dart *dart = dart_domain->dart;

> > +	struct io_pgtable_cfg pgtbl_cfg;

> > +

> > +	lockdep_assert_held(&dart_domain->lock);

> > +

> > +	if (dart_domain->pgtbl_ops)

> > +		return 0;

> > +	if (!apple_dart_domain_needs_pgtbl_ops(dart, domain))

> > +		return 0;

> > +

> > +	pgtbl_cfg = (struct io_pgtable_cfg){

> > +		.pgsize_bitmap = dart->pgsize,

> > +		.ias = 32,

> > +		.oas = 36,

> > +		.coherent_walk = 1,

> > +		.tlb = &apple_dart_tlb_ops,

> > +		.iommu_dev = dart->dev,

> > +	};

> > +

> > +	dart_domain->pgtbl_ops =

> > +		alloc_io_pgtable_ops(ARM_APPLE_DART, &pgtbl_cfg, domain);

> > +	if (!dart_domain->pgtbl_ops)

> > +		return -ENOMEM;

> > +

> > +	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;

> > +	domain->geometry.aperture_start = 0;

> > +	domain->geometry.aperture_end = DMA_BIT_MASK(32);

> > +	domain->geometry.force_aperture = true;

> > +

> > +	/*

> > +	 * Some DARTs come without hardware bypass support but we may still

> > +	 * be forced to use bypass mode (to e.g. allow kernels with 4K pages to

> > +	 * boot). If we reach this point with an identity domain we have to setup

> > +	 * bypass mode in software. This is done by creating a static pagetable

> > +	 * for a linear map specified by dma-ranges in the device tree.

> > +	 */

> > +	if (domain->type == IOMMU_DOMAIN_IDENTITY) {

> > +		u64 offset;

> > +		int ret;

> > +

> > +		for (offset = 0; offset < dart_domain->sw_bypass_len;

> > +		     offset += dart->pgsize) {

> > +			ret = dart_domain->pgtbl_ops->map(

> > +				dart_domain->pgtbl_ops,

> > +				dart_domain->sw_bypass_dma_start + offset,

> > +				dart_domain->sw_bypass_cpu_start + offset,

> > +				dart->pgsize, IOMMU_READ | IOMMU_WRITE,

> > +				GFP_ATOMIC);

> > +			if (ret < 0) {

> > +				free_io_pgtable_ops(dart_domain->pgtbl_ops);

> > +				dart_domain->pgtbl_ops = NULL;

> > +				return -EINVAL;

> > +			}

> > +		}

> 

> Could you set up a single per-DART pagetable in prepare_sw_bypass (or 

> even better at probe time if you think you're likely to need it) and 

> just share that between all fake identity domains? That could be a 

> follow-up optimisation, though.


I'll see if that's possible. So essentially I want to set up an identity
mapping with respect to bus_dma_region from the first attached device.
Right now this is always mapping the entire 4G VA space to RAM
starting at 0x8_0000_0000.
See also my reply to another comment further down since software
bypass mode might have to disappear anyway.

> 

> > +	}

> > +

> > +	return 0;

> > +}

> > +

> > +static void

> > +apple_dart_stream_setup_translation(struct apple_dart_domain *domain,

> > +				    struct apple_dart *dart, u32 sid)

> > +{

> > +	int i;

> > +	struct io_pgtable_cfg *pgtbl_cfg =

> > +		&io_pgtable_ops_to_pgtable(domain->pgtbl_ops)->cfg;

> > +

> > +	for (i = 0; i < pgtbl_cfg->apple_dart_cfg.n_ttbrs; ++i)

> > +		apple_dart_hw_set_ttbr(dart, sid, i,

> > +				       pgtbl_cfg->apple_dart_cfg.ttbr[i]);

> > +	for (; i < DART_MAX_TTBR; ++i)

> > +		apple_dart_hw_clear_ttbr(dart, sid, i);

> > +

> > +	apple_dart_hw_enable_translation(dart, sid);

> > +	apple_dart_hw_invalidate_tlb_stream(dart, sid);

> > +}

> > +

> > +static int apple_dart_attach_stream(struct apple_dart_domain *domain,

> > +				    struct apple_dart *dart, u32 sid)

> > +{

> > +	unsigned long flags;

> > +	struct apple_dart_stream *stream;

> > +	int ret;

> > +

> > +	lockdep_assert_held(&domain->lock);

> > +

> > +	if (WARN_ON(dart->force_bypass &&

> > +		    domain->type != IOMMU_DOMAIN_IDENTITY))

> > +		return -EINVAL;

> 

> Ideally you shouldn't allow that to happen, but I guess if you have 

> mixed capabilities across different instances then in principle an 

> unmanaged domain could still slip through. But then again a user of an 

> unmanaged domain might be OK with using larger pages anyway. Either way 

> I'm not sure it's worthy of a WARN (similarly below) since it doesn't 

> represent a "this should never happen" condition if the user has got 

> their hands on a VFIO driver and is mucking about, it's just a normal 

> failure because you can't support the attachment.


Makes sense, will remove that WARN (and the others below as well).

> 

> > +	/*

> > +	 * we can't mix and match DARTs that support bypass mode with those who don't

> > +	 * because the iova space in fake bypass mode generally has an offset

> > +	 */

> 

> Erm, something doesn't sound right there... IOMMU_DOMAIN_IDENTITY should 

> be exactly what it says, regardless of how it's implemented. If you 

> can't provide a true identity mapping then you're probably better off 

> not pretending to support them in the first place.


Some background: the PCIe DART only supports a 32bit VA space but RAM
on these machines starts at 0x8_0000_0000. I have something like 
  dma-ranges = <0x42000000 0 0 0x8 0 0 0xffff0000>;
in the pcie nodes to add that offset to dma addresses.

What I want to do here then is to set up an identity mapping with respect
to the DMA layer's understanding of addresses encoded in bus_dma_region.
Now this will always just be a constant offset of 0x8_0000_0000 for
all M1s but I didn't want to hardcode that.
The code here is just there to guard against a situation where someone
somehow manages to attach two devices with different offsets to the same
domain.
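
As a rough illustration of that constant-offset mapping, here is a minimal
userspace sketch. It is not driver code: the struct and both helper names are
hypothetical stand-ins for the three values copied out of bus_dma_region
(cpu_start, dma_start, size), and it only models the address arithmetic and
the "same bypass setup" comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the values taken from bus_dma_region. */
struct ex_bypass_region {
	uint64_t cpu_start;	/* CPU physical address of the window */
	uint64_t dma_start;	/* device-visible address of the window */
	uint64_t size;		/* length of the window in bytes */
};

/*
 * Translate a device-side address into a CPU physical address using the
 * linear window. Returns false if the address lies outside the window.
 */
static bool ex_bypass_dma_to_cpu(const struct ex_bypass_region *r,
				 uint64_t dma, uint64_t *cpu)
{
	if (dma < r->dma_start || dma - r->dma_start >= r->size)
		return false;

	*cpu = r->cpu_start + (dma - r->dma_start);
	return true;
}

/* Two devices may only share a fake identity domain if their windows match. */
static bool ex_bypass_region_equal(const struct ex_bypass_region *a,
				   const struct ex_bypass_region *b)
{
	return a->size == b->size &&
	       a->cpu_start == b->cpu_start &&
	       a->dma_start == b->dma_start;
}
```

With the dma-ranges entry quoted above, the window would be dma_start = 0,
cpu_start = 0x8_0000_0000 and size = 0xffff0000, so device address 0x1000
would resolve to 0x8_0000_1000.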

If that's not how the abstraction is supposed to work and/or too big of a hack
I'll just remove the software bypass mode altogether.
PCIe won't work on 4k kernels then but the only people using this so far 
build their own kernels with patches either way and won't complain.
And by the time Linux will actually be useful for "normal" setups
the dma-iommu layer can hopefully just handle a larger page granularity.



> 

> > +	if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY &&

> > +		    (domain->dart->supports_bypass != dart->supports_bypass)))

> > +		return -EINVAL;

> > +

> > +	list_for_each_entry(stream, &domain->streams, stream_head) {

> > +		if (stream->dart == dart && stream->sid == sid) {

> > +			stream->num_devices++;

> > +			return 0;

> > +		}

> > +	}

> > +

> > +	spin_lock_irqsave(&dart->lock, flags);

> > +

> > +	if (WARN_ON(dart->used_sids & BIT(sid))) {

> > +		ret = -EINVAL;

> > +		goto error;

> > +	}

> > +

> > +	stream = kzalloc(sizeof(*stream), GFP_ATOMIC);

> > +	if (!stream) {

> > +		ret = -ENOMEM;

> > +		goto error;

> > +	}

> 

> Couldn't you do this outside the lock? (If, calling back to other 

> comments, it can't get refactored out of existence anyway)


Probably, but I'll first see if I can just refactor it away.

> 

> > +	stream->dart = dart;

> > +	stream->sid = sid;

> > +	stream->num_devices = 1;

> > +	list_add(&stream->stream_head, &domain->streams);

> > +

> > +	dart->used_sids |= BIT(sid);

> > +	spin_unlock_irqrestore(&dart->lock, flags);

> > +

> > +	apple_dart_hw_clear_all_ttbrs(stream->dart, stream->sid);

> > +

> > +	switch (domain->type) {

> > +	case IOMMU_DOMAIN_IDENTITY:

> > +		if (stream->dart->supports_bypass)

> > +			apple_dart_hw_enable_bypass(stream->dart, stream->sid);

> > +		else

> > +			apple_dart_stream_setup_translation(

> > +				domain, stream->dart, stream->sid);

> > +		break;

> > +	case IOMMU_DOMAIN_BLOCKED:

> > +		apple_dart_hw_disable_dma(stream->dart, stream->sid);

> > +		break;

> > +	case IOMMU_DOMAIN_UNMANAGED:

> > +	case IOMMU_DOMAIN_DMA:

> > +		apple_dart_stream_setup_translation(domain, stream->dart,

> > +						    stream->sid);

> > +		break;

> > +	}

> > +

> > +	return 0;

> > +

> > +error:

> > +	spin_unlock_irqrestore(&dart->lock, flags);

> > +	return ret;

> > +}

> > +

> > +static void apple_dart_disable_stream(struct apple_dart *dart, u32 sid)

> > +{

> > +	unsigned long flags;

> > +

> > +	apple_dart_hw_disable_dma(dart, sid);

> > +	apple_dart_hw_clear_all_ttbrs(dart, sid);

> > +	apple_dart_hw_invalidate_tlb_stream(dart, sid);

> > +

> > +	spin_lock_irqsave(&dart->lock, flags);

> > +	dart->used_sids &= ~BIT(sid);

> > +	spin_unlock_irqrestore(&dart->lock, flags);

> > +}

> > +

> > +static void apple_dart_detach_stream(struct apple_dart_domain *domain,

> > +				     struct apple_dart *dart, u32 sid)

> > +{

> > +	struct apple_dart_stream *stream;

> > +

> > +	lockdep_assert_held(&domain->lock);

> > +

> > +	list_for_each_entry(stream, &domain->streams, stream_head) {

> > +		if (stream->dart == dart && stream->sid == sid) {

> > +			stream->num_devices--;

> > +

> > +			if (stream->num_devices == 0) {

> > +				apple_dart_disable_stream(dart, sid);

> > +				list_del(&stream->stream_head);

> > +				kfree(stream);

> > +			}

> > +			return;

> > +		}

> > +	}

> > +}

> > +

> > +static int apple_dart_attach_dev(struct iommu_domain *domain,

> > +				 struct device *dev)

> > +{

> > +	int ret;

> > +	int i, j;

> > +	unsigned long flags;

> > +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +	struct apple_dart *dart = cfg->streams[0].dart;

> > +

> > +	if (WARN_ON(dart->force_bypass &&

> > +		    dart_domain->type != IOMMU_DOMAIN_IDENTITY)) {

> > +		dev_warn(

> > +			dev,

> > +			"IOMMU must be in bypass mode but trying to attach to translated domain.\n");

> > +		return -EINVAL;

> > +	}

> 

> Again, a bit excessive with the warnings. In fact, transpose my comment 

> from apple_dart_attach_stream() to here, because this means the 

> equivalent warning there is literally unreachable :/


Okay, I'll go through the code paths again, get rid of these warnings and
make sure I don't check the same thing more than once.

> 

[...]
> > +

> > +static void apple_dart_release_device(struct device *dev)

> > +{

> > +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> > +	int i;

> > +

> > +	if (!cfg)

> > +		return;

> 

> Shouldn't happen - if it's disappeared since probe_device succeeded 

> you've got bigger problems anyway.


Ok, will remove it.

> 

> > +

> > +	for (i = 0; i < cfg->num_streams; ++i)

> > +		device_link_del(cfg->streams[i].link);

> > +

> > +	dev_iommu_priv_set(dev, NULL);

> > +	kfree(cfg);

> > +}

> > +

> > +static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)

> > +{

> > +	struct apple_dart_domain *dart_domain;

> > +

> > +	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&

> > +	    type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED)

> > +		return NULL;

> 

> I want to say there's not much point in that, but then I realise I've 

> spent the last couple of days writing patches to add a new domain type :)


Hah! Just because I'm curious: What is that new domain type going to be? :)

> 

> > +	dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL);

> > +	if (!dart_domain)

> > +		return NULL;

> > +

> > +	INIT_LIST_HEAD(&dart_domain->streams);

> > +	spin_lock_init(&dart_domain->lock);

> > +	iommu_get_dma_cookie(&dart_domain->domain);

> > +	dart_domain->type = type;

> 

> Yeah, this is "useful" for a handful of CPU cycles until we return and 

> iommu_domain_alloc() sets dart_domain->domain->type to the same thing, 

> all before *its* caller even knows the domain exists.


True, will remove it.

> 

> > +	return &dart_domain->domain;

> > +}

> > +

> > +static void apple_dart_domain_free(struct iommu_domain *domain)

> > +{

> > +	struct apple_dart_domain *dart_domain = to_dart_domain(domain);

> > +

> > +	WARN_ON(!list_empty(&dart_domain->streams));

> 

> Why? This code is perfectly legal API usage:

> 

> 	d = iommu_domain_alloc(bus)

> 	if (d)

> 		iommu_domain_free(d);

> 

> Sure it looks pointless, but it's the kind of thing that can 

> legitimately happen (with a lot more going on in between) if an 

> unmanaged domain user tears itself down before it gets round to 

> attaching, due to probe deferral or some other error condition.


Ah, makes sense. I'll remove the warning!

> 

> > +	kfree(dart_domain);

> > +}

> > +

> > +static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)

> > +{

> > +	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);

> > +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> > +	unsigned int num_streams = cfg ? cfg->num_streams : 0;

> > +	struct apple_dart_master_cfg *cfg_new;

> > +	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);

> > +

> > +	if (args->args_count != 1)

> > +		return -EINVAL;

> > +

> > +	cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1),

> > +			   GFP_KERNEL);

> > +	if (!cfg_new)

> > +		return -ENOMEM;

> > +

> > +	cfg = cfg_new;

> > +	dev_iommu_priv_set(dev, cfg);

> > +

> > +	cfg->num_streams = num_streams;

> > +	cfg->streams[cfg->num_streams].dart = dart;

> > +	cfg->streams[cfg->num_streams].sid = args->args[0];

> > +	cfg->num_streams++;

> 

> Yeah, this is way too reminiscent of the fwspec code for comfort. Even 

> if you can't use autoremove links for some reason, an array of 16 

> device_link pointers hung off apple_dart still wins over these little 

> pointer-heavy structures if you need more than a few of them.


I can get rid of the links, but I'll still need some way to store
both the apple_dart and the sid here. As mentioned above, I'll
be happy to reuse the fwspec code but I don't see how yet.

> 

> > +	return 0;

> > +}

> > +

> > +static struct iommu_group *apple_dart_device_group(struct device *dev)

> > +{

> > +#ifdef CONFIG_PCI

> > +	struct iommu_group *group;

> > +

> > +	if (dev_is_pci(dev))

> > +		group = pci_device_group(dev);

> > +	else

> > +		group = generic_device_group(dev);

> 

> ...and this is where it gets bad :(

> 

> If you can have multiple devices behind the same stream such that the 

> IOMMU can't tell them apart, you *have* to ensure they get put in the 

> same group, so that the IOMMU core knows the topology (and reflects it 

> correctly to userspace) and doesn't try to do things that then 

> unexpectedly fail. This is the point where you need to check if a stream 

> is already known, and return the existing group if so, and then you 

> won't need to check and refcount all the time in attach/detach because 

> the IOMMU core will do the right thing for you.

> 

> Many drivers only run on systems where devices don't alias at the IOMMU 

> level (aliasing at the PCI level is already taken care of), or use a 

> single group because effectively everything aliases, so it's not the 

> most common scenario, but as I mentioned before arm-smmu is one that 

> does - take a look at the flow through that in the "!smmu->smrs" cases 

> for the closest example.


Okay, this is very good to know. Thanks again for the pointer to the
arm-smmu code, it really helped me understand how iommu_groups are
supposed to work. I'll do this the proper way for v5, which should also
simplify this driver :-)
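
The "return the existing group if the stream is already known" idea can be
modelled in a few lines of userspace C. This is only a sketch under stated
assumptions: struct ex_group, struct ex_dart, EX_MAX_STREAMS (16, matching the
16 streams discussed in this thread) and ex_device_group() are all
hypothetical, and a real driver would hand out and reference-count struct
iommu_group objects through the IOMMU core instead of calling calloc().

```c
#include <stddef.h>
#include <stdlib.h>

#define EX_MAX_STREAMS 16	/* assumption: 16 stream ids per DART */

/* Hypothetical stand-in for struct iommu_group. */
struct ex_group {
	int refcount;
};

/* Hypothetical per-DART state: one group pointer per stream id. */
struct ex_dart {
	struct ex_group *sid2group[EX_MAX_STREAMS];
};

/*
 * Devices that share a stream cannot be told apart by the IOMMU, so they
 * must end up in the same group: reuse the group already recorded for this
 * stream id, and only allocate a new one for a stream seen the first time.
 */
static struct ex_group *ex_device_group(struct ex_dart *dart, unsigned int sid)
{
	struct ex_group *group;

	if (sid >= EX_MAX_STREAMS)
		return NULL;

	group = dart->sid2group[sid];
	if (group) {
		group->refcount++;
		return group;
	}

	group = calloc(1, sizeof(*group));
	if (!group)
		return NULL;

	group->refcount = 1;
	dart->sid2group[sid] = group;
	return group;
}
```

Once grouping is decided at this level, attach and detach no longer need a
per-stream device count, which is the simplification hinted at above.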


> 

> > +

> > +	return group;

> > +#else

> > +	return generic_device_group(dev);

> > +#endif

> > +}

> > +

> > +static int apple_dart_def_domain_type(struct device *dev)

> > +{

> > +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> > +	struct apple_dart *dart = cfg->streams[0].dart;

> > +

> > +	if (dart->force_bypass)

> > +		return IOMMU_DOMAIN_IDENTITY;

> > +	if (!dart->supports_bypass)

> > +		return IOMMU_DOMAIN_DMA;

> > +

> > +	return 0;

> > +}

> > +

> > +static const struct iommu_ops apple_dart_iommu_ops = {

> > +	.domain_alloc = apple_dart_domain_alloc,

> > +	.domain_free = apple_dart_domain_free,

> > +	.attach_dev = apple_dart_attach_dev,

> > +	.detach_dev = apple_dart_detach_dev,

> > +	.map = apple_dart_map,

> > +	.unmap = apple_dart_unmap,

> > +	.flush_iotlb_all = apple_dart_flush_iotlb_all,

> > +	.iotlb_sync = apple_dart_iotlb_sync,

> > +	.iotlb_sync_map = apple_dart_iotlb_sync_map,

> > +	.iova_to_phys = apple_dart_iova_to_phys,

> > +	.probe_device = apple_dart_probe_device,

> > +	.release_device = apple_dart_release_device,

> > +	.device_group = apple_dart_device_group,

> > +	.of_xlate = apple_dart_of_xlate,

> > +	.def_domain_type = apple_dart_def_domain_type,

> > +	.pgsize_bitmap = -1UL, /* Restricted during dart probe */

> > +};

> > +

> > +static irqreturn_t apple_dart_irq(int irq, void *dev)

> > +{

> > +	struct apple_dart *dart = dev;

> > +	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,

> > +				      DEFAULT_RATELIMIT_BURST);

> > +	const char *fault_name = NULL;

> > +	u32 error = readl(dart->regs + DART_ERROR);

> > +	u32 error_code = FIELD_GET(DART_ERROR_CODE, error);

> > +	u32 addr_lo = readl(dart->regs + DART_ERROR_ADDR_LO);

> > +	u32 addr_hi = readl(dart->regs + DART_ERROR_ADDR_HI);

> > +	u64 addr = addr_lo | (((u64)addr_hi) << 32);

> > +	u8 stream_idx = FIELD_GET(DART_ERROR_STREAM, error);

> > +

> > +	if (!(error & DART_ERROR_FLAG))

> > +		return IRQ_NONE;

> > +

> > +	if (error_code & DART_ERROR_READ_FAULT)

> > +		fault_name = "READ FAULT";

> > +	else if (error_code & DART_ERROR_WRITE_FAULT)

> > +		fault_name = "WRITE FAULT";

> > +	else if (error_code & DART_ERROR_NO_PTE)

> > +		fault_name = "NO PTE FOR IOVA";

> > +	else if (error_code & DART_ERROR_NO_PMD)

> > +		fault_name = "NO PMD FOR IOVA";

> > +	else if (error_code & DART_ERROR_NO_TTBR)

> > +		fault_name = "NO TTBR FOR IOVA";

> 

> Can multiple bits be set at once or is there a strict precedence?


I'll double check and either add a comment that there's a precedence or
print names for all bits that are set.
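
For the second option, printing a name for every bit that is set, a minimal
userspace sketch could look like the following. The bit positions are
placeholders chosen only for illustration (the real DART_ERROR_* values are
not shown in this thread), and ex_describe_faults() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit assignments, NOT the real register layout. */
#define EX_ERROR_READ_FAULT	(1u << 4)
#define EX_ERROR_WRITE_FAULT	(1u << 3)
#define EX_ERROR_NO_PTE		(1u << 2)
#define EX_ERROR_NO_PMD		(1u << 1)
#define EX_ERROR_NO_TTBR	(1u << 0)

struct ex_fault_name {
	uint32_t bit;
	const char *name;
};

static const struct ex_fault_name ex_fault_names[] = {
	{ EX_ERROR_READ_FAULT,	"READ FAULT" },
	{ EX_ERROR_WRITE_FAULT,	"WRITE FAULT" },
	{ EX_ERROR_NO_PTE,	"NO PTE FOR IOVA" },
	{ EX_ERROR_NO_PMD,	"NO PMD FOR IOVA" },
	{ EX_ERROR_NO_TTBR,	"NO TTBR FOR IOVA" },
};

/*
 * Write a comma-separated list of all fault names whose bit is set in
 * 'code' into 'buf', or "unknown" if no known bit is set. Returns the
 * number of known bits found; output is silently cut off if 'buf' is
 * too small.
 */
static int ex_describe_faults(uint32_t code, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int matched = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(ex_fault_names) / sizeof(ex_fault_names[0]); i++) {
		int n;

		if (!(code & ex_fault_names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     matched ? ", " : "", ex_fault_names[i].name);
		matched++;
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer full: stop appending */
		used += (size_t)n;
	}

	if (!matched)
		snprintf(buf, len, "unknown");

	return matched;
}
```

A table like this also removes the implicit precedence of an if/else chain,
so the question of which bit "wins" no longer arises.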

> 

> > +	if (WARN_ON(fault_name == NULL))

> 

> You're already logging a clear and attributable message below; I can 

> guarantee that a big noisy backtrace showing that you got here from 

> el0_irq() or el1_irq() is not useful over and above that.

> 

> > +		fault_name = "unknown";

> > +

> > +	if (__ratelimit(&rs)) {

> 

> Just use dev_err_ratelimited() to hide the guts if you're not doing 

> anything tricky.


Ack.

> 

> > +		dev_err(dart->dev,

> > +			"translation fault: status:0x%x stream:%d code:0x%x (%s) at 0x%llx",

> > +			error, stream_idx, error_code, fault_name, addr);

> > +	}

> > +

> > +	writel(error, dart->regs + DART_ERROR);

> > +	return IRQ_HANDLED;

> > +}

> > +

> > +static int apple_dart_probe(struct platform_device *pdev)

> > +{

> > +	int ret;

> > +	u32 dart_params[2];

> > +	struct resource *res;

> > +	struct apple_dart *dart;

> > +	struct device *dev = &pdev->dev;

> > +

> > +	dart = devm_kzalloc(dev, sizeof(*dart), GFP_KERNEL);

> > +	if (!dart)

> > +		return -ENOMEM;

> > +

> > +	dart->dev = dev;

> > +	spin_lock_init(&dart->lock);

> > +

> > +	if (pdev->num_resources < 1)

> > +		return -ENODEV;

> 

> But you have 2 resources (one MEM and one IRQ)? And anyway their 

> respective absences would hardly go unnoticed below.


Probably a leftover from when I just had the MEM resource.
I'll just remove the check here.

> 

> > +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

> > +	if (resource_size(res) < 0x4000) {

> > +		dev_err(dev, "MMIO region too small (%pr)\n", res);

> > +		return -EINVAL;

> > +	}

> > +

> > +	dart->regs = devm_ioremap_resource(dev, res);

> > +	if (IS_ERR(dart->regs))

> > +		return PTR_ERR(dart->regs);

> > +

> > +	ret = devm_clk_bulk_get_all(dev, &dart->clks);

> > +	if (ret < 0)

> > +		return ret;

> > +	dart->num_clks = ret;

> > +

> > +	ret = clk_bulk_prepare_enable(dart->num_clks, dart->clks);

> > +	if (ret)

> > +		return ret;

> > +

> > +	ret = apple_dart_hw_reset(dart);

> > +	if (ret)

> > +		goto err_clk_disable;

> > +

> > +	dart_params[0] = readl(dart->regs + DART_PARAMS1);

> > +	dart_params[1] = readl(dart->regs + DART_PARAMS2);

> > +	dart->pgsize = 1 << FIELD_GET(DART_PARAMS_PAGE_SHIFT, dart_params[0]);

> > +	dart->supports_bypass = dart_params[1] & DART_PARAMS_BYPASS_SUPPORT;

> > +	dart->force_bypass = dart->pgsize > PAGE_SIZE;

> > +

> > +	dart->irq = platform_get_irq(pdev, 0);

> > +	if (dart->irq < 0) {

> > +		ret = -ENODEV;

> > +		goto err_clk_disable;

> > +	}

> 

> FWIW I'd get the IRQ earlier when there's still nothing to clean up on 

> failure - it's only the request which needs to wait until you've 

> actually set up enough to be able to handle it if it does fire.


Good point, will move it further above.

> 

> > +	ret = devm_request_irq(dart->dev, dart->irq, apple_dart_irq,

> > +			       IRQF_SHARED, "apple-dart fault handler", dart);

> 

> Be very careful with this pattern of mixing devres-managed IRQs with 

> explicitly-managed clocks, especially when IRQF_SHARED is in play. In 

> the failure path here, and in remove, you have a period where the clocks 

> have been disabled but the IRQ is still live - try CONFIG_DEBUG_SHIRQ 

> and don't be surprised if you deadlock trying to read an unclocked register.


Good catch, I didn't even think about that situation.
"Luckily" the clocks are usually shared with the master device(s) attached to
the iommu, so they are already on long before apple_dart_probe is called.

> 

> If you can't also offload the clock management to devres to guarantee 

> ordering relative to the IRQ (I think I saw some patches recently), it's 

> probably safest to manually manage the latter.


Okay, will take a look and see if I can offload it and otherwise manage it
manually then.

> 

> > +	if (ret)

> > +		goto err_clk_disable;

> > +

> > +	platform_set_drvdata(pdev, dart);

> > +

> > +	ret = iommu_device_sysfs_add(&dart->iommu, dev, NULL, "apple-dart.%s",

> > +				     dev_name(&pdev->dev));

> > +	if (ret)

> > +		goto err_clk_disable;

> > +

> > +	ret = iommu_device_register(&dart->iommu, &apple_dart_iommu_ops, dev);

> > +	if (ret)

> > +		goto err_clk_disable;

> > +

> > +	if (dev->bus->iommu_ops != &apple_dart_iommu_ops) {

> > +		ret = bus_set_iommu(dev->bus, &apple_dart_iommu_ops);

> > +		if (ret)

> > +			goto err_clk_disable;

> > +	}

> > +#ifdef CONFIG_PCI

> > +	if (dev->bus->iommu_ops != pci_bus_type.iommu_ops) {

> 

> But it's still a platform device, not a PCI device?


Er, yes, I will fix this code here by doing something similar to what
arm-smmu does:

        if (!iommu_present(&platform_bus_type)) {
                ret = bus_set_iommu(&platform_bus_type, &apple_dart_iommu_ops);
                if (ret)
                        goto err_clk_disable;
        }
#ifdef CONFIG_PCI
        if (!iommu_present(&pci_bus_type)) {
                ret = bus_set_iommu(&pci_bus_type, &apple_dart_iommu_ops);
                if (ret)
                        goto err_reset_platform_ops;
        }
#endif


> 

> > +		ret = bus_set_iommu(&pci_bus_type, &apple_dart_iommu_ops);

> > +		if (ret)

> > +			goto err_clk_disable;

> 

> And the platform bus ops?


Ugh, good catch. Will clean them up as well.

> 

> > +	}

> > +#endif

> > +

> > +	dev_info(

> > +		&pdev->dev,

> > +		"DART [pagesize %x, bypass support: %d, bypass forced: %d] initialized\n",

> > +		dart->pgsize, dart->supports_bypass, dart->force_bypass);

> > +	return 0;

> > +

> > +err_clk_disable:

> > +	clk_bulk_disable(dart->num_clks, dart->clks);

> > +	clk_bulk_unprepare(dart->num_clks, dart->clks);

> 

> No need to open-code clk_bulk_disable_unprepare() ;)


True :-)

> 

> > +	return ret;

> > +}

> > +

> > +static int apple_dart_remove(struct platform_device *pdev)

> > +{

> > +	struct apple_dart *dart = platform_get_drvdata(pdev);

> > +

> > +	devm_free_irq(dart->dev, dart->irq, dart);

> > +

> > +	iommu_device_unregister(&dart->iommu);

> > +	iommu_device_sysfs_remove(&dart->iommu);

> > +

> > +	clk_bulk_disable(dart->num_clks, dart->clks);

> > +	clk_bulk_unprepare(dart->num_clks, dart->clks);

> 

> Ditto.

> 

> And again the bus ops are still installed - that'll get really fun if 

> this is a module unload...


Ugh, yeah. I'll fix that as well. I'll have to see how to make this work
correctly with multiple DART instances. I guess I should only remove the
bus ops once the last one is removed. Now that I think about it, this
could also get tricky in the cleanup paths of apple_dart_probe.

Maybe just add a module_init that sets up the bus ops when it finds at
least one DART node and module_exit to tear them down again?

> 

> > +	return 0;

> > +}

> > +

> > +static void apple_dart_shutdown(struct platform_device *pdev)

> > +{

> > +	apple_dart_remove(pdev);

> 

> The main reason for doing something on shutdown is in the case of kexec, 

> to put the hardware back into a disabled or otherwise sane state so as 

> not to trip up whatever the subsequent payload is. If you're not doing 

> that (which may be legitimate if the expectation is that software must 

> always fully reset and initialise a DART before I/O can work) then 

> there's not much point in doing anything, really. Stuff like tidying up 

> sysfs is a complete waste of time when the world's about to end ;)


Makes sense, I'll see if I can put the DARTs back into a sane state
for whatever the next payload is here then.

> 

> Robin.

> 

> > +}

> > +

> > +static const struct of_device_id apple_dart_of_match[] = {

> > +	{ .compatible = "apple,t8103-dart", .data = NULL },

> > +	{},

> > +};

> > +MODULE_DEVICE_TABLE(of, apple_dart_of_match);

> > +

> > +static struct platform_driver apple_dart_driver = {

> > +	.driver	= {

> > +		.name			= "apple-dart",

> > +		.of_match_table		= apple_dart_of_match,

> > +	},

> > +	.probe	= apple_dart_probe,

> > +	.remove	= apple_dart_remove,

> > +	.shutdown = apple_dart_shutdown,

> > +};

> > +module_platform_driver(apple_dart_driver);

> > +

> > +MODULE_DESCRIPTION("IOMMU API for Apple's DART");

> > +MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>");

> > +MODULE_LICENSE("GPL v2");

> > 

> 


Sven
Robin Murphy July 19, 2021, 6:15 p.m. UTC | #6
On 2021-07-15 17:41, Sven Peter via iommu wrote:
[...]
>>> +	u64 sw_bypass_cpu_start;

>>> +	u64 sw_bypass_dma_start;

>>> +	u64 sw_bypass_len;

>>> +

>>> +	struct list_head streams;

>>

>> I'm starting to think this could just be a bitmap, in a u16 even.

> 

> The problem is that these streams may come from two different

> DART instances. That is required for e.g. the dwc3 controller which

> has a weird quirk where DMA transactions go through two separate

> DARTs with no clear pattern (e.g. some xhci control structures use the

> first dart while other structures use the second one).


Ah right, I do remember discussing that situation, but I think I 
misinterpreted dart_domain->dart representing "the DART instance" here 
to mean we weren't trying to accommodate that just yet.

> Both of them need to point to the same pagetable.

> In the device tree the node will have an entry like this:

> 

> dwc3_0: usb@382280000{

>     ...

>     iommus = <&dwc3_0_dart_0 0>, <&dwc3_0_dart_1 1>;

> };

> 

> There's no need for a linked list though once I do this properly with

> groups. I can just use an array allocated when the first device is

> attached, which just contains apple_dart* and streamid values.

> 

> 

>>

>>> +

>>> +	spinlock_t lock;

>>> +

>>> +	struct iommu_domain domain;

>>> +};

>>> +

>>> +/*

>>> + * This structure is attached to devices with dev_iommu_priv_set() on of_xlate

>>> + * and contains a list of streams bound to this device as defined in the

>>> + * device tree. Multiple DART instances can be attached to a single device

>>> + * and each stream is identified by its stream id.

>>> + * It's usually reference by a pointer called *cfg.

>>> + *

>>> + * A dynamic array instead of a linked list is used here since in almost

>>> + * all cases a device will just be attached to a single stream and streams

>>> + * are never removed after they have been added.

>>> + *

>>> + * @num_streams: number of streams attached

>>> + * @streams: array of structs to identify attached streams and the device link

>>> + *           to the iommu

>>> + */

>>> +struct apple_dart_master_cfg {

>>> +	int num_streams;

>>> +	struct {

>>> +		struct apple_dart *dart;

>>> +		u32 sid;

>>

>> Can't you use the fwspec for this?

> 

> 

> I'd be happy to use the fwspec code if that's somehow possible.

> I'm not sure how though since I need to store both the reference to the DART

> _and_ to the stream id. As far as I can tell the fwspec code would only allow

> to store the stream ids.

> (see also the previous comment regarding the dwc3 node which requires stream

> ids from two separate DART instances)


Hmm, yes, as above I was overlooking that, although there are still 
various ideas that come to mind; the question becomes whether they're 
actually worthwhile or just too-clever-for-their-own-good hacks. The 
exact format of fwspec->ids is not fixed (other than the ACPI IORT code 
having a common understanding with the Arm SMMU drivers) so in principle 
you could munge some sort of DART instance index or indeed anything, but 
if it remains cleaner to manage your own data internally then by all 
means keep doing that.

>>> +		struct device_link *link;

>>

>> Is it necessary to use stateless links, or could you use

>> DL_FLAG_AUTOREMOVE_SUPPLIER and not have to keep track of them manually?

> 

> I'll just use DL_FLAG_AUTOREMOVE_SUPPLIER. No idea why I went for stateless links.

> 

>>

> [...]

>>> +	/* restore stream identity map */

>>> +	writel(0x03020100, dart->regs + DART_STREAM_REMAP);

>>> +	writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4);

>>> +	writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8);

>>> +	writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12);

>>

>> Any hint of what the magic numbers mean?

> 

> Yes, it's just 0,1,2,3...,0xe,0xf but I can't do 8bit writes to the bus

> and 32 bit writes then require these slightly awkward "swapped" numbers.

> I'll add a comment since it's not obvious at first glance.


Sure, I guessed that much from "identity map" - it was more a question 
of why that means 0x03020100... rather than, say, 0x0c0d0e0f... or 
0x76543210..., and perhaps the reason for "restoring" it in the first place.
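
To make the byte order concrete, here is a minimal sketch of how the four
constants fall out. It assumes, based only on the explanation quoted above,
that each 32-bit remap register holds four one-byte stream entries with the
lowest-numbered stream in the least significant byte; the helper name is
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: value of the idx-th 32-bit stream remap register
 * for an identity map (stream n -> stream n). Assumes four one-byte
 * entries per register, lowest stream id in the least significant byte.
 */
static uint32_t ex_identity_remap_word(unsigned int idx)
{
	uint32_t word = 0;
	unsigned int byte;

	for (byte = 0; byte < 4; byte++)
		word |= (uint32_t)(4 * idx + byte) << (8 * byte);

	return word;
}
```

Under that assumption, indices 0 to 3 give 0x03020100, 0x07060504, 0x0b0a0908
and 0x0f0e0d0c, the values written in the quoted hunk.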

[...]
>>> +	/*

>>> +	 * we can't mix and match DARTs that support bypass mode with those who don't

>>> +	 * because the iova space in fake bypass mode generally has an offset

>>> +	 */

>>

>> Erm, something doesn't sound right there... IOMMU_DOMAIN_IDENTITY should

>> be exactly what it says, regardless of how it's implemented. If you

>> can't provide a true identity mapping then you're probably better off

>> not pretending to support them in the first place.

> 

> Some background: the PCIe DART only supports a 32bit VA space but RAM

> on these machines starts at 0x8_0000_0000. I have something like

>    dma-ranges = <0x42000000 0 0 0x8 0 0 0xffff0000>;

> in the pcie nodes to add that offset to dma addresses.

> 

> What I want to do here then is to set up an identity mapping with respect

> to the DMA layer's understanding of addresses encoded in bus_dma_region.

> Now this will always just be a constant offset of 0x8_0000_0000 for

> all M1s but I didn't want to hardcode that.

> The code here is just there to guard against a situation where someone

> somehow manages to attach two devices with different offsets to the same

> domain.


Urgh, *now* I think I get it - the addressing limitation WRT the 
physical memory map layout had also slipped my mind. So you describe the 
RC *as if* it had a physical bus offset, rely on iommu-dma ignoring it 
when active (which is more by luck than design - we don't expect to ever 
see a device with a real hard-wired offset upstream of an IOMMU, 
although I did initially try to support it back in the very early days), 
and otherwise statically program a translation such that anyone else who 
*does* respect bus_dma_regions finds things work as expected.

That actually seems like an even stronger argument for having the 
fake-bypass table belong to the DART rather than the domain, and at that 
point you shouldn't even need the mismatch restriction, since as long as 
you haven't described the fake offset for any devices who *can* achieve 
real bypass, then "attach to an identity domain" simply comes down to 
doing the appropriate thing for each individual stream, regardless of 
whether it's the same nominal identity domain that another device is 
using or a distinct one (it's highly unlikely that two groups would ever 
get attached to one identity domain rather than simply having their own 
anyway, but it is technically possible).

> If that's not how the abstraction is supposed to work and/or too big of a hack

> I'll just remove the software bypass mode altogether.

> PCIe won't work on 4k kernels then but the only people using this so far

> build their own kernels with patches either way and won't complain.

> And by the time Linux will actually be useful for "normal" setups

> the dma-iommu layer can hopefully just handle a larger page granularity.


It's certainly... "creative", and TBH I don't hate it (in a "play the 
hand you've been given" kind of way), but the one significant downside 
is that if the DART driver isn't loaded for any reason, PCI DMA will 
look like it should be usable but then just silently (or not so 
silently) fail.

FWIW if you do want to keep the option open, I'd be inclined to have the 
DT just give an "honest" description of just the 32-bit limitation, then 
have the DART driver's .probe_device sneakily modify the bus_dma_region 
to match the relevant fake-bypass table as appropriate. It's possible 
other folks might hate that even more though :D

>>> +	if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY &&

>>> +		    (domain->dart->supports_bypass != dart->supports_bypass)))

>>> +		return -EINVAL;

>>> +

>>> +	list_for_each_entry(stream, &domain->streams, stream_head) {

>>> +		if (stream->dart == dart && stream->sid == sid) {

>>> +			stream->num_devices++;

>>> +			return 0;

>>> +		}

>>> +	}

>>> +

>>> +	spin_lock_irqsave(&dart->lock, flags);

>>> +

>>> +	if (WARN_ON(dart->used_sids & BIT(sid))) {

>>> +		ret = -EINVAL;

>>> +		goto error;

>>> +	}

>>> +

>>> +	stream = kzalloc(sizeof(*stream), GFP_ATOMIC);

>>> +	if (!stream) {

>>> +		ret = -ENOMEM;

>>> +		goto error;

>>> +	}

>>

>> Couldn't you do this outside the lock? (If, calling back to other

>> comments, it can't get refactored out of existence anyway)

> 

> Probably, but I'll first see if I can just refactor it away.


Actually I missed that we're already under dart_domain->lock at this 
point anyway, so it's not going to make much difference, but it does 
mean that the spin_lock_irqsave() above could just be spin_lock(), 
unless it's possible to relax the domain locking a bit such that we 
don't have to do the whole domain init with IRQs masked.

[...]
>>> +static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)

>>> +{

>>> +	struct apple_dart_domain *dart_domain;

>>> +

>>> +	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&

>>> +	    type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED)

>>> +		return NULL;

>>

>> I want to say there's not much point in that, but then I realise I've

>> spent the last couple of days writing patches to add a new domain type :)

> 

> Hah! Just because I'm curious: What is that new domain type going to be? :)


Splitting IOMMU_DOMAIN_DMA into two to replace iommu_dma_strict being an 
orthogonal thing.

[...]
>>> +static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)

>>> +{

>>> +	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);

>>> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

>>> +	unsigned int num_streams = cfg ? cfg->num_streams : 0;

>>> +	struct apple_dart_master_cfg *cfg_new;

>>> +	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);

>>> +

>>> +	if (args->args_count != 1)

>>> +		return -EINVAL;

>>> +

>>> +	cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1),

>>> +			   GFP_KERNEL);

>>> +	if (!cfg_new)

>>> +		return -ENOMEM;

>>> +

>>> +	cfg = cfg_new;

>>> +	dev_iommu_priv_set(dev, cfg);

>>> +

>>> +	cfg->num_streams = num_streams;

>>> +	cfg->streams[cfg->num_streams].dart = dart;

>>> +	cfg->streams[cfg->num_streams].sid = args->args[0];

>>> +	cfg->num_streams++;

>>

>> Yeah, this is way too reminiscent of the fwspec code for comfort. Even

>> if you can't use autoremove links for some reason, an array of 16

>> device_link pointers hung off apple_dart still wins over these little

>> pointer-heavy structures if you need more than a few of them.

> 

> I can get rid of the links, but I'll still need some way to store

> both the apple_dart and the sid here. Like mentioned above, I'll

> be happy to reuse the fwspec code but I don't see how yet.


As before, if you can fit in some kind of DART instance identifier which 
isn't impractical to unpack then it makes sense to use the fwspec since 
it's already there. However if you still need to allocate something 
per-device rather than just stashing an existing pointer in iommu_priv, 
then you may as well keep everything together there. If the worst known 
case could still fit in just two DART pointers and two 64-bit bitmaps, 
I'd be inclined to just have that as a fixed structure and save all the 
extra bother - you're not cross-architecture like the fwspec code, and 
arm64's minimum kmalloc granularity has just gone back up to 128 bytes 
(but even at 64 bytes you'd have had plenty of room).

[...]
>>> +static int apple_dart_remove(struct platform_device *pdev)

>>> +{

>>> +	struct apple_dart *dart = platform_get_drvdata(pdev);

>>> +

>>> +	devm_free_irq(dart->dev, dart->irq, dart);

>>> +

>>> +	iommu_device_unregister(&dart->iommu);

>>> +	iommu_device_sysfs_remove(&dart->iommu);

>>> +

>>> +	clk_bulk_disable(dart->num_clks, dart->clks);

>>> +	clk_bulk_unprepare(dart->num_clks, dart->clks);

>>

>> Ditto.

>>

>> And again the bus ops are still installed - that'll get really fun if

>> this is a module unload...

> 

> Ugh, yeah. I'll fix that as well. I'll have to see how to make this work

> correctly with multiple DART instances. I guess I should only remove the

> bus ops once the last one is removed. Now that I think about it, this

> could also get tricky in the cleanup paths of apple_dart_probe.

> 

> Maybe just add a module_init that sets up the bus ops when it finds at

> least one DART node and module_exit to tear them down again?


Actually by this point it was late and I wasn't thinking as clearly as I 
could have been, apologies ;)

I believe a module unload is in fact the *only* time you should expect 
to see .remove called - you want to set .suppress_bind_attrs in your 
driver data because there's basically no way to prevent manual unbinding 
from blowing up - so it should be fine to unconditionally clear the ops 
at this point (being removed means you must have successfully probed, so 
any ops must be yours).

Cheers,
Robin.
Sven Peter July 25, 2021, 12:40 p.m. UTC | #7
On Mon, Jul 19, 2021, at 20:15, Robin Murphy wrote:
> On 2021-07-15 17:41, Sven Peter via iommu wrote:

> [...]

> >>> +	u64 sw_bypass_cpu_start;

> >>> +	u64 sw_bypass_dma_start;

> >>> +	u64 sw_bypass_len;

> >>> +

> >>> +	struct list_head streams;

> >>

> >> I'm staring to think this could just be a bitmap, in a u16 even.

> > 

> > The problem is that these streams may come from two different

> > DART instances. That is required for e.g. the dwc3 controller which

> > has a weird quirk where DMA transactions go through two separate

> > DARTs with no clear pattern (e.g. some xhci control structures use the

> > first dart while other structures use the second one).

> 

> Ah right, I do remember discussing that situation, but I think I 

> misinterpreted dart_domain->dart representing "the DART instance" here 

> to mean we weren't trying to accommodate that just yet.

> 

> > Both of them need to point to the same pagetable.

> > In the device tree the node will have an entry like this:

> > 

> > dwc3_0: usb@382280000{

> >     ...

> >     iommus = <&dwc3_0_dart_0 0>, <&dwc3_0_dart_1 1>;

> > };

> > 

> > There's no need for a linked list though once I do this properly with

> > groups. I can just use an array allocated when the first device is

> > attached, which just contains apple_dart* and streamid values.

> > 

> > 

> >>

> >>> +

> >>> +	spinlock_t lock;

> >>> +

> >>> +	struct iommu_domain domain;

> >>> +};

> >>> +

> >>> +/*

> >>> + * This structure is attached to devices with dev_iommu_priv_set() on of_xlate

> >>> + * and contains a list of streams bound to this device as defined in the

> >>> + * device tree. Multiple DART instances can be attached to a single device

> >>> + * and each stream is identified by its stream id.

> >>> + * It's usually reference by a pointer called *cfg.

> >>> + *

> >>> + * A dynamic array instead of a linked list is used here since in almost

> >>> + * all cases a device will just be attached to a single stream and streams

> >>> + * are never removed after they have been added.

> >>> + *

> >>> + * @num_streams: number of streams attached

> >>> + * @streams: array of structs to identify attached streams and the device link

> >>> + *           to the iommu

> >>> + */

> >>> +struct apple_dart_master_cfg {

> >>> +	int num_streams;

> >>> +	struct {

> >>> +		struct apple_dart *dart;

> >>> +		u32 sid;

> >>

> >> Can't you use the fwspec for this?

> > 

> > 

> > I'd be happy to use the fwspec code if that's somehow possible.

> > I'm not sure how though since I need to store both the reference to the DART

> > _and_ to the stream id. As far as I can tell the fwspec code would only allow

> > to store the stream ids.

> > (see also the previous comment regarding the dwc3 node which requires stream

> > ids from two separate DART instances)

> 

> Hmm, yes, as above I was overlooking that, although there are still 

> various ideas that come to mind; the question becomes whether they're 

> actually worthwhile or just too-clever-for-their-own-good hacks. The 

> exact format of fwspec->ids is not fixed (other than the ACPI IORT code 

> having a common understanding with the Arm SMMU drivers) so in principle 

> you could munge some sort of DART instance index or indeed anything, but 

> if it remains cleaner to manage your own data internally then by all 

> means keep doing that.


Yeah, I can think of some hacks as well (like storing a global id->apple_dart* map
or stuffing the 64bit pointer into two ints) and I've tried a few of them in the past
few days but didn't like any of them.

I do like the idea of just putting two (struct apple_dart *dart, u16 sidmap)
pairs in there though, which will be plenty for all current configurations.
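
A minimal userspace sketch of that fixed layout, to make the idea
concrete. All names here (stream_map, master_cfg, master_cfg_add,
MAX_DARTS_PER_DEVICE) are made up for illustration and are not taken
from any posted version of the driver; the limit of two DARTs per
device is the assumption discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DARTS_PER_DEVICE 2	/* assumption: worst known case (dwc3) */
#define MAX_STREAMS 16		/* one bit per stream id in a u16 */

struct apple_dart;		/* opaque here; defined by the real driver */

struct stream_map {
	struct apple_dart *dart;	/* NULL while the slot is unused */
	uint16_t sidmap;		/* bit n set => stream id n is used */
};

struct master_cfg {
	struct stream_map maps[MAX_DARTS_PER_DEVICE];
};

/*
 * Record that the device uses stream @sid of @dart.
 * Returns false if @sid is out of range or both slots already belong
 * to other DART instances.
 */
bool master_cfg_add(struct master_cfg *cfg, struct apple_dart *dart,
		    unsigned int sid)
{
	assert(cfg && dart);
	if (sid >= MAX_STREAMS)
		return false;

	for (size_t i = 0; i < MAX_DARTS_PER_DEVICE; i++) {
		struct stream_map *m = &cfg->maps[i];

		if (m->dart == NULL || m->dart == dart) {
			m->dart = dart;
			m->sidmap |= (uint16_t)(1u << sid);
			return true;
		}
	}
	return false;
}
```

For the dwc3 node above, an of_xlate-style caller would end up with
dart_0 and sidmap 0x1 in the first slot and dart_1 and sidmap 0x2 in
the second. The whole structure is 32 bytes on a 64-bit build, so it
fits in a single small allocation with no krealloc needed.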

> 

> >>> +		struct device_link *link;

> >>

> >> Is it necessary to use stateless links, or could you use

> >> DL_FLAG_AUTOREMOVE_SUPPLIER and not have to keep track of them manually?

> > 

> > I'll just use DL_FLAG_AUTOREMOVE_SUPPLIER. No idea why I went for stateless links.

> > 

> >>

> > [...]

> >>> +	/* restore stream identity map */

> >>> +	writel(0x03020100, dart->regs + DART_STREAM_REMAP);

> >>> +	writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4);

> >>> +	writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8);

> >>> +	writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12);

> >>

> >> Any hint of what the magic numbers mean?

> > 

> > Yes, it's just 0,1,2,3...,0xe,0xf but I can't do 8bit writes to the bus

> > and 32 bit writes then require these slightly awkward "swapped" numbers.

> > I'll add a comment since it's not obvious at first glance.

> 

> Sure, I guessed that much from "identity map" - it was more a question 

> of why that means 0x03020100... rather than, say, 0x0c0d0e0f... or 

> 0x76543210..., and perhaps the reason for "restoring" it in the first place.


So what this feature does is to allow the DART to take an incoming DMA stream
tagged with id i and pretend that it's actually been tagged with
readb(dart->regs + 0x80 + i) instead. That's as much as I can figure out by
poking the hardware. More details are probably only available to Apple.
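
To spell out where the "swapped" constants come from, here is a small
userspace sketch. It assumes (consistent with the values in the patch,
but not otherwise documented) that the remap table is sixteen one-byte
entries and that a 32-bit little-endian write places the entry for the
lowest stream id in the least significant byte. remap_identity() and
remap_word() are hypothetical helper names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

#define DART_MAX_STREAMS 16

/* Fill map[] so that every stream id is remapped to itself. */
void remap_identity(uint8_t *map)
{
	for (unsigned int sid = 0; sid < DART_MAX_STREAMS; sid++)
		map[sid] = (uint8_t)sid;
}

/*
 * Pack the four one-byte entries starting at @first_sid into the 32-bit
 * word that covers them. The entry for the lowest stream id ends up in
 * the least significant byte.
 */
uint32_t remap_word(const uint8_t *map, unsigned int first_sid)
{
	assert(first_sid % 4 == 0 && first_sid + 3 < DART_MAX_STREAMS);
	return (uint32_t)map[first_sid] |
	       ((uint32_t)map[first_sid + 1] << 8) |
	       ((uint32_t)map[first_sid + 2] << 16) |
	       ((uint32_t)map[first_sid + 3] << 24);
}
```

Packing the identity table this way yields the four words written in
the patch. Under the same assumption, the remapping of stream 1 to
stream 0 described below would change only the first word.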

Now the reason I thought I needed this was that I assumed we are handed these DARTs
in an unclean state because Apple makes use of this internally:
In their device tree they have a sid-remap property which I believe is a hack to make
their driver simpler. The dwc3 controller requires stream 0 of dartA and stream 1 of
dartB to be configured the same way. They configure dartB to remap stream 1 to stream 0
and then just mirror all MMIO writes from dartA to dartB and pretend that dwc3 only
needs a single DART.

As it turns out though, iBoot doesn't use the USB DARTs and we already get
them in a sane state, so I can just drop this code. (And if we actually need it
for other DARTs, I can restore those in our bootloader or add it back in a
follow-up.)

> 

> [...]

> >>> +	/*

> >>> +	 * we can't mix and match DARTs that support bypass mode with those who don't

> >>> +	 * because the iova space in fake bypass mode generally has an offset

> >>> +	 */

> >>

> >> Erm, something doesn't sound right there... IOMMU_DOMAIN_IDENTITY should

> >> be exactly what it says, regardless of how it's implemented. If you

> >> can't provide a true identity mapping then you're probably better off

> >> not pretending to support them in the first place.

> > 

> > Some background: the PCIe DART only supports a 32bit VA space but RAM

> > on these machines starts at 0x8_0000_0000. I have something like

> >    dma-ranges = <0x42000000 0 0 0x8 0 0 0xffff0000>;

> > in the pcie nodes to add that offset to dma addresses.

> > 

> > What I want to do here then is to setup an identity mapping with respect

> > to the DMA layer understanding of addresses encoded in bus_dma_region.

> > Now this will always just be a constant offset of 0x8_0000_0000 for

> > all M1s but I didn't want to hardcode that.

> > The code here is just there to guard against a situation where someone

> > somehow manages to attach two devices with different offsets to the same

> > domain.

> 

> Urgh, *now* I think I get it - the addressing limitation WRT the 

> physical memory map layout had also slipped my mind. So you describe the 

> RC *as if* it had a physical bus offset, rely on iommu-dma ignoring it 

> when active (which is more by luck than design - we don't expect to ever 

> see a device with a real hard-wired offset upstream of an IOMMU, 

> although I did initially try to support it back in the very early days), 

> and otherwise statically program a translation such that anyone else who 

> *does* respect bus_dma_regions finds things work as expected.


Yes, exactly. It's not very nice but it works...

> 

> That actually seems like an even stronger argument for having the 

> fake-bypass table belong to the DART rather than the domain, and at that 

> point you shouldn't even need the mismatch restriction, since as long as 

> you haven't described the fake offset for any devices who *can* achieve 

> real bypass, then "attach to an identity domain" simply comes down to 

> doing the appropriate thing for each individual stream, regardless of 

> whether it's the same nominal identity domain that another device is 

> using or a distinct one (it's highly unlikely that two groups would ever 

> get attached to one identity domain rather than simply having their own 

> anyway, but it is technically possible).

> 


Agreed. That sounds a lot nicer actually.


> > If that's not how the abstraction is supposed to work and/or too big of a hack

> > I'll just remove the software bypass mode altogether.

> > PCIe won't work on 4k kernels then but the only people using this so far

> > build their own kernels with patches either way and won't complain.

> > And by the time Linux will actually be useful for "normal" setups

> > the dma-iommu layer can hopefully just handle a larger page granularity.

> 

> It's certainly... "creative", and TBH I don't hate it (in a "play the 

> hand you've been given" kind of way), but the one significant downside 

> is that if the DART driver isn't loaded for any reason, PCI DMA will 

> look like it should be usable but then just silently (or not so 

> silently) fail.


Good point!

> 

> FWIW if you do want to keep the option open, I'd be inclined to have the 

> DT just give an "honest" description of just the 32-bit limitation, then 

> have the DART driver's .probe_device sneakily modify the bus_dma_region 

> to match the relevant fake-bypass table as appropriate. It's possible 

> other folks might hate that even more though :D


I've given that one a try and I kinda like it so far :D
I'll keep it for v5 and just drop it in case someone complains.

> 

> >>> +	if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY &&

> >>> +		    (domain->dart->supports_bypass != dart->supports_bypass)))

> >>> +		return -EINVAL;

> >>> +

> >>> +	list_for_each_entry(stream, &domain->streams, stream_head) {

> >>> +		if (stream->dart == dart && stream->sid == sid) {

> >>> +			stream->num_devices++;

> >>> +			return 0;

> >>> +		}

> >>> +	}

> >>> +

> >>> +	spin_lock_irqsave(&dart->lock, flags);

> >>> +

> >>> +	if (WARN_ON(dart->used_sids & BIT(sid))) {

> >>> +		ret = -EINVAL;

> >>> +		goto error;

> >>> +	}

> >>> +

> >>> +	stream = kzalloc(sizeof(*stream), GFP_ATOMIC);

> >>> +	if (!stream) {

> >>> +		ret = -ENOMEM;

> >>> +		goto error;

> >>> +	}

> >>

> >> Couldn't you do this outside the lock? (If, calling back to other

> >> comments, it can't get refactored out of existence anyway)

> > 

> > Probably, but I'll first see if I can just refactor it away.

> 

> Actually I missed that we're already under dart_domain->lock at this 

> point anyway, so it's not going to make much difference, but it does 

> mean that the spin_lock_irqsave() above could just be spin_lock(), 

> unless it's possible to relax the domain locking a bit such that we 

> don't have to do the whole domain init with IRQs masked.


I can relax the locking quite a bit.
Right now, I only need a spinlock around the TLB flush MMIO writes
and a mutex to protect domain initialization.

> 

> [...]

> >>> +static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)

> >>> +{

> >>> +	struct apple_dart_domain *dart_domain;

> >>> +

> >>> +	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&

> >>> +	    type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED)

> >>> +		return NULL;

> >>

> >> I want to say there's not much point in that, but then I realise I've

> >> spent the last couple of days writing patches to add a new domain type :)

> > 

> > Hah! Just because I'm curious: What is that new domain type going to be? :)

> 

> Splitting IOMMU_DOMAIN_DMA into two to replace iommu_dma_strict being an 

> orthogonal thing.

> 

> [...]

> >>> +static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)

> >>> +{

> >>> +	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);

> >>> +	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);

> >>> +	unsigned int num_streams = cfg ? cfg->num_streams : 0;

> >>> +	struct apple_dart_master_cfg *cfg_new;

> >>> +	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);

> >>> +

> >>> +	if (args->args_count != 1)

> >>> +		return -EINVAL;

> >>> +

> >>> +	cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1),

> >>> +			   GFP_KERNEL);

> >>> +	if (!cfg_new)

> >>> +		return -ENOMEM;

> >>> +

> >>> +	cfg = cfg_new;

> >>> +	dev_iommu_priv_set(dev, cfg);

> >>> +

> >>> +	cfg->num_streams = num_streams;

> >>> +	cfg->streams[cfg->num_streams].dart = dart;

> >>> +	cfg->streams[cfg->num_streams].sid = args->args[0];

> >>> +	cfg->num_streams++;

> >>

> >> Yeah, this is way too reminiscent of the fwspec code for comfort. Even

> >> if you can't use autoremove links for some reason, an array of 16

> >> device_link pointers hung off apple_dart still wins over these little

> >> pointer-heavy structures if you need more than a few of them.

> > 

> > I can get rid of the links, but I'll still need some way to store

> > both the apple_dart and the sid here. Like mentioned above, I'll

> > be happy to reuse the fwspec code but I don't see how yet.

> 

> As before, if you can fit in some kind of DART instance identifier which 

> isn't impractical to unpack than it makes sense to use the fwspec since 

> it's already there. However if you still need to allocate something 

> per-device rather than just stashing an existing pointer in iommu_priv, 

> then you may as well keep everything together there. If the worst known 

> case could still fit in just two DART pointers and two 64-bit bitmaps, 

> I'd be inclined to just have that as a fixed structure and save all the 

> extra bother - you're not cross-architecture like the fwspec code, and 

> arm64's minimum kmalloc granularity has just gone back up to 128 bytes 

> (but even at 64 bytes you'd have had plenty of room).


That's a very good point, I somehow tried to make this part as general
as possible and didn't realize that this only has to work on essentially
one SoC for now. I also don't expect Apple to require more than two
DARTs for a single master in the future.

I've tried the fixed structure now and I really like it so far.

> 

> [...]

> >>> +static int apple_dart_remove(struct platform_device *pdev)

> >>> +{

> >>> +	struct apple_dart *dart = platform_get_drvdata(pdev);

> >>> +

> >>> +	devm_free_irq(dart->dev, dart->irq, dart);

> >>> +

> >>> +	iommu_device_unregister(&dart->iommu);

> >>> +	iommu_device_sysfs_remove(&dart->iommu);

> >>> +

> >>> +	clk_bulk_disable(dart->num_clks, dart->clks);

> >>> +	clk_bulk_unprepare(dart->num_clks, dart->clks);

> >>

> >> Ditto.

> >>

> >> And again the bus ops are still installed - that'll get really fun if

> >> this is a module unload...

> > 

> > Ugh, yeah. I'll fix that as well. I'll have to see how to make this work

> > correctly with multiple DART instances. I guess I should only remove the

> > bus ops once the last one is removed. Now that I think about it, this

> > could also get tricky in the cleanup paths of apple_dart_probe.

> > 

> > Maybe just add a module_init that sets up the bus ops when it finds at

> > least one DART node and module_exit to tear them down again?

> 

> Actually by this point it was late and I wasn't thinking as clearly as I 

> could have been, apologies ;)

> 

> I believe a module unload is in fact the *only* time you should expect 

> to see .remove called - you want to set .suppress_bind_attrs in your 

> driver data because there's basically no way to prevent manual unbinding 

> from blowing up - so it should be fine to unconditionally clear the ops 

> at this point (being removed means you must have successfully probed, so 

> any ops must be yours).


Makes sense, thanks!

I'll let my current version simmer for a bit and wait until it's been
tested by a few people and will send it in a week or so then!


Best,

Sven
Alyssa Rosenzweig July 26, 2021, 1:19 p.m. UTC | #8
> I'll let my current version simmer for a bit and wait until it's been

> tested by a few people and will send it in a week or so then!


New version has my T-b :)
Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 29e5541c8f21..c1ffaa56b5f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1245,6 +1245,7 @@  M:	Sven Peter <sven@svenpeter.dev>
 L:	iommu@lists.linux-foundation.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/iommu/apple,dart.yaml
+F:	drivers/iommu/apple-dart-iommu.c
 
 APPLE SMC DRIVER
 M:	Henrik Rydberg <rydberg@bitmath.org>
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..87882c628b46 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -249,6 +249,21 @@  config SPAPR_TCE_IOMMU
 	  Enables bits of IOMMU API required by VFIO. The iommu_ops
 	  is not implemented as it is not necessary for VFIO.
 
+config IOMMU_APPLE_DART
+	tristate "Apple DART IOMMU Support"
+	depends on ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
+	select IOMMU_API
+	select IOMMU_IO_PGTABLE
+	select IOMMU_IO_PGTABLE_LPAE
+	default ARCH_APPLE
+	help
+	  Support for Apple DART (Device Address Resolution Table) IOMMUs
+	  found in Apple ARM SoCs like the M1.
+	  This IOMMU is required for most peripherals using DMA to access
+	  the main memory.
+
+	  Say Y here if you are using an Apple SoC with a DART IOMMU.
+
 # ARM IOMMU support
 config ARM_SMMU
 	tristate "ARM Ltd. System MMU (SMMU) Support"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c0fb0ba88143..8c813f0ebc54 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
+obj-$(CONFIG_IOMMU_APPLE_DART) += apple-dart-iommu.o
diff --git a/drivers/iommu/apple-dart-iommu.c b/drivers/iommu/apple-dart-iommu.c
new file mode 100644
index 000000000000..637ba6e7cef9
--- /dev/null
+++ b/drivers/iommu/apple-dart-iommu.c
@@ -0,0 +1,1058 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Apple DART (Device Address Resolution Table) IOMMU driver
+ *
+ * Copyright (C) 2021 The Asahi Linux Contributors
+ *
+ * Based on arm/arm-smmu/arm-smmu.c and arm/arm-smmu-v3/arm-smmu-v3.c
+ *  Copyright (C) 2013 ARM Limited
+ *  Copyright (C) 2015 ARM Limited
+ * and on exynos-iommu.c
+ *  Copyright (c) 2011,2016 Samsung Electronics Co., Ltd.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/dma-direct.h>
+#include <linux/dma-iommu.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io-pgtable.h>
+#include <linux/iopoll.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_iommu.h>
+#include <linux/of_platform.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/ratelimit.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+
+#define DART_MAX_STREAMS 16
+#define DART_MAX_TTBR 4
+
+#define DART_STREAM_ALL 0xffff
+
+#define DART_PARAMS1 0x00
+#define DART_PARAMS_PAGE_SHIFT GENMASK(27, 24)
+
+#define DART_PARAMS2 0x04
+#define DART_PARAMS_BYPASS_SUPPORT BIT(0)
+
+#define DART_STREAM_COMMAND 0x20
+#define DART_STREAM_COMMAND_BUSY BIT(2)
+#define DART_STREAM_COMMAND_INVALIDATE BIT(20)
+
+#define DART_STREAM_SELECT 0x34
+
+#define DART_ERROR 0x40
+#define DART_ERROR_STREAM GENMASK(27, 24)
+#define DART_ERROR_CODE GENMASK(23, 0)
+#define DART_ERROR_FLAG BIT(31)
+#define DART_ERROR_READ_FAULT BIT(4)
+#define DART_ERROR_WRITE_FAULT BIT(3)
+#define DART_ERROR_NO_PTE BIT(2)
+#define DART_ERROR_NO_PMD BIT(1)
+#define DART_ERROR_NO_TTBR BIT(0)
+
+#define DART_CONFIG 0x60
+#define DART_CONFIG_LOCK BIT(15)
+
+#define DART_STREAM_COMMAND_BUSY_TIMEOUT 100
+
+#define DART_STREAM_REMAP 0x80
+
+#define DART_ERROR_ADDR_HI 0x54
+#define DART_ERROR_ADDR_LO 0x50
+
+#define DART_TCR(sid) (0x100 + 4 * (sid))
+#define DART_TCR_TRANSLATE_ENABLE BIT(7)
+#define DART_TCR_BYPASS0_ENABLE BIT(8)
+#define DART_TCR_BYPASS1_ENABLE BIT(12)
+
+#define DART_TTBR(sid, idx) (0x200 + 16 * (sid) + 4 * (idx))
+#define DART_TTBR_VALID BIT(31)
+#define DART_TTBR_SHIFT 12
+
+/*
+ * Private structure associated with each DART device.
+ *
+ * @dev: device struct
+ * @regs: mapped MMIO region
+ * @irq: interrupt number, can be shared with other DARTs
+ * @clks: clocks associated with this DART
+ * @num_clks: number of @clks
+ * @lock: lock for @used_sids and hardware operations involving this dart
+ * @used_sids: bitmap of streams attached to a domain
+ * @pgsize: pagesize supported by this DART
+ * @supports_bypass: indicates if this DART supports bypass mode
+ * @force_bypass: force bypass mode due to pagesize mismatch?
+ * @sw_bypass_cpu_start: offset into cpu address space in software bypass mode
+ * @sw_bypass_dma_start: offset into dma address space in software bypass mode
+ * @sw_bypass_len: length of iova space in software bypass mode
+ * @iommu: iommu core device
+ */
+struct apple_dart {
+	struct device *dev;
+
+	void __iomem *regs;
+
+	int irq;
+	struct clk_bulk_data *clks;
+	int num_clks;
+
+	spinlock_t lock;
+
+	u32 used_sids;
+	u32 pgsize;
+
+	u32 supports_bypass : 1;
+	u32 force_bypass : 1;
+
+	u64 sw_bypass_cpu_start;
+	u64 sw_bypass_dma_start;
+	u64 sw_bypass_len;
+
+	struct iommu_device iommu;
+};
+
+/*
+ * This structure is used to identify a single stream attached to a domain.
+ * It's used as a list inside that domain to be able to attach multiple
+ * streams to a single domain. Since multiple devices can use a single stream
+ * it additionally keeps track of how many devices are represented by this
+ * stream. Once that number reaches zero it is detached from the IOMMU domain
+ * and all translations from this stream are disabled.
+ *
+ * @dart: DART instance to which this stream belongs
+ * @sid: stream id within the DART instance
+ * @num_devices: count of devices attached to this stream
+ * @stream_head: list head for the next stream
+ */
+struct apple_dart_stream {
+	struct apple_dart *dart;
+	u32 sid;
+
+	u32 num_devices;
+
+	struct list_head stream_head;
+};
+
+/*
+ * This structure is attached to each iommu domain handled by a DART.
+ * A single domain is used to represent a single virtual address space.
+ * It is always allocated together with a page table.
+ *
+ * Streams are the smallest units the DART hardware can differentiate.
+ * These are pointed to the page table of a domain whenever a device is
+ * attached to it. A single stream can only be assigned to a single domain.
+ *
+ * Devices are assigned to at least a single and sometimes multiple individual
+ * streams (using the iommus property in the device tree). Multiple devices
+ * can theoretically be represented by the same stream, though this is usually
+ * not the case.
+ *
+ * We only keep track of streams here and just count how many devices are
+ * represented by each stream. When the last device is removed the whole stream
+ * is removed from the domain.
+ *
+ * @dart: pointer to the DART instance
+ * @pgtbl_ops: pagetable ops allocated by io-pgtable
+ * @type: domain type IOMMU_DOMAIN_{IDENTITY,DMA,UNMANAGED,BLOCKED}
+ * @sw_bypass_cpu_start: offset into cpu address space in software bypass mode
+ * @sw_bypass_dma_start: offset into dma address space in software bypass mode
+ * @sw_bypass_len: length of iova space in software bypass mode
+ * @streams: list of streams attached to this domain
+ * @lock: spinlock for operations involving the list of streams
+ * @domain: core iommu domain pointer
+ */
+struct apple_dart_domain {
+	struct apple_dart *dart;
+	struct io_pgtable_ops *pgtbl_ops;
+
+	unsigned int type;
+
+	u64 sw_bypass_cpu_start;
+	u64 sw_bypass_dma_start;
+	u64 sw_bypass_len;
+
+	struct list_head streams;
+
+	spinlock_t lock;
+
+	struct iommu_domain domain;
+};
+
+/*
+ * This structure is attached to devices with dev_iommu_priv_set() on of_xlate
+ * and contains a list of streams bound to this device as defined in the
+ * device tree. Multiple DART instances can be attached to a single device
+ * and each stream is identified by its stream id.
+ * It's usually referenced by a pointer called *cfg.
+ *
+ * A dynamic array instead of a linked list is used here since in almost
+ * all cases a device will just be attached to a single stream and streams
+ * are never removed after they have been added.
+ *
+ * @num_streams: number of streams attached
+ * @streams: array of structs to identify attached streams and the device link
+ *           to the iommu
+ */
+struct apple_dart_master_cfg {
+	int num_streams;
+	struct {
+		struct apple_dart *dart;
+		u32 sid;
+
+		struct device_link *link;
+	} streams[];
+};
+
+static struct platform_driver apple_dart_driver;
+static const struct iommu_ops apple_dart_iommu_ops;
+static const struct iommu_flush_ops apple_dart_tlb_ops;
+
+static struct apple_dart_domain *to_dart_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct apple_dart_domain, domain);
+}
+
+static void apple_dart_hw_enable_translation(struct apple_dart *dart, u16 sid)
+{
+	writel(DART_TCR_TRANSLATE_ENABLE, dart->regs + DART_TCR(sid));
+}
+
+static void apple_dart_hw_disable_dma(struct apple_dart *dart, u16 sid)
+{
+	writel(0, dart->regs + DART_TCR(sid));
+}
+
+static void apple_dart_hw_enable_bypass(struct apple_dart *dart, u16 sid)
+{
+	WARN_ON(!dart->supports_bypass);
+	writel(DART_TCR_BYPASS0_ENABLE | DART_TCR_BYPASS1_ENABLE,
+	       dart->regs + DART_TCR(sid));
+}
+
+static void apple_dart_hw_set_ttbr(struct apple_dart *dart, u16 sid, u16 idx,
+				   phys_addr_t paddr)
+{
+	writel(DART_TTBR_VALID | (paddr >> DART_TTBR_SHIFT),
+	       dart->regs + DART_TTBR(sid, idx));
+}
+
+static void apple_dart_hw_clear_ttbr(struct apple_dart *dart, u16 sid, u16 idx)
+{
+	writel(0, dart->regs + DART_TTBR(sid, idx));
+}
+
+static void apple_dart_hw_clear_all_ttbrs(struct apple_dart *dart, u16 sid)
+{
+	int i;
+
+	for (i = 0; i < DART_MAX_TTBR; ++i)
+		apple_dart_hw_clear_ttbr(dart, sid, i);
+}
+
+static int apple_dart_hw_stream_command(struct apple_dart *dart, u16 sid_bitmap,
+					u32 command)
+{
+	unsigned long flags;
+	int ret;
+	u32 command_reg;
+
+	spin_lock_irqsave(&dart->lock, flags);
+
+	writel(sid_bitmap, dart->regs + DART_STREAM_SELECT);
+	writel(command, dart->regs + DART_STREAM_COMMAND);
+
+	ret = readl_poll_timeout_atomic(
+		dart->regs + DART_STREAM_COMMAND, command_reg,
+		!(command_reg & DART_STREAM_COMMAND_BUSY), 1,
+		DART_STREAM_COMMAND_BUSY_TIMEOUT);
+
+	spin_unlock_irqrestore(&dart->lock, flags);
+
+	if (ret) {
+		dev_err(dart->dev,
+			"busy bit did not clear after command %x for streams %x\n",
+			command, sid_bitmap);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int apple_dart_hw_invalidate_tlb_global(struct apple_dart *dart)
+{
+	return apple_dart_hw_stream_command(dart, DART_STREAM_ALL,
+					    DART_STREAM_COMMAND_INVALIDATE);
+}
+
+static int apple_dart_hw_invalidate_tlb_stream(struct apple_dart *dart, u16 sid)
+{
+	return apple_dart_hw_stream_command(dart, 1 << sid,
+					    DART_STREAM_COMMAND_INVALIDATE);
+}
+
+static int apple_dart_hw_reset(struct apple_dart *dart)
+{
+	int sid;
+	u32 config;
+
+	config = readl(dart->regs + DART_CONFIG);
+	if (config & DART_CONFIG_LOCK) {
+		dev_err(dart->dev, "DART is locked down until reboot: %08x\n",
+			config);
+		return -EINVAL;
+	}
+
+	for (sid = 0; sid < DART_MAX_STREAMS; ++sid) {
+		apple_dart_hw_disable_dma(dart, sid);
+		apple_dart_hw_clear_all_ttbrs(dart, sid);
+	}
+
+	/* restore stream identity map */
+	writel(0x03020100, dart->regs + DART_STREAM_REMAP);
+	writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4);
+	writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8);
+	writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12);
+
+	/* clear any pending errors before the interrupt is unmasked */
+	writel(readl(dart->regs + DART_ERROR), dart->regs + DART_ERROR);
+
+	return apple_dart_hw_invalidate_tlb_global(dart);
+}
+
+static void apple_dart_domain_flush_tlb(struct apple_dart_domain *domain)
+{
+	unsigned long flags;
+	struct apple_dart_stream *stream;
+	struct apple_dart *dart = domain->dart;
+
+	if (!dart)
+		return;
+
+	spin_lock_irqsave(&domain->lock, flags);
+	list_for_each_entry(stream, &domain->streams, stream_head) {
+		apple_dart_hw_invalidate_tlb_stream(stream->dart, stream->sid);
+	}
+	spin_unlock_irqrestore(&domain->lock, flags);
+}
+
+static void apple_dart_flush_iotlb_all(struct iommu_domain *domain)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+
+	apple_dart_domain_flush_tlb(dart_domain);
+}
+
+static void apple_dart_iotlb_sync(struct iommu_domain *domain,
+				  struct iommu_iotlb_gather *gather)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+
+	apple_dart_domain_flush_tlb(dart_domain);
+}
+
+static void apple_dart_iotlb_sync_map(struct iommu_domain *domain,
+				      unsigned long iova, size_t size)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+
+	apple_dart_domain_flush_tlb(dart_domain);
+}
+
+static void apple_dart_tlb_flush_all(void *cookie)
+{
+	struct apple_dart_domain *domain = cookie;
+
+	apple_dart_domain_flush_tlb(domain);
+}
+
+static void apple_dart_tlb_flush_walk(unsigned long iova, size_t size,
+				      size_t granule, void *cookie)
+{
+	struct apple_dart_domain *domain = cookie;
+
+	apple_dart_domain_flush_tlb(domain);
+}
+
+static const struct iommu_flush_ops apple_dart_tlb_ops = {
+	.tlb_flush_all = apple_dart_tlb_flush_all,
+	.tlb_flush_walk = apple_dart_tlb_flush_walk,
+	.tlb_add_page = NULL,
+};
+
+static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,
+					   dma_addr_t iova)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
+
+	if (domain->type == IOMMU_DOMAIN_IDENTITY &&
+	    dart_domain->dart->supports_bypass)
+		return iova;
+	if (!ops)
+		return 0;
+
+	return ops->iova_to_phys(ops, iova);
+}
+
+static int apple_dart_map(struct iommu_domain *domain, unsigned long iova,
+			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
+
+	if (!ops)
+		return -ENODEV;
+	if (prot & IOMMU_MMIO)
+		return -EINVAL;
+	if (prot & IOMMU_NOEXEC)
+		return -EINVAL;
+
+	return ops->map(ops, iova, paddr, size, prot, gfp);
+}
+
+static size_t apple_dart_unmap(struct iommu_domain *domain, unsigned long iova,
+			       size_t size, struct iommu_iotlb_gather *gather)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+	struct io_pgtable_ops *ops = dart_domain->pgtbl_ops;
+
+	if (!ops)
+		return 0;
+
+	return ops->unmap(ops, iova, size, gather);
+}
+
+static int apple_dart_prepare_sw_bypass(struct apple_dart *dart,
+					struct apple_dart_domain *dart_domain,
+					struct device *dev)
+{
+	lockdep_assert_held(&dart_domain->lock);
+
+	if (dart->supports_bypass)
+		return 0;
+	if (dart_domain->type != IOMMU_DOMAIN_IDENTITY)
+		return 0;
+
+	/* use the bus region of the first attached dev as the bypass range */
+	if (!dart->sw_bypass_len) {
+		const struct bus_dma_region *dma_rgn = dev->dma_range_map;
+
+		if (!dma_rgn)
+			return -EINVAL;
+
+		dart->sw_bypass_len = dma_rgn->size;
+		dart->sw_bypass_cpu_start = dma_rgn->cpu_start;
+		dart->sw_bypass_dma_start = dma_rgn->dma_start;
+	}
+
+	/* ensure that we don't mix different bypass setups */
+	if (dart_domain->sw_bypass_len) {
+		if (dart->sw_bypass_len != dart_domain->sw_bypass_len)
+			return -EINVAL;
+		if (dart->sw_bypass_cpu_start !=
+		    dart_domain->sw_bypass_cpu_start)
+			return -EINVAL;
+		if (dart->sw_bypass_dma_start !=
+		    dart_domain->sw_bypass_dma_start)
+			return -EINVAL;
+	} else {
+		dart_domain->sw_bypass_len = dart->sw_bypass_len;
+		dart_domain->sw_bypass_cpu_start = dart->sw_bypass_cpu_start;
+		dart_domain->sw_bypass_dma_start = dart->sw_bypass_dma_start;
+	}
+
+	return 0;
+}
+
+static int apple_dart_domain_needs_pgtbl_ops(struct apple_dart *dart,
+					     struct iommu_domain *domain)
+{
+	if (domain->type == IOMMU_DOMAIN_DMA)
+		return 1;
+	if (domain->type == IOMMU_DOMAIN_UNMANAGED)
+		return 1;
+	if (!dart->supports_bypass && domain->type == IOMMU_DOMAIN_IDENTITY)
+		return 1;
+	return 0;
+}
+
+static int apple_dart_finalize_domain(struct iommu_domain *domain)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+	struct apple_dart *dart = dart_domain->dart;
+	struct io_pgtable_cfg pgtbl_cfg;
+
+	lockdep_assert_held(&dart_domain->lock);
+
+	if (dart_domain->pgtbl_ops)
+		return 0;
+	if (!apple_dart_domain_needs_pgtbl_ops(dart, domain))
+		return 0;
+
+	pgtbl_cfg = (struct io_pgtable_cfg){
+		.pgsize_bitmap = dart->pgsize,
+		.ias = 32,
+		.oas = 36,
+		.coherent_walk = 1,
+		.tlb = &apple_dart_tlb_ops,
+		.iommu_dev = dart->dev,
+	};
+
+	dart_domain->pgtbl_ops =
+		alloc_io_pgtable_ops(ARM_APPLE_DART, &pgtbl_cfg, domain);
+	if (!dart_domain->pgtbl_ops)
+		return -ENOMEM;
+
+	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
+	domain->geometry.aperture_start = 0;
+	domain->geometry.aperture_end = DMA_BIT_MASK(32);
+	domain->geometry.force_aperture = true;
+
+	/*
+	 * Some DARTs come without hardware bypass support but we may still
+	 * be forced to use bypass mode (to e.g. allow kernels with 4K pages to
+	 * boot). If we reach this point with an identity domain we have to set up
+	 * bypass mode in software. This is done by creating a static pagetable
+	 * for a linear map specified by dma-ranges in the device tree.
+	 */
+	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+		u64 offset;
+		int ret;
+
+		for (offset = 0; offset < dart_domain->sw_bypass_len;
+		     offset += dart->pgsize) {
+			ret = dart_domain->pgtbl_ops->map(
+				dart_domain->pgtbl_ops,
+				dart_domain->sw_bypass_dma_start + offset,
+				dart_domain->sw_bypass_cpu_start + offset,
+				dart->pgsize, IOMMU_READ | IOMMU_WRITE,
+				GFP_ATOMIC);
+			if (ret < 0) {
+				free_io_pgtable_ops(dart_domain->pgtbl_ops);
+				dart_domain->pgtbl_ops = NULL;
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static void
+apple_dart_stream_setup_translation(struct apple_dart_domain *domain,
+				    struct apple_dart *dart, u32 sid)
+{
+	int i;
+	struct io_pgtable_cfg *pgtbl_cfg =
+		&io_pgtable_ops_to_pgtable(domain->pgtbl_ops)->cfg;
+
+	for (i = 0; i < pgtbl_cfg->apple_dart_cfg.n_ttbrs; ++i)
+		apple_dart_hw_set_ttbr(dart, sid, i,
+				       pgtbl_cfg->apple_dart_cfg.ttbr[i]);
+	for (; i < DART_MAX_TTBR; ++i)
+		apple_dart_hw_clear_ttbr(dart, sid, i);
+
+	apple_dart_hw_enable_translation(dart, sid);
+	apple_dart_hw_invalidate_tlb_stream(dart, sid);
+}
+
+static int apple_dart_attach_stream(struct apple_dart_domain *domain,
+				    struct apple_dart *dart, u32 sid)
+{
+	unsigned long flags;
+	struct apple_dart_stream *stream;
+	int ret;
+
+	lockdep_assert_held(&domain->lock);
+
+	if (WARN_ON(dart->force_bypass &&
+		    domain->type != IOMMU_DOMAIN_IDENTITY))
+		return -EINVAL;
+
+	/*
+	 * We can't mix DARTs that support bypass mode with those that don't,
+	 * because the iova space in software bypass mode generally has an offset.
+	 */
+	if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY &&
+		    (domain->dart->supports_bypass != dart->supports_bypass)))
+		return -EINVAL;
+
+	list_for_each_entry(stream, &domain->streams, stream_head) {
+		if (stream->dart == dart && stream->sid == sid) {
+			stream->num_devices++;
+			return 0;
+		}
+	}
+
+	spin_lock_irqsave(&dart->lock, flags);
+
+	if (WARN_ON(dart->used_sids & BIT(sid))) {
+		ret = -EINVAL;
+		goto error;
+	}
+
+	stream = kzalloc(sizeof(*stream), GFP_ATOMIC);
+	if (!stream) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	stream->dart = dart;
+	stream->sid = sid;
+	stream->num_devices = 1;
+	list_add(&stream->stream_head, &domain->streams);
+
+	dart->used_sids |= BIT(sid);
+	spin_unlock_irqrestore(&dart->lock, flags);
+
+	apple_dart_hw_clear_all_ttbrs(stream->dart, stream->sid);
+
+	switch (domain->type) {
+	case IOMMU_DOMAIN_IDENTITY:
+		if (stream->dart->supports_bypass)
+			apple_dart_hw_enable_bypass(stream->dart, stream->sid);
+		else
+			apple_dart_stream_setup_translation(
+				domain, stream->dart, stream->sid);
+		break;
+	case IOMMU_DOMAIN_BLOCKED:
+		apple_dart_hw_disable_dma(stream->dart, stream->sid);
+		break;
+	case IOMMU_DOMAIN_UNMANAGED:
+	case IOMMU_DOMAIN_DMA:
+		apple_dart_stream_setup_translation(domain, stream->dart,
+						    stream->sid);
+		break;
+	}
+
+	return 0;
+
+error:
+	spin_unlock_irqrestore(&dart->lock, flags);
+	return ret;
+}
+
+static void apple_dart_disable_stream(struct apple_dart *dart, u32 sid)
+{
+	unsigned long flags;
+
+	apple_dart_hw_disable_dma(dart, sid);
+	apple_dart_hw_clear_all_ttbrs(dart, sid);
+	apple_dart_hw_invalidate_tlb_stream(dart, sid);
+
+	spin_lock_irqsave(&dart->lock, flags);
+	dart->used_sids &= ~BIT(sid);
+	spin_unlock_irqrestore(&dart->lock, flags);
+}
+
+static void apple_dart_detach_stream(struct apple_dart_domain *domain,
+				     struct apple_dart *dart, u32 sid)
+{
+	struct apple_dart_stream *stream;
+
+	lockdep_assert_held(&domain->lock);
+
+	list_for_each_entry(stream, &domain->streams, stream_head) {
+		if (stream->dart == dart && stream->sid == sid) {
+			stream->num_devices--;
+
+			if (stream->num_devices == 0) {
+				apple_dart_disable_stream(dart, sid);
+				list_del(&stream->stream_head);
+				kfree(stream);
+			}
+			return;
+		}
+	}
+}
+
+static int apple_dart_attach_dev(struct iommu_domain *domain,
+				 struct device *dev)
+{
+	int ret;
+	int i, j;
+	unsigned long flags;
+	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+	struct apple_dart *dart = cfg->streams[0].dart;
+
+	if (WARN_ON(dart->force_bypass &&
+		    dart_domain->type != IOMMU_DOMAIN_IDENTITY)) {
+		dev_warn(
+			dev,
+			"IOMMU must be in bypass mode but trying to attach to translated domain.\n");
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&dart_domain->lock, flags);
+
+	ret = apple_dart_prepare_sw_bypass(dart, dart_domain, dev);
+	if (ret)
+		goto out;
+
+	if (!dart_domain->dart)
+		dart_domain->dart = dart;
+
+	ret = apple_dart_finalize_domain(domain);
+	if (ret)
+		goto out;
+
+	for (i = 0; i < cfg->num_streams; ++i) {
+		ret = apple_dart_attach_stream(
+			dart_domain, cfg->streams[i].dart, cfg->streams[i].sid);
+		if (ret) {
+			/* try to undo what we did before returning */
+			for (j = 0; j < i; ++j)
+				apple_dart_detach_stream(dart_domain,
+							 cfg->streams[j].dart,
+							 cfg->streams[j].sid);
+
+			goto out;
+		}
+	}
+
+	ret = 0;
+
+out:
+	spin_unlock_irqrestore(&dart_domain->lock, flags);
+	return ret;
+}
+
+static void apple_dart_detach_dev(struct iommu_domain *domain,
+				  struct device *dev)
+{
+	int i;
+	unsigned long flags;
+	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+
+	spin_lock_irqsave(&dart_domain->lock, flags);
+
+	for (i = 0; i < cfg->num_streams; ++i)
+		apple_dart_detach_stream(dart_domain, cfg->streams[i].dart,
+					 cfg->streams[i].sid);
+
+	spin_unlock_irqrestore(&dart_domain->lock, flags);
+}
+
+static struct iommu_device *apple_dart_probe_device(struct device *dev)
+{
+	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	int i;
+
+	if (!cfg)
+		return ERR_PTR(-ENODEV);
+
+	for (i = 0; i < cfg->num_streams; ++i) {
+		cfg->streams[i].link =
+			device_link_add(dev, cfg->streams[i].dart->dev,
+					DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
+	}
+
+	return &cfg->streams[0].dart->iommu;
+}
+
+static void apple_dart_release_device(struct device *dev)
+{
+	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	int i;
+
+	if (!cfg)
+		return;
+
+	for (i = 0; i < cfg->num_streams; ++i)
+		device_link_del(cfg->streams[i].link);
+
+	dev_iommu_priv_set(dev, NULL);
+	kfree(cfg);
+}
+
+static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
+{
+	struct apple_dart_domain *dart_domain;
+
+	if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED &&
+	    type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	dart_domain = kzalloc(sizeof(*dart_domain), GFP_KERNEL);
+	if (!dart_domain)
+		return NULL;
+
+	INIT_LIST_HEAD(&dart_domain->streams);
+	spin_lock_init(&dart_domain->lock);
+	iommu_get_dma_cookie(&dart_domain->domain);
+	dart_domain->type = type;
+
+	return &dart_domain->domain;
+}
+
+static void apple_dart_domain_free(struct iommu_domain *domain)
+{
+	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
+
+	WARN_ON(!list_empty(&dart_domain->streams));
+
+	kfree(dart_domain);
+}
+
+static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
+	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	unsigned int num_streams = cfg ? cfg->num_streams : 0;
+	struct apple_dart_master_cfg *cfg_new;
+	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);
+
+	if (args->args_count != 1)
+		return -EINVAL;
+
+	cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1),
+			   GFP_KERNEL);
+	if (!cfg_new)
+		return -ENOMEM;
+
+	cfg = cfg_new;
+	dev_iommu_priv_set(dev, cfg);
+
+	cfg->num_streams = num_streams;
+	cfg->streams[cfg->num_streams].dart = dart;
+	cfg->streams[cfg->num_streams].sid = args->args[0];
+	cfg->num_streams++;
+
+	return 0;
+}
+
+static struct iommu_group *apple_dart_device_group(struct device *dev)
+{
+#ifdef CONFIG_PCI
+	struct iommu_group *group;
+
+	if (dev_is_pci(dev))
+		group = pci_device_group(dev);
+	else
+		group = generic_device_group(dev);
+
+	return group;
+#else
+	return generic_device_group(dev);
+#endif
+}
+
+static int apple_dart_def_domain_type(struct device *dev)
+{
+	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart *dart = cfg->streams[0].dart;
+
+	if (dart->force_bypass)
+		return IOMMU_DOMAIN_IDENTITY;
+	if (!dart->supports_bypass)
+		return IOMMU_DOMAIN_DMA;
+
+	return 0;
+}
+
+static const struct iommu_ops apple_dart_iommu_ops = {
+	.domain_alloc = apple_dart_domain_alloc,
+	.domain_free = apple_dart_domain_free,
+	.attach_dev = apple_dart_attach_dev,
+	.detach_dev = apple_dart_detach_dev,
+	.map = apple_dart_map,
+	.unmap = apple_dart_unmap,
+	.flush_iotlb_all = apple_dart_flush_iotlb_all,
+	.iotlb_sync = apple_dart_iotlb_sync,
+	.iotlb_sync_map = apple_dart_iotlb_sync_map,
+	.iova_to_phys = apple_dart_iova_to_phys,
+	.probe_device = apple_dart_probe_device,
+	.release_device = apple_dart_release_device,
+	.device_group = apple_dart_device_group,
+	.of_xlate = apple_dart_of_xlate,
+	.def_domain_type = apple_dart_def_domain_type,
+	.pgsize_bitmap = -1UL, /* Restricted during dart probe */
+};
+
+static irqreturn_t apple_dart_irq(int irq, void *dev)
+{
+	struct apple_dart *dart = dev;
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	const char *fault_name = NULL;
+	u32 error = readl(dart->regs + DART_ERROR);
+	u32 error_code = FIELD_GET(DART_ERROR_CODE, error);
+	u32 addr_lo = readl(dart->regs + DART_ERROR_ADDR_LO);
+	u32 addr_hi = readl(dart->regs + DART_ERROR_ADDR_HI);
+	u64 addr = addr_lo | (((u64)addr_hi) << 32);
+	u8 stream_idx = FIELD_GET(DART_ERROR_STREAM, error);
+
+	if (!(error & DART_ERROR_FLAG))
+		return IRQ_NONE;
+
+	if (error_code & DART_ERROR_READ_FAULT)
+		fault_name = "READ FAULT";
+	else if (error_code & DART_ERROR_WRITE_FAULT)
+		fault_name = "WRITE FAULT";
+	else if (error_code & DART_ERROR_NO_PTE)
+		fault_name = "NO PTE FOR IOVA";
+	else if (error_code & DART_ERROR_NO_PMD)
+		fault_name = "NO PMD FOR IOVA";
+	else if (error_code & DART_ERROR_NO_TTBR)
+		fault_name = "NO TTBR FOR IOVA";
+
+	if (WARN_ON(fault_name == NULL))
+		fault_name = "unknown";
+
+	if (__ratelimit(&rs)) {
+		dev_err(dart->dev,
+			"translation fault: status:0x%x stream:%d code:0x%x (%s) at 0x%llx\n",
+			error, stream_idx, error_code, fault_name, addr);
+	}
+
+	writel(error, dart->regs + DART_ERROR);
+	return IRQ_HANDLED;
+}
+
+static int apple_dart_probe(struct platform_device *pdev)
+{
+	int ret;
+	u32 dart_params[2];
+	struct resource *res;
+	struct apple_dart *dart;
+	struct device *dev = &pdev->dev;
+
+	dart = devm_kzalloc(dev, sizeof(*dart), GFP_KERNEL);
+	if (!dart)
+		return -ENOMEM;
+
+	dart->dev = dev;
+	spin_lock_init(&dart->lock);
+
+	if (pdev->num_resources < 1)
+		return -ENODEV;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (resource_size(res) < 0x4000) {
+		dev_err(dev, "MMIO region too small (%pr)\n", res);
+		return -EINVAL;
+	}
+
+	dart->regs = devm_ioremap_resource(dev, res);
+	if (IS_ERR(dart->regs))
+		return PTR_ERR(dart->regs);
+
+	ret = devm_clk_bulk_get_all(dev, &dart->clks);
+	if (ret < 0)
+		return ret;
+	dart->num_clks = ret;
+
+	ret = clk_bulk_prepare_enable(dart->num_clks, dart->clks);
+	if (ret)
+		return ret;
+
+	ret = apple_dart_hw_reset(dart);
+	if (ret)
+		goto err_clk_disable;
+
+	dart_params[0] = readl(dart->regs + DART_PARAMS1);
+	dart_params[1] = readl(dart->regs + DART_PARAMS2);
+	dart->pgsize = 1 << FIELD_GET(DART_PARAMS_PAGE_SHIFT, dart_params[0]);
+	dart->supports_bypass = dart_params[1] & DART_PARAMS_BYPASS_SUPPORT;
+	dart->force_bypass = dart->pgsize > PAGE_SIZE;
+
+	dart->irq = platform_get_irq(pdev, 0);
+	if (dart->irq < 0) {
+		ret = -ENODEV;
+		goto err_clk_disable;
+	}
+
+	ret = devm_request_irq(dart->dev, dart->irq, apple_dart_irq,
+			       IRQF_SHARED, "apple-dart fault handler", dart);
+	if (ret)
+		goto err_clk_disable;
+
+	platform_set_drvdata(pdev, dart);
+
+	ret = iommu_device_sysfs_add(&dart->iommu, dev, NULL, "apple-dart.%s",
+				     dev_name(&pdev->dev));
+	if (ret)
+		goto err_clk_disable;
+
+	ret = iommu_device_register(&dart->iommu, &apple_dart_iommu_ops, dev);
+	if (ret)
+		goto err_clk_disable;
+
+	if (dev->bus->iommu_ops != &apple_dart_iommu_ops) {
+		ret = bus_set_iommu(dev->bus, &apple_dart_iommu_ops);
+		if (ret)
+			goto err_clk_disable;
+	}
+#ifdef CONFIG_PCI
+	if (dev->bus->iommu_ops != pci_bus_type.iommu_ops) {
+		ret = bus_set_iommu(&pci_bus_type, &apple_dart_iommu_ops);
+		if (ret)
+			goto err_clk_disable;
+	}
+#endif
+
+	dev_info(
+		&pdev->dev,
+		"DART [pagesize %x, bypass support: %d, bypass forced: %d] initialized\n",
+		dart->pgsize, dart->supports_bypass, dart->force_bypass);
+	return 0;
+
+err_clk_disable:
+	clk_bulk_disable(dart->num_clks, dart->clks);
+	clk_bulk_unprepare(dart->num_clks, dart->clks);
+
+	return ret;
+}
+
+static int apple_dart_remove(struct platform_device *pdev)
+{
+	struct apple_dart *dart = platform_get_drvdata(pdev);
+
+	devm_free_irq(dart->dev, dart->irq, dart);
+
+	iommu_device_unregister(&dart->iommu);
+	iommu_device_sysfs_remove(&dart->iommu);
+
+	clk_bulk_disable(dart->num_clks, dart->clks);
+	clk_bulk_unprepare(dart->num_clks, dart->clks);
+
+	return 0;
+}
+
+static void apple_dart_shutdown(struct platform_device *pdev)
+{
+	apple_dart_remove(pdev);
+}
+
+static const struct of_device_id apple_dart_of_match[] = {
+	{ .compatible = "apple,t8103-dart", .data = NULL },
+	{},
+};
+MODULE_DEVICE_TABLE(of, apple_dart_of_match);
+
+static struct platform_driver apple_dart_driver = {
+	.driver	= {
+		.name			= "apple-dart",
+		.of_match_table		= apple_dart_of_match,
+	},
+	.probe	= apple_dart_probe,
+	.remove	= apple_dart_remove,
+	.shutdown = apple_dart_shutdown,
+};
+module_platform_driver(apple_dart_driver);
+
+MODULE_DESCRIPTION("IOMMU API for Apple's DART");
+MODULE_AUTHOR("Sven Peter <sven@svenpeter.dev>");
+MODULE_LICENSE("GPL v2");
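
The per-stream device counting done by apple_dart_attach_stream() and apple_dart_detach_stream() can be modelled in user space. The following is only a rough sketch under stated assumptions, not part of the patch: all model_* names and MODEL_MAX_STREAMS are hypothetical, locking and every hardware access are left out, and a plain singly linked list stands in for list_head. It shows why a second device on the same (DART, sid) pair only bumps num_devices, and why the stream id becomes available to another domain only after the last device detaches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_STREAMS 16 /* assumption: one bit per stream id */

/* Hypothetical stand-in for struct apple_dart: only the SID bitmap. */
struct model_dart {
	uint32_t used_sids;
};

/* Hypothetical stand-in for struct apple_dart_stream. */
struct model_stream {
	struct model_dart *dart;
	uint32_t sid;
	uint32_t num_devices;
	struct model_stream *next;
};

/* Hypothetical stand-in for struct apple_dart_domain. */
struct model_domain {
	struct model_stream *streams;
};

/* Returns 0 on success, -1 if the SID belongs to another domain or on OOM. */
static int model_attach(struct model_domain *dom, struct model_dart *dart,
			uint32_t sid)
{
	struct model_stream *s;

	assert(sid < MODEL_MAX_STREAMS);

	/* Stream already in this domain: just count one more device. */
	for (s = dom->streams; s; s = s->next) {
		if (s->dart == dart && s->sid == sid) {
			s->num_devices++;
			return 0;
		}
	}

	/* A SID can only be owned by one domain at a time. */
	if (dart->used_sids & (1u << sid))
		return -1;

	s = calloc(1, sizeof(*s));
	if (!s)
		return -1;

	s->dart = dart;
	s->sid = sid;
	s->num_devices = 1;
	s->next = dom->streams;
	dom->streams = s;
	dart->used_sids |= 1u << sid;

	return 0;
}

/* Drops one device; frees the stream and its SID when the count hits zero. */
static void model_detach(struct model_domain *dom, struct model_dart *dart,
			 uint32_t sid)
{
	struct model_stream **pp;

	for (pp = &dom->streams; *pp; pp = &(*pp)->next) {
		struct model_stream *s = *pp;

		if (s->dart != dart || s->sid != sid)
			continue;

		if (--s->num_devices == 0) {
			dart->used_sids &= ~(1u << sid);
			*pp = s->next;
			free(s);
		}
		return;
	}
}
```

In the driver itself the same bookkeeping runs under the domain spinlock, with used_sids additionally protected by the DART lock, and the zero-count branch also disables DMA, clears the TTBRs and invalidates the TLB for that stream.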