
For the problem when using swiotlb

Message ID 20141119112910.GD7156@e104818-lin.cambridge.arm.com
State New

Commit Message

Catalin Marinas Nov. 19, 2014, 11:29 a.m. UTC
On Wed, Nov 19, 2014 at 08:45:43AM +0000, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:17:15 Ding Tianhong wrote:
> > On 2014/11/18 2:09, Catalin Marinas wrote:
> > > On Mon, Nov 17, 2014 at 12:18:42PM +0000, Arnd Bergmann wrote:
> > >> On Monday 17 November 2014 19:56:27 Ding Tianhong wrote:
> > >>>         Commit 3690951fc6d42f3a0903987677d0e592c49dd8db ("arm64: Use swiotlb late initialisation")
> > >>> switched the DMA mapping code to swiotlb_late_init_with_default_size(); this causes a problem
> > >>> when I run scsi stress tests, with messages as below:
> > >>>
> > >>>         sas_controller b1000000.sas: swiotlb buffer is full (sz: 65536 bytes)..
> > >>>         DMA: Out of SW-IOMMU space for 65536 bytes at device b1000000.sas
> > >>>
> > >>> The reason is that swiotlb_late_init_with_default_size() can only allocate 16M of memory for
> > >>> DMA mapping, and the "swiotlb=xxx" cmdline parameter has no effect because __get_free_pages()
> > >>> goes through the buddy allocator, which caps a single allocation at 16M (MAX_ORDER is 13 for
> > >>> 4K pages). 16M is obviously too small in many scenarios, while swiotlb_init(), which can
> > >>> reserve as large a buffer as requested, works well for most drivers.
> > >>>
> > >>> I could not find a better way to fix this problem than reverting that patch, so could you
> > >>> please give me some advice and help me? Thanks very much.
> > >>
> > >> In general, you should not need to use swiotlb for most devices, in
> > >> particular for high-performance devices like network or block.
> > >>
> > >> Please make sure that you have set up the dma-ranges properties in
> > >> your DT properly to allow 64-bit DMA if the device supports it.
> > > 
> > > That's the problem indeed, the DMA API ends up using swiotlb bounce
> > > buffers because the physical address of the pages passed to (or
> > > allocated by) the driver are beyond 32-bit limit (which is the default
> > > dma mask).
> > > 
> > 
> > Thanks everyone, I think I found the way to fix it: enable DMA_CMA to reserve a big region
> > for CMA and set the coherent mask for the dev; then dma_alloc and dma_mapping will not use
> > swiotlb unless the memory falls outside the mask or swiotlb_force is enabled.
> > 
> > If I still misunderstand, please let me know.
> 
> Please do not use CMA to work around the problem, but fix the underlying bug
> instead.

Agree.

> The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> dma mask, and check whether that succeeded. However, the code implementing
> dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> the dma-ranges property (see of_dma_configure()), and check if the mask
> is possible.

dma_set_mask_and_coherent() is a generic function. I think the
of_dma_configure() should start with a coherent_dma_mask based on
dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
comment in of_dma_configure() says that devices should set up the
supported mask, but that is not always up to the device itself; it may be
dictated by the bus it is connected to.

Something like below, untested:

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Arnd Bergmann Nov. 19, 2014, 12:48 p.m. UTC | #1
On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > dma mask, and check whether that succeeded. However, the code implementing
> > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > the dma-ranges property (see of_dma_configure()), and check if the mask
> > is possible.
> 
> dma_set_mask_and_coherent() is a generic function. I think the
> of_dma_configure() should start with a coherent_dma_mask based on
> dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
> comment in of_dma_configure() says that devices should set up the
> supported mask, but that is not always up to the device itself; it may be
> dictated by the bus it is connected to.
> 
> Something like below, untested:
> 
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 3b64d0bf5bba..dff34883db45 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
>         /* DMA ranges found. Calculate and set dma_pfn_offset */
>         dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
>         dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> +
> +       /* Set the coherent_dma_mask based on the dma-ranges property */
> +       if (size)
> +               dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
>  }
>  
> 

We have discussed this in the past, and the problem with this is
that the actual mask depends both on the capabilities of the
device and the bus. In particular, if the device can only do
32-bit DMA, we must not set the mask to something higher.

The normal rule here is that a driver that wants to do 64-bit
DMA must call dma_set_mask_and_coherent() with the higher mask,
while a device that can not access all of the 32-bit address space
must call dma_set_mask_and_coherent() with the smaller mask
before calling any of the other DMA interfaces.

However, if the bus is not capable of addressing the entire
32-bit range (as on some modern shmobile machines, or some of the
really old machines), we need to limit the mask here already.

	Arnd
Catalin Marinas Nov. 19, 2014, 3:46 p.m. UTC | #2
On Wed, Nov 19, 2014 at 12:48:58PM +0000, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > > dma mask, and check whether that succeeded. However, the code implementing
> > > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > > the dma-ranges property (see of_dma_configure()), and check if the mask
> > > is possible.
> > 
> > dma_set_mask_and_coherent() is a generic function. I think the
> > of_dma_configure() should start with a coherent_dma_mask based on
> > dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
> > comment in of_dma_configure() says that devices should set up the
> > supported mask, but that is not always up to the device itself; it may be
> > dictated by the bus it is connected to.
> > 
> > Something like below, untested:
> > 
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 3b64d0bf5bba..dff34883db45 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> >         /* DMA ranges found. Calculate and set dma_pfn_offset */
> >         dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> >         dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > +
> > +       /* Set the coherent_dma_mask based on the dma-ranges property */
> > +       if (size)
> > +               dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> >  }
> 
> We have discussed this in the past, and the problem with this is
> that the actual mask depends both on the capabilities of the
> device and the bus. In particular, if the device can only do
> 32-bit DMA, we must not set the mask to something higher.

So is the dma-ranges property allowed to specify a size bigger than what
the device supports?

> The normal rule here is that a driver that wants to do 64-bit
> DMA must call dma_set_mask_and_coherent() with the higher mask,
> while a device that can not access all of the 32-bit address space
> must call dma_set_mask_and_coherent() with the smaller mask
> before calling any of the other DMA interfaces.

OK, looking at the DMA API docs, it looks like the default mask is
32-bit and any other value should be explicitly set by the driver.

What we don't have on arm64 yet is taking dma_pfn_offset into account
when generating the dma address (but so far I haven't seen any request
for this from hardware vendors; it can easily be fixed). So if that's
not the case for Ding, I'm not sure dma-ranges property would help.

> However, if the bus is not capable of addressing the entire
> 32-bit range (as on some modern shmobile machines, or some of the
> really old machines), we need to limit the mask here already.

Would the limiting be based on the dma-ranges size property? Such
information is not easily available after of_dma_configure(), maybe we
could store it somewhere in struct device.

Going back to original topic, the dma_supported() function on arm64
calls swiotlb_dma_supported() which actually checks whether the swiotlb
bounce buffer is within the dma mask. This transparent bouncing (unlike
arm32 where it needs to be explicit) is not always optimal, though
required for 32-bit only devices on a 64-bit system. The problem is when
the driver is 64-bit capable but forgets to call
dma_set_mask_and_coherent() (that's not the only question I got about
running out of swiotlb buffers).
Arnd Bergmann Nov. 19, 2014, 3:56 p.m. UTC | #3
On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> On Wed, Nov 19, 2014 at 12:48:58PM +0000, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 11:29:10 Catalin Marinas wrote:
> > > > The driver should call 'dma_set_mask_and_coherent()' with the appropriate
> > > > dma mask, and check whether that succeeded. However, the code implementing
> > > > dma_set_mask_and_coherent on arm64 also needs to be changed to look up
> > > > the dma-ranges property (see of_dma_configure()), and check if the mask
> > > > is possible.
> > > 
> > > dma_set_mask_and_coherent() is a generic function. I think the
> > > of_dma_configure() should start with a coherent_dma_mask based on
> > > dma-ranges if given rather than defaulting to DMA_BIT_MASK(32). The
> > > comment in of_dma_configure() says that devices should set up the
> > > supported mask, but that is not always up to the device itself; it may be
> > > dictated by the bus it is connected to.
> > > 
> > > Something like below, untested:
> > > 
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 3b64d0bf5bba..dff34883db45 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -200,6 +200,10 @@ static void of_dma_configure(struct device *dev)
> > >         /* DMA ranges found. Calculate and set dma_pfn_offset */
> > >         dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
> > >         dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
> > > +
> > > +       /* Set the coherent_dma_mask based on the dma-ranges property */
> > > +       if (size)
> > > +               dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
> > >  }
> > 
> > We have discussed this in the past, and the problem with this is
> > that the actual mask depends both on the capabilities of the
> > device and the bus. In particular, if the device can only do
> > 32-bit DMA, we must not set the mask to something higher.
> 
> So is the dma-ranges property allowed to specify a size bigger than what
> the device supports?

dma-ranges is a property that describes the bus, but you can have
devices with different capabilities on the same bus. In particular
on PCI, you will have 32-bit and 64-bit masters on the same host
controller.

> > The normal rule here is that a driver that wants to do 64-bit
> > DMA must call dma_set_mask_and_coherent() with the higher mask,
> > while a device that can not access all of the 32-bit address space
> > must call dma_set_mask_and_coherent() with the smaller mask
> > before calling any of the other DMA interfaces.
> 
> OK, looking at the DMA API docs, it looks like the default mask is
> 32-bit and any other value should be explicitly set by the driver.
> 
> What we don't have on arm64 yet is taking dma_pfn_offset into account
> when generating the dma address (but so far I haven't seen any request
> for this from hardware vendors; it can easily be fixed). So if that's
> not the case for Ding, I'm not sure dma-ranges property would help.

You can ignore dma_pfn_offset for now, until someone needs it. What
we definitely need is the size of the range.

> > However, if the bus is not capable of addressing the entire
> > 32-bit range (as on some modern shmobile machines, or some of the
> > really old machines), we need to limit the mask here already.
> 
> Would the limiting be based on the dma-ranges size property?

Yes.

> Such information is not easily available after of_dma_configure(),
> maybe we could store it somewhere in struct device.

I don't think it's necessary. Setting the dma mask is a rare operation,
and we only need to check again if the new mask is larger than the
old one.

> Going back to original topic, the dma_supported() function on arm64
> calls swiotlb_dma_supported() which actually checks whether the swiotlb
> bounce buffer is within the dma mask. This transparent bouncing (unlike
> arm32 where it needs to be explicit) is not always optimal, though
> required for 32-bit only devices on a 64-bit system. The problem is when
> the driver is 64-bit capable but forgets to call
> dma_set_mask_and_coherent() (that's not the only question I got about
> running out of swiotlb buffers).

I think it would be nice to warn once per device that starts using the
swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
attached.

	Arnd
Catalin Marinas Nov. 21, 2014, 11:06 a.m. UTC | #4
On Wed, Nov 19, 2014 at 03:56:42PM +0000, Arnd Bergmann wrote:
> On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > Going back to original topic, the dma_supported() function on arm64
> > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > arm32 where it needs to be explicit) is not always optimal, though
> > required for 32-bit only devices on a 64-bit system. The problem is when
> > the driver is 64-bit capable but forgets to call
> > dma_set_mask_and_coherent() (that's not the only question I got about
> > running out of swiotlb buffers).
> 
> I think it would be nice to warn once per device that starts using the
> swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> attached.

It would be nice to have a dev_warn_once().

I think it makes sense on arm64 to avoid swiotlb bounce buffers for
coherent allocations altogether. The __dma_alloc_coherent() function
already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
have a device that cannot even cope with a 32-bit ZONE_DMA, we should
just not support DMA at all on it (without an IOMMU). The arm32
__dma_supported() has a similar check.

Swiotlb is still required for the streaming DMA since we get bouncing
for pages allocated outside the driver control (e.g. VFS layer which
doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.

Ding seems to imply that CMA fixes the problem, which means that the
issue is indeed coherent allocations.
Arnd Bergmann Nov. 21, 2014, 11:26 a.m. UTC | #5
On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> On Wed, Nov 19, 2014 at 03:56:42PM +0000, Arnd Bergmann wrote:
> > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > Going back to original topic, the dma_supported() function on arm64
> > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > arm32 where it needs to be explicit) is not always optimal, though
> > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > the driver is 64-bit capable but forgets to call
> > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > running out of swiotlb buffers).
> > 
> > I think it would be nice to warn once per device that starts using the
> > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > attached.
> 
> It would be nice to have a dev_warn_once().
> 
> I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> coherent allocations altogether. The __dma_alloc_coherent() function
> already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> just not support DMA at all on it (without an IOMMU). The arm32
> __dma_supported() has a similar check.

If we ever encounter this case, we may have to add a smaller ZONE_DMA
and use ZONE_DMA32 for the normal dma allocations.

> Swiotlb is still required for the streaming DMA since we get bouncing
> for pages allocated outside the driver control (e.g. VFS layer which
> doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.
> 
> Ding seems to imply that CMA fixes the problem, which means that the
> issue is indeed coherent allocations.

I wonder what's going on here, since swiotlb_alloc_coherent() actually
tries a regular __get_free_pages(flags, order) first, and when ZONE_DMA
is set here, it should just work without using the pool.

	Arnd
Catalin Marinas Nov. 21, 2014, 11:36 a.m. UTC | #6
On Fri, Nov 21, 2014 at 11:26:45AM +0000, Arnd Bergmann wrote:
> On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> > On Wed, Nov 19, 2014 at 03:56:42PM +0000, Arnd Bergmann wrote:
> > > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > > Going back to original topic, the dma_supported() function on arm64
> > > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > > arm32 where it needs to be explicit) is not always optimal, though
> > > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > > the driver is 64-bit capable but forgets to call
> > > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > > running out of swiotlb buffers).
> > > 
> > > I think it would be nice to warn once per device that starts using the
> > > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > > attached.
> > 
> > It would be nice to have a dev_warn_once().
> > 
> > I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> > coherent allocations altogether. The __dma_alloc_coherent() function
> > already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> > have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> > just not support DMA at all on it (without an IOMMU). The arm32
> > __dma_supported() has a similar check.
> 
> If we ever encounter this case, we may have to add a smaller ZONE_DMA
> and use ZONE_DMA32 for the normal dma allocations.

Traditionally on x86 I think ZONE_DMA was for ISA and ZONE_DMA32 had to
cover the 32-bit physical address space. On arm64 we don't expect ISA,
so we only use ZONE_DMA (which is 4G, similar to IA-64, sparc). We had
ZONE_DMA32 originally but it broke swiotlb which assumes ZONE_DMA for
its bounce buffer.

> > Swiotlb is still required for the streaming DMA since we get bouncing
> > for pages allocated outside the driver control (e.g. VFS layer which
> > doesn't care about GFP_DMA), hoping a 16M bounce buffer would be enough.
> > 
> > Ding seems to imply that CMA fixes the problem, which means that the
> > issue is indeed coherent allocations.
> 
> I wonder what's going on here, since swiotlb_alloc_coherent() actually
> tries a regular __get_free_pages(flags, order) first, and when ZONE_DMA
> is set here, it should just work without using the pool.

As long as coherent_dma_mask is sufficient for ZONE_DMA. I have no idea
what this mask is set to in Ding's case (but I've seen the problem
previously with an out of tree driver where coherent_dma_mask was some
random number; so better reporting here would help).

There could be another case where dma_pfn_offset is required but let's
wait for some more info from Ding (ZONE_DMA is 32-bit from the start of
RAM which could be 40-bit like on Seattle, so basically such devices
would need to set dma_pfn_offset).
Arnd Bergmann Nov. 21, 2014, 12:27 p.m. UTC | #7
On Friday 21 November 2014 11:36:18 Catalin Marinas wrote:
> On Fri, Nov 21, 2014 at 11:26:45AM +0000, Arnd Bergmann wrote:
> > On Friday 21 November 2014 11:06:10 Catalin Marinas wrote:
> > > On Wed, Nov 19, 2014 at 03:56:42PM +0000, Arnd Bergmann wrote:
> > > > On Wednesday 19 November 2014 15:46:35 Catalin Marinas wrote:
> > > > > Going back to original topic, the dma_supported() function on arm64
> > > > > calls swiotlb_dma_supported() which actually checks whether the swiotlb
> > > > > bounce buffer is within the dma mask. This transparent bouncing (unlike
> > > > > arm32 where it needs to be explicit) is not always optimal, though
> > > > > required for 32-bit only devices on a 64-bit system. The problem is when
> > > > > the driver is 64-bit capable but forgets to call
> > > > > dma_set_mask_and_coherent() (that's not the only question I got about
> > > > > running out of swiotlb buffers).
> > > > 
> > > > I think it would be nice to warn once per device that starts using the
> > > > swiotlb. Really all 32-bit DMA masters should have a proper IOMMU
> > > > attached.
> > > 
> > > It would be nice to have a dev_warn_once().
> > > 
> > > I think it makes sense on arm64 to avoid swiotlb bounce buffers for
> > > coherent allocations altogether. The __dma_alloc_coherent() function
> > > already checks coherent_dma_mask and sets GFP_DMA accordingly. If we
> > > have a device that cannot even cope with a 32-bit ZONE_DMA, we should
> > > just not support DMA at all on it (without an IOMMU). The arm32
> > > __dma_supported() has a similar check.
> > 
> > If we ever encounter this case, we may have to add a smaller ZONE_DMA
> > and use ZONE_DMA32 for the normal dma allocations.
> 
> Traditionally on x86 I think ZONE_DMA was for ISA and ZONE_DMA32 had to
> cover the 32-bit physical address space. On arm64 we don't expect ISA,
> so we only use ZONE_DMA (which is 4G, similar to IA-64, sparc). We had
> ZONE_DMA32 originally but it broke swiotlb which assumes ZONE_DMA for
> its bounce buffer.

Right, I'm just saying that we might encounter some broken hardware
that needs e.g. a 31-bit dma mask for one device, and we decide that
this chip is important enough that we have to support it.

We can of course hope that this won't happen.

	Arnd

Patch

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3b64d0bf5bba..dff34883db45 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -200,6 +200,10 @@  static void of_dma_configure(struct device *dev)
 	/* DMA ranges found. Calculate and set dma_pfn_offset */
 	dev->dma_pfn_offset = PFN_DOWN(paddr - dma_addr);
 	dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
+
+	/* Set the coherent_dma_mask based on the dma-ranges property */
+	if (size)
+		dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
 }
 
 /**