diff mbox series

[v3,3/3] mmc: xenon: Fix 2G limitation on AC5 SoC

Message ID 20221205105931.410686-4-vadym.kochan@plvision.eu
State New
Headers show
Series mmc: xenon: Fix 2G DMA limitation on AC5 SoC | expand

Commit Message

Vadym Kochan Dec. 5, 2022, 10:59 a.m. UTC
There is a limitation on AC5 SoC that mmc controller
can't have DMA access over 2G memory, so use SDMA with
a bounce buffer. Swiotlb can't help because on arm64 arch
it reserves memblock's at the end of the memory.

Additionally set mask to 34 bit since on AC5 SoC RAM starts
at 0x2_00000000.

Co-developed-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
---
 drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
 drivers/mmc/host/sdhci-xenon.h |  3 ++-
 2 files changed, 40 insertions(+), 1 deletion(-)

Comments

Linus Walleij Dec. 7, 2022, 1:40 p.m. UTC | #1
On Mon, Dec 5, 2022 at 12:00 PM Vadym Kochan <vadym.kochan@plvision.eu> wrote:

> There is a limitation on AC5 SoC that mmc controller
> can't have DMA access over 2G memory,

That sounds like a pretty common problem when someone
uses a 32bit address register in their DMA controller, or
the integration engineer not connecting all address lines... :/

>  so use SDMA with a bounce buffer. Swiotlb can't help because
> on arm64 arch it reserves memblock's at the end of the memory.

OK

This:

> Additionally set mask to 34 bit since on AC5 SoC RAM starts
> at 0x2_00000000.
(...)
> +static int xenon_set_dma_mask(struct sdhci_host *host)
> +{
> +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> +       struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
> +       struct mmc_host *mmc = host->mmc;
> +       struct device *dev = mmc_dev(mmc);
> +
> +       if (priv->hw_version == XENON_AC5) {
> +               host->flags &= ~SDHCI_USE_64_BIT_DMA;
> +
> +               return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
> +       }
> +
> +       return sdhci_set_dma_mask(host);
> +}
(...)
> +       .set_dma_mask           = xenon_set_dma_mask,

I don't know if is so good to assume the size and location of the
SoC RAM like you do, that looks really fragile.

Can't you check what physical RAM Linux actually think
it has and where? You partly check it with meminfo below.

> +static int xenon_ac5_probe(struct sdhci_host *host)
> +{
> +       struct sysinfo si;
> +
> +       si_meminfo(&si);
> +
> +       if ((si.totalram * si.mem_unit) > SZ_2G)

This looks like a bug since you mention that the RAM does
not start at 0x00000000 this means if the memory is
2G it will partly be at physical addresses above 2G....

> +               host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
> +
> +       return 0;
> +}

Here you check how big the RAM is using meminfo (if the
bug is fixed).

But is this really a good solution? ADMA still works on the lower
2GB does it not?

What you want is a new quirk that bounces only when you
go above SZ_4G.

There *is* SDHCI_QUIRK_32BIT_DMA_ADDR have you
tried this? A 32bit DMA address is literally 4GB.
I think all you need to do is set this flag for xenon.

Yours,
Linus Walleij
Elad Nachman Dec. 8, 2022, 10:19 a.m. UTC | #2
Hi,

I would like to explain how this HW mechanism works:

The lower 31-bits of the address placed in the ADMA is passed through the interconnect, and remapped to the base of the DDR.

Hence only addressing of the lower 2GB of the DDR memory is supported for eMMC in this device family (AC5/X).

So the quirk needs to kick in above 2GB of physical memory accessed from the base of the DDR.

Since we cannot control the location of the various buffers passed in the Linux kernel via VFS to the eMMC driver, the only remedy is to kick in the quirk whenever buffer which point to the physical memory above 2GB * can * arrive to the eMMC driver, hence the quirk kicks in whenever a total physical memory size greater than 2GB is detected in the system.

This is why a quirk which only kicks in above 4GB is not sufficient.

Furthermore, SDHCI_QUIRK_32BIT_DMA_ADDR is checked in sdhci_prepare_data() as a way to disable DMA when the offset of the scatter-list DMA address is not 32-bit aligned. If the address is aligned, this quirk does not disable the DMA, and will not solve our problem.

Hopefully this explains the motivation for the way the patch is written.

FYI,

Elad.

-----Original Message-----
From: Linus Walleij <linus.walleij@linaro.org> 
Sent: Wednesday, December 7, 2022 3:40 PM
To: Vadym Kochan <vadym.kochan@plvision.eu>
Cc: Hu Ziji <huziji@marvell.com>; Ulf Hansson <ulf.hansson@linaro.org>; Rob Herring <robh+dt@kernel.org>; Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>; Adrian Hunter <adrian.hunter@intel.com>; linux-mmc@vger.kernel.org; devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Elad Nachman <enachman@marvell.com>; Chris Packham <chris.packham@alliedtelesis.co.nz>
Subject: [EXT] Re: [PATCH v3 3/3] mmc: xenon: Fix 2G limitation on AC5 SoC

External Email

----------------------------------------------------------------------
On Mon, Dec 5, 2022 at 12:00 PM Vadym Kochan <vadym.kochan@plvision.eu> wrote:

> There is a limitation on AC5 SoC that mmc controller can't have DMA 
> access over 2G memory,

That sounds like a pretty common problem when someone uses a 32bit address register in their DMA controller, or the integration engineer not connecting all address lines... :/

>  so use SDMA with a bounce buffer. Swiotlb can't help because on arm64 
> arch it reserves memblock's at the end of the memory.

OK

This:

> Additionally set mask to 34 bit since on AC5 SoC RAM starts at 
> 0x2_00000000.
(...)
> +static int xenon_set_dma_mask(struct sdhci_host *host) {
> +       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> +       struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
> +       struct mmc_host *mmc = host->mmc;
> +       struct device *dev = mmc_dev(mmc);
> +
> +       if (priv->hw_version == XENON_AC5) {
> +               host->flags &= ~SDHCI_USE_64_BIT_DMA;
> +
> +               return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
> +       }
> +
> +       return sdhci_set_dma_mask(host); }
(...)
> +       .set_dma_mask           = xenon_set_dma_mask,

I don't know if is so good to assume the size and location of the SoC RAM like you do, that looks really fragile.

Can't you check what physical RAM Linux actually think it has and where? You partly check it with meminfo below.

> +static int xenon_ac5_probe(struct sdhci_host *host) {
> +       struct sysinfo si;
> +
> +       si_meminfo(&si);
> +
> +       if ((si.totalram * si.mem_unit) > SZ_2G)

This looks like a bug since you mention that the RAM does not start at 0x00000000 this means if the memory is 2G it will partly be at physical addresses above 2G....

> +               host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
> +
> +       return 0;
> +}

Here you check how big the RAM is using meminfo (if the bug is fixed).

But is this really a good solution? ADMA still works on the lower 2GB does it not?

What you want is a new quirk that bounces only when you go above SZ_4G.

There *is* SDHCI_QUIRK_32BIT_DMA_ADDR have you tried this? A 32bit DMA address is literally 4GB.
I think all you need to do is set this flag for xenon.

Yours,
Linus Walleij
Linus Walleij Dec. 8, 2022, 9:35 p.m. UTC | #3
Hi Elad,

I get it, I think. I was a bit confused by the 3G/4G terminology.

On Thu, Dec 8, 2022 at 11:20 AM Elad Nachman <enachman@marvell.com> wrote:

> The lower 31-bits of the address placed in the ADMA is passed through the interconnect, and remapped to the base of the DDR.
>
> Hence only addressing of the lower 2GB of the DDR memory is supported for eMMC in this device family (AC5/X).
>
> So the quirk needs to kick in above 2GB of physical memory accessed from the base of the DDR.

How "clever" to skip bit 32. This should be in the patch description.

> This is why a quirk which only kicks in above 4GB is not sufficient.

So the author of the patch should create a new quirk that kicks in above 2GB,
devised to be similar in style of the 4GB quirk we already have.

> Furthermore, SDHCI_QUIRK_32BIT_DMA_ADDR is checked in sdhci_prepare_data() as a way to
> disable DMA when the offset of the scatter-list DMA address is not 32-bit aligned. If the address is
> aligned, this quirk does not disable the DMA, and will not solve our problem.

That's right.

Let's just create a new quirk:

SDHCI_QUIRK_31BIT_DMA_ROOF

Define the semantics such that this will allow DMA for buffers that are below
the 31st bit, but does not have the semantics to limit scatter-gather buffers to
be 32-bit aligned.

Yours,
Linus Walleij
Adrian Hunter Dec. 9, 2022, 7:23 a.m. UTC | #4
On 5/12/22 12:59, Vadym Kochan wrote:
> There is a limitation on AC5 SoC that mmc controller
> can't have DMA access over 2G memory, so use SDMA with
> a bounce buffer. Swiotlb can't help because on arm64 arch
> it reserves memblock's at the end of the memory.
> 
> Additionally set mask to 34 bit since on AC5 SoC RAM starts
> at 0x2_00000000.

Can you explain more about how a 34-bit DMA mask works when
SDMA only supports 32-bit addresses?

> 
> Co-developed-by: Elad Nachman <enachman@marvell.com>
> Signed-off-by: Elad Nachman <enachman@marvell.com>
> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
> ---
>  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
>  drivers/mmc/host/sdhci-xenon.h |  3 ++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
> index 08e838400b52..5f3db0425674 100644
> --- a/drivers/mmc/host/sdhci-xenon.c
> +++ b/drivers/mmc/host/sdhci-xenon.c
> @@ -13,7 +13,9 @@
>  
>  #include <linux/acpi.h>
>  #include <linux/delay.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/ktime.h>
> +#include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/pm.h>
> @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
>  		return pltfm_host->clock;
>  }
>  
> +static int xenon_set_dma_mask(struct sdhci_host *host)
> +{
> +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
> +	struct mmc_host *mmc = host->mmc;
> +	struct device *dev = mmc_dev(mmc);
> +
> +	if (priv->hw_version == XENON_AC5) {
> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> +
> +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
> +	}
> +
> +	return sdhci_set_dma_mask(host);
> +}
> +
>  static const struct sdhci_ops sdhci_xenon_ops = {
>  	.voltage_switch		= xenon_voltage_switch,
>  	.set_clock		= sdhci_set_clock,
> @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
>  	.reset			= xenon_reset,
>  	.set_uhs_signaling	= xenon_set_uhs_signaling,
>  	.get_max_clock		= xenon_get_max_clock,
> +	.set_dma_mask		= xenon_set_dma_mask,
>  };
>  
>  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
> @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
>  	xenon_disable_sdhc(host, sdhc_id);
>  }
>  
> +static int xenon_ac5_probe(struct sdhci_host *host)
> +{
> +	struct sysinfo si;
> +
> +	si_meminfo(&si);
> +
> +	if ((si.totalram * si.mem_unit) > SZ_2G)
> +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
> +
> +	return 0;
> +}
> +
>  static int xenon_probe(struct platform_device *pdev)
>  {
>  	struct sdhci_pltfm_host *pltfm_host;
> @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
>  		}
>  	}
>  
> +	if (priv->hw_version == XENON_AC5) {
> +		err = xenon_ac5_probe(host);
> +		if (err)
> +			goto err_clk_axi;
> +	}
> +
>  	err = mmc_of_parse(host->mmc);
>  	if (err)
>  		goto err_clk_axi;
> @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
>  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
>  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
>  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
> +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
>  	{}
>  };
>  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
> diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
> index 3e9c6c908a79..0460d97aad26 100644
> --- a/drivers/mmc/host/sdhci-xenon.h
> +++ b/drivers/mmc/host/sdhci-xenon.h
> @@ -57,7 +57,8 @@ enum xenon_variant {
>  	XENON_A3700,
>  	XENON_AP806,
>  	XENON_AP807,
> -	XENON_CP110
> +	XENON_CP110,
> +	XENON_AC5
>  };
>  
>  struct xenon_priv {
Vadym Kochan Dec. 9, 2022, 11:39 a.m. UTC | #5
Hi Adrian,

On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 5/12/22 12:59, Vadym Kochan wrote:
> > There is a limitation on AC5 SoC that mmc controller
> > can't have DMA access over 2G memory, so use SDMA with
> > a bounce buffer. Swiotlb can't help because on arm64 arch
> > it reserves memblock's at the end of the memory.
> > 
> > Additionally set mask to 34 bit since on AC5 SoC RAM starts
> > at 0x2_00000000.
> 
> Can you explain more about how a 34-bit DMA mask works when
> SDMA only supports 32-bit addresses?
> 

So, after I set

> > +		host->flags &= ~SDHCI_USE_64_BIT_DMA;

then sdhc core sets mask to 32 bit, but then dma_map fails to map
bounce buffer because the base address is higher than 32bit - 0x2_00000000,
and 34bit mask fixed it.

> > 
> > Co-developed-by: Elad Nachman <enachman@marvell.com>
> > Signed-off-by: Elad Nachman <enachman@marvell.com>
> > Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
> > ---
> >  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
> >  drivers/mmc/host/sdhci-xenon.h |  3 ++-
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
> > index 08e838400b52..5f3db0425674 100644
> > --- a/drivers/mmc/host/sdhci-xenon.c
> > +++ b/drivers/mmc/host/sdhci-xenon.c
> > @@ -13,7 +13,9 @@
> >  
> >  #include <linux/acpi.h>
> >  #include <linux/delay.h>
> > +#include <linux/dma-mapping.h>
> >  #include <linux/ktime.h>
> > +#include <linux/mm.h>
> >  #include <linux/module.h>
> >  #include <linux/of.h>
> >  #include <linux/pm.h>
> > @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
> >  		return pltfm_host->clock;
> >  }
> >  
> > +static int xenon_set_dma_mask(struct sdhci_host *host)
> > +{
> > +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
> > +	struct mmc_host *mmc = host->mmc;
> > +	struct device *dev = mmc_dev(mmc);
> > +
> > +	if (priv->hw_version == XENON_AC5) {
> > +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> > +
> > +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
> > +	}
> > +
> > +	return sdhci_set_dma_mask(host);
> > +}
> > +
> >  static const struct sdhci_ops sdhci_xenon_ops = {
> >  	.voltage_switch		= xenon_voltage_switch,
> >  	.set_clock		= sdhci_set_clock,
> > @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
> >  	.reset			= xenon_reset,
> >  	.set_uhs_signaling	= xenon_set_uhs_signaling,
> >  	.get_max_clock		= xenon_get_max_clock,
> > +	.set_dma_mask		= xenon_set_dma_mask,
> >  };
> >  
> >  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
> > @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
> >  	xenon_disable_sdhc(host, sdhc_id);
> >  }
> >  
> > +static int xenon_ac5_probe(struct sdhci_host *host)
> > +{
> > +	struct sysinfo si;
> > +
> > +	si_meminfo(&si);
> > +
> > +	if ((si.totalram * si.mem_unit) > SZ_2G)
> > +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
> > +
> > +	return 0;
> > +}
> > +
> >  static int xenon_probe(struct platform_device *pdev)
> >  {
> >  	struct sdhci_pltfm_host *pltfm_host;
> > @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
> >  		}
> >  	}
> >  
> > +	if (priv->hw_version == XENON_AC5) {
> > +		err = xenon_ac5_probe(host);
> > +		if (err)
> > +			goto err_clk_axi;
> > +	}
> > +
> >  	err = mmc_of_parse(host->mmc);
> >  	if (err)
> >  		goto err_clk_axi;
> > @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
> >  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
> >  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
> >  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
> > +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
> >  	{}
> >  };
> >  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
> > diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
> > index 3e9c6c908a79..0460d97aad26 100644
> > --- a/drivers/mmc/host/sdhci-xenon.h
> > +++ b/drivers/mmc/host/sdhci-xenon.h
> > @@ -57,7 +57,8 @@ enum xenon_variant {
> >  	XENON_A3700,
> >  	XENON_AP806,
> >  	XENON_AP807,
> > -	XENON_CP110
> > +	XENON_CP110,
> > +	XENON_AC5
> >  };
> >  
> >  struct xenon_priv {
> 

Regards,
Adrian Hunter Dec. 9, 2022, 11:53 a.m. UTC | #6
On 9/12/22 13:39, Vadym Kochan wrote:
> Hi Adrian,
> 
> On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 5/12/22 12:59, Vadym Kochan wrote:
>>> There is a limitation on AC5 SoC that mmc controller
>>> can't have DMA access over 2G memory, so use SDMA with
>>> a bounce buffer. Swiotlb can't help because on arm64 arch
>>> it reserves memblock's at the end of the memory.
>>>
>>> Additionally set mask to 34 bit since on AC5 SoC RAM starts
>>> at 0x2_00000000.
>>
>> Can you explain more about how a 34-bit DMA mask works when
>> SDMA only supports 32-bit addresses?
>>
> 
> So, after I set
> 
>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> 
> then sdhc core sets mask to 32 bit, but then dma_map fails to map
> bounce buffer because the base address is higher than 32bit - 0x2_00000000,
> and 34bit mask fixed it.

What happens if the bounce buffer gets mapped in the range
0x1_00000000 to 0x1_ffffffff ?

> 
>>>
>>> Co-developed-by: Elad Nachman <enachman@marvell.com>
>>> Signed-off-by: Elad Nachman <enachman@marvell.com>
>>> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
>>> ---
>>>  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
>>>  drivers/mmc/host/sdhci-xenon.h |  3 ++-
>>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
>>> index 08e838400b52..5f3db0425674 100644
>>> --- a/drivers/mmc/host/sdhci-xenon.c
>>> +++ b/drivers/mmc/host/sdhci-xenon.c
>>> @@ -13,7 +13,9 @@
>>>  
>>>  #include <linux/acpi.h>
>>>  #include <linux/delay.h>
>>> +#include <linux/dma-mapping.h>
>>>  #include <linux/ktime.h>
>>> +#include <linux/mm.h>
>>>  #include <linux/module.h>
>>>  #include <linux/of.h>
>>>  #include <linux/pm.h>
>>> @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
>>>  		return pltfm_host->clock;
>>>  }
>>>  
>>> +static int xenon_set_dma_mask(struct sdhci_host *host)
>>> +{
>>> +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
>>> +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
>>> +	struct mmc_host *mmc = host->mmc;
>>> +	struct device *dev = mmc_dev(mmc);
>>> +
>>> +	if (priv->hw_version == XENON_AC5) {
>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
>>> +
>>> +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
>>> +	}
>>> +
>>> +	return sdhci_set_dma_mask(host);
>>> +}
>>> +
>>>  static const struct sdhci_ops sdhci_xenon_ops = {
>>>  	.voltage_switch		= xenon_voltage_switch,
>>>  	.set_clock		= sdhci_set_clock,
>>> @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
>>>  	.reset			= xenon_reset,
>>>  	.set_uhs_signaling	= xenon_set_uhs_signaling,
>>>  	.get_max_clock		= xenon_get_max_clock,
>>> +	.set_dma_mask		= xenon_set_dma_mask,
>>>  };
>>>  
>>>  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
>>> @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
>>>  	xenon_disable_sdhc(host, sdhc_id);
>>>  }
>>>  
>>> +static int xenon_ac5_probe(struct sdhci_host *host)
>>> +{
>>> +	struct sysinfo si;
>>> +
>>> +	si_meminfo(&si);
>>> +
>>> +	if ((si.totalram * si.mem_unit) > SZ_2G)
>>> +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static int xenon_probe(struct platform_device *pdev)
>>>  {
>>>  	struct sdhci_pltfm_host *pltfm_host;
>>> @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
>>>  		}
>>>  	}
>>>  
>>> +	if (priv->hw_version == XENON_AC5) {
>>> +		err = xenon_ac5_probe(host);
>>> +		if (err)
>>> +			goto err_clk_axi;
>>> +	}
>>> +
>>>  	err = mmc_of_parse(host->mmc);
>>>  	if (err)
>>>  		goto err_clk_axi;
>>> @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
>>>  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
>>>  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
>>>  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
>>> +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
>>>  	{}
>>>  };
>>>  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
>>> diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
>>> index 3e9c6c908a79..0460d97aad26 100644
>>> --- a/drivers/mmc/host/sdhci-xenon.h
>>> +++ b/drivers/mmc/host/sdhci-xenon.h
>>> @@ -57,7 +57,8 @@ enum xenon_variant {
>>>  	XENON_A3700,
>>>  	XENON_AP806,
>>>  	XENON_AP807,
>>> -	XENON_CP110
>>> +	XENON_CP110,
>>> +	XENON_AC5
>>>  };
>>>  
>>>  struct xenon_priv {
>>
> 
> Regards,
Vadym Kochan Dec. 9, 2022, 12:10 p.m. UTC | #7
Hi Adrian,

On Fri, 9 Dec 2022 13:53:58 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 9/12/22 13:39, Vadym Kochan wrote:
> > Hi Adrian,
> > 
> > On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >> On 5/12/22 12:59, Vadym Kochan wrote:
> >>> There is a limitation on AC5 SoC that mmc controller
> >>> can't have DMA access over 2G memory, so use SDMA with
> >>> a bounce buffer. Swiotlb can't help because on arm64 arch
> >>> it reserves memblock's at the end of the memory.
> >>>
> >>> Additionally set mask to 34 bit since on AC5 SoC RAM starts
> >>> at 0x2_00000000.
> >>
> >> Can you explain more about how a 34-bit DMA mask works when
> >> SDMA only supports 32-bit addresses?
> >>
> > 
> > So, after I set
> > 
> >>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> > 
> > then sdhc core sets mask to 32 bit, but then dma_map fails to map
> > bounce buffer because the base address is higher than 32bit - 0x2_00000000,
> > and 34bit mask fixed it.
> 
> What happens if the bounce buffer gets mapped in the range
> 0x1_00000000 to 0x1_ffffffff ?
>
Adrian Hunter Dec. 9, 2022, 12:13 p.m. UTC | #8
On 9/12/22 14:10, Vadym Kochan wrote:
> Hi Adrian,
> 
> On Fri, 9 Dec 2022 13:53:58 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 9/12/22 13:39, Vadym Kochan wrote:
>>> Hi Adrian,
>>>
>>> On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>> On 5/12/22 12:59, Vadym Kochan wrote:
>>>>> There is a limitation on AC5 SoC that mmc controller
>>>>> can't have DMA access over 2G memory, so use SDMA with
>>>>> a bounce buffer. Swiotlb can't help because on arm64 arch
>>>>> it reserves memblock's at the end of the memory.
>>>>>
>>>>> Additionally set mask to 34 bit since on AC5 SoC RAM starts
>>>>> at 0x2_00000000.
>>>>
>>>> Can you explain more about how a 34-bit DMA mask works when
>>>> SDMA only supports 32-bit addresses?
>>>>
>>>
>>> So, after I set
>>>
>>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
>>>
>>> then sdhc core sets mask to 32 bit, but then dma_map fails to map
>>> bounce buffer because the base address is higher than 32bit - 0x2_00000000,
>>> and 34bit mask fixed it.
>>
>> What happens if the bounce buffer gets mapped in the range
>> 0x1_00000000 to 0x1_ffffffff ?
>>
> 
> From my understanding, on the AC5 SoC RAM starts at 0x2_00000000 so the bounce
> buffer can be mapped in the range 0x2_00000000..0x2_ffffffff

Right but I guess I meant what about 0x3_00000000..0x3_ffffffff ?
Isn't that also in DMA_BIT_MASK(34)

> 
>>>
>>>>>
>>>>> Co-developed-by: Elad Nachman <enachman@marvell.com>
>>>>> Signed-off-by: Elad Nachman <enachman@marvell.com>
>>>>> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
>>>>> ---
>>>>>  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
>>>>>  drivers/mmc/host/sdhci-xenon.h |  3 ++-
>>>>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
>>>>> index 08e838400b52..5f3db0425674 100644
>>>>> --- a/drivers/mmc/host/sdhci-xenon.c
>>>>> +++ b/drivers/mmc/host/sdhci-xenon.c
>>>>> @@ -13,7 +13,9 @@
>>>>>  
>>>>>  #include <linux/acpi.h>
>>>>>  #include <linux/delay.h>
>>>>> +#include <linux/dma-mapping.h>
>>>>>  #include <linux/ktime.h>
>>>>> +#include <linux/mm.h>
>>>>>  #include <linux/module.h>
>>>>>  #include <linux/of.h>
>>>>>  #include <linux/pm.h>
>>>>> @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
>>>>>  		return pltfm_host->clock;
>>>>>  }
>>>>>  
>>>>> +static int xenon_set_dma_mask(struct sdhci_host *host)
>>>>> +{
>>>>> +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
>>>>> +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
>>>>> +	struct mmc_host *mmc = host->mmc;
>>>>> +	struct device *dev = mmc_dev(mmc);
>>>>> +
>>>>> +	if (priv->hw_version == XENON_AC5) {
>>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
>>>>> +
>>>>> +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
>>>>> +	}
>>>>> +
>>>>> +	return sdhci_set_dma_mask(host);
>>>>> +}
>>>>> +
>>>>>  static const struct sdhci_ops sdhci_xenon_ops = {
>>>>>  	.voltage_switch		= xenon_voltage_switch,
>>>>>  	.set_clock		= sdhci_set_clock,
>>>>> @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
>>>>>  	.reset			= xenon_reset,
>>>>>  	.set_uhs_signaling	= xenon_set_uhs_signaling,
>>>>>  	.get_max_clock		= xenon_get_max_clock,
>>>>> +	.set_dma_mask		= xenon_set_dma_mask,
>>>>>  };
>>>>>  
>>>>>  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
>>>>> @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
>>>>>  	xenon_disable_sdhc(host, sdhc_id);
>>>>>  }
>>>>>  
>>>>> +static int xenon_ac5_probe(struct sdhci_host *host)
>>>>> +{
>>>>> +	struct sysinfo si;
>>>>> +
>>>>> +	si_meminfo(&si);
>>>>> +
>>>>> +	if ((si.totalram * si.mem_unit) > SZ_2G)
>>>>> +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>>  static int xenon_probe(struct platform_device *pdev)
>>>>>  {
>>>>>  	struct sdhci_pltfm_host *pltfm_host;
>>>>> @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
>>>>>  		}
>>>>>  	}
>>>>>  
>>>>> +	if (priv->hw_version == XENON_AC5) {
>>>>> +		err = xenon_ac5_probe(host);
>>>>> +		if (err)
>>>>> +			goto err_clk_axi;
>>>>> +	}
>>>>> +
>>>>>  	err = mmc_of_parse(host->mmc);
>>>>>  	if (err)
>>>>>  		goto err_clk_axi;
>>>>> @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
>>>>>  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
>>>>>  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
>>>>>  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
>>>>> +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
>>>>>  	{}
>>>>>  };
>>>>>  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
>>>>> diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
>>>>> index 3e9c6c908a79..0460d97aad26 100644
>>>>> --- a/drivers/mmc/host/sdhci-xenon.h
>>>>> +++ b/drivers/mmc/host/sdhci-xenon.h
>>>>> @@ -57,7 +57,8 @@ enum xenon_variant {
>>>>>  	XENON_A3700,
>>>>>  	XENON_AP806,
>>>>>  	XENON_AP807,
>>>>> -	XENON_CP110
>>>>> +	XENON_CP110,
>>>>> +	XENON_AC5
>>>>>  };
>>>>>  
>>>>>  struct xenon_priv {
>>>>
>>>
>>> Regards,
>>
Vadym Kochan Dec. 9, 2022, 1:27 p.m. UTC | #9
On Fri, 9 Dec 2022 14:13:06 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 9/12/22 14:10, Vadym Kochan wrote:
> > Hi Adrian,
> > 
> > On Fri, 9 Dec 2022 13:53:58 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >> On 9/12/22 13:39, Vadym Kochan wrote:
> >>> Hi Adrian,
> >>>
> >>> On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>>> On 5/12/22 12:59, Vadym Kochan wrote:
> >>>>> There is a limitation on AC5 SoC that mmc controller
> >>>>> can't have DMA access over 2G memory, so use SDMA with
> >>>>> a bounce buffer. Swiotlb can't help because on arm64 arch
> >>>>> it reserves memblock's at the end of the memory.
> >>>>>
> >>>>> Additionally set mask to 34 bit since on AC5 SoC RAM starts
> >>>>> at 0x2_00000000.
> >>>>
> >>>> Can you explain more about how a 34-bit DMA mask works when
> >>>> SDMA only supports 32-bit addresses?
> >>>>
> >>>
> >>> So, after I set
> >>>
> >>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> >>>
> >>> then sdhc core sets mask to 32 bit, but then dma_map fails to map
> >>> bounce buffer because the base address is higher than 32bit - 0x2_00000000,
> >>> and 34bit mask fixed it.
> >>
> >> What happens if the bounce buffer gets mapped in the range
> >> 0x1_00000000 to 0x1_ffffffff ?
> >>
> > 
> > From my understanding, on the AC5 SoC RAM starts at 0x2_00000000 so the bounce
> > buffer can be mapped in the range 0x2_00000000..0x2_ffffffff
> 
> Right but I guess I meant what about 0x3_00000000..0x3_ffffffff ?
> Isn't that also in DMA_BIT_MASK(34)

Yes, you are right.

> 
> > 
> >>>
> >>>>>
> >>>>> Co-developed-by: Elad Nachman <enachman@marvell.com>
> >>>>> Signed-off-by: Elad Nachman <enachman@marvell.com>
> >>>>> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
> >>>>> ---
> >>>>>  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
> >>>>>  drivers/mmc/host/sdhci-xenon.h |  3 ++-
> >>>>>  2 files changed, 40 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
> >>>>> index 08e838400b52..5f3db0425674 100644
> >>>>> --- a/drivers/mmc/host/sdhci-xenon.c
> >>>>> +++ b/drivers/mmc/host/sdhci-xenon.c
> >>>>> @@ -13,7 +13,9 @@
> >>>>>  
> >>>>>  #include <linux/acpi.h>
> >>>>>  #include <linux/delay.h>
> >>>>> +#include <linux/dma-mapping.h>
> >>>>>  #include <linux/ktime.h>
> >>>>> +#include <linux/mm.h>
> >>>>>  #include <linux/module.h>
> >>>>>  #include <linux/of.h>
> >>>>>  #include <linux/pm.h>
> >>>>> @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
> >>>>>  		return pltfm_host->clock;
> >>>>>  }
> >>>>>  
> >>>>> +static int xenon_set_dma_mask(struct sdhci_host *host)
> >>>>> +{
> >>>>> +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> >>>>> +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
> >>>>> +	struct mmc_host *mmc = host->mmc;
> >>>>> +	struct device *dev = mmc_dev(mmc);
> >>>>> +
> >>>>> +	if (priv->hw_version == XENON_AC5) {
> >>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> >>>>> +
> >>>>> +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
> >>>>> +	}
> >>>>> +
> >>>>> +	return sdhci_set_dma_mask(host);
> >>>>> +}
> >>>>> +
> >>>>>  static const struct sdhci_ops sdhci_xenon_ops = {
> >>>>>  	.voltage_switch		= xenon_voltage_switch,
> >>>>>  	.set_clock		= sdhci_set_clock,
> >>>>> @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
> >>>>>  	.reset			= xenon_reset,
> >>>>>  	.set_uhs_signaling	= xenon_set_uhs_signaling,
> >>>>>  	.get_max_clock		= xenon_get_max_clock,
> >>>>> +	.set_dma_mask		= xenon_set_dma_mask,
> >>>>>  };
> >>>>>  
> >>>>>  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
> >>>>> @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
> >>>>>  	xenon_disable_sdhc(host, sdhc_id);
> >>>>>  }
> >>>>>  
> >>>>> +static int xenon_ac5_probe(struct sdhci_host *host)
> >>>>> +{
> >>>>> +	struct sysinfo si;
> >>>>> +
> >>>>> +	si_meminfo(&si);
> >>>>> +
> >>>>> +	if ((si.totalram * si.mem_unit) > SZ_2G)
> >>>>> +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>>  static int xenon_probe(struct platform_device *pdev)
> >>>>>  {
> >>>>>  	struct sdhci_pltfm_host *pltfm_host;
> >>>>> @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
> >>>>>  		}
> >>>>>  	}
> >>>>>  
> >>>>> +	if (priv->hw_version == XENON_AC5) {
> >>>>> +		err = xenon_ac5_probe(host);
> >>>>> +		if (err)
> >>>>> +			goto err_clk_axi;
> >>>>> +	}
> >>>>> +
> >>>>>  	err = mmc_of_parse(host->mmc);
> >>>>>  	if (err)
> >>>>>  		goto err_clk_axi;
> >>>>> @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
> >>>>>  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
> >>>>>  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
> >>>>>  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
> >>>>> +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
> >>>>>  	{}
> >>>>>  };
> >>>>>  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
> >>>>> diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
> >>>>> index 3e9c6c908a79..0460d97aad26 100644
> >>>>> --- a/drivers/mmc/host/sdhci-xenon.h
> >>>>> +++ b/drivers/mmc/host/sdhci-xenon.h
> >>>>> @@ -57,7 +57,8 @@ enum xenon_variant {
> >>>>>  	XENON_A3700,
> >>>>>  	XENON_AP806,
> >>>>>  	XENON_AP807,
> >>>>> -	XENON_CP110
> >>>>> +	XENON_CP110,
> >>>>> +	XENON_AC5
> >>>>>  };
> >>>>>  
> >>>>>  struct xenon_priv {
> >>>>
> >>>
> >>> Regards,
> >>
>
Adrian Hunter Dec. 12, 2022, 8:42 a.m. UTC | #10
On 9/12/22 15:27, Vadym Kochan wrote:
> On Fri, 9 Dec 2022 14:13:06 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 9/12/22 14:10, Vadym Kochan wrote:
>>> Hi Adrian,
>>>
>>> On Fri, 9 Dec 2022 13:53:58 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>> On 9/12/22 13:39, Vadym Kochan wrote:
>>>>> Hi Adrian,
>>>>>
>>>>> On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>>>> On 5/12/22 12:59, Vadym Kochan wrote:
>>>>>>> There is a limitation on AC5 SoC that mmc controller
>>>>>>> can't have DMA access over 2G memory, so use SDMA with
>>>>>>> a bounce buffer. Swiotlb can't help because on arm64 arch
>>>>>>> it reserves memblock's at the end of the memory.
>>>>>>>
>>>>>>> Additionally set mask to 34 bit since on AC5 SoC RAM starts
>>>>>>> at 0x2_00000000.
>>>>>>
>>>>>> Can you explain more about how a 34-bit DMA mask works when
>>>>>> SDMA only supports 32-bit addresses?
>>>>>>
>>>>>
>>>>> So, after I set
>>>>>
>>>>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
>>>>>
>>>>> then sdhc core sets mask to 32 bit, but then dma_map fails to map
>>>>> bounce buffer because the base address is higher than 32bit - 0x2_00000000,
>>>>> and 34bit mask fixed it.
>>>>
>>>> What happens if the bounce buffer gets mapped in the range
>>>> 0x1_00000000 to 0x1_ffffffff ?
>>>>
>>>
>>> From my understanding, on the AC5 SoC RAM starts at 0x2_00000000 so the bounce
>>> buffer can be mapped in the range 0x2_00000000..0x2_ffffffff
>>
>> Right but I guess I meant what about 0x3_00000000..0x3_ffffffff ?
>> Isn't that also in DMA_BIT_MASK(34)
> 
> Yes, you are right.

So it would fail in that case?  Is it possible to use devicetree
reserved memory or some such, to set aside 64k for the bounce
buffer DMA mapping?

> 
>>
>>>
>>>>>
>>>>>>>
>>>>>>> Co-developed-by: Elad Nachman <enachman@marvell.com>
>>>>>>> Signed-off-by: Elad Nachman <enachman@marvell.com>
>>>>>>> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
>>>>>>> ---
>>>>>>>  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
>>>>>>>  drivers/mmc/host/sdhci-xenon.h |  3 ++-
>>>>>>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
>>>>>>> index 08e838400b52..5f3db0425674 100644
>>>>>>> --- a/drivers/mmc/host/sdhci-xenon.c
>>>>>>> +++ b/drivers/mmc/host/sdhci-xenon.c
>>>>>>> @@ -13,7 +13,9 @@
>>>>>>>  
>>>>>>>  #include <linux/acpi.h>
>>>>>>>  #include <linux/delay.h>
>>>>>>> +#include <linux/dma-mapping.h>
>>>>>>>  #include <linux/ktime.h>
>>>>>>> +#include <linux/mm.h>
>>>>>>>  #include <linux/module.h>
>>>>>>>  #include <linux/of.h>
>>>>>>>  #include <linux/pm.h>
>>>>>>> @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
>>>>>>>  		return pltfm_host->clock;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static int xenon_set_dma_mask(struct sdhci_host *host)
>>>>>>> +{
>>>>>>> +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
>>>>>>> +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
>>>>>>> +	struct mmc_host *mmc = host->mmc;
>>>>>>> +	struct device *dev = mmc_dev(mmc);
>>>>>>> +
>>>>>>> +	if (priv->hw_version == XENON_AC5) {
>>>>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
>>>>>>> +
>>>>>>> +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return sdhci_set_dma_mask(host);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static const struct sdhci_ops sdhci_xenon_ops = {
>>>>>>>  	.voltage_switch		= xenon_voltage_switch,
>>>>>>>  	.set_clock		= sdhci_set_clock,
>>>>>>> @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
>>>>>>>  	.reset			= xenon_reset,
>>>>>>>  	.set_uhs_signaling	= xenon_set_uhs_signaling,
>>>>>>>  	.get_max_clock		= xenon_get_max_clock,
>>>>>>> +	.set_dma_mask		= xenon_set_dma_mask,
>>>>>>>  };
>>>>>>>  
>>>>>>>  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
>>>>>>> @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
>>>>>>>  	xenon_disable_sdhc(host, sdhc_id);
>>>>>>>  }
>>>>>>>  
>>>>>>> +static int xenon_ac5_probe(struct sdhci_host *host)
>>>>>>> +{
>>>>>>> +	struct sysinfo si;
>>>>>>> +
>>>>>>> +	si_meminfo(&si);
>>>>>>> +
>>>>>>> +	if ((si.totalram * si.mem_unit) > SZ_2G)
>>>>>>> +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static int xenon_probe(struct platform_device *pdev)
>>>>>>>  {
>>>>>>>  	struct sdhci_pltfm_host *pltfm_host;
>>>>>>> @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
>>>>>>>  		}
>>>>>>>  	}
>>>>>>>  
>>>>>>> +	if (priv->hw_version == XENON_AC5) {
>>>>>>> +		err = xenon_ac5_probe(host);
>>>>>>> +		if (err)
>>>>>>> +			goto err_clk_axi;
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	err = mmc_of_parse(host->mmc);
>>>>>>>  	if (err)
>>>>>>>  		goto err_clk_axi;
>>>>>>> @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
>>>>>>>  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
>>>>>>>  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
>>>>>>>  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
>>>>>>> +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
>>>>>>>  	{}
>>>>>>>  };
>>>>>>>  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
>>>>>>> diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
>>>>>>> index 3e9c6c908a79..0460d97aad26 100644
>>>>>>> --- a/drivers/mmc/host/sdhci-xenon.h
>>>>>>> +++ b/drivers/mmc/host/sdhci-xenon.h
>>>>>>> @@ -57,7 +57,8 @@ enum xenon_variant {
>>>>>>>  	XENON_A3700,
>>>>>>>  	XENON_AP806,
>>>>>>>  	XENON_AP807,
>>>>>>> -	XENON_CP110
>>>>>>> +	XENON_CP110,
>>>>>>> +	XENON_AC5
>>>>>>>  };
>>>>>>>  
>>>>>>>  struct xenon_priv {
>>>>>>
>>>>>
>>>>> Regards,
>>>>
>>
Vadym Kochan Dec. 12, 2022, 11:34 a.m. UTC | #11
Hi Adrian,

On Mon, 12 Dec 2022 10:42:36 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 9/12/22 15:27, Vadym Kochan wrote:
> > On Fri, 9 Dec 2022 14:13:06 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >> On 9/12/22 14:10, Vadym Kochan wrote:
> >>> Hi Adrian,
> >>>
> >>> On Fri, 9 Dec 2022 13:53:58 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>>> On 9/12/22 13:39, Vadym Kochan wrote:
> >>>>> Hi Adrian,
> >>>>>
> >>>>> On Fri, 9 Dec 2022 09:23:05 +0200, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>>>>> On 5/12/22 12:59, Vadym Kochan wrote:
> >>>>>>> There is a limitation on AC5 SoC that mmc controller
> >>>>>>> can't have DMA access over 2G memory, so use SDMA with
> >>>>>>> a bounce buffer. Swiotlb can't help because on arm64 arch
> >>>>>>> it reserves memblock's at the end of the memory.
> >>>>>>>
> >>>>>>> Additionally set mask to 34 bit since on AC5 SoC RAM starts
> >>>>>>> at 0x2_00000000.
> >>>>>>
> >>>>>> Can you explain more about how a 34-bit DMA mask works when
> >>>>>> SDMA only supports 32-bit addresses?
> >>>>>>
> >>>>>
> >>>>> So, after I set
> >>>>>
> >>>>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> >>>>>
> >>>>> then sdhc core sets mask to 32 bit, but then dma_map fails to map
> >>>>> bounce buffer because the base address is higher than 32bit - 0x2_00000000,
> >>>>> and 34bit mask fixed it.
> >>>>
> >>>> What happens if the bounce buffer gets mapped in the range
> >>>> 0x1_00000000 to 0x1_ffffffff ?
> >>>>
> >>>
> >>> From my understanding, on the AC5 SoC RAM starts at 0x2_00000000 so the bounce
> >>> buffer can be mapped in the range 0x2_00000000..0x2_ffffffff
> >>
> >> Right but I guess I meant what about 0x3_00000000..0x3_ffffffff ?
> >> Isn't that also in DMA_BIT_MASK(34)
> > 
> > Yes, you are right.
> 
> So it would fail in that case?  Is it possible to use devicetree
> reserved memory or some such, to set aside 64k for the bounce
> buffer DMA mapping?
> 

The main restriction is that only lower 2GB can be used for DMA.

I already did send solution based on reserved memory, I can send it again in context of this series.
Also what about the solution which Linus suggested ?

[cut]

Let's just create a new quirk:

SDHCI_QUIRK_31BIT_DMA_ROOF

Define the semantics such that this will allow DMA for buffers that are below
the 31st bit, but does not have the semantics to limit scatter-gather buffers to
be 32-bit aligned.

[/cut]

Thanks,

> > 
> >>
> >>>
> >>>>>
> >>>>>>>
> >>>>>>> Co-developed-by: Elad Nachman <enachman@marvell.com>
> >>>>>>> Signed-off-by: Elad Nachman <enachman@marvell.com>
> >>>>>>> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
> >>>>>>> ---
> >>>>>>>  drivers/mmc/host/sdhci-xenon.c | 38 ++++++++++++++++++++++++++++++++++
> >>>>>>>  drivers/mmc/host/sdhci-xenon.h |  3 ++-
> >>>>>>>  2 files changed, 40 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
> >>>>>>> index 08e838400b52..5f3db0425674 100644
> >>>>>>> --- a/drivers/mmc/host/sdhci-xenon.c
> >>>>>>> +++ b/drivers/mmc/host/sdhci-xenon.c
> >>>>>>> @@ -13,7 +13,9 @@
> >>>>>>>  
> >>>>>>>  #include <linux/acpi.h>
> >>>>>>>  #include <linux/delay.h>
> >>>>>>> +#include <linux/dma-mapping.h>
> >>>>>>>  #include <linux/ktime.h>
> >>>>>>> +#include <linux/mm.h>
> >>>>>>>  #include <linux/module.h>
> >>>>>>>  #include <linux/of.h>
> >>>>>>>  #include <linux/pm.h>
> >>>>>>> @@ -253,6 +255,22 @@ static unsigned int xenon_get_max_clock(struct sdhci_host *host)
> >>>>>>>  		return pltfm_host->clock;
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +static int xenon_set_dma_mask(struct sdhci_host *host)
> >>>>>>> +{
> >>>>>>> +	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> >>>>>>> +	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
> >>>>>>> +	struct mmc_host *mmc = host->mmc;
> >>>>>>> +	struct device *dev = mmc_dev(mmc);
> >>>>>>> +
> >>>>>>> +	if (priv->hw_version == XENON_AC5) {
> >>>>>>> +		host->flags &= ~SDHCI_USE_64_BIT_DMA;
> >>>>>>> +
> >>>>>>> +		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	return sdhci_set_dma_mask(host);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static const struct sdhci_ops sdhci_xenon_ops = {
> >>>>>>>  	.voltage_switch		= xenon_voltage_switch,
> >>>>>>>  	.set_clock		= sdhci_set_clock,
> >>>>>>> @@ -261,6 +279,7 @@ static const struct sdhci_ops sdhci_xenon_ops = {
> >>>>>>>  	.reset			= xenon_reset,
> >>>>>>>  	.set_uhs_signaling	= xenon_set_uhs_signaling,
> >>>>>>>  	.get_max_clock		= xenon_get_max_clock,
> >>>>>>> +	.set_dma_mask		= xenon_set_dma_mask,
> >>>>>>>  };
> >>>>>>>  
> >>>>>>>  static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
> >>>>>>> @@ -486,6 +505,18 @@ static void xenon_sdhc_unprepare(struct sdhci_host *host)
> >>>>>>>  	xenon_disable_sdhc(host, sdhc_id);
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +static int xenon_ac5_probe(struct sdhci_host *host)
> >>>>>>> +{
> >>>>>>> +	struct sysinfo si;
> >>>>>>> +
> >>>>>>> +	si_meminfo(&si);
> >>>>>>> +
> >>>>>>> +	if ((si.totalram * si.mem_unit) > SZ_2G)
> >>>>>>> +		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
> >>>>>>> +
> >>>>>>> +	return 0;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static int xenon_probe(struct platform_device *pdev)
> >>>>>>>  {
> >>>>>>>  	struct sdhci_pltfm_host *pltfm_host;
> >>>>>>> @@ -533,6 +564,12 @@ static int xenon_probe(struct platform_device *pdev)
> >>>>>>>  		}
> >>>>>>>  	}
> >>>>>>>  
> >>>>>>> +	if (priv->hw_version == XENON_AC5) {
> >>>>>>> +		err = xenon_ac5_probe(host);
> >>>>>>> +		if (err)
> >>>>>>> +			goto err_clk_axi;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>>  	err = mmc_of_parse(host->mmc);
> >>>>>>>  	if (err)
> >>>>>>>  		goto err_clk_axi;
> >>>>>>> @@ -682,6 +719,7 @@ static const struct of_device_id sdhci_xenon_dt_ids[] = {
> >>>>>>>  	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
> >>>>>>>  	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
> >>>>>>>  	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
> >>>>>>> +	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
> >>>>>>>  	{}
> >>>>>>>  };
> >>>>>>>  MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
> >>>>>>> diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
> >>>>>>> index 3e9c6c908a79..0460d97aad26 100644
> >>>>>>> --- a/drivers/mmc/host/sdhci-xenon.h
> >>>>>>> +++ b/drivers/mmc/host/sdhci-xenon.h
> >>>>>>> @@ -57,7 +57,8 @@ enum xenon_variant {
> >>>>>>>  	XENON_A3700,
> >>>>>>>  	XENON_AP806,
> >>>>>>>  	XENON_AP807,
> >>>>>>> -	XENON_CP110
> >>>>>>> +	XENON_CP110,
> >>>>>>> +	XENON_AC5
> >>>>>>>  };
> >>>>>>>  
> >>>>>>>  struct xenon_priv {
> >>>>>>
> >>>>>
> >>>>> Regards,
> >>>>
> >>
>
Linus Walleij Dec. 13, 2022, 9:16 a.m. UTC | #12
On Mon, Dec 12, 2022 at 9:43 AM Adrian Hunter <adrian.hunter@intel.com> wrote:

> >> Right but I guess I meant what about 0x3_00000000..0x3_ffffffff ?
> >> Isn't that also in DMA_BIT_MASK(34)
> >
> > Yes, you are right.
>
> So it would fail in that case?  Is it possible to use devicetree
> reserved memory or some such, to set aside 64k for the bounce
> buffer DMA mapping?

Yups spot on, reserved-memory can be used along with the kernel
CONFIG_DMA_CMA to achieve exactly this:
Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
kernel/dma/Kconfig

Yours,
Linus Walleij
Vadym Kochan Dec. 13, 2022, 9:21 a.m. UTC | #13
Hi Linus, Adrian,

On Tue, 13 Dec 2022 10:16:57 +0100, Linus Walleij <linus.walleij@linaro.org> wrote:
> On Mon, Dec 12, 2022 at 9:43 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
> > >> Right but I guess I meant what about 0x3_00000000..0x3_ffffffff ?
> > >> Isn't that also in DMA_BIT_MASK(34)
> > >
> > > Yes, you are right.
> >
> > So it would fail in that case?  Is it possible to use devicetree
> > reserved memory or some such, to set aside 64k for the bounce
> > buffer DMA mapping?
> 
> Yups spot on, reserved-memory can be used along with the kernel
> CONFIG_DMA_CMA to achieve exactly this:
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
> kernel/dma/Kconfig
> 
> Yours,
> Linus Walleij

Just in case, here is an old series with reserved-memory solution:
https://lore.kernel.org/lkml/20220806085818.9873-4-vadym.kochan@plvision.eu/T/

But, what about to start with PIO solution (which is conceptually easier solution) with
checking on ram size and then develop better one ?

Thanks,
Linus Walleij Dec. 13, 2022, 9:22 a.m. UTC | #14
On Mon, Dec 12, 2022 at 12:40 PM Vadym Kochan <vadym.kochan@plvision.eu> wrote:

> The main restriction is that only lower 2GB can be used for DMA.
>
> I already did send solution based on reserved memory, I can send it again in context of this series.
> Also what about the solution which Linus suggested ?
>
> [cut]
>
> Let's just create a new quirk:
>
> SDHCI_QUIRK_31BIT_DMA_ROOF
>
> Define the semantics such that this will allow DMA for buffers that are below
> the 31st bit, but does not have the semantics to limit scatter-gather buffers to
> be 32-bit aligned.
>
> [/cut]

One does not exclude the other, so you could technically let buffers below
2^31 pass directly to the DMA engine, but bounce any request above that
limit to a low memory bounce buffer.

As Adrian points out there is also the code complexity question, the solution
should be simple and elegant, if possible. I think always using a bounce
buffer might be both nice and efficient.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
index 08e838400b52..5f3db0425674 100644
--- a/drivers/mmc/host/sdhci-xenon.c
+++ b/drivers/mmc/host/sdhci-xenon.c
@@ -13,7 +13,9 @@ 
 
 #include <linux/acpi.h>
 #include <linux/delay.h>
+#include <linux/dma-mapping.h>
 #include <linux/ktime.h>
+#include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/pm.h>
@@ -253,6 +255,22 @@  static unsigned int xenon_get_max_clock(struct sdhci_host *host)
 		return pltfm_host->clock;
 }
 
+static int xenon_set_dma_mask(struct sdhci_host *host)
+{
+	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+	struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
+	struct mmc_host *mmc = host->mmc;
+	struct device *dev = mmc_dev(mmc);
+
+	if (priv->hw_version == XENON_AC5) {
+		host->flags &= ~SDHCI_USE_64_BIT_DMA;
+
+		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
+	}
+
+	return sdhci_set_dma_mask(host);
+}
+
 static const struct sdhci_ops sdhci_xenon_ops = {
 	.voltage_switch		= xenon_voltage_switch,
 	.set_clock		= sdhci_set_clock,
@@ -261,6 +279,7 @@  static const struct sdhci_ops sdhci_xenon_ops = {
 	.reset			= xenon_reset,
 	.set_uhs_signaling	= xenon_set_uhs_signaling,
 	.get_max_clock		= xenon_get_max_clock,
+	.set_dma_mask		= xenon_set_dma_mask,
 };
 
 static const struct sdhci_pltfm_data sdhci_xenon_pdata = {
@@ -486,6 +505,18 @@  static void xenon_sdhc_unprepare(struct sdhci_host *host)
 	xenon_disable_sdhc(host, sdhc_id);
 }
 
+static int xenon_ac5_probe(struct sdhci_host *host)
+{
+	struct sysinfo si;
+
+	si_meminfo(&si);
+
+	if ((si.totalram * si.mem_unit) > SZ_2G)
+		host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
+
+	return 0;
+}
+
 static int xenon_probe(struct platform_device *pdev)
 {
 	struct sdhci_pltfm_host *pltfm_host;
@@ -533,6 +564,12 @@  static int xenon_probe(struct platform_device *pdev)
 		}
 	}
 
+	if (priv->hw_version == XENON_AC5) {
+		err = xenon_ac5_probe(host);
+		if (err)
+			goto err_clk_axi;
+	}
+
 	err = mmc_of_parse(host->mmc);
 	if (err)
 		goto err_clk_axi;
@@ -682,6 +719,7 @@  static const struct of_device_id sdhci_xenon_dt_ids[] = {
 	{ .compatible = "marvell,armada-ap807-sdhci", .data = (void *)XENON_AP807},
 	{ .compatible = "marvell,armada-cp110-sdhci", .data =  (void *)XENON_CP110},
 	{ .compatible = "marvell,armada-3700-sdhci", .data =  (void *)XENON_A3700},
+	{ .compatible = "marvell,ac5-sdhci", .data = (void *)XENON_AC5},
 	{}
 };
 MODULE_DEVICE_TABLE(of, sdhci_xenon_dt_ids);
diff --git a/drivers/mmc/host/sdhci-xenon.h b/drivers/mmc/host/sdhci-xenon.h
index 3e9c6c908a79..0460d97aad26 100644
--- a/drivers/mmc/host/sdhci-xenon.h
+++ b/drivers/mmc/host/sdhci-xenon.h
@@ -57,7 +57,8 @@  enum xenon_variant {
 	XENON_A3700,
 	XENON_AP806,
 	XENON_AP807,
-	XENON_CP110
+	XENON_CP110,
+	XENON_AC5
 };
 
 struct xenon_priv {