
nvme: Invalidate dcache before submitting admin cmd

Message ID 20200607112226.76561-1-jagan@amarulasolutions.com
State Accepted
Commit 1a027a90aaa65ea429a55035f0316eadd0d83180
Series: nvme: Invalidate dcache before submitting admin cmd

Commit Message

Jagan Teki June 7, 2020, 11:22 a.m. UTC
Some CPU implementations, like the ARM Cortex-A53 and A72, need the
dcache invalidated to sync the cache with the memory contents before
the cache is flushed to memory.

The NVMe driver here submits the admin command using dma_addr to memory
without prior cache invalidation. This causes dma_addr to point to an
invalid location in memory, and we read back an insane nvme_ctrl
result.

The example below shows the improper nvme disk scan result:

=> nvme scan
nvme_get_info_from_identify: nn = 544502629, vwc = 100,
sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105

So, invalidating the cache before submitting the admin command makes
dma_addr point to a valid location in memory.
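
For context, with this patch the maintenance order in nvme_identify()
looks roughly like the sketch below (condensed from the diff at the
bottom of this page; dev, c and dma_addr are the driver's existing
locals):

	c.identify.cns = cpu_to_le32(cns);

	/* New: drop any dirty lines covering the DMA buffer *before*
	 * the device starts writing it, so nothing stale can be
	 * evicted on top of the DMA'd data afterwards. */
	invalidate_dcache_range(dma_addr,
				dma_addr + sizeof(struct nvme_id_ctrl));

	ret = nvme_submit_admin_cmd(dev, &c, NULL);
	if (!ret)
		/* Existing: discard (possibly speculatively) filled
		 * lines so the CPU reads fresh identify data from
		 * DRAM. */
		invalidate_dcache_range(dma_addr,
					dma_addr + sizeof(struct nvme_id_ctrl));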

Cc: Andre Przywara <andre.przywara at arm.com> 
Reported-by: Suniel Mahesh <sunil at amarulasolutions.com>
Signed-off-by: Michael Trimarchi <michael at amarulasolutions.com>
Signed-off-by: Jagan Teki <jagan at amarulasolutions.com>
Tested-by: Suniel Mahesh <sunil at amarulasolutions.com>
---
 drivers/nvme/nvme.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Bin Meng June 8, 2020, 1:18 p.m. UTC | #1
On Sun, Jun 7, 2020 at 7:22 PM Jagan Teki <jagan at amarulasolutions.com> wrote:
>
> Some architecture like ARM Cortex A53, A72 would need
> to invalidate dcache to sync the cache with the memory
> contents before flushing the cache to memory.
>
> The NVME here submitting the admin command using dma_addr
> to the memory without prior cache invalidation. This causing
> dma_addr is pointing to an invalid location in the memory
> and found the sane nvme_ctrl result.

insane?

>
> Below example shows the nvme disk scan result improper result
>
> => nvme scan
> nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
>
> So, invalidating the cache before submitting the admin command
> makes the dma_addr points to a valid location in the memory.
>
> Cc: Andre Przywara <andre.przywara at arm.com>
> Reported-by: Suniel Mahesh <sunil at amarulasolutions.com>
> Signed-off-by: Michael Trimarchi <michael at amarulasolutions.com>
> Signed-off-by: Jagan Teki <jagan at amarulasolutions.com>
> Tested-by: Suniel Mahesh <sunil at amarulasolutions.com>
> ---
>  drivers/nvme/nvme.c | 3 +++
>  1 file changed, 3 insertions(+)
>

Reviewed-by: Bin Meng <bmeng.cn at gmail.com>
Andre Przywara June 8, 2020, 1:51 p.m. UTC | #2
On 07/06/2020 12:22, Jagan Teki wrote:

Hi,

(CC'ing Mark)

Without looking too deep, I think invalidating the cache might be the
right thing to do, but the rationale, or at least the wording of it,
seems somehow flawed:

> Some architecture like ARM Cortex A53, A72 would need

Please don't mix the term "architecture" with actual implementations.
And the problem is more general, I would say. This *might* be a case
where the issue is due to speculation, and then I would expect it to
only show up on an A72, for instance. I guess this is about NVMe on
RK3399? Does U-Boot run on an A53 or an A72 core?

> to invalidate dcache to sync the cache with the memory
> contents before flushing the cache to memory.

That is somewhat confusing. What does "sync" and "flush" mean here?
AFAIK only "clean" and "invalidate" are clearly defined terms.
The command buffer is cleaned to DRAM in nvme_submit_cmd(), so this is
purely about the buffer for the *return* data, in which case I would
expect this being a pure invalidation problem.
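
To make the terminology concrete, a minimal sketch of which operation
serves which direction (sq_addr/cmd_len and buf_addr/len are made-up
names here, not the driver's; the two cache calls are U-Boot's real
API):

	/* CPU -> device: "clean" writes dirty lines back to DRAM
	 * before the device reads the command buffer. U-Boot's
	 * flush_dcache_range() performs a clean (and invalidate). */
	flush_dcache_range(sq_addr, sq_addr + cmd_len);

	/* device -> CPU: "invalidate" drops the lines so the next CPU
	 * read fetches the DMA'd return data from DRAM. */
	invalidate_dcache_range(buf_addr, buf_addr + len);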

This issue looks like the case described on slide 30 and 31 here:
https://elinux.org/images/6/6d/Rutland2.pdf

> The NVME here submitting the admin command using dma_addr
> to the memory without prior cache invalidation. This causing
> dma_addr is pointing to an invalid location in the memory

This does not sound right: dma_addr is always pointing to the right
location in memory.
The actual reason this fixes something might be that (some) cache lines
of the DMA buffer were dirty and were evicted *before* the existing
invalidation, but *after* the DMA transfer. That sounds unlikely in a
U-Boot context, though it would match the case described in Mark's slides.

So there might be more to this. Are we missing a barrier here? Should we
use coherent buffers (memory mapped as un-cached) for the return DMA in
the first place (but I don't think U-Boot provides those)?
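
For reference, the eviction hazard described above would unfold like
this (an assumed timeline, not driver code):

	/*   CPU                              NVMe controller
	 *   -------------------------------  --------------------------
	 *   old write leaves dirty lines
	 *   over the return buffer
	 *   submit admin cmd  -------------> DMA fills the buffer in DRAM
	 *   dirty line evicted: stale bytes
	 *   overwrite the DMA'd data
	 *   invalidate_dcache_range(...)     (too late, DRAM already stale)
	 *   CPU reads garbage
	 */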

> and found the sane nvme_ctrl result.
> 
> Below example shows the nvme disk scan result improper result
> 
> => nvme scan
> nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
> 
> So, invalidating the cache before submitting the admin command
> makes the dma_addr points to a valid location in the memory.

If this shows up already, I think we should address all the comments in
the driver where it says we should do cache maintenance. And it makes me
wonder how this actually works today ...

Cheers,
Andre.

> 
> Cc: Andre Przywara <andre.przywara at arm.com> 
> Reported-by: Suniel Mahesh <sunil at amarulasolutions.com>
> Signed-off-by: Michael Trimarchi <michael at amarulasolutions.com>
> Signed-off-by: Jagan Teki <jagan at amarulasolutions.com>
> Tested-by: Suniel Mahesh <sunil at amarulasolutions.com>
> ---
>  drivers/nvme/nvme.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/nvme/nvme.c b/drivers/nvme/nvme.c
> index 0357aba7f1..fc64d93ab8 100644
> --- a/drivers/nvme/nvme.c
> +++ b/drivers/nvme/nvme.c
> @@ -466,6 +466,9 @@ int nvme_identify(struct nvme_dev *dev, unsigned nsid,
>  
>  	c.identify.cns = cpu_to_le32(cns);
>  
> +	invalidate_dcache_range(dma_addr,
> +				dma_addr + sizeof(struct nvme_id_ctrl));
> +
>  	ret = nvme_submit_admin_cmd(dev, &c, NULL);
>  	if (!ret)
>  		invalidate_dcache_range(dma_addr,
>
Michael Trimarchi June 8, 2020, 1:59 p.m. UTC | #3
Hi

On Mon, Jun 8, 2020 at 3:52 PM André Przywara <andre.przywara at arm.com> wrote:
>
> On 07/06/2020 12:22, Jagan Teki wrote:
>
> Hi,
>
> (CC: ing Mark)
>
> Without looking to deep, I think invalidating the cache might be the
> right thing to do, but the rationale or at least the wording of it seems
> somehow flawed:
>
> > Some architecture like ARM Cortex A53, A72 would need
>
> Please don't mix the term "architecture" with actual implementations.
> And the problem is more general, I would say. This *might* be a case
> here where this is due to speculation, and then I would expect it to
> only show on an A72, for instance. I guess this is about NVMe on RK3399?
> Does U-Boot run on an A53 or an A72 core?
>
> > to invalidate dcache to sync the cache with the memory
> > contents before flushing the cache to memory.
>

We will adjust the commit message. The idea is that when we invalidate
after the DMA transfer, a cache flush can happen as part of the
invalidation itself.
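
For what it's worth, a sketch of that effect, assuming ARMv8 semantics
(where the architecture permits an invalidate to be performed as a
clean+invalidate):

	/* 1. a line over the buffer is dirty (old CPU write)
	 * 2. DMA writes fresh data to DRAM
	 * 3. invalidate_dcache_range() may act as clean+invalidate,
	 *    writing the stale dirty line back over the DMA'd data
	 *
	 * Invalidating *before* step 2 leaves no dirty lines to clean. */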

> That is somewhat confusing. What does "sync" and "flush" mean here?
> AFAIK only "clean" and "invalidate" are clearly defined terms.
> The command buffer is cleaned to DRAM in nvme_submit_cmd(), so this is
> purely about the buffer for the *return* data, in which case I would
> expect this being a pure invalidation problem.
>

Correct

> This issue looks like the case described on slide 30 and 31 here:
> https://elinux.org/images/6/6d/Rutland2.pdf
>

Thank you for pointing out the slides.

> > The NVME here submitting the admin command using dma_addr
> > to the memory without prior cache invalidation. This causing
> > dma_addr is pointing to an invalid location in the memory
>
> This does not sound right: dma_addr is always pointing to the right
> location in memory.
> The actual reason this fixes something might be that (some) cache lines
> of the DMA buffer were dirty and were evicted *before* the existing
> invalidation, but *after* the DMA transfer. That sounds unlikely in an
> U-Boot context, though, but would match the case described in Mark's slides.

Does it not depend on how invalidation works on the particular CPU?

>
> So there might be more to this. Are we missing a barrier here? Should we
> use coherent buffers (memory mapped as un-cached) for the return DMA in
> the first place (but I don't think U-Boot provides those)?

Right, I can't find a better way, and yes, this is a general problem in U-Boot.

Michael


>
> > and found the sane nvme_ctrl result.
> >
> > Below example shows the nvme disk scan result improper result
> >
> > => nvme scan
> > nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> > sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
> >
> > So, invalidating the cache before submitting the admin command
> > makes the dma_addr points to a valid location in the memory.
>
> If this shows up already, I think we should address all the comments in
> the driver where it says we should do cache maintenance. And it makes me
> wonder how this actually works today ...
>
> Cheers,
> Andre.
>
> >
> > Cc: Andre Przywara <andre.przywara at arm.com>
> > Reported-by: Suniel Mahesh <sunil at amarulasolutions.com>
> > Signed-off-by: Michael Trimarchi <michael at amarulasolutions.com>
> > Signed-off-by: Jagan Teki <jagan at amarulasolutions.com>
> > Tested-by: Suniel Mahesh <sunil at amarulasolutions.com>
> > ---
> >  drivers/nvme/nvme.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/nvme/nvme.c b/drivers/nvme/nvme.c
> > index 0357aba7f1..fc64d93ab8 100644
> > --- a/drivers/nvme/nvme.c
> > +++ b/drivers/nvme/nvme.c
> > @@ -466,6 +466,9 @@ int nvme_identify(struct nvme_dev *dev, unsigned nsid,
> >
> >       c.identify.cns = cpu_to_le32(cns);
> >
> > +     invalidate_dcache_range(dma_addr,
> > +                             dma_addr + sizeof(struct nvme_id_ctrl));
> > +
> >       ret = nvme_submit_admin_cmd(dev, &c, NULL);
> >       if (!ret)
> >               invalidate_dcache_range(dma_addr,
> >
>
Bin Meng June 9, 2020, 4:13 a.m. UTC | #4
Hi André,

On Mon, Jun 8, 2020 at 9:52 PM André Przywara <andre.przywara at arm.com> wrote:
>
> On 07/06/2020 12:22, Jagan Teki wrote:
>
> Hi,
>
> (CC: ing Mark)
>
> Without looking to deep, I think invalidating the cache might be the
> right thing to do, but the rationale or at least the wording of it seems
> somehow flawed:
>
> > Some architecture like ARM Cortex A53, A72 would need
>
> Please don't mix the term "architecture" with actual implementations.
> And the problem is more general, I would say. This *might* be a case
> here where this is due to speculation, and then I would expect it to
> only show on an A72, for instance. I guess this is about NVMe on RK3399?
> Does U-Boot run on an A53 or an A72 core?
>
> > to invalidate dcache to sync the cache with the memory
> > contents before flushing the cache to memory.
>
> That is somewhat confusing. What does "sync" and "flush" mean here?
> AFAIK only "clean" and "invalidate" are clearly defined terms.
> The command buffer is cleaned to DRAM in nvme_submit_cmd(), so this is
> purely about the buffer for the *return* data, in which case I would
> expect this being a pure invalidation problem.
>
> This issue looks like the case described on slide 30 and 31 here:
> https://elinux.org/images/6/6d/Rutland2.pdf

Thanks for the slides. This is really helpful.

>
> > The NVME here submitting the admin command using dma_addr
> > to the memory without prior cache invalidation. This causing
> > dma_addr is pointing to an invalid location in the memory
>
> This does not sound right: dma_addr is always pointing to the right
> location in memory.
> The actual reason this fixes something might be that (some) cache lines
> of the DMA buffer were dirty and were evicted *before* the existing
> invalidation, but *after* the DMA transfer. That sounds unlikely in an
> U-Boot context, though, but would match the case described in Mark's slides.
>
> So there might be more to this. Are we missing a barrier here? Should we
> use coherent buffers (memory mapped as un-cached) for the return DMA in
> the first place (but I don't think U-Boot provides those)?
>
> > and found the sane nvme_ctrl result.
> >
> > Below example shows the nvme disk scan result improper result
> >
> > => nvme scan
> > nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> > sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
> >
> > So, invalidating the cache before submitting the admin command
> > makes the dma_addr points to a valid location in the memory.
>
> If this shows up already, I think we should address all the comments in
> the driver where it says we should do cache maintenance. And it makes me
> wonder how this actually works today ...
>

I believe the driver was only validated on x86 and on NXP's PowerPC
series, where cache coherence is ensured between the DMA master and the
system memory. I suspect it was never validated on ARM, hence this
patch.

I agree we should address all the comments in the driver about cache
maintenance.

Regards,
Bin
Michael Trimarchi June 9, 2020, 6:47 p.m. UTC | #5
Hi all

On Tue, Jun 9, 2020 at 6:13 AM Bin Meng <bmeng.cn at gmail.com> wrote:
>
> Hi André,
>
> On Mon, Jun 8, 2020 at 9:52 PM André Przywara <andre.przywara at arm.com> wrote:
> >
> > On 07/06/2020 12:22, Jagan Teki wrote:
> >
> > Hi,
> >
> > (CC: ing Mark)
> >
> > Without looking to deep, I think invalidating the cache might be the
> > right thing to do, but the rationale or at least the wording of it seems
> > somehow flawed:
> >
> > > Some architecture like ARM Cortex A53, A72 would need
> >
> > Please don't mix the term "architecture" with actual implementations.
> > And the problem is more general, I would say. This *might* be a case
> > here where this is due to speculation, and then I would expect it to
> > only show on an A72, for instance. I guess this is about NVMe on RK3399?
> > Does U-Boot run on an A53 or an A72 core?
> >
> > > to invalidate dcache to sync the cache with the memory
> > > contents before flushing the cache to memory.
> >
> > That is somewhat confusing. What does "sync" and "flush" mean here?
> > AFAIK only "clean" and "invalidate" are clearly defined terms.
> > The command buffer is cleaned to DRAM in nvme_submit_cmd(), so this is
> > purely about the buffer for the *return* data, in which case I would
> > expect this being a pure invalidation problem.
> >
> > This issue looks like the case described on slide 30 and 31 here:
> > https://elinux.org/images/6/6d/Rutland2.pdf
>
> Thanks for the slides. This is really helpful.
>
> >
> > > The NVME here submitting the admin command using dma_addr
> > > to the memory without prior cache invalidation. This causing
> > > dma_addr is pointing to an invalid location in the memory
> >
> > This does not sound right: dma_addr is always pointing to the right
> > location in memory.
> > The actual reason this fixes something might be that (some) cache lines
> > of the DMA buffer were dirty and were evicted *before* the existing
> > invalidation, but *after* the DMA transfer. That sounds unlikely in an
> > U-Boot context, though, but would match the case described in Mark's slides.
> >
> > So there might be more to this. Are we missing a barrier here? Should we
> > use coherent buffers (memory mapped as un-cached) for the return DMA in
> > the first place (but I don't think U-Boot provides those)?
> >
> > > and found the sane nvme_ctrl result.
> > >
> > > Below example shows the nvme disk scan result improper result
> > >
> > > => nvme scan
> > > nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> > > sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
> > >
> > > So, invalidating the cache before submitting the admin command
> > > makes the dma_addr points to a valid location in the memory.
> >
> > If this shows up already, I think we should address all the comments in
> > the driver where it says we should do cache maintenance. And it makes me
> > wonder how this actually works today ...
> >
>
> I believe the driver was only validated on x86 and NXP's PowerPC
> series where cache coherence is ensured between the DMA master and the
> system memory. I suspect it was never validated on ARM hence this
> patch.
>
> I agree we should address all the comments in the driver about cache
> maintenance.

What about this description:

nvme: Invalidate dcache before submitting admin cmd

This patch tries to avoid the eviction of dirty lines during the DMA
transfer. The code right now executes the following steps:

- allocate the buffer
- start a DMA operation using the non-coherent DMA buffer
- invalidate cache lines associated with the buffer
- read the buffer

This can lead to reading back invalid information, as the CPU can still
read stale data. In order to avoid the eviction of dirty lines, a new
invalidation is required before starting the DMA operation. The patch
just adds an invalidation before submitting the DMA command.
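
A minimal sketch of the corrected ordering (the buffer setup here is
hypothetical; invalidate_dcache_range() and nvme_submit_admin_cmd() are
the driver's real calls):

	struct nvme_id_ctrl *ctrl = memalign(ARCH_DMA_MINALIGN,
					     sizeof(*ctrl));
	ulong start = (ulong)ctrl;
	ulong stop = start + ALIGN(sizeof(*ctrl), ARCH_DMA_MINALIGN);

	invalidate_dcache_range(start, stop);	/* no dirty lines left */
	ret = nvme_submit_admin_cmd(dev, &c, NULL); /* device DMAs in */
	if (!ret)
		invalidate_dcache_range(start, stop); /* read fresh data */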

The example below shows the nvme disk scan result without this patch:

=> nvme scan
nvme_get_info_from_identify: nn = 544502629, vwc = 100,
sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105

So, invalidating the cache before submitting the admin command fixes
the CPU read.

Michael
>
> Regards,
> Bin




--
| Michael Nazzareno Trimarchi                     Amarula Solutions BV |
| COO  -  Founder                                      Cruquiuskade 47 |
| +31(0)851119172                                 Amsterdam 1018 AM NL |
|                  [`as] http://www.amarulasolutions.com               |
Andre Przywara June 10, 2020, 11:27 a.m. UTC | #6
On 09/06/2020 19:47, Michael Nazzareno Trimarchi wrote:

Hi,

that looks much better now, thanks! Some suggestions below.

> On Tue, Jun 9, 2020 at 6:13 AM Bin Meng <bmeng.cn at gmail.com> wrote:
>>
>> Hi André,
>>
>> On Mon, Jun 8, 2020 at 9:52 PM André Przywara <andre.przywara at arm.com> wrote:
>>>
>>> On 07/06/2020 12:22, Jagan Teki wrote:
>>>
>>> Hi,
>>>
>>> (CC: ing Mark)
>>>
>>> Without looking to deep, I think invalidating the cache might be the
>>> right thing to do, but the rationale or at least the wording of it seems
>>> somehow flawed:
>>>
>>>> Some architecture like ARM Cortex A53, A72 would need
>>>
>>> Please don't mix the term "architecture" with actual implementations.
>>> And the problem is more general, I would say. This *might* be a case
>>> here where this is due to speculation, and then I would expect it to
>>> only show on an A72, for instance. I guess this is about NVMe on RK3399?
>>> Does U-Boot run on an A53 or an A72 core?
>>>
>>>> to invalidate dcache to sync the cache with the memory
>>>> contents before flushing the cache to memory.
>>>
>>> That is somewhat confusing. What does "sync" and "flush" mean here?
>>> AFAIK only "clean" and "invalidate" are clearly defined terms.
>>> The command buffer is cleaned to DRAM in nvme_submit_cmd(), so this is
>>> purely about the buffer for the *return* data, in which case I would
>>> expect this being a pure invalidation problem.
>>>
>>> This issue looks like the case described on slide 30 and 31 here:
>>> https://elinux.org/images/6/6d/Rutland2.pdf
>>
>> Thanks for the slides. This is really helpful.
>>
>>>
>>>> The NVME here submitting the admin command using dma_addr
>>>> to the memory without prior cache invalidation. This causing
>>>> dma_addr is pointing to an invalid location in the memory
>>>
>>> This does not sound right: dma_addr is always pointing to the right
>>> location in memory.
>>> The actual reason this fixes something might be that (some) cache lines
>>> of the DMA buffer were dirty and were evicted *before* the existing
>>> invalidation, but *after* the DMA transfer. That sounds unlikely in an
>>> U-Boot context, though, but would match the case described in Mark's slides.
>>>
>>> So there might be more to this. Are we missing a barrier here? Should we
>>> use coherent buffers (memory mapped as un-cached) for the return DMA in
>>> the first place (but I don't think U-Boot provides those)?
>>>
>>>> and found the sane nvme_ctrl result.
>>>>
>>>> Below example shows the nvme disk scan result improper result
>>>>
>>>> => nvme scan
>>>> nvme_get_info_from_identify: nn = 544502629, vwc = 100,
>>>> sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
>>>>
>>>> So, invalidating the cache before submitting the admin command
>>>> makes the dma_addr points to a valid location in the memory.
>>>
>>> If this shows up already, I think we should address all the comments in
>>> the driver where it says we should do cache maintenance. And it makes me
>>> wonder how this actually works today ...
>>>
>>
>> I believe the driver was only validated on x86 and NXP's PowerPC
>> series where cache coherence is ensured between the DMA master and the
>> system memory. I suspect it was never validated on ARM hence this
>> patch.
>>
>> I agree we should address all the comments in the driver about cache
>> maintenance.
> 
> What about this description:
> 
> nvme: Invalidate dcache before submitting admin cmd
> 
> This patch tries to avoid eviction of dirty lines during DMA
> transfer. The code right now executes the following step:
> 
> - allocate the buffer
> - start a DMA operation using the non-coherent DMA buffer
> - invalidate cache lines associated with the buffer
> - read the buffer
> 
> This can lead to reading back not valid information. CPU can still read
> stale data. In order to avoid the eviction of dirty lines, a new
> invalidation is required before starting the DMA operation. The patch
> just adds an invalidation before submitting the DMA command.

As I think this case is not obvious, I would make it clearer what
actually happens here, for instance (replace your paragraph above):

======================
This can lead to reading back invalid information, because the cache
controller could evict dirty cache lines belonging to the buffer *after*
the DMA operation has started to fill the DRAM.
In order to avoid this, a new invalidation is required *before* starting
the DMA operation. The patch just adds an invalidation before submitting
the DMA command.
======================

And your point about NXP PCIe and x86 being cache coherent is a good
one: actually, the ARM SBSA specification demands the same, so systems
complying with it (typically servers) would probably also work already.
But those smartphone-class SoCs are probably not coherent.

Drivers for PCI devices in U-Boot seem to assume properties based on
the host controllers and systems they have been tested on, as I found
other issues lately. Some drivers do cache maintenance, others don't
at all.

I will try to test NVMe on some other ARM platforms as soon as I can
get my hands on one.

Many thanks!
Andre


> The example below shows the nvme disk scan result without the following
> patch
> 
> => nvme scan
> nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> sn = dev_0T, mn = `?\?, fr = t_part, mdts = 105
> 
> So, invalidating the cache before submitting the admin command,
> fix the cpu read.
> 
> Michael
>>
>> Regards,
>> Bin
> 
> 
> 
> 
> --
> | Michael Nazzareno Trimarchi                     Amarula Solutions BV |
> | COO  -  Founder                                      Cruquiuskade 47 |
> | +31(0)851119172                                 Amsterdam 1018 AM NL |
> |                  [`as] http://www.amarulasolutions.com               |
>

Patch

diff --git a/drivers/nvme/nvme.c b/drivers/nvme/nvme.c
index 0357aba7f1..fc64d93ab8 100644
--- a/drivers/nvme/nvme.c
+++ b/drivers/nvme/nvme.c
@@ -466,6 +466,9 @@  int nvme_identify(struct nvme_dev *dev, unsigned nsid,
 
 	c.identify.cns = cpu_to_le32(cns);
 
+	invalidate_dcache_range(dma_addr,
+				dma_addr + sizeof(struct nvme_id_ctrl));
+
 	ret = nvme_submit_admin_cmd(dev, &c, NULL);
 	if (!ret)
 		invalidate_dcache_range(dma_addr,