[1/2] asm-generic/io: Pass result on inX() accessor to __io_par()

Message ID 20190211174544.4302-2-will.deacon@arm.com
State New
Headers show
Series
  • Ensure inX() is ordered wrt delay() routines
Related show

Commit Message

Will Deacon Feb. 11, 2019, 5:45 p.m.
The inX() I/O accessors must enforce ordering against subsequent calls
to the delay() routines, so that a read-back from a device can be used
to postpone a subsequent write to the same device.

On some architectures, including arm64, this ordering can only be
achieved by creating a dependency on the value returned by the inX()
operation, so we need to pass the value we read to the __io_par()
macro in this case.

Reported-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 include/asm-generic/io.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.11.0

Comments

Arnd Bergmann Feb. 12, 2019, 11:55 a.m. | #1
On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <will.deacon@arm.com> wrote:
>

> The inX() I/O accessors must enforce ordering against subsequent calls

> to the delay() routines, so that a read-back from a device can be used

> to postpone a subsequent write to the same device.

>

> On some architectures, including arm64, this ordering can only be

> achieved by creating a dependency on the value returned by the inX()

> operation, so we need to pass the value we read to the __io_par()

> macro in this case.

>

> Reported-by: Andrew Murray <andrew.murray@arm.com>

> Signed-off-by: Will Deacon <will.deacon@arm.com>

> ---

>  include/asm-generic/io.h | 8 ++++----


For changing the asm-generic file in the arm64 tree,

Acked-by: Arnd Bergmann <arnd@arndb.de>


For all I can see, this should not conflict with the usage of the
same macros on RISC-V, though it does make add a significant
difference, so I'd like to see an Ack from the RISC-V folks as
well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
to do a corresponding change.

       Arnd
Will Deacon Feb. 13, 2019, 5:46 p.m. | #2
On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
> On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <will.deacon@arm.com> wrote:

> >

> > The inX() I/O accessors must enforce ordering against subsequent calls

> > to the delay() routines, so that a read-back from a device can be used

> > to postpone a subsequent write to the same device.

> >

> > On some architectures, including arm64, this ordering can only be

> > achieved by creating a dependency on the value returned by the inX()

> > operation, so we need to pass the value we read to the __io_par()

> > macro in this case.

> >

> > Reported-by: Andrew Murray <andrew.murray@arm.com>

> > Signed-off-by: Will Deacon <will.deacon@arm.com>

> > ---

> >  include/asm-generic/io.h | 8 ++++----

> 

> For changing the asm-generic file in the arm64 tree,

> 

> Acked-by: Arnd Bergmann <arnd@arndb.de>


Thanks, Arnd.

> For all I can see, this should not conflict with the usage of the

> same macros on RISC-V, though it does make add a significant

> difference, so I'd like to see an Ack from the RISC-V folks as

> well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h

> to do a corresponding change.


There's already a comment in that header which says that the accesses are
ordered wrt timer reads, so I don't think anything needs to change there.
For consistency with the macro arguments, I could augment their __io_par to
take the read value as an unused argument, if that's what you mean?

Will
Arnd Bergmann Feb. 13, 2019, 8:59 p.m. | #3
On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote:

> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:

>

> > For all I can see, this should not conflict with the usage of the

> > same macros on RISC-V, though it does make add a significant

> > difference, so I'd like to see an Ack from the RISC-V folks as

> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h

> > to do a corresponding change.

>

> There's already a comment in that header which says that the accesses are

> ordered wrt timer reads, so I don't think anything needs to change there.

> For consistency with the macro arguments, I could augment their __io_par to

> take the read value as an unused argument, if that's what you mean?


Yes, that's what I meant, I should have been clearer there.

     Arnd
Palmer Dabbelt Feb. 13, 2019, 9:57 p.m. | #4
On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote:
> On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote:

>

>> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:

>>

>> > For all I can see, this should not conflict with the usage of the

>> > same macros on RISC-V, though it does make add a significant

>> > difference, so I'd like to see an Ack from the RISC-V folks as

>> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h

>> > to do a corresponding change.


Thanks, the original patches didn't make it through my filters.

>> There's already a comment in that header which says that the accesses are

>> ordered wrt timer reads, so I don't think anything needs to change there.

>> For consistency with the macro arguments, I could augment their __io_par to

>> take the read value as an unused argument, if that's what you mean?


FWIW, we don't really have a way to mandate this in the ISA yet as there's no 
formal model for either CSR orderings or the IO memory space.  

> Yes, that's what I meant, I should have been clearer there.


That sounds reasonable to me.  It looks like we can also go ahead and delete a 
bunch of arch/riscv/include/asm/io.h now that this stuff is in asm-generic, 
which would cause us to actually start using these things.  I didn't know this 
had all been moved to asm-generic otherwise I would have cleaned this up 
earlier.

I think this should do it, but this does bring up a bit of an issue: the RISC-V 
versions of reads and friends put barriers outside the loop, while the 
asm-generic version don't.  What are these actually supposed to do?

Either way that resolves, feel free to consider something like

diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
index b269451e7e85..378975f180a7 100644
--- a/arch/riscv/include/asm/io.h
+++ b/arch/riscv/include/asm/io.h
@@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
  * writes.
  */
 #define __io_pbr()	__asm__ __volatile__ ("fence io,i"  : : : "memory");
-#define __io_par()	__asm__ __volatile__ ("fence i,ior" : : : "memory");
+#define __io_par(v)	__asm__ __volatile__ ("fence i,ior" : : : "memory");
 #define __io_pbw()	__asm__ __volatile__ ("fence iow,o" : : : "memory");
 #define __io_paw()	__asm__ __volatile__ ("fence o,io"  : : : "memory");

-#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
-#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
-#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
+#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
+#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
+#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })

 #define outb(v,c)	({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
 #define outw(v,c)	({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
 #define outl(v,c)	({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })

 #ifdef CONFIG_64BIT
-#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; })
+#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; })
 #define outq(v,c)	({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); })
 #endif

@@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar())
 #define readsw(addr, buffer, count) __readsw(addr, buffer, count)
 #define readsl(addr, buffer, count) __readsl(addr, buffer, count)

-__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par())
-__io_reads_ins(ins, u16, w, __io_pbr(), __io_par())
-__io_reads_ins(ins, u32, l, __io_pbr(), __io_par())
+__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par(addr))
+__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
+__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
 #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count)
 #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count)
 #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count)

as

Revewied-by: Palmer Dabbelt <palmer@sifive.com>

when included along with the other diff.  That way we can at least keep the 
macro signatures matching, the cleanup can come later...

Thanks!
Will Deacon Feb. 18, 2019, 3:56 p.m. | #5
On Wed, Feb 13, 2019 at 01:57:50PM -0800, Palmer Dabbelt wrote:
> On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote:

> > On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote:

> > 

> > > On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:

> > > 

> > > > For all I can see, this should not conflict with the usage of the

> > > > same macros on RISC-V, though it does make add a significant

> > > > difference, so I'd like to see an Ack from the RISC-V folks as

> > > > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h

> > > > to do a corresponding change.

> 

> Thanks, the original patches didn't make it through my filters.

> 

> > > There's already a comment in that header which says that the accesses are

> > > ordered wrt timer reads, so I don't think anything needs to change there.

> > > For consistency with the macro arguments, I could augment their __io_par to

> > > take the read value as an unused argument, if that's what you mean?

> 

> FWIW, we don't really have a way to mandate this in the ISA yet as there's

> no formal model for either CSR orderings or the IO memory space.


Ah, so you may end up needing the dependency trick too, depending on where
you land with the ISA.

> > Yes, that's what I meant, I should have been clearer there.

> 

> That sounds reasonable to me.  It looks like we can also go ahead and delete

> a bunch of arch/riscv/include/asm/io.h now that this stuff is in

> asm-generic, which would cause us to actually start using these things.  I

> didn't know this had all been moved to asm-generic otherwise I would have

> cleaned this up earlier.

> 

> I think this should do it, but this does bring up a bit of an issue: the

> RISC-V versions of reads and friends put barriers outside the loop, while

> the asm-generic version don't.  What are these actually supposed to do?


You're referring to the string accessors (e.g. insb() and readsw()), right?
arm and arm64 don't provide barriers here either, and I don't think they
should have to given that these routines are usually used to poll data
register-based FIFOs and therefore don't need to provide ordering guarantees
against DMA operations. However, this is woefully undocumented and I shall
try to address this in the next version of my memory-barriers.txt patch
relating to this area [1].

> Either way that resolves, feel free to consider something like

> 

> diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h

> index b269451e7e85..378975f180a7 100644

> --- a/arch/riscv/include/asm/io.h

> +++ b/arch/riscv/include/asm/io.h

> @@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)

>  * writes.

>  */

> #define __io_pbr()	__asm__ __volatile__ ("fence io,i"  : : : "memory");

> -#define __io_par()	__asm__ __volatile__ ("fence i,ior" : : : "memory");

> +#define __io_par(v)	__asm__ __volatile__ ("fence i,ior" : : : "memory");

> #define __io_pbw()	__asm__ __volatile__ ("fence iow,o" : : : "memory");

> #define __io_paw()	__asm__ __volatile__ ("fence o,io"  : : : "memory");

> 

> -#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })

> -#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })

> -#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })

> +#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })

> +#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })

> +#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })

> 

> #define outb(v,c)	({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })

> #define outw(v,c)	({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })

> #define outl(v,c)	({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })

> 

> #ifdef CONFIG_64BIT

> -#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; })

> +#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; })

> #define outq(v,c)	({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); })

> #endif

> 

> @@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar())

> #define readsw(addr, buffer, count) __readsw(addr, buffer, count)

> #define readsl(addr, buffer, count) __readsl(addr, buffer, count)

> 

> -__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par())

> -__io_reads_ins(ins, u16, w, __io_pbr(), __io_par())

> -__io_reads_ins(ins, u32, l, __io_pbr(), __io_par())

> +__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par(addr))

> +__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))

> +__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))

> #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count)

> #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count)

> #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count)

> 

> as

> 

> Revewied-by: Palmer Dabbelt <palmer@sifive.com>

> 

> when included along with the other diff.  That way we can at least keep the

> macro signatures matching, the cleanup can come later...


Thanks, Palmer! I'll send a v2 of this patch, updated to fix up insq() as
well as the readX() macros too, since they're likely to suffer the exact
same issues as inX() in this regard.

Will

[1] https://lkml.org/lkml/2019/2/11/1803

Patch

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index d356f802945a..b5737c0d8316 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -65,7 +65,7 @@ 
 #endif
 
 #ifndef __io_par
-#define __io_par()     __io_ar()
+#define __io_par(v)     __io_ar()
 #endif
 
 
@@ -471,7 +471,7 @@  static inline u8 inb(unsigned long addr)
 
 	__io_pbr();
 	val = __raw_readb(PCI_IOBASE + addr);
-	__io_par();
+	__io_par(val);
 	return val;
 }
 #endif
@@ -484,7 +484,7 @@  static inline u16 inw(unsigned long addr)
 
 	__io_pbr();
 	val = __le16_to_cpu(__raw_readw(PCI_IOBASE + addr));
-	__io_par();
+	__io_par(val);
 	return val;
 }
 #endif
@@ -497,7 +497,7 @@  static inline u32 inl(unsigned long addr)
 
 	__io_pbr();
 	val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
-	__io_par();
+	__io_par(val);
 	return val;
 }
 #endif