diff mbox

asm-generic/io.h: Implement read[bwlq]_relaxed()

Message ID 1410264760-29756-1-git-send-email-daniel.thompson@linaro.org
State New
Headers show

Commit Message

Daniel Thompson Sept. 9, 2014, 12:12 p.m. UTC
Currently the read[bwlq]_relaxed() family are implemented on every
architecture except blackfin, m68k[1], metag, openrisc, s390[2] and
score. Increasingly drivers are being optimized to exploit relaxed
reads putting these architectures at risk of compilation failures for
shared drivers.

This patch addresses this by providing implementations of
read[bwlq]_relaxed() that are identical to the equivalent read[bwlq]().
All the above architectures include asm-generic/io.h .

Note that currently only eight architectures (alpha, arm, arm64, avr32,
hexagon, microblaze, mips and sh) implement write[bwlq]_relaxed() meaning
these functions are deliberately not included in this patch.

[1] m68k includes the relaxed family only when configured *without* MMU.
[2] s390 requires CONFIG_PCI to include the relaxed family.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
---
 include/asm-generic/io.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

--
1.9.3

Comments

Will Deacon Sept. 9, 2014, 12:28 p.m. UTC | #1
Hi Daniel,

On Tue, Sep 09, 2014 at 01:12:40PM +0100, Daniel Thompson wrote:
> Currently the read[bwlq]_relaxed() family are implemented on every
> architecture except blackfin, m68k[1], metag, openrisc, s390[2] and
> score. Increasingly drivers are being optimized to exploit relaxed
> reads putting these architectures at risk of compilation failures for
> shared drivers.
> 
> This patch addresses this by providing implementations of
> read[bwlq]_relaxed() that are identical to the equivalent read[bwlq]().
> All the above architectures include asm-generic/io.h .
> 
> Note that currently only eight architectures (alpha, arm, arm64, avr32,
> hexagon, microblaze, mips and sh) implement write[bwlq]_relaxed() meaning
> these functions are deliberately not included in this patch.
> 
> [1] m68k includes the relaxed family only when configured *without* MMU.
> [2] s390 requires CONFIG_PCI to include the relaxed family.
> 
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: linux-arch@vger.kernel.org
> ---
>  include/asm-generic/io.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

I have a larger series adding these (and the write equivalents) to all
architectures that I periodically post and then fail to get on top of.

The key part you're missing is defining some generic semantics for these
accessors. Without those, I don't think it makes sense to put them into
asm-generic, because drivers can't safely infer any meaning from the relaxed
definition.

Ben and I agreed on something back in May:

  https://lkml.org/lkml/2014/5/22/468

but I need to send a new version including:

  - ioreadX_relaxed and iowriteX_relaxed
  - Strengthening non-relaxed I/O accessors on architectures with non-empty
    mmiowb()

I'll bump it up the list. In the meantime, you can have a look at my io
branch on kernel.org

Will
Geert Uytterhoeven Sept. 9, 2014, 12:31 p.m. UTC | #2
Hi Daniel,

On Tue, Sep 9, 2014 at 2:12 PM, Daniel Thompson
<daniel.thompson@linaro.org> wrote:
> Currently the read[bwlq]_relaxed() family are implemented on every
> architecture except blackfin, m68k[1], metag, openrisc, s390[2] and
> score. Increasingly drivers are being optimized to exploit relaxed
> reads putting these architectures at risk of compilation failures for
> shared drivers.
>
> This patch addresses this by providing implementations of
> read[bwlq]_relaxed() that are identical to the equivalent read[bwlq]().
> All the above architectures include asm-generic/io.h .

m68k does not include asm-generic/io.h.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Daniel Thompson Sept. 9, 2014, 1:03 p.m. UTC | #3
On 09/09/14 13:28, Will Deacon wrote:
> Hi Daniel,
> 
> On Tue, Sep 09, 2014 at 01:12:40PM +0100, Daniel Thompson wrote:
>> Currently the read[bwlq]_relaxed() family are implemented on every
>> architecture except blackfin, m68k[1], metag, openrisc, s390[2] and
>> score. Increasingly drivers are being optimized to exploit relaxed
>> reads putting these architectures at risk of compilation failures for
>> shared drivers.
>>
>> This patch addresses this by providing implementations of
>> read[bwlq]_relaxed() that are identical to the equivalent read[bwlq]().
>> All the above architectures include asm-generic/io.h .
>>
>> Note that currently only eight architectures (alpha, arm, arm64, avr32,
>> hexagon, microblaze, mips and sh) implement write[bwlq]_relaxed() meaning
>> these functions are deliberately not included in this patch.
>>
>> [1] m68k includes the relaxed family only when configured *without* MMU.
>> [2] s390 requires CONFIG_PCI to include the relaxed family.
>>
>> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: linux-arch@vger.kernel.org
>> ---
>>  include/asm-generic/io.h | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
> 
> I have a larger series adding these (and the write equivalents) to all
> architectures that I periodically post and then fail to get on top of.

That's why you're on Cc:...


> The key part you're missing is defining some generic semantics for these
> accessors. Without those, I don't think it makes sense to put them into
> asm-generic, because drivers can't safely infer any meaning from the relaxed
> definition.

Currently the semantics are described as:
--- cut here ---
PCI ordering rules also guarantee that PIO read responses arrive after
any outstanding DMA writes from that bus, since for some devices the
result of a readb call may signal to the driver that a DMA transaction
is complete. In many cases, however, the driver may want to indicate
that the next readb call has no relation to any previous DMA writes
performed by the device. The driver can use readb_relaxed for these
cases, although only some platforms will honor the relaxed semantics.
Using the relaxed read functions will provide significant performance
benefits on platforms that support it. The qla2xxx driver provides
examples of how to use readX_relaxed . In many cases, a majority of the
driver’s readX calls can safely be converted to readX_relaxed calls,
since only a few will indicate or depend on DMA completion.
--- cut here ---

The implementation provided in the patch trivially meets this definition
(by not honouring the relaxedness).


> Ben and I agreed on something back in May:
> 
>   https://lkml.org/lkml/2014/5/22/468

... and didn't you also conclude with hpa that the very relaxed x86
implementation of readl_relaxed() already meets this definition (as do
these changes to asm-generic/io.h).

Thus allowing its use to perculate more widely really shouldn't do an harm.


> but I need to send a new version including:
> 
>   - ioreadX_relaxed and iowriteX_relaxed
>   - Strengthening non-relaxed I/O accessors on architectures with non-empty
>     mmiowb()
> 
> I'll bump it up the list. In the meantime, you can have a look at my io
> branch on kernel.org

I'd really like to see your work included (which I spotted after I wrote
the patch and when it occured to me to visit
https://www.google.com/search?q=asm-generic+readl_relaxed to see if
there was a well known reason not to make this change).

However... I really can't see why we should delay introducing an already
documented function to the remaining architectures.


Daniel.
Daniel Thompson Sept. 9, 2014, 1:11 p.m. UTC | #4
On 09/09/14 13:31, Geert Uytterhoeven wrote:
> Hi Daniel,
> 
> On Tue, Sep 9, 2014 at 2:12 PM, Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
>> Currently the read[bwlq]_relaxed() family are implemented on every
>> architecture except blackfin, m68k[1], metag, openrisc, s390[2] and
>> score. Increasingly drivers are being optimized to exploit relaxed
>> reads putting these architectures at risk of compilation failures for
>> shared drivers.
>>
>> This patch addresses this by providing implementations of
>> read[bwlq]_relaxed() that are identical to the equivalent read[bwlq]().
>> All the above architectures include asm-generic/io.h .
> 
> m68k does not include asm-generic/io.h.

Looking back through my command history I can't figure out how I made
this mistake.

I will change the comment.

For the record I have also just double checked blackfin, metag,
openrisc, s390 and score and these really do use asm-generic/io.h .


Daniel.
Daniel Thompson Sept. 9, 2014, 1:14 p.m. UTC | #5
On 09/09/14 14:03, Daniel Thompson wrote:
> On 09/09/14 13:28, Will Deacon wrote:
>> Hi Daniel,
>>
>> On Tue, Sep 09, 2014 at 01:12:40PM +0100, Daniel Thompson wrote:
>>> Currently the read[bwlq]_relaxed() family are implemented on every
>>> architecture except blackfin, m68k[1], metag, openrisc, s390[2] and
>>> score. Increasingly drivers are being optimized to exploit relaxed
>>> reads putting these architectures at risk of compilation failures for
>>> shared drivers.
>>>
>>> This patch addresses this by providing implementations of
>>> read[bwlq]_relaxed() that are identical to the equivalent read[bwlq]().
>>> All the above architectures include asm-generic/io.h .
>>>
>>> Note that currently only eight architectures (alpha, arm, arm64, avr32,
>>> hexagon, microblaze, mips and sh) implement write[bwlq]_relaxed() meaning
>>> these functions are deliberately not included in this patch.
>>>
>>> [1] m68k includes the relaxed family only when configured *without* MMU.
>>> [2] s390 requires CONFIG_PCI to include the relaxed family.
>>>
>>> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: linux-arch@vger.kernel.org
>>> ---
>>>  include/asm-generic/io.h | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>
>> I have a larger series adding these (and the write equivalents) to all
>> architectures that I periodically post and then fail to get on top of.
> 
> That's why you're on Cc:...
> 
> 
>> The key part you're missing is defining some generic semantics for these
>> accessors. Without those, I don't think it makes sense to put them into
>> asm-generic, because drivers can't safely infer any meaning from the relaxed
>> definition.
> 
> Currently the semantics are described as:
> --- cut here ---
> PCI ordering rules also guarantee that PIO read responses arrive after
> any outstanding DMA writes from that bus, since for some devices the
> result of a readb call may signal to the driver that a DMA transaction
> is complete. In many cases, however, the driver may want to indicate
> that the next readb call has no relation to any previous DMA writes
> performed by the device. The driver can use readb_relaxed for these
> cases, although only some platforms will honor the relaxed semantics.
> Using the relaxed read functions will provide significant performance
> benefits on platforms that support it. The qla2xxx driver provides
> examples of how to use readX_relaxed . In many cases, a majority of the
> driver’s readX calls can safely be converted to readX_relaxed calls,
> since only a few will indicate or depend on DMA completion.
> --- cut here ---
> 
> The implementation provided in the patch trivially meets this definition
> (by not honouring the relaxedness).
> 
> 
>> Ben and I agreed on something back in May:
>>
>>   https://lkml.org/lkml/2014/5/22/468
> 
> ... and didn't you also conclude with hpa that the very relaxed x86
> implementation of readl_relaxed() already meets this definition (as do
> these changes to asm-generic/io.h).

Sorry. "very relaxed" is always a very stupid thing to say about x86
(especially to an arm guy).

More exactly I was referring to the absence of memory clobber in x86
readl_relaxed().


> 
> Thus allowing its use to perculate more widely really shouldn't do an harm.
> 
> 
>> but I need to send a new version including:
>>
>>   - ioreadX_relaxed and iowriteX_relaxed
>>   - Strengthening non-relaxed I/O accessors on architectures with non-empty
>>     mmiowb()
>>
>> I'll bump it up the list. In the meantime, you can have a look at my io
>> branch on kernel.org
> 
> I'd really like to see your work included (which I spotted after I wrote
> the patch and when it occured to me to visit
> https://www.google.com/search?q=asm-generic+readl_relaxed to see if
> there was a well known reason not to make this change).
> 
> However... I really can't see why we should delay introducing an already
> documented function to the remaining architectures.
> 
> 
> Daniel.
>
Will Deacon Sept. 9, 2014, 2:15 p.m. UTC | #6
On Tue, Sep 09, 2014 at 02:14:54PM +0100, Daniel Thompson wrote:
> On 09/09/14 14:03, Daniel Thompson wrote:
> > On 09/09/14 13:28, Will Deacon wrote:
> >> I have a larger series adding these (and the write equivalents) to all
> >> architectures that I periodically post and then fail to get on top of.
> > 
> > That's why you're on Cc:...

Ok, so why not just pick the asm-generic patch out of my series?

> >> The key part you're missing is defining some generic semantics for these
> >> accessors. Without those, I don't think it makes sense to put them into
> >> asm-generic, because drivers can't safely infer any meaning from the relaxed
> >> definition.
> > 
> > Currently the semantics are described as:
> > --- cut here ---
> > PCI ordering rules also guarantee that PIO read responses arrive after
> > any outstanding DMA writes from that bus, since for some devices the
> > result of a readb call may signal to the driver that a DMA transaction
> > is complete. In many cases, however, the driver may want to indicate
> > that the next readb call has no relation to any previous DMA writes
> > performed by the device. The driver can use readb_relaxed for these
> > cases, although only some platforms will honor the relaxed semantics.
> > Using the relaxed read functions will provide significant performance
> > benefits on platforms that support it. The qla2xxx driver provides
> > examples of how to use readX_relaxed . In many cases, a majority of the
> > driver’s readX calls can safely be converted to readX_relaxed calls,
> > since only a few will indicate or depend on DMA completion.
> > --- cut here ---
> > 
> > The implementation provided in the patch trivially meets this definition
> > (by not honouring the relaxedness).

I still think we need to mention ordering of relaxed reads against each
other and also against spinlocks.

> >> Ben and I agreed on something back in May:
> >>
> >>   https://lkml.org/lkml/2014/5/22/468
> > 
> > ... and didn't you also conclude with hpa that the very relaxed x86
> > implementation of readl_relaxed() already meets this definition (as do
> > these changes to asm-generic/io.h).
> 
> Sorry. "very relaxed" is always a very stupid thing to say about x86
> (especially to an arm guy).
> 
> More exactly I was referring to the absence of memory clobber in x86
> readl_relaxed().

Yeah, my series just adds the relaxed write accessors for x86.

> > Thus allowing its use to perculate more widely really shouldn't do an harm.
> > 
> > 
> >> but I need to send a new version including:
> >>
> >>   - ioreadX_relaxed and iowriteX_relaxed
> >>   - Strengthening non-relaxed I/O accessors on architectures with non-empty
> >>     mmiowb()
> >>
> >> I'll bump it up the list. In the meantime, you can have a look at my io
> >> branch on kernel.org
> > 
> > I'd really like to see your work included (which I spotted after I wrote
> > the patch and when it occured to me to visit
> > https://www.google.com/search?q=asm-generic+readl_relaxed to see if
> > there was a well known reason not to make this change).
> > 
> > However... I really can't see why we should delay introducing an already
> > documented function to the remaining architectures.

I'd just rather fix the interface once instead of churning it about. How
about I dust off the series again?

Will
Daniel Thompson Sept. 9, 2014, 2:51 p.m. UTC | #7
On 09/09/14 15:15, Will Deacon wrote:
> On Tue, Sep 09, 2014 at 02:14:54PM +0100, Daniel Thompson wrote:
>> On 09/09/14 14:03, Daniel Thompson wrote:
>>> On 09/09/14 13:28, Will Deacon wrote:
>>>> I have a larger series adding these (and the write equivalents) to all
>>>> architectures that I periodically post and then fail to get on top of.
>>>
>>> That's why you're on Cc:...
> 
> Ok, so why not just pick the asm-generic patch out of my series?

Only really that your patch introducing the writeX_relaxed() family at
the same time.

However it would be fine to subset your patch rather than use mine. You
did post it first...


>>>> The key part you're missing is defining some generic semantics for these
>>>> accessors. Without those, I don't think it makes sense to put them into
>>>> asm-generic, because drivers can't safely infer any meaning from the relaxed
>>>> definition.
>>>
>>> Currently the semantics are described as:
>>> --- cut here ---
>>> PCI ordering rules also guarantee that PIO read responses arrive after
>>> any outstanding DMA writes from that bus, since for some devices the
>>> result of a readb call may signal to the driver that a DMA transaction
>>> is complete. In many cases, however, the driver may want to indicate
>>> that the next readb call has no relation to any previous DMA writes
>>> performed by the device. The driver can use readb_relaxed for these
>>> cases, although only some platforms will honor the relaxed semantics.
>>> Using the relaxed read functions will provide significant performance
>>> benefits on platforms that support it. The qla2xxx driver provides
>>> examples of how to use readX_relaxed . In many cases, a majority of the
>>> driver’s readX calls can safely be converted to readX_relaxed calls,
>>> since only a few will indicate or depend on DMA completion.
>>> --- cut here ---
>>>
>>> The implementation provided in the patch trivially meets this definition
>>> (by not honouring the relaxedness).
> 
> I still think we need to mention ordering of relaxed reads against each
> other and also against spinlocks.

I don't disagree.

I just think the documentation being sub-optimal is not a good reason to
avoid implementing the read functions.


>>>> Ben and I agreed on something back in May:
>>>>
>>>>   https://lkml.org/lkml/2014/5/22/468
>>>
>>> ... and didn't you also conclude with hpa that the very relaxed x86
>>> implementation of readl_relaxed() already meets this definition (as do
>>> these changes to asm-generic/io.h).
>>
>> Sorry. "very relaxed" is always a very stupid thing to say about x86
>> (especially to an arm guy).
>>
>> More exactly I was referring to the absence of memory clobber in x86
>> readl_relaxed().
> 
> Yeah, my series just adds the relaxed write accessors for x86.
> 
>>> Thus allowing its use to perculate more widely really shouldn't do an harm.
>>>
>>>
>>>> but I need to send a new version including:
>>>>
>>>>   - ioreadX_relaxed and iowriteX_relaxed
>>>>   - Strengthening non-relaxed I/O accessors on architectures with non-empty
>>>>     mmiowb()
>>>>
>>>> I'll bump it up the list. In the meantime, you can have a look at my io
>>>> branch on kernel.org
>>>
>>> I'd really like to see your work included (which I spotted after I wrote
>>> the patch and when it occured to me to visit
>>> https://www.google.com/search?q=asm-generic+readl_relaxed to see if
>>> there was a well known reason not to make this change).
>>>
>>> However... I really can't see why we should delay introducing an already
>>> documented function to the remaining architectures.
> 
> I'd just rather fix the interface once instead of churning it about

Churn? 12 lines of code where two people independently produce the same
thing (apart from ordering within the file)?


> How about I dust off the series again?

Dusting off the series again would be great.

Would you consider putting readX_relaxed() and its documentation at the
front of the patchset? That way if the writel_relaxed() side log jams
again we can still get some of it delivered.


Daniel.
diff mbox

Patch

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 975e1cc..85ea117 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -66,6 +66,16 @@  static inline u32 readl(const volatile void __iomem *addr)
 	return __le32_to_cpu(__raw_readl(addr));
 }

+#ifndef readb_relaxed
+#define readb_relaxed __raw_readb
+#endif
+#ifndef readw_relaxed
+#define readw_relaxed readw
+#endif
+#ifndef readl_relaxed
+#define readl_relaxed readl
+#endif
+
 #ifndef __raw_writeb
 static inline void __raw_writeb(u8 b, volatile void __iomem *addr)
 {
@@ -105,6 +115,10 @@  static inline u64 readq(const volatile void __iomem *addr)
 	return __le64_to_cpu(__raw_readq(addr));
 }

+#ifndef readq_relaxed
+#define readq_relaxed readq
+#endif
+
 #ifndef __raw_writeq
 static inline void __raw_writeq(u64 b, volatile void __iomem *addr)
 {