diff mbox series

[3/7] usb: gadget: pxa25x_udc: use readl/writel for mmio

Message ID 1453997843-3489728-3-git-send-email-arnd@arndb.de
State Accepted
Commit 65bc0fba4ed6e2ea7d89539985feb576ae2f84b9
Headers show
Series USB changes for rare warnings | expand

Commit Message

Arnd Bergmann Jan. 28, 2016, 4:17 p.m. UTC
This converts the pxa25x udc driver to use readl/writel as normal
driver should do, rather than dereferencing __iomem pointers
themselves.

Based on the earlier preparation work, we can now also pass
the register start in the device pointer so we no longer need
the global variable.

The unclear part here is for IXP4xx, which supports both big-endian
and little-endian configurations. So far, the driver has done
no byteswap in either case. I suspect that is wrong and it would
actually need to swap in one or the other case, but I don't know
which. It's also possible that there is some magic setting in
the chip that makes the endianess of the MMIO register match the
CPU, and in that case, the code actually does the right thing
for all configurations, both before and after this patch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
 drivers/usb/gadget/udc/pxa25x_udc.c | 61 ++++++++++++++++++++++++-------------
 drivers/usb/gadget/udc/pxa25x_udc.h |  1 +
 2 files changed, 40 insertions(+), 22 deletions(-)

-- 
2.7.0

Comments

Robert Jarzmik Jan. 29, 2016, 10:17 a.m. UTC | #1
Arnd Bergmann <arnd@arndb.de> writes:

> This converts the pxa25x udc driver to use readl/writel as normal

> driver should do, rather than dereferencing __iomem pointers

> themselves.

>

> Based on the earlier preparation work, we can now also pass

> the register start in the device pointer so we no longer need

> the global variable.

>

> The unclear part here is for IXP4xx, which supports both big-endian

> and little-endian configurations. So far, the driver has done

> no byteswap in either case. I suspect that is wrong and it would

> actually need to swap in one or the other case, but I don't know

> which. It's also possible that there is some magic setting in

> the chip that makes the endianess of the MMIO register match the

> CPU, and in that case, the code actually does the right thing

> for all configurations, both before and after this patch.

>

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

For pxa25x_udc:
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>


Cheers.

--
Robert
Krzysztof Hałasa Jan. 29, 2016, 4:18 p.m. UTC | #2
Arnd Bergmann <arnd@arndb.de> writes:

> The unclear part here is for IXP4xx, which supports both big-endian

> and little-endian configurations. So far, the driver has done

> no byteswap in either case. I suspect that is wrong and it would

> actually need to swap in one or the other case, but I don't know

> which.


If at all, I guess it should swap in LE mode. But it's far from certain.

> It's also possible that there is some magic setting in

> the chip that makes the endianess of the MMIO register match the

> CPU, and in that case, the code actually does the right thing

> for all configurations, both before and after this patch.


This is IMHO most probable.

Actually, the IXP4xx is "natural" in BE mode (except for PCI) and
normally in LE mode it's order-coherent, meaning 32-bit "integer"
accesses need no swapping, but 8-bit and (mostly unused) 16-bit
transfers need swapping.

Anyway, I think readl()/writel() do the right thing: in BE mode they
swap PCI accesses and don't swap normal registers, in LE mode nothing is
swapped. LE data-coherent mode (which has never landed in the
official kernel) is a bit different, but still, readl()/writel() do the
right thing.
I remember the "string" (block) functions preserve 8-bit ordering, and
thus 32-bit values transfered using them may need swapping.
-- 
Krzysztof Halasa

Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Sergei Shtylyov Jan. 29, 2016, 6:03 p.m. UTC | #3
Hello.

On 01/29/2016 07:18 PM, Krzysztof Hałasa wrote:

>> The unclear part here is for IXP4xx, which supports both big-endian

>> and little-endian configurations. So far, the driver has done

>> no byteswap in either case. I suspect that is wrong and it would

>> actually need to swap in one or the other case, but I don't know

>> which.

>

> If at all, I guess it should swap in LE mode. But it's far from certain.

>

>> It's also possible that there is some magic setting in

>> the chip that makes the endianess of the MMIO register match the

>> CPU, and in that case, the code actually does the right thing

>> for all configurations, both before and after this patch.

>

> This is IMHO most probable.

>

> Actually, the IXP4xx is "natural" in BE mode (except for PCI) and

> normally in LE mode it's order-coherent, meaning 32-bit "integer"

> accesses need no swapping, but 8-bit and (mostly unused) 16-bit

> transfers need swapping.

>

> Anyway, I think readl()/writel() do the right thing: in BE mode they

> swap PCI accesses and don't swap normal registers,


    Alas, readl()/writel() don't know what registers you are calling them for, 
they were designed for PCI which is little-endian, so they will swap in BE mode.

MBR, Sergei
Arnd Bergmann Jan. 29, 2016, 9:02 p.m. UTC | #4
On Friday 29 January 2016 21:03:40 Sergei Shtylyov wrote:
> On 01/29/2016 07:18 PM, Krzysztof Hałasa wrote:

> 

> >> The unclear part here is for IXP4xx, which supports both big-endian

> >> and little-endian configurations. So far, the driver has done

> >> no byteswap in either case. I suspect that is wrong and it would

> >> actually need to swap in one or the other case, but I don't know

> >> which.

> >

> > If at all, I guess it should swap in LE mode. But it's far from certain.

> >

> >> It's also possible that there is some magic setting in

> >> the chip that makes the endianess of the MMIO register match the

> >> CPU, and in that case, the code actually does the right thing

> >> for all configurations, both before and after this patch.

> >

> > This is IMHO most probable.

> >

> > Actually, the IXP4xx is "natural" in BE mode (except for PCI) and

> > normally in LE mode it's order-coherent, meaning 32-bit "integer"

> > accesses need no swapping, but 8-bit and (mostly unused) 16-bit

> > transfers need swapping.

> >

> > Anyway, I think readl()/writel() do the right thing: in BE mode they

> > swap PCI accesses and don't swap normal registers,

> 

>     Alas, readl()/writel() don't know what registers you are calling them for, 

> they were designed for PCI which is little-endian, so they will swap in BE mode.

> 


In general, that is correct, but as Krzysztof said, ixp4xx is rather
special here, and it implements its own readl/writel functions
that behave differently for PCI spaces than for on-chip addresses,
See arch/arm/mach-ixp4xx/include/mach/io.h.

This platform evidently does its own tricks with byte swapping so
while a normal big-endian platform has to swap on readl (but not
readsl), ixp4xx never does any swaps when CONFIG_IXP4XX_INDIRECT_PCI
is set (neither readl nor readsl, neither PCI nor on-chip MMU)

However if CONFIG_IXP4XX_INDIRECT_PCI is disabled (which I think
is almost always the case), it basically behaves like all the other
platforms, and in big-endian configurations performsm swaps for
inl/readl/ioread32 but not readsl/ioread32_rep/insl (it swaps
twice for the last one).

	Arnd
Krzysztof Hałasa Feb. 15, 2016, 7:33 a.m. UTC | #5
Arnd Bergmann <arnd@arndb.de> writes:

>> Anyway, I think readl()/writel() do the right thing: in BE mode they

>> swap PCI accesses and don't swap normal registers, in LE mode nothing is

>> swapped.

>

> This seems to be true when CONFIG_IXP4XX_INDIRECT_PCI is set, but

> not otherwise. For the indirect variant, writel() is a __raw_writel()

> without barrier or byteswap on non-PCI memory, while with that

> option disabled, we use the standard implementation that has both

> a byteswap and a barrier.

>

> According to your description, that would mean the version without

> indirect PCI access is broken and it appears to have been that way

> since before the start of git history in 2.6.12.

>

> It's possible that nobody cared because all drivers for non-PCI

> devices on ixp4xx (the on chip ones) just use __raw_readl or

> direct pointer references.


Well, it is possible. I recall I probably used __raw_* instead of
readl()/writel() in qmgr/npe/Ethernet drivers for this very reason.

I think Direct pointer references are only used for locations in RAM
(DMA buffers), they have their own share of problems because the
peripherals are always big endian.

I think I still have an early IXP425 board with UDC connector so I will
try to dig it up and test this code.

I wonder if we should switch the driver to use __raw_*, too. The problem
is almost nobody uses UDC with this CPU.
-- 
Krzysztof Halasa

Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Krzysztof Hałasa Feb. 15, 2016, 1:51 p.m. UTC | #6
Arnd Bergmann <arnd@arndb.de> writes:

> I consider the use of __raw_* accessors a bug, I don't think we should

> ever do that because it hides how the hardware actually works, it doesn't

> work with spinlocks, and it can lead to the compiler splitting up accesses

> into byte sized ones (not on ARM with the current definition, but

> possible in general).


Well, then maybe we should fix them, or add another set.
Why don't they work with spinlocks?

To be honest, I remember this was already discussed a bit years ago.
I think I proposed back then a set of read_le32 (which would be
equivalent of current readl(), and could be named pci_readl() as well),
read_be32, read_host (without swapping).
The names could be better, though.

> Almost all hardware is fixed-endian, so you have to use swapping accessors

> when the CPU is the other way, except for device RAM and FIFO registers

> that are always used to transfer a byte stream (see the definition of

> readsl() and memcpy_fromio()). When you have hardware that adds byteswaps

> on the bus interface, you typically end up with MMIO registers requiring

> no swap (or double swap) and readsl()/memcpy_fromio()) suddenly requiring

> a swap that is counterintuitive.


Sure, but the __raw_* are used just to be sure there is absolutely no
swapping.
E.g. for IXP4xx, the registers never require swapping, thus readl() etc.
are not suitable for this. What is needed here is simple "atomic" 32-bit
straight to/from register (MM)IO (assuming 4-byte address alignment).

If not __raw_* then what?
-- 
Krzysztof Halasa

Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Arnd Bergmann Feb. 15, 2016, 4:12 p.m. UTC | #7
On Monday 15 February 2016 14:51:13 Krzysztof Hałasa wrote:
> Arnd Bergmann <arnd@arndb.de> writes:

> 

> > I consider the use of __raw_* accessors a bug, I don't think we should

> > ever do that because it hides how the hardware actually works, it doesn't

> > work with spinlocks, and it can lead to the compiler splitting up accesses

> > into byte sized ones (not on ARM with the current definition, but

> > possible in general).

> 

> Well, then maybe we should fix them, or add another set.

> Why don't they work with spinlocks?


The barriers on a spinlock synchronize between CPUs but not an external
bus, so (on some architectures) a spinlock protecting an MMIO register
does not guarantee that two CPUs doing

	spin_lock();
	__raw_writel(address);
	__raw_writel(data);
	spin_unlock();

would cause pairs of address/data to be seen on the bus.

Of course this is meaningless on ixp4xx, as there is only one CPU.

> To be honest, I remember this was already discussed a bit years ago.

> I think I proposed back then a set of read_le32 (which would be

> equivalent of current readl(), and could be named pci_readl() as well),

> read_be32, read_host (without swapping).

> The names could be better, though.


It keeps coming back, but so far the status quo has been stronger,
any it generally works for portable code either hardcoding whichever
endianess the device has, or using runtime detection when the same
device can be used in various ways (e.g. USB ehci).

On powerpc, we have in_le32/in_be32 for SoC-internal register access,
while only PCI devices are allowed to be accessed using readl().

> > Almost all hardware is fixed-endian, so you have to use swapping accessors

> > when the CPU is the other way, except for device RAM and FIFO registers

> > that are always used to transfer a byte stream (see the definition of

> > readsl() and memcpy_fromio()). When you have hardware that adds byteswaps

> > on the bus interface, you typically end up with MMIO registers requiring

> > no swap (or double swap) and readsl()/memcpy_fromio()) suddenly requiring

> > a swap that is counterintuitive.

> 

> Sure, but the __raw_* are used just to be sure there is absolutely no

> swapping.

> E.g. for IXP4xx, the registers never require swapping, thus readl() etc.

> are not suitable for this. What is needed here is simple "atomic" 32-bit

> straight to/from register (MM)IO (assuming 4-byte address alignment).

> 

> If not __raw_* then what?


I would suggest using an ixp4xx specific set of accessors that comes down
to either readl() or ioread32_be(), depending on whether CONFIG_CPU_BIG_ENDIAN
is set. That makes it clear that there is a magic bus involved and that it
works on this platform but not in portable code.

	Arnd
Krzysztof Hałasa Feb. 16, 2016, 1:24 p.m. UTC | #8
Arnd Bergmann <arnd@arndb.de> writes:

> Both writes leave the CPU core within the spinlock but are not serialized

> with anything else, so there is no ordering between the CPUs when they

> enter the shared bus, other than having address before data. You'd

> expect to see address0, data0, address1, data1, but it could also

> be e.g. address0, address1, data1, data0.


Ah, so it's a matter of flushing the write buffers before exiting the
spinlock-protected code.

> The point is more what the common case is. Almost all machines we support

> have fixed-endian devices, and the drivers are correct when using readl()

> or in_le32() but are not endian-safe when using __raw_readl().


Sure, e.g. PCI does it this way (eventually swapping the data lanes if
needed).

> Using __raw_readl() has the big danger of someone accidentally "fixing"

> the driver to work like all the others in order to solve a theoretical

> endian problem, while it should really be made obvious that the hardware

> actively swaps its data on the bus.


Sure - if this is the case. On-chip IXP4xx peripherals don't swap data
at all (i.e., they match CPU endianess) - accessing their registers is
like accessing a normal CPU register. That's why they don't use
PCI-style readl() etc. - however a better name than __raw_* would
probably help here.

Using __raw_* in a PCI driver would be generally wrong.
-- 
Krzysztof Halasa

Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Arnd Bergmann Feb. 16, 2016, 1:55 p.m. UTC | #9
On Tuesday 16 February 2016 14:24:10 Krzysztof Hałasa wrote:
> Arnd Bergmann <arnd@arndb.de> writes:

> 

> > Both writes leave the CPU core within the spinlock but are not serialized

> > with anything else, so there is no ordering between the CPUs when they

> > enter the shared bus, other than having address before data. You'd

> > expect to see address0, data0, address1, data1, but it could also

> > be e.g. address0, address1, data1, data0.

> 

> Ah, so it's a matter of flushing the write buffers before exiting the

> spinlock-protected code.


Yes, whatever the specific architecture requires. The normal readl()
functions are documented to having all the required barriers and
flushes, the __raw_* variants may (e.g. on x86) or may not have that.

> > The point is more what the common case is. Almost all machines we support

> > have fixed-endian devices, and the drivers are correct when using readl()

> > or in_le32() but are not endian-safe when using __raw_readl().

> 

> Sure, e.g. PCI does it this way (eventually swapping the data lanes if

> needed).

> 

> > Using __raw_readl() has the big danger of someone accidentally "fixing"

> > the driver to work like all the others in order to solve a theoretical

> > endian problem, while it should really be made obvious that the hardware

> > actively swaps its data on the bus.

> 

> Sure - if this is the case. On-chip IXP4xx peripherals don't swap data

> at all (i.e., they match CPU endianess) - accessing their registers is

> like accessing a normal CPU register. That's why they don't use

> PCI-style readl() etc. - however a better name than __raw_* would

> probably help here.

> 

> Using __raw_* in a PCI driver would be generally wrong.


I was really talking about built-in devices here. There are other platforms
that have an internal byteswap logic on the bus interface and they also
use that for PCI, see e.g. arch/sh/include/mach-common/mach/mangle-port.h.

ixp4xx is really special in that it performs hardware swapping for
internal devices based on CPU endianess but not on PCI devices.

I think some Broadcom MIPS systems are in the same category as IXP4xx,
but only because they have an extra byteswap on the PCI bus that they
can turn on, but very few other systems are.

Coming back to the specific pxa25x_udc case: using __raw_* accessors
in the driver would possibly end up breaking the PXA25x machines in
the (very unlikely) case that someone wants to make it work with
big-endian kernels, assuming it does not have the same hardware
byteswap logic as ixp4xx.

	Arnd
Krzysztof Hałasa Feb. 17, 2016, 8:36 a.m. UTC | #10
Arnd Bergmann <arnd@arndb.de> writes:

> ixp4xx is really special in that it performs hardware swapping for

> internal devices based on CPU endianess but not on PCI devices.


Again, IXP4xx does not perform hardware (nor any other) swapping for
registers of on-chip devices. The registers are connected 1:1,
bit 0 to bit 0 etc.

(Yes, IXP4xx can be optionally programmed for such swapping, depending
on silicon revision, but it is not used in mainline kernel).

The only hardware swapping happens on PCI bus (in BE mode), to be
compatible with other platforms and non-IXP4xx-specific PCI drivers.

> Coming back to the specific pxa25x_udc case: using __raw_* accessors

> in the driver would possibly end up breaking the PXA25x machines in

> the (very unlikely) case that someone wants to make it work with

> big-endian kernels, assuming it does not have the same hardware

> byteswap logic as ixp4xx.


I'd expect both CPUs to behave in exactly the same manner, i.e., to
not swap anything on the internal bus. If true, it would mean it should
"just work" in both BE and LE modes (including BE mode on PXA, should
it be actually possible).
-- 
Krzysztof Halasa

Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Arnd Bergmann Feb. 17, 2016, 10:36 a.m. UTC | #11
On Wednesday 17 February 2016 09:36:37 Krzysztof Hałasa wrote:
> Arnd Bergmann <arnd@arndb.de> writes:

> 

> > ixp4xx is really special in that it performs hardware swapping for

> > internal devices based on CPU endianess but not on PCI devices.

> 

> Again, IXP4xx does not perform hardware (nor any other) swapping for

> registers of on-chip devices. The registers are connected 1:1,

> bit 0 to bit 0 etc.

> 

> (Yes, IXP4xx can be optionally programmed for such swapping, depending

> on silicon revision, but it is not used in mainline kernel).

> 

> The only hardware swapping happens on PCI bus (in BE mode), to be

> compatible with other platforms and non-IXP4xx-specific PCI drivers.


Ok, so I guess what this means is that ixp4xx (or xscale in general)
implements its big-endian mode by adding a byteswap on its DRAM
and PCI interfaces in be32 mode, rather than by changing the behavior of
the load/store operations (as be8 mode does) or by having the byteswap
in its load/store pipeline or the top-level AHB bridge?

It's a bit surprising but it sounds plausible and explains most of
the code we see.

I'm still unsure about __indirect_readsl()/ioread32_rep()/insl()/readsl().

insl() does a double-swap on big-endian, which seems right, as we
end up with four swaps total, preserving correct byte order.

__raw_readsl() performs no swap, which would be correct for PCI
(same swap on PCI and RAM, so byteorder is preserved), but wrong
for on-chip FIFO registers (one swap on RAM, no swap on MMIO).
This is probably fine as well, as I don't see any use of __raw_readsl()
on non-PCI devices.

However, when CONFIG_IXP4XX_INDIRECT_PCI is set, both
ioread32_rep() and readsl() call __indirect_readsl(), which
in turn swaps the data once, so I think we actually need this patch:

do for inw/inl/outw/outl.

> > Coming back to the specific pxa25x_udc case: using __raw_* accessors

> > in the driver would possibly end up breaking the PXA25x machines in

> > the (very unlikely) case that someone wants to make it work with

> > big-endian kernels, assuming it does not have the same hardware

> > byteswap logic as ixp4xx.

> 

> I'd expect both CPUs to behave in exactly the same manner, i.e., to

> not swap anything on the internal bus. If true, it would mean it should

> "just work" in both BE and LE modes (including BE mode on PXA, should

> it be actually possible).


Ok, I'll change the patch accordingly and resubmit.

	Arnddiff --git a/arch/arm/mach-ixp4xx/include/mach/io.h b/arch/arm/mach-ixp4xx/include/mach/io.h
index e770858b490a..871f92f3504a 100644
--- a/arch/arm/mach-ixp4xx/include/mach/io.h
+++ b/arch/arm/mach-ixp4xx/include/mach/io.h
@@ -85,6 +85,8 @@ u8 __indirect_readb(const volatile void __iomem *p);
 u16 __indirect_readw(const volatile void __iomem *p);
 u32 __indirect_readl(const volatile void __iomem *p);
 
+/* string functions may need to swap data back to revert the byte swap on
+   big-endian __indirect_{read,write}{w,l}() accesses */
 static inline void __indirect_writesb(volatile void __iomem *bus_addr,
 				      const void *p, int count)
 {
@@ -100,7 +102,7 @@ static inline void __indirect_writesw(volatile void __iomem *bus_addr,
 	const u16 *vaddr = p;
 
 	while (count--)
-		writew(*vaddr++, bus_addr);
+		writew((u16 __force)cpu_to_le32(*vaddr++), bus_addr);
 }
 
 static inline void __indirect_writesl(volatile void __iomem *bus_addr,
@@ -108,7 +110,7 @@ static inline void __indirect_writesl(volatile void __iomem *bus_addr,
 {
 	const u32 *vaddr = p;
 	while (count--)
-		writel(*vaddr++, bus_addr);
+		writel((u32 __force)cpu_to_le32(*vaddr++), bus_addr);
 }
 
 static inline void __indirect_readsb(const volatile void __iomem *bus_addr,
@@ -126,7 +128,7 @@ static inline void __indirect_readsw(const volatile void __iomem *bus_addr,
 	u16 *vaddr = p;
 
 	while (count--)
-		*vaddr++ = readw(bus_addr);
+		*vaddr++ = (u16 __force)cpu_to_le16(readw(bus_addr));
 }
 
 static inline void __indirect_readsl(const volatile void __iomem *bus_addr,
@@ -135,7 +137,7 @@ static inline void __indirect_readsl(const volatile void __iomem *bus_addr,
 	u32 *vaddr = p;
 
 	while (count--)
-		*vaddr++ = readl(bus_addr);
+		*vaddr++ = (u32 __force)cpu_to_le32(readl(bus_addr));
 }
 
Does that make sense to you? This is essentially the same thing we already

Krzysztof Hałasa Feb. 17, 2016, 4:14 p.m. UTC | #12
Arnd Bergmann <arnd@arndb.de> writes:

> Ok, so I guess what this means is that ixp4xx (or xscale in general)

> implements its big-endian mode by adding a byteswap on its DRAM

> and PCI interfaces in be32 mode, rather than by changing the behavior of

> the load/store operations (as be8 mode does) or by having the byteswap

> in its load/store pipeline or the top-level AHB bridge?


Hmm... IIRC, there is normally no swapping on DRAM bus. E.g. if you
write 0x12345678 to RAM and change endianness, it will still read as
0x12345678. The CPU will still be able to execute opcodes after
switching endianness, but byte-oriented data will be messed up.

PCI swaps in BE mode, so byte order is preserved but readl() must
"unswap" it.

> I'm still unsure about

> __indirect_readsl()/ioread32_rep()/insl()/readsl().


Indirect ops should behave the same as direct. I think I have tested
them at some point. The "string" operations don't have to swap (on PCI)
because the PCI bus controller does it for them (in BE mode).

> insl() does a double-swap on big-endian, which seems right, as we

> end up with four swaps total, preserving correct byte order.


static inline void insl(u32 io_addr, void *p, u32 count)
{
        u32 *vaddr = p;
        while (count--)
                *vaddr++ = le32_to_cpu(inl(io_addr));
}

inl() does indirect input (preserving value, not byte order), so there seem
to be just one swap here (le32_to_cpu) preserving byte order.

> __raw_readsl() performs no swap, which would be correct for PCI

> (same swap on PCI and RAM, so byteorder is preserved),


No, a single swap on PCI, this means the byte order is preserved :-)

> but wrong

> for on-chip FIFO registers (one swap on RAM, no swap on MMIO).


But there aren't any such registers. Basically, almost all registers are
32-bit, even if they only hold an 8-bit value.
Exceptions such as 16550 UARTs are taken care of in platform structs
(using offset = 3).

> However, when CONFIG_IXP4XX_INDIRECT_PCI is set, both

> ioread32_rep() and readsl() call __indirect_readsl(), which

> in turn swaps the data once, so I think we actually need this patch:

>

> diff --git a/arch/arm/mach-ixp4xx/include/mach/io.h b/arch/arm/mach-ixp4xx/include/mach/io.h


> @@ -100,7 +102,7 @@ static inline void __indirect_writesw(volatile void __iomem *bus_addr,

>  	const u16 *vaddr = p;

>  

>  	while (count--)

> -		writew(*vaddr++, bus_addr);

> +		writew((u16 __force)cpu_to_le32(*vaddr++), bus_addr);

>  }


...

> Does that make sense to you? This is essentially the same thing we already

> do for inw/inl/outw/outl.


Well, we may need something like this. It seems writesw() (and thus
__indirect_writesw()) etc. use le16 values (preserving byte order), so
the above should probably use le16_to_cpu() instead (and le32_to_cpu in
__indirect_writesl()). I think the only thing that can use it on my hw
is VIA PATA adapter (throught ioread32_rep() etc). I will have to dig it
up as well.
I wouldn't rather touch this stuff without verifying that it fixes
things up.


Thanks for looking into this.
-- 
Krzysztof Halasa

Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Robert Jarzmik Feb. 20, 2016, 8:54 p.m. UTC | #13
Arnd Bergmann <arnd@arndb.de> writes:

> Coming back to the specific pxa25x_udc case: using __raw_* accessors

> in the driver would possibly end up breaking the PXA25x machines in

> the (very unlikely) case that someone wants to make it work with

> big-endian kernels, assuming it does not have the same hardware

> byteswap logic as ixp4xx.


As far as I know, pxa25x machine implies the kernel is little endian. From an
hardware perspective, pxa25x doesn't support big endian (old ARM platform).

So I don't think considering BE for pxa25x is ... necessary.

Cheers.

-- 
Robert
diff mbox series

Patch

diff --git a/drivers/usb/gadget/udc/pxa25x_udc.c b/drivers/usb/gadget/udc/pxa25x_udc.c
index 5db429f3cfb2..6d3acbf3b311 100644
--- a/drivers/usb/gadget/udc/pxa25x_udc.c
+++ b/drivers/usb/gadget/udc/pxa25x_udc.c
@@ -52,9 +52,6 @@ 
 #include <mach/lubbock.h>
 #endif
 
-static void __iomem *pxa25x_udc_reg_base;
-#define __UDC_REG(x) (*((volatile u32 *)(pxa25x_udc_reg_base + (x))))
-
 #define UDCCR	 0x0000 /* UDC Control Register */
 #define UDC_RES1 0x0004 /* UDC Undocumented - Reserved1 */
 #define UDC_RES2 0x0008 /* UDC Undocumented - Reserved2 */
@@ -292,16 +289,36 @@  static void pullup_on(void)
 		mach->udc_command(PXA2XX_UDC_CMD_CONNECT);
 }
 
-static inline void udc_set_reg(struct pxa25x_udc *dev, u32 reg, u32 uicr)
+#if defined(CONFIG_ARCH_IXP4XX) && defined(CONFIG_CPU_BIG_ENDIAN)
+/*
+ * not sure if this is the correct behavior on ixp4xx in both
+ * bit-endian and little-endian modes, but it's what the driver
+ * has always done using direct pointer dereferences:
+ * We assume that there is a byteswap done in hardware at the
+ * MMIO register that matches what the CPU setting is, so we
+ * never swap in software.
+ */
+static inline void udc_set_reg(struct pxa25x_udc *dev, u32 reg, u32 val)
 {
-	__UDC_REG(reg) = uicr;
+	iowrite32be(val, dev->regs + reg);
 }
 
 static inline u32 udc_get_reg(struct pxa25x_udc *dev, u32 reg)
 {
-	return __UDC_REG(reg);
+	return ioread32be(dev->regs + reg);
+}
+#else
+static inline void udc_set_reg(struct pxa25x_udc *dev, u32 reg, u32 val)
+{
+	writel(val, dev->regs + reg);
 }
 
+static inline u32 udc_get_reg(struct pxa25x_udc *dev, u32 reg)
+{
+	return readl(dev->regs + reg);
+}
+#endif
+
 static void pio_irq_enable(struct pxa25x_ep *ep)
 {
 	u32 bEndpointAddress = ep->bEndpointAddress & 0xf;
@@ -337,59 +354,59 @@  static void pio_irq_disable(struct pxa25x_ep *ep)
 
 static inline void udc_set_mask_UDCCR(struct pxa25x_udc *dev, int mask)
 {
-	u32 udccr = __UDC_REG(UDCCR);
+	u32 udccr = udc_get_reg(dev, UDCCR);
 
-	__UDC_REG(UDCCR) = (udccr & UDCCR_MASK_BITS) | (mask & UDCCR_MASK_BITS);
+	udc_set_reg(dev, (udccr & UDCCR_MASK_BITS) | (mask & UDCCR_MASK_BITS), UDCCR);
 }
 
 static inline void udc_clear_mask_UDCCR(struct pxa25x_udc *dev, int mask)
 {
-	u32 udccr = __UDC_REG(UDCCR);
+	u32 udccr = udc_get_reg(dev, UDCCR);
 
-	__UDC_REG(UDCCR) = (udccr & UDCCR_MASK_BITS) & ~(mask & UDCCR_MASK_BITS);
+	udc_set_reg(dev, (udccr & UDCCR_MASK_BITS) & ~(mask & UDCCR_MASK_BITS), UDCCR);
 }
 
 static inline void udc_ack_int_UDCCR(struct pxa25x_udc *dev, int mask)
 {
 	/* udccr contains the bits we dont want to change */
-	u32 udccr = __UDC_REG(UDCCR) & UDCCR_MASK_BITS;
+	u32 udccr = udc_get_reg(dev, UDCCR) & UDCCR_MASK_BITS;
 
-	__UDC_REG(UDCCR) = udccr | (mask & ~UDCCR_MASK_BITS);
+	udc_set_reg(dev, udccr | (mask & ~UDCCR_MASK_BITS), UDCCR);
 }
 
 static inline u32 udc_ep_get_UDCCS(struct pxa25x_ep *ep)
 {
-	return __UDC_REG(ep->regoff_udccs);
+	return udc_get_reg(ep->dev, ep->regoff_udccs);
 }
 
 static inline void udc_ep_set_UDCCS(struct pxa25x_ep *ep, u32 data)
 {
-	__UDC_REG(ep->regoff_udccs) = data;
+	udc_set_reg(ep->dev, data, ep->regoff_udccs);
 }
 
 static inline u32 udc_ep0_get_UDCCS(struct pxa25x_udc *dev)
 {
-	return __UDC_REG(UDCCS0);
+	return udc_get_reg(dev, UDCCS0);
 }
 
 static inline void udc_ep0_set_UDCCS(struct pxa25x_udc *dev, u32 data)
 {
-	__UDC_REG(UDCCS0) = data;
+	udc_set_reg(dev, data, UDCCS0);
 }
 
 static inline u32 udc_ep_get_UDDR(struct pxa25x_ep *ep)
 {
-	return __UDC_REG(ep->regoff_uddr);
+	return udc_get_reg(ep->dev, ep->regoff_uddr);
 }
 
 static inline void udc_ep_set_UDDR(struct pxa25x_ep *ep, u32 data)
 {
-	__UDC_REG(ep->regoff_uddr) = data;
+	udc_set_reg(ep->dev, data, ep->regoff_uddr);
 }
 
 static inline u32 udc_ep_get_UBCR(struct pxa25x_ep *ep)
 {
-	return __UDC_REG(ep->regoff_ubcr);
+	return udc_get_reg(ep->dev, ep->regoff_ubcr);
 }
 
 /*
@@ -2369,9 +2386,9 @@  static int pxa25x_udc_probe(struct platform_device *pdev)
 		return -ENODEV;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	pxa25x_udc_reg_base = devm_ioremap_resource(&pdev->dev, res);
-	if (IS_ERR(pxa25x_udc_reg_base))
-		return PTR_ERR(pxa25x_udc_reg_base);
+	dev->regs = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(dev->regs))
+		return PTR_ERR(dev->regs);
 
 	dev->clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(dev->clk))
diff --git a/drivers/usb/gadget/udc/pxa25x_udc.h b/drivers/usb/gadget/udc/pxa25x_udc.h
index f884513f7390..4b8b72d7ab37 100644
--- a/drivers/usb/gadget/udc/pxa25x_udc.h
+++ b/drivers/usb/gadget/udc/pxa25x_udc.h
@@ -125,6 +125,7 @@  struct pxa25x_udc {
 #ifdef CONFIG_USB_GADGET_DEBUG_FS
 	struct dentry				*debugfs_udc;
 #endif
+	void __iomem				*regs;
 };
 #define to_pxa25x(g)	(container_of((g), struct pxa25x_udc, gadget))