Message ID | 20250114-easiness-pregame-d1d2d4b57e7b@spud |
---|---|
State | New |
Headers | show |
Series | spi: microchip-core: prevent RX overflows when transmit size > FIFO size | expand |
On Tue, Jan 14, 2025 at 05:13:49PM +0000, Conor Dooley wrote: > When the size of a transfer exceeds the size of the FIFO (32 bytes), RX > overflows will be generated and receive data will be corrupted and > warnings will be produced. For example, here's an error generated by a > transfer of 36 bytes: > spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0 > I am not entirely sure how this happens, as rxlen being 4 means that 32 > of 36 bytes have been read from the RX FIFO so there should be > sufficient room for 4 more bytes but timing is likely a factor as simply > adding a delay in the transmit path is enough to avoid the overflows. The reads from the FIFO happen prior to the check for overflow in the interrupt handler so if we overflow then take the interrupt we will read the full 32 byte FIFO before it sees that there was an overflow. > @@ -221,6 +221,13 @@ static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi) > while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) { > u32 word; > > + /* > + * If the transfer is larger than FIFO_DEPTH, spin until space > + * is made in the RX FIFO to avoid losing data to RX overflows > + */ > + while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_FULL) > + ; > + So, this is the transmit side but we're polling the RX FIFO and not doing anything to clear it? I see that the FIFO reads are driven from interrupt context. If I had to guess I'd say that there's latency in the interrupt being delivered (possibly to a different CPU) and when the transfer is being driven by the TX side it's stuffing data into the TX FIFO faster than interrupts are being delivered (the TX side just seems to busy wait on there being space in the FIFO which can't be good for slower speeds...) so the TX and RX sides of the transfer get out of sync. Given that AFAICT the controller has to RX all the time I suspect you want to move the RX processing out of interrupt context into the main _transfer_one() function and busy wait for that too, or push the TX side into the interrupt handler (which at first glance looks simpler). Either way the two directions will be driven from the same place and so not get out of sync.
On Tue, Jan 14, 2025 at 07:44:26PM +0000, Mark Brown wrote: > On Tue, Jan 14, 2025 at 05:13:49PM +0000, Conor Dooley wrote: > > > When the size of a transfer exceeds the size of the FIFO (32 bytes), RX > > overflows will be generated and receive data will be corrupted and > > warnings will be produced. For example, here's an error generated by a > > transfer of 36 bytes: > > > spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0 > > > I am not entirely sure how this happens, as rxlen being 4 means that 32 > > of 36 bytes have been read from the RX FIFO so there should be > > sufficient room for 4 more bytes but timing is likely a factor as simply > > adding a delay in the transmit path is enough to avoid the overflows. > > The reads from the FIFO happen prior to the check for overflow in the > interrupt handler so if we overflow then take the interrupt we will read > the full 32 byte FIFO before it sees that there was an overflow. > > > @@ -221,6 +221,13 @@ static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi) > > while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) { > > u32 word; > > > > + /* > > + * If the transfer is larger than FIFO_DEPTH, spin until space > > + * is made in the RX FIFO to avoid losing data to RX overflows > > + */ > > + while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_FULL) > > + ; > > + > > So, this is the transmit side but we're polling the RX FIFO and not > doing anything to clear it? I see that the FIFO reads are driven from > interrupt context. If I had to guess I'd say that there's latency in > the interrupt being delivered (possibly to a different CPU) and when the > transfer is being driven by the TX side it's stuffing data into the TX > FIFO faster than interrupts are being delivered (the TX side just seems > to busy wait on there being space in the FIFO which can't be good for > slower speeds...) so the TX and RX sides of the transfer get out of sync. > > Given that AFAICT the controller has to RX all the time I suspect you > want to move the RX processing out of interrupt context into the main > _transfer_one() function and busy wait for that too, or push the TX side > into the interrupt handler (which at first glance looks simpler). > Either way the two directions will be driven from the same place and so > not get out of sync. Hmm, at first glance it is indeed simpler, but it may be a bit of a poor compromise. There used to be calls to mchp_corespi_write_fifo() in the interrupt handler, but I removed them in the ?summer? because messages were being sent out of order. IIRC what happens is that the tx fifo can be emptied by the controller hardware before the whole message has been written into it, thereby generating the interrupts and triggering a parallel call to mchp_corespi_write_fifo(). To avoid that, the first byte could be sent by transfer_one(), and then everything else handled by the current implementation of mchp_corespi_write_fifo(), but I think that would cause "actual" RX overflows, without cutting down to sending (FIFO_MAX - 1) bytes per call or ensuring that the rx fifo is read before the tx one. Given the timing issues there seem to be around the rx ready/tx done interrupts and the rx/tx fifo status bits being set/cleared, I don't want to play games there. That means cutting down to something well under FIFO_MAX (which is 32 bytes), probably to something like 1/2/4 bytes depending on transfer size since that can be done in a single write on the bus and will allow the same code to be used in transfer_one() and in the interrupt handler. Moving it out of the interrupt handler entirely probably makes the code a lot easier to reason about, so I think I'd actually rather do that. That said, I don't think I would want that change going into fixes as it's a pretty intrusive change, although given where we are in the cycle it ain't as much of a problem since I'd not be sending it until after 6.13 is released anyway. Cheers, Conor.
On Wed, Jan 15, 2025 at 06:22:55PM +0000, Conor Dooley wrote: > Moving it out of the interrupt handler entirely probably makes the code > a lot easier to reason about, so I think I'd actually rather do that. > That said, I don't think I would want that change going into fixes as it's > a pretty intrusive change, although given where we are in the cycle it > ain't as much of a problem since I'd not be sending it until after 6.13 > is released anyway. OTOH the patch you sent will lock up hard if the RX FIFO gets stuck for some other reason which is always a bit concerning...
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c index 5b6af55855ef..3582fe8d3fc4 100644 --- a/drivers/spi/spi-microchip-core.c +++ b/drivers/spi/spi-microchip-core.c @@ -221,6 +221,13 @@ static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi) while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) { u32 word; + /* + * If the transfer is larger than FIFO_DEPTH, spin until space + * is made in the RX FIFO to avoid losing data to RX overflows + */ + while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_FULL) + ; + if (spi->n_bytes == 4) word = spi->tx_buf ? *((u32 *)spi->tx_buf) : 0xaa; else if (spi->n_bytes == 2)