Message ID | 20240408105149.59258-3-philmd@linaro.org |
---|---|
State | New |
Headers | show |
Series | hw/net/lan9118: Fix overflow in TX FIFO | expand |
On Mon, 8 Apr 2024 at 11:52, Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > > When the TX FIFO is full, raise the TX Status FIFO Overflow (TXSO) > flag, "Generated when the TX Status FIFO overflows" [*]. This doesn't sound right. The TX Status FIFO and the TX Data FIFO are separate FIFOs, and the TX FIFO has its own overflow bit, TDFO. And I think the overflow here is of a third FIFO, the MIL's transmit FIFO... > diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c > index 7be0430ac5..7a1367b0bb 100644 > --- a/hw/net/lan9118.c > +++ b/hw/net/lan9118.c > @@ -795,8 +795,11 @@ static void tx_fifo_push(lan9118_state *s, uint32_t val) > /* Documentation is somewhat unclear on the ordering of bytes > in FIFO words. Empirical results show it to be little-endian. > */ > - /* TODO: FIFO overflow checking. */ > while (n--) { > + if (s->txp->len == PKT_SIZE) { > + s->int_sts |= TXSO_INT; > + break; > + } While I was looking at this bug, I realised that we have serious confusion about whether any of the variables we use to track FIFO size and FIFO usage are word counts or byte counts. Looking at table 5-3 in the data sheet, the size of these FIFOs is actually software-configurable in the HW_CFG register, but we don't implement that and (attempt to) only provide the default configuration setting of TX_FIF_SZ == 5. That should mean: TX data FIFO size == 4608 bytes == 1152 words RX data FIFO size == 10560 bytes == 2640 words TX status FIFO size == 512 bytes == 128 words RX status FIFO size == 704 bytes == 176 words But we don't consistently use either word or byte units for the variables we use to track FIFO size and FIFO usage. For instance: * we initialise s->tx_fifo_size to 4608, which is a byte count * we initialise s->rx_status_fifo_size to 704, which is a byte count... * ...and then three lines later override that to 176, which is a word count! * we generally simply increment the various fifo_used fields when we push a word into the FIFOs, implying word counts * we mostly do calculations assuming word counts * calculations of the RX_FIFO_INF and TX_FIFO_INF fields (which report the used space in words and the free space in bytes) are confused about units too * the tx_status_fifo[] array is 512 words long and the bounds checks assume 512 is a word count, but it is a byte count * the rx_status_fifo[] array is 896 words long, but the worst case RX status FIFO size is 896 bytes, even if we allowed runtime adjustable FIFO sizes * the rx_fifo[] array, on the other hand, is 3360 words long, which really is the max possible size in words Anyway, I think that txp->data[] is effectively modelling the "2K Byte transmit FIFO" within the MIL, not the TX FIFO. (We don't need to model the TX FIFO itself, because we don't do asynchronous sending of data packets: as soon as we've accumulated a complete packet into the MIL TX FIFO, we send it out. In real hardware the guest can put multiple packets into the TX data FIFO, which is why it makes sense to be able to configure a TX data FIFO size larger than the largest possible packet and larger than the MIL TX FIFO.) So the limit that we are enforcing here is similar to the one described in the "Calculating Worst-Case TX FIFO (MIL) usage", except that we don't actually use data space for the gaps caused by unaligned buffers. So this can only overflow if the packet is greater than what the data sheet says is the maximum size of 1514 bytes. The datasheet unfortunately doesn't describe the behaviour if this maximum is exceeded, and our current code doesn't try to check it (it's in the "command B" words, which are all supposed to match in the case of a fragmented packet, and which we also don't check). The most plausible behaviour to take I think is to raise TXE when we would overflow the s->txp_data[] buffer; there are various conditions described for when TXE is raised that seem like this would fit in reasonably with them. (There is a status bit TDFO for "TX Data FIFO Overrun", which I think is probably only for overruns of the TX data FIFO, not the MIL's TX FIFO.) Since the datasheet doesn't say if the packet should be dropped or truncated if it's invalid like this, I guess we can do whatever's easiest. > s->txp->data[s->txp->len] = val & 0xff; > s->txp->len++; > val >>= 8; Conclusion: * we should raise TXE, not TXSO * add a comment about what exactly is going on here * we should try to clean up the confusion between words and bytes, as a separate patch that isn't -stable/-9.0 material... thanks -- PMM
On 8/4/24 16:24, Peter Maydell wrote: > On Mon, 8 Apr 2024 at 11:52, Philippe Mathieu-Daudé <philmd@linaro.org> wrote: >> >> When the TX FIFO is full, raise the TX Status FIFO Overflow (TXSO) >> flag, "Generated when the TX Status FIFO overflows" [*]. > > This doesn't sound right. The TX Status FIFO and the > TX Data FIFO are separate FIFOs, and the TX FIFO has its own > overflow bit, TDFO. And I think the overflow here is of > a third FIFO, the MIL's transmit FIFO... > >> diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c >> index 7be0430ac5..7a1367b0bb 100644 >> --- a/hw/net/lan9118.c >> +++ b/hw/net/lan9118.c >> @@ -795,8 +795,11 @@ static void tx_fifo_push(lan9118_state *s, uint32_t val) >> /* Documentation is somewhat unclear on the ordering of bytes >> in FIFO words. Empirical results show it to be little-endian. >> */ >> - /* TODO: FIFO overflow checking. */ >> while (n--) { >> + if (s->txp->len == PKT_SIZE) { >> + s->int_sts |= TXSO_INT; >> + break; >> + } > > While I was looking at this bug, I realised that we have serious > confusion about whether any of the variables we use to track FIFO > size and FIFO usage are word counts or byte counts. > > Looking at table 5-3 in the data sheet, the size of these > FIFOs is actually software-configurable in the HW_CFG register, > but we don't implement that and (attempt to) only provide > the default configuration setting of TX_FIF_SZ == 5. That > should mean: > TX data FIFO size == 4608 bytes == 1152 words > RX data FIFO size == 10560 bytes == 2640 words > TX status FIFO size == 512 bytes == 128 words > RX status FIFO size == 704 bytes == 176 words > > But we don't consistently use either word or byte units for the > variables we use to track FIFO size and FIFO usage. For instance: > * we initialise s->tx_fifo_size to 4608, which is a byte count > * we initialise s->rx_status_fifo_size to 704, which is a byte count... > * ...and then three lines later override that to 176, which is a word > count! > * we generally simply increment the various fifo_used fields > when we push a word into the FIFOs, implying word counts > * we mostly do calculations assuming word counts > * calculations of the RX_FIFO_INF and TX_FIFO_INF fields > (which report the used space in words and the free space > in bytes) are confused about units too > * the tx_status_fifo[] array is 512 words long and the bounds > checks assume 512 is a word count, but it is a byte count > * the rx_status_fifo[] array is 896 words long, but the worst > case RX status FIFO size is 896 bytes, even if we allowed > runtime adjustable FIFO sizes > * the rx_fifo[] array, on the other hand, is 3360 words long, > which really is the max possible size in words > > Anyway, I think that txp->data[] is effectively modelling > the "2K Byte transmit FIFO" within the MIL, not the TX FIFO. > (We don't need to model the TX FIFO itself, because we don't > do asynchronous sending of data packets: as soon as we've > accumulated a complete packet into the MIL TX FIFO, we > send it out. In real hardware the guest can put multiple > packets into the TX data FIFO, which is why it makes sense to be > able to configure a TX data FIFO size larger than the largest > possible packet and larger than the MIL TX FIFO.) > > So the limit that we are enforcing here is similar to the one > described in the "Calculating Worst-Case TX FIFO (MIL) usage", > except that we don't actually use data space for the gaps > caused by unaligned buffers. So this can only overflow if the > packet is greater than what the data sheet says is the > maximum size of 1514 bytes. The datasheet unfortunately doesn't > describe the behaviour if this maximum is exceeded, and our > current code doesn't try to check it (it's in the "command B" > words, which are all supposed to match in the case of a > fragmented packet, and which we also don't check). > > The most plausible behaviour to take I think is to raise > TXE when we would overflow the s->txp_data[] buffer; there are > various conditions described for when TXE is raised that seem > like this would fit in reasonably with them. > (There is a status bit TDFO for "TX Data FIFO Overrun", which > I think is probably only for overruns of the TX data FIFO, > not the MIL's TX FIFO.) > > Since the datasheet doesn't say if the packet should be > dropped or truncated if it's invalid like this, I guess > we can do whatever's easiest. > >> s->txp->data[s->txp->len] = val & 0xff; >> s->txp->len++; >> val >>= 8; > > Conclusion: > * we should raise TXE, not TXSO > * add a comment about what exactly is going on here > * we should try to clean up the confusion between words and > bytes, as a separate patch that isn't -stable/-9.0 > material... Thanks a lot for this very detailed analysis! v2 on the way.
diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c index 7be0430ac5..7a1367b0bb 100644 --- a/hw/net/lan9118.c +++ b/hw/net/lan9118.c @@ -795,8 +795,11 @@ static void tx_fifo_push(lan9118_state *s, uint32_t val) /* Documentation is somewhat unclear on the ordering of bytes in FIFO words. Empirical results show it to be little-endian. */ - /* TODO: FIFO overflow checking. */ while (n--) { + if (s->txp->len == PKT_SIZE) { + s->int_sts |= TXSO_INT; + break; + } s->txp->data[s->txp->len] = val & 0xff; s->txp->len++; val >>= 8;
When the TX FIFO is full, raise the TX Status FIFO Overflow (TXSO) flag, "Generated when the TX Status FIFO overflows" [*]. Broken since model introduction in commit 2a42499017 ("LAN9118 emulation"). When using the reproducer from https://gitlab.com/qemu-project/qemu/-/issues/2267 we get: hw/net/lan9118.c:798:17: runtime error: index 2048 out of bounds for type 'uint8_t[2048]' (aka 'unsigned char[2048]') #0 0x563ec9a057b1 in tx_fifo_push hw/net/lan9118.c:798:43 #1 0x563ec99fbb28 in lan9118_writel hw/net/lan9118.c:1042:9 #2 0x563ec99f2de2 in lan9118_16bit_mode_write hw/net/lan9118.c:1205:9 #3 0x563ecbf78013 in memory_region_write_accessor system/memory.c:497:5 #4 0x563ecbf776f5 in access_with_adjusted_size system/memory.c:573:18 #5 0x563ecbf75643 in memory_region_dispatch_write system/memory.c:1521:16 #6 0x563ecc01bade in flatview_write_continue_step system/physmem.c:2713:18 #7 0x563ecc01b374 in flatview_write_continue system/physmem.c:2743:19 #8 0x563ecbff1c9b in flatview_write system/physmem.c:2774:12 #9 0x563ecbff1768 in address_space_write system/physmem.c:2894:18 ... [*] LAN9118 DS00002266B.pdf, Table 5.3.3 "INTERRUPT STATUS REGISTER" Reported-by: Will Lester Reported-by: Chuhong Yuan <hslester96@gmail.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2267 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> --- hw/net/lan9118.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)