Message ID | 20201017213611.2557565-3-vladimir.oltean@nxp.com |
---|---|
State | New |
Headers | show |
Series | Generic TX reallocation for DSA | expand |
On Sun, Oct 18, 2020 at 12:36:00AM +0300, Vladimir Oltean wrote: > diff --git a/net/dsa/slave.c b/net/dsa/slave.c > index d4326940233c..790f5c8deb13 100644 > --- a/net/dsa/slave.c > +++ b/net/dsa/slave.c > @@ -548,6 +548,36 @@ netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev) > } > EXPORT_SYMBOL_GPL(dsa_enqueue_skb); > > +static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev) I forgot to actually pad the skb here, if it's a tail tagger, silly me. The following changes should do the trick. > +{ > + struct net_device *master = dsa_slave_to_master(dev); The addition of master->needed_headroom and master->needed_tailroom used to be here, that's why this unused variable is here. > + struct dsa_slave_priv *p = netdev_priv(dev); > + struct dsa_slave_stats *e; > + int headroom, tailroom; int padlen = 0, err; > + > + headroom = dev->needed_headroom; > + tailroom = dev->needed_tailroom; > + /* For tail taggers, we need to pad short frames ourselves, to ensure > + * that the tail tag does not fail at its role of being at the end of > + * the packet, once the master interface pads the frame. > + */ > + if (unlikely(tailroom && skb->len < ETH_ZLEN)) > + tailroom += ETH_ZLEN - skb->len; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ padlen = ETH_ZLEN - skb->len; tailroom += padlen; > + > + if (likely(skb_headroom(skb) >= headroom && > + skb_tailroom(skb) >= tailroom) && > + !skb_cloned(skb)) > + /* No reallocation needed, yay! */ > + return 0; > + > + e = this_cpu_ptr(p->extra_stats); > + u64_stats_update_begin(&e->syncp); > + e->tx_reallocs++; > + u64_stats_update_end(&e->syncp); > + > + return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); if (err < 0 || !padlen) return err; return __skb_put_padto(skb, padlen, false); > +} > + > static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) > { > struct dsa_slave_priv *p = netdev_priv(dev); > @@ -567,6 +597,11 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) > */ > dsa_skb_tx_timestamp(p, skb); > > + if (dsa_realloc_skb(skb, dev)) { > + kfree_skb(skb); > + return NETDEV_TX_OK; > + } > + > /* Transmit function may have to reallocate the original SKB, > * in which case it must have freed it. Only free it here on error. > */ > @@ -1802,6 +1837,14 @@ int dsa_slave_create(struct dsa_port *port) > slave_dev->netdev_ops = &dsa_slave_netdev_ops; > if (ds->ops->port_max_mtu) > slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index); > + /* Try to save one extra realloc later in the TX path (in the master) > + * by also inheriting the master's needed headroom and tailroom. > + * The 8021q driver also does this. > + */ Also, this comment is bogus given the current code. It should be removed from here, and... > + if (cpu_dp->tag_ops->tail_tag) > + slave_dev->needed_tailroom = cpu_dp->tag_ops->overhead; > + else > + slave_dev->needed_headroom = cpu_dp->tag_ops->overhead; ...put here, along with: slave_dev->needed_headroom += master->needed_headroom; slave_dev->needed_tailroom += master->needed_tailroom; > SET_NETDEV_DEVTYPE(slave_dev, &dsa_type); > > netdev_for_each_tx_queue(slave_dev, dsa_slave_set_lockdep_class_one, > -- > 2.25.1 >
On 10/17/2020 3:01 PM, Vladimir Oltean wrote: > On Sun, Oct 18, 2020 at 12:36:00AM +0300, Vladimir Oltean wrote: >> diff --git a/net/dsa/slave.c b/net/dsa/slave.c >> index d4326940233c..790f5c8deb13 100644 >> --- a/net/dsa/slave.c >> +++ b/net/dsa/slave.c >> @@ -548,6 +548,36 @@ netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev) >> } >> EXPORT_SYMBOL_GPL(dsa_enqueue_skb); >> >> +static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev) > > I forgot to actually pad the skb here, if it's a tail tagger, silly me. > The following changes should do the trick. > >> +{ >> + struct net_device *master = dsa_slave_to_master(dev); > > The addition of master->needed_headroom and master->needed_tailroom used > to be here, that's why this unused variable is here. > >> + struct dsa_slave_priv *p = netdev_priv(dev); >> + struct dsa_slave_stats *e; >> + int headroom, tailroom; > int padlen = 0, err; >> + >> + headroom = dev->needed_headroom; >> + tailroom = dev->needed_tailroom; >> + /* For tail taggers, we need to pad short frames ourselves, to ensure >> + * that the tail tag does not fail at its role of being at the end of >> + * the packet, once the master interface pads the frame. >> + */ >> + if (unlikely(tailroom && skb->len < ETH_ZLEN)) >> + tailroom += ETH_ZLEN - skb->len; > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > padlen = ETH_ZLEN - skb->len; > tailroom += padlen; >> + >> + if (likely(skb_headroom(skb) >= headroom && >> + skb_tailroom(skb) >= tailroom) && >> + !skb_cloned(skb)) >> + /* No reallocation needed, yay! */ >> + return 0; >> + >> + e = this_cpu_ptr(p->extra_stats); >> + u64_stats_update_begin(&e->syncp); >> + e->tx_reallocs++; >> + u64_stats_update_end(&e->syncp); >> + >> + return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); > if (err < 0 || !padlen) > return err; > > return __skb_put_padto(skb, padlen, false); >> +} >> + >> static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) >> { >> struct dsa_slave_priv *p = netdev_priv(dev); >> @@ -567,6 +597,11 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) >> */ >> dsa_skb_tx_timestamp(p, skb); >> >> + if (dsa_realloc_skb(skb, dev)) { >> + kfree_skb(skb); >> + return NETDEV_TX_OK; >> + } >> + >> /* Transmit function may have to reallocate the original SKB, >> * in which case it must have freed it. Only free it here on error. >> */ >> @@ -1802,6 +1837,14 @@ int dsa_slave_create(struct dsa_port *port) >> slave_dev->netdev_ops = &dsa_slave_netdev_ops; >> if (ds->ops->port_max_mtu) >> slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index); >> + /* Try to save one extra realloc later in the TX path (in the master) >> + * by also inheriting the master's needed headroom and tailroom. >> + * The 8021q driver also does this. >> + */ > > Also, this comment is bogus given the current code. It should be removed > from here, and... > >> + if (cpu_dp->tag_ops->tail_tag) >> + slave_dev->needed_tailroom = cpu_dp->tag_ops->overhead; >> + else >> + slave_dev->needed_headroom = cpu_dp->tag_ops->overhead; > ...put here, along with: > slave_dev->needed_headroom += master->needed_headroom; > slave_dev->needed_tailroom += master->needed_tailroom; Not positive you need that because you may be account for more head or tail room than necessary. For instance with tag_brcm.c and systemport.c we need 4 bytes of head room for the Broadcom tag and an additional 8 bytes for pushing the transmit status block descriptor in front of the Ethernet frame about to be transmitted. These additional 8 bytes are a requirement of the DSA master here and exist regardless of DSA being used, but we should not be propagating them to the DSA slave. -- Florian
On Sun, Oct 18, 2020 at 01:01:04AM +0300, Vladimir Oltean wrote: > > + return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); > if (err < 0 || !padlen) > return err; > > return __skb_put_padto(skb, padlen, false); Oops, another one here. Should be: return __skb_put_padto(skb, skb->len + padlen, false); > > +}
On Sun, Oct 18, 2020 at 01:31:20AM +0300, Vladimir Oltean wrote: > On Sun, Oct 18, 2020 at 01:01:04AM +0300, Vladimir Oltean wrote: > > > + return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > err = pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); > > if (err < 0 || !padlen) > > return err; > > > > return __skb_put_padto(skb, padlen, false); > > Oops, another one here. Should be: > > return __skb_put_padto(skb, skb->len + padlen, false); > > > +} Last one for today. This should actually be correct now, and not allocate double the needed headroom size. static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev) { struct dsa_slave_priv *p = netdev_priv(dev); struct dsa_slave_stats *e; int needed_headroom; int needed_tailroom; int padlen = 0, err; needed_headroom = dev->needed_headroom; needed_tailroom = dev->needed_tailroom; /* For tail taggers, we need to pad short frames ourselves, to ensure * that the tail tag does not fail at its role of being at the end of * the packet, once the master interface pads the frame. */ if (unlikely(needed_tailroom && skb->len < ETH_ZLEN)) padlen = ETH_ZLEN - skb->len; needed_tailroom += padlen; needed_headroom -= skb_headroom(skb); needed_tailroom -= skb_tailroom(skb); if (likely(needed_headroom <= 0 && needed_tailroom <= 0 && !skb_cloned(skb))) /* No reallocation needed, yay! */ return 0; e = this_cpu_ptr(p->extra_stats); u64_stats_update_begin(&e->syncp); e->tx_reallocs++; u64_stats_update_end(&e->syncp); err = pskb_expand_head(skb, max(needed_headroom, 0), max(needed_tailroom, 0), GFP_ATOMIC); if (err < 0 || !padlen) return err; return __skb_put_padto(skb, ETH_ZLEN, false); }
diff --git a/net/dsa/slave.c b/net/dsa/slave.c index d4326940233c..790f5c8deb13 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -548,6 +548,36 @@ netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev) } EXPORT_SYMBOL_GPL(dsa_enqueue_skb); +static int dsa_realloc_skb(struct sk_buff *skb, struct net_device *dev) +{ + struct net_device *master = dsa_slave_to_master(dev); + struct dsa_slave_priv *p = netdev_priv(dev); + struct dsa_slave_stats *e; + int headroom, tailroom; + + headroom = dev->needed_headroom; + tailroom = dev->needed_tailroom; + /* For tail taggers, we need to pad short frames ourselves, to ensure + * that the tail tag does not fail at its role of being at the end of + * the packet, once the master interface pads the frame. + */ + if (unlikely(tailroom && skb->len < ETH_ZLEN)) + tailroom += ETH_ZLEN - skb->len; + + if (likely(skb_headroom(skb) >= headroom && + skb_tailroom(skb) >= tailroom) && + !skb_cloned(skb)) + /* No reallocation needed, yay! */ + return 0; + + e = this_cpu_ptr(p->extra_stats); + u64_stats_update_begin(&e->syncp); + e->tx_reallocs++; + u64_stats_update_end(&e->syncp); + + return pskb_expand_head(skb, headroom, tailroom, GFP_ATOMIC); +} + static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) { struct dsa_slave_priv *p = netdev_priv(dev); @@ -567,6 +597,11 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) */ dsa_skb_tx_timestamp(p, skb); + if (dsa_realloc_skb(skb, dev)) { + kfree_skb(skb); + return NETDEV_TX_OK; + } + /* Transmit function may have to reallocate the original SKB, * in which case it must have freed it. Only free it here on error. */ @@ -1802,6 +1837,14 @@ int dsa_slave_create(struct dsa_port *port) slave_dev->netdev_ops = &dsa_slave_netdev_ops; if (ds->ops->port_max_mtu) slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index); + /* Try to save one extra realloc later in the TX path (in the master) + * by also inheriting the master's needed headroom and tailroom. + * The 8021q driver also does this. + */ + if (cpu_dp->tag_ops->tail_tag) + slave_dev->needed_tailroom = cpu_dp->tag_ops->overhead; + else + slave_dev->needed_headroom = cpu_dp->tag_ops->overhead; SET_NETDEV_DEVTYPE(slave_dev, &dsa_type); netdev_for_each_tx_queue(slave_dev, dsa_slave_set_lockdep_class_one,
At the moment, taggers are left with the task of ensuring that the skb headers are writable (which they aren't, if the frames were cloned for TX timestamping, for flooding by the bridge, etc), and that there is enough space in the skb data area for the DSA tag to be pushed. Moreover, the life of tail taggers is even harder, because they need to ensure that short frames have enough padding, a problem that normal taggers don't have. The principle of the DSA framework is that everything except for the most intimate hardware specifics (like in this case, the actual packing of the DSA tag bits) should be done inside the core, to avoid having code paths that are very rarely tested. So provide a TX reallocation procedure that should cover the known needs of DSA today. Note that this patch also gives the network stack a good hint about the headroom/tailroom it's going to need. Up till now it wasn't doing that. So the reallocation procedure should really be there only for the exceptional cases, and for cloned packets which need to be unshared. The tx_reallocs counter should prove that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- net/dsa/slave.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)