Message ID | 20220112173312.764660-1-miquel.raynal@bootlin.com |
---|---|
Headers | show |
Series | IEEE 802.15.4 scan support | expand |
Hi, On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Now that we have access to all the basic information to know which > symbol duration should be applied, let's set the symbol duration > automatically. The two locations that must call for the symbol duration > to be set are: > - when manually requesting a channel change though the netlink interface > - at PHY creation, ieee802154_alloc_hw() already calls > ieee802154_change_channel() which will now update the symbol duration > accordingly. > > If an information is missing, the symbol duration is not touched, a > debug message is eventually printed. This keeps the compatibility with > the unconverted drivers for which it was too complicated for me to find > their precise information. If they initially provided a symbol duration, > it would be kept. If they don't, the symbol duration value is left > untouched. > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > --- > include/net/cfg802154.h | 2 + > net/mac802154/cfg.c | 1 + > net/mac802154/main.c | 93 +++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 96 insertions(+) > > diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h > index 286709a9dd0b..52eefc4b5b4d 100644 > --- a/include/net/cfg802154.h > +++ b/include/net/cfg802154.h > @@ -455,4 +455,6 @@ static inline const char *wpan_phy_name(struct wpan_phy *phy) > return dev_name(&phy->dev); > } > > +void ieee802154_set_symbol_duration(struct wpan_phy *phy); > + > #endif /* __NET_CFG802154_H */ > diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c > index 6969f1330ccd..ba57da07c08e 100644 > --- a/net/mac802154/cfg.c > +++ b/net/mac802154/cfg.c > @@ -113,6 +113,7 @@ int ieee802154_change_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel) > if (!ret) { > wpan_phy->current_page = page; > wpan_phy->current_channel = channel; > + ieee802154_set_symbol_duration(wpan_phy); > } > > return ret; We also need to do it in ieee802154_register_hw()? > diff --git a/net/mac802154/main.c b/net/mac802154/main.c > index 77a4943f345f..88826c5aa4ba 100644 > --- a/net/mac802154/main.c > +++ b/net/mac802154/main.c > @@ -113,6 +113,99 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) > } > EXPORT_SYMBOL(ieee802154_alloc_hw); > > +void ieee802154_set_symbol_duration(struct wpan_phy *phy) > +{ > + struct phy_page *page = &phy->supported.page[phy->current_page]; > + struct phy_channels *chan; > + unsigned int chunk; > + u32 duration = 0; > + > + for (chunk = 0; chunk < page->nchunks; chunk++) { > + if (page->chunk[chunk].channels & phy->current_channel) > + break; > + } > + > + if (chunk == page->nchunks) > + goto set_duration; > + > + chan = &page->chunk[chunk]; > + switch (chan->protocol) { > + case IEEE802154_BPSK_PHY: > + switch (chan->band) { > + case IEEE802154_868_MHZ_BAND: > + /* 868 MHz BPSK 802.15.4-2003: 20 ksym/s */ > + duration = 50 * 1000; * NSEC_PER_USEC? - Alex
Hi, On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > As it is currently designed, the set_channel() cfg802154 hook > implemented in the softMAC is doing a couple of checks before actually > performing the channel change. However, as we enhance the support for > automatically setting the symbol duration during channel changes, it > will also be needed to ensure that the corresponding channel as properly > be selected at probe time. In order to verify this, we will need to no, we don't set channels at probe time. We set the current_page/channel whatever the default is according to the hardware datasheet. I think this channel should be dropped and all drivers set the defaults before registering hw as what we do at e.g. at86rf230, see [0]. - Alex [0] https://elixir.bootlin.com/linux/v5.16/source/drivers/net/ieee802154/at86rf230.c#L1553
Hi Alexander, alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:30:35 -0500: > Hi, > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > As it is currently designed, the set_channel() cfg802154 hook > > implemented in the softMAC is doing a couple of checks before actually > > performing the channel change. However, as we enhance the support for > > automatically setting the symbol duration during channel changes, it > > will also be needed to ensure that the corresponding channel as properly > > be selected at probe time. In order to verify this, we will need to > > no, we don't set channels at probe time. We set the > current_page/channel whatever the default is according to the hardware > datasheet. I think this channel should be dropped and all drivers set > the defaults before registering hw as what we do at e.g. at86rf230, > see [0]. Is there a reason for refusing to call ->set_channel() at probe time? Anyway, I'll put the symbol duration setting in the registration helper and I will fix hwsim aside. > > - Alex > > [0] https://elixir.bootlin.com/linux/v5.16/source/drivers/net/ieee802154/at86rf230.c#L1553 Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:25:01 -0500: > Hi, > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Now that we have access to all the basic information to know which > > symbol duration should be applied, let's set the symbol duration > > automatically. The two locations that must call for the symbol duration > > to be set are: > > - when manually requesting a channel change though the netlink interface > > - at PHY creation, ieee802154_alloc_hw() already calls > > ieee802154_change_channel() which will now update the symbol duration > > accordingly. > > > > If an information is missing, the symbol duration is not touched, a > > debug message is eventually printed. This keeps the compatibility with > > the unconverted drivers for which it was too complicated for me to find > > their precise information. If they initially provided a symbol duration, > > it would be kept. If they don't, the symbol duration value is left > > untouched. > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > --- > > include/net/cfg802154.h | 2 + > > net/mac802154/cfg.c | 1 + > > net/mac802154/main.c | 93 +++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 96 insertions(+) > > > > diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h > > index 286709a9dd0b..52eefc4b5b4d 100644 > > --- a/include/net/cfg802154.h > > +++ b/include/net/cfg802154.h > > @@ -455,4 +455,6 @@ static inline const char *wpan_phy_name(struct wpan_phy *phy) > > return dev_name(&phy->dev); > > } > > > > +void ieee802154_set_symbol_duration(struct wpan_phy *phy); > > + > > #endif /* __NET_CFG802154_H */ > > diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c > > index 6969f1330ccd..ba57da07c08e 100644 > > --- a/net/mac802154/cfg.c > > +++ b/net/mac802154/cfg.c > > @@ -113,6 +113,7 @@ int ieee802154_change_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel) > > if (!ret) { > > wpan_phy->current_page = page; > > wpan_phy->current_channel = channel; > > + ieee802154_set_symbol_duration(wpan_phy); > > } > > > > return ret; > > We also need to do it in ieee802154_register_hw()? As you probably saw, my idea was to call for a channel change during the registration but you nacked that possibility so I'll indeed have to set the symbol duration manually there. > > diff --git a/net/mac802154/main.c b/net/mac802154/main.c > > index 77a4943f345f..88826c5aa4ba 100644 > > --- a/net/mac802154/main.c > > +++ b/net/mac802154/main.c > > @@ -113,6 +113,99 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) > > } > > EXPORT_SYMBOL(ieee802154_alloc_hw); > > > > +void ieee802154_set_symbol_duration(struct wpan_phy *phy) > > +{ > > + struct phy_page *page = &phy->supported.page[phy->current_page]; > > + struct phy_channels *chan; > > + unsigned int chunk; > > + u32 duration = 0; > > + > > + for (chunk = 0; chunk < page->nchunks; chunk++) { > > + if (page->chunk[chunk].channels & phy->current_channel) .channels still being a bitfield, David allegedly reported that the above line should use "& BIT(phy->current_channel)". > > + break; > > + } > > + > > + if (chunk == page->nchunks) > > + goto set_duration; > > + > > + chan = &page->chunk[chunk]; > > + switch (chan->protocol) { > > + case IEEE802154_BPSK_PHY: > > + switch (chan->band) { > > + case IEEE802154_868_MHZ_BAND: > > + /* 868 MHz BPSK 802.15.4-2003: 20 ksym/s */ > > + duration = 50 * 1000; > > * NSEC_PER_USEC? Oh right, I grepped for USEC_TO_NSEC but the macro was named the other way around, thanks. Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:53:57 -0500: > Hi, > > On Wed, 12 Jan 2022 at 17:30, Alexander Aring <alex.aring@gmail.com> wrote: > > > > Hi, > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > As it is currently designed, the set_channel() cfg802154 hook > > > implemented in the softMAC is doing a couple of checks before actually > > > performing the channel change. However, as we enhance the support for > > > automatically setting the symbol duration during channel changes, it > > > will also be needed to ensure that the corresponding channel as properly > > > be selected at probe time. In order to verify this, we will need to > > > > no, we don't set channels at probe time. We set the > > current_page/channel whatever the default is according to the hardware > > datasheet. I think this channel should be dropped and all drivers set > > s/channel/patch/ I've dropped the patch and put an additional call to _set_symbol_duration() in the hw registration routine as discussed initially. Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:44:02 -0500: > Hi, > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > ... > > + return 0; > > +} > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > index c829e4a75325..40656728c624 100644 > > --- a/net/mac802154/tx.c > > +++ b/net/mac802154/tx.c > > @@ -54,6 +54,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > struct net_device *dev = skb->dev; > > int ret; > > > > + if (unlikely(mac802154_scan_is_ongoing(local))) > > + return NETDEV_TX_BUSY; > > + > > Please look into the functions "ieee802154_wake_queue()" and > "ieee802154_stop_queue()" which prevent this function from being > called. Call stop before starting scanning and wake after scanning is > done or stopped. Mmmh all this is already done, isn't it? - mac802154_trigger_scan_locked() stops the queue before setting the promiscuous mode - mac802154_end_of_scan() wakes the queue after resetting the promiscuous mode to its original state Should I drop the check which stands for an extra precaution? But overall I think I don't understand well this part. What is a bit foggy to me is why the (async) tx implementation does: *Core* *Driver* stop_queue() drv_async_xmit() ------- \------> do something ------- calls ieee802154_xmit_complete() wakeup_queue() <--------/ So we actually disable the queue for transmitting. Why?? > Also there exists a race which exists in your way and also the one > mentioned above. There can still be some transmissions going on... We > need to wait until "all possible" tx completions are done... to be > sure there are really no transmissions going on. However we need to be > sure that a wake cannot be done if a tx completion is done, we need to > avoid it when the scan operation is ongoing as a workaround for this > race. > > This race exists and should be fixed in future work? Yep, this is true, do you have any pointers? Because I looked at the code and for now it appears quite unpractical to add some kind of flushing mechanism on that net queue. I believe we cannot use the netif interface for that so we would have to implement our own mechanism in the ieee802154 core. Thanks, Miquèl
Hi, On Thu, 13 Jan 2022 at 04:32, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Alexander, > > alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:30:35 -0500: > > > Hi, > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > As it is currently designed, the set_channel() cfg802154 hook > > > implemented in the softMAC is doing a couple of checks before actually > > > performing the channel change. However, as we enhance the support for > > > automatically setting the symbol duration during channel changes, it > > > will also be needed to ensure that the corresponding channel as properly > > > be selected at probe time. In order to verify this, we will need to > > > > no, we don't set channels at probe time. We set the > > current_page/channel whatever the default is according to the hardware > > datasheet. I think this channel should be dropped and all drivers set > > the defaults before registering hw as what we do at e.g. at86rf230, > > see [0]. > > Is there a reason for refusing to call ->set_channel() at probe time? > because the current drivers work the way to not set the channel/page during probe time. Also the 802.15.4 spec states that this default value is hardware specific and this goes back whatever the vendor decides. Also you drop the check that if the same channel is already set don't set it which works like a shadow register for registers. Is there a reason why to set a channel during probe time? Are you setting the value which is the default one again? If the driver has a random default value it should choose some and stick to one, the others do whatever the datasheet has after resetting the hardware. I really don't see the sense here why every driver should introduce on driver level a set channel call. At probing time the transceiver registers are already in a state which we should reflect. > Anyway, I'll put the symbol duration setting in the registration helper > and I will fix hwsim aside. > ok, thanks. - Alex
Hi, On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: ... > + case IEEE802154_4030KHZ_MEAN_PRF: > + duration = 3974; > + break; > + case IEEE802154_62890KHZ_MEAN_PRF: > + duration = 1018; > + break; > + default: > + break; > + } > + break; > + default: > + break; > + } > + > +set_duration: > + if (!duration) > + pr_debug("Unknown PHY symbol duration, the driver should be fixed\n"); Why should the driver be fixed? It's more this table which needs to be fixed? > + else > + phy->symbol_duration = duration; Can you also set the lifs/sifs period after the duration is known? - Alex
Hi, On Thu, 13 Jan 2022 at 12:07, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Alexander, > > alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:44:02 -0500: > > > Hi, > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > ... > > > + return 0; > > > +} > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > > index c829e4a75325..40656728c624 100644 > > > --- a/net/mac802154/tx.c > > > +++ b/net/mac802154/tx.c > > > @@ -54,6 +54,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > > struct net_device *dev = skb->dev; > > > int ret; > > > > > > + if (unlikely(mac802154_scan_is_ongoing(local))) > > > + return NETDEV_TX_BUSY; > > > + > > > > Please look into the functions "ieee802154_wake_queue()" and > > "ieee802154_stop_queue()" which prevent this function from being > > called. Call stop before starting scanning and wake after scanning is > > done or stopped. > > Mmmh all this is already done, isn't it? > - mac802154_trigger_scan_locked() stops the queue before setting the > promiscuous mode > - mac802154_end_of_scan() wakes the queue after resetting the > promiscuous mode to its original state > > Should I drop the check which stands for an extra precaution? > no, I think then it should be a WARN_ON() more without any return (hopefully it will survive). This case should never happen otherwise we have a bug that we wake the queue when we "took control about transmissions" only. Change the name, I think it will be in future not only scan related. Maybe "mac802154_queue_stopped()". Everything which is queued from socket/upperlayer(6lowpan) goes this way. > > But overall I think I don't understand well this part. What is > a bit foggy to me is why the (async) tx implementation does: > > *Core* *Driver* > > stop_queue() > drv_async_xmit() ------- > \------> do something > ------- calls ieee802154_xmit_complete() > wakeup_queue() <--------/ > > So we actually disable the queue for transmitting. Why?? > Because all transceivers have either _one_ transmit framebuffer or one framebuffer for transmit and receive one time. We need to report to stop giving us more skb's while we are busy with one to transmit. This all will/must be changed in future if there is hardware outside which is more powerful and the driver needs to control the flow here. That ieee802154_xmit_complete() calls wakeup_queue need to be forbidden when we are in "synchronous transmit mode"/the queue is stopped. The synchronous transmit mode is not for any hotpath, it's for MLME and I think we also need a per phy lock to avoid multiple synchronous transmissions at one time. Please note that I don't think here only about scan operation, also for other possible MLME-ops. > > Also there exists a race which exists in your way and also the one > > mentioned above. There can still be some transmissions going on... We > > need to wait until "all possible" tx completions are done... to be > > sure there are really no transmissions going on. However we need to be > > sure that a wake cannot be done if a tx completion is done, we need to > > avoid it when the scan operation is ongoing as a workaround for this > > race. > > > > This race exists and should be fixed in future work? > > Yep, this is true, do you have any pointers? Because I looked at the > code and for now it appears quite unpractical to add some kind of > flushing mechanism on that net queue. I believe we cannot use the netif > interface for that so we would have to implement our own mechanism in > the ieee802154 core. yes, we need some kind of "wait_for_completion()" and "complete()". We are currently lucky that we allow only one skb to be transmitted at one time. I think it is okay to put that on a per phy basis... - Alex
Hi Alexander, alex.aring@gmail.com wrote on Thu, 13 Jan 2022 18:36:15 -0500: > Hi, > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > ... > > + case IEEE802154_4030KHZ_MEAN_PRF: > > + duration = 3974; > > + break; > > + case IEEE802154_62890KHZ_MEAN_PRF: > > + duration = 1018; > > + break; > > + default: > > + break; > > + } > > + break; > > + default: > > + break; > > + } > > + > > +set_duration: > > + if (!duration) > > + pr_debug("Unknown PHY symbol duration, the driver should be fixed\n"); > > Why should the driver be fixed? It's more this table which needs to be fixed? Right. > > > + else > > + phy->symbol_duration = duration; > > Can you also set the lifs/sifs period after the duration is known? Done. Thanks, Miquèl
Hi Alexander, alex.aring@gmail.com wrote on Thu, 13 Jan 2022 19:01:56 -0500: > Hi, > > On Thu, 13 Jan 2022 at 12:07, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Hi Alexander, > > > > alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:44:02 -0500: > > > > > Hi, > > > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > ... > > > > + return 0; > > > > +} > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > > > index c829e4a75325..40656728c624 100644 > > > > --- a/net/mac802154/tx.c > > > > +++ b/net/mac802154/tx.c > > > > @@ -54,6 +54,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > > > struct net_device *dev = skb->dev; > > > > int ret; > > > > > > > > + if (unlikely(mac802154_scan_is_ongoing(local))) > > > > + return NETDEV_TX_BUSY; > > > > + > > > > > > Please look into the functions "ieee802154_wake_queue()" and > > > "ieee802154_stop_queue()" which prevent this function from being > > > called. Call stop before starting scanning and wake after scanning is > > > done or stopped. > > > > Mmmh all this is already done, isn't it? > > - mac802154_trigger_scan_locked() stops the queue before setting the > > promiscuous mode > > - mac802154_end_of_scan() wakes the queue after resetting the > > promiscuous mode to its original state > > > > Should I drop the check which stands for an extra precaution? > > > > no, I think then it should be a WARN_ON() more without any return > (hopefully it will survive). This case should never happen otherwise > we have a bug that we wake the queue when we "took control about > transmissions" only. > Change the name, I think it will be in future not only scan related. > Maybe "mac802154_queue_stopped()". Everything which is queued from > socket/upperlayer(6lowpan) goes this way. Got it. I've changed the name of the helper, and used an atomic variable there to follow the count. > > But overall I think I don't understand well this part. What is > > a bit foggy to me is why the (async) tx implementation does: > > > > *Core* *Driver* > > > > stop_queue() > > drv_async_xmit() ------- > > \------> do something > > ------- calls ieee802154_xmit_complete() > > wakeup_queue() <--------/ > > > > So we actually disable the queue for transmitting. Why?? > > > > Because all transceivers have either _one_ transmit framebuffer or one > framebuffer for transmit and receive one time. We need to report to > stop giving us more skb's while we are busy with one to transmit. > This all will/must be changed in future if there is hardware outside > which is more powerful and the driver needs to control the flow here. > > That ieee802154_xmit_complete() calls wakeup_queue need to be > forbidden when we are in "synchronous transmit mode"/the queue is > stopped. The synchronous transmit mode is not for any hotpath, it's > for MLME and I think we also need a per phy lock to avoid multiple > synchronous transmissions at one time. Please note that I don't think > here only about scan operation, also for other possible MLME-ops. > First, thank you very much for all your guidance and reviews, I think I have a much clearer understanding now. I've tried to follow your advices, creating: - a way of tracking ongoing transmissions - a synchronous API for MLME transfers I've decided to use the wait_queue + atomic combo which looks nice. Everything seems to work, I just need a bit of time to clean and rework a bit the series before sending a v3. Thanks, Miquèl
Hi, On Fri, 14 Jan 2022 at 13:44, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Alexander, > > alex.aring@gmail.com wrote on Thu, 13 Jan 2022 19:01:56 -0500: > > > Hi, > > > > On Thu, 13 Jan 2022 at 12:07, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > Hi Alexander, > > > > > > alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:44:02 -0500: > > > > > > > Hi, > > > > > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > ... > > > > > + return 0; > > > > > +} > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > > > > index c829e4a75325..40656728c624 100644 > > > > > --- a/net/mac802154/tx.c > > > > > +++ b/net/mac802154/tx.c > > > > > @@ -54,6 +54,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > > > > struct net_device *dev = skb->dev; > > > > > int ret; > > > > > > > > > > + if (unlikely(mac802154_scan_is_ongoing(local))) > > > > > + return NETDEV_TX_BUSY; > > > > > + > > > > > > > > Please look into the functions "ieee802154_wake_queue()" and > > > > "ieee802154_stop_queue()" which prevent this function from being > > > > called. Call stop before starting scanning and wake after scanning is > > > > done or stopped. > > > > > > Mmmh all this is already done, isn't it? > > > - mac802154_trigger_scan_locked() stops the queue before setting the > > > promiscuous mode > > > - mac802154_end_of_scan() wakes the queue after resetting the > > > promiscuous mode to its original state > > > > > > Should I drop the check which stands for an extra precaution? > > > > > > > no, I think then it should be a WARN_ON() more without any return > > (hopefully it will survive). This case should never happen otherwise > > we have a bug that we wake the queue when we "took control about > > transmissions" only. > > Change the name, I think it will be in future not only scan related. > > Maybe "mac802154_queue_stopped()". Everything which is queued from > > socket/upperlayer(6lowpan) goes this way. > > Got it. > > I've changed the name of the helper, and used an atomic variable there > to follow the count. > > > > But overall I think I don't understand well this part. What is > > > a bit foggy to me is why the (async) tx implementation does: > > > > > > *Core* *Driver* > > > > > > stop_queue() > > > drv_async_xmit() ------- > > > \------> do something > > > ------- calls ieee802154_xmit_complete() > > > wakeup_queue() <--------/ > > > > > > So we actually disable the queue for transmitting. Why?? > > > > > > > Because all transceivers have either _one_ transmit framebuffer or one > > framebuffer for transmit and receive one time. We need to report to > > stop giving us more skb's while we are busy with one to transmit. > > This all will/must be changed in future if there is hardware outside > > which is more powerful and the driver needs to control the flow here. > > > > That ieee802154_xmit_complete() calls wakeup_queue need to be > > forbidden when we are in "synchronous transmit mode"/the queue is > > stopped. The synchronous transmit mode is not for any hotpath, it's > > for MLME and I think we also need a per phy lock to avoid multiple > > synchronous transmissions at one time. Please note that I don't think > > here only about scan operation, also for other possible MLME-ops. > > > > First, thank you very much for all your guidance and reviews, I think I > have a much clearer understanding now. > > I've tried to follow your advices, creating: > - a way of tracking ongoing transmissions > - a synchronous API for MLME transfers > Please note that I think we cannot use netif_stop_queue() from context outside of netif xmit() callback. It's because the atomic counter itself is racy in xmit(), we need to be sure xmit() can't occur while stopping the queue. I think maybe "netif_tx_disable()" is the right call to stop from another context, because it holds the tx_lock, which I believe is held while xmit(). Where the wake queue call should be fine to call..., maybe we can remove some EXPORT_SYMBOL() then? I saw that some drivers call "ieee802154_wake_queue()" in error cases, may we introduce a new helper "?ieee802154_xmit_error?" for error cases so you can also catch error cases in your sync tx. See `grep -r "ieee802154_wake_queue" drivers/net/ieee802154`, if we have more information we might add more meaning into the error cases (e.g. proper errno). > I've decided to use the wait_queue + atomic combo which looks nice. > Everything seems to work, I just need a bit of time to clean and rework > a bit the series before sending a v3. > Okay, sounds good to implement both requirements. - Alex
Hi, On Sun, 16 Jan 2022 at 17:44, Alexander Aring <alex.aring@gmail.com> wrote: > > Hi, > > On Fri, 14 Jan 2022 at 13:44, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Hi Alexander, > > > > alex.aring@gmail.com wrote on Thu, 13 Jan 2022 19:01:56 -0500: > > > > > Hi, > > > > > > On Thu, 13 Jan 2022 at 12:07, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > > > Hi Alexander, > > > > > > > > alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:44:02 -0500: > > > > > > > > > Hi, > > > > > > > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > ... > > > > > > + return 0; > > > > > > +} > > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > > > > > index c829e4a75325..40656728c624 100644 > > > > > > --- a/net/mac802154/tx.c > > > > > > +++ b/net/mac802154/tx.c > > > > > > @@ -54,6 +54,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > > > > > struct net_device *dev = skb->dev; > > > > > > int ret; > > > > > > > > > > > > + if (unlikely(mac802154_scan_is_ongoing(local))) > > > > > > + return NETDEV_TX_BUSY; > > > > > > + > > > > > > > > > > Please look into the functions "ieee802154_wake_queue()" and > > > > > "ieee802154_stop_queue()" which prevent this function from being > > > > > called. Call stop before starting scanning and wake after scanning is > > > > > done or stopped. > > > > > > > > Mmmh all this is already done, isn't it? > > > > - mac802154_trigger_scan_locked() stops the queue before setting the > > > > promiscuous mode > > > > - mac802154_end_of_scan() wakes the queue after resetting the > > > > promiscuous mode to its original state > > > > > > > > Should I drop the check which stands for an extra precaution? > > > > > > > > > > no, I think then it should be a WARN_ON() more without any return > > > (hopefully it will survive). This case should never happen otherwise > > > we have a bug that we wake the queue when we "took control about > > > transmissions" only. > > > Change the name, I think it will be in future not only scan related. > > > Maybe "mac802154_queue_stopped()". Everything which is queued from > > > socket/upperlayer(6lowpan) goes this way. > > > > Got it. > > > > I've changed the name of the helper, and used an atomic variable there > > to follow the count. > > > > > > But overall I think I don't understand well this part. What is > > > > a bit foggy to me is why the (async) tx implementation does: > > > > > > > > *Core* *Driver* > > > > > > > > stop_queue() > > > > drv_async_xmit() ------- > > > > \------> do something > > > > ------- calls ieee802154_xmit_complete() > > > > wakeup_queue() <--------/ > > > > > > > > So we actually disable the queue for transmitting. Why?? > > > > > > > > > > Because all transceivers have either _one_ transmit framebuffer or one > > > framebuffer for transmit and receive one time. We need to report to > > > stop giving us more skb's while we are busy with one to transmit. > > > This all will/must be changed in future if there is hardware outside > > > which is more powerful and the driver needs to control the flow here. > > > > > > That ieee802154_xmit_complete() calls wakeup_queue need to be > > > forbidden when we are in "synchronous transmit mode"/the queue is > > > stopped. The synchronous transmit mode is not for any hotpath, it's > > > for MLME and I think we also need a per phy lock to avoid multiple > > > synchronous transmissions at one time. Please note that I don't think > > > here only about scan operation, also for other possible MLME-ops. > > > > > > > First, thank you very much for all your guidance and reviews, I think I > > have a much clearer understanding now. > > > > I've tried to follow your advices, creating: > > - a way of tracking ongoing transmissions > > - a synchronous API for MLME transfers > > > > Please note that I think we cannot use netif_stop_queue() from context > outside of netif xmit() callback. It's because the atomic counter > itself is racy in xmit(), we need to be sure xmit() can't occur while > stopping the queue. I think maybe "netif_tx_disable()" is the right > call to stop from another context, because it holds the tx_lock, which > I believe is held while xmit(). > Where the wake queue call should be fine to call..., maybe we can > remove some EXPORT_SYMBOL() then? > I am sorry, that comment should go below. > I saw that some drivers call "ieee802154_wake_queue()" in error cases, > may we introduce a new helper "?ieee802154_xmit_error?" for error > cases so you can also catch error cases in your sync tx. See `grep -r > "ieee802154_wake_queue" drivers/net/ieee802154`, if we have more > information we might add more meaning into the error cases (e.g. > proper errno). maybe we can remove some EXPORT_SYMBOL() then? - Alex
Hi Alexander, alex.aring@gmail.com wrote on Sun, 16 Jan 2022 17:44:18 -0500: > Hi, > > On Fri, 14 Jan 2022 at 13:44, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > Hi Alexander, > > > > alex.aring@gmail.com wrote on Thu, 13 Jan 2022 19:01:56 -0500: > > > > > Hi, > > > > > > On Thu, 13 Jan 2022 at 12:07, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > > > Hi Alexander, > > > > > > > > alex.aring@gmail.com wrote on Wed, 12 Jan 2022 17:44:02 -0500: > > > > > > > > > Hi, > > > > > > > > > > On Wed, 12 Jan 2022 at 12:33, Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > ... > > > > > > + return 0; > > > > > > +} > > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > > > > > index c829e4a75325..40656728c624 100644 > > > > > > --- a/net/mac802154/tx.c > > > > > > +++ b/net/mac802154/tx.c > > > > > > @@ -54,6 +54,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) > > > > > > struct net_device *dev = skb->dev; > > > > > > int ret; > > > > > > > > > > > > + if (unlikely(mac802154_scan_is_ongoing(local))) > > > > > > + return NETDEV_TX_BUSY; > > > > > > + > > > > > > > > > > Please look into the functions "ieee802154_wake_queue()" and > > > > > "ieee802154_stop_queue()" which prevent this function from being > > > > > called. Call stop before starting scanning and wake after scanning is > > > > > done or stopped. > > > > > > > > Mmmh all this is already done, isn't it? > > > > - mac802154_trigger_scan_locked() stops the queue before setting the > > > > promiscuous mode > > > > - mac802154_end_of_scan() wakes the queue after resetting the > > > > promiscuous mode to its original state > > > > > > > > Should I drop the check which stands for an extra precaution? > > > > > > > > > > no, I think then it should be a WARN_ON() more without any return > > > (hopefully it will survive). This case should never happen otherwise > > > we have a bug that we wake the queue when we "took control about > > > transmissions" only. > > > Change the name, I think it will be in future not only scan related. > > > Maybe "mac802154_queue_stopped()". Everything which is queued from > > > socket/upperlayer(6lowpan) goes this way. > > > > Got it. > > > > I've changed the name of the helper, and used an atomic variable there > > to follow the count. > > > > > > But overall I think I don't understand well this part. What is > > > > a bit foggy to me is why the (async) tx implementation does: > > > > > > > > *Core* *Driver* > > > > > > > > stop_queue() > > > > drv_async_xmit() ------- > > > > \------> do something > > > > ------- calls ieee802154_xmit_complete() > > > > wakeup_queue() <--------/ > > > > > > > > So we actually disable the queue for transmitting. Why?? > > > > > > > > > > Because all transceivers have either _one_ transmit framebuffer or one > > > framebuffer for transmit and receive one time. We need to report to > > > stop giving us more skb's while we are busy with one to transmit. > > > This all will/must be changed in future if there is hardware outside > > > which is more powerful and the driver needs to control the flow here. > > > > > > That ieee802154_xmit_complete() calls wakeup_queue need to be > > > forbidden when we are in "synchronous transmit mode"/the queue is > > > stopped. The synchronous transmit mode is not for any hotpath, it's > > > for MLME and I think we also need a per phy lock to avoid multiple > > > synchronous transmissions at one time. Please note that I don't think > > > here only about scan operation, also for other possible MLME-ops. > > > > > > > First, thank you very much for all your guidance and reviews, I think I > > have a much clearer understanding now. > > > > I've tried to follow your advices, creating: > > - a way of tracking ongoing transmissions > > - a synchronous API for MLME transfers > > > > Please note that I think we cannot use netif_stop_queue() from context > outside of netif xmit() callback. It's because the atomic counter > itself is racy in xmit(), we need to be sure xmit() can't occur while > stopping the queue. In my current implementation I don't see this as a real problem because for me, there is no real difference between: - a transfer is started - we call stop_queue() * right here a transfer is ongoing * and - we call stop_queue() - the counter is racy hence a last transfer is started * right here a transfer is ongoing * because stopping the queue and "flushing" it are two different things. In the code I don't only rely on the queue being stopped but if I don't want any more transfer to happen after that, so I also sync the queue thanks to the new helpers introduced. Please check v3 (which is coming very soon) and tell me what you think. Maybe I missed something. > I think maybe "netif_tx_disable()" is the right > call to stop from another context, because it holds the tx_lock, which > I believe is held while xmit(). > Where the wake queue call should be fine to call..., maybe we can > remove some EXPORT_SYMBOL() then? > > I saw that some drivers call "ieee802154_wake_queue()" in error cases, > may we introduce a new helper "?ieee802154_xmit_error?" for error > cases so you can also catch error cases in your sync tx. See `grep -r > "ieee802154_wake_queue" drivers/net/ieee802154`, if we have more > information we might add more meaning into the error cases (e.g. > proper errno). Most of the time the calling functions are void functions. In fact they all simply hardcode the xmit_done helper and even worse, sometimes they simply leak the skb. I've handled that already by updating all these callers to be sure the only way out is to call xmit_done, which helps a lot tracking transfers. Also, you are right, we can certainly drop a couple of EXPORT_SYMBOLS :-) > > I've decided to use the wait_queue + atomic combo which looks nice. > > Everything seems to work, I just need a bit of time to clean and rework > > a bit the series before sending a v3. > > > > Okay, sounds good to implement both requirements. > > - Alex Thanks, Miquèl