[v3,00/41] IEEE 802.15.4 scan support

Message ID 20220117115440.60296-1-miquel.raynal@bootlin.com

Message

Miquel Raynal Jan. 17, 2022, 11:53 a.m. UTC
Hello,

	*** TLDR ***

Here is a series attempting to bring support for scans in the
IEEE 802.15.4 stack. A number of improvements had to be made, including:
* a better handling of the symbol durations
* a few changes in Kconfig
* a better handling of the tx queues
* a synchronous Tx API

Active and passive scans can be locally tested only with hwsim.

Sorry for the big series; it might be split in the near future.

	************

A second series aligning the tooling with these changes is related,
bringing support for a number of new features such as:

* Sending (or stopping) beacons. Intervals ranging from 0 to 14 are
  valid for passively sending beacons at regular intervals. An interval
  of 15 requests the core to only answer received BEACON_REQ frames.
  # iwpan dev wpan0 beacons send interval 2 # send BEACON at a fixed rate
  # iwpan dev wpan0 beacons send interval 15 # answer BEACON_REQ only
  # iwpan dev wpan0 beacons stop # apply to both cases

* Scanning all the channels or only a subset:
  # iwpan dev wpan1 scan type passive duration 3 # will not trigger BEACON_REQ
  # iwpan dev wpan1 scan type active duration 3 # will trigger BEACON_REQ

* If a beacon is received during a scan, the internal PAN list is
  updated and can be dumped, flushed and configured with:
  # iwpan dev wpan1 pans dump
  PAN 0xffff (on wpan1)
      coordinator 0x2efefdd4cdbf9330
      page 0
      channel 13
      superframe spec. 0xcf22
      LQI 0
      seen 7156ms ago
  # iwpan dev wpan1 pans flush
  # iwpan dev wpan1 set max_pan_entries 100
  # iwpan dev wpan1 set pans_expiration 3600

* It is also possible to monitor the events with:
  # iwpan event

* As well as triggering a non-blocking scan:
  # iwpan dev wpan1 scan trigger type passive duration 3
  # iwpan dev wpan1 scan done
  # iwpan dev wpan1 scan abort
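
For reference, here is a minimal sketch of the timing these values would
imply. It assumes the "interval" argument maps to the beacon order of
IEEE 802.15.4 and the scan "duration" to its scan duration exponent; the
series does not confirm either mapping. The function names are
hypothetical. The 960-symbol constant is aBaseSuperframeDuration from
the standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* aBaseSuperframeDuration: 60 symbols per slot * 16 slots. */
#define BASE_SUPERFRAME_SYMBOLS 960u
/* Value meaning "no periodic beacons, answer BEACON_REQ only". */
#define BEACON_ORDER_ON_REQUEST 15u

/* True when the interval asks for beacons sent at a fixed rate. */
static bool beacon_is_periodic(unsigned int order)
{
	assert(order <= BEACON_ORDER_ON_REQUEST);
	return order < BEACON_ORDER_ON_REQUEST;
}

/* Time between two periodic beacons, in microseconds. */
static uint64_t beacon_interval_us(unsigned int order, uint32_t symbol_us)
{
	assert(order < BEACON_ORDER_ON_REQUEST);
	return (uint64_t)BASE_SUPERFRAME_SYMBOLS * (1u << order) * symbol_us;
}

/* Time spent listening on each channel during a scan, in microseconds. */
static uint64_t scan_duration_us(unsigned int duration, uint32_t symbol_us)
{
	assert(duration <= 14);
	return (uint64_t)BASE_SUPERFRAME_SYMBOLS * ((1u << duration) + 1u) *
	       symbol_us;
}
```

Under these assumptions and with a 16 us symbol (2.4 GHz O-QPSK),
"beacons send interval 2" would mean one beacon every 61.44 ms, and
"scan ... duration 3" about 138 ms spent on each channel.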

The PAN list gets automatically updated by dropping the expired PANs
each time the user requests access to the list.
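
The expiration-on-access behaviour described above can be sketched in
user space as follows. This is only an illustration: the structure and
function names are hypothetical and the real list layout in the series
may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PAN entry: only the fields needed for expiration. */
struct pan_entry {
	uint16_t pan_id;
	uint64_t last_seen_ms;	/* when the last beacon was received */
};

/*
 * Compact the list in place, keeping only the entries seen within
 * expiration_ms of now_ms, and return the new number of entries.
 * Assumes last_seen_ms <= now_ms for every entry.
 */
static size_t pans_drop_expired(struct pan_entry *pans, size_t count,
				uint64_t now_ms, uint64_t expiration_ms)
{
	size_t kept = 0;

	assert(pans != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (now_ms - pans[i].last_seen_ms <= expiration_ms)
			pans[kept++] = pans[i];
	}
	return kept;
}
```

A dump request would call such a helper first and only then walk the
remaining entries, so no periodic cleanup work is needed.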

Internally, both requests (scan/beacons) are handled periodically by
delayed workqueues when relevant.

So far the only technical point that is missing in this series is the
possibility to grab a reference over the module driving the net device
in order to prevent module unloading during a scan or when the beacons
work is ongoing.

Finally, this series is a deep reshuffle of David Girault's original
work, which is why he is credited almost systematically: either as the
sole author when I created a patch from his changes with almost no
modification, or with a Co-developed-by tag whenever the final code
differs significantly from his first proposal while still being greatly
inspired by it.

Cheers,
Miquèl

Changes in v3:
* Dropped two patches:
  net: mac802154: Split the set channel hook implementation
  net: mac802154: Ensure proper channel selection at probe time
* Fixed a check against the supported channels list in
  ieee802154_set_symbol_duration().
* Slightly reworked the above helper to print different error messages
  and dropped its goto statement.
* Used the NSEC_PER_USEC macro in the symbol conversion from us to ns.
* Stopped calling ->set_channel() at probe time.
* Fixed hwsim which does not internally set the right channel.
* Used definitions instead of hardcoded values when relevant.
* Moved two helpers out of the experimental section because they are now
  used outside of experimental code.
* Did a number of renames. Added a couple of comments.
* Updated several drivers to force them to use the core xmit complete
  callback instead of working around it.
* Created a helper checking if a queue must be kept on hold.
* Created a couple of atomic variables and wait_queue_t per phy.
* Created a sync API for MLME transmissions.
* Created a hot path and a slow path.
* Put the warning in the hot path.
* Added a flag to prevent drivers supporting only datagrams to use the
  different scanning features.
* Dropped ieee802154_wake/stop_queue() from the exported
  symbols. Drivers should not use these directly, but call other helpers
  in order to fail the tx counters.
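
To illustrate the idea behind the per-phy atomic counters and the "queue
on hold" check, here is a user-space sketch using C11 atomics. All names
are hypothetical and the logic is an assumption about how such
accounting could look, not a copy of the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-PHY transmit state. */
struct tx_state {
	atomic_int ongoing_txs;	/* frames handed to the driver, not completed */
	atomic_int hold_txs;	/* > 0 while the queue must stay on hold */
};

static void tx_state_init(struct tx_state *st)
{
	atomic_init(&st->ongoing_txs, 0);
	atomic_init(&st->hold_txs, 0);
}

/* Account for a frame being handed to the driver. */
static void tx_start(struct tx_state *st)
{
	atomic_fetch_add(&st->ongoing_txs, 1);
}

/*
 * Account for a completion, successful or not. Returns true when no
 * frame is in flight anymore, i.e. when a waiter flushing the queue
 * could be woken up.
 */
static bool tx_done(struct tx_state *st)
{
	int before = atomic_fetch_sub(&st->ongoing_txs, 1);

	assert(before > 0);	/* completion without a matching start */
	return before == 1;
}

static void queue_hold(struct tx_state *st)
{
	atomic_fetch_add(&st->hold_txs, 1);
}

static void queue_release(struct tx_state *st)
{
	int before = atomic_fetch_sub(&st->hold_txs, 1);

	assert(before > 0);	/* release without a matching hold */
}

/* The queue may only be woken up when nobody holds it. */
static bool queue_must_stay_on_hold(struct tx_state *st)
{
	return atomic_load(&st->hold_txs) > 0;
}
```

With accounting of this kind living in the core, drivers only report
completions and never wake or stop the queue themselves.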

Changes in v2:
* Created two new netlink commands to set the maximum number of PANs that
  can be listed as well as their expiration time (in seconds).
* Added a patch to the series to avoid ignoring bad frames in hwsim as
  requested by Alexander.
* Changed the symbol duration type to receive nanoseconds instead of
  microseconds.
* Dropped most of the hwsim patches and reworked how drivers advertise
  their channels in order to be capable of deriving the symbol durations
  automatically.
* The scanning boolean gets turned into an atomic.
* The ca8210 driver does not support scanning; implemented the driver
  hooks to reflect the situation.
* Slightly reworked the content of each patch to ease the introduction
  of active scans.
* Added active scan support.

David Girault (5):
  net: ieee802154: Move IEEE 802.15.4 Kconfig main entry
  net: mac802154: Include the softMAC stack inside the IEEE 802.15.4
    menu
  net: ieee802154: Move the address structure earlier
  net: ieee802154: Add a kernel doc header to the ieee802154_addr
    structure
  net: ieee802154: Trace the registration of new PANs

Miquel Raynal (36):
  MAINTAINERS: Remove Harry Morris bouncing address
  net: ieee802154: hwsim: Ensure proper channel selection at probe time
  net: ieee802154: hwsim: Ensure frame checksum are valid
  net: ieee802154: Use the IEEE802154_MAX_PAGE define when relevant
  net: ieee802154: Improve the way supported channels are declared
  net: ieee802154: Give more details to the core about the channel
    configurations
  net: ieee802154: mcr20a: Fix lifs/sifs periods
  net: mac802154: Convert the symbol duration into nanoseconds
  net: mac802154: Set the symbol duration automatically
  net: ieee802154: Drop duration settings when the core does it already
  net: ieee802154: Return meaningful error codes from the netlink
    helpers
  net: mac802154: Explain the use of ieee802154_wake/stop_queue()
  net: ieee802154: at86rf230: Call the complete helper when a
    transmission is over
  net: ieee802154: atusb: Call the complete helper when a transmission
    is over
  net: ieee802154: ca8210: Call the complete helper when a transmission
    is over
  net: mac802154: Stop exporting ieee802154_wake/stop_queue()
  net: mac802154: Rename the synchronous xmit worker
  net: mac802154: Rename the main tx_work struct
  net: mac802154: Follow the count of ongoing transmissions
  net: mac802154: Hold the transmit queue when relevant
  net: mac802154: Create a hot tx path
  net: mac802154: Add a warning in the hot path
  net: mac802154: Introduce a tx queue flushing mechanism
  net: mac802154: Introduce a synchronous API for MLME commands
  net: ieee802154: Add support for internal PAN management
  net: ieee802154: Define a beacon frame header
  net: ieee802154: Define frame types
  net: ieee802154: Add support for scanning requests
  net: mac802154: Handle scan requests
  net: ieee802154: Full PAN management
  net: ieee802154: Add beacons support
  net: mac802154: Handle beacons requests
  net: mac802154: Add support for active scans
  net: mac802154: Add support for processing beacon requests
  net: ieee802154: Handle limited devices with only datagram support
  net: ieee802154: ca8210: Flag the driver as being limited

 MAINTAINERS                              |   3 +-
 drivers/net/ieee802154/adf7242.c         |   3 +-
 drivers/net/ieee802154/at86rf230.c       |  68 ++-
 drivers/net/ieee802154/atusb.c           |  89 ++--
 drivers/net/ieee802154/ca8210.c          |  17 +-
 drivers/net/ieee802154/cc2520.c          |   3 +-
 drivers/net/ieee802154/fakelb.c          |  43 +-
 drivers/net/ieee802154/mac802154_hwsim.c |  88 +++-
 drivers/net/ieee802154/mcr20a.c          |  11 +-
 drivers/net/ieee802154/mrf24j40.c        |   3 +-
 include/linux/ieee802154.h               |   7 +
 include/net/cfg802154.h                  | 175 ++++++-
 include/net/ieee802154_netdev.h          |  85 ++++
 include/net/mac802154.h                  |  29 +-
 include/net/nl802154.h                   |  99 ++++
 net/Kconfig                              |   3 +-
 net/ieee802154/Kconfig                   |   1 +
 net/ieee802154/Makefile                  |   2 +-
 net/ieee802154/core.c                    |   3 +
 net/ieee802154/core.h                    |  31 ++
 net/ieee802154/header_ops.c              |  67 +++
 net/ieee802154/nl-phy.c                  |  13 +-
 net/ieee802154/nl802154.c                | 556 ++++++++++++++++++++++-
 net/ieee802154/nl802154.h                |   4 +
 net/ieee802154/pan.c                     | 234 ++++++++++
 net/ieee802154/rdev-ops.h                |  52 +++
 net/ieee802154/trace.h                   |  86 ++++
 net/mac802154/Makefile                   |   2 +-
 net/mac802154/cfg.c                      |  82 +++-
 net/mac802154/ieee802154_i.h             |  86 +++-
 net/mac802154/main.c                     | 119 ++++-
 net/mac802154/rx.c                       |  34 +-
 net/mac802154/scan.c                     | 447 ++++++++++++++++++
 net/mac802154/tx.c                       |  48 +-
 net/mac802154/util.c                     |  38 +-
 35 files changed, 2413 insertions(+), 218 deletions(-)
 create mode 100644 net/ieee802154/pan.c
 create mode 100644 net/mac802154/scan.c

Comments

Alexander Aring Jan. 17, 2022, 10:58 p.m. UTC | #1
Hi,

On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> ieee802154_xmit_complete() is the right helper to call when a
> transmission is over. The fact that it completed or not is not really a
> question, but drivers must tell the core that the completion is over,
> even if it was canceled. Do not call ieee802154_wake_queue() manually,
> in order to let full control of this task to the core.
>

This is not a cancellation of a transmission; something weird is going
on. Introduce a xmit_error() for this: you end up calling consume_skb(),
which is wrong here because it is meant for the non-error case.

> By using the complete helper we also avoid leaking the skb structure.
>

Yes, we are leaking here.

- Alex
Alexander Aring Jan. 17, 2022, 11:14 p.m. UTC | #2
Hi,

On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> We should never start a transmission after the queue has been stopped.
>
> But because it might work we don't kill the function here but rather
> warn loudly the user that something is wrong.
>
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  net/mac802154/tx.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> index 18ee6fcfcd7f..de5ecda80472 100644
> --- a/net/mac802154/tx.c
> +++ b/net/mac802154/tx.c
> @@ -112,6 +112,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
>  static netdev_tx_t
>  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
>  {
> +       WARN_ON(mac802154_queue_is_stopped(local));
> +

we should do a WARN_ON_ONCE() in this hot function.

- Alex
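
For readers unfamiliar with the difference, the point of a warn-once
check in a hot path is to report the problem a single time instead of
flooding the log on every frame. A user-space imitation, with
hypothetical names and no claim to match the kernel macro's
implementation, could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed, kept for illustration. */
static unsigned int warnings_printed;

/*
 * Print a warning the first time cond is true for a given flag, then
 * stay silent. Always returns cond so that callers can still react.
 */
static bool warn_once(bool cond, bool *warned, const char *what)
{
	assert(warned != NULL && what != NULL);
	if (cond && !*warned) {
		*warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Stand-in for a hot transmit function. */
static int hot_tx_sketch(bool queue_stopped)
{
	static bool warned;	/* one flag per call site */

	warn_once(queue_stopped, &warned,
		  "transmission started while the queue is stopped");
	/* As in the patch, keep going: the transmission might still work. */
	return 0;
}
```

Calling hot_tx_sketch(true) many times prints a single warning, which is
the behaviour wanted in a function that runs for every frame.
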
Alexander Aring Jan. 18, 2022, 12:34 a.m. UTC | #3
Hi,

On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> ieee802154_xmit_complete() is the right helper to call when a
> transmission is over. The fact that it completed or not is not really a
> question, but drivers must tell the core that the completion is over,
> even if it was canceled. Do not call ieee802154_wake_queue() manually,
> in order to let full control of this task to the core.
>
> By using the complete helper we also avoid leaking the skb structure.
>
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/net/ieee802154/at86rf230.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> index 583f835c317a..1941e1f3d2ef 100644
> --- a/drivers/net/ieee802154/at86rf230.c
> +++ b/drivers/net/ieee802154/at86rf230.c
> @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
>         if (ctx->free)
>                 kfree(ctx);
>
> -       ieee802154_wake_queue(lp->hw);
> +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);

Also, this lp->tx_skb can be a dangling pointer: after xmit_complete()
we need to set it to NULL. In a xmit_error() we can check for NULL
before calling kfree_skb().

- Alex
Alexander Aring Jan. 18, 2022, 12:36 a.m. UTC | #4
Hi,

On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote:
>
> Hi,
>
> On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > ieee802154_xmit_complete() is the right helper to call when a
> > transmission is over. The fact that it completed or not is not really a
> > question, but drivers must tell the core that the completion is over,
> > even if it was canceled. Do not call ieee802154_wake_queue() manually,
> > in order to let full control of this task to the core.
> >
> > By using the complete helper we also avoid leaking the skb structure.
> >
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >  drivers/net/ieee802154/at86rf230.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> > index 583f835c317a..1941e1f3d2ef 100644
> > --- a/drivers/net/ieee802154/at86rf230.c
> > +++ b/drivers/net/ieee802154/at86rf230.c
> > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
> >         if (ctx->free)
> >                 kfree(ctx);
> >
> > -       ieee802154_wake_queue(lp->hw);
> > +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);
>
> also this lp->tx_skb can be a dangled pointer, after xmit_complete()
> we need to set it to NULL in a xmit_error() we can check on NULL
> before calling kfree_skb().
>

Forget the NULL checking, it's already done by the core. However, in
some cases this is called with a dangling pointer in lp->tx_skb.

- Alex
Miquel Raynal Jan. 18, 2022, 10:40 a.m. UTC | #5
Hi Alexander,

> > So far the only technical point that is missing in this series is the
> > possibility to grab a reference over the module driving the net device
> > in order to prevent module unloading during a scan or when the beacons
> > work is ongoing.

Do you have any advice regarding this issue? That is the only
technical point that is left unaddressed IMHO.

> > Finally, this series is a deep reshuffle of David Girault's original
> > work, hence the fact that he is almost systematically credited, either
> > by being the only author when I created the patches based on his changes
> > with almost no modification, or with a Co-developed-by tag whenever the
> > final code base is significantly different than his first proposal while
> > still being greatly inspired from it.
> >  
> 
> can you please split this patch series, what I see is now:
> 
> 1. cleanup patches
> 2. sync tx handling for mlme commands
> 3. scan support

Works for me. I just wanted to give the big picture but I'll split the
series.

Also sorry for forgetting the 'wpan-next' subject prefix.

> we try to bring the patches upstream in this order.
> 
> Thanks.
> 
> - Alex


Thanks,
Miquèl
Miquel Raynal Jan. 18, 2022, 6:20 p.m. UTC | #6
Hi Alexander,

alex.aring@gmail.com wrote on Mon, 17 Jan 2022 18:14:17 -0500:

> Hi,
> 
> On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > We should never start a transmission after the queue has been stopped.
> >
> > But because it might work we don't kill the function here but rather
> > warn loudly the user that something is wrong.
> >
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >  net/mac802154/tx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > index 18ee6fcfcd7f..de5ecda80472 100644
> > --- a/net/mac802154/tx.c
> > +++ b/net/mac802154/tx.c
> > @@ -112,6 +112,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
> >  static netdev_tx_t
> >  ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> >  {
> > +       WARN_ON(mac802154_queue_is_stopped(local));
> > +  
> 
> we should do a WARN_ON_ONCE() in this hot function.

Sure!

Thanks,
Miquèl
Miquel Raynal Jan. 18, 2022, 6:22 p.m. UTC | #7
Hi Alexander,

alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500:

> Hi,
> 
> On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > >
> > > ieee802154_xmit_complete() is the right helper to call when a
> > > transmission is over. The fact that it completed or not is not really a
> > > question, but drivers must tell the core that the completion is over,
> > > even if it was canceled. Do not call ieee802154_wake_queue() manually,
> > > in order to let full control of this task to the core.
> > >
> > > By using the complete helper we also avoid leaking the skb structure.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  drivers/net/ieee802154/at86rf230.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> > > index 583f835c317a..1941e1f3d2ef 100644
> > > --- a/drivers/net/ieee802154/at86rf230.c
> > > +++ b/drivers/net/ieee802154/at86rf230.c
> > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
> > >         if (ctx->free)
> > >                 kfree(ctx);
> > >
> > > -       ieee802154_wake_queue(lp->hw);
> > > +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);  
> >
> > also this lp->tx_skb can be a dangled pointer, after xmit_complete()
> > we need to set it to NULL in a xmit_error() we can check on NULL
> > before calling kfree_skb().
> >  
> 
> forget the NULL checking, it's already done by core. However in some
> cases this is called with a dangled pointer on lp->tx_skb.

I'll try to fix these dangling situations first, if I find them.

I'll also introduce a xmit_error() helper as you suggest.

Thanks,
Miquèl
Alexander Aring Jan. 18, 2022, 10:53 p.m. UTC | #8
Hi,

On Tue, 18 Jan 2022 at 13:20, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Mon, 17 Jan 2022 17:52:10 -0500:
>
> > Hi,
> >
> > On Mon, 17 Jan 2022 at 06:54, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > These periods are expressed in time units (microseconds) while 40 and 12
> > > are the number of symbol durations these periods will last. We need to
> > > multiply them both with phy->symbol_duration in order to get these
> > > values in microseconds.
> > >
> > > Fixes: 8c6ad9cc5157 ("ieee802154: Add NXP MCR20A IEEE 802.15.4 transceiver driver")
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  drivers/net/ieee802154/mcr20a.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ieee802154/mcr20a.c b/drivers/net/ieee802154/mcr20a.c
> > > index f0eb2d3b1c4e..e2c249aef430 100644
> > > --- a/drivers/net/ieee802154/mcr20a.c
> > > +++ b/drivers/net/ieee802154/mcr20a.c
> > > @@ -976,8 +976,8 @@ static void mcr20a_hw_setup(struct mcr20a_local *lp)
> > >         dev_dbg(printdev(lp), "%s\n", __func__);
> > >
> > >         phy->symbol_duration = 16;
> > > -       phy->lifs_period = 40;
> > > -       phy->sifs_period = 12;
> > > +       phy->lifs_period = 40 * phy->symbol_duration;
> > > +       phy->sifs_period = 12 * phy->symbol_duration;
> >
> > I thought we do that now in register_hw(). Why does this patch exist?
>
> The lifs and sifs period are wrong.
>
> Fixing this silently by generalizing the calculation is simply wrong. I
> feel we need to do this in order:
> 1- Fix the period because it is wrong.
> 2- Now that the period is set to a valid value and the core is able to
>    do the same operation and set the variables to an identical content,
>    we can drop these lines from the driver.
>
> #2 being a mechanical change, doing it without #1 means that something
> that appears harmless actually changes the behavior of the driver. We
> generally try to avoid that, no?

Yes, maybe Stefan can then get this patch into wpan and queue it for
stable.

Thanks for the clarification.

- Alex
Alexander Aring Jan. 18, 2022, 11:12 p.m. UTC | #9
Hi,

On Tue, 18 Jan 2022 at 05:40, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> > > So far the only technical point that is missing in this series is the
> > > possibility to grab a reference over the module driving the net device
> > > in order to prevent module unloading during a scan or when the beacons
> > > work is ongoing.
>
> Do you have any advice regarding this issue? That is the only
> technical point that is left unaddressed IMHO.
>

try_module_get()/module_put(), or I don't see where the problem is here.
You can prevent module unloading with them. Which module is the problem
here?

> > > Finally, this series is a deep reshuffle of David Girault's original
> > > work, hence the fact that he is almost systematically credited, either
> > > by being the only author when I created the patches based on his changes
> > > with almost no modification, or with a Co-developed-by tag whenever the
> > > final code base is significantly different than his first proposal while
> > > still being greatly inspired from it.
> > >
> >
> > can you please split this patch series, what I see is now:
> >
> > 1. cleanup patches
> > 2. sync tx handling for mlme commands
> > 3. scan support
>
> Works for me. I just wanted to give the big picture but I'll split the
> series.
>

maybe also put some "symbol duration" series into it if it's getting
too large? It is difficult to review 40 patches... in one step.

> Also sorry for forgetting the 'wpan-next' subject prefix.
>

no problem.

I really appreciate your work and your willingness to work on all
outstanding issues. I am really happy to see something that we can use
for mlme-commands and to separate it from the hotpath transmission...
It is good to see an architecture for that, which I think goes in the
right direction.

- Alex
Miquel Raynal Jan. 19, 2022, 10:45 p.m. UTC | #10
Hi Alexander,

alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500:

> Hi,
> 
> On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > >
> > > ieee802154_xmit_complete() is the right helper to call when a
> > > transmission is over. The fact that it completed or not is not really a
> > > question, but drivers must tell the core that the completion is over,
> > > even if it was canceled. Do not call ieee802154_wake_queue() manually,
> > > in order to let full control of this task to the core.
> > >
> > > By using the complete helper we also avoid leaking the skb structure.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  drivers/net/ieee802154/at86rf230.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> > > index 583f835c317a..1941e1f3d2ef 100644
> > > --- a/drivers/net/ieee802154/at86rf230.c
> > > +++ b/drivers/net/ieee802154/at86rf230.c
> > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
> > >         if (ctx->free)
> > >                 kfree(ctx);
> > >
> > > -       ieee802154_wake_queue(lp->hw);
> > > +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);  
> >
> > also this lp->tx_skb can be a dangled pointer, after xmit_complete()
> > we need to set it to NULL in a xmit_error() we can check on NULL
> > before calling kfree_skb().

I've created a xmit_error() helper as suggested, which calls
dev_kfree_skb_any() instead of consume_skb().

> 
> forget the NULL checking, it's already done by core.

Indeed, it is.

> However in some
> cases this is called with a dangled pointer on lp->tx_skb.

I've fixed that by setting it to NULL after the call to the xmit_error
helper.

> 
> - Alex


Thanks,
Miquèl
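
The pattern described in this reply (release the frame in an error
helper, then clear the driver's pointer) can be sketched in user space
as below. The types and function names are hypothetical stand-ins, with
free() playing the role of the skb release call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the frame buffer and the driver state. */
struct frame {
	unsigned char data[127];
};

struct drv_local {
	struct frame *tx_frame;	/* frame currently being transmitted */
	unsigned int tx_errors;
};

/* Error completion: release the frame and account for the failure. */
static void xmit_error_sketch(struct drv_local *lp, struct frame *f)
{
	assert(lp != NULL);
	free(f);		/* free(NULL) is a no-op */
	lp->tx_errors++;
}

/* Error recovery path of the driver. */
static void error_recover_complete_sketch(struct drv_local *lp)
{
	xmit_error_sketch(lp, lp->tx_frame);
	lp->tx_frame = NULL;	/* never keep a pointer to freed memory */
}
```

Clearing the pointer makes a second pass through the recovery path
harmless, since the release call then receives NULL.
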
Miquel Raynal Jan. 19, 2022, 10:56 p.m. UTC | #11
Hi Alexander,

alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500:

> Hi,
> 
> On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > >
> > > ieee802154_xmit_complete() is the right helper to call when a
> > > transmission is over. The fact that it completed or not is not really a
> > > question, but drivers must tell the core that the completion is over,
> > > even if it was canceled. Do not call ieee802154_wake_queue() manually,
> > > in order to let full control of this task to the core.
> > >
> > > By using the complete helper we also avoid leaking the skb structure.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  drivers/net/ieee802154/at86rf230.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> > > index 583f835c317a..1941e1f3d2ef 100644
> > > --- a/drivers/net/ieee802154/at86rf230.c
> > > +++ b/drivers/net/ieee802154/at86rf230.c
> > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
> > >         if (ctx->free)
> > >                 kfree(ctx);
> > >
> > > -       ieee802154_wake_queue(lp->hw);
> > > +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);  
> >
> > also this lp->tx_skb can be a dangled pointer, after xmit_complete()
> > we need to set it to NULL in a xmit_error() we can check on NULL
> > before calling kfree_skb().
> >  
> 
> forget the NULL checking, it's already done by core. However in some
> cases this is called with a dangled pointer on lp->tx_skb.

Actually I don't see why tx_skb is dangling?

There is no function that could access lp->tx_skb between the free
operation and the next call to ->xmit(), which does a lp->tx_skb = skb.
Am I missing something?

Thanks,
Miquèl
Alexander Aring Jan. 19, 2022, 11:34 p.m. UTC | #12
Hi,

On Wed, 19 Jan 2022 at 17:56, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500:
>
> > Hi,
> >
> > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > >
> > > > ieee802154_xmit_complete() is the right helper to call when a
> > > > transmission is over. The fact that it completed or not is not really a
> > > > question, but drivers must tell the core that the completion is over,
> > > > even if it was canceled. Do not call ieee802154_wake_queue() manually,
> > > > in order to let full control of this task to the core.
> > > >
> > > > By using the complete helper we also avoid leaking the skb structure.
> > > >
> > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > ---
> > > >  drivers/net/ieee802154/at86rf230.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> > > > index 583f835c317a..1941e1f3d2ef 100644
> > > > --- a/drivers/net/ieee802154/at86rf230.c
> > > > +++ b/drivers/net/ieee802154/at86rf230.c
> > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
> > > >         if (ctx->free)
> > > >                 kfree(ctx);
> > > >
> > > > -       ieee802154_wake_queue(lp->hw);
> > > > +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);
> > >
> > > also this lp->tx_skb can be a dangled pointer, after xmit_complete()
> > > we need to set it to NULL in a xmit_error() we can check on NULL
> > > before calling kfree_skb().
> > >
> >
> > forget the NULL checking, it's already done by core. However in some
> > cases this is called with a dangled pointer on lp->tx_skb.
>
> Actually I don't see why tx_skb is dangling?
>
> There is no function that could accesses lp->tx_skb between the free
> operation and the next call to ->xmit() which does a lp->tx_skb = skb.
> Am I missing something?
>

Look into at86rf230_sync_state_change(): it is a sync over async, and
the function "at86rf230_async_error_recover_complete()" is a generic
error handler to recover from a state change. It is used, for example,
in at86rf230_start(), which can occur in cases that are not xmit
related.

Indeed, there is no dangling pointer in the IRQ handling, sorry. I
thought it might happen in the receive handling, because the transceiver
does a lot of its own state change handling for frame buffer
protection, but that is not the case.

- Alex
Alexander Aring Jan. 20, 2022, 12:19 a.m. UTC | #13
Hi,

On Wed, 19 Jan 2022 at 18:34, Alexander Aring <alex.aring@gmail.com> wrote:
>
> Hi,
>
> On Wed, 19 Jan 2022 at 17:56, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Alexander,
> >
> > alex.aring@gmail.com wrote on Mon, 17 Jan 2022 19:36:39 -0500:
> >
> > > Hi,
> > >
> > > On Mon, 17 Jan 2022 at 19:34, Alexander Aring <alex.aring@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Mon, 17 Jan 2022 at 06:55, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > > >
> > > > > ieee802154_xmit_complete() is the right helper to call when a
> > > > > transmission is over. The fact that it completed or not is not really a
> > > > > question, but drivers must tell the core that the completion is over,
> > > > > even if it was canceled. Do not call ieee802154_wake_queue() manually,
> > > > > in order to let full control of this task to the core.
> > > > >
> > > > > By using the complete helper we also avoid leaking the skb structure.
> > > > >
> > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > ---
> > > > >  drivers/net/ieee802154/at86rf230.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
> > > > > index 583f835c317a..1941e1f3d2ef 100644
> > > > > --- a/drivers/net/ieee802154/at86rf230.c
> > > > > +++ b/drivers/net/ieee802154/at86rf230.c
> > > > > @@ -343,7 +343,7 @@ at86rf230_async_error_recover_complete(void *context)
> > > > >         if (ctx->free)
> > > > >                 kfree(ctx);
> > > > >
> > > > > -       ieee802154_wake_queue(lp->hw);
> > > > > +       ieee802154_xmit_complete(lp->hw, lp->tx_skb, false);
> > > >
> > > > also this lp->tx_skb can be a dangled pointer, after xmit_complete()
> > > > we need to set it to NULL in a xmit_error() we can check on NULL
> > > > before calling kfree_skb().
> > > >
> > >
> > > forget the NULL checking, it's already done by core. However in some
> > > cases this is called with a dangled pointer on lp->tx_skb.
> >
> > Actually I don't see why tx_skb is dangling?
> >
> > There is no function that could accesses lp->tx_skb between the free
> > operation and the next call to ->xmit() which does a lp->tx_skb = skb.
> > Am I missing something?
> >
>
> look into at86rf230_sync_state_change() it is a sync over async and
> the function "at86rf230_async_error_recover_complete()" is a generic
> error handling to recover from a state change. It's e.g. being used in
> e.g. at86rf230_start() which can occur in cases which are not xmit
> related.
>

which means more is broken: we should not simply wake the queue in
non-transmit cases...

- Alex
Miquel Raynal Jan. 20, 2022, 12:24 a.m. UTC | #14
Hi Alexander,

alex.aring@gmail.com wrote on Tue, 18 Jan 2022 18:12:49 -0500:

> Hi,
> 
> On Tue, 18 Jan 2022 at 05:40, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Alexander,
> >  
> > > > So far the only technical point that is missing in this series is the
> > > > possibility to grab a reference over the module driving the net device
> > > > in order to prevent module unloading during a scan or when the beacons
> > > > work is ongoing.  
> >
> > Do you have any advice regarding this issue? That is the only
> > technical point that is left unaddressed IMHO.
> >  
> 
> try_module_get()/module_put() or I don't see where the problem here is.
> You can avoid module unloading with it. Which module is the problem
> here?

I'll give it another try; maybe when I first tried that I was missing a
few mental pieces and did not understand the puzzle correctly.

> > > > Finally, this series is a deep reshuffle of David Girault's original
> > > > work, hence the fact that he is almost systematically credited, either
> > > > by being the only author when I created the patches based on his changes
> > > > with almost no modification, or with a Co-developed-by tag whenever the
> > > > final code base is significantly different than his first proposal while
> > > > still being greatly inspired from it.
> > > >  
> > >
> > > can you please split this patch series, what I see is now:
> > >
> > > 1. cleanup patches
> > > 2. sync tx handling for mlme commands
> > > 3. scan support  
> >
> > Works for me. I just wanted to give the big picture but I'll split the
> > series.
> >  
> 
> maybe also put some "symbol duration" series into it if it's getting
> too large? It is difficult to review 40 patches... in one step.

Yep, I truly understand (and now 50+).

> 
> > Also sorry for forgetting the 'wpan-next' subject prefix.
> >  
> 
> no problem.
> 
> I really appreciate your work and your willingness to work on all
> outstanding issues. I am really happy to see something that we can use
> for mlme-commands and to separate it from the hotpath transmission...
> It is good to see architecture for that which I think goes in the
> right direction.

That is very stirring to read :)

Thanks,
Miquèl