diff mbox series

[2/3] mac80211: Add support to trigger sta disconnect on hardware restart

Message ID 20201215172352.5311-1-youghand@codeaurora.org
State New
Headers show
Series None | expand

Commit Message

Youghandhar Chintala Dec. 15, 2020, 5:23 p.m. UTC
Currently in case of target hardware restart, we just reconfig and
re-enable the security keys and enable the network queues to start
data traffic back from where it was interrupted.

Many ath10k wifi chipsets have sequence numbers for the data
packets assigned by firmware and the mac sequence number will
restart from zero after target hardware restart leading to mismatch
in the sequence number expected by the remote peer vs the sequence
number of the frame sent by the target firmware.

This mismatch in sequence number will cause out-of-order packets
on the remote peer and all the frames sent by the device are dropped
until we reach the sequence number which was sent before we restarted
the target hardware

In order to fix this, we trigger a sta disconnect, for the targets
which expose this corresponding wiphy flag, in case of target hw
restart. After this there will be a fresh connection and thereby
avoiding the dropping of frames by remote peer.

The right fix would be to pull the entire data path into the host
which is not feasible or would need lots of complex changes and
will still be inefficient.

Tested on ath10k using WCN3990, QCA6174

Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org>
---
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/mlme.c        |  9 +++++++++
 net/mac80211/util.c        | 22 +++++++++++++++++++---
 3 files changed, 31 insertions(+), 3 deletions(-)

Comments

Abhishek Kumar Feb. 5, 2021, 9:51 p.m. UTC | #1
Since using DELBA frame to APs to re-establish BA session has a
dependency on APs and also some APs may not honour the DELBA frame. I
am fine with having the disconnect/reconnect solution. The change
looks good to me.

Reviewed-by: Abhishek Kumar <kuabhs@chromium.org>


Thanks
Abhishek

On Thu, Jan 28, 2021 at 12:08 AM <youghand@codeaurora.org> wrote:
>

> On 2020-12-15 23:10, Felix Fietkau wrote:

> > On 2020-12-15 18:23, Youghandhar Chintala wrote:

> >> Currently in case of target hardware restart, we just reconfig and

> >> re-enable the security keys and enable the network queues to start

> >> data traffic back from where it was interrupted.

> >>

> >> Many ath10k wifi chipsets have sequence numbers for the data

> >> packets assigned by firmware and the mac sequence number will

> >> restart from zero after target hardware restart leading to mismatch

> >> in the sequence number expected by the remote peer vs the sequence

> >> number of the frame sent by the target firmware.

> >>

> >> This mismatch in sequence number will cause out-of-order packets

> >> on the remote peer and all the frames sent by the device are dropped

> >> until we reach the sequence number which was sent before we restarted

> >> the target hardware

> >>

> >> In order to fix this, we trigger a sta disconnect, for the targets

> >> which expose this corresponding wiphy flag, in case of target hw

> >> restart. After this there will be a fresh connection and thereby

> >> avoiding the dropping of frames by remote peer.

> >>

> >> The right fix would be to pull the entire data path into the host

> >> which is not feasible or would need lots of complex changes and

> >> will still be inefficient.

> > How about simply tracking which tids have aggregation enabled and send

> > DELBA frames for those after the restart?

> > It would mean less disruption for affected stations and less ugly hacks

> > in the stack for unreliable hardware.

> >

> > - Felix

>

> Hi Felix,

>

> We did try to send an ADDBA frame to the AP once the SSR happened. The

> AP ack’ed the frame and the new BA session with renewed sequence number

> was established. But still, the AP did not respond to the ping requests

> with the new sequence number. It did not respond until one of the two

> happened.

> 1.      The sequence number was more than the sequence number that DUT had

> used before SSR happened

> 2.      DUT disconnected and then reconnected.

> The other option is to send a DELBA frame to the AP and make the AP also

> force to establish the BA session from its side. This we feel can have

> some interoperability issues as some of the AP’s may not honour the

> DELBA frame and will continue to use the earlier BA session that it had

> established. Given that re-negotiating the BA session is prone to IOT

> issues, we feel that it would be good to go with the

> Disconnect/Reconnect solution which is foolproof and will work in all

> scenarios.

>

> Regards,

> Youghandhar
Guenter Roeck Feb. 8, 2021, 3:43 p.m. UTC | #2
On Tue, Dec 15, 2020 at 10:53:52PM +0530, Youghandhar Chintala wrote:
> Currently in case of target hardware restart, we just reconfig and
> re-enable the security keys and enable the network queues to start
> data traffic back from where it was interrupted.
> 
> Many ath10k wifi chipsets have sequence numbers for the data
> packets assigned by firmware and the mac sequence number will
> restart from zero after target hardware restart leading to mismatch
> in the sequence number expected by the remote peer vs the sequence
> number of the frame sent by the target firmware.
> 
> This mismatch in sequence number will cause out-of-order packets
> on the remote peer and all the frames sent by the device are dropped
> until we reach the sequence number which was sent before we restarted
> the target hardware
> 
> In order to fix this, we trigger a sta disconnect, for the targets
> which expose this corresponding wiphy flag, in case of target hw
> restart. After this there will be a fresh connection and thereby
> avoiding the dropping of frames by remote peer.
> 
> The right fix would be to pull the entire data path into the host
> which is not feasible or would need lots of complex changes and
> will still be inefficient.
> 
> Tested on ath10k using WCN3990, QCA6174
> 
> Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org>
> Reviewed-by: Abhishek Kumar <kuabhs@chromium.org>
> ---
>  net/mac80211/ieee80211_i.h |  3 +++
>  net/mac80211/mlme.c        |  9 +++++++++
>  net/mac80211/util.c        | 22 +++++++++++++++++++---
>  3 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index cde2e3f..8cbeb5f 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
>   *	back to wireless media and to the local net stack.
>   * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
>   * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
> + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
> + *	recovery
>   */
>  enum ieee80211_sub_if_data_flags {
>  	IEEE80211_SDATA_ALLMULTI		= BIT(0),
> @@ -755,6 +757,7 @@ enum ieee80211_sub_if_data_flags {
>  	IEEE80211_SDATA_DONT_BRIDGE_PACKETS	= BIT(3),
>  	IEEE80211_SDATA_DISCONNECT_RESUME	= BIT(4),
>  	IEEE80211_SDATA_IN_DRIVER		= BIT(5),
> +	IEEE80211_SDATA_DISCONNECT_HW_RESTART	= BIT(6),
>  };
>  
>  /**
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index 6adfcb9..e4d0d16 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -4769,6 +4769,15 @@ void ieee80211_sta_restart(struct ieee80211_sub_if_data *sdata)
>  					      true);
>  		sdata_unlock(sdata);
>  		return;
> +	} else if (sdata->flags & IEEE80211_SDATA_DISCONNECT_HW_RESTART) {
> +		sdata->flags &= ~IEEE80211_SDATA_DISCONNECT_HW_RESTART;
> +		mlme_dbg(sdata, "driver requested disconnect after hardware restart\n");
> +		ieee80211_sta_connection_lost(sdata,
> +					      ifmgd->associated->bssid,
> +					      WLAN_REASON_UNSPECIFIED,
> +					      true);
> +		sdata_unlock(sdata);
> +		return;
>  	}
>  	sdata_unlock(sdata);
>  }
> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
> index 8c3c01a..98567a3 100644
> --- a/net/mac80211/util.c
> +++ b/net/mac80211/util.c
> @@ -2567,9 +2567,12 @@ int ieee80211_reconfig(struct ieee80211_local *local)
>  	}
>  	mutex_unlock(&local->sta_mtx);
>  
> -	/* add back keys */
> -	list_for_each_entry(sdata, &local->interfaces, list)
> -		ieee80211_reenable_keys(sdata);
> +
> +	if (!(hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART)) {
> +		/* add back keys */
> +		list_for_each_entry(sdata, &local->interfaces, list)
> +			ieee80211_reenable_keys(sdata);
> +	}
>  
>  	/* Reconfigure sched scan if it was interrupted by FW restart */
>  	mutex_lock(&local->mtx);
> @@ -2643,6 +2646,19 @@ int ieee80211_reconfig(struct ieee80211_local *local)
>  					IEEE80211_QUEUE_STOP_REASON_SUSPEND,
>  					false);
>  
> +	if ((hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART) &&
> +	    !reconfig_due_to_wowlan) {
> +		list_for_each_entry(sdata, &local->interfaces, list) {
> +			if (!ieee80211_sdata_running(sdata))
> +				continue;
> +			if (sdata->vif.type == NL80211_IFTYPE_STATION) {
> +				sdata->flags |=
> +					IEEE80211_SDATA_DISCONNECT_HW_RESTART;
> +				ieee80211_sta_restart(sdata);

If CONFIG_PM=n:

ERROR: "ieee80211_sta_restart" [net/mac80211/mac80211.ko] undefined!

Guenter

> +			}
> +		}
> +	}
> +
>  	/*
>  	 * If this is for hw restart things are still running.
>  	 * We may want to change that later, however.
Johannes Berg Feb. 12, 2021, 8:37 a.m. UTC | #3
On Fri, 2021-02-05 at 13:51 -0800, Abhishek Kumar wrote:
> Since using DELBA frame to APs to re-establish BA session has a
> dependency on APs and also some APs may not honour the DELBA frame.


That's completely out of spec ... Can you say which AP this was?

You could also try sending a BAR that updates the SN.

johannes
Johannes Berg Feb. 12, 2021, 8:42 a.m. UTC | #4
On Tue, 2020-12-15 at 22:53 +0530, Youghandhar Chintala wrote:
> The right fix would be to pull the entire data path into the host

> +++ b/net/mac80211/ieee80211_i.h
> @@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
>   *	back to wireless media and to the local net stack.
>   * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
>   * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
> + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
> + *	recovery

How did you model this on IEEE80211_SDATA_DISCONNECT_RESUME, but than
didn't check how that's actually used?

Please change it so that the two models are the same. You really don't
need the wiphy flag.

johannes
Johannes Berg Feb. 12, 2021, 8:44 a.m. UTC | #5
On Fri, 2021-02-12 at 09:42 +0100, Johannes Berg wrote:
> On Tue, 2020-12-15 at 22:53 +0530, Youghandhar Chintala wrote:
> > The right fix would be to pull the entire data path into the host
> > +++ b/net/mac80211/ieee80211_i.h
> > @@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
> >   *	back to wireless media and to the local net stack.
> >   * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
> >   * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
> > + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
> > + *	recovery
> 
> How did you model this on IEEE80211_SDATA_DISCONNECT_RESUME, but than
> didn't check how that's actually used?
> 
> Please change it so that the two models are the same. You really don't
> need the wiphy flag.

In fact, you could even simply
generalize IEEE80211_SDATA_DISCONNECT_RESUME
and ieee80211_resume_disconnect() to _reconfig_ instead of _resume_, and
call it from the driver just before requesting HW restart.

johannes
diff mbox series

Patch

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index cde2e3f..8cbeb5f 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -748,6 +748,8 @@  struct ieee80211_if_mesh {
  *	back to wireless media and to the local net stack.
  * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
  * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
+ * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
+ *	recovery
  */
 enum ieee80211_sub_if_data_flags {
 	IEEE80211_SDATA_ALLMULTI		= BIT(0),
@@ -755,6 +757,7 @@  enum ieee80211_sub_if_data_flags {
 	IEEE80211_SDATA_DONT_BRIDGE_PACKETS	= BIT(3),
 	IEEE80211_SDATA_DISCONNECT_RESUME	= BIT(4),
 	IEEE80211_SDATA_IN_DRIVER		= BIT(5),
+	IEEE80211_SDATA_DISCONNECT_HW_RESTART	= BIT(6),
 };
 
 /**
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 6adfcb9..e4d0d16 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -4769,6 +4769,15 @@  void ieee80211_sta_restart(struct ieee80211_sub_if_data *sdata)
 					      true);
 		sdata_unlock(sdata);
 		return;
+	} else if (sdata->flags & IEEE80211_SDATA_DISCONNECT_HW_RESTART) {
+		sdata->flags &= ~IEEE80211_SDATA_DISCONNECT_HW_RESTART;
+		mlme_dbg(sdata, "driver requested disconnect after hardware restart\n");
+		ieee80211_sta_connection_lost(sdata,
+					      ifmgd->associated->bssid,
+					      WLAN_REASON_UNSPECIFIED,
+					      true);
+		sdata_unlock(sdata);
+		return;
 	}
 	sdata_unlock(sdata);
 }
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 8c3c01a..98567a3 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -2567,9 +2567,12 @@  int ieee80211_reconfig(struct ieee80211_local *local)
 	}
 	mutex_unlock(&local->sta_mtx);
 
-	/* add back keys */
-	list_for_each_entry(sdata, &local->interfaces, list)
-		ieee80211_reenable_keys(sdata);
+
+	if (!(hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART)) {
+		/* add back keys */
+		list_for_each_entry(sdata, &local->interfaces, list)
+			ieee80211_reenable_keys(sdata);
+	}
 
 	/* Reconfigure sched scan if it was interrupted by FW restart */
 	mutex_lock(&local->mtx);
@@ -2643,6 +2646,19 @@  int ieee80211_reconfig(struct ieee80211_local *local)
 					IEEE80211_QUEUE_STOP_REASON_SUSPEND,
 					false);
 
+	if ((hw->wiphy->flags & WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART) &&
+	    !reconfig_due_to_wowlan) {
+		list_for_each_entry(sdata, &local->interfaces, list) {
+			if (!ieee80211_sdata_running(sdata))
+				continue;
+			if (sdata->vif.type == NL80211_IFTYPE_STATION) {
+				sdata->flags |=
+					IEEE80211_SDATA_DISCONNECT_HW_RESTART;
+				ieee80211_sta_restart(sdata);
+			}
+		}
+	}
+
 	/*
 	 * If this is for hw restart things are still running.
 	 * We may want to change that later, however.