diff mbox series

[net-next,4/6] net: ipa: ensure hardware has power in ipa_start_xmit()

Message ID 20210812195035.2816276-5-elder@linaro.org
State New
Headers show
Series net: ipa: last things before PM conversion | expand

Commit Message

Alex Elder Aug. 12, 2021, 7:50 p.m. UTC
We need to ensure the hardware is powered when we transmit a packet.
But if it's not, we can't block to wait for it.  So asynchronously
request power in ipa_start_xmit(), and only proceed if the return
value indicates the power state is active.

If the hardware is not active, a runtime resume request will have
been initiated.  In that case, stop the network stack from further
transmit attempts until the resume completes.  Return NETDEV_TX_BUSY,
to retry sending the packet once the queue is restarted.

If the power request returns an error (other than -EINPROGRESS,
which just means a resume requested elsewhere isn't complete), just
drop the packet.

Signed-off-by: Alex Elder <elder@linaro.org>

---
 drivers/net/ipa/ipa_modem.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

-- 
2.27.0

Comments

Jakub Kicinski Aug. 14, 2021, 12:46 a.m. UTC | #1
On Thu, 12 Aug 2021 14:50:33 -0500 Alex Elder wrote:
> +	/* The hardware must be powered for us to transmit */

> +	dev = &ipa->pdev->dev;

> +	ret = pm_runtime_get(dev);

> +	if (ret < 1) {

> +		/* If a resume won't happen, just drop the packet */

> +		if (ret < 0 && ret != -EINPROGRESS) {

> +			pm_runtime_put_noidle(dev);

> +			goto err_drop_skb;

> +		}


This is racy, what if the pm work gets scheduled on another CPU and
calls wake right here (i.e. before you call netif_stop_queue())?
The queue may never get woken up?

> +		/* No power (yet).  Stop the network stack from transmitting

> +		 * until we're resumed; ipa_modem_resume() arranges for the

> +		 * TX queue to be started again.

> +		 */

> +		netif_stop_queue(netdev);

> +

> +		(void)pm_runtime_put(dev);

> +

> +		return NETDEV_TX_BUSY;
Alex Elder Aug. 14, 2021, 2:25 a.m. UTC | #2
On 8/13/21 7:46 PM, Jakub Kicinski wrote:
> On Thu, 12 Aug 2021 14:50:33 -0500 Alex Elder wrote:

>> +	/* The hardware must be powered for us to transmit */

>> +	dev = &ipa->pdev->dev;

>> +	ret = pm_runtime_get(dev);

>> +	if (ret < 1) {

>> +		/* If a resume won't happen, just drop the packet */

>> +		if (ret < 0 && ret != -EINPROGRESS) {

>> +			pm_runtime_put_noidle(dev);

>> +			goto err_drop_skb;

>> +		}

> 

> This is racy, what if the pm work gets scheduled on another CPU and

> calls wake right here (i.e. before you call netif_stop_queue())?

> The queue may never get woken up?


I haven't been seeing this happen but I think you may be right.

I did think about this race, but I think I was relying on the
PM work queue to somehow avoid the problem.  I need to think
about this again after a good night's sleep.  I might need
to add an atomic flag or something.

					-Alex

>> +		/* No power (yet).  Stop the network stack from transmitting

>> +		 * until we're resumed; ipa_modem_resume() arranges for the

>> +		 * TX queue to be started again.

>> +		 */

>> +		netif_stop_queue(netdev);

>> +

>> +		(void)pm_runtime_put(dev);

>> +

>> +		return NETDEV_TX_BUSY;
Jakub Kicinski Aug. 16, 2021, 2:15 p.m. UTC | #3
On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote:
> > This is racy, what if the pm work gets scheduled on another CPU and

> > calls wake right here (i.e. before you call netif_stop_queue())?

> > The queue may never get woken up?  

> 

> I haven't been seeing this happen but I think you may be right.

> 

> I did think about this race, but I think I was relying on the

> PM work queue to somehow avoid the problem.  I need to think

> about this again after a good night's sleep.  I might need

> to add an atomic flag or something.


Maybe add a spin lock?  Seems like the whole wake up path will be
expensive enough for a spin lock to be in the noise. You can always 
add complexity later.
Alex Elder Aug. 16, 2021, 2:20 p.m. UTC | #4
On 8/16/21 9:15 AM, Jakub Kicinski wrote:
> On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote:

>>> This is racy, what if the pm work gets scheduled on another CPU and

>>> calls wake right here (i.e. before you call netif_stop_queue())?

>>> The queue may never get woken up?

>>

>> I haven't been seeing this happen but I think you may be right.

>>

>> I did think about this race, but I think I was relying on the

>> PM work queue to somehow avoid the problem.  I need to think

>> about this again after a good night's sleep.  I might need

>> to add an atomic flag or something.

> 

> Maybe add a spin lock?  Seems like the whole wake up path will be

> expensive enough for a spin lock to be in the noise. You can always

> add complexity later.


Exactly what I just decided after trying to work out a
clever way without using a spinlock...  I'll be sending
out a fix today.  Thanks.

					-Alex
Alex Elder Aug. 16, 2021, 5:56 p.m. UTC | #5
On 8/16/21 9:20 AM, Alex Elder wrote:
> On 8/16/21 9:15 AM, Jakub Kicinski wrote:

>> On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote:

>>>> This is racy, what if the pm work gets scheduled on another CPU and

>>>> calls wake right here (i.e. before you call netif_stop_queue())?

>>>> The queue may never get woken up?

>>>

>>> I haven't been seeing this happen but I think you may be right.

>>>

>>> I did think about this race, but I think I was relying on the

>>> PM work queue to somehow avoid the problem.  I need to think

>>> about this again after a good night's sleep.  I might need

>>> to add an atomic flag or something.

>>

>> Maybe add a spin lock?  Seems like the whole wake up path will be

>> expensive enough for a spin lock to be in the noise. You can always

>> add complexity later.

> 

> Exactly what I just decided after trying to work out a

> clever way without using a spinlock...  I'll be sending

> out a fix today.  Thanks.


I'm finding this isn't an easy problem to solve (or even think
about).  While I ponder the best course of action I'm going
to send out another series (i.e., *before* I send a fix for
this issue) because I'd like to get everything I have out
for review this week.  I *will* address this potential race
one way or another, possibly later today.

					-Alex

> 

>                      -Alex

>
Jakub Kicinski Aug. 16, 2021, 8:19 p.m. UTC | #6
On Mon, 16 Aug 2021 12:56:40 -0500 Alex Elder wrote:
> I'm finding this isn't an easy problem to solve (or even think

> about).  While I ponder the best course of action I'm going

> to send out another series (i.e., *before* I send a fix for

> this issue) because I'd like to get everything I have out

> for review this week.  I *will* address this potential race

> one way or another, possibly later today.


Alright, thanks for the heads up.
diff mbox series

Patch

diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
index 0a3b034614b61..aa1b483d9f7db 100644
--- a/drivers/net/ipa/ipa_modem.c
+++ b/drivers/net/ipa/ipa_modem.c
@@ -106,6 +106,7 @@  static int ipa_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	struct ipa_endpoint *endpoint;
 	struct ipa *ipa = priv->ipa;
 	u32 skb_len = skb->len;
+	struct device *dev;
 	int ret;
 
 	if (!skb_len)
@@ -115,7 +116,31 @@  static int ipa_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	if (endpoint->data->qmap && skb->protocol != htons(ETH_P_MAP))
 		goto err_drop_skb;
 
+	/* The hardware must be powered for us to transmit */
+	dev = &ipa->pdev->dev;
+	ret = pm_runtime_get(dev);
+	if (ret < 1) {
+		/* If a resume won't happen, just drop the packet */
+		if (ret < 0 && ret != -EINPROGRESS) {
+			pm_runtime_put_noidle(dev);
+			goto err_drop_skb;
+		}
+
+		/* No power (yet).  Stop the network stack from transmitting
+		 * until we're resumed; ipa_modem_resume() arranges for the
+		 * TX queue to be started again.
+		 */
+		netif_stop_queue(netdev);
+
+		(void)pm_runtime_put(dev);
+
+		return NETDEV_TX_BUSY;
+	}
+
 	ret = ipa_endpoint_skb_tx(endpoint, skb);
+
+	(void)pm_runtime_put(dev);
+
 	if (ret) {
 		if (ret != -E2BIG)
 			return NETDEV_TX_BUSY;
@@ -201,7 +226,10 @@  void ipa_modem_suspend(struct net_device *netdev)
  *
  * Re-enable transmit on the modem network device.  This is called
  * in (power management) work queue context, scheduled when resuming
- * the modem.
+ * the modem.  We can't enable the queue directly in ipa_modem_resume()
+ * because transmits restart the instant the queue is awakened; but the
+ * device power state won't be ACTIVE until *after* ipa_modem_resume()
+ * returns.
  */
 static void ipa_modem_wake_queue_work(struct work_struct *work)
 {