Message ID | 20210812195035.2816276-5-elder@linaro.org |
---|---|
State | New |
Headers | show |
Series | net: ipa: last things before PM conversion | expand |
On Thu, 12 Aug 2021 14:50:33 -0500 Alex Elder wrote: > + /* The hardware must be powered for us to transmit */ > + dev = &ipa->pdev->dev; > + ret = pm_runtime_get(dev); > + if (ret < 1) { > + /* If a resume won't happen, just drop the packet */ > + if (ret < 0 && ret != -EINPROGRESS) { > + pm_runtime_put_noidle(dev); > + goto err_drop_skb; > + } This is racy, what if the pm work gets scheduled on another CPU and calls wake right here (i.e. before you call netif_stop_queue())? The queue may never get woken up? > + /* No power (yet). Stop the network stack from transmitting > + * until we're resumed; ipa_modem_resume() arranges for the > + * TX queue to be started again. > + */ > + netif_stop_queue(netdev); > + > + (void)pm_runtime_put(dev); > + > + return NETDEV_TX_BUSY;
On 8/13/21 7:46 PM, Jakub Kicinski wrote: > On Thu, 12 Aug 2021 14:50:33 -0500 Alex Elder wrote: >> + /* The hardware must be powered for us to transmit */ >> + dev = &ipa->pdev->dev; >> + ret = pm_runtime_get(dev); >> + if (ret < 1) { >> + /* If a resume won't happen, just drop the packet */ >> + if (ret < 0 && ret != -EINPROGRESS) { >> + pm_runtime_put_noidle(dev); >> + goto err_drop_skb; >> + } > > This is racy, what if the pm work gets scheduled on another CPU and > calls wake right here (i.e. before you call netif_stop_queue())? > The queue may never get woken up? I haven't been seeing this happen but I think you may be right. I did think about this race, but I think I was relying on the PM work queue to somehow avoid the problem. I need to think about this again after a good night's sleep. I might need to add an atomic flag or something. -Alex >> + /* No power (yet). Stop the network stack from transmitting >> + * until we're resumed; ipa_modem_resume() arranges for the >> + * TX queue to be started again. >> + */ >> + netif_stop_queue(netdev); >> + >> + (void)pm_runtime_put(dev); >> + >> + return NETDEV_TX_BUSY;
On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote: > > This is racy, what if the pm work gets scheduled on another CPU and > > calls wake right here (i.e. before you call netif_stop_queue())? > > The queue may never get woken up? > > I haven't been seeing this happen but I think you may be right. > > I did think about this race, but I think I was relying on the > PM work queue to somehow avoid the problem. I need to think > about this again after a good night's sleep. I might need > to add an atomic flag or something. Maybe add a spin lock? Seems like the whole wake up path will be expensive enough for a spin lock to be in the noise. You can always add complexity later.
On 8/16/21 9:15 AM, Jakub Kicinski wrote: > On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote: >>> This is racy, what if the pm work gets scheduled on another CPU and >>> calls wake right here (i.e. before you call netif_stop_queue())? >>> The queue may never get woken up? >> >> I haven't been seeing this happen but I think you may be right. >> >> I did think about this race, but I think I was relying on the >> PM work queue to somehow avoid the problem. I need to think >> about this again after a good night's sleep. I might need >> to add an atomic flag or something. > > Maybe add a spin lock? Seems like the whole wake up path will be > expensive enough for a spin lock to be in the noise. You can always > add complexity later. Exactly what I just decided after trying to work out a clever way without using a spinlock... I'll be sending out a fix today. Thanks. -Alex
On 8/16/21 9:20 AM, Alex Elder wrote: > On 8/16/21 9:15 AM, Jakub Kicinski wrote: >> On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote: >>>> This is racy, what if the pm work gets scheduled on another CPU and >>>> calls wake right here (i.e. before you call netif_stop_queue())? >>>> The queue may never get woken up? >>> >>> I haven't been seeing this happen but I think you may be right. >>> >>> I did think about this race, but I think I was relying on the >>> PM work queue to somehow avoid the problem. I need to think >>> about this again after a good night's sleep. I might need >>> to add an atomic flag or something. >> >> Maybe add a spin lock? Seems like the whole wake up path will be >> expensive enough for a spin lock to be in the noise. You can always >> add complexity later. > > Exactly what I just decided after trying to work out a > clever way without using a spinlock... I'll be sending > out a fix today. Thanks. I'm finding this isn't an easy problem to solve (or even think about). While I ponder the best course of action I'm going to send out another series (i.e., *before* I send a fix for this issue) because I'd like to get everything I have out for review this week. I *will* address this potential race one way or another, possibly later today. -Alex > > -Alex >
On Mon, 16 Aug 2021 12:56:40 -0500 Alex Elder wrote: > I'm finding this isn't an easy problem to solve (or even think > about). While I ponder the best course of action I'm going > to send out another series (i.e., *before* I send a fix for > this issue) because I'd like to get everything I have out > for review this week. I *will* address this potential race > one way or another, possibly later today. Alright, thanks for the heads up.
diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c index 0a3b034614b61..aa1b483d9f7db 100644 --- a/drivers/net/ipa/ipa_modem.c +++ b/drivers/net/ipa/ipa_modem.c @@ -106,6 +106,7 @@ static int ipa_start_xmit(struct sk_buff *skb, struct net_device *netdev) struct ipa_endpoint *endpoint; struct ipa *ipa = priv->ipa; u32 skb_len = skb->len; + struct device *dev; int ret; if (!skb_len) @@ -115,7 +116,31 @@ static int ipa_start_xmit(struct sk_buff *skb, struct net_device *netdev) if (endpoint->data->qmap && skb->protocol != htons(ETH_P_MAP)) goto err_drop_skb; + /* The hardware must be powered for us to transmit */ + dev = &ipa->pdev->dev; + ret = pm_runtime_get(dev); + if (ret < 1) { + /* If a resume won't happen, just drop the packet */ + if (ret < 0 && ret != -EINPROGRESS) { + pm_runtime_put_noidle(dev); + goto err_drop_skb; + } + + /* No power (yet). Stop the network stack from transmitting + * until we're resumed; ipa_modem_resume() arranges for the + * TX queue to be started again. + */ + netif_stop_queue(netdev); + + (void)pm_runtime_put(dev); + + return NETDEV_TX_BUSY; + } + ret = ipa_endpoint_skb_tx(endpoint, skb); + + (void)pm_runtime_put(dev); + if (ret) { if (ret != -E2BIG) return NETDEV_TX_BUSY; @@ -201,7 +226,10 @@ void ipa_modem_suspend(struct net_device *netdev) * * Re-enable transmit on the modem network device. This is called * in (power management) work queue context, scheduled when resuming - * the modem. + * the modem. We can't enable the queue directly in ipa_modem_resume() + * because transmits restart the instant the queue is awakened; but the + * device power state won't be ACTIVE until *after* ipa_modem_resume() + * returns. */ static void ipa_modem_wake_queue_work(struct work_struct *work) {
We need to ensure the hardware is powered when we transmit a packet. But if it's not, we can't block to wait for it. So asynchronously request power in ipa_start_xmit(), and only proceed if the return value indicates the power state is active. If the hardware is not active, a runtime resume request will have been initiated. In that case, stop the network stack from further transmit attempts until the resume completes. Return NETDEV_TX_BUSY, to retry sending the packet once the queue is restarted. If the power request returns an error (other than -EINPROGRESS, which just means a resume requested elsewhere isn't complete), just drop the packet. Signed-off-by: Alex Elder <elder@linaro.org> --- drivers/net/ipa/ipa_modem.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) -- 2.27.0