
qcom: apr: Make apr callbacks in non-atomic context

Message ID 20181115184904.27223-1-srinivas.kandagatla@linaro.org
State Accepted
Commit 1ac19ad799f880af58b9a8a4321334f6f6fc72e6
Series qcom: apr: Make apr callbacks in non-atomic context

Commit Message

Srinivas Kandagatla Nov. 15, 2018, 6:49 p.m. UTC
APR communication with DSP is not atomic in nature.
It's a request-response type. Trying to pretend that these are atomic
and invoking apr client callbacks directly under atomic/irq context has
endless issues with the soundcard. It makes more sense to convert these
to nonatomic calls. This also converts all the dais to be nonatomic.

All the callbacks are now invoked from the rx workqueue.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>

---
 drivers/soc/qcom/apr.c | 74 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 69 insertions(+), 5 deletions(-)

-- 
2.19.1
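The mechanism described in the commit message (copy the packet in the receive callback, run the client callbacks later from a workqueue) can be modelled in plain userspace C. This is a minimal single-threaded sketch, not the driver code: `struct rx_buf`, `rx_enqueue()`, `rx_drain()` and `record_first_byte()` are hypothetical names loosely mirroring `struct apr_rx_buf`, `apr_callback()` and `apr_rxwq()` in the patch. Locking is omitted because everything runs on one thread; the patch itself protects `rx_list` with `spin_lock_irqsave()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One queued packet; the payload is copied into a flexible array,
 * as struct apr_rx_buf does in the patch. */
struct rx_buf {
	struct rx_buf *next;
	int len;
	uint8_t buf[];
};

/* FIFO of pending packets (stands in for apr->rx_list). */
struct rx_queue {
	struct rx_buf *head;
	struct rx_buf *tail;
};

/* "Atomic" receive path: no client work here, only copy + enqueue.
 * The copy lets the caller reuse its buffer as soon as we return. */
static int rx_enqueue(struct rx_queue *q, const void *data, int len)
{
	struct rx_buf *b;

	assert(len > 0);	/* short packets are rejected before this point */
	b = calloc(1, sizeof(*b) + (size_t)len);
	if (!b)
		return -1;
	b->len = len;
	memcpy(b->buf, data, (size_t)len);
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	return 0;
}

/* "Worker": runs the client callback for each packet in arrival order,
 * then frees it. In the driver this runs in process context, so the
 * callback is allowed to sleep. Returns the number of packets handled. */
static int rx_drain(struct rx_queue *q, void (*cb)(const uint8_t *buf, int len))
{
	int n = 0;

	while (q->head) {
		struct rx_buf *b = q->head;

		q->head = b->next;
		if (!q->head)
			q->tail = NULL;
		cb(b->buf, b->len);
		free(b);
		n++;
	}
	return n;
}

/* Example client callback: records the first byte of each packet. */
static uint8_t seen[8];
static int nseen;

static void record_first_byte(const uint8_t *buf, int len)
{
	if (len > 0 && nseen < 8)
		seen[nseen++] = buf[0];
}
```

Calling `rx_drain()` stands in for the work item running. Because each packet was copied at enqueue time, the client sees the original bytes even if the sender overwrites its buffer immediately after `rx_enqueue()` returns, and a single worker preserves arrival order, much as the patch's single-threaded `qcom_apr_rx` workqueue does.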

Comments

Bjorn Andersson Jan. 31, 2019, 4:05 p.m. UTC | #1
On Thu 31 Jan 02:44 PST 2019, Srinivas Kandagatla wrote:

>
> On 31/01/2019 01:16, Bjorn Andersson wrote:
> > On Thu 15 Nov 10:49 PST 2018, Srinivas Kandagatla wrote:
> >
> > > APR communication with DSP is not atomic in nature.
> > > It's a request-response type. Trying to pretend that these are atomic
> > > and invoking apr client callbacks directly under atomic/irq context has
> > > endless issues with the soundcard. It makes more sense to convert these
> > > to nonatomic calls. This also converts all the dais to be nonatomic.
> > >
> > Hi Srinivas,
> >
> > Sorry for not looking at this before.
> >
> NP, thanks for the review!
>
> > Are you sure that you're meeting the latency requirements of low-latency
> > audio with this change?
>
> Low and Ultra Low Latency audio is not supported in the existing upstream
> qdsp drivers.
>

Sure, but we want the design to allow for that still, either in future
upstream or by additional downstream code.

> Also it depends on the definition of "latency": does it refer to
> "filling the data" or to the "latency between DSP command and response"?
>

I'm referring to the latency between the message from the DSP until we
give it a new buffer.

> For the former case, as long as we have more samples in our ring buffer, there
> should be no latency in filling the data.
> For the latter case it should not really matter as long as the former case is taken
> care of.
>
> Low latency audio involves smaller sample sizes and no or minimal
> preprocessing in the DSP, so I am guessing that we should be okay with responses in
> the workqueue as long as we have a good size ring buffer.
>

Relying on more buffered data will increase the latency of the audio,
preventing you from doing really low-latency things.

Regards,
Bjorn
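Bjorn's point about buffering comes down to simple arithmetic: `frames` queued at `rate_hz` cover `frames * 1000000 / rate_hz` microseconds of audio, and that figure is both the headroom available for a late callback and extra end-to-end latency. The sketch below is illustrative only; `frames_to_us()` and `refill_slack_us()` are hypothetical helpers, and the model assumes the host must supply the next buffer before the frames still queued on the DSP run out, ignoring DSP processing time.

```c
#include <assert.h>
#include <stdint.h>

/* Playback time, in microseconds, of `frames` frames at `rate_hz`. */
static uint64_t frames_to_us(uint64_t frames, uint64_t rate_hz)
{
	assert(rate_hz > 0);
	return frames * 1000000u / rate_hz;
}

/* Hypothetical model: after the DSP reports a consumed buffer, the
 * refill must arrive before the still-queued frames drain. Returns the
 * remaining slack in microseconds; a negative value means an underrun
 * for the given delay before the client callback runs. */
static int64_t refill_slack_us(uint64_t queued_frames, uint64_t rate_hz,
			       uint64_t callback_delay_us)
{
	return (int64_t)frames_to_us(queued_frames, rate_hz) -
	       (int64_t)callback_delay_us;
}
```

For example, at 48 kHz a 240-frame period covers 5 ms, so a 6 ms delay before the callback runs leaves -1 ms of slack (an underrun), whereas a 1920-frame ring buffer hides that delay but adds 40 ms of latency. That is the trade-off the thread settles by merging now and measuring later.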
Bjorn Andersson Feb. 5, 2019, 6:35 p.m. UTC | #2
On Thu 31 Jan 09:33 PST 2019, Srinivas Kandagatla wrote:

> On 31/01/2019 16:05, Bjorn Andersson wrote:
> > Sure, but we want the design to allow for that still, either in future
> > upstream or by additional downstream code.
> >
> Yes, I agree, I don't have a solution for this ATM.
> It will be interesting to see how Intel handles this kind of usecase on
> their DSP.
>
> The whole issue is that APR messaging is not really atomic in nature; it is
> basically request->response, but in the existing code the smd/glink
> callbacks run in interrupt context.
>
> Trying to pretend that APR is atomic in nature is a problem for audio.
>
> Audio dai-links can be marked as atomic or non-atomic depending on
> which bus they link with. For example, when a dai-link has to work with
> other buses like slimbus, soundwire or i2c, whose transactions can sleep,
> we mark the audio dai-link as non-atomic, which means that its functions
> can sleep.
> In that case, invoking any audio functions as part of the rpmsg
> callback is an issue.
>
> The only solution I found to address this is to handle the callbacks in a
> workqueue.
>

Okay, I think we should merge this, and once we have the means of doing
low latency playback we can measure it and, worst case, revisit this
decision.

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>


Regards,
Bjorn

> > > Also it depends on the definition of "latency": does it refer to
> > > "filling the data" or to the "latency between DSP command and response"?
> > >
> > I'm referring to the latency between the message from the DSP until we
> > give it a new buffer.
> >
> > > For the former case, as long as we have more samples in our ring buffer, there
> > > should be no latency in filling the data.
> > > For the latter case it should not really matter as long as the former case is taken
> > > care of.
> > >
> > > Low latency audio involves smaller sample sizes and no or minimal
> > > preprocessing in the DSP, so I am guessing that we should be okay with responses in
> > > the workqueue as long as we have a good size ring buffer.
> > >
> > Relying on more buffered data will increase the latency of the audio,
> > preventing you from doing really low-latency things.
> My bad! Yes, in the low latency case we would have very few buffers!
>
> srini
>
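Srinivas's argument in the thread reduces to a small truth table: a callback running in interrupt context (where, per the thread, the smd/glink callbacks run) may only drive atomic dai-links, while one running from a workqueue, which is process context, may drive both. The userspace C model below encodes that rule; `enum call_ctx`, `struct link_model`, `ctx_may_sleep()` and `safe_to_call()` are hypothetical names for illustration, not ALSA/ASoC or rpmsg API.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the APR client callback is running. */
enum call_ctx { CTX_IRQ, CTX_WORKQUEUE };

/* A dai-link is "non-atomic" when its ops may sleep, e.g. because it
 * sits on slimbus, soundwire or i2c. */
struct link_model {
	bool nonatomic;
};

/* Sleeping is only allowed in process context, which a workqueue
 * provides and an interrupt handler does not. */
static bool ctx_may_sleep(enum call_ctx c)
{
	assert(c == CTX_IRQ || c == CTX_WORKQUEUE);
	return c == CTX_WORKQUEUE;
}

/* Can a callback running in context c safely call into this link? */
static bool safe_to_call(const struct link_model *l, enum call_ctx c)
{
	return !l->nonatomic || ctx_may_sleep(c);
}
```

Under this model, moving the callbacks from `CTX_IRQ` to `CTX_WORKQUEUE` is what makes non-atomic dai-links usable at all; the cost is the scheduling delay discussed in the latency part of the thread.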

Patch

diff --git a/drivers/soc/qcom/apr.c b/drivers/soc/qcom/apr.c
index 74f8b9607daa..8cfa825fce81 100644
--- a/drivers/soc/qcom/apr.c
+++ b/drivers/soc/qcom/apr.c
@@ -8,6 +8,7 @@ 
 #include <linux/spinlock.h>
 #include <linux/idr.h>
 #include <linux/slab.h>
+#include <linux/workqueue.h>
 #include <linux/of_device.h>
 #include <linux/soc/qcom/apr.h>
 #include <linux/rpmsg.h>
@@ -17,8 +18,18 @@  struct apr {
 	struct rpmsg_endpoint *ch;
 	struct device *dev;
 	spinlock_t svcs_lock;
+	spinlock_t rx_lock;
 	struct idr svcs_idr;
 	int dest_domain_id;
+	struct workqueue_struct *rxwq;
+	struct work_struct rx_work;
+	struct list_head rx_list;
+};
+
+struct apr_rx_buf {
+	struct list_head node;
+	int len;
+	uint8_t buf[];
 };
 
 /**
@@ -62,11 +73,7 @@  static int apr_callback(struct rpmsg_device *rpdev, void *buf,
 				  int len, void *priv, u32 addr)
 {
 	struct apr *apr = dev_get_drvdata(&rpdev->dev);
-	uint16_t hdr_size, msg_type, ver, svc_id;
-	struct apr_device *svc = NULL;
-	struct apr_driver *adrv = NULL;
-	struct apr_resp_pkt resp;
-	struct apr_hdr *hdr;
+	struct apr_rx_buf *abuf;
 	unsigned long flags;
 
 	if (len <= APR_HDR_SIZE) {
@@ -75,6 +82,34 @@  static int apr_callback(struct rpmsg_device *rpdev, void *buf,
 		return -EINVAL;
 	}
 
+	abuf = kzalloc(sizeof(*abuf) + len, GFP_ATOMIC);
+	if (!abuf)
+		return -ENOMEM;
+
+	abuf->len = len;
+	memcpy(abuf->buf, buf, len);
+
+	spin_lock_irqsave(&apr->rx_lock, flags);
+	list_add_tail(&abuf->node, &apr->rx_list);
+	spin_unlock_irqrestore(&apr->rx_lock, flags);
+
+	queue_work(apr->rxwq, &apr->rx_work);
+
+	return 0;
+}
+
+
+static int apr_do_rx_callback(struct apr *apr, struct apr_rx_buf *abuf)
+{
+	uint16_t hdr_size, msg_type, ver, svc_id;
+	struct apr_device *svc = NULL;
+	struct apr_driver *adrv = NULL;
+	struct apr_resp_pkt resp;
+	struct apr_hdr *hdr;
+	unsigned long flags;
+	void *buf = abuf->buf;
+	int len = abuf->len;
+
 	hdr = buf;
 	ver = APR_HDR_FIELD_VER(hdr->hdr_field);
 	if (ver > APR_PKT_VER + 1)
@@ -132,6 +167,23 @@  static int apr_callback(struct rpmsg_device *rpdev, void *buf,
 	return 0;
 }
 
+static void apr_rxwq(struct work_struct *work)
+{
+	struct apr *apr = container_of(work, struct apr, rx_work);
+	struct apr_rx_buf *abuf, *b;
+	unsigned long flags;
+
+	if (!list_empty(&apr->rx_list)) {
+		list_for_each_entry_safe(abuf, b, &apr->rx_list, node) {
+			apr_do_rx_callback(apr, abuf);
+			spin_lock_irqsave(&apr->rx_lock, flags);
+			list_del(&abuf->node);
+			spin_unlock_irqrestore(&apr->rx_lock, flags);
+			kfree(abuf);
+		}
+	}
+}
+
 static int apr_device_match(struct device *dev, struct device_driver *drv)
 {
 	struct apr_device *adev = to_apr_device(dev);
@@ -285,6 +337,14 @@  static int apr_probe(struct rpmsg_device *rpdev)
 	dev_set_drvdata(dev, apr);
 	apr->ch = rpdev->ept;
 	apr->dev = dev;
+	apr->rxwq = create_singlethread_workqueue("qcom_apr_rx");
+	if (!apr->rxwq) {
+		dev_err(apr->dev, "Failed to start Rx WQ\n");
+		return -ENOMEM;
+	}
+	INIT_WORK(&apr->rx_work, apr_rxwq);
+	INIT_LIST_HEAD(&apr->rx_list);
+	spin_lock_init(&apr->rx_lock);
 	spin_lock_init(&apr->svcs_lock);
 	idr_init(&apr->svcs_idr);
 	of_register_apr_devices(dev);
@@ -303,6 +363,10 @@  static int apr_remove_device(struct device *dev, void *null)
 
 static void apr_remove(struct rpmsg_device *rpdev)
 {
+	struct apr *apr = dev_get_drvdata(&rpdev->dev);
+
+	flush_workqueue(apr->rxwq);
+	destroy_workqueue(apr->rxwq);
 	device_for_each_child(&rpdev->dev, NULL, apr_remove_device);
 }