diff mbox series

usb: typec: ucsi: Clear pending after acking connector change

Message ID 20210516040953.622409-1-bjorn.andersson@linaro.org
State New
Headers show
Series usb: typec: ucsi: Clear pending after acking connector change | expand

Commit Message

Bjorn Andersson May 16, 2021, 4:09 a.m. UTC
It's possible that the interrupt handler for the UCSI driver signals a
connector changes after the handler clears the PENDING bit, but before
it has sent the acknowledge request. The result is that the handler is
invoked yet again, to ack the same connector change.

At least some versions of the Qualcomm UCSI firmware will not handle the
second - "spurious" - acknowledgment gracefully. So make sure to not
clear the pending flag until the change is acknowledged.

Any connector changes coming in after the acknowledgment, that would
have the pending flag incorrectly cleared, would afaict be covered by
the subsequent connector status check.

Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

---
 drivers/usb/typec/ucsi/ucsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.29.2

Comments

Heikki Krogerus May 17, 2021, 10:03 a.m. UTC | #1
On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote:
> It's possible that the interrupt handler for the UCSI driver signals a

> connector changes after the handler clears the PENDING bit, but before

> it has sent the acknowledge request. The result is that the handler is

> invoked yet again, to ack the same connector change.

> 

> At least some versions of the Qualcomm UCSI firmware will not handle the

> second - "spurious" - acknowledgment gracefully. So make sure to not

> clear the pending flag until the change is acknowledged.

> 

> Any connector changes coming in after the acknowledgment, that would

> have the pending flag incorrectly cleared, would afaict be covered by

> the subsequent connector status check.

> 

> Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information")

> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>


I'm OK with this if Bejamin does not see any problems with it. I'll
wait for his comments before giving my reviewed-by tag.

That workaround (commit 217504a05532) is unfortunately too fragile.
I'm going to now separate the processing of the connector state from
the event handler (interrupt handler). That way we should be fairly
sure we don't loose any of the connector states even if an event is
generated while we are still in the middle of processing the previous
one(s), and at the same time be sure that we also don't confuse the
firmware.

So the event handler shall after that only read the connector status,
schedule the unique job where it's processed and ACK the event.
Nothing else.

> ---

>  drivers/usb/typec/ucsi/ucsi.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c

> index 282c3c825c13..f451ce0132a9 100644

> --- a/drivers/usb/typec/ucsi/ucsi.c

> +++ b/drivers/usb/typec/ucsi/ucsi.c

> @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct work_struct *work)

>  	ucsi_send_command(con->ucsi, command, NULL, 0);

>  

>  	/* 3. ACK connector change */

> -	clear_bit(EVENT_PENDING, &ucsi->flags);

>  	ret = ucsi_acknowledge_connector_change(ucsi);

> +	clear_bit(EVENT_PENDING, &ucsi->flags);

>  	if (ret) {

>  		dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);

>  		goto out_unlock;

> -- 

> 2.29.2


thanks,

-- 
heikki
Heikki Krogerus May 17, 2021, 12:15 p.m. UTC | #2
Hi,

On Mon, May 17, 2021 at 01:03:12PM +0300, Heikki Krogerus wrote:
> On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote:

> > It's possible that the interrupt handler for the UCSI driver signals a

> > connector changes after the handler clears the PENDING bit, but before

> > it has sent the acknowledge request. The result is that the handler is

> > invoked yet again, to ack the same connector change.

> > 

> > At least some versions of the Qualcomm UCSI firmware will not handle the

> > second - "spurious" - acknowledgment gracefully. So make sure to not

> > clear the pending flag until the change is acknowledged.

> > 

> > Any connector changes coming in after the acknowledgment, that would

> > have the pending flag incorrectly cleared, would afaict be covered by

> > the subsequent connector status check.

> > 

> > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information")

> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

> 

> I'm OK with this if Bejamin does not see any problems with it. I'll

> wait for his comments before giving my reviewed-by tag.

> 

> That workaround (commit 217504a05532) is unfortunately too fragile.

> I'm going to now separate the processing of the connector state from

> the event handler (interrupt handler). That way we should be fairly

> sure we don't loose any of the connector states even if an event is

> generated while we are still in the middle of processing the previous

> one(s), and at the same time be sure that we also don't confuse the

> firmware.

> 

> So the event handler shall after that only read the connector status,

> schedule the unique job where it's processed and ACK the event.

> Nothing else.


Seems to be straightforward to implement. I'm attaching the patch I
made for that. I think it should actually also remove the problem you
are seeing. Can you test it?

thanks,

-- 
heikki
From d3d2e0cbd7a278e235050294af25259aabb2c7c2 Mon Sep 17 00:00:00 2001
From: Heikki Krogerus <heikki.krogerus@linux.intel.com>

Date: Mon, 17 May 2021 14:58:22 +0300
Subject: [PATCH] usb: typec: ucsi: Process every connector change as unique
 connector state

This will change the Connector Change event handler function
so that it will only read the connector status and store it
as a unique state, queue a job where it's actually
processed, and then acknowledge the event immediately. That
routine will not do anything else from now on.

That will make sure we don't loose any of the reported
connector states even if they are reported while the driver
is still processing the previous ones.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>

---
 drivers/usb/typec/ucsi/ucsi.c | 197 ++++++++++++++--------------------
 drivers/usb/typec/ucsi/ucsi.h |   3 +-
 2 files changed, 82 insertions(+), 118 deletions(-)

diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index 1d8b7df59ff49..1dac2e6325870 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -49,11 +49,16 @@ static int ucsi_acknowledge_command(struct ucsi *ucsi)
 static int ucsi_acknowledge_connector_change(struct ucsi *ucsi)
 {
 	u64 ctrl;
+	int ret;
 
 	ctrl = UCSI_ACK_CC_CI;
 	ctrl |= UCSI_ACK_CONNECTOR_CHANGE;
 
-	return ucsi->ops->sync_write(ucsi, UCSI_CONTROL, &ctrl, sizeof(ctrl));
+	mutex_lock(&ucsi->ppm_lock);
+	ret = ucsi->ops->sync_write(ucsi, UCSI_CONTROL, &ctrl, sizeof(ctrl));
+	mutex_unlock(&ucsi->ppm_lock);
+
+	return ret;
 }
 
 static int ucsi_exec_command(struct ucsi *ucsi, u64 command);
@@ -656,118 +661,26 @@ static void ucsi_partner_change(struct ucsi_connector *con)
 		ucsi_altmode_update_active(con);
 }
 
-static void ucsi_handle_connector_change(struct work_struct *work)
+struct ucsi_con_event {
+	struct work_struct work;
+	struct ucsi_connector *con;
+	struct ucsi_connector_status status;
+};
+
+static void ucsi_connector_work(struct work_struct *work)
 {
-	struct ucsi_connector *con = container_of(work, struct ucsi_connector,
-						  work);
+	struct ucsi_con_event *event = container_of(work, struct ucsi_con_event, work);
+	struct ucsi_connector *con = event->con;
 	struct ucsi *ucsi = con->ucsi;
-	struct ucsi_connector_status pre_ack_status;
-	struct ucsi_connector_status post_ack_status;
 	enum typec_role role;
 	enum usb_role u_role = USB_ROLE_NONE;
-	u16 inferred_changes;
-	u16 changed_flags;
-	u64 command;
 	int ret;
 
 	mutex_lock(&con->lock);
 
-	/*
-	 * Some/many PPMs have an issue where all fields in the change bitfield
-	 * are cleared when an ACK is send. This will causes any change
-	 * between GET_CONNECTOR_STATUS and ACK to be lost.
-	 *
-	 * We work around this by re-fetching the connector status afterwards.
-	 * We then infer any changes that we see have happened but that may not
-	 * be represented in the change bitfield.
-	 *
-	 * Also, even though we don't need to know the currently supported alt
-	 * modes, we run the GET_CAM_SUPPORTED command to ensure the PPM does
-	 * not get stuck in case it assumes we do.
-	 * Always do this, rather than relying on UCSI_CONSTAT_CAM_CHANGE to be
-	 * set in the change bitfield.
-	 *
-	 * We end up with the following actions:
-	 *  1. UCSI_GET_CONNECTOR_STATUS, store result, update unprocessed_changes
-	 *  2. UCSI_GET_CAM_SUPPORTED, discard result
-	 *  3. ACK connector change
-	 *  4. UCSI_GET_CONNECTOR_STATUS, store result
-	 *  5. Infere lost changes by comparing UCSI_GET_CONNECTOR_STATUS results
-	 *  6. If PPM reported a new change, then restart in order to ACK
-	 *  7. Process everything as usual.
-	 *
-	 * We may end up seeing a change twice, but we can only miss extremely
-	 * short transitional changes.
-	 */
-
-	/* 1. First UCSI_GET_CONNECTOR_STATUS */
-	command = UCSI_GET_CONNECTOR_STATUS | UCSI_CONNECTOR_NUMBER(con->num);
-	ret = ucsi_send_command(ucsi, command, &pre_ack_status,
-				sizeof(pre_ack_status));
-	if (ret < 0) {
-		dev_err(ucsi->dev, "%s: GET_CONNECTOR_STATUS failed (%d)\n",
-			__func__, ret);
-		goto out_unlock;
-	}
-	con->unprocessed_changes |= pre_ack_status.change;
-
-	/* 2. Run UCSI_GET_CAM_SUPPORTED and discard the result. */
-	command = UCSI_GET_CAM_SUPPORTED;
-	command |= UCSI_CONNECTOR_NUMBER(con->num);
-	ucsi_send_command(con->ucsi, command, NULL, 0);
-
-	/* 3. ACK connector change */
-	clear_bit(EVENT_PENDING, &ucsi->flags);
-	ret = ucsi_acknowledge_connector_change(ucsi);
-	if (ret) {
-		dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);
-		goto out_unlock;
-	}
-
-	/* 4. Second UCSI_GET_CONNECTOR_STATUS */
-	command = UCSI_GET_CONNECTOR_STATUS | UCSI_CONNECTOR_NUMBER(con->num);
-	ret = ucsi_send_command(ucsi, command, &post_ack_status,
-				sizeof(post_ack_status));
-	if (ret < 0) {
-		dev_err(ucsi->dev, "%s: GET_CONNECTOR_STATUS failed (%d)\n",
-			__func__, ret);
-		goto out_unlock;
-	}
-
-	/* 5. Inferre any missing changes */
-	changed_flags = pre_ack_status.flags ^ post_ack_status.flags;
-	inferred_changes = 0;
-	if (UCSI_CONSTAT_PWR_OPMODE(changed_flags) != 0)
-		inferred_changes |= UCSI_CONSTAT_POWER_OPMODE_CHANGE;
-
-	if (changed_flags & UCSI_CONSTAT_CONNECTED)
-		inferred_changes |= UCSI_CONSTAT_CONNECT_CHANGE;
-
-	if (changed_flags & UCSI_CONSTAT_PWR_DIR)
-		inferred_changes |= UCSI_CONSTAT_POWER_DIR_CHANGE;
-
-	if (UCSI_CONSTAT_PARTNER_FLAGS(changed_flags) != 0)
-		inferred_changes |= UCSI_CONSTAT_PARTNER_CHANGE;
-
-	if (UCSI_CONSTAT_PARTNER_TYPE(changed_flags) != 0)
-		inferred_changes |= UCSI_CONSTAT_PARTNER_CHANGE;
-
-	/* Mask out anything that was correctly notified in the later call. */
-	inferred_changes &= ~post_ack_status.change;
-	if (inferred_changes)
-		dev_dbg(ucsi->dev, "%s: Inferred changes that would have been lost: 0x%04x\n",
-			__func__, inferred_changes);
-
-	con->unprocessed_changes |= inferred_changes;
-
-	/* 6. If PPM reported a new change, then restart in order to ACK */
-	if (post_ack_status.change)
-		goto out_unlock;
-
-	/* 7. Continue as if nothing happened */
-	con->status = post_ack_status;
-	con->status.change = con->unprocessed_changes;
-	con->unprocessed_changes = 0;
+	trace_ucsi_connector_change(con->num, &event->status);
+	con->status = event->status;
+	kfree(event);
 
 	role = !!(con->status.flags & UCSI_CONSTAT_PWR_DIR);
 
@@ -825,17 +738,49 @@ static void ucsi_handle_connector_change(struct work_struct *work)
 	if (con->status.change & UCSI_CONSTAT_PARTNER_CHANGE)
 		ucsi_partner_change(con);
 
-	trace_ucsi_connector_change(con->num, &con->status);
+	mutex_unlock(&con->lock);
+}
 
-out_unlock:
-	if (test_and_clear_bit(EVENT_PENDING, &ucsi->flags)) {
-		schedule_work(&con->work);
-		mutex_unlock(&con->lock);
-		return;
+/*
+ * We can not read the connector status in ucsi_connector_change() function
+ * below because there may be already a command pending. This work is scheduled
+ * separately only because of that.
+ *
+ * This function must finish fast so we do not loose the next events. Every
+ * event will have a separate job queued for it in the connector specific
+ * workqueue. That way the next event can be generated safely before the
+ * previous ones are fully processed.
+ */
+static void ucsi_handle_connector_change(struct work_struct *work)
+{
+	struct ucsi_connector *con = container_of(work, struct ucsi_connector, work);
+	struct ucsi_connector_status status;
+	struct ucsi_con_event *event;
+	u64 command;
+	int ret;
+
+	command = UCSI_GET_CONNECTOR_STATUS | UCSI_CONNECTOR_NUMBER(con->num);
+	ret = ucsi_send_command(con->ucsi, command, &status, sizeof(status));
+	if (ret < 0) {
+		dev_err(con->ucsi->dev, "GET_CONNECTOR_STATUS failed (%d)\n", ret);
+		goto out_ack;
 	}
 
-	clear_bit(EVENT_PROCESSING, &ucsi->flags);
-	mutex_unlock(&con->lock);
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		goto out_ack;
+
+	INIT_WORK(&event->work, ucsi_connector_work);
+	event->status = status;
+	event->con = con;
+	queue_work(con->wq, &event->work);
+
+out_ack:
+	clear_bit(EVENT_PENDING, &con->ucsi->flags);
+
+	ret = ucsi_acknowledge_connector_change(con->ucsi);
+	if (ret)
+		dev_err(con->ucsi->dev, "%s: ACK failed (%d)", __func__, ret);
 }
 
 /**
@@ -852,10 +797,10 @@ void ucsi_connector_change(struct ucsi *ucsi, u8 num)
 		return;
 	}
 
-	set_bit(EVENT_PENDING, &ucsi->flags);
+	if (test_and_set_bit(EVENT_PENDING, &ucsi->flags))
+		return;
 
-	if (!test_and_set_bit(EVENT_PROCESSING, &ucsi->flags))
-		schedule_work(&con->work);
+	schedule_work(&con->work);
 }
 EXPORT_SYMBOL_GPL(ucsi_connector_change);
 
@@ -1041,8 +986,18 @@ static int ucsi_register_port(struct ucsi *ucsi, int index)
 	enum typec_accessory *accessory = cap->accessory;
 	enum usb_role u_role = USB_ROLE_NONE;
 	u64 command;
+	char *name;
 	int ret;
 
+	name = kasprintf(GFP_KERNEL, "%s-port%d", dev_name(ucsi->dev), index);
+	if (!name)
+		return -ENOMEM;
+
+	con->wq = create_singlethread_workqueue(name);
+	kfree(name);
+	if (!con->wq)
+		return -ENOMEM;
+
 	INIT_WORK(&con->work, ucsi_handle_connector_change);
 	init_completion(&con->complete);
 	mutex_init(&con->lock);
@@ -1178,6 +1133,12 @@ static int ucsi_register_port(struct ucsi *ucsi, int index)
 	fwnode_handle_put(cap->fwnode);
 out_unlock:
 	mutex_unlock(&con->lock);
+
+	if (ret && con->wq) {
+		destroy_workqueue(con->wq);
+		con->wq = NULL;
+	}
+
 	return ret;
 }
 
@@ -1248,6 +1209,8 @@ static int ucsi_init(struct ucsi *ucsi)
 		ucsi_unregister_partner(con);
 		ucsi_unregister_altmodes(con, UCSI_RECIPIENT_CON);
 		ucsi_unregister_port_psy(con);
+		if (con->wq)
+			destroy_workqueue(con->wq);
 		typec_unregister_port(con->port);
 		con->port = NULL;
 	}
@@ -1369,6 +1332,8 @@ void ucsi_unregister(struct ucsi *ucsi)
 		ucsi_unregister_altmodes(&ucsi->connector[i],
 					 UCSI_RECIPIENT_CON);
 		ucsi_unregister_port_psy(&ucsi->connector[i]);
+		if (ucsi->connector[i].wq)
+			destroy_workqueue(ucsi->connector[i].wq);
 		typec_unregister_port(ucsi->connector[i].port);
 	}
 
diff --git a/drivers/usb/typec/ucsi/ucsi.h b/drivers/usb/typec/ucsi/ucsi.h
index cee666790907e..280f1e1bda2c9 100644
--- a/drivers/usb/typec/ucsi/ucsi.h
+++ b/drivers/usb/typec/ucsi/ucsi.h
@@ -300,7 +300,6 @@ struct ucsi {
 #define EVENT_PENDING	0
 #define COMMAND_PENDING	1
 #define ACK_PENDING	2
-#define EVENT_PROCESSING	3
 };
 
 #define UCSI_MAX_SVID		5
@@ -317,6 +316,7 @@ struct ucsi_connector {
 	struct mutex lock; /* port lock */
 	struct work_struct work;
 	struct completion complete;
+	struct workqueue_struct *wq;
 
 	struct typec_port *port;
 	struct typec_partner *partner;
@@ -326,7 +326,6 @@ struct ucsi_connector {
 
 	struct typec_capability typec_cap;
 
-	u16 unprocessed_changes;
 	struct ucsi_connector_status status;
 	struct ucsi_connector_capability cap;
 	struct power_supply *psy;
-- 
2.30.2
Benjamin Berg May 17, 2021, 12:36 p.m. UTC | #3
On Mon, 2021-05-17 at 13:03 +0300, Heikki Krogerus wrote:
> On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote:

> > It's possible that the interrupt handler for the UCSI driver

> > signals a

> > connector changes after the handler clears the PENDING bit, but

> > before

> > it has sent the acknowledge request. The result is that the handler

> > is

> > invoked yet again, to ack the same connector change.

> > 

> > At least some versions of the Qualcomm UCSI firmware will not

> > handle the

> > second - "spurious" - acknowledgment gracefully. So make sure to

> > not

> > clear the pending flag untuntil the changeil the change is

> > acknowledged.

> > 

> > Any connector changes coming in after the acknowledgment, that

> > would

> > have the pending flag incorrectly cleared, would afaict be covered

> > by

> > the subsequent connector status check.

> > 

> > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing

> > change information")

> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

> 

> I'm OK with this if Bejamin does not see any problems with it. I'll

> wait for his comments before giving my reviewed-by tag.


Interesting problem. I wonder if we can detect this situation and
restart the worker instead of sending the ACK.

As for the patch, I do believe it is safe. I agree with the assessment
in the commit message, that the subsequent connector check will detect
any race condition and recovers from it.

Acked-By: Benjamin Berg <bberg@redhat.com>


Benjamin

> That workaround (commit 217504a05532) is unfortunately too fragile.

> I'm going to now separate the processing of the connector state from

> the event handler (interrupt handler). That way we should be fairly

> sure we don't loose any of the connector states even if an event is

> generated while we are still in the middle of processing the previous

> one(s), and at the same time be sure that we also don't confuse the

> firmware.

> So the event handler shall after that only read the connector status,

> schedule the unique job where it's processed and ACK the event.

> Nothing else.

> 

> > ---

> >  drivers/usb/typec/ucsi/ucsi.c | 2 +-

> >  1 file changed, 1 insertion(+), 1 deletion(-)

> > 

> > diff --git a/drivers/usb/typec/ucsi/ucsi.c

> > b/drivers/usb/typec/ucsi/ucsi.c

> > index 282c3c825c13..f451ce0132a9 100644

> > --- a/drivers/usb/typec/ucsi/ucsi.c

> > +++ b/drivers/usb/typec/ucsi/ucsi.c

> > @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct

> > work_struct *work)

> >         ucsi_send_command(con->ucsi, command, NULL, 0);

> >  

> >         /* 3. ACK connector change */

> > -       clear_bit(EVENT_PENDING, &ucsi->flags);

> >         ret = ucsi_acknowledge_connector_change(ucsi);

> > +       clear_bit(EVENT_PENDING, &ucsi->flags);

> >         if (ret) {

> >                 dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__,

> > ret);

> >                 goto out_unlock;

> > -- 

> > 2.29.2

> 

> thanks,

>
Benjamin Berg May 17, 2021, 12:57 p.m. UTC | #4
Hi Heikki,


On Mon, 2021-05-17 at 15:15 +0300, Heikki Krogerus wrote:
> On Mon, May 17, 2021 at 01:03:12PM +0300, Heikki Krogerus wrote:

> > On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote:

> > > It's possible that the interrupt handler for the UCSI driver

> > > signals a

> > > connector changes after the handler clears the PENDING bit, but

> > > before

> > > it has sent the acknowledge request. The result is that the handler

> > > is

> > > invoked yet again, to ack the same connector change.

> > > 

> > > At least some versions of the Qualcomm UCSI firmware will not

> > > handle the

> > > second - "spurious" - acknowledgment gracefully. So make sure to

> > > not

> > > clear the pending flag until the change is acknowledged.

> > > 

> > > Any connector changes coming in after the acknowledgment, that

> > > would

> > > have the pending flag incorrectly cleared, would afaict be covered

> > > by

> > > the subsequent connector status check.

> > > 

> > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing

> > > change information")

> > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

> > 

> > I'm OK with this if Bejamin does not see any problems with it. I'll

> > wait for his comments before giving my reviewed-by tag.

> > 

> > That workaround (commit 217504a05532) is unfortunately too fragile.

> > I'm going to now separate the processing of the connector state from

> > the event handler (interrupt handler). That way we should be fairly

> > sure we don't loose any of the connector states even if an event is

> > generated while we are still in the middle of processing the previous

> > one(s), and at the same time be sure that we also don't confuse the

> > firmware.

> > 

> > So the event handler shall after that only read the connector status,

> > schedule the unique job where it's processed and ACK the event.

> > Nothing else.

> 

> Seems to be straightforward to implement. I'm attaching the patch I

> made for that. I think it should actually also remove the problem you

> are seeing. Can you test it?


Hmm, I am happy to try the patch tomorrow in the scenario where I
observed my race condition.

Unfortunately, I don't feel it'll work. The problem that I was seeing
looked like a race condition in the PPM itself, where the window is the
time between the UCSI_GET_CONNECTOR_STATUS command and the subsequent
ACK.
For such a firmware level bug in the PPM, we need a way to detect the
race condition when it happens (or get a fix for the firmware).

Note that there was also some code to always sent a
UCSI_GET_CAM_SUPPORTED command "to ensure the PPM does not get stuck in
case it assumes we do". I have no idea where this came from or what
PPMs might require this workaround. But I doubt it is a good idea to
drop it.

Benjamin
Heikki Krogerus May 18, 2021, 1:21 p.m. UTC | #5
On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote:
> It's possible that the interrupt handler for the UCSI driver signals a

> connector changes after the handler clears the PENDING bit, but before

> it has sent the acknowledge request. The result is that the handler is

> invoked yet again, to ack the same connector change.

> 

> At least some versions of the Qualcomm UCSI firmware will not handle the

> second - "spurious" - acknowledgment gracefully. So make sure to not

> clear the pending flag until the change is acknowledged.

> 

> Any connector changes coming in after the acknowledgment, that would

> have the pending flag incorrectly cleared, would afaict be covered by

> the subsequent connector status check.

> 

> Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information")

> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>


Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>


> ---

>  drivers/usb/typec/ucsi/ucsi.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c

> index 282c3c825c13..f451ce0132a9 100644

> --- a/drivers/usb/typec/ucsi/ucsi.c

> +++ b/drivers/usb/typec/ucsi/ucsi.c

> @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct work_struct *work)

>  	ucsi_send_command(con->ucsi, command, NULL, 0);

>  

>  	/* 3. ACK connector change */

> -	clear_bit(EVENT_PENDING, &ucsi->flags);

>  	ret = ucsi_acknowledge_connector_change(ucsi);

> +	clear_bit(EVENT_PENDING, &ucsi->flags);

>  	if (ret) {

>  		dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);

>  		goto out_unlock;

> -- 

> 2.29.2


-- 
heikki
Heikki Krogerus May 18, 2021, 1:29 p.m. UTC | #6
On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote:
> Hi Heikki,

> 

> 

> On Mon, 2021-05-17 at 15:15 +0300, Heikki Krogerus wrote:

> > On Mon, May 17, 2021 at 01:03:12PM +0300, Heikki Krogerus wrote:

> > > On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote:

> > > > It's possible that the interrupt handler for the UCSI driver

> > > > signals a

> > > > connector changes after the handler clears the PENDING bit, but

> > > > before

> > > > it has sent the acknowledge request. The result is that the handler

> > > > is

> > > > invoked yet again, to ack the same connector change.

> > > > 

> > > > At least some versions of the Qualcomm UCSI firmware will not

> > > > handle the

> > > > second - "spurious" - acknowledgment gracefully. So make sure to

> > > > not

> > > > clear the pending flag until the change is acknowledged.

> > > > 

> > > > Any connector changes coming in after the acknowledgment, that

> > > > would

> > > > have the pending flag incorrectly cleared, would afaict be covered

> > > > by

> > > > the subsequent connector status check.

> > > > 

> > > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing

> > > > change information")

> > > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

> > > 

> > > I'm OK with this if Bejamin does not see any problems with it. I'll

> > > wait for his comments before giving my reviewed-by tag.

> > > 

> > > That workaround (commit 217504a05532) is unfortunately too fragile.

> > > I'm going to now separate the processing of the connector state from

> > > the event handler (interrupt handler). That way we should be fairly

> > > sure we don't loose any of the connector states even if an event is

> > > generated while we are still in the middle of processing the previous

> > > one(s), and at the same time be sure that we also don't confuse the

> > > firmware.

> > > 

> > > So the event handler shall after that only read the connector status,

> > > schedule the unique job where it's processed and ACK the event.

> > > Nothing else.

> > 

> > Seems to be straightforward to implement. I'm attaching the patch I

> > made for that. I think it should actually also remove the problem you

> > are seeing. Can you test it?

> 

> Hmm, I am happy to try the patch tomorrow in the scenario where I

> observed my race condition.

> 

> Unfortunately, I don't feel it'll work. The problem that I was seeing

> looked like a race condition in the PPM itself, where the window is the

> time between the UCSI_GET_CONNECTOR_STATUS command and the subsequent

> ACK.

> For such a firmware level bug in the PPM, we need a way to detect the

> race condition when it happens (or get a fix for the firmware).


OK. Let me know does the patch bring the issue back for you.

> Note that there was also some code to always sent a

> UCSI_GET_CAM_SUPPORTED command "to ensure the PPM does not get stuck in

> case it assumes we do". I have no idea where this came from or what

> PPMs might require this workaround. But I doubt it is a good idea to

> drop it.


Sure.

thanks,

-- 
heikki
Benjamin Berg May 18, 2021, 6:04 p.m. UTC | #7
On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote:
> On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote:

> > 

> > [SNIP]

> > Unfortunately, I don't feel it'll work. The problem that I was

> > seeing

> > looked like a race condition in the PPM itself, where the window is

> > the

> > time between the UCSI_GET_CONNECTOR_STATUS command and the

> > subsequent

> > ACK.

> > For such a firmware level bug in the PPM, we need a way to detect

> > the

> > race condition when it happens (or get a fix for the firmware).

> 

> OK. Let me know does the patch bring the issue back for you.


So, I just tried the patch, and I can occasionally reproduce the issue
where "online" for the ucsi power adapter is stuck at "1" after
unplugging with the patch applied.

Benjamin
Heikki Krogerus May 19, 2021, 11:33 a.m. UTC | #8
On Tue, May 18, 2021 at 08:04:14PM +0200, Benjamin Berg wrote:
> On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote:

> > On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote:

> > > 

> > > [SNIP]

> > > Unfortunately, I don't feel it'll work. The problem that I was

> > > seeing

> > > looked like a race condition in the PPM itself, where the window is

> > > the

> > > time between the UCSI_GET_CONNECTOR_STATUS command and the

> > > subsequent

> > > ACK.

> > > For such a firmware level bug in the PPM, we need a way to detect

> > > the

> > > race condition when it happens (or get a fix for the firmware).

> > 

> > OK. Let me know does the patch bring the issue back for you.

> 

> So, I just tried the patch, and I can occasionally reproduce the issue

> where "online" for the ucsi power adapter is stuck at "1" after

> unplugging with the patch applied.


Thanks for testing it.

I'm still not sure that the PPM is the culprit here. I have a feeling
that the problem you are seeing is caused by the workaround (bad
workaround) that we have for the issue where the EC firmware does not
return with the BUSY bit set in the CCI when it should in many cases.
The UCSI ACPI driver has one minute timeout value for command
completion because of that, which is way too long. So if the EC
firmware decides to take its time before acknowledging command, the
driver is stuck, and we start loosing the events... Well, I guess
technically the PPM would be the culprit in the end in any case, but
I'm just not sure that there is any race like you suspected.

But this is off topic. I'll send you an RFC proposal what I think we
could do about that.


thanks,

-- 
heikki
Benjamin Berg May 19, 2021, 11:50 a.m. UTC | #9
On Wed, 2021-05-19 at 14:33 +0300, Heikki Krogerus wrote:
> On Tue, May 18, 2021 at 08:04:14PM +0200, Benjamin Berg wrote:

> > On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote:

> > > On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote:

> > > > 

> > > > [SNIP]

> > > > Unfortunately, I don't feel it'll work. The problem that I was

> > > > seeing

> > > > looked like a race condition in the PPM itself, where the

> > > > window is

> > > > the

> > > > time between the UCSI_GET_CONNECTOR_STATUS command and the

> > > > subsequent

> > > > ACK.

> > > > For such a firmware level bug in the PPM, we need a way to

> > > > detect

> > > > the

> > > > race condition when it happens (or get a fix for the firmware).

> > > 

> > > OK. Let me know does the patch bring the issue back for you.

> > 

> > So, I just tried the patch, and I can occasionally reproduce the

> > issue

> > where "online" for the ucsi power adapter is stuck at "1" after

> > unplugging with the patch applied.

> 

> Thanks for testing it.

> 

> I'm still not sure that the PPM is the culprit here. I have a feeling

> that the problem you are seeing is caused by the workaround (bad

> workaround) that we have for the issue where the EC firmware does not

> return with the BUSY bit set in the CCI when it should in many cases.

> The UCSI ACPI driver has one minute timeout value for command

> completion because of that, which is way too long. So if the EC

> firmware decides to take its time before acknowledging command, the

> driver is stuck, and we start loosing the events...


Hmm, interesting, yes. If the PPM is sending a second change event
before or after we ACK'ed the first one, and we loose this event, then
that would absolutely explain the issue I am seeing.

In my case, I was able to considerably increase the probability of the
bug by inserting a sleep just before acknowledging the connector
change. Not sure whether this is meaningful, but maybe that tells us
something about who is at fault here.

Benjamin

> Well, I guess

> technically the PPM would be the culprit in the end in any case, but

> I'm just not sure that there is any race like you suspected.

> 

> But this is off topic. I'll send you an RFC proposal what I think we

> could do about that.

> 

> 

> thanks,

>
diff mbox series

Patch

diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index 282c3c825c13..f451ce0132a9 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -694,8 +694,8 @@  static void ucsi_handle_connector_change(struct work_struct *work)
 	ucsi_send_command(con->ucsi, command, NULL, 0);
 
 	/* 3. ACK connector change */
-	clear_bit(EVENT_PENDING, &ucsi->flags);
 	ret = ucsi_acknowledge_connector_change(ucsi);
+	clear_bit(EVENT_PENDING, &ucsi->flags);
 	if (ret) {
 		dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);
 		goto out_unlock;