Oops with dwc3 in device mode and functionfs

Message ID CAF78GY339R5vz3agt_f-7X2=dwjDpmXUoKKnX1rObXnwGjWVnA@mail.gmail.com
State New
Headers show

Commit Message

Vincent Pelletier Jan. 3, 2017, 5:18 p.m.
Checking more, I think the 3 "do {...} while(--count)" loops in f_fs.c
are wrong: they should be "while(count--){...}" ("git format-patch"
attached, sorry, I'm on a dev-hostile mail client at the moment):
- count covers function-specific endpoint, so it is 0 when no endpoint
is declared, which leads to an integer underflow and exceeding eps
array end.
- a function enabling/disabling endpoint 0 does not make sense to me
(they are only supposed to care about their own descriptor, don't they
?)

On Tue, Jan 3, 2017 at 3:49 PM, Vincent Pelletier <plr.vincent@gmail.com> wrote:
> The functionfs side of things is a bit complicated, as I'm writing a

> python module[2] for it, and I am still very new to functionfs. It

> does not crash on 4.9-rc7 though.


As per the above, this was not the whole truth: a device declared with
2 endpoints does not crash on 4.9-rc7, crashes on 4.10-rc2, including
with the upcomming rc3 fixes. A zero endpoint version crashes *and*
made me realise the loop error.

Sadly, this fix alone is not enough to get a functional device. I did
not investigate much yet, and need to get some sleep.
-- 
Vincent Pelletier

Comments

Vincent Pelletier Jan. 6, 2017, 3:21 p.m. | #1
On Tue, 3 Jan 2017 18:18:02 +0100, Vincent Pelletier
<plr.vincent@gmail.com> wrote:
> Sadly, this fix alone is not enough to get a functional device. I did

> not investigate much yet, and need to get some sleep.


I investigated more.

The next issue is that often (>9 times out of 10) the dwc3 fails to
respond to the SET_CONFIGURATION standard request. As a result, only
EP0 works.

Host-side, syslog contains a -110 (ETIMEDOUT) error. Re-triggering a
SET_CONFIGURATION causes the same timeout to happen again.

Device-side, functionfs does trigger the "ENABLE" event.

Altogether, ep0 does work (vendor-specific requests from host do get
received by the program on device which listens to events on ep0 file).
Host (comprehensibly) refuses to emit requests to endpoints which are
not considered as configured, so I cannot test this part.

Plugging my USB protocol analyser, I see corrupted USB PIDs. Here is a
capture, starting at the beginning od the last GET_DESCRIPTOR
transaction before the SET_CONFIGURATION one:

001:23.273'515"383n SETUP   @054.00 DATA0     8B ACK
                    000 80 06 03 03 09 04 ff 00                          ........
001:23.273'518"000n IN      @054.00 NAK
001:23.273'539"966n IN      @054.00 NAK
001:23.273'582"983n IN      @054.00 NAK
001:23.273'604"666n IN      @054.00 DATA1    34B ACK
                    000 22 03 46 00 5a 00 45 00  44 00 34 00 34 00 33 00 ..F.Z.E.D.4.4.3.
                    010 44 00 30 00 31 00 54 00  34 00 32 00 35 00 30 00 D.0.1.T.4.2.5.0.
                    020 31 00                                            1.
001:23.273'607"666n OUT     @054.00 DATA1     0B NAK
001:23.273'628"966n PING    @054.00 NAK
001:23.273'649"650n PING    @054.00 NAK
001:23.273'683"966n PING    @054.00 ACK
001:23.273'685"300n OUT     @054.00 DATA1     0B ACK
001:23.274'459"183n (bad pid) 0xf0 0x36 0xf8
001:23.275'036"750n (bad pid) 0x03 0x00 0x00 0x00 0x96 0x52 0x68 0xfa
001:23.275'037"266n SETUP   @054.00 DATA0     8B ACK
                    000 00 09 01 00 00 00 00 00                          ........
001:23.275'039"750n IN      @054.00 NAK
001:23.275'067"183n IN      @054.00 NAK
001:23.275'110"183n IN      @054.00 NAK
001:23.275'131"866n IN      @054.00 NAK
001:23.275'153"183n IN      @054.00 NAK
001:23.275'196"200n IN      @054.00 NAK
001:23.275'217"883n IN      @054.00 NAK
001:23.275'239"200n IN      @054.00 NAK
001:23.275'260"883n IN      @054.00 NAK
001:23.275'287"300n IN      @054.00 NAK
001:23.275'308"900n IN      @054.00 NAK
001:23.275'329"900n IN      @054.00 NAK
001:23.275'356"216n IN      @054.00 NAK
001:23.275'377"900n IN      @054.00 NAK
001:23.275'399"216n IN      @054.00 NAK
001:23.275'442"233n IN      @054.00 NAK
001:23.275'463"916n IN      @054.00 NAK
001:23.275'485"233n IN      @054.00 NAK
001:23.275'506"916n IN      @054.00 NAK
001:23.275'528"233n IN      @054.00 NAK
001:23.275'571"250n IN      @054.00 NAK
001:23.275'592"933n IN      @054.00 NAK
001:23.275'614"250n IN      @054.00 NAK
001:23.275'635"933n IN      @054.00 NAK
001:23.275'662"350n IN      @054.00 NAK
001:23.275'683"950n IN      @054.00 NAK
001:23.275'706"266n IN      @054.00 NAK
001:23.275'727"950n IN      @054.00 NAK
001:23.275'749"266n IN      @054.00 NAK
001:23.275'770"950n IN      @054.00 NAK
001:23.275'811"200n (bad pid) 0xf0 0x36 0xf8
001:23.275'792"266n IN      @054.00 NAK     (incomplete transaction)
001:23.278'811"983n HS idle

The format being:
timestamp token_type "@"device"."endpoint data_type bytes"B" handshake_type
  data_stage_hexdump

Strangely, when plugging the device on a hub seems to improve the
situation. Sadly, my protocol analyser gets confused when plugged
between that hub and the device, and fails to capture anything (it
likely fails to detect the chirp handshake).

There seems to be another issue (or is it just a consequence of this
one ?) where it may become completely silent, unable to join the bus
when udc identifier is writted to UDC file.

Regards,
-- 
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vincent Pelletier Jan. 7, 2017, 4:07 a.m. | #2
On Fri, 6 Jan 2017 15:21:17 +0000, Vincent Pelletier
<plr.vincent@gmail.com> wrote:
> The next issue is that often (>9 times out of 10) the dwc3 fails to

> respond to the SET_CONFIGURATION standard request. As a result, only

> EP0 works.


I captured a successful enumeration, there are also corrupted pids:

002:03.271'422"583n SETUP   @054.00 DATA0     8B ACK
                    000 80 06 03 03 09 04 ff 00                          ........
002:03.271'425"200n IN      @054.00 NAK
002:03.271'446"233n IN      @054.00 NAK
002:03.271'478"233n IN      @054.00 NAK
002:03.271'499"233n IN      @054.00 DATA1    34B ACK
                    000 22 03 46 00 5a 00 45 00  44 00 34 00 34 00 33 00 ..F.Z.E.D.4.4.3.
                    010 44 00 30 00 31 00 54 00  34 00 32 00 35 00 30 00 D.0.1.T.4.2.5.0.
                    020 31 00                                            1.
002:03.271'502"133n OUT     @054.00 DATA1     0B NAK
002:03.271'523"216n PING    @054.00 NAK
002:03.271'556"233n PING    @054.00 NAK
002:03.271'581"666n PING    @054.00 ACK
002:03.271'583"050n OUT     @054.00 DATA1     0B ACK
002:03.272'345"833n SETUP   @054.00 DATA0     8B ACK
                    000 00 09 01 00 00 00 00 00                          ........
002:03.272'348"350n IN      @054.00 NAK
002:03.272'369"350n IN      @054.00 NAK
[...]
002:03.273'082"100n IN      @054.00 NAK
002:03.273'119"633n (bad pid) 0xf0 0x3e 0x88
002:03.273'103"450n IN      @054.00 NAK     (incomplete transaction)
002:03.273'123"333n (bad pid) 0xf0 0x3e 0x88
002:03.273'127"033n (bad pid) 0xf0 0x3e 0x88
002:03.273'145"450n IN      @054.00 NAK
002:03.273'166"466n IN      @054.00 NAK
[...]
002:03.278'582"600n IN      @054.00 NAK
002:03.278'604"200n IN      @054.00 DATA1     0B ACK

There are 285 NAKs between SETUP and the final ACK.

A note about "incomplete transaction": this is a bug in my trace
parser, the NAK transaction is complete, it is the next token (the bad
pid) which confuses the state tracker, which also causes the
timestamp non-monotonicity in parser's output too. There is one bad-CRC
DATA0 packet after each bad pid.

> There seems to be another issue (or is it just a consequence of this

> one ?) where it may become completely silent, unable to join the bus

> when udc identifier is written to UDC file.


From what I can reproduce, it always happens in 3 stages:
- first enumeration (whether SET_CONFIGURATION succeeds or not seems
  irrelevant)
- second enumeration, which SET_CONFIGURATION always times-out in my
  experience (but given the low success rate, having two consecutive
  successes would be hard anyway)
- 3rd enumeration attempt fails, no event on the bus.
-- 
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Shevchenko March 17, 2017, 3:51 p.m. | #3
On Sat, 2017-01-07 at 04:07 +0000, Vincent Pelletier wrote:
> On Fri, 6 Jan 2017 15:21:17 +0000, Vincent Pelletier

> <plr.vincent@gmail.com> wrote:

> > The next issue is that often (>9 times out of 10) the dwc3 fails to

> > respond to the SET_CONFIGURATION standard request. As a result, only

> > EP0 works.

> 

> I captured a successful enumeration, there are also corrupted pids:


Can you check my latest snapshot?

It has been discovered that ep1 and ep8 are not usable for USB (tracing
related stuff). Felipe did a quick hack for that.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felipe Balbi March 20, 2017, 9:23 a.m. | #4
Hi,

Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

> On Sat, 2017-01-07 at 04:07 +0000, Vincent Pelletier wrote:

>> On Fri, 6 Jan 2017 15:21:17 +0000, Vincent Pelletier

>> <plr.vincent@gmail.com> wrote:

>> > The next issue is that often (>9 times out of 10) the dwc3 fails to

>> > respond to the SET_CONFIGURATION standard request. As a result, only

>> > EP0 works.

>> 

>> I captured a successful enumeration, there are also corrupted pids:

>

> Can you check my latest snapshot?

>

> It has been discovered that ep1 and ep8 are not usable for USB (tracing

> related stuff). Felipe did a quick hack for that.


yeah, I have been writing a functionfs "driver" myself to test stuff
out. It's still pretty early but at least enumeration is rather solid.

-- 
balbi
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vincent Pelletier March 24, 2017, 11:06 p.m. | #5
Hello Andy, Felipe,

Sorry for the late reply.

On Mon, Mar 20, 2017 at 6:23 PM, Felipe Balbi
<felipe.balbi@linux.intel.com> wrote:
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

>> Can you check my latest snapshot?


I built your github eds branch as of cfa21022e (based on 4.11.0-rc3),
but I still get enumeration issues (see below). I did not merge any
other branches above it for this test (especially, not Felipe's usb
branches) just in case.

>> It has been discovered that ep1 and ep8 are not usable for USB (tracing

>> related stuff). Felipe did a quick hack for that.


Ah, thanks for the info, I did not keep track at all lately.

> yeah, I have been writing a functionfs "driver" myself to test stuff

> out. It's still pretty early but at least enumeration is rather solid.


Mar 25 07:43:14 vincent-tkpad kernel: [1582082.400893] usb 1-1: USB
disconnect, device number 22
Mar 25 07:43:16 vincent-tkpad kernel: [1582084.028746] usb 1-1: new
high-speed USB device number 23 using xhci_hcd
Mar 25 07:43:16 vincent-tkpad kernel: [1582084.159290] usb 1-1: New
USB device found, idVendor=1d6b, idProduct=0104
Mar 25 07:43:16 vincent-tkpad kernel: [1582084.159294] usb 1-1: New
USB device strings: Mfr=1, Product=2, SerialNumber=3
Mar 25 07:43:16 vincent-tkpad kernel: [1582084.159296] usb 1-1: Product: tesla
Mar 25 07:43:16 vincent-tkpad kernel: [1582084.159297] usb 1-1:
Manufacturer: vpelletier
Mar 25 07:43:16 vincent-tkpad kernel: [1582084.159298] usb 1-1:
SerialNumber: FZED443D01T42501
Mar 25 07:43:21 vincent-tkpad kernel: [1582089.156950] usb 1-1: can't
set config #1, error -110
Mar 25 07:43:21 vincent-tkpad mtp-probe: checking bus 1, device 23:
"/sys/devices/pci0000:00/0000:00:14.0/usb1/1-1"
Mar 25 07:43:21 vincent-tkpad mtp-probe: bus: 1, device: 23 was not an
MTP device

After a few plug/unplug cycles the device gets stuck and does not
enumerate at all (instead of only failing to set the configuration)
until device reboot.

FWIW, at home enumeration happens much more reliably behind a hub than
directly on a host (both being xHCI). Maybe my edison board has some
electrical issue a hub would tolerate better ?

Regards,
-- 
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Shevchenko March 26, 2017, 12:20 p.m. | #6
On Sat, 2017-03-25 at 08:06 +0900, Vincent Pelletier wrote:
> Hello Andy, Felipe,

> 

> Sorry for the late reply.

> 

> On Mon, Mar 20, 2017 at 6:23 PM, Felipe Balbi

> <felipe.balbi@linux.intel.com> wrote:

> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

> > > Can you check my latest snapshot?

> 

> I built your github eds branch as of cfa21022e (based on 4.11.0-rc3),

> but I still get enumeration issues (see below). I did not merge any

> other branches above it for this test (especially, not Felipe's usb

> branches) just in case.


Noted.

> FWIW, at home enumeration happens much more reliably behind a hub than

> directly on a host (both being xHCI). Maybe my edison board has some

> electrical issue a hub would tolerate better ?


We don't know (yet?) Basin Cove PMIC relations with USB mux / vbus /
etc. It might be involved. Since you IIRC were looking to some PMIC
related stuff can you at some point check if it has anything possible
effect on USB?

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vincent Pelletier March 26, 2017, 8:10 p.m. | #7
Hello,

On Sun, Mar 26, 2017 at 9:20 PM, Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
> On Sat, 2017-03-25 at 08:06 +0900, Vincent Pelletier wrote:

>> FWIW, at home enumeration happens much more reliably behind a hub than

>> directly on a host (both being xHCI). Maybe my edison board has some

>> electrical issue a hub would tolerate better ?


Re-reading what I wrote, I realize my vocabulary is misleading.
Rephrasing for the record:

Enumeration (HS chirp, bus address attribution) typically works, even
directly on an xHCI. It is SET_CONFIGURATION (including the one done
by the kernel after enumerating and before any driver tries to access
the device) which very often fails on an xHCI, but very often succeeds
when a hub (in my case, USB3) is inserted between the edison and the
same xHCI.

Having had issues with an electrically non-compliant USB2 device
before (chirp was of the wrong duration), it was confusing my USB
protocol analyzer, and it would help to place it behind a hub (the hub
being more tolerant and hiding the non-compliance from the analyzer -
allowing it to switch from FS to HS and actually capture traffic).
Hence my suspicion for some kind of electrical issue, but I have no
proof for this.

> We don't know (yet?) Basin Cove PMIC relations with USB mux / vbus /

> etc. It might be involved. Since you IIRC were looking to some PMIC

> related stuff


The ADC part ? Yes.
<ot>
I think I have fixed all the issues you pointed out but the two big
ones: using regmap and the IRQ cascading issue found on drivers of its
chip family. I'm not sure how I should declare registers for the
former (platform data ?), and need to catch up with the IRQ fixes from
other drivers.
</ot>

> can you at some point check if it has anything possible effect on USB?


Like, looking if some driver touching the PMIC in the original
firmware source would mention USB ? Yep, I can do.

Regards,
-- 
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felipe Balbi March 27, 2017, 7:55 a.m. | #8
Hi,

Vincent Pelletier <plr.vincent@gmail.com> writes:
> Hello,

>

> On Sun, Mar 26, 2017 at 9:20 PM, Andy Shevchenko

> <andriy.shevchenko@linux.intel.com> wrote:

>> On Sat, 2017-03-25 at 08:06 +0900, Vincent Pelletier wrote:

>>> FWIW, at home enumeration happens much more reliably behind a hub than

>>> directly on a host (both being xHCI). Maybe my edison board has some

>>> electrical issue a hub would tolerate better ?

>

> Re-reading what I wrote, I realize my vocabulary is misleading.

> Rephrasing for the record:

>

> Enumeration (HS chirp, bus address attribution) typically works, even

> directly on an xHCI. It is SET_CONFIGURATION (including the one done

> by the kernel after enumerating and before any driver tries to access

> the device) which very often fails on an xHCI, but very often succeeds

> when a hub (in my case, USB3) is inserted between the edison and the

> same xHCI.


I noticed that too and I still have no idea what's going on :-s

-- 
balbi

Patch hide | download patch | download mbox

From 85815ce5c82022ccfbb2c0184589c4766325b517 Mon Sep 17 00:00:00 2001
From: Vincent Pelletier <vincent@nexedi.com>
Date: Tue, 3 Jan 2017 16:47:23 +0100
Subject: [RFC] usb: gadget: f_fs: Fix iterations on endpoints.

When zero endpoints are declared for a function, there is no endpoint
to disable, enable or free, so replace do...while loops with while loops.
Change pre-decrement to post-decrement to iterate the same number of times
when there are endpoints to process.

Signed-off-by: Vincent Pelletier <vincent@nexedi.com>
---
 drivers/usb/gadget/function/f_fs.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 5e746adc8a2d..5490fc51638e 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1806,7 +1806,7 @@  static void ffs_func_eps_disable(struct ffs_function *func)
 	unsigned long flags;
 
 	spin_lock_irqsave(&func->ffs->eps_lock, flags);
-	do {
+	while (count--) {
 		/* pending requests get nuked */
 		if (likely(ep->ep))
 			usb_ep_disable(ep->ep);
@@ -1817,7 +1817,7 @@  static void ffs_func_eps_disable(struct ffs_function *func)
 			__ffs_epfile_read_buffer_free(epfile);
 			++epfile;
 		}
-	} while (--count);
+	}
 	spin_unlock_irqrestore(&func->ffs->eps_lock, flags);
 }
 
@@ -1831,7 +1831,7 @@  static int ffs_func_eps_enable(struct ffs_function *func)
 	int ret = 0;
 
 	spin_lock_irqsave(&func->ffs->eps_lock, flags);
-	do {
+	while(count--) {
 		struct usb_endpoint_descriptor *ds;
 		int desc_idx;
 
@@ -1867,7 +1867,7 @@  static int ffs_func_eps_enable(struct ffs_function *func)
 
 		++ep;
 		++epfile;
-	} while (--count);
+	}
 	spin_unlock_irqrestore(&func->ffs->eps_lock, flags);
 
 	return ret;
@@ -3448,12 +3448,12 @@  static void ffs_func_unbind(struct usb_configuration *c,
 
 	/* cleanup after autoconfig */
 	spin_lock_irqsave(&func->ffs->eps_lock, flags);
-	do {
+	while (count--) {
 		if (ep->ep && ep->req)
 			usb_ep_free_request(ep->ep, ep->req);
 		ep->req = NULL;
 		++ep;
-	} while (--count);
+	}
 	spin_unlock_irqrestore(&func->ffs->eps_lock, flags);
 	kfree(func->eps);
 	func->eps = NULL;
-- 
2.11.0