Locking issues w/ functionfs gadget and aio?

Message ID CALAqxLXPWhQvWfqrpw0MVygothMKhE++cPMj2ZjoZR7KVPwYXQ@mail.gmail.com
State New
Headers show

Commit Message

John Stultz June 13, 2015, 12:51 a.m.
On Mon, Jun 8, 2015 at 6:14 PM, John Stultz <john.stultz@linaro.org> wrote:
> After setting up functionfs for adb w/ 4.1-rc7, I noticed some flakey behavior.
> I enabled some lock debugging and got the following:
>
> [   91.648093] read strings
> [   91.650264] g_ffs gadget: g_ffs ready
> [   91.652551] ci_hdrc ci_hdrc.0: CI_HDRC_CONTROLLER_RESET_EVENT received
> [   96.068693] BUG: spinlock lockup suspected on CPU#0, adbd/2791
> [   96.068751]  lock: 0xe7764880, .magic: e7764880, .owner: <none>/-1,
> .owner_cpu: -407539900
> [   96.073448] CPU: 0 PID: 2791 Comm: adbd Not tainted
> 4.1.0-rc1-00032-g359b12f #147
> [   96.081688] Hardware name: Qualcomm (Flattened Device Tree)
> [   96.089266] [<c0216ac8>] (unwind_backtrace) from [<c02136a8>]
> (show_stack+0x10/0x14)
> [   96.094635] [<c02136a8>] (show_stack) from [<c075d9fc>]
> (dump_stack+0x70/0xbc)
> [   96.102627] [<c075d9fc>] (dump_stack) from [<c026ef90>]
> (do_raw_spin_lock+0x114/0x1a0)
> [   96.109661] [<c026ef90>] (do_raw_spin_lock) from [<c0764cb8>]
> (_raw_spin_lock_irqsave+0x50/0x5c)
> [   96.117560] [<c0764cb8>] (_raw_spin_lock_irqsave) from [<c037c1a0>]
> (kiocb_set_cancel_fn+0x1c/0x60)
> [   96.126519] [<c037c1a0>] (kiocb_set_cancel_fn) from [<c05ae568>]
> (ffs_epfile_read_iter+0x8c/0x140)
> [   96.135289] [<c05ae568>] (ffs_epfile_read_iter) from [<c0332018>]
> (__vfs_read+0xb0/0xd4)
> [   96.144290] [<c0332018>] (__vfs_read) from [<c0332ef8>] (vfs_read+0x7c/0x100)
> [   96.152535] [<c0332ef8>] (vfs_read) from [<c0332fbc>] (SyS_read+0x40/0x8c)
> [   96.159571] [<c0332fbc>] (SyS_read) from [<c020ff20>]
> (ret_fast_syscall+0x0/0x4c)
> [  117.678633] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  117.683069]  0: (1 GPs behind) idle=805/140000000000000/0
> softirq=7187/7189 fqs=2601
> [  117.683316]  (detected by 3, t=2603 jiffies, g=3028, c=3027, q=474)
> [  117.692345] Task dump for CPU 0:
> [  117.697202] adbd            R running      0  2791      1 0x00000002
> [  117.704296] [<c075f234>] (__schedule) from [<ffffffff>] (0xffffffff)
>
>
> It seems we're stuck on the kioctx.ctx_lock, and reviewing that lock
> usage I don't see any problems in fs/aio.c right off.
>
> So I'm guessing the f_fs.c code is somehow not initializing the lock
> structure, or maybe calling kiocb_set_cancel_fn() from the wrong
> context?

So looking further at this, it seems that the __vfs_read() calls
new_sync_read(), which allocates a struct kiocb kiocb on the stack and
passes it to the ffs_epfile_read_iter() funciton. That then calls
kiocb_set_cancel_fn() passing a pointer to that kiocb. However,
kiocb_set_cancel_fn assumes the kiocb is a sub-element of a struct
aio_kiocb, and it tries to grab the kioctx from that parent structure.
However it seems there is no aio_kiocb structure here, so the
spin_lock_irqsave hangs trying to lock random data on the stack.

I'm not super sure what the right fix is, but if do something like the
following (sorry, whitespace corrupted via copy/paste), I don't seem
to run into the problem.

        if (res == -EIOCBQUEUED)
@@ -968,7 +969,8 @@ static ssize_t ffs_epfile_read_iter(struct kiocb
*kiocb, struct iov_iter *to)

        kiocb->private = p;

-       kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
+       if(p->aio)
+               kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);

        res = ffs_epfile_io(kiocb->ki_filp, p);
        if (res == -EIOCBQUEUED)


Is there a better solution here? I'm not sure I see if the
is_sync_kiocb(kiocb) check would ever be false from here, so I'm not
sure if all the p->aio checking is really needed or not.

This issue seems to have been introduced with 70e60d917e91fff
(gadget/function/f_fs.c: switch to ->{read,write}_iter())

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

John Stultz June 22, 2015, 5:20 p.m. | #1
On Sat, Jun 20, 2015 at 10:24 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jun 12, 2015 at 05:51:12PM -0700, John Stultz wrote:
>
>> I'm not super sure what the right fix is, but if do something like the
>> following (sorry, whitespace corrupted via copy/paste), I don't seem
>> to run into the problem.
>
> Looks sane.  Which tree would you prefer it to go through, vfs or usb?
> BTW, in either case you'd need Signed-off-by: on that patch...

Heh. I assumed my hack would be the wrong thing, but if there's no
better suggestion, I'm happy to submit it.

I have no strong preference of which tree it goes through, but I'd
like for Andrzej to weigh in to make sure he agrees.

I'll write up a proper changelog and resubmit here shortly.

thanks for the review!
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in

Patch

diff --git a/drivers/usb/gadget/function/f_fs.c
b/drivers/usb/gadget/function/f_fs.c
index 3507f88..e322818 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -924,7 +924,8 @@  static ssize_t ffs_epfile_write_iter(struct kiocb
*kiocb, struct iov_iter *from)

        kiocb->private = p;

-       kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);
+       if (p->aio)
+               kiocb_set_cancel_fn(kiocb, ffs_aio_cancel);

        res = ffs_epfile_io(kiocb->ki_filp, p);