diff mbox series

Bluetooth: call lock_sock() outside of spinlock section

Message ID 20210627131134.5434-1-penguin-kernel@I-love.SAKURA.ne.jp
State New
Headers show
Series Bluetooth: call lock_sock() outside of spinlock section | expand

Commit Message

Tetsuo Handa June 27, 2021, 1:11 p.m. UTC
syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1]. Defer calling lock_sock()
via sock_hold().

Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
---
 net/bluetooth/hci_sock.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

Comments

Luiz Augusto von Dentz July 7, 2021, 6:20 p.m. UTC | #1
Hi Tetsuo,

On Wed, Jul 7, 2021 at 2:43 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
> calling lock_sock() with rw spinlock held [1]. Defer calling lock_sock()
> via sock_hold().
>
> Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
> Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
> Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
> ---
> Changes in v2:
>   Take hci_sk_list.lock for write in case bt_sock_unlink() is called after
>   sk_hashed(sk) test, and defer hci_dev_put(hdev) till schedulable context.
>
>  net/bluetooth/hci_sock.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..d8e1ac1ae10d 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>
>         if (event == HCI_DEV_UNREG) {
>                 struct sock *sk;
> +               bool put_dev;
>
> +restart:
> +               put_dev = false;
>                 /* Detach sockets from device */
>                 read_lock(&hci_sk_list.lock);
>                 sk_for_each(sk, &hci_sk_list.head) {
> +                       /* hci_sk_list.lock is preventing hci_sock_release()
> +                        * from calling bt_sock_unlink().
> +                        */
> +                       if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
> +                               continue;
> +                       /* Take a ref because we can't call lock_sock() with
> +                        * hci_sk_list.lock held.
> +                        */
> +                       sock_hold(sk);
> +                       read_unlock(&hci_sk_list.lock);
>                         lock_sock(sk);
> -                       if (hci_pi(sk)->hdev == hdev) {
> +                       /* Since hci_sock_release() might have already called
> +                        * bt_sock_unlink() while waiting for lock_sock(),
> +                        * use sk_hashed(sk) for checking that bt_sock_unlink()
> +                        * is not yet called.
> +                        */
> +                       write_lock(&hci_sk_list.lock);
> +                       if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
>                                 hci_pi(sk)->hdev = NULL;
>                                 sk->sk_err = EPIPE;
>                                 sk->sk_state = BT_OPEN;
>                                 sk->sk_state_change(sk);
> -
> -                               hci_dev_put(hdev);
> +                               put_dev = true;
>                         }
> +                       write_unlock(&hci_sk_list.lock);
>                         release_sock(sk);
> +                       sock_put(sk);
> +                       if (put_dev)
> +                               hci_dev_put(hdev);
> +                       /* Restarting is safe, for hci_pi(sk)->hdev != hdev if
> +                        * condition met and sk_unhashed(sk) == true otherwise.
> +                        */
> +                       goto restart;

This sounds a little too complicated, afaik backward goto is not even
consider a good practice either, since it appears we don't unlink the
sockets here we could perhaps don't release the reference to hdev
either and leave hci_sock_release to deal with it and then perhaps we
can take away the backward goto, actually why are you restarting to
begin with? It is also weird that this only manifests in the Bluetooth
HCI sockets or other subsystems don't use such locking mechanism
anymore?


>                 }
>                 read_unlock(&hci_sk_list.lock);
>         }
> --
> 2.18.4
>
>
Tetsuo Handa July 7, 2021, 11:33 p.m. UTC | #2
On 2021/07/08 3:20, Luiz Augusto von Dentz wrote:
>> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
>> index b04a5a02ecf3..d8e1ac1ae10d 100644
>> --- a/net/bluetooth/hci_sock.c
>> +++ b/net/bluetooth/hci_sock.c
>> @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>>
>>         if (event == HCI_DEV_UNREG) {
>>                 struct sock *sk;
>> +               bool put_dev;
>>
>> +restart:
>> +               put_dev = false;
>>                 /* Detach sockets from device */
>>                 read_lock(&hci_sk_list.lock);
>>                 sk_for_each(sk, &hci_sk_list.head) {
>> +                       /* hci_sk_list.lock is preventing hci_sock_release()
>> +                        * from calling bt_sock_unlink().
>> +                        */
>> +                       if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
>> +                               continue;
>> +                       /* Take a ref because we can't call lock_sock() with
>> +                        * hci_sk_list.lock held.
>> +                        */
>> +                       sock_hold(sk);
>> +                       read_unlock(&hci_sk_list.lock);
>>                         lock_sock(sk);
>> -                       if (hci_pi(sk)->hdev == hdev) {
>> +                       /* Since hci_sock_release() might have already called
>> +                        * bt_sock_unlink() while waiting for lock_sock(),
>> +                        * use sk_hashed(sk) for checking that bt_sock_unlink()
>> +                        * is not yet called.
>> +                        */
>> +                       write_lock(&hci_sk_list.lock);
>> +                       if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
>>                                 hci_pi(sk)->hdev = NULL;
>>                                 sk->sk_err = EPIPE;
>>                                 sk->sk_state = BT_OPEN;
>>                                 sk->sk_state_change(sk);
>> -
>> -                               hci_dev_put(hdev);
>> +                               put_dev = true;
>>                         }
>> +                       write_unlock(&hci_sk_list.lock);
>>                         release_sock(sk);
>> +                       sock_put(sk);
>> +                       if (put_dev)
>> +                               hci_dev_put(hdev);
>> +                       /* Restarting is safe, for hci_pi(sk)->hdev != hdev if
>> +                        * condition met and sk_unhashed(sk) == true otherwise.
>> +                        */
>> +                       goto restart;
> 
> This sounds a little too complicated, afaik backward goto is not even
> consider a good practice either, since it appears we don't unlink the
> sockets here

Because hci_sock_release() might be concurrently called while
hci_sock_dev_event() from hci_unregister_dev() from vhci_release() is running.

While hci_sock_dev_event() itself does not unlink the sockets from hci_sk_list.head,
bt_sock_unlink() from hci_sock_release() unlinks a socket from hci_sk_list.head.

Therefore, as long as there is possibility that hci_sk_list is modified by other thread
when current thread is traversing this list, we need to be prepared for such race.

>              we could perhaps don't release the reference to hdev
> either and leave hci_sock_release to deal with it and then perhaps we
> can take away the backward goto, actually why are you restarting to
> begin with?

Do you mean something like

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..0525883f4639 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 	if (event == HCI_DEV_UNREG) {
 		struct sock *sk;
 
-		/* Detach sockets from device */
+		/* Change socket state and notify */
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, &hci_sk_list.head) {
-			lock_sock(sk);
 			if (hci_pi(sk)->hdev == hdev) {
-				hci_pi(sk)->hdev = NULL;
 				sk->sk_err = EPIPE;
 				sk->sk_state = BT_OPEN;
 				sk->sk_state_change(sk);
-
-				hci_dev_put(hdev);
 			}
-			release_sock(sk);
 		}
 		read_unlock(&hci_sk_list.lock);
 	}

? I can't judge because I don't know how this works. I worry that
without lock_sock()/release_sock(), this races with e.g. hci_sock_bind().

We could take away the backward goto if we can do something like below.

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..1ca03769badf 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida);
 
 static atomic_t monitor_promisc = ATOMIC_INIT(0);
 
+static DEFINE_MUTEX(sock_list_lock);
+
 /* ----- HCI socket interface ----- */
 
 /* Socket info */
@@ -760,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 		struct sock *sk;
 
 		/* Detach sockets from device */
-		read_lock(&hci_sk_list.lock);
+		mutex_lock(&sock_list_lock);
 		sk_for_each(sk, &hci_sk_list.head) {
 			lock_sock(sk);
 			if (hci_pi(sk)->hdev == hdev) {
@@ -773,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 			}
 			release_sock(sk);
 		}
-		read_unlock(&hci_sk_list.lock);
+		mutex_unlock(&sock_list_lock);
 	}
 }
 
@@ -838,6 +840,7 @@ static int hci_sock_release(struct socket *sock)
 	if (!sk)
 		return 0;
 
+	mutex_lock(&sock_list_lock);
 	lock_sock(sk);
 
 	switch (hci_pi(sk)->channel) {
@@ -860,6 +863,7 @@ static int hci_sock_release(struct socket *sock)
 	}
 
 	bt_sock_unlink(&hci_sk_list, sk);
+	mutex_unlock(&sock_list_lock);
 
 	hdev = hci_pi(sk)->hdev;
 	if (hdev) {
@@ -2049,7 +2053,9 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
 	sock->state = SS_UNCONNECTED;
 	sk->sk_state = BT_OPEN;
 
+	mutex_lock(&sock_list_lock);
 	bt_sock_link(&hci_sk_list, sk);
+	mutex_unlock(&sock_list_lock);
 	return 0;
 }
 

>             It is also weird that this only manifests in the Bluetooth
> HCI sockets or other subsystems don't use such locking mechanism
> anymore?

If other subsystems have similar problem, that should be handled by different
patches. This patch fixes a regression introduced when fixing CVE-2021-3573,
and I think that Linux distributors are waiting for this regression to be fixed
so that they can backport commit e305509e678b3a4a ("Bluetooth: use correct lock
to prevent UAF of hdev object"). Also, this regression is currently 7th top
crashers for syzbot, and I'd like to apply this patch as soon as possible.

I think that this patch can serve as a response to Lin's comment

  > In short, I have no idea if there is any lock replacing solution for
  > this bug. I need help and suggestions because the lock mechanism is
  > just so difficult.

at https://patchwork.kernel.org/project/bluetooth/patch/CAJjojJsj9pzF4j2MVvsM-hCpvyR7OkZn232yt3MdOGnLxOiRRg@mail.gmail.com
without changing behavior.

> 
> 
>>                 }
>>                 read_unlock(&hci_sk_list.lock);
>>         }
>> --
>> 2.18.4
Lin Ma July 8, 2021, 1 a.m. UTC | #3
> 
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..0525883f4639 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>  	if (event == HCI_DEV_UNREG) {
>  		struct sock *sk;
>  
> -		/* Detach sockets from device */
> +		/* Change socket state and notify */
>  		read_lock(&hci_sk_list.lock);
>  		sk_for_each(sk, &hci_sk_list.head) {
> -			lock_sock(sk);
>  			if (hci_pi(sk)->hdev == hdev) {
> -				hci_pi(sk)->hdev = NULL;
>  				sk->sk_err = EPIPE;
>  				sk->sk_state = BT_OPEN;
>  				sk->sk_state_change(sk);
> -
> -				hci_dev_put(hdev);
>  			}
> -			release_sock(sk);
>  		}
>  		read_unlock(&hci_sk_list.lock);
>  	}
> 
> ? I can't judge because I don't know how this works. I worry that
> without lock_sock()/release_sock(), this races with e.g. hci_sock_bind().
> 
> We could take away the backward goto if we can do something like below.
> 
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..1ca03769badf 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida);
>  
>  static atomic_t monitor_promisc = ATOMIC_INIT(0);
>  
> +static DEFINE_MUTEX(sock_list_lock);
> +
>  /* ----- HCI socket interface ----- */
>  
>  /* Socket info */
> @@ -760,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>  		struct sock *sk;
>  
>  		/* Detach sockets from device */
> -		read_lock(&hci_sk_list.lock);
> +		mutex_lock(&sock_list_lock);
>  		sk_for_each(sk, &hci_sk_list.head) {
>  			lock_sock(sk);
>  			if (hci_pi(sk)->hdev == hdev) {
> @@ -773,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>  			}
>  			release_sock(sk);
>  		}
> -		read_unlock(&hci_sk_list.lock);
> +		mutex_unlock(&sock_list_lock);
>  	}
>  }
>  
> @@ -838,6 +840,7 @@ static int hci_sock_release(struct socket *sock)
>  	if (!sk)
>  		return 0;
>  
> +	mutex_lock(&sock_list_lock);
>  	lock_sock(sk);
>  
>  	switch (hci_pi(sk)->channel) {
> @@ -860,6 +863,7 @@ static int hci_sock_release(struct socket *sock)
>  	}
>  
>  	bt_sock_unlink(&hci_sk_list, sk);
> +	mutex_unlock(&sock_list_lock);
>  
>  	hdev = hci_pi(sk)->hdev;
>  	if (hdev) {
> @@ -2049,7 +2053,9 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
>  	sock->state = SS_UNCONNECTED;
>  	sk->sk_state = BT_OPEN;
>  
> +	mutex_lock(&sock_list_lock);
>  	bt_sock_link(&hci_sk_list, sk);
> +	mutex_unlock(&sock_list_lock);
>  	return 0;
>  }
>  
> 
> >             It is also weird that this only manifests in the Bluetooth
> > HCI sockets or other subsystems don't use such locking mechanism
> > anymore?
> 

Hello Tetsuo,

Yeah, that's a great patch indeed. Add one extra mutex lock for handling this.
In fact, I have tried to replace all the hci_sk_list.lock from rwlock_t to mutext.

> https://patchwork.kernel.org/project/bluetooth/patch/CAJjojJsj9pzF4j2MVvsM-hCpvyR7OkZn232yt3MdOGnLxOiRRg@mail.gmail.com/
> However, from the lock principle in the Linux kernel, this lock
> replacement is not appropriate. I take a lot of time to try with other
> lock combinations for this case but failed. For example, I tried to
> replace the rwlock_t in the hci_sk_list with a sleep-able mutex lock.

Because I have seem other part of code in kernel uses this combination: mutex_t + lock_sock. It shouldn't trigger any locking errors. (Will test it)

> Also, this regression is currently 7th top
> crashers for syzbot, and I'd like to apply this patch as soon as possible.
> 

XD, Yeah. Because the bug crash point is located at function hci_sock_dev_event(). Whenever syzkaller fuzzes Bluetooth stack and the executor exits, the crash happens.

> I think that this patch can serve as a response to Lin's comment

> > In short, I have no idea if there is any lock replacing solution for
> > this bug. I need help and suggestions because the lock mechanism is
> > just so difficult.

Thanks for that, it's quite appreciating.

Regards
Lin Ma
Tetsuo Handa July 9, 2021, 1:50 p.m. UTC | #4
It seems that history of this locking problem is a trial and error.

Commit b40df5743ee8aed8 ("[PATCH] bluetooth: fix socket locking in
hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock()
as an attempt to fix lockdep warning.

Then, commit 4ce61d1c7a8ef4c1 ("[BLUETOOTH]: Fix locking in
hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
local_bh_disable() + bh_lock_sock_nested() as an attempt to fix
sleep in atomic context warning.

Then, commit 4b5dd696f81b210c ("Bluetooth: Remove local_bh_disable() from
hci_sock.c") in 3.3-rc1 removed local_bh_disable().

Then, commit e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF
of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
lock_sock() as an attempt to fix CVE-2021-3573.

But unfortunately it is too difficult to convert rw spinlock into mutex;
we need to live with current rw spinlock.

And we have three choices that can live with current rw spinlock.
Patches for these choices are show bottom. All tested by syzbot.

(1) Introduce a global mutex dedicated for hci_sock_dev_event(), and block
    bt_sock_unlink() and concurrent hci_sock_dev_event() callers.

    This is simplest if it is guaranteed that total delay for lock_sock()
    on all sockets is short enough.

    But it is not clear how long lock_sock() might block, for e.g.
    hci_sock_bound_ioctl() which is called inside lock_sock() section is
    doing copy_from_user()/copy_to_user() which may result in blocking
    other lock_sock() waiters for many seconds. I think that POC.zip is
    demonstrating that this delay is controllable via userfaultfd.

    Of course, the robust fix will be not to call
    copy_from_user()/copy_to_user() inside lock_sock() section. But such
    big change is not suitable for a fix for commit e305509e678b3a4a
    ("Bluetooth: use correct lock to prevent UAF of hdev object").

(2) Introduce a global mutex for hci_sock_dev_event(), but don't block
    bt_sock_unlink() nor concurrent hci_sock_dev_event() callers (i.e.
    use this mutex like a spinlock).

    Since it will be safe to resume sk_for_each() as long as currently
    accessing socket remains on that list, we can track which socket is
    currently accessed, and let bt_sock_unlink() wait if that socket is
    currently accessed.

    It is possible that total delay becomes long enough for khungtaskd to
    complain. Commit 8d0caedb75968304 ("can: bcm/raw/isotp: use per module
    netdevice notifier") is an example for avoiding khungtaskd warning
    using this choice. Compared to that commit, this choice for
    hci_sock_dev_event() case will need to also touch "struct hci_pinfo"
    because we need to track concurrent hci_sock_dev_event() callers.

(3) Don't introduce a global mutex for hci_sock_dev_event(), and don't
    block bt_sock_unlink() nor concurrent hci_sock_dev_event() callers.

    Since it will be safe to resume sk_for_each() as long as currently
    accessing socket remains on that list, take a refcount on currently
    accessing socket and check if currently accessing socket is still
    on the list. This choice needs to touch only hci_sock_dev_event().

Which choice do we want to go?

Patch for choice (1):

----------------------------------------
 net/bluetooth/hci_sock.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..c860ec4ea7b8 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -150,6 +150,8 @@ static struct bt_sock_list hci_sk_list = {
 	.lock = __RW_LOCK_UNLOCKED(hci_sk_list.lock)
 };
 
+static DEFINE_MUTEX(hci_sk_list_lock);
+
 static bool is_filtered_packet(struct sock *sk, struct sk_buff *skb)
 {
 	struct hci_filter *flt;
@@ -758,10 +760,13 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 
 	if (event == HCI_DEV_UNREG) {
 		struct sock *sk;
+		int put_count = 0;
 
 		/* Detach sockets from device */
+		mutex_lock(&hci_sk_list_lock);
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, &hci_sk_list.head) {
+			read_unlock(&hci_sk_list.lock);
 			lock_sock(sk);
 			if (hci_pi(sk)->hdev == hdev) {
 				hci_pi(sk)->hdev = NULL;
@@ -769,11 +774,15 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 				sk->sk_state = BT_OPEN;
 				sk->sk_state_change(sk);
 
-				hci_dev_put(hdev);
+				put_count++;
 			}
 			release_sock(sk);
+			read_lock(&hci_sk_list.lock);
 		}
 		read_unlock(&hci_sk_list.lock);
+		mutex_unlock(&hci_sk_list_lock);
+		while (put_count--)
+			hci_dev_put(hdev);
 	}
 }
 
@@ -838,6 +847,10 @@ static int hci_sock_release(struct socket *sock)
 	if (!sk)
 		return 0;
 
+	mutex_lock(&hci_sk_list_lock);
+	bt_sock_unlink(&hci_sk_list, sk);
+	mutex_unlock(&hci_sk_list_lock);
+
 	lock_sock(sk);
 
 	switch (hci_pi(sk)->channel) {
@@ -859,8 +872,6 @@ static int hci_sock_release(struct socket *sock)
 		break;
 	}
 
-	bt_sock_unlink(&hci_sk_list, sk);
-
 	hdev = hci_pi(sk)->hdev;
 	if (hdev) {
 		if (hci_pi(sk)->channel == HCI_CHANNEL_USER) {
----------------------------------------

Patch for choice (2):

----------------------------------------
 net/bluetooth/hci_sock.c |   39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..3e65fcc8c9af 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida);
 
 static atomic_t monitor_promisc = ATOMIC_INIT(0);
 
+static DEFINE_MUTEX(dev_event_lock);
+
 /* ----- HCI socket interface ----- */
 
 /* Socket info */
@@ -57,6 +59,7 @@ struct hci_pinfo {
 	unsigned long     flags;
 	__u32             cookie;
 	char              comm[TASK_COMM_LEN];
+	unsigned int      event_in_progress;
 };
 
 void hci_sock_set_flag(struct sock *sk, int nr)
@@ -758,10 +761,15 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 
 	if (event == HCI_DEV_UNREG) {
 		struct sock *sk;
+		int put_count = 0;
 
 		/* Detach sockets from device */
+		mutex_lock(&dev_event_lock);
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, &hci_sk_list.head) {
+			read_unlock(&hci_sk_list.lock);
+			hci_pi(sk)->event_in_progress++;
+			mutex_unlock(&dev_event_lock);
 			lock_sock(sk);
 			if (hci_pi(sk)->hdev == hdev) {
 				hci_pi(sk)->hdev = NULL;
@@ -769,11 +777,17 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 				sk->sk_state = BT_OPEN;
 				sk->sk_state_change(sk);
 
-				hci_dev_put(hdev);
+				put_count++;
 			}
 			release_sock(sk);
+			mutex_lock(&dev_event_lock);
+			hci_pi(sk)->event_in_progress--;
+			read_lock(&hci_sk_list.lock);
 		}
 		read_unlock(&hci_sk_list.lock);
+		mutex_unlock(&dev_event_lock);
+		while (put_count--)
+			hci_dev_put(hdev);
 	}
 }
 
@@ -838,6 +852,26 @@ static int hci_sock_release(struct socket *sock)
 	if (!sk)
 		return 0;
 
+	/*
+	 * Wait for sk_for_each() in hci_sock_dev_event() to stop accessing
+	 * this sk before unlinking. Need to unlink before lock_sock(), for
+	 * hci_sock_dev_event() calls lock_sock() after incrementing
+	 * event_in_progress counter.
+	 */
+	while (1) {
+		bool unlinked = true;
+
+		mutex_lock(&dev_event_lock);
+		if (!hci_pi(sk)->event_in_progress)
+			bt_sock_unlink(&hci_sk_list, sk);
+		else
+			unlinked = false;
+		mutex_unlock(&dev_event_lock);
+		if (unlinked)
+			break;
+		schedule_timeout_uninterruptible(1);
+	}
+
 	lock_sock(sk);
 
 	switch (hci_pi(sk)->channel) {
@@ -859,8 +893,6 @@ static int hci_sock_release(struct socket *sock)
 		break;
 	}
 
-	bt_sock_unlink(&hci_sk_list, sk);
-
 	hdev = hci_pi(sk)->hdev;
 	if (hdev) {
 		if (hci_pi(sk)->channel == HCI_CHANNEL_USER) {
@@ -2049,6 +2081,7 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
 	sock->state = SS_UNCONNECTED;
 	sk->sk_state = BT_OPEN;
 
+	hci_pi(sk)->event_in_progress = 0;
 	bt_sock_link(&hci_sk_list, sk);
 	return 0;
 }
----------------------------------------

Patch for choice (3):

----------------------------------------
 net/bluetooth/hci_sock.c |   35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..38146cf37378 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -758,22 +758,53 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 
 	if (event == HCI_DEV_UNREG) {
 		struct sock *sk;
+		int put_count = 0;
 
 		/* Detach sockets from device */
+restart:
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, &hci_sk_list.head) {
+			/* This sock_hold(sk) is safe, for bt_sock_unlink(sk)
+			 * is not called yet.
+			 */
+			sock_hold(sk);
+			read_unlock(&hci_sk_list.lock);
 			lock_sock(sk);
-			if (hci_pi(sk)->hdev == hdev) {
+			write_lock(&hci_sk_list.lock);
+			/* Check that bt_sock_unlink(sk) is not called yet. */
+			if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
 				hci_pi(sk)->hdev = NULL;
 				sk->sk_err = EPIPE;
 				sk->sk_state = BT_OPEN;
 				sk->sk_state_change(sk);
 
-				hci_dev_put(hdev);
+				put_count++;
 			}
+			write_unlock(&hci_sk_list.lock);
 			release_sock(sk);
+			read_lock(&hci_sk_list.lock);
+			/* If bt_sock_unlink(sk) is not called yet, we can
+			 * continue iteration. We can use __sock_put(sk) here
+			 * because hci_sock_release() will call sock_put(sk)
+			 * after bt_sock_unlink(sk).
+			 */
+			if (sk_hashed(sk)) {
+				__sock_put(sk);
+				continue;
+			}
+			/* Otherwise, we need to restart iteration, for the
+			 * next socket pointed by sk->next might be already
+			 * gone. We can't use __sock_put(sk) here because
+			 * hci_sock_release() might have already called
+			 * sock_put(sk) after bt_sock_unlink(sk).
+			 */
+			read_unlock(&hci_sk_list.lock);
+			sock_put(sk);
+			goto restart;
 		}
 		read_unlock(&hci_sk_list.lock);
+		while (put_count--)
+			hci_dev_put(hdev);
 	}
 }
 
----------------------------------------
Tetsuo Handa July 10, 2021, 1:34 p.m. UTC | #5
On 2021/07/08 8:33, Tetsuo Handa wrote:
>>              we could perhaps don't release the reference to hdev
>> either and leave hci_sock_release to deal with it and then perhaps we
>> can take away the backward goto, actually why are you restarting to
>> begin with?
> 
> Do you mean something like
> 
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..0525883f4639 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>  	if (event == HCI_DEV_UNREG) {
>  		struct sock *sk;
>  
> -		/* Detach sockets from device */
> +		/* Change socket state and notify */
>  		read_lock(&hci_sk_list.lock);
>  		sk_for_each(sk, &hci_sk_list.head) {
> -			lock_sock(sk);
>  			if (hci_pi(sk)->hdev == hdev) {
> -				hci_pi(sk)->hdev = NULL;
>  				sk->sk_err = EPIPE;
>  				sk->sk_state = BT_OPEN;
>  				sk->sk_state_change(sk);
> -
> -				hci_dev_put(hdev);
>  			}
> -			release_sock(sk);
>  		}
>  		read_unlock(&hci_sk_list.lock);
>  	}
> 
> ? I can't judge because I don't know how this works. I worry that
> without lock_sock()/release_sock(), this races with e.g. hci_sock_bind().
> 

I examined hci_unregister_dev() and concluded that this can't work.

hci_sock_dev_event(hdev, HCI_DEV_UNREG) can't defer dropping the reference to
this hdev till hci_sock_release(), for hci_unregister_dev() cleans up everything
related to this hdev and calls hci_dev_put(hdev) and then vhci_release() calls
hci_free_dev(hdev).

That's the reason hci_sock_dev_event() has to use lock_sock() in order not to
miss some hci_dev_put(hdev) calls.

>> This sounds a little too complicated, afaik backward goto is not even
>> consider a good practice either, since it appears we don't unlink the
>> sockets here

Despite your comment, I'd like to go with choice (3) for now. After lock_sock() became
free from delay caused by pagefault handling, we could consider updating to choice (1).
diff mbox series

Patch

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index eed0dd066e12..64e54ab0892f 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -759,19 +759,39 @@  void hci_sock_dev_event(struct hci_dev *hdev, int event)
 	if (event == HCI_DEV_UNREG) {
 		struct sock *sk;
 
+restart:
 		/* Detach sockets from device */
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, &hci_sk_list.head) {
+			/* hci_sk_list.lock is preventing hci_sock_release()
+			 * from calling bt_sock_unlink().
+			 */
+			if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
+				continue;
+			/* Take a ref because we can't call lock_sock() with
+			 * hci_sk_list.lock held.
+			 */
+			sock_hold(sk);
+			read_unlock(&hci_sk_list.lock);
 			lock_sock(sk);
-			if (hci_pi(sk)->hdev == hdev) {
+			/* Since hci_sock_release() might have already called
+			 * bt_sock_unlink() while waiting for lock_sock(),
+			 * use sk_hashed(sk) for checking that bt_sock_unlink()
+			 * is not yet called.
+			 */
+			if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
 				hci_pi(sk)->hdev = NULL;
 				sk->sk_err = EPIPE;
 				sk->sk_state = BT_OPEN;
 				sk->sk_state_change(sk);
-
 				hci_dev_put(hdev);
 			}
 			release_sock(sk);
+			sock_put(sk);
+			/* Restarting is safe, for hci_pi(sk)->hdev is now NULL
+			 * if condition met and will "continue;" otherwise.
+			 */
+			goto restart;
 		}
 		read_unlock(&hci_sk_list.lock);
 	}