Message ID | 20220610110749.110881-1-soenke.huster@eknoes.de |
---|---|
State | New |
Headers | show |
Series | [v2] Bluetooth: RFCOMM: Use skb_trim to trim checksum | expand |
This is automated email and please do not reply to this email! Dear submitter, Thank you for submitting the patches to the linux bluetooth mailing list. This is a CI test results with your patch series: PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=649266 ---Test result--- Test Summary: CheckPatch PASS 0.78 seconds GitLint PASS 0.42 seconds SubjectPrefix PASS 0.31 seconds BuildKernel PASS 31.50 seconds BuildKernel32 PASS 30.28 seconds Incremental Build with patchesPASS 39.69 seconds TestRunner: Setup PASS 487.85 seconds TestRunner: l2cap-tester PASS 18.02 seconds TestRunner: bnep-tester PASS 5.79 seconds TestRunner: mgmt-tester FAIL 106.97 seconds TestRunner: rfcomm-tester PASS 9.38 seconds TestRunner: sco-tester PASS 9.27 seconds TestRunner: smp-tester PASS 9.11 seconds TestRunner: userchan-tester PASS 6.15 seconds Details ############################## Test: TestRunner: mgmt-tester - FAIL - 106.97 seconds Run test-runner with mgmt-tester Total: 493, Passed: 491 (99.6%), Failed: 2, Not Run: 0 Failed Test Cases Add Advertising - Success (Name+data+appear) Timed out 2.769 seconds Add Ext Advertising - Success (Name+data+appear) Timed out 2.405 seconds --- Regards, Linux Bluetooth
Hi Eric, On 10.06.22 15:59, Eric Dumazet wrote: > On Fri, Jun 10, 2022 at 4:08 AM Soenke Huster <soenke.huster@eknoes.de> wrote: >> >> As skb->tail might be zero, it can underflow. This leads to a page >> fault: skb_tail_pointer simply adds skb->tail (which is now MAX_UINT) >> to skb->head. >> >> BUG: unable to handle page fault for address: ffffed1021de29ff >> #PF: supervisor read access in kernel mode >> #PF: error_code(0x0000) - not-present page >> RIP: 0010:rfcomm_run+0x831/0x4040 (net/bluetooth/rfcomm/core.c:1751) >> >> By using skb_trim instead of the direct manipulation, skb->tail >> is reset. Thus, the correct pointer to the checksum is used. >> >> Signed-off-by: Soenke Huster <soenke.huster@eknoes.de> >> --- >> v2: Clarified how the bug triggers, minimize code change >> >> net/bluetooth/rfcomm/core.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c >> index 7324764384b6..443b55edb3ab 100644 >> --- a/net/bluetooth/rfcomm/core.c >> +++ b/net/bluetooth/rfcomm/core.c >> @@ -1747,7 +1747,7 @@ static struct rfcomm_session *rfcomm_recv_frame(struct rfcomm_session *s, >> type = __get_type(hdr->ctrl); >> >> /* Trim FCS */ >> - skb->len--; skb->tail--; >> + skb_trim(skb, skb->len - 1); >> fcs = *(u8 *)skb_tail_pointer(skb); >> >> if (__check_fcs(skb->data, type, fcs)) { >> -- >> 2.36.1 >> > > Again, I do not see how skb->tail could possibly zero at this point. > > If it was, skb with illegal layout has been queued in the first place, > we need to fix the producer, not the consumer. > Sorry, I thought that might be a right place as there is not much code in the kernel that manipulates ->tail directly. > A driver missed an skb_put() perhaps. > I am using the (I guess quite unused) virtio_bt driver, and figured out that the following fixes the bug: --- a/drivers/bluetooth/virtio_bt.c +++ b/drivers/bluetooth/virtio_bt.c @@ -219,7 +219,7 @@ static void virtbt_rx_work(struct work_struct *work) if (!skb) return; - skb->len = len; + skb_put(skb, len); virtbt_rx_handle(vbt, skb); if (virtbt_add_inbuf(vbt) < 0) I guess this is the root cause? I just used Bluetooth for a while in the VM and no error occurred, everything worked fine. > Can you please dump the skb here ? > > diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c > index 7324764384b6773074032ad671777bf86bd3360e..358ccb4fe7214aea0bb4084188c7658316fe0ff7 > 100644 > --- a/net/bluetooth/rfcomm/core.c > +++ b/net/bluetooth/rfcomm/core.c > @@ -1746,6 +1746,11 @@ static struct rfcomm_session > *rfcomm_recv_frame(struct rfcomm_session *s, > dlci = __get_dlci(hdr->addr); > type = __get_type(hdr->ctrl); > > + if (!skb->tail) { > + DO_ONCE_LITE(skb_dump(KERN_ERR, skb, false)); > + kfree_skb(skb); > + return s; > + } > /* Trim FCS */ > skb->len--; skb->tail--; > fcs = *(u8 *)skb_tail_pointer(skb); If it might still help: skb len=4 headroom=9 headlen=4 tailroom=1728 mac=(-1,-1) net=(0,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 skb linear: 00000000: 03 3f 01 1c
Hi Eric, On 10.06.22 18:55, Eric Dumazet wrote: > On Fri, Jun 10, 2022 at 8:35 AM Sönke Huster <soenke.huster@eknoes.de> wrote: >> >> Hi Eric, >> >> On 10.06.22 15:59, Eric Dumazet wrote: >>> On Fri, Jun 10, 2022 at 4:08 AM Soenke Huster <soenke.huster@eknoes.de> wrote: >>>> >>>> As skb->tail might be zero, it can underflow. This leads to a page >>>> fault: skb_tail_pointer simply adds skb->tail (which is now MAX_UINT) >>>> to skb->head. >>>> >>>> BUG: unable to handle page fault for address: ffffed1021de29ff >>>> #PF: supervisor read access in kernel mode >>>> #PF: error_code(0x0000) - not-present page >>>> RIP: 0010:rfcomm_run+0x831/0x4040 (net/bluetooth/rfcomm/core.c:1751) >>>> >>>> By using skb_trim instead of the direct manipulation, skb->tail >>>> is reset. Thus, the correct pointer to the checksum is used. >>>> >>>> Signed-off-by: Soenke Huster <soenke.huster@eknoes.de> >>>> --- >>>> v2: Clarified how the bug triggers, minimize code change >>>> >>>> net/bluetooth/rfcomm/core.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c >>>> index 7324764384b6..443b55edb3ab 100644 >>>> --- a/net/bluetooth/rfcomm/core.c >>>> +++ b/net/bluetooth/rfcomm/core.c >>>> @@ -1747,7 +1747,7 @@ static struct rfcomm_session *rfcomm_recv_frame(struct rfcomm_session *s, >>>> type = __get_type(hdr->ctrl); >>>> >>>> /* Trim FCS */ >>>> - skb->len--; skb->tail--; >>>> + skb_trim(skb, skb->len - 1); >>>> fcs = *(u8 *)skb_tail_pointer(skb); >>>> >>>> if (__check_fcs(skb->data, type, fcs)) { >>>> -- >>>> 2.36.1 >>>> >>> >>> Again, I do not see how skb->tail could possibly zero at this point. >>> >>> If it was, skb with illegal layout has been queued in the first place, >>> we need to fix the producer, not the consumer. >>> >> >> Sorry, I thought that might be a right place as there is not much code in the kernel >> that manipulates ->tail directly. >> >>> A driver missed an skb_put() perhaps. >>> >> >> I am using the (I guess quite unused) virtio_bt driver, and figured out that the following >> fixes the bug: >> >> --- a/drivers/bluetooth/virtio_bt.c >> +++ b/drivers/bluetooth/virtio_bt.c >> @@ -219,7 +219,7 @@ static void virtbt_rx_work(struct work_struct *work) >> if (!skb) >> return; >> >> - skb->len = len; >> + skb_put(skb, len); > > Removing skb->len=len seems about right. > But skb_put() should be done earlier. > > We are approaching the skb producer :) > > Now you have to find/check who added this illegal skb in the virt queue. > > Maybe virtbt_add_inbuf() ? I think here, the length of the skb can't really be known - an empty SKB is put into the virtqueue, and then filled with data in the device, which is implemented in a Hypervisor. Maybe my implementation of that device might then be wrong, on the other hand I am pretty sure the driver should be the one that sets the length of the skb. But the driver only knows it in virtbt_rx_work, as it learns the size of the added buffer there for the first time. > > Also there is kernel info leak I think. > I think your are right! > diff --git a/drivers/bluetooth/virtio_bt.c b/drivers/bluetooth/virtio_bt.c > index 67c21263f9e0f250f0719b8e7f1fe15b0eba5ee0..c9b832c447ee451f027430b284d7bb246f6ecb24 > 100644 > --- a/drivers/bluetooth/virtio_bt.c > +++ b/drivers/bluetooth/virtio_bt.c > @@ -37,6 +37,9 @@ static int virtbt_add_inbuf(struct virtio_bluetooth *vbt) > if (!skb) > return -ENOMEM; > > + skb_put(skb, 1000); > + memset(skb->data, 0, 1000); > + > sg_init_one(sg, skb->data, 1000); > > err = virtqueue_add_inbuf(vq, sg, 1, skb, GFP_KERNEL); > > >> virtbt_rx_handle(vbt, skb); >> >> if (virtbt_add_inbuf(vbt) < 0) >> >> I guess this is the root cause? I just used Bluetooth for a while in the VM >> and no error occurred, everything worked fine. >> >>> Can you please dump the skb here ? >>> >>> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c >>> index 7324764384b6773074032ad671777bf86bd3360e..358ccb4fe7214aea0bb4084188c7658316fe0ff7 >>> 100644 >>> --- a/net/bluetooth/rfcomm/core.c >>> +++ b/net/bluetooth/rfcomm/core.c >>> @@ -1746,6 +1746,11 @@ static struct rfcomm_session >>> *rfcomm_recv_frame(struct rfcomm_session *s, >>> dlci = __get_dlci(hdr->addr); >>> type = __get_type(hdr->ctrl); >>> >>> + if (!skb->tail) { >>> + DO_ONCE_LITE(skb_dump(KERN_ERR, skb, false)); >>> + kfree_skb(skb); >>> + return s; >>> + } >>> /* Trim FCS */ >>> skb->len--; skb->tail--; >>> fcs = *(u8 *)skb_tail_pointer(skb); >> >> If it might still help: >> >> skb len=4 headroom=9 headlen=4 tailroom=1728 >> mac=(-1,-1) net=(0,-1) trans=-1 >> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) >> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) >> hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 >> skb linear: 00000000: 03 3f 01 1c >>
On Tue, Jun 14, 2022 at 6:42 AM Sönke Huster <soenke.huster@eknoes.de> wrote: > > Hi Eric, > > On 10.06.22 18:55, Eric Dumazet wrote: > > On Fri, Jun 10, 2022 at 8:35 AM Sönke Huster <soenke.huster@eknoes.de> wrote: > >> > >> Hi Eric, > >> > >> On 10.06.22 15:59, Eric Dumazet wrote: > >>> On Fri, Jun 10, 2022 at 4:08 AM Soenke Huster <soenke.huster@eknoes.de> wrote: > >>>> > >>>> As skb->tail might be zero, it can underflow. This leads to a page > >>>> fault: skb_tail_pointer simply adds skb->tail (which is now MAX_UINT) > >>>> to skb->head. > >>>> > >>>> BUG: unable to handle page fault for address: ffffed1021de29ff > >>>> #PF: supervisor read access in kernel mode > >>>> #PF: error_code(0x0000) - not-present page > >>>> RIP: 0010:rfcomm_run+0x831/0x4040 (net/bluetooth/rfcomm/core.c:1751) > >>>> > >>>> By using skb_trim instead of the direct manipulation, skb->tail > >>>> is reset. Thus, the correct pointer to the checksum is used. > >>>> > >>>> Signed-off-by: Soenke Huster <soenke.huster@eknoes.de> > >>>> --- > >>>> v2: Clarified how the bug triggers, minimize code change > >>>> > >>>> net/bluetooth/rfcomm/core.c | 2 +- > >>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>> > >>>> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c > >>>> index 7324764384b6..443b55edb3ab 100644 > >>>> --- a/net/bluetooth/rfcomm/core.c > >>>> +++ b/net/bluetooth/rfcomm/core.c > >>>> @@ -1747,7 +1747,7 @@ static struct rfcomm_session *rfcomm_recv_frame(struct rfcomm_session *s, > >>>> type = __get_type(hdr->ctrl); > >>>> > >>>> /* Trim FCS */ > >>>> - skb->len--; skb->tail--; > >>>> + skb_trim(skb, skb->len - 1); > >>>> fcs = *(u8 *)skb_tail_pointer(skb); > >>>> > >>>> if (__check_fcs(skb->data, type, fcs)) { > >>>> -- > >>>> 2.36.1 > >>>> > >>> > >>> Again, I do not see how skb->tail could possibly zero at this point. > >>> > >>> If it was, skb with illegal layout has been queued in the first place, > >>> we need to fix the producer, not the consumer. > >>> > >> > >> Sorry, I thought that might be a right place as there is not much code in the kernel > >> that manipulates ->tail directly. > >> > >>> A driver missed an skb_put() perhaps. > >>> > >> > >> I am using the (I guess quite unused) virtio_bt driver, and figured out that the following > >> fixes the bug: > >> > >> --- a/drivers/bluetooth/virtio_bt.c > >> +++ b/drivers/bluetooth/virtio_bt.c > >> @@ -219,7 +219,7 @@ static void virtbt_rx_work(struct work_struct *work) > >> if (!skb) > >> return; > >> > >> - skb->len = len; > >> + skb_put(skb, len); > > > > Removing skb->len=len seems about right. > > But skb_put() should be done earlier. > > > > We are approaching the skb producer :) > > > > Now you have to find/check who added this illegal skb in the virt queue. > > > > Maybe virtbt_add_inbuf() ? > > I think here, the length of the skb can't really be known - an empty SKB is put into > the virtqueue, and then filled with data in the device, which is implemented in a Hypervisor. > Maybe my implementation of that device might then be wrong, on the other hand I am pretty > sure the driver should be the one that sets the length of the skb. But the driver only > knows it in virtbt_rx_work, as it learns the size of the added buffer there for the first time. > > > > > Also there is kernel info leak I think. > > > > I think your are right! If this patch in drivers/bluetooth/virtio_bt.c fixes the issue, please submit a formal patch. You can take ownership of it, of course. If not, more investigation is needed on your side ;) Thanks ! > > > diff --git a/drivers/bluetooth/virtio_bt.c b/drivers/bluetooth/virtio_bt.c > > index 67c21263f9e0f250f0719b8e7f1fe15b0eba5ee0..c9b832c447ee451f027430b284d7bb246f6ecb24 > > 100644 > > --- a/drivers/bluetooth/virtio_bt.c > > +++ b/drivers/bluetooth/virtio_bt.c > > @@ -37,6 +37,9 @@ static int virtbt_add_inbuf(struct virtio_bluetooth *vbt) > > if (!skb) > > return -ENOMEM; > > > > + skb_put(skb, 1000); > > + memset(skb->data, 0, 1000); > > + > > sg_init_one(sg, skb->data, 1000); > > > > err = virtqueue_add_inbuf(vq, sg, 1, skb, GFP_KERNEL); > > > > > >> virtbt_rx_handle(vbt, skb); > >> > >> if (virtbt_add_inbuf(vbt) < 0) > >> > >> I guess this is the root cause? I just used Bluetooth for a while in the VM > >> and no error occurred, everything worked fine. > >> > >>> Can you please dump the skb here ? > >>> > >>> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c > >>> index 7324764384b6773074032ad671777bf86bd3360e..358ccb4fe7214aea0bb4084188c7658316fe0ff7 > >>> 100644 > >>> --- a/net/bluetooth/rfcomm/core.c > >>> +++ b/net/bluetooth/rfcomm/core.c > >>> @@ -1746,6 +1746,11 @@ static struct rfcomm_session > >>> *rfcomm_recv_frame(struct rfcomm_session *s, > >>> dlci = __get_dlci(hdr->addr); > >>> type = __get_type(hdr->ctrl); > >>> > >>> + if (!skb->tail) { > >>> + DO_ONCE_LITE(skb_dump(KERN_ERR, skb, false)); > >>> + kfree_skb(skb); > >>> + return s; > >>> + } > >>> /* Trim FCS */ > >>> skb->len--; skb->tail--; > >>> fcs = *(u8 *)skb_tail_pointer(skb); > >> > >> If it might still help: > >> > >> skb len=4 headroom=9 headlen=4 tailroom=1728 > >> mac=(-1,-1) net=(0,-1) trans=-1 > >> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > >> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > >> hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 > >> skb linear: 00000000: 03 3f 01 1c > >>
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c index 7324764384b6..443b55edb3ab 100644 --- a/net/bluetooth/rfcomm/core.c +++ b/net/bluetooth/rfcomm/core.c @@ -1747,7 +1747,7 @@ static struct rfcomm_session *rfcomm_recv_frame(struct rfcomm_session *s, type = __get_type(hdr->ctrl); /* Trim FCS */ - skb->len--; skb->tail--; + skb_trim(skb, skb->len - 1); fcs = *(u8 *)skb_tail_pointer(skb); if (__check_fcs(skb->data, type, fcs)) {
As skb->tail might be zero, it can underflow. This leads to a page fault: skb_tail_pointer simply adds skb->tail (which is now MAX_UINT) to skb->head. BUG: unable to handle page fault for address: ffffed1021de29ff #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:rfcomm_run+0x831/0x4040 (net/bluetooth/rfcomm/core.c:1751) By using skb_trim instead of the direct manipulation, skb->tail is reset. Thus, the correct pointer to the checksum is used. Signed-off-by: Soenke Huster <soenke.huster@eknoes.de> --- v2: Clarified how the bug triggers, minimize code change net/bluetooth/rfcomm/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)