hwrng: core - Add WARN_ON for buggy read return values

Message ID	ZvEFQAWVgWNd9j7e@gondor.apana.org.au
State	Accepted
Commit	fb10c7a84661471cdcc8998d63703211b873c126
Headers	show Received: from abb.hmeau.com (abb.hmeau.com [144.6.53.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35E2F14F70; Mon, 23 Sep 2024 06:06:07 +0000 (UTC) Date: Mon, 23 Sep 2024 14:05:52 +0800 From: Herbert Xu <herbert@gondor.apana.org.au> To: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>, Guangwu Zhang <guazhang@redhat.com> Cc: Peter Huewe <peterhuewe@gmx.de>, Jarkko Sakkinen <jarkko@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>, linux-integrity@vger.kernel.org Subject: [PATCH] hwrng: core - Add WARN_ON for buggy read return values Message-ID: <ZvEFQAWVgWNd9j7e@gondor.apana.org.au> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
Series	hwrng: core - Add WARN_ON for buggy read return values \| expand hwrng: core - Add WARN_ON for buggy read return values

Herbert Xu Sept. 23, 2024, 6:05 a.m. UTC

Dear TPM maintainers:

Please have a look at the tpm hwrng driver because it appears to
be returning a length longer than the buffer length that we gave
it.  In particular, tpm2 appears to be the culprit (though I didn't
really check tpm1 at all so it could also be buggy).

The following patch hopefully should confirm that this is indeed
caused by TPM and not some other HWRNG driver.

---8<---
If a buggy driver returns a length that is longer than the size
of the buffer provided to it, then this may lead to a buffer overread
in the caller.

Stop this by adding a check for it in the hwrng core.

Reported-by: Guangwu Zhang <guazhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Jarkko Sakkinen Sept. 23, 2024, 2:36 p.m. UTC | #1

On Mon Sep 23, 2024 at 5:31 PM EEST, Jarkko Sakkinen wrote:
> Greg, did we have something under Documentation/ that would fully
> address the use of WARN?

... personally I think that even if there was a separate document, this
should be somehow addressed already in SubmittingPatches so that it
can't be possibly missed by anyone.

BR, Jarkko

Herbert Xu Sept. 23, 2024, 10:32 p.m. UTC | #2

On Mon, Sep 23, 2024 at 04:48:27PM +0200, Greg KH wrote:
>
> Please see:
> 	https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
> which describes that.  We should make it more explicit that any WARN()
> or WARN_ON() calls that can be hit by user interactions somehow, will
> end up getting a CVE id when we fix it up to not do so.

If the aformentioned WARN_ON hits, then the driver has probabaly
already done a buffer overrun so it's a CVE anyway.

Cheers,

Jarkko Sakkinen Sept. 24, 2024, 4:05 p.m. UTC | #3

On Tue Sep 24, 2024 at 1:32 AM EEST, Herbert Xu wrote:
> On Mon, Sep 23, 2024 at 04:48:27PM +0200, Greg KH wrote:
> >
> > Please see:
> > 	https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
> > which describes that.  We should make it more explicit that any WARN()
> > or WARN_ON() calls that can be hit by user interactions somehow, will
> > end up getting a CVE id when we fix it up to not do so.
>
> If the aformentioned WARN_ON hits, then the driver has probabaly
> already done a buffer overrun so it's a CVE anyway.

We'll see I finally got into testing this. Sorry for latencies, I'm
switching jobs and unfortunately German Post Office lost my priority
mail containing contracts (sent them from Finland to Berlin) so have
been signing, scanning etc. the whole day :-) My last week in the
current job, and next week is the first in the new job, so this
week is a bit bumpy.

BR, Jarkko

Herbert Xu Sept. 27, 2024, 12:42 a.m. UTC | #4

On Tue, Sep 24, 2024 at 08:43:00PM +0300, Jarkko Sakkinen wrote:
>
> Without any traces that would provide more information I don't see
> the smoking gun.

I haven't confirmed that it's definitely the tpm2 driver, it's just
based on the backtrace.  Hopefully my patch will confirm it one way
or the other.  Here is the backtrace:

[  100.784159] vmd 0000:c2:00.5: Bound to PCI domain 10002 
[  100.786209] Monitor-Mwait will be used to enter C-1 state 
[  100.786225] Monitor-Mwait will be used to enter C-2 state 
[  100.786244] ACPI: \_SB_.SCK0.C000: Found 2 idle states 
[  100.823093] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 
[  100.823636] ACPI: button: Power Button [PWRF] 
[  100.905756] ERST: Error Record Serialization Table (ERST) support is initialized. 
[  100.905858] pstore: Using crash dump compression: deflate 
[  100.905861] pstore: Registered erst as persistent store backend 
[  100.907044] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled 
[  100.908305] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A 
[  100.926608] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A 
[  100.942953] Non-volatile memory driver v1.3 
[  100.947908] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22) 
[  101.226913] ACPI: bus type drm_connector registered 
[  101.229708] alg: ecdh-nist-p256 (ecdh-nist-p256-generic) is disabled due to FIPS 
[  101.229745] tpm tpm0: crypto ecdh allocation failed 
[  101.236311] tpm tpm0: A TPM error (708) occurred start auth session 
[  101.238797] ================================================================== 
[  101.238800] BUG: KASAN: slab-out-of-bounds in blake2s_update+0x135/0x2b0 
[  101.238808] Read of size 44 at addr ff11000167334d98 by task hwrng/318 
[  101.238811]  
[  101.238813] CPU: 26 UID: 0 PID: 318 Comm: hwrng Not tainted 6.11.0-0.rc5.22.el10.x86_64+debug #1 
[  101.238818] Hardware name: Supermicro SSG-110P-NTR10-EI018/X12SPO-NTF, BIOS 1.3 05/20/2022 
[  101.238820] Call Trace: 
[  101.238823]  <TASK> 
[  101.238826]  dump_stack_lvl+0x6f/0xb0 
[  101.238833]  ? blake2s_update+0x135/0x2b0 
[  101.238836]  print_address_description.constprop.0+0x88/0x330 
[  101.238843]  ? blake2s_update+0x135/0x2b0 
[  101.238847]  print_report+0x108/0x209 
[  101.238851]  ? blake2s_update+0x135/0x2b0 
[  101.238855]  ? __virt_addr_valid+0x20b/0x440 
[  101.238859]  ? blake2s_update+0x135/0x2b0 
[  101.238863]  kasan_report+0xa8/0xe0 
[  101.238868]  ? blake2s_update+0x135/0x2b0 
[  101.238874]  kasan_check_range+0x10f/0x1f0 
[  101.238879]  __asan_memcpy+0x23/0x60 
[  101.238883]  blake2s_update+0x135/0x2b0 
[  101.238887]  add_hwgenerator_randomness+0x3d/0xe0 
[  101.238895]  hwrng_fillfn+0x144/0x270 
[  101.238900]  ? __pfx_hwrng_fillfn+0x10/0x10 
[  101.238904]  kthread+0x2d2/0x3a0 
[  101.238908]  ? __pfx_kthread+0x10/0x10 
[  101.238912]  ret_from_fork+0x31/0x70 
[  101.238917]  ? __pfx_kthread+0x10/0x10 
[  101.238920]  ret_from_fork_asm+0x1a/0x30 
[  101.238929]  </TASK> 
[  101.238931]  
[  101.238932] Allocated by task 1: 
[  101.238934]  kasan_save_stack+0x30/0x50 
[  101.238937]  kasan_save_track+0x14/0x30 
[  101.238940]  __kasan_kmalloc+0x8f/0xa0 
[  101.238942]  __kmalloc_noprof+0x1fe/0x410 
[  101.238947]  kobj_map+0x7e/0x6d0 
[  101.238951]  cdev_add+0x92/0x180 
[  101.238954]  tty_cdev_add+0x17a/0x280 
[  101.238957]  tty_register_device_attr+0x401/0x740 
[  101.238960]  tty_register_driver+0x381/0x6f0 
[  101.238963]  vty_init+0x2c1/0x2f0 
[  101.238967]  tty_init+0x13b/0x150 
[  101.238970]  do_one_initcall+0x11c/0x5c0 
[  101.238975]  do_initcalls+0x1b4/0x1f0 
[  101.238980]  kernel_init_freeable+0x4ae/0x520 
[  101.238984]  kernel_init+0x1c/0x150 
[  101.238988]  ret_from_fork+0x31/0x70 
[  101.238991]  ret_from_fork_asm+0x1a/0x30 
[  101.238994]  
[  101.238995] The buggy address belongs to the object at ff11000167334d80 
[  101.238995]  which belongs to the cache kmalloc-64 of size 64 
[  101.238998] The buggy address is located 24 bytes inside of 
[  101.238998]  allocated 56-byte region [ff11000167334d80, ff11000167334db8) 
[  101.239002]  
[  101.239003] The buggy address belongs to the physical page: 
[  101.239004] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x167334 
[  101.239008] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) 
[  101.239012] page_type: 0xfdffffff(slab) 
[  101.239016] raw: 0017ffffc0000000 ff1100010003c8c0 dead000000000122 0000000000000000 
[  101.239019] raw: 0000000000000000 0000000000200020 00000001fdffffff 0000000000000000 
[  101.239021] page dumped because: kasan: bad access detected 
[  101.239023]  
[  101.239024] Memory state around the buggy address: 
[  101.239025]  ff11000167334c80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc 
[  101.239028]  ff11000167334d00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc 
[  101.239030] >ff11000167334d80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc 
[  101.239031]                                         ^ 
[  101.239033]  ff11000167334e00: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc 
[  101.239035]  ff11000167334e80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc 
[  101.239037] ================================================================== 
[  101.383067] rdac: device handler registered 
[  101.383412] hp_sw: device handler registered 
[  101.383415] emc: device handler registered 
[  101.383879] alua: device handler registered 
[  101.391255] xhci_hcd 0000:00:14.0: xHCI Host Controller 
[  101.391892] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1 
[  101.393706] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810 
[  101.399646] xhci_hcd 0000:00:14.0: xHCI Host Controller 
[  101.400136] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2 
[  101.400163] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed 
[  101.400818] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.11 
[  101.400823] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 
[  101.400826] usb usb1: Product: xHCI Host Controller 
[  101.400829] usb usb1: Manufacturer: Linux 6.11.0-0.rc5.22.el10.x86_64+debug xhci-hcd 
[  101.400832] usb usb1: SerialNumber: 0000:00:14.0 
[  101.403055] hub 1-0:1.0: USB hub found 
[  101.403222] hub 1-0:1.0: 16 ports detected 
[  101.657974] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.11 
[  101.657982] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 
[  101.657986] usb usb2: Product: xHCI Host Controller 
[  101.657990] usb usb2: Manufacturer: Linux 6.11.0-0.rc5.22.el10.x86_64+debug xhci-hcd 
[  101.657993] usb usb2: SerialNumber: 0000:00:14.0 
[  101.660659] hub 2-0:1.0: USB hub found 
[  101.660882] hub 2-0:1.0: 10 ports detected  {code}

Thanks,

Jarkko Sakkinen Oct. 7, 2024, 11:28 p.m. UTC | #5

On Fri, 2024-09-27 at 08:42 +0800, Herbert Xu wrote:
> On Tue, Sep 24, 2024 at 08:43:00PM +0300, Jarkko Sakkinen wrote:
> > 
> > Without any traces that would provide more information I don't see
> > the smoking gun.
> 
> I haven't confirmed that it's definitely the tpm2 driver, it's just
> based on the backtrace.  Hopefully my patch will confirm it one way
> or the other.  Here is the backtrace:

Agreed.

> 
> [  100.784159] vmd 0000:c2:00.5: Bound to PCI domain 10002 
> [  100.786209] Monitor-Mwait will be used to enter C-1 state 
> [  100.786225] Monitor-Mwait will be used to enter C-2 state 
> [  100.786244] ACPI: \_SB_.SCK0.C000: Found 2 idle states 
> [  100.823093] input: Power Button as
> /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 
> [  100.823636] ACPI: button: Power Button [PWRF] 
> [  100.905756] ERST: Error Record Serialization Table (ERST) support
> is initialized. 
> [  100.905858] pstore: Using crash dump compression: deflate 
> [  100.905861] pstore: Registered erst as persistent store backend 
> [  100.907044] Serial: 8250/16550 driver, 4 ports, IRQ sharing
> enabled 
> [  100.908305] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud =
> 115200) is a 16550A 
> [  100.926608] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud =
> 115200) is a 16550A 
> [  100.942953] Non-volatile memory driver v1.3 
> [  100.947908] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id
> 22) 
> [  101.226913] ACPI: bus type drm_connector registered 
> [  101.229708] alg: ecdh-nist-p256 (ecdh-nist-p256-generic) is
> disabled due to FIPS 
> [  101.229745] tpm tpm0: crypto ecdh allocation failed 
> [  101.236311] tpm tpm0: A TPM error (708) occurred start auth

I guess it is TPM2_StartAuthSession returning TPM_RC_VALUE. Probably
James should look into this as the bus encryption code is clearly
tripping here.

I'm on second week on a new job so cannot promise any bandwidth
yet this week. Earliest next week...

BR, Jarkko

hwrng: core - Add WARN_ON for buggy read return values

Commit Message

Comments

Patch