Message ID | 20250514-venus-fixes-v3-1-32298566011f@quicinc.com |
---|---|
State | New |
Headers | show |
Series | venus driver fixes for vulnerabilities due to unexpected firmware payload | expand |
On 14/05/2025 14:38, Dikshita Agarwal wrote: > From: Vedang Nagar <quic_vnagar@quicinc.com> > > Currently, Time-Of-Check to Time-Of-Use (TOCTOU) issue happens when > handling packets from firmware via shared memory. > > The problematic code pattern: > > u32 dwords = *rd_ptr >> 2; > if (!dwords || (dwords << 2) > IFACEQ_VAR_HUGE_PKT_SIZE)) > return -EINVAL; > > memcpy(pkt, rd_ptr, dwords << 2); > > Here, *rd_ptr is used to determine the size of the packet and is > validated. However, since rd_ptr points to firmware-controlled memory, > the firmware could change the contents (e.g., embedded header fields > like pkt->hdr.size) after the size was validated but before or during > the memcpy() call. > > This opens up a race window where a malicious or buggy firmware could > inject inconsistent or malicious data, potentially leading to > information leaks, driver crashes, or undefined behavior. > > Fix this by rechecking the packet size field from shared memory > immediately before the memcpy() to ensure it has not beenn altered. > > Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files") > Signed-off-by: Vedang Nagar <quic_vnagar@quicinc.com> > Co-developed-by: Dikshita Agarwal <quic_dikshita@quicinc.com> > Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com> > --- > drivers/media/platform/qcom/venus/hfi_venus.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c > index b5f2ea8799507f9b83f1529e70061ea89a9cc5c8..163c8d16530bc44a84b2b21076e6189d476fe360 100644 > --- a/drivers/media/platform/qcom/venus/hfi_venus.c > +++ b/drivers/media/platform/qcom/venus/hfi_venus.c > @@ -295,6 +295,9 @@ static int venus_read_queue(struct venus_hfi_device *hdev, > new_rd_idx = rd_idx + dwords; > if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) { > if (new_rd_idx < qsize) { > + if ((*rd_ptr >> 2) != dwords) > + return -EINVAL; > + > memcpy(pkt, rd_ptr, dwords << 2); > } else { > size_t len; > Here's how this code fragment looks after the change, I'll add two "}" for readability and annotate dwords = *rd_ptr >> 2; // read the value here if (!dwords) return -EINVAL; new_rd_idx = rd_idx + dwords; // validate the size against a maximum value // this step is correct if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) { if (new_rd_idx < qsize) { // Re-read the value because firmware // might have changed the value if ((*rd_ptr >> 2) != dwords) return -EINVAL; // now trust dwords memcpy(pkt, rd_ptr, dwords << 2); } } But this is all wrong. There is no atomicity on the APSS side between the first verification of dwords size and the mempcpy() the commit log itself shows that the firmware is free-running with respect to the instruction pipeline of the APSS, it is an AMP problem. Adding another check of the dwords side right before the memcpy() doesn't address the problem which the commit log describes as the firmware updating the length field of a header in shared memory. There are perhaps 10 assembly instructions between the first check and the procedure prologue of the memcpy(); Adding another length check right before the memcpy() simply reduces the number of CPU instructions - the effective window that the firmware can update that header still. if ((*rd_ptr >> 2) != dwords) // conditional branch operation memcpy(pkt, rd_ptr, dwords << 2); Begins with a procedure prologue - setting up the call stack - and then is a series of fetch/stores to copy data from here to there The memcpy() itself by its nature it not atomic on the front-side-bus of the APSS to shared DRAM with the firmware. On a CPU and SoC architecture level this fix just doesn't work. To be honest we are already doing the right thing in this routine. 1. Reading the value from the packet header. 2. Validating the given size against the maximum size 3. Rejecting the memcpy() if the given size _at_the_time_we_read_ is too large. The alternative to guarantee would be something like asm("bus master asserts bus lock to PAGE/PAGES involved"); dwords = *rd_ptr; if (dwords > MAX_VALUE) return -EFIRMWARE_BUG; memcpy(dst, src, dwords >> 2); asm("bus master unlocks memory"); Lets say we make the change proposed in this patch, here is how it breaks: if ((*rd_ptr >> 2) != dwords) return -EINVAL; // now trust dwords memcpy(pkt, rd_ptr, dwords << 2); objdump qlt-kernel/build/x1e80100-crd_qlt_integration/drivers/media/platform/qcom/venus/venus-core.o --disassemble=venus_read_queue.isra.0 5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast 5c4c: 2a0303e2 mov w2, w3 5c50: aa0703e0 mov x0, x7 5c54: 94000000 bl 0 <memcpy> 5c58: 52800000 mov w0, #0x0 Your conditional jump is @ 0x5c48 your call to memcpy @ 0x5c54 Between 0x5c48 and 0x5c54 the firmware can update the value _again_ Indeed the firmware can update the value up until the time we complete reading the bit of the pkt header in memcpy() so an additional few instructions for sure. You could make some type of argument to re-verify the content of the pkt _after_ the memcpy() But the only verification that makes any sense _before_ the memcpy() is to verify the length at the point you _read_ - subsequent to the latching operation - were we fetch the length value from DRAM into our CPU cache, operating stack and/or local registers. Once that data is fetched within the cache/stack/registers of the CPU/APSS that is the relevant value. For the fix you have here to work you need this 5c48: MAGICOP memorybuslock 5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast 5c4c: 2a0303e2 mov w2, w3 5c50: aa0703e0 mov x0, x7 5c54: 94000000 bl 0 <memcpy> 5c58: 52800000 mov w0, #0x0 5c5c: MAGICUNOP memorybusunlock Because the firmware is free-running - with respect to the instruction pipeline of the above assembly. If you really want to verify the data is still valid - it should be done _after_ the memcpy(); But even then I'd say to you, why verify _after_ the memcpy() - and what happens on the instruction directly _after_ the verification - is the data considered more valid now ? i.e. this: memcpy(pkt, rd_ptr, dwords << 2); if ((*rd_ptr >> 2) != dwords) return -EINVAL; doesn't have the above described architectural race condition but it doesn't make the data any more trustworthy - because it doesn't have atomicity memcpy(pkt, rd_ptr, dwords << 2); if ((*rd_ptr >> 2) != dwords) return -EINVAL; dev_info(dev, "The value of *rd_ptr %lu!=%lu can be different now\n", *rd_ptr >> 2, dwords); Sorry this patch just can't work. It's a very hard NAK from me. --- bod
On 5/15/2025 2:47 PM, Bryan O'Donoghue wrote: > On 14/05/2025 14:38, Dikshita Agarwal wrote: >> From: Vedang Nagar <quic_vnagar@quicinc.com> >> >> Currently, Time-Of-Check to Time-Of-Use (TOCTOU) issue happens when >> handling packets from firmware via shared memory. >> >> The problematic code pattern: >> >> u32 dwords = *rd_ptr >> 2; >> if (!dwords || (dwords << 2) > IFACEQ_VAR_HUGE_PKT_SIZE)) >> return -EINVAL; >> >> memcpy(pkt, rd_ptr, dwords << 2); >> >> Here, *rd_ptr is used to determine the size of the packet and is >> validated. However, since rd_ptr points to firmware-controlled memory, >> the firmware could change the contents (e.g., embedded header fields >> like pkt->hdr.size) after the size was validated but before or during >> the memcpy() call. >> >> This opens up a race window where a malicious or buggy firmware could >> inject inconsistent or malicious data, potentially leading to >> information leaks, driver crashes, or undefined behavior. >> >> Fix this by rechecking the packet size field from shared memory >> immediately before the memcpy() to ensure it has not beenn altered. >> >> Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files") >> Signed-off-by: Vedang Nagar <quic_vnagar@quicinc.com> >> Co-developed-by: Dikshita Agarwal <quic_dikshita@quicinc.com> >> Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com> >> --- >> drivers/media/platform/qcom/venus/hfi_venus.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c >> b/drivers/media/platform/qcom/venus/hfi_venus.c >> index >> b5f2ea8799507f9b83f1529e70061ea89a9cc5c8..163c8d16530bc44a84b2b21076e6189d476fe360 100644 >> --- a/drivers/media/platform/qcom/venus/hfi_venus.c >> +++ b/drivers/media/platform/qcom/venus/hfi_venus.c >> @@ -295,6 +295,9 @@ static int venus_read_queue(struct venus_hfi_device *hdev, >> new_rd_idx = rd_idx + dwords; >> if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) { >> if (new_rd_idx < qsize) { >> + if ((*rd_ptr >> 2) != dwords) >> + return -EINVAL; >> + >> memcpy(pkt, rd_ptr, dwords << 2); >> } else { >> size_t len; >> > > Here's how this code fragment looks after the change, I'll add two "}" for > readability and annotate > > dwords = *rd_ptr >> 2; // read the value here > if (!dwords) > return -EINVAL; > > new_rd_idx = rd_idx + dwords; > > // validate the size against a maximum value > // this step is correct > if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) { > if (new_rd_idx < qsize) { > // Re-read the value because firmware > // might have changed the value > if ((*rd_ptr >> 2) != dwords) > return -EINVAL; > > // now trust dwords > memcpy(pkt, rd_ptr, dwords << 2); > } > } > > But this is all wrong. > > There is no atomicity on the APSS side between the first verification of dwords > size and the mempcpy() the commit log itself shows that the firmware is > free-running with respect to the instruction pipeline of the APSS, it is an AMP > problem. > > Adding another check of the dwords side right before the memcpy() doesn't > address the problem which the commit log describes as the firmware updating the > length field of a header in shared memory. > > There are perhaps 10 assembly instructions between the first check and the > procedure prologue of the memcpy(); > > Adding another length check right before the memcpy() simply reduces the number > of CPU instructions - the effective window that the firmware can update that > header still. > > if ((*rd_ptr >> 2) != dwords) // conditional branch operation > > memcpy(pkt, rd_ptr, dwords << 2); > > Begins with a procedure prologue - setting up the call stack - and then is a > series of fetch/stores to copy data from here to there > > The memcpy() itself by its nature it not atomic on the front-side-bus of the > APSS to shared DRAM with the firmware. > > On a CPU and SoC architecture level this fix just doesn't work. > > To be honest we are already doing the right thing in this routine. > > 1. Reading the value from the packet header. > 2. Validating the given size against the maximum size > 3. Rejecting the memcpy() if the given size _at_the_time_we_read_ is too > large. > > The alternative to guarantee would be something like > > asm("bus master asserts bus lock to PAGE/PAGES involved"); > dwords = *rd_ptr; > if (dwords > MAX_VALUE) > return -EFIRMWARE_BUG; > memcpy(dst, src, dwords >> 2); > > asm("bus master unlocks memory"); > > Lets say we make the change proposed in this patch, here is how it breaks: > > if ((*rd_ptr >> 2) != dwords) > return -EINVAL; > > // now trust dwords > memcpy(pkt, rd_ptr, dwords << 2); > > > objdump > qlt-kernel/build/x1e80100-crd_qlt_integration/drivers/media/platform/qcom/venus/venus-core.o --disassemble=venus_read_queue.isra.0 > > 5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast > 5c4c: 2a0303e2 mov w2, w3 > 5c50: aa0703e0 mov x0, x7 > 5c54: 94000000 bl 0 <memcpy> > 5c58: 52800000 mov w0, #0x0 > > Your conditional jump is @ 0x5c48 your call to memcpy @ 0x5c54 > > Between 0x5c48 and 0x5c54 the firmware can update the value _again_ > Indeed the firmware can update the value up until the time we complete reading > the bit of the pkt header in memcpy() so an additional few instructions for sure. > > You could make some type of argument to re-verify the content of the pkt _after_ > the memcpy() > > But the only verification that makes any sense _before_ the memcpy() is to > verify the length at the point you _read_ - subsequent to the latching operation > - were we fetch the length value from DRAM into our CPU cache, operating stack > and/or local registers. > > Once that data is fetched within the cache/stack/registers of the CPU/APSS that > is the relevant value. > > For the fix you have here to work you need this > > 5c48: MAGICOP memorybuslock > 5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast > 5c4c: 2a0303e2 mov w2, w3 > 5c50: aa0703e0 mov x0, x7 > 5c54: 94000000 bl 0 <memcpy> > 5c58: 52800000 mov w0, #0x0 > 5c5c: MAGICUNOP memorybusunlock > > Because the firmware is free-running - with respect to the instruction pipeline > of the above assembly. > > If you really want to verify the data is still valid - it should be done _after_ > the memcpy(); > > But even then I'd say to you, why verify _after_ the memcpy() - and what happens > on the instruction directly _after_ the verification - is the data considered > more valid now ? the patch _only_ reduces the window where data in shared queue can go wrong. Doing it after memcpy() would be better here given the data would not be read further from shared queue, which would avoid the case of data getting updated later. memcpy(hfi_dev->pkt_buf, rd_ptr from shared queue, dwords..) pkt_hdr = (struct hfi_pkt_hdr *) (hfi_dev->pkt_buf); if ((pkt_hdr->size >> 2) != dwords) return -EINVAL; Regards, Vikash > > i.e. this: > > memcpy(pkt, rd_ptr, dwords << 2); > > if ((*rd_ptr >> 2) != dwords) > return -EINVAL; > > doesn't have the above described architectural race condition but it doesn't > make the data any more trustworthy - because it doesn't have atomicity > > memcpy(pkt, rd_ptr, dwords << 2); > > if ((*rd_ptr >> 2) != dwords) > return -EINVAL; > > dev_info(dev, "The value of *rd_ptr %lu!=%lu can be different now\n", > *rd_ptr >> 2, dwords); > > Sorry this patch just can't work. It's a very hard NAK from me. > > --- > bod
On 15/05/2025 10:56, Vikash Garodia wrote: > memcpy(hfi_dev->pkt_buf, rd_ptr from shared queue, dwords..) > > pkt_hdr = (struct hfi_pkt_hdr *) (hfi_dev->pkt_buf); > > if ((pkt_hdr->size >> 2) != dwords) > return -EINVAL; Yeah it would be better wrt the commit log. But does it really give additional data-confidence - I don't believe it does. => The firmware can update the pkt header after our subsequent-to-memcpy() check. Again this is a data-lifetime expectation question. You validate the received data against a maximum size reading to a buffer you know the size of - and do it once. The firmware might corrupt that data in-between but that is not catastrophic for the APSS which has a buffer of a known size containing potential bad data. Fine - and additional check after the mempcy() only imparts verisimilitude - only validates our data at the time of the check. my-linear-uninterrupted-context: memcpy(); if(*rd_ptr >> 2 > len) <- doesn't branch return -EBAD if(*rd_ptr >> 2 > len) <- does branch firmware went nuts return -EBAD Superficially you might say this addresses the problem if (*rd_ptr > MAX) return -EBAD; memcpy(); if (*rd_ptr > MAX) return -EBAD; But what if the "malicious" firmware only updated the data in the packet, not the length - or another field we are not checking ? As I say if this can happen if (*rd_ptr > MAX) return -EBAD; memcpy(); if (*rd_ptr > MAX) // good return -EBAD; if (*rd_ptr > MAX) //bad return -EBAD; then this can happen if (*rd_ptr > MAX) return -EBAD; memcpy(); if (*rd_ptr > MAX) // good return -EBAD; if (*rd_ptr > MAX) //good return -EBAD; if (*rd_ptr > MAX) //bad return -EBAD; We need to have a crisp and clear definition of the data-lifetime. Since we don't have a checksum element in the header the only check that makes sense is to validate the buffer size data_len = *ptr_val >> 2; if (data_len > max) return BAD; Using the data_len in memcpy if the *ptr_val can change is _NOT_ TOCTOU This is TOCTOU if (*ptr_val > max) return EBAD; memcpy(dst, src, *ptr_val); Because I validated the content of the pointer and then I relied on the data that pointer pointed to, which might have changed. TBH I think the entire premise of this patch is bogus. --- bod
On 5/15/2025 3:58 PM, Bryan O'Donoghue wrote: > On 15/05/2025 10:56, Vikash Garodia wrote: >> memcpy(hfi_dev->pkt_buf, rd_ptr from shared queue, dwords..) >> >> pkt_hdr = (struct hfi_pkt_hdr *) (hfi_dev->pkt_buf); >> >> if ((pkt_hdr->size >> 2) != dwords) >> return -EINVAL; > > Yeah it would be better wrt the commit log. > > But does it really give additional data-confidence - I don't believe it does. > > => The firmware can update the pkt header after our subsequent-to-memcpy() check. How will that matter if the queue is updated after memcpy to local packet ? All processing of data would be from local packet. > > Again this is a data-lifetime expectation question. > > You validate the received data against a maximum size reading to a buffer you > know the size of - and do it once. > > The firmware might corrupt that data in-between but that is not catastrophic for > the APSS which has a buffer of a known size containing potential bad data. There is no way to authenticate the content of payload. We are trying to avoid vulnerability by ensuring OOB does not happen by validating the size _alone_. Do you see rest of the data in payload can lead to any kind of vulnerability ? > > Fine - and additional check after the mempcy() only imparts verisimilitude - > only validates our data at the time of the check. > > my-linear-uninterrupted-context: > > memcpy(); > > if(*rd_ptr >> 2 > len) <- doesn't branch > return -EBAD > > if(*rd_ptr >> 2 > len) <- does branch firmware went nuts > return -EBAD > > Superficially you might say this addresses the problem > > if (*rd_ptr > MAX) > return -EBAD; > > memcpy(); > > if (*rd_ptr > MAX) > return -EBAD; > > But what if the "malicious" firmware only updated the data in the packet, not > the length - or another field we are not checking ? That does not cause any vulnerability. You can check and suggest if you see a vulnerability when the data outside length is an issue w.r.t vulnerability. > > As I say if this can happen > > > if (*rd_ptr > MAX) > return -EBAD; > > memcpy(); > > if (*rd_ptr > MAX) // good > return -EBAD; > > > if (*rd_ptr > MAX) //bad > return -EBAD; > > then this can happen > > if (*rd_ptr > MAX) > return -EBAD; > > memcpy(); > > if (*rd_ptr > MAX) // good > return -EBAD; > > > if (*rd_ptr > MAX) //good > return -EBAD; > > if (*rd_ptr > MAX) //bad > return -EBAD; > > We need to have a crisp and clear definition of the data-lifetime. Since we > don't have a checksum element in the header the only check that makes sense is > to validate the buffer size > > data_len = *ptr_val >> 2; > if (data_len > max) > return BAD; > > Using the data_len in memcpy if the *ptr_val can change is _NOT_ TOCTOU > > This is TOCTOU > > if (*ptr_val > max) > return EBAD; > > memcpy(dst, src, *ptr_val); > > Because I validated the content of the pointer and then I relied on the data > that pointer pointed to, which might have changed. Yes, precisely for that, memcpy() does not rely on ptr_val. The one you are referring as data_len is same as dword. Regards, Vikash > > TBH I think the entire premise of this patch is bogus. > > --- > bod
diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c index b5f2ea8799507f9b83f1529e70061ea89a9cc5c8..163c8d16530bc44a84b2b21076e6189d476fe360 100644 --- a/drivers/media/platform/qcom/venus/hfi_venus.c +++ b/drivers/media/platform/qcom/venus/hfi_venus.c @@ -295,6 +295,9 @@ static int venus_read_queue(struct venus_hfi_device *hdev, new_rd_idx = rd_idx + dwords; if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) { if (new_rd_idx < qsize) { + if ((*rd_ptr >> 2) != dwords) + return -EINVAL; + memcpy(pkt, rd_ptr, dwords << 2); } else { size_t len;