mbox series

[v3,0/2] venus driver fixes for vulnerabilities due to unexpected firmware payload

Message ID 20250514-venus-fixes-v3-0-32298566011f@quicinc.com
Headers show
Series venus driver fixes for vulnerabilities due to unexpected firmware payload | expand

Message

Dikshita Agarwal May 14, 2025, 1:38 p.m. UTC
This series primarily adds check at relevant places in venus driver
where there are possible OOB accesses due to unexpected payload
from venus firmware. The patches describes the specific OOB possibility.

Changes in v3:
- Add check for validating the size instead of forcefully updating it (Bryan)
- Reduce duplication of code while handling sequence change event (Vikash)
- Update the inst->error for failure case instead of slienly breaking (Bryan)
- Link to v2: https://lore.kernel.org/lkml/20250215-venus-security-fixes-v2-0-cfc7e4b87168@quicinc.com/

Changes in v2:
- Decompose sequence change event function. 
- Fix repopulating the packet .with the first read during read_queue.
- Link to v1: https://lore.kernel.org/r/20250104-venus-security-fixes-v1-0-9d0dd4594cb4@quicinc.com

Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com>
---
Vedang Nagar (2):
      media: venus: fix TOCTOU vulnerability when reading packets from shared memory
      media: venus: Fix OOB read due to missing payload bound check

 drivers/media/platform/qcom/venus/hfi_msgs.c  | 83 +++++++++++++++++++--------
 drivers/media/platform/qcom/venus/hfi_venus.c |  3 +
 2 files changed, 61 insertions(+), 25 deletions(-)
---
base-commit: b64b134942c8cf4801ea288b3fd38b509aedec21
change-id: 20250514-venus-fixes-8d93bccd9b9d

Best regards,

Comments

Bryan O'Donoghue May 15, 2025, 10:28 a.m. UTC | #1
On 15/05/2025 10:56, Vikash Garodia wrote:
> memcpy(hfi_dev->pkt_buf, rd_ptr from shared queue, dwords..)
> 
> pkt_hdr = (struct hfi_pkt_hdr *) (hfi_dev->pkt_buf);
> 
> if ((pkt_hdr->size >> 2) != dwords)
>      return -EINVAL;

Yeah it would be better wrt the commit log.

But does it really give additional data-confidence - I don't believe it 
does.

=> The firmware can update the pkt header after our 
subsequent-to-memcpy() check.

Again this is a data-lifetime expectation question.

You validate the received data against a maximum size reading to a 
buffer you know the size of - and do it once.

The firmware might corrupt that data in-between but that is not 
catastrophic for the APSS which has a buffer of a known size containing 
potential bad data.

Fine - and additional check after the mempcy() only imparts 
verisimilitude - only validates our data at the time of the check.

my-linear-uninterrupted-context:

memcpy();

if(*rd_ptr >> 2 > len) <- doesn't branch
     return -EBAD

if(*rd_ptr >> 2 > len) <- does branch firmware went nuts
     return -EBAD

Superficially you might say this addresses the problem

if (*rd_ptr > MAX)
     return -EBAD;

memcpy();

if (*rd_ptr > MAX)
     return -EBAD;

But what if the "malicious" firmware only updated the data in the 
packet, not the length - or another field we are not checking ?

As I say if this can happen


if (*rd_ptr > MAX)
     return -EBAD;

memcpy();

if (*rd_ptr > MAX)  // good
     return -EBAD;


if (*rd_ptr > MAX) //bad
     return -EBAD;

then this can happen

if (*rd_ptr > MAX)
     return -EBAD;

memcpy();

if (*rd_ptr > MAX) // good
     return -EBAD;


if (*rd_ptr > MAX) //good
     return -EBAD;

if (*rd_ptr > MAX) //bad
     return -EBAD;

We need to have a crisp and clear definition of the data-lifetime. Since 
we don't have a checksum element in the header the only check that makes 
sense is to validate the buffer size

data_len = *ptr_val >> 2;
if (data_len > max)
     return BAD;

Using the data_len in memcpy if the *ptr_val can change is _NOT_ TOCTOU

This is TOCTOU

if (*ptr_val > max)
     return EBAD;

memcpy(dst, src, *ptr_val);

Because I validated the content of the pointer and then I relied on the 
data that pointer pointed to, which might have changed.

TBH I think the entire premise of this patch is bogus.

---
bod
Vikash Garodia May 15, 2025, 12:11 p.m. UTC | #2
On 5/15/2025 3:58 PM, Bryan O'Donoghue wrote:
> On 15/05/2025 10:56, Vikash Garodia wrote:
>> memcpy(hfi_dev->pkt_buf, rd_ptr from shared queue, dwords..)
>>
>> pkt_hdr = (struct hfi_pkt_hdr *) (hfi_dev->pkt_buf);
>>
>> if ((pkt_hdr->size >> 2) != dwords)
>>      return -EINVAL;
> 
> Yeah it would be better wrt the commit log.
> 
> But does it really give additional data-confidence - I don't believe it does.
> 
> => The firmware can update the pkt header after our subsequent-to-memcpy() check.
How will that matter if the queue is updated after memcpy to local packet ? All
processing of data would be from local packet.

> 
> Again this is a data-lifetime expectation question.
> 
> You validate the received data against a maximum size reading to a buffer you
> know the size of - and do it once.
> 
> The firmware might corrupt that data in-between but that is not catastrophic for
> the APSS which has a buffer of a known size containing potential bad data.
There is no way to authenticate the content of payload. We are trying to avoid
vulnerability by ensuring OOB does not happen by validating the size _alone_. Do
you see rest of the data in payload can lead to any kind of vulnerability ?

> 
> Fine - and additional check after the mempcy() only imparts verisimilitude -
> only validates our data at the time of the check.
> 
> my-linear-uninterrupted-context:
> 
> memcpy();
> 
> if(*rd_ptr >> 2 > len) <- doesn't branch
>     return -EBAD
> 
> if(*rd_ptr >> 2 > len) <- does branch firmware went nuts
>     return -EBAD
> 
> Superficially you might say this addresses the problem
> 
> if (*rd_ptr > MAX)
>     return -EBAD;
> 
> memcpy();
> 
> if (*rd_ptr > MAX)
>     return -EBAD;
> 
> But what if the "malicious" firmware only updated the data in the packet, not
> the length - or another field we are not checking ?
That does not cause any vulnerability. You can check and suggest if you see a
vulnerability when the data outside length is an issue w.r.t vulnerability.

> 
> As I say if this can happen
> 
> 
> if (*rd_ptr > MAX)
>     return -EBAD;
> 
> memcpy();
> 
> if (*rd_ptr > MAX)  // good
>     return -EBAD;
> 
> 
> if (*rd_ptr > MAX) //bad
>     return -EBAD;
> 
> then this can happen
> 
> if (*rd_ptr > MAX)
>     return -EBAD;
> 
> memcpy();
> 
> if (*rd_ptr > MAX) // good
>     return -EBAD;
> 
> 
> if (*rd_ptr > MAX) //good
>     return -EBAD;
> 
> if (*rd_ptr > MAX) //bad
>     return -EBAD;
> 
> We need to have a crisp and clear definition of the data-lifetime. Since we
> don't have a checksum element in the header the only check that makes sense is
> to validate the buffer size
> 
> data_len = *ptr_val >> 2;
> if (data_len > max)
>     return BAD;
> 
> Using the data_len in memcpy if the *ptr_val can change is _NOT_ TOCTOU
> 
> This is TOCTOU
> 
> if (*ptr_val > max)
>     return EBAD;
> 
> memcpy(dst, src, *ptr_val);
> 
> Because I validated the content of the pointer and then I relied on the data
> that pointer pointed to, which might have changed.
Yes, precisely for that, memcpy() does not rely on ptr_val. The one you are
referring as data_len is same as dword.

Regards,
Vikash
> 
> TBH I think the entire premise of this patch is bogus.
> 
> ---
> bod
Bryan O'Donoghue May 15, 2025, 12:47 p.m. UTC | #3
On 15/05/2025 13:11, Vikash Garodia wrote:
>> But what if the "malicious" firmware only updated the data in the packet, not
>> the length - or another field we are not checking ?
> That does not cause any vulnerability. You can check and suggest if you see a
> vulnerability when the data outside length is an issue w.r.t vulnerability.

I don't believe you have identified a vulnerability here.

You read a length field, you check that length field against a MAX size.

Re-reading to see if the firmware wrote new bad data to the transmitted 
packet in-memory is not a fix before or after the memcpy() because the 
time you do that re-read is not fixed - locked wrt the freerunning firmware.

---
bod
Vikash Garodia May 15, 2025, 1:23 p.m. UTC | #4
On 5/15/2025 6:17 PM, Bryan O'Donoghue wrote:
> On 15/05/2025 13:11, Vikash Garodia wrote:
>>> But what if the "malicious" firmware only updated the data in the packet, not
>>> the length - or another field we are not checking ?
>> That does not cause any vulnerability. You can check and suggest if you see a
>> vulnerability when the data outside length is an issue w.r.t vulnerability.
> 
> I don't believe you have identified a vulnerability here.
> 
> You read a length field, you check that length field against a MAX size.
> 
> Re-reading to see if the firmware wrote new bad data to the transmitted packet
> in-memory is not a fix before or after the memcpy() because the time you do that
> re-read is not fixed - locked wrt the freerunning firmware.
It would be more meaningful if you can suggest the vulnerability you see with
the changes suggested i.e adding the check in local packet against the size read
from shared queue. Based on that we can see how to fix it, otherwise this
discussion in not leading to any conclusion.

Regards,
Vikash
Bryan O'Donoghue May 15, 2025, 5:51 p.m. UTC | #5
On 15/05/2025 14:23, Vikash Garodia wrote:
>> Re-reading to see if the firmware wrote new bad data to the transmitted packet
>> in-memory is not a fix before or after the memcpy() because the time you do that
>> re-read is not fixed - locked wrt the freerunning firmware.
> It would be more meaningful if you can suggest the vulnerability you see with
> the changes suggested i.e adding the check in local packet against the size read
> from shared queue. Based on that we can see how to fix it, otherwise this
> discussion in not leading to any conclusion.

So to re-iterate.

TOCTOU is this

if (*ptr_val >> 2 >= MAX)
     return -EBAD;

memcpy(dst, src, *ptr_val >> 2);

Here a malicious actor can change *ptr_val between our check and our use.

not

data_value = *ptr_val >> 2;

if (data_value >= MAX)
     return -EBAD;

memcpy(dst, src, data_value);

Here the taking a copy of the value and subsequently relying on that 
value mitigates TOCTOU, because the value *ptr_val is latched - read 
into a local variable which cannot be manipulated from an outside agent 
i.e. venus firmware.

The example in the commit log is not a TOCTOU for that reason.

Adding an additional check _after_ the memcpy() seems silly to me because

data_value = *ptr_val >> 2;

if (data_value >= MAX)
     return -EBAD;

memcpy(dst, src, data_value);

// This statement could be false
if (data_value != *ptr_value >> 2)
     return -EBAD;

// while this subsequent statement is true
if (data_value != *ptr_value >> 2)
     return -EBAD;

And in any case this is a post-use sanity check not a mitigation for 
TOCTOU which we don't have.

---
bod
Vikash Garodia May 15, 2025, 6:25 p.m. UTC | #6
On 5/15/2025 11:21 PM, Bryan O'Donoghue wrote:
> On 15/05/2025 14:23, Vikash Garodia wrote:
>>> Re-reading to see if the firmware wrote new bad data to the transmitted packet
>>> in-memory is not a fix before or after the memcpy() because the time you do that
>>> re-read is not fixed - locked wrt the freerunning firmware.
>> It would be more meaningful if you can suggest the vulnerability you see with
>> the changes suggested i.e adding the check in local packet against the size read
>> from shared queue. Based on that we can see how to fix it, otherwise this
>> discussion in not leading to any conclusion.
> 
> So to re-iterate.
> 
> TOCTOU is this
> 
> if (*ptr_val >> 2 >= MAX)
>     return -EBAD;
> 
> memcpy(dst, src, *ptr_val >> 2);
> 
> Here a malicious actor can change *ptr_val between our check and our use.
> 
> not
> 
> data_value = *ptr_val >> 2;
> 
> if (data_value >= MAX)
>     return -EBAD;
> 
> memcpy(dst, src, data_value);
> 
> Here the taking a copy of the value and subsequently relying on that value
> mitigates TOCTOU, because the value *ptr_val is latched - read into a local
> variable which cannot be manipulated from an outside agent i.e. venus firmware.
> 
> The example in the commit log is not a TOCTOU for that reason.
> 
> Adding an additional check _after_ the memcpy() seems silly to me because
> 
> data_value = *ptr_val >> 2;
> 
> if (data_value >= MAX)
>     return -EBAD;
> 
> memcpy(dst, src, data_value);
> 
> // This statement could be false
> if (data_value != *ptr_value >> 2)
>     return -EBAD;
> 
> // while this subsequent statement is true
> if (data_value != *ptr_value >> 2)
>     return -EBAD;
> 
Check the pseudo code which i proposed earlier in this conversation [1]. It does
not rely on ptr_val at all to check the sanity after memcpy.

[1] https://lore.kernel.org/all/0c50c24a-35fa-acfb-a807-b4ed5394506b@quicinc.com/
> And in any case this is a post-use sanity check not a mitigation for TOCTOU
> which we don't have.
> 
> ---
> bod