diff mbox series

net: packetmmap: fix only tx timestamp on request

Message ID 1620085579-5646-1-git-send-email-rsanger@wand.net.nz
State Superseded
Headers show
Series net: packetmmap: fix only tx timestamp on request | expand

Commit Message

Richard Sanger May 3, 2021, 11:46 p.m. UTC
The packetmmap tx ring should only return timestamps if requested,
as documented. This allows compatibility with non-timestamp aware
user-space code which checks tp_status == TP_STATUS_AVAILABLE;
not expecting additional timestamp flags to be set.

Signed-off-by: Richard Sanger <rsanger@wand.net.nz>
---
 net/packet/af_packet.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Willem de Bruijn May 4, 2021, 12:36 a.m. UTC | #1
On Mon, May 3, 2021 at 8:04 PM Richard Sanger <rsanger@wand.net.nz> wrote:
>

> The packetmmap tx ring should only return timestamps if requested,

> as documented. This allows compatibility with non-timestamp aware

> user-space code which checks tp_status == TP_STATUS_AVAILABLE;

> not expecting additional timestamp flags to be set.


This is an established interface.

Passing the status goes back to 2013, since commit b9c32fb27170
("packet: if hw/sw ts enabled in rx/tx ring, report which ts we got").

Passing a timestamp itself in tp_sec/tp_usec goes back to before git,
probably to the introduction of the ring.

I don't think we can change this now. That will likely break
applications that have come to expect current behavior.

Is it documented somewhere that the ring works differently? Or are you
referring to the general SO_TIMESTAMPING behavior, which is a separate
timestamp interface.
Richard Sanger May 4, 2021, 1:22 a.m. UTC | #2
Hi Willem,

This is to match up with the documented behaviour; see the timestamping section
at the bottom of
https://www.kernel.org/doc/html/latest/networking/packet_mmap.html

If no call to setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, ...) is made then
the tx path ring should not return timestamps, or timestamp flags set in
tp_status.

As noted in b9c32fb27170
("packet: if hw/sw ts enabled in rx/tx ring, report which ts we got")
this is to retain backwards compatibility with old code.

However, currently, a timestamp can be returned without setting
PACKET_TIMESTAMP, in the case that skb->tstamp includes a timestamp.
I only noticed this recently due to:
aa4e689ed1 (veth: add software timestamping)
which means skb->tstamp now includes a timestamp.

The issue this bug causes for old/non-timestamp aware code is that tp_status
may incorrectly have the TP_STATUS_TS_SOFTWARE flag set, so the documented
check (tp_status == TP_STATUS_AVAILABLE) that a frame in the ring is free fails.
Causing such code to hang infinitely.

This patch corrects the behaviour for the tx path. But, doesn't change the
behaviour on the rx path. The rx path still includes a timestamp (hence
the patch always sets the SOF_TIMESTAMPING_SOFTWARE flag on rx).

Thanks,
Richard


On Tue, May 4, 2021 at 12:36 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Mon, May 3, 2021 at 8:04 PM Richard Sanger <rsanger@wand.net.nz> wrote:

> >

> > The packetmmap tx ring should only return timestamps if requested,

> > as documented. This allows compatibility with non-timestamp aware

> > user-space code which checks tp_status == TP_STATUS_AVAILABLE;

> > not expecting additional timestamp flags to be set.

>

> This is an established interface.

>

> Passing the status goes back to 2013, since commit b9c32fb27170

> ("packet: if hw/sw ts enabled in rx/tx ring, report which ts we got").

>

> Passing a timestamp itself in tp_sec/tp_usec goes back to before git,

> probably to the introduction of the ring.

>

> I don't think we can change this now. That will likely break

> applications that have come to expect current behavior.

>

> Is it documented somewhere that the ring works differently? Or are you

> referring to the general SO_TIMESTAMPING behavior, which is a separate

> timestamp interface.
Willem de Bruijn May 4, 2021, 2:45 p.m. UTC | #3
On Mon, May 3, 2021 at 9:22 PM Richard Sanger <rsanger@wand.net.nz> wrote:
>

> Hi Willem,

>

> This is to match up with the documented behaviour; see the timestamping section

> at the bottom of

> https://www.kernel.org/doc/html/latest/networking/packet_mmap.html

>

> If no call to setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, ...) is made then

> the tx path ring should not return timestamps, or timestamp flags set in

> tp_status.

>

> As noted in b9c32fb27170

> ("packet: if hw/sw ts enabled in rx/tx ring, report which ts we got")

> this is to retain backwards compatibility with old code.

>

> However, currently, a timestamp can be returned without setting

> PACKET_TIMESTAMP, in the case that skb->tstamp includes a timestamp.

> I only noticed this recently due to:

> aa4e689ed1 (veth: add software timestamping)

> which means skb->tstamp now includes a timestamp.

>

> The issue this bug causes for old/non-timestamp aware code is that tp_status

> may incorrectly have the TP_STATUS_TS_SOFTWARE flag set, so the documented

> check (tp_status == TP_STATUS_AVAILABLE) that a frame in the ring is free fails.

> Causing such code to hang infinitely.


Then this would need a

Fixes: b9c32fb27170 ("packet: if hw/sw ts enabled in rx/tx ring,
report which ts we got")

I don't fully follow the commit message in that patch for why enabling
this unconditionally on Tx is safe:

"
   This should not break
    anything for the following reasons: [..]

    ii) in TX ring path, time stamps with PACKET_TIMESTAMP
    socketoption are not available resp. had no effect except that the
    application setting this is buggy. Next to TP_STATUS_AVAILABLE, the
    user also should check for other flags such as TP_STATUS_WRONG_FORMAT
    to reclaim frames to the application. Thus, in case TX ts are turned
    off (default case), nothing happens to the application logic
"

But I think the point is that tx packets are not timestamped unless
skb_shinfo(skb)->tx_flags holds a timestamp request. Such as for
the software timestamps that veth can now generate:

"
static inline void skb_tx_timestamp(struct sk_buff *skb)
{
        skb_clone_tx_timestamp(skb);
        if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
                skb_tstamp_tx(skb, NULL);
}
"

So unless this packet socket has SOF_TIMESTAMPING_TX_SOFTWARE
configured, no timestamps should be recorded for its packets, as tx flag
SKBTX_SW_TSTAMP is not set.

> This patch corrects the behaviour for the tx path. But, doesn't change the

> behaviour on the rx path. The rx path still includes a timestamp (hence

> the patch always sets the SOF_TIMESTAMPING_SOFTWARE flag on rx).


Right, this patch suppresses reporting of any recorded timestamps. But
the system should already be suppressing recording of these
timestamps.

Assuming you discovered this with a real application: does it call
setsockopt SOL_SOCKET/SO_TIMESTAMPING at all?

It's safe to suppress on the reporting side as extra precaution against
spuriously timestamped packets. I just want to understand how these
timestamps are even recorded in the first place.

Small nit wrt the patch: the comment "/* always timestamp; prefer an
existing software timestamp */" states what the code does, but more
interesting would be why.
Richard Sanger May 5, 2021, 4:29 a.m. UTC | #4
On Wed, May 5, 2021 at 2:45 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Mon, May 3, 2021 at 9:22 PM Richard Sanger <rsanger@wand.net.nz> wrote:

> >

> > Hi Willem,

> >

> > This is to match up with the documented behaviour; see the timestamping section

> > at the bottom of

> > https://www.kernel.org/doc/html/latest/networking/packet_mmap.html

[ ... ]
>

> Then this would need a

>

> Fixes: b9c32fb27170 ("packet: if hw/sw ts enabled in rx/tx ring,

> report which ts we got")


ack, I will resubmit the patch with that as the summary line of the commit
message.

> I don't fully follow the commit message in that patch for why enabling

> this unconditionally on Tx is safe:

>

[...]
>

> But I think the point is that tx packets are not timestamped unless

> skb_shinfo(skb)->tx_flags holds a timestamp request. Such as for

> the software timestamps that veth can now generate:

>


I came to the same understanding, tx timestamping should be disabled unless
the code calls setsockopt SOL_SOCKET/SO_TIMESTAMPING.

> "

> static inline void skb_tx_timestamp(struct sk_buff *skb)

> {

>         skb_clone_tx_timestamp(skb);

>         if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)

>                 skb_tstamp_tx(skb, NULL);

> }

> "

>

> So unless this packet socket has SOF_TIMESTAMPING_TX_SOFTWARE

> configured, no timestamps should be recorded for its packets, as tx flag

> SKBTX_SW_TSTAMP is not set.


You are right, that check is working correctly, I'm mistaken on the trigger of
this behaviour. It doesn't appear related to aa4e689ed1
(veth: add software timestamping). In fact, this bug is present in Linux 4.19
the version before that patch was added, and likely earlier versions too.

I've just verified using printk() that after the call to skb_tx_timestamp(skb)
in veth_xmit() skb->tstamp == 0 as expected.

However, when skb_tx_timestamp() is called within the packetmmap code path
skb->tstamp holds a valid time.

> > This patch corrects the behaviour for the tx path. But, doesn't change the

> > behaviour on the rx path. The rx path still includes a timestamp (hence

> > the patch always sets the SOF_TIMESTAMPING_SOFTWARE flag on rx).

>

> Right, this patch suppresses reporting of any recorded timestamps. But

> the system should already be suppressing recording of these

> timestamps.

>

> Assuming you discovered this with a real application: does it call

> setsockopt SOL_SOCKET/SO_TIMESTAMPING at all?

>


Yes, I can confirm my code does not setsockopt SO_TIMESTAMPING
Here is the filtered output of strace

# strace ./test-live -c 1 ring:veth0 2>&1  | grep sock
socket(AF_PACKET, SOCK_RAW, htons(0 /* ETH_P_??? */)) = 3
setsockopt(3, SOL_PACKET, PACKET_VERSION, [1], 4) = 0
setsockopt(3, SOL_PACKET, PACKET_TX_RING, {tp_block_size=1048576,
tp_block_nr=1, tp_frame_size=4096, tp_frame_nr=256}, 16) = 0
socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 4

> It's safe to suppress on the reporting side as extra precaution against

> spuriously timestamped packets. I just want to understand how these

> timestamps are even recorded in the first place.

>


Agreed, if this isn't expected behaviour, how skb->tstamp is getting filled
with a timestamp remains a mystery to me. I'll report back if I find the
source.

> Small nit wrt the patch: the comment "/* always timestamp; prefer an

> existing software timestamp */" states what the code does, but more

> interesting would be why.


Absolutely, I'll replace it with something along the lines of
/* always timestamp; prefer an existing software timestamp taken closer to
   the time of capture */
Richard Sanger May 5, 2021, 5:09 a.m. UTC | #5
> However, when skb_tx_timestamp() is called within the packetmmap code path

> skb->tstamp holds a valid time.


Sorry, I've confused a function name here, meant to say:
However, when ***tpacket_get_timestamp()*** is called within the packetmmap
code path skb->tstamp holds a valid time.
Willem de Bruijn May 6, 2021, 1:23 a.m. UTC | #6
On Wed, May 5, 2021 at 7:42 PM Richard Sanger <rsanger@wand.net.nz> wrote:
>

> On Wed, May 5, 2021 at 2:45 AM Willem de Bruijn

> <willemdebruijn.kernel@gmail.com> wrote:

> >

> > On Mon, May 3, 2021 at 9:22 PM Richard Sanger <rsanger@wand.net.nz> wrote:

> > >

> > > Hi Willem,

> > >

> > > This is to match up with the documented behaviour; see the timestamping section

> > > at the bottom of

> > > https://www.kernel.org/doc/html/latest/networking/packet_mmap.html

> [ ... ]

> >

> > Then this would need a

> >

> > Fixes: b9c32fb27170 ("packet: if hw/sw ts enabled in rx/tx ring,

> > report which ts we got")

>

> ack, I will resubmit the patch with that as the summary line of the commit

> message.


The fixes tag is not the summary line.

> > I don't fully follow the commit message in that patch for why enabling

> > this unconditionally on Tx is safe:

> >

> [...]

> >

> > But I think the point is that tx packets are not timestamped unless

> > skb_shinfo(skb)->tx_flags holds a timestamp request. Such as for

> > the software timestamps that veth can now generate:

> >

>

> I came to the same understanding, tx timestamping should be disabled unless

> the code calls setsockopt SOL_SOCKET/SO_TIMESTAMPING.

>

> > "

> > static inline void skb_tx_timestamp(struct sk_buff *skb)

> > {

> >         skb_clone_tx_timestamp(skb);

> >         if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)

> >                 skb_tstamp_tx(skb, NULL);

> > }

> > "

> >

> > So unless this packet socket has SOF_TIMESTAMPING_TX_SOFTWARE

> > configured, no timestamps should be recorded for its packets, as tx flag

> > SKBTX_SW_TSTAMP is not set.

>

> You are right, that check is working correctly, I'm mistaken on the trigger of

> this behaviour. It doesn't appear related to aa4e689ed1

> (veth: add software timestamping). In fact, this bug is present in Linux 4.19

> the version before that patch was added, and likely earlier versions too.

>

> I've just verified using printk() that after the call to skb_tx_timestamp(skb)

> in veth_xmit() skb->tstamp == 0 as expected.

>

> However, when skb_tx_timestamp() is called within the packetmmap code path

> skb->tstamp holds a valid time.


Interesting. I had expected veth_xmit to trigger skb_orphan, which
calls the destructor.

But this is no longer true as of commit 9c4c325252c5 ("skbuff:
preserve sock reference when scrubbing the skb.").

As a result, I suppose the skb can enter the next namespace and be
timestamped there if receive timestamps are enabled (this is not
per-socket).

One way to verify, if you can easily recompile a kernel, is to add a
WARN_ON_ONCE(1) to tpacket_destruct_skb to see which path led up to
queuing the completion notification.

> > > This patch corrects the behaviour for the tx path. But, doesn't change the

> > > behaviour on the rx path. The rx path still includes a timestamp (hence

> > > the patch always sets the SOF_TIMESTAMPING_SOFTWARE flag on rx).

> >

> > Right, this patch suppresses reporting of any recorded timestamps. But

> > the system should already be suppressing recording of these

> > timestamps.

> >

> > Assuming you discovered this with a real application: does it call

> > setsockopt SOL_SOCKET/SO_TIMESTAMPING at all?

> >

>

> Yes, I can confirm my code does not setsockopt SO_TIMESTAMPING

> Here is the filtered output of strace

>

> # strace ./test-live -c 1 ring:veth0 2>&1  | grep sock

> socket(AF_PACKET, SOCK_RAW, htons(0 /* ETH_P_??? */)) = 3

> setsockopt(3, SOL_PACKET, PACKET_VERSION, [1], 4) = 0

> setsockopt(3, SOL_PACKET, PACKET_TX_RING, {tp_block_size=1048576,

> tp_block_nr=1, tp_frame_size=4096, tp_frame_nr=256}, 16) = 0

> socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 4

>

> > It's safe to suppress on the reporting side as extra precaution against

> > spuriously timestamped packets. I just want to understand how these

> > timestamps are even recorded in the first place.

> >

>

> Agreed, if this isn't expected behaviour, how skb->tstamp is getting filled

> with a timestamp remains a mystery to me. I'll report back if I find the

> source.


I think we need to understand exactly what goes on before we apply a
patch. It might just be papering over the problem otherwise.
Richard Sanger May 11, 2021, 10:11 p.m. UTC | #7
I've had a chance to look into this further and have found where the
timestamp is added. Details are at the end of this message.

On Thu, May 6, 2021 at 1:23 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Wed, May 5, 2021 at 7:42 PM Richard Sanger <rsanger@wand.net.nz> wrote:

[...]
> >

> > I've just verified using printk() that after the call to skb_tx_timestamp(skb)

> > in veth_xmit() skb->tstamp == 0 as expected.

> >

> > However, when skb_tx_timestamp() is called within the packetmmap code path

> > skb->tstamp holds a valid time.

>

> Interesting. I had expected veth_xmit to trigger skb_orphan, which

> calls the destructor.

>

> But this is no longer true as of commit 9c4c325252c5 ("skbuff:

> preserve sock reference when scrubbing the skb.").

>

> As a result, I suppose the skb can enter the next namespace and be

> timestamped there if receive timestamps are enabled (this is not

> per-socket).

>

> One way to verify, if you can easily recompile a kernel, is to add a

> WARN_ON_ONCE(1) to tpacket_destruct_skb to see which path led up to

> queuing the completion notification.

>


Here's the output of putting a WARN_ON_ONCE(1) statement in
tpacket_destruct_skb, I don't believe it is related to the problem.

[   37.249629] RIP: 0010:tpacket_destruct_skb+0x24/0x60
[...]
[   37.249659] Call Trace:
[   37.249661]  <IRQ>
[   37.249666]  skb_release_head_state+0x44/0x90
[   37.249680]  skb_release_all+0x13/0x30
[   37.249684]  kfree_skb+0x2f/0xa0
[   37.249689]  llc_rcv+0x2e/0x360 [llc]
[   37.249698]  __netif_receive_skb_one_core+0x8f/0xa0
[   37.249707]  __netif_receive_skb+0x18/0x60
[   37.249710]  process_backlog+0xa9/0x160
[   37.249714]  __napi_poll+0x31/0x140
[   37.249717]  net_rx_action+0xde/0x210
[   37.249722]  __do_softirq+0xe0/0x29b
[   37.249737]  do_softirq+0x66/0x80
[   37.249747]  </IRQ>
[   37.249748]  __local_bh_enable_ip+0x50/0x60
[   37.249751]  __dev_queue_xmit+0x23a/0x6e0
[   37.249756]  dev_queue_xmit+0x10/0x20
[   37.249759]  packet_sendmsg+0x6b8/0x1c90
[   37.249763]  ? __drain_all_pages+0x150/0x1c0
[   37.249772]  sock_sendmsg+0x65/0x70
[   37.249778]  __sys_sendto+0x113/0x190
[   37.249783]  ? handle_mm_fault+0xda/0x2b0
[   37.249790]  ? exit_to_user_mode_prepare+0x3c/0x1e0
[   37.249800]  ? do_user_addr_fault+0x1d3/0x640
[   37.249805]  __x64_sys_sendto+0x29/0x30
[   37.249809]  do_syscall_64+0x40/0xb0
[   37.249816]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   37.249820] RIP: 0033:0x7f43950d27ea

[...]

> I think we need to understand exactly what goes on before we apply a

> patch. It might just be papering over the problem otherwise.


Okay, so the call path that adds the timestamp looks like this:

send() syscall triggers tpacket_snd() which calls the veth_xmit() hander.
In drivers/net/veth.c veth_xmit() calls veth_forward_skb() which then
calls netif_rx()/netif_rx_internal() in net/core/dev.c.
And finally, net_timestamp_check(netdev_tstamp_prequeue, skb) adds
the timestamp, netdev_tstamp_prequeue defaults to 1.

net_timestamp_check in its current form was added by 588f033075
("net: use jump_label for netstamp_needed ")
In the kernel since 3.3-rc1, so it looks like this issue has been present the
entire time. Pre-conditions are netstamp_needed_key and
netdev_tstamp_prequeue, so if either is false, timestamping won't happen
at this stage in the code.

Here's the call trace of where the timestamp is added

[  251.619538] Call Trace:
[  251.619550]  netif_rx+0x1b/0x60
[  251.619556]  veth_xmit+0x19d/0x230 [veth]
[  251.619563]  netdev_start_xmit+0x4a/0x8b
[  251.619566]  dev_hard_start_xmit.cold+0xc8/0x1d5
[  251.619569]  __dev_queue_xmit.cold+0xa3/0x12c
[  251.619572]  dev_queue_xmit+0x10/0x20
[  251.619575]  packet_sendmsg+0x6b8/0x1c90
[  251.619580]  ? __drain_all_pages+0x150/0x1c0
[  251.619588]  sock_sendmsg+0x65/0x70
[  251.619594]  __sys_sendto+0x113/0x190
[  251.619598]  ? handle_mm_fault+0xda/0x2b0
[  251.619604]  ? exit_to_user_mode_prepare+0x3c/0x1e0
[  251.619611]  ? do_user_addr_fault+0x1d3/0x640
[  251.619615]  __x64_sys_sendto+0x29/0x30
[  251.619618]  do_syscall_64+0x40/0xb0
[  251.619623]  entry_SYSCALL_64_after_hwframe+0x44/0xae

This appears to be reasonable, but I don't know what the expected behaviour
is. Should this timestamp still be cleared before returning the sent skb?
Willem de Bruijn May 11, 2021, 10:56 p.m. UTC | #8
On Tue, May 11, 2021 at 6:11 PM Richard Sanger <rsanger@wand.net.nz> wrote:
>

> I've had a chance to look into this further and have found where the

> timestamp is added. Details are at the end of this message.

>

> On Thu, May 6, 2021 at 1:23 PM Willem de Bruijn

> <willemdebruijn.kernel@gmail.com> wrote:

> >

> > On Wed, May 5, 2021 at 7:42 PM Richard Sanger <rsanger@wand.net.nz> wrote:

> [...]

> > >

> > > I've just verified using printk() that after the call to skb_tx_timestamp(skb)

> > > in veth_xmit() skb->tstamp == 0 as expected.

> > >

> > > However, when skb_tx_timestamp() is called within the packetmmap code path

> > > skb->tstamp holds a valid time.

> >

> > Interesting. I had expected veth_xmit to trigger skb_orphan, which

> > calls the destructor.

> >

> > But this is no longer true as of commit 9c4c325252c5 ("skbuff:

> > preserve sock reference when scrubbing the skb.").

> >

> > As a result, I suppose the skb can enter the next namespace and be

> > timestamped there if receive timestamps are enabled (this is not

> > per-socket).

> >

> > One way to verify, if you can easily recompile a kernel, is to add a

> > WARN_ON_ONCE(1) to tpacket_destruct_skb to see which path led up to

> > queuing the completion notification.

> >

>

> Here's the output of putting a WARN_ON_ONCE(1) statement in

> tpacket_destruct_skb, I don't believe it is related to the problem.


It might be, if this is indeed an llc (ETH_P_802_2) packet that was
inserted with tpacket.

veth calls netif_rx, which enqueues the packet to the percpu queue for
later processing in softirq context using process_backlog.

>

> [   37.249629] RIP: 0010:tpacket_destruct_skb+0x24/0x60

> [...]

> [   37.249659] Call Trace:

> [   37.249661]  <IRQ>

> [   37.249666]  skb_release_head_state+0x44/0x90

> [   37.249680]  skb_release_all+0x13/0x30

> [   37.249684]  kfree_skb+0x2f/0xa0

> [   37.249689]  llc_rcv+0x2e/0x360 [llc]

> [   37.249698]  __netif_receive_skb_one_core+0x8f/0xa0

> [   37.249707]  __netif_receive_skb+0x18/0x60

> [   37.249710]  process_backlog+0xa9/0x160

> [   37.249714]  __napi_poll+0x31/0x140

> [   37.249717]  net_rx_action+0xde/0x210

> [   37.249722]  __do_softirq+0xe0/0x29b

> [   37.249737]  do_softirq+0x66/0x80

> [   37.249747]  </IRQ>

> [   37.249748]  __local_bh_enable_ip+0x50/0x60

> [   37.249751]  __dev_queue_xmit+0x23a/0x6e0

> [   37.249756]  dev_queue_xmit+0x10/0x20

> [   37.249759]  packet_sendmsg+0x6b8/0x1c90

> [   37.249763]  ? __drain_all_pages+0x150/0x1c0

> [   37.249772]  sock_sendmsg+0x65/0x70

> [   37.249778]  __sys_sendto+0x113/0x190

> [   37.249783]  ? handle_mm_fault+0xda/0x2b0

> [   37.249790]  ? exit_to_user_mode_prepare+0x3c/0x1e0

> [   37.249800]  ? do_user_addr_fault+0x1d3/0x640

> [   37.249805]  __x64_sys_sendto+0x29/0x30

> [   37.249809]  do_syscall_64+0x40/0xb0

> [   37.249816]  entry_SYSCALL_64_after_hwframe+0x44/0xae

> [   37.249820] RIP: 0033:0x7f43950d27ea

>

> [...]

>

> > I think we need to understand exactly what goes on before we apply a

> > patch. It might just be papering over the problem otherwise.

>

> Okay, so the call path that adds the timestamp looks like this:

>

> send() syscall triggers tpacket_snd() which calls the veth_xmit() hander.

> In drivers/net/veth.c veth_xmit() calls veth_forward_skb() which then

> calls netif_rx()/netif_rx_internal() in net/core/dev.c.

> And finally, net_timestamp_check(netdev_tstamp_prequeue, skb) adds

> the timestamp, netdev_tstamp_prequeue defaults to 1.

>

> net_timestamp_check in its current form was added by 588f033075

> ("net: use jump_label for netstamp_needed ")

> In the kernel since 3.3-rc1, so it looks like this issue has been present the

> entire time. Pre-conditions are netstamp_needed_key and

> netdev_tstamp_prequeue, so if either is false, timestamping won't happen

> at this stage in the code.


Thanks. That's as I expected.

We cannot suppress the timestamp assignment by net_timestamp_check, as
it may be expected by the receive process.

I don't immediately see a way to clear/ignore the field in
tpacket_destruct_skb if the field was set from a different network
namespace than the one in which the packet socket inserted the packet.

So your suggested patch to suppress spurious tx timestamps if software
timestamping reporting is not enabled SGTM.


> Here's the call trace of where the timestamp is added

>

> [  251.619538] Call Trace:

> [  251.619550]  netif_rx+0x1b/0x60

> [  251.619556]  veth_xmit+0x19d/0x230 [veth]

> [  251.619563]  netdev_start_xmit+0x4a/0x8b

> [  251.619566]  dev_hard_start_xmit.cold+0xc8/0x1d5

> [  251.619569]  __dev_queue_xmit.cold+0xa3/0x12c

> [  251.619572]  dev_queue_xmit+0x10/0x20

> [  251.619575]  packet_sendmsg+0x6b8/0x1c90

> [  251.619580]  ? __drain_all_pages+0x150/0x1c0

> [  251.619588]  sock_sendmsg+0x65/0x70

> [  251.619594]  __sys_sendto+0x113/0x190

> [  251.619598]  ? handle_mm_fault+0xda/0x2b0

> [  251.619604]  ? exit_to_user_mode_prepare+0x3c/0x1e0

> [  251.619611]  ? do_user_addr_fault+0x1d3/0x640

> [  251.619615]  __x64_sys_sendto+0x29/0x30

> [  251.619618]  do_syscall_64+0x40/0xb0

> [  251.619623]  entry_SYSCALL_64_after_hwframe+0x44/0xae

>

> This appears to be reasonable, but I don't know what the expected behaviour

> is. Should this timestamp still be cleared before returning the sent skb?
diff mbox series

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ba96db1..b69805e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -422,7 +422,8 @@  static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts,
 	    ktime_to_timespec64_cond(shhwtstamps->hwtstamp, ts))
 		return TP_STATUS_TS_RAW_HARDWARE;
 
-	if (ktime_to_timespec64_cond(skb->tstamp, ts))
+	if ((flags & SOF_TIMESTAMPING_SOFTWARE) &&
+	    ktime_to_timespec64_cond(skb->tstamp, ts))
 		return TP_STATUS_TS_SOFTWARE;
 
 	return 0;
@@ -2340,7 +2341,10 @@  static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 
 	skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
 
-	if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp)))
+	/* always timestamp; prefer an existing software timestamp */
+	ts_status = tpacket_get_timestamp(skb, &ts,
+					  po->tp_tstamp | SOF_TIMESTAMPING_SOFTWARE);
+	if (!ts_status)
 		ktime_get_real_ts64(&ts);
 
 	status |= ts_status;