From patchwork Mon Nov 27 16:19:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arnd Bergmann
X-Patchwork-Id: 119741
From: Arnd Bergmann
To: "David S. Miller"
Cc: y2038@lists.linaro.org, Arnd Bergmann, Eric Dumazet, Willem de Bruijn, "Rosen, Rami", Andrey Konovalov, "Reshetova, Elena", Mike Maloney, Sowmini Varadhan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] [net-next] packet: clarify timestamp overflow
Date: Mon, 27 Nov 2017 17:19:24 +0100
Message-Id: <20171127162001.4055813-1-arnd@arndb.de>
X-Mailer: git-send-email 2.9.0
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The memory mapped packet socket data structures in versions 1 through 3 all contain 32-bit second values for the packet timestamps, which makes them suffer from the overflow of time_t in y2038 or y2106 (depending on whether user space interprets the value as signed or unsigned).

The implementation uses the deprecated getnstimeofday() function.
In order to get rid of that, this changes the code to use ktime_get_real_ts64() as a replacement, documenting the nature of the overflow.

As long as the user applications treat the timestamps as unsigned, or only use the difference between timestamps, they are fine, and changing the timestamps to 64-bit would require a more invasive user space API change.

Note: a lot of other APIs suffer from incompatible structures when time_t gets redefined to 64-bit in 32-bit user space, but this one does not.

Signed-off-by: Arnd Bergmann
---
 net/packet/af_packet.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

--
2.9.0

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 737092ca9b4e..7432c6699818 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -439,17 +439,17 @@ static int __packet_get_status(struct packet_sock *po, void *frame) } } -static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec *ts, +static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts, unsigned int flags) { struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); if (shhwtstamps && (flags & SOF_TIMESTAMPING_RAW_HARDWARE) && - ktime_to_timespec_cond(shhwtstamps->hwtstamp, ts)) + ktime_to_timespec64_cond(shhwtstamps->hwtstamp, ts)) return TP_STATUS_TS_RAW_HARDWARE; - if (ktime_to_timespec_cond(skb->tstamp, ts)) + if (ktime_to_timespec64_cond(skb->tstamp, ts)) return TP_STATUS_TS_SOFTWARE; return 0; @@ -459,13 +459,20 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, struct sk_buff *skb) { union tpacket_uhdr h; - struct timespec ts; + struct timespec64 ts; __u32 ts_status; if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp))) return 0; h.raw = frame; + /* + * versions 1 through 3 overflow the timestamps in y2106, since they + * all store the seconds in a 32-bit unsigned integer. 
+ * If we create a version 4, that should have a 64-bit timestamp, + * either 64-bit seconds + 32-bit nanoseconds, or just 64-bit + * nanoseconds. + */ switch (po->tp_version) { case TPACKET_V1: h.h1->tp_sec = ts.tv_sec; @@ -805,8 +812,8 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1, * It shouldn't really happen as we don't close empty * blocks. See prb_retire_rx_blk_timer_expired(). */ - struct timespec ts; - getnstimeofday(&ts); + struct timespec64 ts; + ktime_get_real_ts64(&ts); h1->ts_last_pkt.ts_sec = ts.tv_sec; h1->ts_last_pkt.ts_nsec = ts.tv_nsec; } @@ -836,7 +843,7 @@ static void prb_thaw_queue(struct tpacket_kbdq_core *pkc) static void prb_open_block(struct tpacket_kbdq_core *pkc1, struct tpacket_block_desc *pbd1) { - struct timespec ts; + struct timespec64 ts; struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1; smp_rmb(); @@ -849,7 +856,7 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1, BLOCK_NUM_PKTS(pbd1) = 0; BLOCK_LEN(pbd1) = BLK_PLUS_PRIV(pkc1->blk_sizeof_priv); - getnstimeofday(&ts); + ktime_get_real_ts64(&ts); h1->ts_first_pkt.ts_sec = ts.tv_sec; h1->ts_first_pkt.ts_nsec = ts.tv_nsec; @@ -2184,7 +2191,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, unsigned long status = TP_STATUS_USER; unsigned short macoff, netoff, hdrlen; struct sk_buff *copy_skb = NULL; - struct timespec ts; + struct timespec64 ts; __u32 ts_status; bool is_drop_n_account = false; bool do_vnet = false; @@ -2312,7 +2319,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, skb_copy_bits(skb, 0, h.raw + macoff, snaplen); if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp))) - getnstimeofday(&ts); + ktime_get_real_ts64(&ts); status |= ts_status;

From patchwork Mon Nov 27 16:19:25 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arnd Bergmann
X-Patchwork-Id: 119742
From: Arnd Bergmann
To: "David S. Miller"
Cc: y2038@lists.linaro.org, Arnd Bergmann, Shuah Khan, Willem de Bruijn, Mike Maloney, Eric Dumazet, Kees Cook, "Rosen, Rami", Andrey Konovalov, "Reshetova, Elena", Sowmini Varadhan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH 2/2] [RFC] packet: experimental support for 64-bit timestamps
Date: Mon, 27 Nov 2017 17:19:25 +0100
Message-Id: <20171127162001.4055813-2-arnd@arndb.de>
X-Mailer: git-send-email 2.9.0
In-Reply-To: <20171127162001.4055813-1-arnd@arndb.de>
References: <20171127162001.4055813-1-arnd@arndb.de>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

I tried to figure out what it would take to do a version 4 mmap packet socket interface to completely avoid the y2106 overflow problem.
This is what I came up with, reusing most of the v3 code, except for the parts where we access the timestamps. For kselftest, I'm adding support for testing v4 in addition to v1-v3, but the test currently does not look at the timestamps, so it won't check that the timestamp format actually works as intended, only that I didn't break the parts that worked in the v3 selftest. Overall, this is more of a mess than I expected, so it's probably not worth doing a v4 format just for the timestamp, but the patch can serve as a reference for anyone that needs a new format for other reasons and fixes this along with the other changes. Signed-off-by: Arnd Bergmann --- Untested and rather invasive, so don't apply this part without discussion and testing --- include/uapi/linux/if_packet.h | 24 +++++- net/packet/af_packet.c | 115 ++++++++++++++++++++-------- tools/testing/selftests/net/psock_tpacket.c | 65 +++++++++------- 3 files changed, 142 insertions(+), 62 deletions(-) -- 2.9.0 diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index 67b61d91d89b..c2cf29acdd40 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -177,8 +177,27 @@ struct tpacket3_hdr { __u8 tp_padding[8]; }; +struct tpacket4_hdr { + __u32 tp_next_offset; + __u32 tp_nsec_hi; + __u32 tp_nsec_lo; + __u32 tp_snaplen; + __u32 tp_len; + __u32 tp_status; + __u16 tp_mac; + __u16 tp_net; + /* pkt_hdr variants */ + union { + struct tpacket_hdr_variant1 hv1; + }; + __u8 tp_padding[8]; +}; + struct tpacket_bd_ts { - unsigned int ts_sec; + union { + unsigned int ts_nsec_hi; + unsigned int ts_sec; + }; union { unsigned int ts_usec; unsigned int ts_nsec; @@ -250,7 +269,8 @@ struct tpacket_block_desc { enum tpacket_versions { TPACKET_V1, TPACKET_V2, - TPACKET_V3 + TPACKET_V3, + TPACKET_V4, }; /* diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 7432c6699818..34a07e4a93a5 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -164,6 +164,7 
@@ union tpacket_uhdr { struct tpacket_hdr *h1; struct tpacket2_hdr *h2; struct tpacket3_hdr *h3; + struct tpacket4_hdr *h4; void *raw; }; @@ -200,7 +201,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *, struct packet_sock *, unsigned int status); static int prb_queue_frozen(struct tpacket_kbdq_core *); static void prb_open_block(struct tpacket_kbdq_core *, - struct tpacket_block_desc *); + struct tpacket_block_desc *, struct packet_sock *po); static void prb_retire_rx_blk_timer_expired(struct timer_list *); static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *); static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *); @@ -404,6 +405,7 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status) flush_dcache_page(pgv_to_page(&h.h2->tp_status)); break; case TPACKET_V3: + case TPACKET_V4: h.h3->tp_status = status; flush_dcache_page(pgv_to_page(&h.h3->tp_status)); break; @@ -430,6 +432,7 @@ static int __packet_get_status(struct packet_sock *po, void *frame) flush_dcache_page(pgv_to_page(&h.h2->tp_status)); return h.h2->tp_status; case TPACKET_V3: + case TPACKET_V4: flush_dcache_page(pgv_to_page(&h.h3->tp_status)); return h.h3->tp_status; default: @@ -439,17 +442,17 @@ static int __packet_get_status(struct packet_sock *po, void *frame) } } -static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts, +static __u32 tpacket_get_timestamp(struct sk_buff *skb, s64 *stamp, unsigned int flags) { struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); if (shhwtstamps && (flags & SOF_TIMESTAMPING_RAW_HARDWARE) && - ktime_to_timespec64_cond(shhwtstamps->hwtstamp, ts)) + (*stamp = ktime_to_ns(shhwtstamps->hwtstamp))) return TP_STATUS_TS_RAW_HARDWARE; - if (ktime_to_timespec64_cond(skb->tstamp, ts)) + if ((*stamp = ktime_to_ns(skb->tstamp))) return TP_STATUS_TS_SOFTWARE; return 0; @@ -460,19 +463,15 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void 
*frame, { union tpacket_uhdr h; struct timespec64 ts; + s64 stamp; __u32 ts_status; - if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp))) + if (!(ts_status = tpacket_get_timestamp(skb, &stamp, po->tp_tstamp))) return 0; h.raw = frame; - /* - * versions 1 through 3 overflow the timestamps in y2106, since they - * all store the seconds in a 32-bit unsigned integer. - * If we create a version 4, that should have a 64-bit timestamp, - * either 64-bit seconds + 32-bit nanoseconds, or just 64-bit - * nanoseconds. - */ + + ts = ns_to_timespec64(stamp); switch (po->tp_version) { case TPACKET_V1: h.h1->tp_sec = ts.tv_sec; @@ -486,6 +485,10 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame, h.h3->tp_sec = ts.tv_sec; h.h3->tp_nsec = ts.tv_nsec; break; + case TPACKET_V4: + h.h4->tp_nsec_hi = upper_32_bits(stamp); + h.h4->tp_nsec_lo = lower_32_bits(stamp); + break; default: WARN(1, "TPACKET version not supported.\n"); BUG(); @@ -633,7 +635,7 @@ static void init_prb_bdqc(struct packet_sock *po, p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv); prb_init_ft_ops(p1, req_u); prb_setup_retire_blk_timer(po); - prb_open_block(p1, pbd); + prb_open_block(p1, pbd, po); } /* Do NOT update the last_blk_num first. @@ -730,7 +732,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t) * opening a block thaws the queue,restarts timer * Thawing/timer-refresh is a side effect. 
*/ - prb_open_block(pkc, pbd); + prb_open_block(pkc, pbd, po); goto out; } } @@ -792,30 +794,43 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1, { __u32 status = TP_STATUS_USER | stat; - struct tpacket3_hdr *last_pkt; + struct tpacket3_hdr *last_pkt_v3; + struct tpacket4_hdr *last_pkt_v4; struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1; struct sock *sk = &po->sk; if (po->stats.stats3.tp_drops) status |= TP_STATUS_LOSING; - last_pkt = (struct tpacket3_hdr *)pkc1->prev; - last_pkt->tp_next_offset = 0; + last_pkt_v3 = (struct tpacket3_hdr *)pkc1->prev; + last_pkt_v4 = (struct tpacket4_hdr *)pkc1->prev; + last_pkt_v3->tp_next_offset = 0; /* Get the ts of the last pkt */ if (BLOCK_NUM_PKTS(pbd1)) { - h1->ts_last_pkt.ts_sec = last_pkt->tp_sec; - h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec; + if (po->tp_version == TPACKET_V3) { + h1->ts_last_pkt.ts_sec = last_pkt_v3->tp_sec; + h1->ts_last_pkt.ts_nsec = last_pkt_v3->tp_nsec; + } else { + h1->ts_last_pkt.ts_nsec_hi = last_pkt_v4->tp_nsec_hi; + h1->ts_last_pkt.ts_nsec = last_pkt_v4->tp_nsec_lo; + } } else { /* Ok, we tmo'd - so get the current time. * * It shouldn't really happen as we don't close empty * blocks. See prb_retire_rx_blk_timer_expired(). 
*/ - struct timespec64 ts; - ktime_get_real_ts64(&ts); - h1->ts_last_pkt.ts_sec = ts.tv_sec; - h1->ts_last_pkt.ts_nsec = ts.tv_nsec; + if (po->tp_version == TPACKET_V3) { + struct timespec64 ts; + ktime_get_real_ts64(&ts); + h1->ts_last_pkt.ts_sec = ts.tv_sec; + h1->ts_last_pkt.ts_nsec = ts.tv_nsec; + } else { + u64 ns = ktime_get_real_ns(); + h1->ts_last_pkt.ts_nsec_hi = upper_32_bits(ns); + h1->ts_last_pkt.ts_nsec = lower_32_bits(ns); + } } smp_wmb(); @@ -841,9 +856,8 @@ static void prb_thaw_queue(struct tpacket_kbdq_core *pkc) * */ static void prb_open_block(struct tpacket_kbdq_core *pkc1, - struct tpacket_block_desc *pbd1) + struct tpacket_block_desc *pbd1, struct packet_sock *po) { - struct timespec64 ts; struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1; smp_rmb(); @@ -856,10 +870,19 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1, BLOCK_NUM_PKTS(pbd1) = 0; BLOCK_LEN(pbd1) = BLK_PLUS_PRIV(pkc1->blk_sizeof_priv); - ktime_get_real_ts64(&ts); - h1->ts_first_pkt.ts_sec = ts.tv_sec; - h1->ts_first_pkt.ts_nsec = ts.tv_nsec; + if (po->tp_version == TPACKET_V3) { + struct timespec64 ts; + + ktime_get_real_ts64(&ts); + h1->ts_first_pkt.ts_sec = ts.tv_sec; + h1->ts_first_pkt.ts_nsec = ts.tv_nsec; + } else { + s64 ns = ktime_get_real_ns(); + + h1->ts_first_pkt.ts_nsec_hi = upper_32_bits(ns); + h1->ts_first_pkt.ts_nsec = lower_32_bits(ns); + } pkc1->pkblk_start = (char *)pbd1; pkc1->nxt_offset = pkc1->pkblk_start + BLK_PLUS_PRIV(pkc1->blk_sizeof_priv); @@ -936,7 +959,7 @@ static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc, * open this block and return the offset where the first packet * needs to get stored. */ - prb_open_block(pkc, pbd); + prb_open_block(pkc, pbd, po); return (void *)pkc->nxt_offset; } @@ -1068,7 +1091,7 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po, * opening a block also thaws the queue. * Thawing is a side effect. 
*/ - prb_open_block(pkc, pbd); + prb_open_block(pkc, pbd, po); } } @@ -1113,6 +1136,7 @@ static void *packet_current_rx_frame(struct packet_sock *po, po->rx_ring.head, status); return curr; case TPACKET_V3: + case TPACKET_V4: return __packet_lookup_frame_in_block(po, skb, status, len); default: WARN(1, "TPACKET version not supported\n"); @@ -1171,6 +1195,7 @@ static void packet_increment_rx_head(struct packet_sock *po, case TPACKET_V2: return packet_increment_head(rb); case TPACKET_V3: + case TPACKET_V4: default: WARN(1, "TPACKET version not supported.\n"); BUG(); @@ -1279,7 +1304,7 @@ static int __packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb) return ROOM_NONE; } - if (po->tp_version == TPACKET_V3) { + if (po->tp_version == TPACKET_V3 || po->tp_version == TPACKET_V4) { if (__tpacket_v3_has_room(po, ROOM_POW_OFF)) ret = ROOM_NORMAL; else if (__tpacket_v3_has_room(po, 0)) @@ -2192,6 +2217,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, unsigned short macoff, netoff, hdrlen; struct sk_buff *copy_skb = NULL; struct timespec64 ts; + u64 ns; __u32 ts_status; bool is_drop_n_account = false; bool do_vnet = false; @@ -2318,7 +2344,9 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, skb_copy_bits(skb, 0, h.raw + macoff, snaplen); - if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp))) + if ((ts_status = tpacket_get_timestamp(skb, &ns, po->tp_tstamp))) + ts = ns_to_timespec64(ns); + else ktime_get_real_ts64(&ts); status |= ts_status; @@ -2365,6 +2393,19 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, memset(h.h3->tp_padding, 0, sizeof(h.h3->tp_padding)); hdrlen = sizeof(*h.h3); break; + case TPACKET_V4: + /* identical to v3, except for the timestamp */ + h.h4->tp_status |= status; + h.h4->tp_len = skb->len; + h.h4->tp_snaplen = snaplen; + h.h4->tp_mac = macoff; + h.h4->tp_net = netoff; + ns = timespec64_to_ns(&ts); + h.h4->tp_nsec_hi = upper_32_bits(ns); + h.h4->tp_nsec_lo = 
lower_32_bits(ns); + memset(h.h3->tp_padding, 0, sizeof(h.h3->tp_padding)); + hdrlen = sizeof(*h.h3); + break; default: BUG(); } @@ -2570,6 +2611,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame, ph.raw = frame; switch (po->tp_version) { + case TPACKET_V4: case TPACKET_V3: if (ph.h3->tp_next_offset != 0) { pr_warn_once("variable sized slot not supported"); @@ -2596,6 +2638,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame, off_max = po->tx_ring.frame_size - tp_len; if (po->sk.sk_type == SOCK_DGRAM) { switch (po->tp_version) { + case TPACKET_V4: case TPACKET_V3: off = ph.h3->tp_net; break; @@ -2608,6 +2651,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame, } } else { switch (po->tp_version) { + case TPACKET_V4: case TPACKET_V3: off = ph.h3->tp_mac; break; @@ -3658,6 +3702,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv len = sizeof(req_u.req); break; case TPACKET_V3: + case TPACKET_V4: default: len = sizeof(req_u.req3); break; @@ -3693,6 +3738,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv case TPACKET_V1: case TPACKET_V2: case TPACKET_V3: + case TPACKET_V4: break; default: return -EINVAL; @@ -3868,7 +3914,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, memset(&po->stats, 0, sizeof(po->stats)); spin_unlock_bh(&sk->sk_receive_queue.lock); - if (po->tp_version == TPACKET_V3) { + if (po->tp_version == TPACKET_V3 || po->tp_version == TPACKET_V4) { lv = sizeof(struct tpacket_stats_v3); st.stats3.tp_packets += st.stats3.tp_drops; data = &st.stats3; @@ -3906,6 +3952,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, val = sizeof(struct tpacket2_hdr); break; case TPACKET_V3: + case TPACKET_V4: val = sizeof(struct tpacket3_hdr); break; default: @@ -4250,6 +4297,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, po->tp_hdrlen = TPACKET2_HDRLEN; 
break; case TPACKET_V3: + case TPACKET_V4: po->tp_hdrlen = TPACKET3_HDRLEN; break; } @@ -4284,6 +4332,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, if (unlikely(!pg_vec)) goto out; switch (po->tp_version) { + case TPACKET_V4: case TPACKET_V3: /* Block transmit is not supported yet */ if (!tx_ring) { diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c index 7f6cd9fdacf3..b497632b6a70 100644 --- a/tools/testing/selftests/net/psock_tpacket.c +++ b/tools/testing/selftests/net/psock_tpacket.c @@ -3,7 +3,8 @@ * Author: Daniel Borkmann * Chetan Loke (TPACKET_V3 usage example) * - * A basic test of packet socket's TPACKET_V1/TPACKET_V2/TPACKET_V3 behavior. + * A basic test of packet socket's TPACKET_V1/TPACKET_V2/TPACKET_V3/TPACKET_V4 + * behavior. * * Control: * Test the setup of the TPACKET socket with different patterns that are @@ -19,6 +20,7 @@ * - TPACKET_V1: RX_RING, TX_RING * - TPACKET_V2: RX_RING, TX_RING * - TPACKET_V3: RX_RING + * - TPACKET_V4: RX_RING * * License (GPLv2): * @@ -310,12 +312,12 @@ static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr) __sync_synchronize(); } -static inline int __v3_tx_kernel_ready(struct tpacket3_hdr *hdr) +static inline int __v3_v4_tx_kernel_ready(struct tpacket3_hdr *hdr) { return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING)); } -static inline void __v3_tx_user_ready(struct tpacket3_hdr *hdr) +static inline void __v3_v4_tx_user_ready(struct tpacket3_hdr *hdr) { hdr->tp_status = TP_STATUS_SEND_REQUEST; __sync_synchronize(); @@ -329,7 +331,8 @@ static inline int __tx_kernel_ready(void *base, int version) case TPACKET_V2: return __v2_tx_kernel_ready(base); case TPACKET_V3: - return __v3_tx_kernel_ready(base); + case TPACKET_V4: + return __v3_v4_tx_kernel_ready(base); default: bug_on(1); return 0; @@ -346,7 +349,8 @@ static inline void __tx_user_ready(void *base, int version) __v2_tx_user_ready(base); break; case 
TPACKET_V3: - __v3_tx_user_ready(base); + case TPACKET_V4: + __v3_v4_tx_user_ready(base); break; } } @@ -372,6 +376,7 @@ static inline void *get_next_frame(struct ring *ring, int n) case TPACKET_V2: return ring->rd[n].iov_base; case TPACKET_V3: + case TPACKET_V4: return f0 + (n * ring->req3.tp_frame_size); default: bug_on(1); @@ -454,7 +459,8 @@ static void walk_tx(int sock, struct ring *ring) packet_len); total_bytes += ppd.v2->tp_h.tp_snaplen; break; - case TPACKET_V3: { + case TPACKET_V3: + case TPACKET_V4: { struct tpacket3_hdr *tx = next; tx->tp_snaplen = packet_len; @@ -517,22 +523,22 @@ static void walk_v1_v2(int sock, struct ring *ring) walk_tx(sock, ring); } -static uint64_t __v3_prev_block_seq_num = 0; +static uint64_t __v3_v4_prev_block_seq_num = 0; -void __v3_test_block_seq_num(struct block_desc *pbd) +void __v3_v4_test_block_seq_num(struct block_desc *pbd) { - if (__v3_prev_block_seq_num + 1 != pbd->h1.seq_num) { + if (__v3_v4_prev_block_seq_num + 1 != pbd->h1.seq_num) { fprintf(stderr, "\nprev_block_seq_num:%"PRIu64", expected " "seq:%"PRIu64" != actual seq:%"PRIu64"\n", - __v3_prev_block_seq_num, __v3_prev_block_seq_num + 1, + __v3_v4_prev_block_seq_num, __v3_v4_prev_block_seq_num + 1, (uint64_t) pbd->h1.seq_num); exit(1); } - __v3_prev_block_seq_num = pbd->h1.seq_num; + __v3_v4_prev_block_seq_num = pbd->h1.seq_num; } -static void __v3_test_block_len(struct block_desc *pbd, uint32_t bytes, int block_num) +static void __v3_v4_test_block_len(struct block_desc *pbd, uint32_t bytes, int block_num) { if (pbd->h1.num_pkts && bytes != pbd->h1.blk_len) { fprintf(stderr, "\nblock:%u with %upackets, expected " @@ -542,23 +548,23 @@ static void __v3_test_block_len(struct block_desc *pbd, uint32_t bytes, int bloc } } -static void __v3_test_block_header(struct block_desc *pbd, const int block_num) +static void __v3_v4_test_block_header(struct block_desc *pbd, const int block_num) { if ((pbd->h1.block_status & TP_STATUS_USER) == 0) { fprintf(stderr, "\nblock %u: 
not in TP_STATUS_USER\n", block_num); exit(1); } - __v3_test_block_seq_num(pbd); + __v3_v4_test_block_seq_num(pbd); } -static void __v3_walk_block(struct block_desc *pbd, const int block_num) +static void __v3_v4_walk_block(struct block_desc *pbd, const int block_num) { int num_pkts = pbd->h1.num_pkts, i; unsigned long bytes = 0, bytes_with_padding = ALIGN_8(sizeof(*pbd)); struct tpacket3_hdr *ppd; - __v3_test_block_header(pbd, block_num); + __v3_v4_test_block_header(pbd, block_num); ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd + pbd->h1.offset_to_first_pkt); @@ -580,17 +586,17 @@ static void __v3_walk_block(struct block_desc *pbd, const int block_num) __sync_synchronize(); } - __v3_test_block_len(pbd, bytes_with_padding, block_num); + __v3_v4_test_block_len(pbd, bytes_with_padding, block_num); total_bytes += bytes; } -void __v3_flush_block(struct block_desc *pbd) +void __v3_v4_flush_block(struct block_desc *pbd) { pbd->h1.block_status = TP_STATUS_KERNEL; __sync_synchronize(); } -static void walk_v3_rx(int sock, struct ring *ring) +static void walk_v3_v4_rx(int sock, struct ring *ring) { unsigned int block_num = 0; struct pollfd pfd; @@ -614,8 +620,8 @@ static void walk_v3_rx(int sock, struct ring *ring) while ((pbd->h1.block_status & TP_STATUS_USER) == 0) poll(&pfd, 1, 1); - __v3_walk_block(pbd, block_num); - __v3_flush_block(pbd); + __v3_v4_walk_block(pbd, block_num); + __v3_v4_flush_block(pbd); block_num = (block_num + 1) % ring->rd_num; } @@ -623,7 +629,7 @@ static void walk_v3_rx(int sock, struct ring *ring) pair_udp_close(udp_sock); if (total_packets != 2 * NUM_PACKETS) { - fprintf(stderr, "walk_v3_rx: received %u out of %u pkts\n", + fprintf(stderr, "walk_v3_v4_rx: received %u out of %u pkts\n", total_packets, NUM_PACKETS); exit(1); } @@ -631,10 +637,10 @@ static void walk_v3_rx(int sock, struct ring *ring) fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1); } -static void walk_v3(int sock, struct ring *ring) +static void walk_v3_v4(int 
sock, struct ring *ring) { if (ring->type == PACKET_RX_RING) - walk_v3_rx(sock, ring); + walk_v3_v4_rx(sock, ring); else walk_tx(sock, ring); } @@ -655,7 +661,7 @@ static void __v1_v2_fill(struct ring *ring, unsigned int blocks) ring->flen = ring->req.tp_frame_size; } -static void __v3_fill(struct ring *ring, unsigned int blocks, int type) +static void __v3_v4_fill(struct ring *ring, unsigned int blocks, int type) { if (type == PACKET_RX_RING) { ring->req3.tp_retire_blk_tov = 64; @@ -671,7 +677,7 @@ static void __v3_fill(struct ring *ring, unsigned int blocks, int type) ring->req3.tp_block_nr; ring->mm_len = ring->req3.tp_block_size * ring->req3.tp_block_nr; - ring->walk = walk_v3; + ring->walk = walk_v3_v4; ring->rd_num = ring->req3.tp_block_nr; ring->flen = ring->req3.tp_block_size; } @@ -695,7 +701,8 @@ static void setup_ring(int sock, struct ring *ring, int version, int type) break; case TPACKET_V3: - __v3_fill(ring, blocks, type); + case TPACKET_V4: + __v3_v4_fill(ring, blocks, type); ret = setsockopt(sock, SOL_PACKET, type, &ring->req3, sizeof(ring->req3)); break; @@ -804,6 +811,7 @@ static const char *tpacket_str[] = { [TPACKET_V1] = "TPACKET_V1", [TPACKET_V2] = "TPACKET_V2", [TPACKET_V3] = "TPACKET_V3", + [TPACKET_V4] = "TPACKET_V4", }; static const char *type_str[] = { @@ -854,6 +862,9 @@ int main(void) ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING); ret |= test_tpacket(TPACKET_V3, PACKET_TX_RING); + ret |= test_tpacket(TPACKET_V4, PACKET_RX_RING); + ret |= test_tpacket(TPACKET_V4, PACKET_TX_RING); + if (ret) return 1;
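
For user space reading the ring, the two timestamp layouts discussed in this series could be decoded roughly as follows. This is only a minimal sketch and not part of either patch: the helper names and struct ts64 are hypothetical, the tp_nsec_hi/tp_nsec_lo fields exist only in the RFC tpacket4_hdr above, and the v4 helpers assume a non-negative timestamp.

```c
#include <stdint.h>

/*
 * v1-v3: tp_sec is a 32-bit field. Zero-extending it (instead of
 * sign-extending through a 32-bit time_t) keeps the value usable
 * until y2106 rather than y2038.
 * Hypothetical helper, not an existing API.
 */
static inline int64_t tpacket_sec_unsigned(uint32_t tp_sec)
{
	return (int64_t)tp_sec;
}

/*
 * RFC v4: join the two 32-bit halves into nanoseconds since the epoch.
 * Assumes the kernel stored a non-negative s64 split with
 * upper_32_bits()/lower_32_bits(), as the RFC patch does.
 */
static inline int64_t tpacket4_ns(uint32_t tp_nsec_hi, uint32_t tp_nsec_lo)
{
	return (int64_t)(((uint64_t)tp_nsec_hi << 32) | tp_nsec_lo);
}

/* Hypothetical seconds + nanoseconds pair for illustration. */
struct ts64 {
	int64_t sec;
	int32_t nsec;
};

/* Split the 64-bit nanosecond stamp into seconds and nanoseconds. */
static inline struct ts64 tpacket4_to_ts(uint32_t tp_nsec_hi, uint32_t tp_nsec_lo)
{
	int64_t ns = tpacket4_ns(tp_nsec_hi, tp_nsec_lo);
	struct ts64 ts = { ns / 1000000000, (int32_t)(ns % 1000000000) };

	return ts;
}
```

A v1-v3 reader would pass the header's tp_sec to tpacket_sec_unsigned(); a reader of the experimental v4 format would pass tp_nsec_hi and tp_nsec_lo to tpacket4_to_ts(). A signed 64-bit nanosecond count does not overflow for several centuries, which is the property the RFC relies on.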