From patchwork Tue Nov 28 20:32:42 2017
X-Patchwork-Submitter: Arnd Bergmann
X-Patchwork-Id: 119901
From: Arnd Bergmann
To: "David S. Miller", Willem de Bruijn
Cc: Björn Töpel, Richard Cochran, Arnd Bergmann, Thomas Gleixner,
 Mike Maloney, Eric Dumazet, Kees Cook, Hans Liljestrand,
 Andrey Konovalov, "Rosen, Rami", "Reshetova, Elena", Sowmini Varadhan,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] [RFC v3] packet: experimental support for 64-bit timestamps
Date: Tue, 28 Nov 2017 21:32:42 +0100
Message-Id: <20171128203346.1582725-1-arnd@arndb.de>

As I noticed in my previous patch to remove the 'timespec' usage in
the packet socket, the timestamps in the packet socket are slightly
inefficient as they convert a nanosecond value into seconds/nanoseconds
or seconds/microseconds.

This adds two new socket options for the timestamp to resolve that:

PACKET_SKIPTIMESTAMP sets a flag to indicate whether to generate
timestamps at all. When this is set, all timestamps are hardcoded to
zero, which saves a few cycles for the conversion and the access of
the hardware clocksource. The idea was taken from pktgen, which has an
F_NO_TIMESTAMP option for the same purpose.

PACKET_TIMESTAMP_NS64 changes the interpretation of the time stamp fields:
instead of having 32 bits for seconds plus 32 bits for nanoseconds or
microseconds, we now always send down 64 bits worth of nanoseconds when
this flag is set.

Link: https://patchwork.kernel.org/patch/10077199/
Suggested-by: Willem de Bruijn
Signed-off-by: Arnd Bergmann
---
I still have not done any runtime testing on this patch, only
implemented the suggestions from the previous versions.

While I don't think anyone is actively looking for this feature, I don't
think there are any reasons left against merging it either, and it might
come in handy for someone.
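
As a rough illustration of how userspace might opt in to the two options
described above, here is a minimal sketch. It rests on assumptions: the
option numbers 23 and 24 are the ones proposed in this RFC and are not in
released uapi headers, and the helper name packet_set_ts_option() is made
up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Values proposed by this RFC; assumed here, not in released headers. */
#ifndef PACKET_SKIPTIMESTAMP
#define PACKET_SKIPTIMESTAMP	23
#endif
#ifndef PACKET_TIMESTAMP_NS64
#define PACKET_TIMESTAMP_NS64	24
#endif
#ifndef SOL_PACKET
#define SOL_PACKET		263
#endif

/*
 * Hypothetical helper: switch one of the two boolean timestamp options
 * on or off for an AF_PACKET socket. Returns 0 on success, or -1 with
 * errno set. A kernel without this patch is expected to reject the
 * option with ENOPROTOOPT.
 */
static int packet_set_ts_option(int fd, int optname, int enable)
{
	int val = !!enable;

	/* Only the two options added by this RFC are handled here. */
	assert(optname == PACKET_SKIPTIMESTAMP ||
	       optname == PACKET_TIMESTAMP_NS64);

	return setsockopt(fd, SOL_PACKET, optname, &val, sizeof(val));
}
```

A capture program would call packet_set_ts_option(fd, PACKET_TIMESTAMP_NS64, 1)
on its packet socket and fall back to the seconds/nanoseconds layout if the
call fails.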
---
 include/uapi/linux/if_packet.h |   2 +
 net/packet/af_packet.c         | 159 +++++++++++++++++++++++++++++------------
 net/packet/internal.h          |   2 +
 3 files changed, 116 insertions(+), 47 deletions(-)

-- 
2.9.0

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index 67b61d91d89b..2eba54770e6b 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -57,6 +57,8 @@ struct sockaddr_ll {
 #define PACKET_QDISC_BYPASS		20
 #define PACKET_ROLLOVER_STATS		21
 #define PACKET_FANOUT_DATA		22
+#define PACKET_SKIPTIMESTAMP		23
+#define PACKET_TIMESTAMP_NS64		24

 #define PACKET_FANOUT_HASH		0
 #define PACKET_FANOUT_LB		1
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7432c6699818..ed6291b564a9 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -200,7 +200,7 @@ static void prb_retire_current_block(struct tpacket_kbdq_core *,
 		struct packet_sock *, unsigned int status);
 static int prb_queue_frozen(struct tpacket_kbdq_core *);
 static void prb_open_block(struct tpacket_kbdq_core *,
-		struct tpacket_block_desc *);
+		struct tpacket_block_desc *, struct packet_sock *);
 static void prb_retire_rx_blk_timer_expired(struct timer_list *);
 static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
 static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
@@ -439,52 +439,92 @@ static int __packet_get_status(struct packet_sock *po, void *frame)
 	}
 }

-static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec64 *ts,
+static __u32 tpacket_get_timestamp(struct sk_buff *skb, __u32 *hi, __u32 *lo,
 				   unsigned int flags)
 {
+	struct packet_sock *po = pkt_sk(skb->sk);
 	struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
+	ktime_t stamp;
+	u32 type;
+
+	if (po->tp_skiptstamp)
+		return 0;

 	if (shhwtstamps &&
-	    (flags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
-	    ktime_to_timespec64_cond(shhwtstamps->hwtstamp, ts))
-		return TP_STATUS_TS_RAW_HARDWARE;
+	    (po->tp_tstamp & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+	    shhwtstamps->hwtstamp) {
+		stamp = shhwtstamps->hwtstamp;
+		type = TP_STATUS_TS_RAW_HARDWARE;
+	} else if (skb->tstamp) {
+		stamp = skb->tstamp;
+		type = TP_STATUS_TS_SOFTWARE;
+	} else {
+		return 0;
+	}

-	if (ktime_to_timespec64_cond(skb->tstamp, ts))
-		return TP_STATUS_TS_SOFTWARE;
+	if (po->tp_tstamp_ns64) {
+		__u64 ns = ktime_to_ns(stamp);

-	return 0;
+		*hi = upper_32_bits(ns);
+		*lo = lower_32_bits(ns);
+	} else {
+		struct timespec64 ts = ktime_to_timespec64(stamp);
+
+		*hi = ts.tv_sec;
+		if (po->tp_version == TPACKET_V1)
+			*lo = ts.tv_nsec / NSEC_PER_USEC;
+		else
+			*lo = ts.tv_nsec;
+	}
+
+	return type;
+}
+
+static void packet_get_time(struct packet_sock *po, __u32 *hi, __u32 *lo)
+{
+	if (po->tp_skiptstamp) {
+		*hi = 0;
+		*lo = 0;
+	} else if (po->tp_tstamp_ns64) {
+		__u64 ns = ktime_get_real_ns();
+
+		*hi = upper_32_bits(ns);
+		*lo = lower_32_bits(ns);
+	} else {
+		struct timespec64 ts;
+
+		ktime_get_real_ts64(&ts);
+		/* unsigned seconds overflow in y2106 here */
+		*hi = ts.tv_sec;
+		if (po->tp_version == TPACKET_V1)
+			*lo = ts.tv_nsec / NSEC_PER_USEC;
+		else
+			*lo = ts.tv_nsec;
+	}
 }

 static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
 				    struct sk_buff *skb)
 {
 	union tpacket_uhdr h;
-	struct timespec64 ts;
-	__u32 ts_status;
+	__u32 ts_status, hi, lo;

-	if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp)))
+	if (!(ts_status = tpacket_get_timestamp(skb, &hi, &lo, po->tp_tstamp)))
 		return 0;

 	h.raw = frame;
-	/*
-	 * versions 1 through 3 overflow the timestamps in y2106, since they
-	 * all store the seconds in a 32-bit unsigned integer.
-	 * If we create a version 4, that should have a 64-bit timestamp,
-	 * either 64-bit seconds + 32-bit nanoseconds, or just 64-bit
-	 * nanoseconds.
-	 */
 	switch (po->tp_version) {
 	case TPACKET_V1:
-		h.h1->tp_sec = ts.tv_sec;
-		h.h1->tp_usec = ts.tv_nsec / NSEC_PER_USEC;
+		h.h1->tp_sec = hi;
+		h.h1->tp_usec = lo;
 		break;
 	case TPACKET_V2:
-		h.h2->tp_sec = ts.tv_sec;
-		h.h2->tp_nsec = ts.tv_nsec;
+		h.h2->tp_sec = hi;
+		h.h2->tp_nsec = lo;
 		break;
 	case TPACKET_V3:
-		h.h3->tp_sec = ts.tv_sec;
-		h.h3->tp_nsec = ts.tv_nsec;
+		h.h3->tp_sec = hi;
+		h.h3->tp_nsec = lo;
 		break;
 	default:
 		WARN(1, "TPACKET version not supported.\n");
@@ -633,7 +673,7 @@ static void init_prb_bdqc(struct packet_sock *po,
 	p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
 	prb_init_ft_ops(p1, req_u);
 	prb_setup_retire_blk_timer(po);
-	prb_open_block(p1, pbd);
+	prb_open_block(p1, pbd, po);
 }

 /* Do NOT update the last_blk_num first.
@@ -730,7 +770,7 @@ static void prb_retire_rx_blk_timer_expired(struct timer_list *t)
 				* opening a block thaws the queue,restarts timer
 				* Thawing/timer-refresh is a side effect.
 				*/
-				prb_open_block(pkc, pbd);
+				prb_open_block(pkc, pbd, po);
 				goto out;
 			}
 		}
@@ -812,10 +852,8 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
 		 * It shouldn't really happen as we don't close empty
 		 * blocks. See prb_retire_rx_blk_timer_expired().
 		 */
-		struct timespec64 ts;
-		ktime_get_real_ts64(&ts);
-		h1->ts_last_pkt.ts_sec = ts.tv_sec;
-		h1->ts_last_pkt.ts_nsec = ts.tv_nsec;
+		packet_get_time(po, &h1->ts_last_pkt.ts_sec,
+				&h1->ts_last_pkt.ts_nsec);
 	}

 	smp_wmb();
@@ -841,9 +879,8 @@ static void prb_thaw_queue(struct tpacket_kbdq_core *pkc)
  *
  */
 static void prb_open_block(struct tpacket_kbdq_core *pkc1,
-	struct tpacket_block_desc *pbd1)
+	struct tpacket_block_desc *pbd1, struct packet_sock *po)
 {
-	struct timespec64 ts;
 	struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;

 	smp_rmb();
@@ -856,10 +893,8 @@ static void prb_open_block(struct tpacket_kbdq_core *pkc1,
 	BLOCK_NUM_PKTS(pbd1) = 0;
 	BLOCK_LEN(pbd1) = BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);

-	ktime_get_real_ts64(&ts);
-
-	h1->ts_first_pkt.ts_sec = ts.tv_sec;
-	h1->ts_first_pkt.ts_nsec = ts.tv_nsec;
+	packet_get_time(po, &h1->ts_first_pkt.ts_sec,
+			&h1->ts_first_pkt.ts_nsec);

 	pkc1->pkblk_start = (char *)pbd1;
 	pkc1->nxt_offset = pkc1->pkblk_start + BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
@@ -936,7 +971,7 @@ static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
 	 * open this block and return the offset where the first packet
 	 * needs to get stored.
 	 */
-	prb_open_block(pkc, pbd);
+	prb_open_block(pkc, pbd, po);
 	return (void *)pkc->nxt_offset;
 }

@@ -1068,7 +1103,7 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
 			 * opening a block also thaws the queue.
 			 * Thawing is a side effect.
 			 */
-			prb_open_block(pkc, pbd);
+			prb_open_block(pkc, pbd, po);
 		}
 	}

@@ -2191,8 +2226,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	unsigned long status = TP_STATUS_USER;
 	unsigned short macoff, netoff, hdrlen;
 	struct sk_buff *copy_skb = NULL;
-	struct timespec64 ts;
 	__u32 ts_status;
+	__u32 hi, lo;
 	bool is_drop_n_account = false;
 	bool do_vnet = false;

@@ -2318,8 +2353,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,

 	skb_copy_bits(skb, 0, h.raw + macoff, snaplen);

-	if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp)))
-		ktime_get_real_ts64(&ts);
+	if (!(ts_status = tpacket_get_timestamp(skb, &hi, &lo, po->tp_tstamp)))
+		packet_get_time(po, &hi, &lo);

 	status |= ts_status;

@@ -2329,8 +2364,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		h.h1->tp_snaplen = snaplen;
 		h.h1->tp_mac = macoff;
 		h.h1->tp_net = netoff;
-		h.h1->tp_sec = ts.tv_sec;
-		h.h1->tp_usec = ts.tv_nsec / NSEC_PER_USEC;
+		h.h1->tp_sec = hi;
+		h.h1->tp_usec = lo;
 		hdrlen = sizeof(*h.h1);
 		break;
 	case TPACKET_V2:
@@ -2338,8 +2373,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		h.h2->tp_snaplen = snaplen;
 		h.h2->tp_mac = macoff;
 		h.h2->tp_net = netoff;
-		h.h2->tp_sec = ts.tv_sec;
-		h.h2->tp_nsec = ts.tv_nsec;
+		h.h2->tp_sec = hi;
+		h.h2->tp_nsec = lo;
 		if (skb_vlan_tag_present(skb)) {
 			h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
 			h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
@@ -2360,8 +2395,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		h.h3->tp_snaplen = snaplen;
 		h.h3->tp_mac = macoff;
 		h.h3->tp_net = netoff;
-		h.h3->tp_sec = ts.tv_sec;
-		h.h3->tp_nsec = ts.tv_nsec;
+		h.h3->tp_sec = hi;
+		h.h3->tp_nsec = lo;
 		memset(h.h3->tp_padding, 0, sizeof(h.h3->tp_padding));
 		hdrlen = sizeof(*h.h3);
 		break;
@@ -3792,6 +3827,30 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 		po->tp_tstamp = val;
 		return 0;
 	}
+	case PACKET_SKIPTIMESTAMP:
+	{
+		int val;
+
+		if (optlen != sizeof(val))
+			return -EINVAL;
+		if (copy_from_user(&val, optval, sizeof(val)))
+			return -EFAULT;
+
+		po->tp_skiptstamp = val;
+		return 0;
+	}
+	case PACKET_TIMESTAMP_NS64:
+	{
+		int val;
+
+		if (optlen != sizeof(val))
+			return -EINVAL;
+		if (copy_from_user(&val, optval, sizeof(val)))
+			return -EFAULT;
+
+		po->tp_tstamp_ns64 = val;
+		return 0;
+	}
 	case PACKET_FANOUT:
 	{
 		int val;
@@ -3921,6 +3980,12 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	case PACKET_TIMESTAMP:
 		val = po->tp_tstamp;
 		break;
+	case PACKET_SKIPTIMESTAMP:
+		val = po->tp_skiptstamp;
+		break;
+	case PACKET_TIMESTAMP_NS64:
+		val = po->tp_tstamp_ns64;
+		break;
 	case PACKET_FANOUT:
 		val = (po->fanout ?
 		       ((u32)po->fanout->id |
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 562fbc155006..20b69512210f 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -128,6 +128,8 @@ struct packet_sock {
 	unsigned int		tp_reserve;
 	unsigned int		tp_loss:1;
 	unsigned int		tp_tx_has_off:1;
+	unsigned int		tp_skiptstamp:1;
+	unsigned int		tp_tstamp_ns64:1;
 	unsigned int		tp_tstamp;
 	struct net_device __rcu	*cached_dev;
 	int			(*xmit)(struct sk_buff *skb);
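
For readers of the ring, the following is a minimal userspace sketch of
how the two header fields could be read back when PACKET_TIMESTAMP_NS64 is
set. It assumes, as tpacket_get_timestamp() in the diff above does, that
the upper 32 bits of the nanosecond count are stored in tp_sec and the
lower 32 bits in tp_nsec (tp_usec for TPACKET_V1). The helper names are
made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: rebuild the 64-bit nanosecond value from the two
 * 32-bit header fields (hi = tp_sec, lo = tp_nsec or tp_usec).
 */
static inline uint64_t tpacket_ns64_join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical helper: split nanoseconds since the epoch into a
 * struct timespec for code that still wants seconds/nanoseconds.
 */
static inline struct timespec tpacket_ns64_to_timespec(uint64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / 1000000000ULL);
	ts.tv_nsec = (long)(ns % 1000000000ULL);
	/* The remainder of a division by 1e9 is always a valid tv_nsec. */
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	return ts;
}
```

With these, a V2 or V3 consumer would compute
tpacket_ns64_join(hdr->tp_sec, hdr->tp_nsec) and only convert to a
struct timespec where an existing interface needs one.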