[RFC,01/13] futex2: Implement wait and wake functions

Create a new set of futex syscalls known as futex2. This new interface
is aimed to implement a more maintainable code, while removing obsolete
features and expanding it with new functionalities.

Implements wait and wake semantics for futexes, along with the base
infrastructure for future operations. The whole wait path is designed to
be used by N waiters, thus making easier to implement vectorized wait.

* Syscalls implemented by this patch:

- futex_wait(void *uaddr, unsigned int val, unsigned int flags,
	     struct timespec *timo)

   The user thread is put to sleep, waiting for a futex_wake() at uaddr,
   if the value at *uaddr is the same as val (otherwise, the syscall
   returns immediately with -EAGAIN). timo is an optional timeout value
   for the operation.

   Return 0 on success, error code otherwise.

 - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)

   Wake `nr_wake` threads waiting at uaddr.

   Return the number of woken threads on success, error code otherwise.

** The `flag` argument

 The flag is used to specify the size of the futex word
 (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no
 default size.

 By default, the timeout uses a monotonic clock, but can be used as a
 realtime one by using the FUTEX_REALTIME_CLOCK flag.

 By default, futexes are of the private type, that means that this user
 address will be accessed by threads that shares the same memory region.
 This allows for some internal optimizations, so they are faster.
 However, if the address needs to be shared with different processes
 (like using `mmap()` or `shm()`), they need to be defined as shared and
 the flag FUTEX_SHARED_FLAG is used to set that.

 By default, the operation has no NUMA-awareness, meaning that the user
 can't choose the memory node where the kernel side futex data will be
 stored. The user can choose the node where it wants to operate by
 setting the FUTEX_NUMA_FLAG and using the following structure (where X
 can be 8, 16, or 32):

  struct futexX_numa {
          __uX value;
          __sX hint;
  };

 This structure should be passed at the `void *uaddr` of futex
 functions. The address of the structure will be used to be waited/waken
 on, and the `value` will be compared to `val` as usual. The `hint`
 member is used to defined which node the futex will use. When waiting,
 the futex will be registered on a kernel-side table stored on that
 node; when waking, the futex will be searched for on that given table.
 That means that there's no redundancy between tables, and the wrong
 `hint` value will led to undesired behavior.  Userspace is responsible
 for dealing with node migrations issues that may occur. `hint` can
 range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use
 the same node the current process is using.

 When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be
 stored on a global table on some node, defined at compilation time.

** The `timo` argument

As per the Y2038 work done in the kernel, new interfaces shouldn't add
timeout options known to be buggy. Given that, `timo` should be a 64bit
timeout at all platforms, using an absolute timeout value.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
---
 MAINTAINERS                                   |   2 +-
 arch/arm/tools/syscall.tbl                    |   2 +
 arch/arm64/include/asm/unistd.h               |   2 +-
 arch/arm64/include/asm/unistd32.h             |   4 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   2 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   2 +
 include/linux/syscalls.h                      |   7 +
 include/uapi/asm-generic/unistd.h             |   8 +-
 include/uapi/linux/futex.h                    |  56 ++
 init/Kconfig                                  |   7 +
 kernel/Makefile                               |   1 +
 kernel/futex2.c                               | 625 ++++++++++++++++++
 kernel/sys_ni.c                               |   4 +
 tools/include/uapi/asm-generic/unistd.h       |   8 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl    |   2 +
 15 files changed, 728 insertions(+), 4 deletions(-)
 create mode 100644 kernel/futex2.c

Message ID	20210215152404.250281-2-andrealmeid@collabora.com
State	New
Headers	show Return-Path: <linux-kselftest-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A0BCC43381 for <linux-kselftest@archiver.kernel.org>; Mon, 15 Feb 2021 15:25:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4BEF964E8D for <linux-kselftest@archiver.kernel.org>; Mon, 15 Feb 2021 15:25:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230359AbhBOPZb (ORCPT <rfc822;linux-kselftest@archiver.kernel.org>); Mon, 15 Feb 2021 10:25:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230233AbhBOPZP (ORCPT <rfc822;linux-kselftest@vger.kernel.org>); Mon, 15 Feb 2021 10:25:15 -0500 Received: from bhuna.collabora.co.uk (bhuna.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e3e3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F074C0613D6; Mon, 15 Feb 2021 07:24:34 -0800 (PST) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: tonyk) with ESMTPSA id C5A9F1F44DE7 From: =?utf-8?q?Andr=C3=A9_Almeida?= <andrealmeid@collabora.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Darren Hart <dvhart@infradead.org>, linux-kernel@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>, Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: kernel@collabora.com, krisman@collabora.com, pgriffais@valvesoftware.com, z.figura12@gmail.com, joel@joelfernandes.org, malteskarupke@fastmail.fm, linux-api@vger.kernel.org, fweimer@redhat.com, libc-alpha@sourceware.org, linux-kselftest@vger.kernel.org, shuah@kernel.org, acme@kernel.org, corbet@lwn.net, =?utf-8?b?QW5kcsOp?= =?utf-8?q?_Almeida?= <andrealmeid@collabora.com> Subject: [RFC PATCH 01/13] futex2: Implement wait and wake functions Date: Mon, 15 Feb 2021 12:23:52 -0300 Message-Id: <20210215152404.250281-2-andrealmeid@collabora.com> X-Mailer: git-send-email 2.30.1 In-Reply-To: <20210215152404.250281-1-andrealmeid@collabora.com> References: <20210215152404.250281-1-andrealmeid@collabora.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kselftest.vger.kernel.org> X-Mailing-List: linux-kselftest@vger.kernel.org
Series	Add futex2 syscalls \| expand [RFC,00/13] Add futex2 syscalls [RFC,01/13] futex2: Implement wait and wake functions [RFC,02/13] futex2: Add support for shared futexes [RFC,03/13] futex2: Implement vectorized wait [RFC,04/13] futex2: Implement requeue operation [RFC,05/13] futex2: Add compatibility entry point for x86_x32 ABI [RFC,06/13] docs: locking: futex2: Add documentation [RFC,07/13] selftests: futex2: Add wake/wait test [RFC,08/13] selftests: futex2: Add timeout test [RFC,09/13] selftests: futex2: Add wouldblock test [RFC,10/13] selftests: futex2: Add waitv test [RFC,11/13] selftests: futex2: Add requeue test [RFC,12/13] perf bench: Add futex2 benchmark tests [RFC,13/13] kernel: Enable waitpid() for futex2

[RFC,01/13] futex2: Implement wait and wake functions

Commit Message

Comments

Patch