From patchwork Thu Jul 20 16:30:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704760 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37471EB64DC for ; Thu, 20 Jul 2023 16:33:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229776AbjGTQdY (ORCPT ); Thu, 20 Jul 2023 12:33:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229990AbjGTQdX (ORCPT ); Thu, 20 Jul 2023 12:33:23 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA53619A1 for ; Thu, 20 Jul 2023 09:31:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870717; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FgooZspr4TQIjuc/tHX/XT4D8n/wo2nwvhkclMi4+iU=; b=jPPiAZqqYxOAXry4nBWbwLOw7CAhzUdoZVDhrintt9G8yRHxlz+JgS/dE4wKnktMJeELJ7 yzdmRyQgc6qMr5aAvYbgqbMtrSjxpk9StOxXc1bAME32XGqrGUgzdglI0r0Fp6d4au7/Y1 pFvdcLslC0JxsH0vXaZwQ/fo0/1/MKE= Received: from mimecast-mx02.redhat.com (66.187.233.73 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-558-fsF-jKQQP5yEcGDp2gQ-Yw-1; Thu, 20 Jul 2023 12:31:53 -0400 X-MC-Unique: fsF-jKQQP5yEcGDp2gQ-Yw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 986C22812951; Thu, 20 Jul 2023 16:31:49 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 6C05640C206F; Thu, 20 Jul 2023 16:31:40 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 01/20] tracing/filters: Dynamically allocate filter_pred.regex Date: Thu, 20 Jul 2023 17:30:37 +0100 Message-Id: <20230720163056.2564824-2-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Every predicate allocation includes a MAX_FILTER_STR_VAL (256) char array in the regex field, even if the predicate function does not use the field. A later commit will introduce a dynamically allocated cpumask to struct filter_pred, which will require a dedicated freeing function. Bite the bullet and make filter_pred.regex dynamically allocated. While at it, reorder the fields of filter_pred to fill in the byte holes. The struct now fits on a single cacheline. No change in behaviour intended. The kfree()'s were patched via Coccinelle: @@ struct filter_pred *pred; @@ -kfree(pred); +free_predicate(pred); Signed-off-by: Valentin Schneider --- kernel/trace/trace_events_filter.c | 64 ++++++++++++++++++------------ 1 file changed, 39 insertions(+), 25 deletions(-) diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 1dad64267878c..91fc9990107f1 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -70,15 +70,15 @@ enum filter_pred_fn { }; struct filter_pred { - enum filter_pred_fn fn_num; - u64 val; - u64 val2; - struct regex regex; + struct regex *regex; unsigned short *ops; struct ftrace_event_field *field; - int offset; + u64 val; + u64 val2; + enum filter_pred_fn fn_num; + int offset; int not; - int op; + int op; }; /* @@ -186,6 +186,14 @@ enum { PROCESS_OR = 4, }; +static void free_predicate(struct filter_pred *pred) +{ + if (pred) { + kfree(pred->regex); + kfree(pred); + } +} + /* * Without going into a formal proof, this explains the method that is used in * parsing the logical expressions. @@ -623,7 +631,7 @@ predicate_parse(const char *str, int nr_parens, int nr_preds, kfree(inverts); if (prog_stack) { for (i = 0; prog_stack[i].pred; i++) - kfree(prog_stack[i].pred); + free_predicate(prog_stack[i].pred); kfree(prog_stack); } return ERR_PTR(ret); @@ -750,7 +758,7 @@ static int filter_pred_string(struct filter_pred *pred, void *event) char *addr = (char *)(event + pred->offset); int cmp, match; - cmp = pred->regex.match(addr, &pred->regex, pred->regex.field_len); + cmp = pred->regex->match(addr, pred->regex, pred->regex->field_len); match = cmp ^ pred->not; @@ -763,7 +771,7 @@ static __always_inline int filter_pchar(struct filter_pred *pred, char *str) int len; len = strlen(str) + 1; /* including tailing '\0' */ - cmp = pred->regex.match(str, &pred->regex, len); + cmp = pred->regex->match(str, pred->regex, len); match = cmp ^ pred->not; @@ -813,7 +821,7 @@ static int filter_pred_strloc(struct filter_pred *pred, void *event) char *addr = (char *)(event + str_loc); int cmp, match; - cmp = pred->regex.match(addr, &pred->regex, str_len); + cmp = pred->regex->match(addr, pred->regex, str_len); match = cmp ^ pred->not; @@ -836,7 +844,7 @@ static int filter_pred_strrelloc(struct filter_pred *pred, void *event) char *addr = (char *)(&item[1]) + str_loc; int cmp, match; - cmp = pred->regex.match(addr, &pred->regex, str_len); + cmp = pred->regex->match(addr, pred->regex, str_len); match = cmp ^ pred->not; @@ -874,7 +882,7 @@ static int filter_pred_comm(struct filter_pred *pred, void *event) { int cmp; - cmp = pred->regex.match(current->comm, &pred->regex, + cmp = pred->regex->match(current->comm, pred->regex, TASK_COMM_LEN); return cmp ^ pred->not; } @@ -1004,7 +1012,7 @@ enum regex_type filter_parse_regex(char *buff, int len, char **search, int *not) static void filter_build_regex(struct filter_pred *pred) { - struct regex *r = &pred->regex; + struct regex *r = pred->regex; char *search; enum regex_type type = MATCH_FULL; @@ -1169,7 +1177,7 @@ static void free_prog(struct event_filter *filter) return; for (i = 0; prog[i].pred; i++) - kfree(prog[i].pred); + free_predicate(prog[i].pred); kfree(prog); } @@ -1553,9 +1561,12 @@ static int parse_pred(const char *str, void *data, goto err_free; } - pred->regex.len = len; - strncpy(pred->regex.pattern, str + s, len); - pred->regex.pattern[len] = 0; + pred->regex = kzalloc(sizeof(*pred->regex), GFP_KERNEL); + if (!pred->regex) + goto err_mem; + pred->regex->len = len; + strncpy(pred->regex->pattern, str + s, len); + pred->regex->pattern[len] = 0; /* This is either a string, or an integer */ } else if (str[i] == '\'' || str[i] == '"') { @@ -1597,9 +1608,12 @@ static int parse_pred(const char *str, void *data, goto err_free; } - pred->regex.len = len; - strncpy(pred->regex.pattern, str + s, len); - pred->regex.pattern[len] = 0; + pred->regex = kzalloc(sizeof(*pred->regex), GFP_KERNEL); + if (!pred->regex) + goto err_mem; + pred->regex->len = len; + strncpy(pred->regex->pattern, str + s, len); + pred->regex->pattern[len] = 0; filter_build_regex(pred); @@ -1608,7 +1622,7 @@ static int parse_pred(const char *str, void *data, } else if (field->filter_type == FILTER_STATIC_STRING) { pred->fn_num = FILTER_PRED_FN_STRING; - pred->regex.field_len = field->size; + pred->regex->field_len = field->size; } else if (field->filter_type == FILTER_DYN_STRING) { pred->fn_num = FILTER_PRED_FN_STRLOC; @@ -1691,10 +1705,10 @@ static int parse_pred(const char *str, void *data, return i; err_free: - kfree(pred); + free_predicate(pred); return -EINVAL; err_mem: - kfree(pred); + free_predicate(pred); return -ENOMEM; } @@ -2287,8 +2301,8 @@ static int ftrace_function_set_filter_pred(struct filter_pred *pred, return ret; return __ftrace_function_set_filter(pred->op == OP_EQ, - pred->regex.pattern, - pred->regex.len, + pred->regex->pattern, + pred->regex->len, data); } From patchwork Thu Jul 20 16:30:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704759 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A80BAEB64DC for ; Thu, 20 Jul 2023 16:33:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231738AbjGTQdx (ORCPT ); Thu, 20 Jul 2023 12:33:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231479AbjGTQda (ORCPT ); Thu, 20 Jul 2023 12:33:30 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E72D171E for ; Thu, 20 Jul 2023 09:32:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870759; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Ru1xpHoly8RVJpDGIBIPQWc2v6Ebq5LepRuPaIIUDyE=; b=XBHVfK5P4B8SxU2qNjsgxERdRcr/YGunHigHs/+dmTqAs8u7cyYB7Y6uYNX/oR8ecMaSVa wXGfg1QANZZQLuJVBHz0uvry7cPixHjlrdpTyJgCeTaWqZvhOV0AaO3baPb7Hz2hJJHc+8 hRZ4qA9ZtmcPJDhnZZHQ4F7zQ4VOnsg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-468-L0PDvKruOe6zKXZsgB5siQ-1; Thu, 20 Jul 2023 12:32:33 -0400 X-MC-Unique: L0PDvKruOe6zKXZsgB5siQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B209B1044590; Thu, 20 Jul 2023 16:32:30 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B4F2E40C206F; Thu, 20 Jul 2023 16:32:16 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 04/20] tracing/filters: Enable filtering the CPU common field by a cpumask Date: Thu, 20 Jul 2023 17:30:40 +0100 Message-Id: <20230720163056.2564824-5-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org The tracing_cpumask lets us specify which CPUs are traced in a buffer instance, but doesn't let us do this on a per-event basis (unless one creates an instance per event). A previous commit added filtering scalar fields by a user-given cpumask, make this work with the CPU common field as well. This enables doing things like $ trace-cmd record -e 'sched_switch' -f 'CPU & CPUS{12-52}' \ -e 'sched_wakeup' -f 'target_cpu & CPUS{12-52}' Signed-off-by: Valentin Schneider --- kernel/trace/trace_events_filter.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 1e14f801685a8..3009d0c61b532 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -68,6 +68,7 @@ enum filter_pred_fn { FILTER_PRED_FN_PCHAR_USER, FILTER_PRED_FN_PCHAR, FILTER_PRED_FN_CPU, + FILTER_PRED_FN_CPU_CPUMASK, FILTER_PRED_FN_CPUMASK, FILTER_PRED_FN_FUNCTION, FILTER_PRED_FN_, @@ -937,6 +938,14 @@ static int filter_pred_cpu(struct filter_pred *pred, void *event) } } +/* Filter predicate for current CPU vs user-provided cpumask */ +static int filter_pred_cpu_cpumask(struct filter_pred *pred, void *event) +{ + int cpu = raw_smp_processor_id(); + + return do_filter_scalar_cpumask(pred->op, cpu, pred->mask); +} + /* Filter predicate for cpumask field vs user-provided cpumask */ static int filter_pred_cpumask(struct filter_pred *pred, void *event) { @@ -1440,6 +1449,8 @@ static int filter_pred_fn_call(struct filter_pred *pred, void *event) return filter_pred_pchar(pred, event); case FILTER_PRED_FN_CPU: return filter_pred_cpu(pred, event); + case FILTER_PRED_FN_CPU_CPUMASK: + return filter_pred_cpu_cpumask(pred, event); case FILTER_PRED_FN_CPUMASK: return filter_pred_cpumask(pred, event); case FILTER_PRED_FN_FUNCTION: @@ -1659,6 +1670,7 @@ static int parse_pred(const char *str, void *data, switch (field->filter_type) { case FILTER_CPUMASK: + case FILTER_CPU: case FILTER_OTHER: break; default: @@ -1714,6 +1726,8 @@ static int parse_pred(const char *str, void *data, i++; if (field->filter_type == FILTER_CPUMASK) { pred->fn_num = FILTER_PRED_FN_CPUMASK; + } else if (field->filter_type == FILTER_CPU) { + pred->fn_num = FILTER_PRED_FN_CPU_CPUMASK; } else { switch (field->size) { case 8: From patchwork Thu Jul 20 16:30:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704758 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4EF31EB64DD for ; Thu, 20 Jul 2023 16:33:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231535AbjGTQd4 (ORCPT ); Thu, 20 Jul 2023 12:33:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231290AbjGTQdp (ORCPT ); Thu, 20 Jul 2023 12:33:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38C1819B0 for ; Thu, 20 Jul 2023 09:32:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870774; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Aip5VpUJA7qveL6maeQFBeNCXZHRxiQ1qBeFLUBOMzY=; b=KOlGoNQ1nMWph6oOFnpFaLMmisN/fkG1cWN6c1/gqm6HD0gxEiqJft2Wyc8vTJwOWYRGAQ 3Emgylf4vg0RtV+B0aFkeP3iVMYG+p5XXEJ3gQ1RTH6Q0Vk5vmqcLaENpSVjgBKb5FywV3 nga/eOvvRB3+cUUlvSzZrG8CkdMMGG4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-660-hS_t_EPEMJWOi3NXCeMnHQ-1; Thu, 20 Jul 2023 12:32:51 -0400 X-MC-Unique: hS_t_EPEMJWOi3NXCeMnHQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 91B94901848; Thu, 20 Jul 2023 16:32:46 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D8DB640C206F; Thu, 20 Jul 2023 16:32:38 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 06/20] tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU Date: Thu, 20 Jul 2023 17:30:42 +0100 Message-Id: <20230720163056.2564824-7-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Steven noted that when the user-provided cpumask contains a single CPU, then the filtering function can use a scalar as input instead of a full-fledged cpumask. When the mask contains a single CPU, directly re-use the unsigned field predicate functions. Transform '&' into '==' beforehand. Suggested-by: Steven Rostedt Signed-off-by: Valentin Schneider --- kernel/trace/trace_events_filter.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 2fe65ddeb34ef..54d642fabb7f1 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -1750,7 +1750,7 @@ static int parse_pred(const char *str, void *data, * then we can treat it as a scalar input. */ single = cpumask_weight(pred->mask) == 1; - if (single && field->filter_type == FILTER_CPUMASK) { + if (single && field->filter_type != FILTER_CPU) { pred->val = cpumask_first(pred->mask); kfree(pred->mask); } @@ -1761,6 +1761,11 @@ static int parse_pred(const char *str, void *data, FILTER_PRED_FN_CPUMASK; } else if (field->filter_type == FILTER_CPU) { pred->fn_num = FILTER_PRED_FN_CPU_CPUMASK; + } else if (single) { + pred->op = pred->op == OP_BAND ? OP_EQ : pred->op; + pred->fn_num = select_comparison_fn(pred->op, field->size, false); + if (pred->op == OP_NE) + pred->not = 1; } else { switch (field->size) { case 8: From patchwork Thu Jul 20 16:30:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704757 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57FE1C0015E for ; Thu, 20 Jul 2023 16:34:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231459AbjGTQeP (ORCPT ); Thu, 20 Jul 2023 12:34:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231743AbjGTQeG (ORCPT ); Thu, 20 Jul 2023 12:34:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9246419B3 for ; Thu, 20 Jul 2023 09:33:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870783; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eoYdZ3j5mn2MTw1iUVYlPJSnwNWgHj3M8L9iz2kPLwU=; b=HydDn72CehzrZvIyJPDBlvtdLiM6MpixhN2qMqtTinigwG/7HU4e0olqgpIn9cuSSQLoGb C5MmBi5fFL+l/9EY0pJ/fkQA5rt04R8VqJnFuIe/daNQRYTS1LLMjJblGUPa7xwlvT4lR0 JpOWIcbZIvhcQu2fXLAfo6Du0t/UFCc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-646-xF48V0OiM2SJcrNgAEdCkA-1; Thu, 20 Jul 2023 12:32:57 -0400 X-MC-Unique: xF48V0OiM2SJcrNgAEdCkA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C1AF3936D28; Thu, 20 Jul 2023 16:32:54 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C1F9740C206F; Thu, 20 Jul 2023 16:32:46 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 07/20] tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU Date: Thu, 20 Jul 2023 17:30:43 +0100 Message-Id: <20230720163056.2564824-8-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Steven noted that when the user-provided cpumask contains a single CPU, then the filtering function can use a scalar as input instead of a full-fledged cpumask. In this case we can directly re-use filter_pred_cpu(), we just need to transform '&' into '==' before executing it. Suggested-by: Steven Rostedt Signed-off-by: Valentin Schneider --- kernel/trace/trace_events_filter.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 54d642fabb7f1..fd72dacc5d1b8 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -1750,7 +1750,7 @@ static int parse_pred(const char *str, void *data, * then we can treat it as a scalar input. */ single = cpumask_weight(pred->mask) == 1; - if (single && field->filter_type != FILTER_CPU) { + if (single) { pred->val = cpumask_first(pred->mask); kfree(pred->mask); } @@ -1760,7 +1760,12 @@ static int parse_pred(const char *str, void *data, FILTER_PRED_FN_CPUMASK_CPU : FILTER_PRED_FN_CPUMASK; } else if (field->filter_type == FILTER_CPU) { - pred->fn_num = FILTER_PRED_FN_CPU_CPUMASK; + if (single) { + pred->op = pred->op == OP_BAND ? OP_EQ : pred->op; + pred->fn_num = FILTER_PRED_FN_CPU; + } else { + pred->fn_num = FILTER_PRED_FN_CPU_CPUMASK; + } } else if (single) { pred->op = pred->op == OP_BAND ? OP_EQ : pred->op; pred->fn_num = select_comparison_fn(pred->op, field->size, false); From patchwork Thu Jul 20 16:30:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704756 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 567F5EB64DC for ; Thu, 20 Jul 2023 16:34:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231944AbjGTQeh (ORCPT ); Thu, 20 Jul 2023 12:34:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230526AbjGTQeS (ORCPT ); Thu, 20 Jul 2023 12:34:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86A2B19A1 for ; Thu, 20 Jul 2023 09:33:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870802; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G5HBHaVT0pfIJAbbI8C24iwVVdenu59ApFnKMQXGGMQ=; b=aV3mV1lHwwlMzR6aaEDCnpOfob9J0NixECFA0gvFzQDMx3YIHU0LuNXgjIzkX5HtZWy2sr vzjjPEbKlu5f6jpqzr2QlKbrHmI+AW0Thf4ynnT0x6LhvUqrfm0RvPgAZxS2nLTaIsBI9N 7RP8+Y5aVWEqLiE/z1OpWPjN8gFpIT8= Received: from mimecast-mx02.redhat.com (66.187.233.73 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-543-hhF8YKTPPJa4Xr8UF_kvXw-1; Thu, 20 Jul 2023 12:33:21 -0400 X-MC-Unique: hhF8YKTPPJa4Xr8UF_kvXw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 679323815EF2; Thu, 20 Jul 2023 16:33:18 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id E3F8940C206F; Thu, 20 Jul 2023 16:33:10 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Peter Zijlstra , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 10/20] jump_label,module: Don't alloc static_key_mod for __ro_after_init keys Date: Thu, 20 Jul 2023 17:30:46 +0100 Message-Id: <20230720163056.2564824-11-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Peter Zijlstra When a static_key is marked ro_after_init, its state will never change (after init), therefore jump_label_update() will never need to iterate the entries, and thus module load won't actually need to track this -- avoiding the static_key::next write. Therefore, mark these keys such that jump_label_add_module() might recognise them and avoid the modification. Use the special state: 'static_key_linked(key) && !static_key_mod(key)' to denote such keys. Link: http://lore.kernel.org/r/20230705204142.GB2813335@hirez.programming.kicks-ass.net NOT-Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Valentin Schneider --- @Peter: I've barely touched this patch, it's just been writing a comment and fixing benign compilation issues, so credit's all yours really! --- include/asm-generic/sections.h | 5 ++++ include/linux/jump_label.h | 1 + init/main.c | 1 + kernel/jump_label.c | 49 ++++++++++++++++++++++++++++++++++ 4 files changed, 56 insertions(+) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index db13bb620f527..c768de6f19a9a 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -180,6 +180,11 @@ static inline bool is_kernel_rodata(unsigned long addr) addr < (unsigned long)__end_rodata; } +static inline bool is_kernel_ro_after_init(unsigned long addr) +{ + return addr >= (unsigned long)__start_ro_after_init && + addr < (unsigned long)__end_ro_after_init; +} /** * is_kernel_inittext - checks if the pointer address is located in the * .init.text section diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index f0a949b7c9733..88ef9e776af8d 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -216,6 +216,7 @@ extern struct jump_entry __start___jump_table[]; extern struct jump_entry __stop___jump_table[]; extern void jump_label_init(void); +extern void jump_label_ro(void); extern void jump_label_lock(void); extern void jump_label_unlock(void); extern void arch_jump_label_transform(struct jump_entry *entry, diff --git a/init/main.c b/init/main.c index ad920fac325c3..cb5304ca18f4d 100644 --- a/init/main.c +++ b/init/main.c @@ -1403,6 +1403,7 @@ static void mark_readonly(void) * insecure pages which are W+X. */ rcu_barrier(); + jump_label_ro(); mark_rodata_ro(); rodata_test(); } else diff --git a/kernel/jump_label.c b/kernel/jump_label.c index d9c822bbffb8d..661ef74dee9b7 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -530,6 +530,45 @@ void __init jump_label_init(void) cpus_read_unlock(); } +static inline bool static_key_sealed(struct static_key *key) +{ + return (key->type & JUMP_TYPE_LINKED) && !(key->type & ~JUMP_TYPE_MASK); +} + +static inline void static_key_seal(struct static_key *key) +{ + unsigned long type = key->type & JUMP_TYPE_TRUE; + key->type = JUMP_TYPE_LINKED | type; +} + +void jump_label_ro(void) +{ + struct jump_entry *iter_start = __start___jump_table; + struct jump_entry *iter_stop = __stop___jump_table; + struct jump_entry *iter; + + if (WARN_ON_ONCE(!static_key_initialized)) + return; + + cpus_read_lock(); + jump_label_lock(); + + for (iter = iter_start; iter < iter_stop; iter++) { + struct static_key *iterk = jump_entry_key(iter); + + if (!is_kernel_ro_after_init((unsigned long)iterk)) + continue; + + if (static_key_sealed(iterk)) + continue; + + static_key_seal(iterk); + } + + jump_label_unlock(); + cpus_read_unlock(); +} + #ifdef CONFIG_MODULES enum jump_label_type jump_label_init_type(struct jump_entry *entry) @@ -650,6 +689,15 @@ static int jump_label_add_module(struct module *mod) static_key_set_entries(key, iter); continue; } + + /* + * If the key was sealed at init, then there's no need to keep a + * a reference to its module entries - just patch them now and + * be done with it. + */ + if (static_key_sealed(key)) + goto do_poke; + jlm = kzalloc(sizeof(struct static_key_mod), GFP_KERNEL); if (!jlm) return -ENOMEM; @@ -675,6 +723,7 @@ static int jump_label_add_module(struct module *mod) static_key_set_linked(key); /* Only update if we've changed from our initial state */ +do_poke: if (jump_label_type(iter) != jump_label_init_type(iter)) __jump_label_update(key, iter, iter_stop, true); } From patchwork Thu Jul 20 16:30:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704755 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1D15EB64DC for ; Thu, 20 Jul 2023 16:35:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230133AbjGTQfb (ORCPT ); Thu, 20 Jul 2023 12:35:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229899AbjGTQfP (ORCPT ); Thu, 20 Jul 2023 12:35:15 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42AB62135 for ; Thu, 20 Jul 2023 09:33:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870829; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w2k5nztFy3WXKlhL7LY6m+LcDW7MzG+nauWvTii+Ef4=; b=A9cIk7hV7QKZoF+6s1x00u4oDJ4UYPgSPYNU1guRhg3Xp205VkA1MdHLzD4qrNZ/Lp33my bnBPEEye3dVSxgc276iTmzvA0mtyK7WSt/EQEiRKmMaVVZMD2LCrRMHRz3xQE0c7IHzomg NV+q9a8Iq9AFi6zrdLR6c6bYgtBhmbk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-77-dQqQ7nvGOe-_KMx6MDuCag-1; Thu, 20 Jul 2023 12:33:43 -0400 X-MC-Unique: dQqQ7nvGOe-_KMx6MDuCag-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F1ADA1044592; Thu, 20 Jul 2023 16:33:33 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F11B740C206F; Thu, 20 Jul 2023 16:33:25 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Josh Poimboeuf , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 12/20] objtool: Warn about non __ro_after_init static key usage in .noinstr Date: Thu, 20 Jul 2023 17:30:48 +0100 Message-Id: <20230720163056.2564824-13-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Later commits will depend on having no runtime-mutable text in early entry code. (ab)use the .noinstr section as a marker of early entry code and warn about static keys used in it that can be flipped at runtime. Suggested-by: Josh Poimboeuf Signed-off-by: Valentin Schneider --- tools/objtool/check.c | 20 ++++++++++++++++++++ tools/objtool/include/objtool/check.h | 1 + tools/objtool/include/objtool/special.h | 2 ++ tools/objtool/special.c | 3 +++ 4 files changed, 26 insertions(+) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index d308330f2910e..d973bb4df4341 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1968,6 +1968,9 @@ static int add_special_section_alts(struct objtool_file *file) alt->next = orig_insn->alts; orig_insn->alts = alt; + if (special_alt->key_sym) + orig_insn->key_sym = special_alt->key_sym; + list_del(&special_alt->list); free(special_alt); } @@ -3476,6 +3479,20 @@ static int validate_return(struct symbol *func, struct instruction *insn, struct return 0; } +static int validate_static_key(struct instruction *insn, struct insn_state *state) +{ + if (state->noinstr && state->instr <= 0) { + if ((strcmp(insn->key_sym->sec->name, ".data..ro_after_init"))) { + WARN_INSN(insn, + "Non __ro_after_init static key \"%s\" in .noinstr section", + insn->key_sym->name); + return 1; + } + } + + return 0; +} + static struct instruction *next_insn_to_validate(struct objtool_file *file, struct instruction *insn) { @@ -3625,6 +3642,9 @@ static int validate_branch(struct objtool_file *file, struct symbol *func, if (handle_insn_ops(insn, next_insn, &state)) return 1; + if (insn->key_sym) + validate_static_key(insn, &state); + switch (insn->type) { case INSN_RETURN: diff --git a/tools/objtool/include/objtool/check.h b/tools/objtool/include/objtool/check.h index daa46f1f0965a..35dd21f8f41e1 100644 --- a/tools/objtool/include/objtool/check.h +++ b/tools/objtool/include/objtool/check.h @@ -77,6 +77,7 @@ struct instruction { struct symbol *sym; struct stack_op *stack_ops; struct cfi_state *cfi; + struct symbol *key_sym; }; static inline struct symbol *insn_func(struct instruction *insn) diff --git a/tools/objtool/include/objtool/special.h b/tools/objtool/include/objtool/special.h index 86d4af9c5aa9d..0e61f34fe3a28 100644 --- a/tools/objtool/include/objtool/special.h +++ b/tools/objtool/include/objtool/special.h @@ -27,6 +27,8 @@ struct special_alt { struct section *new_sec; unsigned long new_off; + struct symbol *key_sym; + unsigned int orig_len, new_len; /* group only */ }; diff --git a/tools/objtool/special.c b/tools/objtool/special.c index 91b1950f5bd8a..1f76cfd815bf3 100644 --- a/tools/objtool/special.c +++ b/tools/objtool/special.c @@ -127,6 +127,9 @@ static int get_alt_entry(struct elf *elf, const struct special_entry *entry, return -1; } alt->key_addend = reloc_addend(key_reloc); + + reloc_to_sec_off(key_reloc, &sec, &offset); + alt->key_sym = find_symbol_by_offset(sec, offset & ~2); } return 0; From patchwork Thu Jul 20 16:30:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704754 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B34D4EB64DA for ; Thu, 20 Jul 2023 16:36:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230128AbjGTQg2 (ORCPT ); Thu, 20 Jul 2023 12:36:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231487AbjGTQgA (ORCPT ); Thu, 20 Jul 2023 12:36:00 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E501B1FC8 for ; Thu, 20 Jul 2023 09:34:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870837; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=izi9UHkrSeZ4AHa49u6aA8W9HneeYs8L2V/Y70LmMpM=; b=Cghh/O4OvUZCiqvL6zqbluVyaOfNYNNNpb7jyNYsDo/srfZoF+4H8NZXLYvW4knwRbKMGf 6yZJnIPqWjWUo2oHckGdTea6ElGWzT6aBjFFk14umfvzak2zviT0dOgkSZ0OevisWhBFxc XQZeJYjeLL9xasWwj4Czyzg2ExLDC3Y= Received: from mimecast-mx02.redhat.com (66.187.233.73 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-267-UUYzhPw5OJ2eJI-qriS1tw-1; Thu, 20 Jul 2023 12:33:54 -0400 X-MC-Unique: UUYzhPw5OJ2eJI-qriS1tw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CEF671C05EB9; Thu, 20 Jul 2023 16:33:50 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id BCF3340C2070; Thu, 20 Jul 2023 16:33:42 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 14/20] x86/kvm: Make kvm_async_pf_enabled __ro_after_init Date: Thu, 20 Jul 2023 17:30:50 +0100 Message-Id: <20230720163056.2564824-15-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org objtool now warns about it: vmlinux.o: warning: objtool: exc_page_fault+0x2a: Non __ro_after_init static key "kvm_async_pf_enabled" in .noinstr section The key can only be enabled (and not disabled) in the __init function kvm_guest_init(), so mark it as __ro_after_init. Signed-off-by: Valentin Schneider --- arch/x86/kernel/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 1cceac5984daa..319460090a836 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -44,7 +44,7 @@ #include #include -DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled); +DEFINE_STATIC_KEY_FALSE_RO(kvm_async_pf_enabled); static int kvmapf = 1; From patchwork Thu Jul 20 16:30:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AABE8EB64DA for ; Thu, 20 Jul 2023 16:40:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231480AbjGTQk3 (ORCPT ); Thu, 20 Jul 2023 12:40:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229864AbjGTQkG (ORCPT ); Thu, 20 Jul 2023 12:40:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DC721FED for ; Thu, 20 Jul 2023 09:38:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689871087; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gPdSzcTTJlOeO2aEdgyECLhQy37txJLWdi6TDmez9kY=; b=LWKi0K/q2ObYXtpQvUhLsqzKoDTeKes4e0GXnlqnYpQMO8NYEmeIrfVOd7bmBuDRriu5GP dGnmn4Qc7SNbRZfSXmNOZ11Fya4QTCNaoIPGnx9qbaULS4N4wFAm0fMTPHRfrUGXN40YN3 NtXC7pNwAPZ7tvbknJRn9U7IyABBIuE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-230-kFK-c7oyNgC289gF1X8KZw-1; Thu, 20 Jul 2023 12:34:11 -0400 X-MC-Unique: kFK-c7oyNgC289gF1X8KZw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4FA07101A54E; Thu, 20 Jul 2023 16:34:08 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 29A9240C206F; Thu, 20 Jul 2023 16:33:59 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: "Paul E . McKenney" , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 16/20] rcu: Make RCU dynticks counter size configurable Date: Thu, 20 Jul 2023 17:30:52 +0100 Message-Id: <20230720163056.2564824-17-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org CONTEXT_TRACKING_WORK reduces the size of the dynticks counter to free up some bits for work deferral. Paul suggested making the actual counter size configurable for rcutorture to poke at, so do that. Make it only configurable under RCU_EXPERT. Previous commits have added build-time checks that ensure a kernel with problematic dynticks counter width can't be built. Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop Suggested-by: Paul E. McKenney Signed-off-by: Valentin Schneider --- include/linux/context_tracking.h | 3 ++- include/linux/context_tracking_state.h | 3 +-- kernel/rcu/Kconfig | 33 ++++++++++++++++++++++++++ 3 files changed, 36 insertions(+), 3 deletions(-) diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 8aee086d0a25f..9c0c622bc27bb 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -12,7 +12,8 @@ #ifdef CONFIG_CONTEXT_TRACKING_WORK static_assert(CONTEXT_WORK_MAX_OFFSET <= CONTEXT_WORK_END + 1 - CONTEXT_WORK_START, - "Not enough bits for CONTEXT_WORK"); + "Not enough bits for CONTEXT_WORK, " + "CONFIG_RCU_DYNTICKS_BITS might be too high"); #endif #ifdef CONFIG_CONTEXT_TRACKING_USER diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 828fcdb801f73..292a0b7c06948 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -58,8 +58,7 @@ enum ctx_state { #define CONTEXT_STATE_START 0 #define CONTEXT_STATE_END (bits_per(CONTEXT_MAX - 1) - 1) -#define RCU_DYNTICKS_BITS (IS_ENABLED(CONFIG_CONTEXT_TRACKING_WORK) ? 16 : 31) -#define RCU_DYNTICKS_START (CT_STATE_SIZE - RCU_DYNTICKS_BITS) +#define RCU_DYNTICKS_START (CT_STATE_SIZE - CONFIG_RCU_DYNTICKS_BITS) #define RCU_DYNTICKS_END (CT_STATE_SIZE - 1) #define RCU_DYNTICKS_IDX BIT(RCU_DYNTICKS_START) diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig index bdd7eadb33d8f..1ff2aab24e964 100644 --- a/kernel/rcu/Kconfig +++ b/kernel/rcu/Kconfig @@ -332,4 +332,37 @@ config RCU_DOUBLE_CHECK_CB_TIME Say Y here if you need tighter callback-limit enforcement. Say N here if you are unsure. +config RCU_DYNTICKS_RANGE_BEGIN + int + depends on !RCU_EXPERT + default 31 if !CONTEXT_TRACKING_WORK + default 16 if CONTEXT_TRACKING_WORK + +config RCU_DYNTICKS_RANGE_BEGIN + int + depends on RCU_EXPERT + default 2 + +config RCU_DYNTICKS_RANGE_END + int + default 31 if !CONTEXT_TRACKING_WORK + default 16 if CONTEXT_TRACKING_WORK + +config RCU_DYNTICKS_BITS_DEFAULT + int + default 31 if !CONTEXT_TRACKING_WORK + default 16 if CONTEXT_TRACKING_WORK + +config RCU_DYNTICKS_BITS + int "Dynticks counter width" if CONTEXT_TRACKING_WORK + range RCU_DYNTICKS_RANGE_BEGIN RCU_DYNTICKS_RANGE_END + default RCU_DYNTICKS_BITS_DEFAULT + help + This option controls the width of the dynticks counter. + + Lower values will make overflows more frequent, which will increase + the likelihood of extending grace-periods. + + Don't touch this unless you are running some tests. + endmenu # "RCU Subsystem" From patchwork Thu Jul 20 16:30:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704752 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06041EB64DA for ; Thu, 20 Jul 2023 16:39:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229703AbjGTQjy (ORCPT ); Thu, 20 Jul 2023 12:39:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229586AbjGTQjs (ORCPT ); Thu, 20 Jul 2023 12:39:48 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 541A035A1 for ; Thu, 20 Jul 2023 09:38:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689871063; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QESi3DN0fC8QnZsNV/N4VXfEg//bCfMhfN1lMbJjDRM=; b=iU8xckKi9lSBBhZFQBostjkXaY8dFSzdE6B56zeYYq4nfwzSMpTkHxj9NeK48CZgBtRXXa KkwR+IvznUgJC4jaE3l6O+PFCZHnacAIsmqy6u8MaGwJbp3XuIqQF9jW5Uv2uEutyKuvPN qPoZya+XOb4GYVQ9XNmh/X0NIiN6gNc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-183-lKQNgwDjNBmDn1pKucKVhA-1; Thu, 20 Jul 2023 12:34:27 -0400 X-MC-Unique: lKQNgwDjNBmDn1pKucKVhA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BF85A936D34; Thu, 20 Jul 2023 16:34:24 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id BFF9140C2070; Thu, 20 Jul 2023 16:34:16 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Peter Zijlstra , Nicolas Saenz Julienne , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 18/20] context_tracking, x86: Defer kernel text patching IPIs Date: Thu, 20 Jul 2023 17:30:54 +0100 Message-Id: <20230720163056.2564824-19-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org text_poke_bp_batch() sends IPIs to all online CPUs to synchronize them vs the newly patched instruction. CPUs that are executing in userspace do not need this synchronization to happen immediately, and this is actually harmful interference for NOHZ_FULL CPUs. As the synchronization IPIs are sent using a blocking call, returning from text_poke_bp_batch() implies all CPUs will observe the patched instruction(s), and this should be preserved even if the IPI is deferred. In other words, to safely defer this synchronization, any kernel instruction leading to the execution of the deferred instruction sync (ct_work_flush()) must *not* be mutable (patchable) at runtime. This means we must pay attention to mutable instructions in the early entry code: - alternatives - static keys - all sorts of probes (kprobes/ftrace/bpf/???) The early entry code leading to ct_work_flush() is noinstr, which gets rid of the probes. Alternatives are safe, because it's boot-time patching (before SMP is even brought up) which is before any IPI deferral can happen. This leaves us with static keys. Any static key used in early entry code should be only forever-enabled at boot time, IOW __ro_after_init (pretty much like alternatives). Objtool is now able to point at static keys that don't respect this, and all static keys used in early entry code have now been verified as behaving like so. Leverage the new context_tracking infrastructure to defer sync_core() IPIs to a target CPU's next kernel entry. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Valentin Schneider --- arch/x86/include/asm/context_tracking_work.h | 6 +++-- arch/x86/include/asm/text-patching.h | 1 + arch/x86/kernel/alternative.c | 24 ++++++++++++++++---- arch/x86/kernel/kprobes/core.c | 4 ++-- arch/x86/kernel/kprobes/opt.c | 4 ++-- arch/x86/kernel/module.c | 2 +- include/linux/context_tracking_work.h | 4 ++-- 7 files changed, 32 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h index 5bc29e6b2ed38..2c66687ce00e2 100644 --- a/arch/x86/include/asm/context_tracking_work.h +++ b/arch/x86/include/asm/context_tracking_work.h @@ -2,11 +2,13 @@ #ifndef _ASM_X86_CONTEXT_TRACKING_WORK_H #define _ASM_X86_CONTEXT_TRACKING_WORK_H +#include + static __always_inline void arch_context_tracking_work(int work) { switch (work) { - case CONTEXT_WORK_n: - // Do work... + case CONTEXT_WORK_SYNC: + sync_core(); break; } } diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 29832c338cdc5..b6939e965e69d 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -43,6 +43,7 @@ extern void text_poke_early(void *addr, const void *opcode, size_t len); */ extern void *text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); +extern void text_poke_sync_deferrable(void); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern void *text_poke_copy(void *addr, const void *opcode, size_t len); extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 72646d75b6ffe..fcce480e1919e 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -1933,9 +1934,24 @@ static void do_sync_core(void *info) sync_core(); } +static bool do_sync_core_defer_cond(int cpu, void *info) +{ + return !ct_set_cpu_work(cpu, CONTEXT_WORK_SYNC); +} + +static void __text_poke_sync(smp_cond_func_t cond_func) +{ + on_each_cpu_cond(cond_func, do_sync_core, NULL, 1); +} + void text_poke_sync(void) { - on_each_cpu(do_sync_core, NULL, 1); + __text_poke_sync(NULL); +} + +void text_poke_sync_deferrable(void) +{ + __text_poke_sync(do_sync_core_defer_cond); } /* @@ -2145,7 +2161,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE); } - text_poke_sync(); + text_poke_sync_deferrable(); /* * Second step: update all but the first byte of the patched range. @@ -2207,7 +2223,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries * not necessary and we'd be safe even without it. But * better safe than sorry (plus there's not only Intel). */ - text_poke_sync(); + text_poke_sync_deferrable(); } /* @@ -2228,7 +2244,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries } if (do_sync) - text_poke_sync(); + text_poke_sync_deferrable(); /* * Remove and wait for refs to be zero. diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index f7f6042eb7e6c..a38c914753397 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -735,7 +735,7 @@ void arch_arm_kprobe(struct kprobe *p) u8 int3 = INT3_INSN_OPCODE; text_poke(p->addr, &int3, 1); - text_poke_sync(); + text_poke_sync_deferrable(); perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1); } @@ -745,7 +745,7 @@ void arch_disarm_kprobe(struct kprobe *p) perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1); text_poke(p->addr, &p->opcode, 1); - text_poke_sync(); + text_poke_sync_deferrable(); } void arch_remove_kprobe(struct kprobe *p) diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 57b0037d0a996..88451a744ceda 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -521,11 +521,11 @@ void arch_unoptimize_kprobe(struct optimized_kprobe *op) JMP32_INSN_SIZE - INT3_INSN_SIZE); text_poke(addr, new, INT3_INSN_SIZE); - text_poke_sync(); + text_poke_sync_deferrable(); text_poke(addr + INT3_INSN_SIZE, new + INT3_INSN_SIZE, JMP32_INSN_SIZE - INT3_INSN_SIZE); - text_poke_sync(); + text_poke_sync_deferrable(); perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE); } diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index b05f62ee2344b..8b4542dc51b6d 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -242,7 +242,7 @@ static int write_relocate_add(Elf64_Shdr *sechdrs, write, apply); if (!early) { - text_poke_sync(); + text_poke_sync_deferrable(); mutex_unlock(&text_mutex); } diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h index fb74db8876dd2..13fc97b395030 100644 --- a/include/linux/context_tracking_work.h +++ b/include/linux/context_tracking_work.h @@ -5,12 +5,12 @@ #include enum { - CONTEXT_WORK_n_OFFSET, + CONTEXT_WORK_SYNC_OFFSET, CONTEXT_WORK_MAX_OFFSET }; enum ct_work { - CONTEXT_WORK_n = BIT(CONTEXT_WORK_n_OFFSET), + CONTEXT_WORK_SYNC = BIT(CONTEXT_WORK_SYNC_OFFSET), CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET) }; From patchwork Thu Jul 20 16:30:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 704753 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E21AFEB64DC for ; Thu, 20 Jul 2023 16:37:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231478AbjGTQhm (ORCPT ); Thu, 20 Jul 2023 12:37:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231360AbjGTQhS (ORCPT ); Thu, 20 Jul 2023 12:37:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0493C30F7 for ; Thu, 20 Jul 2023 09:35:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1689870880; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qlKugqrUieeoCK15s+XaSmSpmpWi+6cYdLHEecxMVFQ=; b=Trf10nGyLqf9QYWmxP99v+vg8S0eS0e5sGyTHMpcvaHyrfzvszLvhPSnZ5L6IgNWbYOg0C OnjUxcrkVdlEexP5SnkHZIwatWV1jio5ZHUBbevLTbZyluNY7s+9cEkagVmSN/hvcuBLG4 q+5MRJ0nxf4gNZ5NHhF5SDaOirzN1pc= Received: from mimecast-mx02.redhat.com (66.187.233.73 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-240-myilGmoaNUKUY_DOur6_gg-1; Thu, 20 Jul 2023 12:34:35 -0400 X-MC-Unique: myilGmoaNUKUY_DOur6_gg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 95D7E3815EEF; Thu, 20 Jul 2023 16:34:32 +0000 (UTC) Received: from vschneid.remote.csb (unknown [10.42.28.48]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 28EAC40C2070; Thu, 20 Jul 2023 16:34:24 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: [RFC PATCH v2 19/20] context_tracking,x86: Add infrastructure to defer kernel TLBI Date: Thu, 20 Jul 2023 17:30:55 +0100 Message-Id: <20230720163056.2564824-20-vschneid@redhat.com> In-Reply-To: <20230720163056.2564824-1-vschneid@redhat.com> References: <20230720163056.2564824-1-vschneid@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Kernel TLB invalidation IPIs are a common source of interference on NOHZ_FULL CPUs. Given NOHZ_FULL CPUs executing in userspace are not accessing any kernel addresses, these invalidations do not need to happen immediately, and can be deferred until the next user->kernel transition. Rather than make __flush_tlb_all() noinstr, add a minimal noinstr variant that doesn't try to leverage INVPCID. FIXME: not fully noinstr compliant XXX: same issue as with ins patching, when do we access data that should be invalidated? Signed-off-by: Valentin Schneider --- arch/x86/include/asm/context_tracking_work.h | 4 ++++ arch/x86/include/asm/tlbflush.h | 1 + arch/x86/mm/tlb.c | 17 +++++++++++++++++ include/linux/context_tracking_state.h | 4 ++++ include/linux/context_tracking_work.h | 2 ++ 5 files changed, 28 insertions(+) diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h index 2c66687ce00e2..9d4f021b5a45b 100644 --- a/arch/x86/include/asm/context_tracking_work.h +++ b/arch/x86/include/asm/context_tracking_work.h @@ -3,6 +3,7 @@ #define _ASM_X86_CONTEXT_TRACKING_WORK_H #include +#include static __always_inline void arch_context_tracking_work(int work) { @@ -10,6 +11,9 @@ static __always_inline void arch_context_tracking_work(int work) case CONTEXT_WORK_SYNC: sync_core(); break; + case CONTEXT_WORK_TLBI: + __flush_tlb_all_noinstr(); + break; } } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 80450e1d5385a..323b971987af7 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -17,6 +17,7 @@ DECLARE_PER_CPU(u64, tlbstate_untag_mask); void __flush_tlb_all(void); +void noinstr __flush_tlb_all_noinstr(void); #define TLB_FLUSH_ALL -1UL #define TLB_GENERATION_INVALID 0 diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 267acf27480af..631df9189ded4 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1237,6 +1237,23 @@ void __flush_tlb_all(void) } EXPORT_SYMBOL_GPL(__flush_tlb_all); +void noinstr __flush_tlb_all_noinstr(void) +{ + /* + * This is for invocation in early entry code that cannot be + * instrumented. A RMW to CR4 works for most cases, but relies on + * being able to flip either of the PGE or PCIDE bits. Flipping CR4.PCID + * would require also resetting CR3.PCID, so just try with CR4.PGE, else + * do the CR3 write. + * + * TODO: paravirt + */ + if (cpu_feature_enabled(X86_FEATURE_PGE)) + __native_tlb_flush_global(this_cpu_read(cpu_tlbstate.cr4)); + else + flush_tlb_local(); +} + void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { struct flush_tlb_info *info; diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 292a0b7c06948..3571c62cbb9cd 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -62,6 +62,10 @@ enum ctx_state { #define RCU_DYNTICKS_END (CT_STATE_SIZE - 1) #define RCU_DYNTICKS_IDX BIT(RCU_DYNTICKS_START) +/* + * When CONFIG_CONTEXT_TRACKING_WORK=n, _END is 1 behind _START, which makes + * the CONTEXT_WORK size computation below 0, which is what we want! + */ #define CONTEXT_WORK_START (CONTEXT_STATE_END + 1) #define CONTEXT_WORK_END (RCU_DYNTICKS_START - 1) diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h index 13fc97b395030..47d5ced39a43a 100644 --- a/include/linux/context_tracking_work.h +++ b/include/linux/context_tracking_work.h @@ -6,11 +6,13 @@ enum { CONTEXT_WORK_SYNC_OFFSET, + CONTEXT_WORK_TLBI_OFFSET, CONTEXT_WORK_MAX_OFFSET }; enum ct_work { CONTEXT_WORK_SYNC = BIT(CONTEXT_WORK_SYNC_OFFSET), + CONTEXT_WORK_TLBI = BIT(CONTEXT_WORK_TLBI_OFFSET), CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET) };