diff mbox series

[v1,bpf-next,06/11] bpf: Introduce two attach types for BPF_PROG_TYPE_SK_REUSEPORT.

Message ID 20201201144418.35045-7-kuniyu@amazon.co.jp
State New
Headers show
Series Socket migration for SO_REUSEPORT. | expand

Commit Message

Kuniyuki Iwashima Dec. 1, 2020, 2:44 p.m. UTC
This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
check if the attached eBPF program is capable of migrating sockets.

When the eBPF program is attached, the kernel runs it for socket migration
only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
The kernel will change the behaviour depending on the returned value:

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fall back to the random selection
  - SK_DROP, cancel the migration

Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 include/uapi/linux/bpf.h       | 2 ++
 kernel/bpf/syscall.c           | 8 ++++++++
 tools/include/uapi/linux/bpf.h | 2 ++
 3 files changed, 12 insertions(+)

Comments

Martin KaFai Lau Dec. 2, 2020, 7:19 p.m. UTC | #1
On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:
> On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> >
> > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> > check if the attached eBPF program is capable of migrating sockets.
> >
> > When the eBPF program is attached, the kernel runs it for socket migration
> > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> > The kernel will change the behaviour depending on the returned value:
> >
> >   - SK_PASS with selected_sk, select it as a new listener
> >   - SK_PASS with selected_sk NULL, fall back to the random selection
> >   - SK_DROP, cancel the migration
> >
> > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > ---
> >  include/uapi/linux/bpf.h       | 2 ++
> >  kernel/bpf/syscall.c           | 8 ++++++++
> >  tools/include/uapi/linux/bpf.h | 2 ++
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 85278deff439..cfc207ae7782 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> >         BPF_XDP_CPUMAP,
> >         BPF_SK_LOOKUP,
> >         BPF_XDP,
> > +       BPF_SK_REUSEPORT_SELECT,
> > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index f3fe9f53f93c..a0796a8de5ea 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> >                 if (expected_attach_type == BPF_SK_LOOKUP)
> >                         return 0;
> >                 return -EINVAL;
> > +       case BPF_PROG_TYPE_SK_REUSEPORT:
> > +               switch (expected_attach_type) {
> > +               case BPF_SK_REUSEPORT_SELECT:
> > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> > +                       return 0;
> > +               default:
> > +                       return -EINVAL;
> > +               }
> 
> this is a kernel regression, previously expected_attach_type wasn't
> enforced, so user-space could have provided any number without an
> error.
I also think this change alone will break things like when the usual
attr->expected_attach_type == 0 case.  At least changes is needed in
bpf_prog_load_fixup_attach_type() which is also handling a
similar situation for BPF_PROG_TYPE_CGROUP_SOCK.

I now think there is no need to expose new bpf_attach_type to the UAPI.
Since the prog->expected_attach_type is not used, it can be cleared at load time
and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined
internally at filter.[c|h]) in the is_valid_access() when "migration"
is accessed.  When "migration" is accessed, the bpf prog can handle
migration (and the original not-migration) case.

> 
> >         case BPF_PROG_TYPE_EXT:
> >                 if (expected_attach_type)
> >                         return -EINVAL;
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 85278deff439..cfc207ae7782 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> >         BPF_XDP_CPUMAP,
> >         BPF_SK_LOOKUP,
> >         BPF_XDP,
> > +       BPF_SK_REUSEPORT_SELECT,
> > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > --
> > 2.17.2 (Apple Git-113)
> >
Martin KaFai Lau Dec. 3, 2020, 4:24 a.m. UTC | #2
On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:
> On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:

> > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:

> > >

> > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to

> > > check if the attached eBPF program is capable of migrating sockets.

> > >

> > > When the eBPF program is attached, the kernel runs it for socket migration

> > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.

> > > The kernel will change the behaviour depending on the returned value:

> > >

> > >   - SK_PASS with selected_sk, select it as a new listener

> > >   - SK_PASS with selected_sk NULL, fall back to the random selection

> > >   - SK_DROP, cancel the migration

> > >

> > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/

> > > Suggested-by: Martin KaFai Lau <kafai@fb.com>

> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

> > > ---

> > >  include/uapi/linux/bpf.h       | 2 ++

> > >  kernel/bpf/syscall.c           | 8 ++++++++

> > >  tools/include/uapi/linux/bpf.h | 2 ++

> > >  3 files changed, 12 insertions(+)

> > >

> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> > > index 85278deff439..cfc207ae7782 100644

> > > --- a/include/uapi/linux/bpf.h

> > > +++ b/include/uapi/linux/bpf.h

> > > @@ -241,6 +241,8 @@ enum bpf_attach_type {

> > >         BPF_XDP_CPUMAP,

> > >         BPF_SK_LOOKUP,

> > >         BPF_XDP,

> > > +       BPF_SK_REUSEPORT_SELECT,

> > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,

> > >         __MAX_BPF_ATTACH_TYPE

> > >  };

> > >

> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c

> > > index f3fe9f53f93c..a0796a8de5ea 100644

> > > --- a/kernel/bpf/syscall.c

> > > +++ b/kernel/bpf/syscall.c

> > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,

> > >                 if (expected_attach_type == BPF_SK_LOOKUP)

> > >                         return 0;

> > >                 return -EINVAL;

> > > +       case BPF_PROG_TYPE_SK_REUSEPORT:

> > > +               switch (expected_attach_type) {

> > > +               case BPF_SK_REUSEPORT_SELECT:

> > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:

> > > +                       return 0;

> > > +               default:

> > > +                       return -EINVAL;

> > > +               }

> > 

> > this is a kernel regression, previously expected_attach_type wasn't

> > enforced, so user-space could have provided any number without an

> > error.

> I also think this change alone will break things like when the usual

> attr->expected_attach_type == 0 case.  At least changes is needed in

> bpf_prog_load_fixup_attach_type() which is also handling a

> similar situation for BPF_PROG_TYPE_CGROUP_SOCK.

> 

> I now think there is no need to expose new bpf_attach_type to the UAPI.

> Since the prog->expected_attach_type is not used, it can be cleared at load time

> and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined

> internally at filter.[c|h]) in the is_valid_access() when "migration"

> is accessed.  When "migration" is accessed, the bpf prog can handle

> migration (and the original not-migration) case.

Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.
I think there will be cases that bpf prog wants to do both
without accessing any field from sk_reuseport_md.

Lets go back to the discussion on using a similar
idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().
I am not aware there is loader setting a random number
in expected_attach_type, so the chance of breaking
is very low.  There was a similar discussion earlier [0].

[0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/

> 

> > 

> > >         case BPF_PROG_TYPE_EXT:

> > >                 if (expected_attach_type)

> > >                         return -EINVAL;

> > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h

> > > index 85278deff439..cfc207ae7782 100644

> > > --- a/tools/include/uapi/linux/bpf.h

> > > +++ b/tools/include/uapi/linux/bpf.h

> > > @@ -241,6 +241,8 @@ enum bpf_attach_type {

> > >         BPF_XDP_CPUMAP,

> > >         BPF_SK_LOOKUP,

> > >         BPF_XDP,

> > > +       BPF_SK_REUSEPORT_SELECT,

> > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,

> > >         __MAX_BPF_ATTACH_TYPE

> > >  };

> > >

> > > --

> > > 2.17.2 (Apple Git-113)

> > >
Kuniyuki Iwashima Dec. 3, 2020, 2:16 p.m. UTC | #3
From:   Martin KaFai Lau <kafai@fb.com>

Date:   Wed, 2 Dec 2020 20:24:02 -0800
> On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:

> > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:

> > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:

> > > >

> > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to

> > > > check if the attached eBPF program is capable of migrating sockets.

> > > >

> > > > When the eBPF program is attached, the kernel runs it for socket migration

> > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.

> > > > The kernel will change the behaviour depending on the returned value:

> > > >

> > > >   - SK_PASS with selected_sk, select it as a new listener

> > > >   - SK_PASS with selected_sk NULL, fall back to the random selection

> > > >   - SK_DROP, cancel the migration

> > > >

> > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/

> > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>

> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

> > > > ---

> > > >  include/uapi/linux/bpf.h       | 2 ++

> > > >  kernel/bpf/syscall.c           | 8 ++++++++

> > > >  tools/include/uapi/linux/bpf.h | 2 ++

> > > >  3 files changed, 12 insertions(+)

> > > >

> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> > > > index 85278deff439..cfc207ae7782 100644

> > > > --- a/include/uapi/linux/bpf.h

> > > > +++ b/include/uapi/linux/bpf.h

> > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {

> > > >         BPF_XDP_CPUMAP,

> > > >         BPF_SK_LOOKUP,

> > > >         BPF_XDP,

> > > > +       BPF_SK_REUSEPORT_SELECT,

> > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,

> > > >         __MAX_BPF_ATTACH_TYPE

> > > >  };

> > > >

> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c

> > > > index f3fe9f53f93c..a0796a8de5ea 100644

> > > > --- a/kernel/bpf/syscall.c

> > > > +++ b/kernel/bpf/syscall.c

> > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,

> > > >                 if (expected_attach_type == BPF_SK_LOOKUP)

> > > >                         return 0;

> > > >                 return -EINVAL;

> > > > +       case BPF_PROG_TYPE_SK_REUSEPORT:

> > > > +               switch (expected_attach_type) {

> > > > +               case BPF_SK_REUSEPORT_SELECT:

> > > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:

> > > > +                       return 0;

> > > > +               default:

> > > > +                       return -EINVAL;

> > > > +               }

> > > 

> > > this is a kernel regression, previously expected_attach_type wasn't

> > > enforced, so user-space could have provided any number without an

> > > error.

> > I also think this change alone will break things like when the usual

> > attr->expected_attach_type == 0 case.  At least changes is needed in

> > bpf_prog_load_fixup_attach_type() which is also handling a

> > similar situation for BPF_PROG_TYPE_CGROUP_SOCK.

> > 

> > I now think there is no need to expose new bpf_attach_type to the UAPI.

> > Since the prog->expected_attach_type is not used, it can be cleared at load time

> > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined

> > internally at filter.[c|h]) in the is_valid_access() when "migration"

> > is accessed.  When "migration" is accessed, the bpf prog can handle

> > migration (and the original not-migration) case.

> Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.

> I think there will be cases that bpf prog wants to do both

> without accessing any field from sk_reuseport_md.

> 

> Lets go back to the discussion on using a similar

> idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().

> I am not aware there is loader setting a random number

> in expected_attach_type, so the chance of breaking

> is very low.  There was a similar discussion earlier [0].

> 

> [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/


Thank you for the idea and reference.

I will remove the change in bpf_prog_load_check_attach() and set the
default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type()
for backward compatibility if expected_attach_type is 0.


> > > >         case BPF_PROG_TYPE_EXT:

> > > >                 if (expected_attach_type)

> > > >                         return -EINVAL;

> > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h

> > > > index 85278deff439..cfc207ae7782 100644

> > > > --- a/tools/include/uapi/linux/bpf.h

> > > > +++ b/tools/include/uapi/linux/bpf.h

> > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {

> > > >         BPF_XDP_CPUMAP,

> > > >         BPF_SK_LOOKUP,

> > > >         BPF_XDP,

> > > > +       BPF_SK_REUSEPORT_SELECT,

> > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,

> > > >         __MAX_BPF_ATTACH_TYPE

> > > >  };

> > > >

> > > > --

> > > > 2.17.2 (Apple Git-113)
Martin KaFai Lau Dec. 4, 2020, 5:56 a.m. UTC | #4
On Thu, Dec 03, 2020 at 11:16:08PM +0900, Kuniyuki Iwashima wrote:
> From:   Martin KaFai Lau <kafai@fb.com>

> Date:   Wed, 2 Dec 2020 20:24:02 -0800

> > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:

> > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:

> > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:

> > > > >

> > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to

> > > > > check if the attached eBPF program is capable of migrating sockets.

> > > > >

> > > > > When the eBPF program is attached, the kernel runs it for socket migration

> > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.

> > > > > The kernel will change the behaviour depending on the returned value:

> > > > >

> > > > >   - SK_PASS with selected_sk, select it as a new listener

> > > > >   - SK_PASS with selected_sk NULL, fall back to the random selection

> > > > >   - SK_DROP, cancel the migration

> > > > >

> > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/

> > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>

> > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

> > > > > ---

> > > > >  include/uapi/linux/bpf.h       | 2 ++

> > > > >  kernel/bpf/syscall.c           | 8 ++++++++

> > > > >  tools/include/uapi/linux/bpf.h | 2 ++

> > > > >  3 files changed, 12 insertions(+)

> > > > >

> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> > > > > index 85278deff439..cfc207ae7782 100644

> > > > > --- a/include/uapi/linux/bpf.h

> > > > > +++ b/include/uapi/linux/bpf.h

> > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {

> > > > >         BPF_XDP_CPUMAP,

> > > > >         BPF_SK_LOOKUP,

> > > > >         BPF_XDP,

> > > > > +       BPF_SK_REUSEPORT_SELECT,

> > > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,

> > > > >         __MAX_BPF_ATTACH_TYPE

> > > > >  };

> > > > >

> > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c

> > > > > index f3fe9f53f93c..a0796a8de5ea 100644

> > > > > --- a/kernel/bpf/syscall.c

> > > > > +++ b/kernel/bpf/syscall.c

> > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,

> > > > >                 if (expected_attach_type == BPF_SK_LOOKUP)

> > > > >                         return 0;

> > > > >                 return -EINVAL;

> > > > > +       case BPF_PROG_TYPE_SK_REUSEPORT:

> > > > > +               switch (expected_attach_type) {

> > > > > +               case BPF_SK_REUSEPORT_SELECT:

> > > > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:

> > > > > +                       return 0;

> > > > > +               default:

> > > > > +                       return -EINVAL;

> > > > > +               }

> > > > 

> > > > this is a kernel regression, previously expected_attach_type wasn't

> > > > enforced, so user-space could have provided any number without an

> > > > error.

> > > I also think this change alone will break things like when the usual

> > > attr->expected_attach_type == 0 case.  At least changes is needed in

> > > bpf_prog_load_fixup_attach_type() which is also handling a

> > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK.

> > > 

> > > I now think there is no need to expose new bpf_attach_type to the UAPI.

> > > Since the prog->expected_attach_type is not used, it can be cleared at load time

> > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined

> > > internally at filter.[c|h]) in the is_valid_access() when "migration"

> > > is accessed.  When "migration" is accessed, the bpf prog can handle

> > > migration (and the original not-migration) case.

> > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.

> > I think there will be cases that bpf prog wants to do both

> > without accessing any field from sk_reuseport_md.

> > 

> > Lets go back to the discussion on using a similar

> > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().

> > I am not aware there is loader setting a random number

> > in expected_attach_type, so the chance of breaking

> > is very low.  There was a similar discussion earlier [0].

> > 

> > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/

> 

> Thank you for the idea and reference.

> 

> I will remove the change in bpf_prog_load_check_attach() and set the

> default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type()

> for backward compatibility if expected_attach_type is 0.

check_attach_type() can be kept.  You can refer to
commit aac3fc320d94 for a similar situation.
Kuniyuki Iwashima Dec. 6, 2020, 4:32 a.m. UTC | #5
I'm sending this mail just for logging because I failed to send mails only
to LKML, netdev, and bpf yesterday.


From:   Martin KaFai Lau <kafai@fb.com>

Date:   Thu, 3 Dec 2020 21:56:53 -0800
> On Thu, Dec 03, 2020 at 11:16:08PM +0900, Kuniyuki Iwashima wrote:

> > From:   Martin KaFai Lau <kafai@fb.com>

> > Date:   Wed, 2 Dec 2020 20:24:02 -0800

> > > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:

> > > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:

> > > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:

> > > > > >

> > > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to

> > > > > > check if the attached eBPF program is capable of migrating sockets.

> > > > > >

> > > > > > When the eBPF program is attached, the kernel runs it for socket migration

> > > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.

> > > > > > The kernel will change the behaviour depending on the returned value:

> > > > > >

> > > > > >   - SK_PASS with selected_sk, select it as a new listener

> > > > > >   - SK_PASS with selected_sk NULL, fall back to the random selection

> > > > > >   - SK_DROP, cancel the migration

> > > > > >

> > > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/

> > > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>

> > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>

> > > > > > ---

> > > > > >  include/uapi/linux/bpf.h       | 2 ++

> > > > > >  kernel/bpf/syscall.c           | 8 ++++++++

> > > > > >  tools/include/uapi/linux/bpf.h | 2 ++

> > > > > >  3 files changed, 12 insertions(+)

> > > > > >

> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> > > > > > index 85278deff439..cfc207ae7782 100644

> > > > > > --- a/include/uapi/linux/bpf.h

> > > > > > +++ b/include/uapi/linux/bpf.h

> > > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {

> > > > > >         BPF_XDP_CPUMAP,

> > > > > >         BPF_SK_LOOKUP,

> > > > > >         BPF_XDP,

> > > > > > +       BPF_SK_REUSEPORT_SELECT,

> > > > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,

> > > > > >         __MAX_BPF_ATTACH_TYPE

> > > > > >  };

> > > > > >

> > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c

> > > > > > index f3fe9f53f93c..a0796a8de5ea 100644

> > > > > > --- a/kernel/bpf/syscall.c

> > > > > > +++ b/kernel/bpf/syscall.c

> > > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,

> > > > > >                 if (expected_attach_type == BPF_SK_LOOKUP)

> > > > > >                         return 0;

> > > > > >                 return -EINVAL;

> > > > > > +       case BPF_PROG_TYPE_SK_REUSEPORT:

> > > > > > +               switch (expected_attach_type) {

> > > > > > +               case BPF_SK_REUSEPORT_SELECT:

> > > > > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:

> > > > > > +                       return 0;

> > > > > > +               default:

> > > > > > +                       return -EINVAL;

> > > > > > +               }

> > > > > 

> > > > > this is a kernel regression, previously expected_attach_type wasn't

> > > > > enforced, so user-space could have provided any number without an

> > > > > error.

> > > > I also think this change alone will break things like when the usual

> > > > attr->expected_attach_type == 0 case.  At least changes is needed in

> > > > bpf_prog_load_fixup_attach_type() which is also handling a

> > > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK.

> > > > 

> > > > I now think there is no need to expose new bpf_attach_type to the UAPI.

> > > > Since the prog->expected_attach_type is not used, it can be cleared at load time

> > > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined

> > > > internally at filter.[c|h]) in the is_valid_access() when "migration"

> > > > is accessed.  When "migration" is accessed, the bpf prog can handle

> > > > migration (and the original not-migration) case.

> > > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.

> > > I think there will be cases that bpf prog wants to do both

> > > without accessing any field from sk_reuseport_md.

> > > 

> > > Lets go back to the discussion on using a similar

> > > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().

> > > I am not aware there is loader setting a random number

> > > in expected_attach_type, so the chance of breaking

> > > is very low.  There was a similar discussion earlier [0].

> > > 

> > > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/

> > 

> > Thank you for the idea and reference.

> > 

> > I will remove the change in bpf_prog_load_check_attach() and set the

> > default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type()

> > for backward compatibility if expected_attach_type is 0.

> check_attach_type() can be kept.  You can refer to

> commit aac3fc320d94 for a similar situation.


I confirmed bpf_prog_load_fixup_attach_type() is called just before
bpf_prog_load_check_attach(), so I will add the fixup code to this patch.
Thank you.
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 85278deff439..cfc207ae7782 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -241,6 +241,8 @@  enum bpf_attach_type {
 	BPF_XDP_CPUMAP,
 	BPF_SK_LOOKUP,
 	BPF_XDP,
+	BPF_SK_REUSEPORT_SELECT,
+	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f3fe9f53f93c..a0796a8de5ea 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2036,6 +2036,14 @@  bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		if (expected_attach_type == BPF_SK_LOOKUP)
 			return 0;
 		return -EINVAL;
+	case BPF_PROG_TYPE_SK_REUSEPORT:
+		switch (expected_attach_type) {
+		case BPF_SK_REUSEPORT_SELECT:
+		case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
+			return 0;
+		default:
+			return -EINVAL;
+		}
 	case BPF_PROG_TYPE_EXT:
 		if (expected_attach_type)
 			return -EINVAL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 85278deff439..cfc207ae7782 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -241,6 +241,8 @@  enum bpf_attach_type {
 	BPF_XDP_CPUMAP,
 	BPF_SK_LOOKUP,
 	BPF_XDP,
+	BPF_SK_REUSEPORT_SELECT,
+	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	__MAX_BPF_ATTACH_TYPE
 };