mbox series

[00/31] net/tcp: Add TCP-AO support

Message ID 20220818170005.747015-1-dima@arista.com
Headers show
Series net/tcp: Add TCP-AO support | expand

Message

Dmitry Safonov Aug. 18, 2022, 4:59 p.m. UTC
This patchset implements the TCP-AO option as described in RFC5925. There
is a request from industry to move away from TCP-MD5SIG and it seems the time
is right to have a TCP-AO upstreamed. This TCP option is meant to replace
the TCP MD5 option and address its shortcomings. Specifically, it provides
more secure hashing, key rotation and support for long-lived connections
(see the summary of TCP-AO advantages over TCP-MD5 in (1.3) of RFC5925).
The patch series starts with six patches that are not specific to TCP-AO
but implement a general crypto facility that we thought is useful
to eliminate code duplication between TCP-MD5SIG and TCP-AO as well as other
crypto users. These six patches are being submitted separately in
a different patchset [1]. Including them here will show better the gain
in code sharing. Next are 18 patches that implement the actual TCP-AO option,
followed by patches implementing selftests.

The patch set was written as a collaboration of three authors (in alphabetical
order): Dmitry Safonov, Francesco Ruggeri and Salam Noureddine. Additional
credits should be given to Prasad Koya, who was involved in early prototyping
a few years back. There is also a separate submission done by Leonard Crestez
whom we thank for his efforts getting an implementation of RFC5925 submitted
for review upstream [2]. This is an independent implementation that makes
different design decisions.

For example, we chose a similar design to the TCP-MD5SIG implementation and
used setsockopt()s to program per-socket keys, avoiding the extra complexity
of managing a centralized key database in the kernel. A centralized database
in the kernel has dubious benefits since it doesn’t eliminate per-socket
setsockopts needed to specify which sockets need TCP-AO and what are the
currently preferred keys. It also complicates traffic key caching and
preventing deletion of in-use keys.

In this implementation, a centralized database of keys can be thought of
as living in user space and user applications would have to program those
keys on matching sockets. On the server side, the user application programs
keys (MKTS in TCP-AO nomenclature) on the listening socket for all peers that
are expected to connect. Prefix matching on the peer address is supported.
When a peer issues a successful connect, all the MKTs matching the IP address
of the peer are copied to the newly created socket. On the active side,
when a connect() is issued all MKTs that do not match the peer are deleted
from the socket since they will never match the peer. This implementation
uses three setsockopt()s for adding, deleting and modifying keys on a socket.
All three setsockopt()s have extensive sanity checks that prevent
inconsistencies in the keys on a given socket. A getsockopt() is provided
to get key information from any given socket.

Few things to note about this implementation:
- Traffic keys are cached for established connections avoiding the cost of
  such calculation for each packet received or sent.
- Great care has been taken to avoid deleting in-use MKTs
  as required by the RFC.
- Any crypto algorithm supported by the Linux kernel can be used
  to calculate packet hashes.
- Fastopen works with TCP-AO but hasn’t been tested extensively.
- Tested for interop with other major networking vendors (on linux-4.19),
  including testing for key rotation and long lived connections.

There are a couple of limitations that we’re aware of, including (but not
limited to) the following:
- setsockopt(TCP_REPAIR) not supported yet
- IPv4-mapped-IPv6 addresses not tested
- static key not implemented yet
- CONFIG_TCP_AO depends on CONFIG_TCP_MD5SIG
- A small window for a race condition exists between accept and key
  adding/deletion on a listening socket but can be easily overcome by using
  the getsockopt() to make sure the right keys are there on a newly accepted
  connection
- Key matching by TCP port numbers, peer ranges, asterisks is unsupported
  as it’s unlikely to be useful

[1]: https://lore.kernel.org/all/20220726201600.1715505-1-dima@arista.com/
[2]: https://lore.kernel.org/all/cover.1658815925.git.cdleonard@gmail.com/

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Bob Gilligan <gilligan@arista.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Ivan Delalande <colona@arista.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Leonard Crestez <cdleonard@gmail.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Salam Noureddine <noureddine@arista.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Dmitry Safonov (31):
  crypto: Introduce crypto_pool
  crypto_pool: Add crypto_pool_reserve_scratch()
  net/tcp: Separate tcp_md5sig_info allocation into tcp_md5sig_info_add()
  net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
  net/tcp: Use crypto_pool for TCP-MD5
  net/ipv6: sr: Switch to using crypto_pool
  tcp: Add TCP-AO config and structures
  net/tcp: Introduce TCP_AO setsockopt()s
  net/tcp: Prevent TCP-MD5 with TCP-AO being set
  net/tcp: Calculate TCP-AO traffic keys
  net/tcp: Add TCP-AO sign to outgoing packets
  net/tcp: Add tcp_parse_auth_options()
  net/tcp: Add AO sign to RST packets
  net/tcp: Add TCP-AO sign to twsk
  net/tcp: Wire TCP-AO to request sockets
  net/tcp: Sign SYN-ACK segments with TCP-AO
  net/tcp: Verify inbound TCP-AO signed segments
  net/tcp: Add TCP-AO segments counters
  net/tcp: Add TCP-AO SNE support
  net/tcp: Add tcp_hash_fail() ratelimited logs
  net/tcp: Ignore specific ICMPs for TCP-AO connections
  net/tcp: Add option for TCP-AO to (not) hash header
  net/tcp: Add getsockopt(TCP_AO_GET)
  net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)
  selftests/net: Add TCP-AO library
  selftests/net: Verify that TCP-AO complies with ignoring ICMPs
  selftest/net: Add TCP-AO ICMPs accept test
  selftest/tcp-ao: Add a test for MKT matching
  selftest/tcp-ao: Add test for TCP-AO add setsockopt() command
  selftests/tcp-ao: Add TCP-AO + TCP-MD5 + no sign listen socket tests
  selftests/aolib: Add test/benchmark for removing MKTs

 crypto/Kconfig                                |   12 +
 crypto/Makefile                               |    1 +
 crypto/crypto_pool.c                          |  323 +++
 include/crypto/pool.h                         |   33 +
 include/linux/sockptr.h                       |   23 +
 include/linux/tcp.h                           |   24 +
 include/net/dropreason.h                      |   25 +
 include/net/seg6_hmac.h                       |    7 -
 include/net/tcp.h                             |  193 +-
 include/net/tcp_ao.h                          |  283 +++
 include/uapi/linux/snmp.h                     |    5 +
 include/uapi/linux/tcp.h                      |   62 +
 net/ipv4/Kconfig                              |   15 +-
 net/ipv4/Makefile                             |    1 +
 net/ipv4/proc.c                               |    5 +
 net/ipv4/tcp.c                                |  191 +-
 net/ipv4/tcp_ao.c                             | 1939 +++++++++++++++++
 net/ipv4/tcp_input.c                          |   94 +-
 net/ipv4/tcp_ipv4.c                           |  385 +++-
 net/ipv4/tcp_minisocks.c                      |   37 +-
 net/ipv4/tcp_output.c                         |  188 +-
 net/ipv6/Kconfig                              |    2 +-
 net/ipv6/Makefile                             |    1 +
 net/ipv6/seg6.c                               |    3 -
 net/ipv6/seg6_hmac.c                          |  204 +-
 net/ipv6/tcp_ao.c                             |  151 ++
 net/ipv6/tcp_ipv6.c                           |  327 ++-
 tools/testing/selftests/Makefile              |    1 +
 tools/testing/selftests/net/tcp_ao/.gitignore |    2 +
 tools/testing/selftests/net/tcp_ao/Makefile   |   50 +
 .../selftests/net/tcp_ao/bench-lookups.c      |  403 ++++
 .../selftests/net/tcp_ao/connect-deny.c       |  217 ++
 tools/testing/selftests/net/tcp_ao/connect.c  |   81 +
 .../selftests/net/tcp_ao/icmps-accept.c       |    1 +
 .../selftests/net/tcp_ao/icmps-discard.c      |  447 ++++
 .../testing/selftests/net/tcp_ao/lib/aolib.h  |  333 +++
 .../selftests/net/tcp_ao/lib/netlink.c        |  341 +++
 tools/testing/selftests/net/tcp_ao/lib/proc.c |  267 +++
 .../testing/selftests/net/tcp_ao/lib/setup.c  |  297 +++
 tools/testing/selftests/net/tcp_ao/lib/sock.c |  294 +++
 .../testing/selftests/net/tcp_ao/lib/utils.c  |   30 +
 .../selftests/net/tcp_ao/setsockopt-closed.c  |  191 ++
 .../selftests/net/tcp_ao/unsigned-md5.c       |  483 ++++
 43 files changed, 7516 insertions(+), 456 deletions(-)
 create mode 100644 crypto/crypto_pool.c
 create mode 100644 include/crypto/pool.h
 create mode 100644 include/net/tcp_ao.h
 create mode 100644 net/ipv4/tcp_ao.c
 create mode 100644 net/ipv6/tcp_ao.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/.gitignore
 create mode 100644 tools/testing/selftests/net/tcp_ao/Makefile
 create mode 100644 tools/testing/selftests/net/tcp_ao/bench-lookups.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/connect-deny.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/connect.c
 create mode 120000 tools/testing/selftests/net/tcp_ao/icmps-accept.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/icmps-discard.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/lib/aolib.h
 create mode 100644 tools/testing/selftests/net/tcp_ao/lib/netlink.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/lib/proc.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/lib/setup.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/lib/sock.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/lib/utils.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/setsockopt-closed.c
 create mode 100644 tools/testing/selftests/net/tcp_ao/unsigned-md5.c


base-commit: e34cfee65ec891a319ce79797dda18083af33a76

Comments

David Ahern Aug. 21, 2022, 11:51 p.m. UTC | #1
On 8/21/22 2:34 PM, Leonard Crestez wrote:
> On 8/18/22 19:59, Dmitry Safonov wrote:
>> This patchset implements the TCP-AO option as described in RFC5925. There
>> is a request from industry to move away from TCP-MD5SIG and it seems
>> the time
>> is right to have a TCP-AO upstreamed. This TCP option is meant to replace
>> the TCP MD5 option and address its shortcomings. Specifically, it
>> provides
>> more secure hashing, key rotation and support for long-lived connections
>> (see the summary of TCP-AO advantages over TCP-MD5 in (1.3) of RFC5925).
>> The patch series starts with six patches that are not specific to TCP-AO
>> but implement a general crypto facility that we thought is useful
>> to eliminate code duplication between TCP-MD5SIG and TCP-AO as well as
>> other
>> crypto users. These six patches are being submitted separately in
>> a different patchset [1]. Including them here will show better the gain
>> in code sharing. Next are 18 patches that implement the actual TCP-AO
>> option,
>> followed by patches implementing selftests.
>>
>> The patch set was written as a collaboration of three authors (in
>> alphabetical
>> order): Dmitry Safonov, Francesco Ruggeri and Salam Noureddine.
>> Additional
>> credits should be given to Prasad Koya, who was involved in early
>> prototyping
>> a few years back. There is also a separate submission done by Leonard
>> Crestez
>> whom we thank for his efforts getting an implementation of RFC5925
>> submitted
>> for review upstream [2]. This is an independent implementation that makes
>> different design decisions.
> 
> Is this based on something that Arista has had running for a while now
> or is a recent new development?
> 

...

> Seeing an entirely distinct unrelated implementation is very unexpected.
> What made you do this?
> 

I am curious as well. You are well aware of Leonard's efforts which go
back a long time, why go off and do a separate implementation?
Salam Noureddine Aug. 22, 2022, 6:42 p.m. UTC | #2
On Sun, Aug 21, 2022 at 1:34 PM Leonard Crestez <cdleonard@gmail.com> wrote:
>
> On 8/18/22 19:59, Dmitry Safonov wrote:
> > This patchset implements the TCP-AO option as described in RFC5925. There
> > is a request from industry to move away from TCP-MD5SIG and it seems the time
> > is right to have a TCP-AO upstreamed. This TCP option is meant to replace
> > the TCP MD5 option and address its shortcomings.
...
> >
> > The patch set was written as a collaboration of three authors (in alphabetical
> > order): Dmitry Safonov, Francesco Ruggeri and Salam Noureddine. Additional
> > credits should be given to Prasad Koya, who was involved in early prototyping
> > a few years back. There is also a separate submission done by Leonard Crestez
> > whom we thank for his efforts getting an implementation of RFC5925 submitted
> > for review upstream [2]. This is an independent implementation that makes
> > different design decisions.
>
> Is this based on something that Arista has had running for a while now
> or is a recent new development?
>

This is based on prototype code we had worked on internally three years
ago. The implementation effort was restarted to get it over the finish
line. For business reasons we had to have our own implementation ready
and not tied to unmerged upstream code. Please also note that our
implementation is based on linux-4.19 and was only ported forward to
mainline recently. So it wasn’t ready to be submitted upstream.

> > For example, we chose a similar design to the TCP-MD5SIG implementation and
> > used setsockopt()s to program per-socket keys, avoiding the extra complexity
> > of managing a centralized key database in the kernel. A centralized database
> > in the kernel has dubious benefits since it doesn’t eliminate per-socket
> > setsockopts needed to specify which sockets need TCP-AO and what are the
> > currently preferred keys. It also complicates traffic key caching and
> > preventing deletion of in-use keys.
>
> My implementation started with per-socket lists but switched to a global
> list because this way is much easier to manage from userspace. In
> practice userspace apps will want to ensure that all sockets use the
> same set of keys anyway.
>

We did consider a global list early on but we didn’t find it
beneficial. We still believe that per-socket lists reduce complexity
of the implementation, are more scalable and ensure predictable
behavior. Our expectation is that TCP-AO will be only useful for a
limited set of routing applications, rather than used transparently
like IPSEC for non-routing apps. We would be happy to discuss this in
more detail.

> > In this implementation, a centralized database of keys can be thought of
> > as living in user space and user applications would have to program those
> > keys on matching sockets. On the server side, the user application programs
> > keys (MKTS in TCP-AO nomenclature) on the listening socket for all peers that
> > are expected to connect. Prefix matching on the peer address is supported.
...
>
> My series doesn't try to prevent inconsistencies inside the key lists
> because it's not clear that the kernel should prevent userspace from
> shooting itself in the foot. Worst case is connection failure on
> misconfiguration which seems fine.
>
> The RFC doesn't specify in detail how key management is to be performed,
> for example if two valid keys are available it doesn't mention which one
> should be used. Some guidance is found in RFC8177 but again not very much.
>
> I implemented an ABI that can be used by userspace for RFC8177-style key
> management and asked for feedback but received very little. If you had
> come with a clear ABI proposal I would have tried to implement it.
>
> Here's a link to our older discussion:
>
> https://lore.kernel.org/netdev/e7f0449a-2bad-99ad-4737-016a0e6b8b84@gmail.com/
>
> Seeing an entirely distinct unrelated implementation is very unexpected.
> What made you do this?
>
> --
> Regards,
> Leonard

Our goal was not to have a competing TCP-AO upstream submission but
to implement the RFC for our customers to use. Had there been an
already upstreamed implementation we would have used it and
implemented customer requirements on top of it. Just like we do with
all other kernel features. This is not a bad situation, we believe it
is good for the upstream community to have two fully functioning
implementations to consider. Possibly a third collaborative
submission might emerge that takes the best of both. A year ago, there
wasn’t much available online about TCP-AO besides the RFC. We are
excited with the current interest in TCP-AO and hope to see it
upstreamed soon.

Best,

Salam
Dmitry Safonov Aug. 22, 2022, 8:35 p.m. UTC | #3
Hi Leonard, David,

On 8/22/22 00:51, David Ahern wrote:
> On 8/21/22 2:34 PM, Leonard Crestez wrote:
>> On 8/18/22 19:59, Dmitry Safonov wrote:
>>> This patchset implements the TCP-AO option as described in RFC5925. There
>>> is a request from industry to move away from TCP-MD5SIG and it seems
>>> the time
>>> is right to have a TCP-AO upstreamed. This TCP option is meant to replace
>>> the TCP MD5 option and address its shortcomings. Specifically, it
>>> provides
>>> more secure hashing, key rotation and support for long-lived connections
>>> (see the summary of TCP-AO advantages over TCP-MD5 in (1.3) of RFC5925).
>>> The patch series starts with six patches that are not specific to TCP-AO
>>> but implement a general crypto facility that we thought is useful
>>> to eliminate code duplication between TCP-MD5SIG and TCP-AO as well as
>>> other
>>> crypto users. These six patches are being submitted separately in
>>> a different patchset [1]. Including them here will show better the gain
>>> in code sharing. Next are 18 patches that implement the actual TCP-AO
>>> option,
>>> followed by patches implementing selftests.
>>>
>>> The patch set was written as a collaboration of three authors (in
>>> alphabetical
>>> order): Dmitry Safonov, Francesco Ruggeri and Salam Noureddine.
>>> Additional
>>> credits should be given to Prasad Koya, who was involved in early
>>> prototyping
>>> a few years back. There is also a separate submission done by Leonard
>>> Crestez
>>> whom we thank for his efforts getting an implementation of RFC5925
>>> submitted
>>> for review upstream [2]. This is an independent implementation that makes
>>> different design decisions.
>>
>> Is this based on something that Arista has had running for a while now
>> or is a recent new development?
>>
> 
> ...
> 
>> Seeing an entirely distinct unrelated implementation is very unexpected.
>> What made you do this?
>>
> 
> I am curious as well. You are well aware of Leonard's efforts which go
> back a long time, why go off and do a separate implementation?

When I started working on this, there was a prototype that was neither
good for upstream, nor for customers. At the moment Leonard submitted
his RFC, I was already giving feedback/reviews to local code and
patches. So, as I was aware of the details of TCP-AO, I started giving
Leonard feedback and reviews, based on what I've learned from RFC/code.
I thought whatever code will make it upstream, it can benefit from my
reviews. Some of my comments were based on a better code I saw locally,
or a way of improving it that I've suggested to both sides.

I'm quite happy that Leonard addressed some of my comments (i.e.
extendable syscalls), I see that it improved his patches.
On the other hand, some of the comments I've left moved to "known
limitations" with no foreseeable plan to fix them, while they were
addressed in local/Arista code.

And now a little bit more than a year later, it seems that the quality
of local patches has reached a point where they can be submitted for
an upstream review. So, please don't misunderstand me, it's not that
"drop your implementation, take ours" and it's not that we've
intentionally hidden that we're working on TCP-AO. It's that it is the
first moment we can make upstream aware of an alternative implementation.

Personally, I think it's best for opensource community:
- Arista's implementation is now public
- there are now at least 4 people that understand RFC5925 and the
  code/details
- in a discussion, it will be possible to find what will be the best
  from both implementations for Linux and come up with better code

At this particular moment, it seems neither of patch sets is ready to be
merged "as-is". But it seems that there's enough interest from both
sides and likely it guarantees that there will be enough effort to make
something merge-able, that will work for all interested parties.

As for my part, I'm interested in the best code upstream, regardless who
is the author. This includes:
- reusing the existing TCP-MD5 code, rather than copying'n'pasting for
  TCP-AO with intent to refactor it some day later
- making setsockopt()s and other syscalls extendable
- cover functionality with selftests
- following RFC5925 in implementation, especially "required" and "must"
  parts

I hope that clarifies how and why now there are two patch sets that
implement the same RFC/functionality.

Thanks,
          Dmitry
Leonard Crestez Aug. 23, 2022, 3:30 p.m. UTC | #4
On 8/22/22 23:35, Dmitry Safonov wrote:
> Hi Leonard, David,
> 
> On 8/22/22 00:51, David Ahern wrote:
>> On 8/21/22 2:34 PM, Leonard Crestez wrote:
>>> On 8/18/22 19:59, Dmitry Safonov wrote:
>>>> This patchset implements the TCP-AO option as described in RFC5925. There
>>>> is a request from industry to move away from TCP-MD5SIG and it seems
>>>> the time
>>>> is right to have a TCP-AO upstreamed. This TCP option is meant to replace
>>>> the TCP MD5 option and address its shortcomings. Specifically, it
>>>> provides
>>>> more secure hashing, key rotation and support for long-lived connections
>>>> (see the summary of TCP-AO advantages over TCP-MD5 in (1.3) of RFC5925).
>>>> The patch series starts with six patches that are not specific to TCP-AO
>>>> but implement a general crypto facility that we thought is useful
>>>> to eliminate code duplication between TCP-MD5SIG and TCP-AO as well as
>>>> other
>>>> crypto users. These six patches are being submitted separately in
>>>> a different patchset [1]. Including them here will show better the gain
>>>> in code sharing. Next are 18 patches that implement the actual TCP-AO
>>>> option,
>>>> followed by patches implementing selftests.
>>>>
>>>> The patch set was written as a collaboration of three authors (in
>>>> alphabetical
>>>> order): Dmitry Safonov, Francesco Ruggeri and Salam Noureddine.
>>>> Additional
>>>> credits should be given to Prasad Koya, who was involved in early
>>>> prototyping
>>>> a few years back. There is also a separate submission done by Leonard
>>>> Crestez
>>>> whom we thank for his efforts getting an implementation of RFC5925
>>>> submitted
>>>> for review upstream [2]. This is an independent implementation that makes
>>>> different design decisions.
>>>
>>> Is this based on something that Arista has had running for a while now
>>> or is a recent new development?
>>>
>>
>> ...
>>
>>> Seeing an entirely distinct unrelated implementation is very unexpected.
>>> What made you do this?
>>>
>>
>> I am curious as well. You are well aware of Leonard's efforts which go
>> back a long time, why go off and do a separate implementation?
> 
> When I started working on this, there was a prototype that was neither
> good for upstream, nor for customers. At the moment Leonard submitted
> his RFC, I was already giving feedback/reviews to local code and
> patches. So, as I was aware of the details of TCP-AO, I started giving
> Leonard feedback and reviews, based on what I've learned from RFC/code.
> I thought whatever code will make it upstream, it can benefit from my
> reviews. Some of my comments were based on a better code I saw locally,
> or a way of improving it that I've suggested to both sides.
> 
> I'm quite happy that Leonard addressed some of my comments (i.e.
> extendable syscalls), I see that it improved his patches.
> On the other hand, some of the comments I've left moved to "known
> limitations" with no foreseeable plan to fix them, while they were
> addressed in local/Arista code.
> 
> And now a little bit more than a year later, it seems that the quality
> of local patches has reached a point where they can be submitted for
> an upstream review. So, please don't misunderstand me, it's not that
> "drop your implementation, take ours" and it's not that we've
> intentionally hidden that we're working on TCP-AO. It's that it is the
> first moment we can make upstream aware of an alternative implementation.
> 
> Personally, I think it's best for opensource community:
> - Arista's implementation is now public
> - there are now at least 4 people that understand RFC5925 and the
>    code/details
> - in a discussion, it will be possible to find what will be the best
>    from both implementations for Linux and come up with better code
> 
> At this particular moment, it seems neither of patch sets is ready to be
> merged "as-is". But it seems that there's enough interest from both
> sides and likely it guarantees that there will be enough effort to make
> something merge-able, that will work for all interested parties.
> 
> As for my part, I'm interested in the best code upstream, regardless who
> is the author. This includes:
> - reusing the existing TCP-MD5 code, rather than copying'n'pasting for
>    TCP-AO with intent to refactor it some day later

I had a requirement to deploy on linux 5.4 so I very deliberately 
avoided touching MD5. I'm not sure there very much duplication anyway.

> - making setsockopt()s and other syscalls extendable
> - cover functionality with selftests

My implementation is tested with a standalone python package, this is a 
design choice which doesn't particularly matter.

> - following RFC5925 in implementation, especially "required" and "must"
>    parts

I'm not convinced that "don't delete current key" needs to be literally 
interpreted as a hard requirement for the linux ABI. Most TCP RFCs don't 
specify any sort of API at all and it would be entirely valid to 
implement BGP-TCP-AO as a single executable with no internally 
documented boundaries.

> I hope that clarifies how and why now there are two patch sets that
> implement the same RFC/functionality.

As far as I can tell the biggest problem is that is quite difficult to 
implement the userspace side of TCP-AO complete with key rollover. Our 
two implementation both claim to support this but through different ABI 
and both require active management from userspace.

I think it would make sense to push key validity times and the key 
selection policy entirely in the kernel so that it can handle key 
rotation/expiration by itself. This way userspace only has to configure 
the keys and doesn't have to touch established connections at all.

My series has a "flags" field on the key struct where it can filter by 
IP, prefix, ifindex and so on. It would be possible to add additional 
flags for making the key only valid between certain times (by wall time).

The kernel could then make key selections itself:
  - send rnextkeyid based on the key with the longest recv lifetime
  - send keyid based on remote rnextkeyid.
    - If not applicable (rnextkeyid not found locally, or for SYN) pick 
based on longest send lifetime.
  - If all keys expire then return an error on write()
  - Solve other ambiguities in a predictable way: if valid times are 
equal then pick the lowest numeric send_id or recv_id.

Explicit key selection from userspace could still be supported but it 
would be optional and most apps wouldn't bother implementing their own 
policy. The biggest advantage is that it would be much easier for 
applications to adopt TCP-AO.

--
Regards,
Leonard
Dmitry Safonov Aug. 23, 2022, 4:31 p.m. UTC | #5
On 8/23/22 16:30, Leonard Crestez wrote:
> On 8/22/22 23:35, Dmitry Safonov wrote:
>> Hi Leonard, David,
[..]
>> At this particular moment, it seems neither of patch sets is ready to be
>> merged "as-is". But it seems that there's enough interest from both
>> sides and likely it guarantees that there will be enough effort to make
>> something merge-able, that will work for all interested parties.
>>
>> As for my part, I'm interested in the best code upstream, regardless who
>> is the author. This includes:
>> - reusing the existing TCP-MD5 code, rather than copying'n'pasting for
>>    TCP-AO with intent to refactor it some day later
> 
> I had a requirement to deploy on linux 5.4 so I very deliberately
> avoided touching MD5. I'm not sure there very much duplication anyway.

Yeah, I know what you mean: we deployed it on v4.19. But for the code
upstream I personally prefer to see "reusing" rather than copying.
Lesser code is easier to maintain in future.
Upstream submissions in my view should be based on "what would be easier
to maintain in future", rather than on "what would be easier to backport
to my maintenance release".

>> - making setsockopt()s and other syscalls extendable
>> - cover functionality with selftests
> 
> My implementation is tested with a standalone python package, this is a
> design choice which doesn't particularly matter.
> 
>> - following RFC5925 in implementation, especially "required" and "must"
>>    parts
> 
> I'm not convinced that "don't delete current key" needs to be literally
> interpreted as a hard requirement for the linux ABI. Most TCP RFCs don't
> specify any sort of API at all and it would be entirely valid to
> implement BGP-TCP-AO as a single executable with no internally
> documented boundaries.

I agree that RFC requirements and "musts" can be implemented in
userspace, rather than in kernel. On the other hand, my opinion is that
if you have "must"/"must not"/"required" in RFC and it's not hard to
limit those in kernel, than you _should_ do it.
In this point of view, debugging "hey, setsockopt() for key removal
returned -EBUSY, what's going on?" is better than "hey, tcp connection
died on my side and I didn't have tcp dump running, what was that?".

>> I hope that clarifies how and why now there are two patch sets that
>> implement the same RFC/functionality.
> 
> As far as I can tell the biggest problem is that is quite difficult to
> implement the userspace side of TCP-AO complete with key rollover. Our
> two implementation both claim to support this but through different ABI
> and both require active management from userspace.
> 
> I think it would make sense to push key validity times and the key
> selection policy entirely in the kernel so that it can handle key
> rotation/expiration by itself. This way userspace only has to configure
> the keys and doesn't have to touch established connections at all.

Respectfully I disagree here. I think all such policies should be
implemented in userspace. The kernel has to have as lesser as possible,
but enough to provide a way to sign, verify, log messages on TCP segments.
All the logic that may change, all business decisions and key management
should be implemented in userspace, keeping the kernel part as easier in
"KISS" sense as possible.

> My series has a "flags" field on the key struct where it can filter by
> IP, prefix, ifindex and so on. It would be possible to add additional
> flags for making the key only valid between certain times (by wall time).
> 
> The kernel could then make key selections itself:
>  - send rnextkeyid based on the key with the longest recv lifetime
>  - send keyid based on remote rnextkeyid.
>    - If not applicable (rnextkeyid not found locally, or for SYN) pick
> based on longest send lifetime.
>  - If all keys expire then return an error on write()
>  - Solve other ambiguities in a predictable way: if valid times are
> equal then pick the lowest numeric send_id or recv_id.
> 
> Explicit key selection from userspace could still be supported but it
> would be optional and most apps wouldn't bother implementing their own
> policy. The biggest advantage is that it would be much easier for
> applications to adopt TCP-AO.

Personally, I would think that all you mentioned better stay in
userspace app. The kernel should do as minimal and as much predictable
as possible job here, without 10 possible outcomes. If you want to share
the logic of key rotation/expiration, all timers and synchronization
between different BGP applications, that would be a proper job for a
shared library.

Thanks,
          Dmitry
Andrew Lunn Aug. 24, 2022, 12:46 p.m. UTC | #6
> I think it would make sense to push key validity times and the key selection
> policy entirely in the kernel so that it can handle key rotation/expiration
> by itself. This way userspace only has to configure the keys and doesn't
> have to touch established connections at all.

I know nothing aobut TCP-AO, nor much about kTLS. But doesn't kTLS
have the same issue? Is there anything which can be learnt from kTLS?
Maybe the same mechanisms can be used? No point inventing something
new if you can copy/refactor working code?

> My series has a "flags" field on the key struct where it can filter by IP,
> prefix, ifindex and so on. It would be possible to add additional flags for
> making the key only valid between certain times (by wall time).

What out for wall clock time, it jumps around in funny ways. Plus the
kernel has no idea what time zone the wall the wall clock is mounted
on is in.

    Andrew
Leonard Crestez Aug. 27, 2022, 8:55 a.m. UTC | #7
On 8/24/22 15:46, Andrew Lunn wrote:
>> I think it would make sense to push key validity times and the key selection
>> policy entirely in the kernel so that it can handle key rotation/expiration
>> by itself. This way userspace only has to configure the keys and doesn't
>> have to touch established connections at all.
> 
> I know nothing aobut TCP-AO, nor much about kTLS. But doesn't kTLS
> have the same issue? Is there anything which can be learnt from kTLS?
> Maybe the same mechanisms can be used? No point inventing something
> new if you can copy/refactor working code?
> 
>> My series has a "flags" field on the key struct where it can filter by IP,
>> prefix, ifindex and so on. It would be possible to add additional flags for
>> making the key only valid between certain times (by wall time).
> 
> What out for wall clock time, it jumps around in funny ways. Plus the
> kernel has no idea what time zone the wall the wall clock is mounted
> on is in.

A close equivalent seems to exist in ipsec in the "xfrm_lifetime_cfg" 
struct, specifically the soft/hard expires timers. These are optional 
validity times for each xfrm_state which is equivalent to a "key".

I'm not familiar with how those are used but ipsec usually relies on 
complex userspace daemons for managing xfrm states and policies and 
those daemons should be capable of adding and removing keys based on 
internal timers. Still, the linux kernel supports checking for key 
validity on it's own.

--
Regards,
Leonard