Re: git: 7e5bf68495cc - main - netlink: add netlink support
Date: Thu, 09 Mar 2023 15:38:14 UTC
On 1 Oct 2022, at 16:19, Alexander V. Chernikov wrote:
> The branch main has been updated by melifaro:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=7e5bf68495cc0a8c9793a338a8a02009a7f6dbb6
>
> commit 7e5bf68495cc0a8c9793a338a8a02009a7f6dbb6
> Author: Alexander V. Chernikov <melifaro@FreeBSD.org>
> AuthorDate: 2022-01-20 21:39:21 +0000
> Commit: Alexander V. Chernikov <melifaro@FreeBSD.org>
> CommitDate: 2022-10-01 14:15:35 +0000
>
> netlink: add netlink support
>
> Netlink is a communication protocol currently used in the Linux
> kernel to modify, read, and subscribe to nearly all networking
> state. Interfaces, addresses, routes, firewall, FIBs, VNETs, etc.
> are controlled via netlink.
> It is an async, TLV-based protocol, providing 1-1 and 1-many
> communications.
>
> The current implementation supports the subset of NETLINK_ROUTE
> family. To be more specific, the following is supported:
> * Dumps:
> - routes
> - nexthops / nexthop groups
> - interfaces
> - interface addresses
> - neighbors (arp/ndp)
> * Notifications:
> - interface arrival/departure
> - interface address arrival/departure
> - route addition/deletion
> * Modifications:
> - adding/deleting routes
> - adding/deleting nexthops/nexthop groups
> - adding/deleting neighbors
> - adding/deleting interfaces (basic support only)
> * Rtsock interaction
> - route events are bridged both ways
>
> The implementation also supports the NETLINK_GENERIC family
> framework.
>
> Implementation notes:
> Netlink is implemented as a loadable/unloadable kernel module,
> with minimal changes to the rest of the kernel.
> Each netlink socket uses a dedicated taskqueue to support async
> operations that can sleep, such as interface creation. All message
> processing is performed within these taskqueues.
>
> Compatibility:
> Most of the Netlink data models specified above map nicely to
> FreeBSD concepts. An unmodified ip(8) binary works correctly with
> interfaces, addresses, routes, nexthops and nexthop groups. Some
> software, such as net/bird, requires header-only modifications to
> compile and work with FreeBSD netlink.
>
> Reviewed by: imp
> Differential Revision: https://reviews.freebsd.org/D36002
> MFC after: 2 months
> ---
> etc/mtree/BSD.include.dist | 4 +
> sys/modules/Makefile | 1 +
> sys/modules/netlink/Makefile | 17 +
> sys/net/route.c | 11 +
> sys/net/route/route_ctl.h | 7 +
> sys/net/rtsock.c | 42 ++
> sys/netlink/netlink.h | 257 +++++++++
> sys/netlink/netlink_ctl.h | 102 ++++
> sys/netlink/netlink_debug.h | 82 +++
> sys/netlink/netlink_domain.c | 689 +++++++++++++++++++++++
> sys/netlink/netlink_generic.c | 472 ++++++++++++++++
> sys/netlink/netlink_generic.h | 112 ++++
> sys/netlink/netlink_io.c | 528 ++++++++++++++++++
> sys/netlink/netlink_linux.h | 54 ++
> sys/netlink/netlink_message_parser.c | 472 ++++++++++++++++
> sys/netlink/netlink_message_parser.h | 270 +++++++++
> sys/netlink/netlink_message_writer.c | 686 +++++++++++++++++++++++
> sys/netlink/netlink_message_writer.h | 250 +++++++++
> sys/netlink/netlink_module.c | 228 ++++++++
> sys/netlink/netlink_route.c | 135 +++++
> sys/netlink/netlink_route.h | 43 ++
> sys/netlink/netlink_var.h | 142 +++++
> sys/netlink/route/common.h | 213 ++++++++
> sys/netlink/route/iface.c | 857 +++++++++++++++++++++++++++++
> sys/netlink/route/iface_drivers.c | 165 ++++++
> sys/netlink/route/ifaddrs.h | 90 +++
> sys/netlink/route/interface.h | 245 +++++++++
> sys/netlink/route/neigh.c | 571 +++++++++++++++++++
> sys/netlink/route/neigh.h | 105 ++++
> sys/netlink/route/nexthop.c | 1000 ++++++++++++++++++++++++++++++++++
> sys/netlink/route/nexthop.h | 102 ++++
> sys/netlink/route/route.c | 972 +++++++++++++++++++++++++++++++++
> sys/netlink/route/route.h | 366 +++++++++++++
> sys/netlink/route/route_var.h | 101 ++++
> 34 files changed, 9391 insertions(+)
>
> diff --git a/sys/netlink/netlink.h b/sys/netlink/netlink.h
> new file mode 100644
> index 000000000000..6a68dcec1382
> --- /dev/null
> +++ b/sys/netlink/netlink.h
> @@ -0,0 +1,257 @@
> +/*-
> + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> + *
> + * Copyright (c) 2021 Ng Peng Nam Sean
> + * Copyright (c) 2022 Alexander V. Chernikov <melifaro@FreeBSD.org>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Copyright (C) The Internet Society (2003). All Rights Reserved.
> + *
> + * This document and translations of it may be copied and furnished to
> + * others, and derivative works that comment on or otherwise explain it
> + * or assist in its implementation may be prepared, copied, published
> + * and distributed, in whole or in part, without restriction of any
> + * kind, provided that the above copyright notice and this paragraph are
> + * included on all such copies and derivative works. However, this
> + * document itself may not be modified in any way, such as by removing
> + * the copyright notice or references to the Internet Society or other
> + * Internet organizations, except as needed for the purpose of
> + * developing Internet standards in which case the procedures for
> + * copyrights defined in the Internet Standards process must be
> + * followed, or as required to translate it into languages other than
> + * English.
> + *
> + * The limited permissions granted above are perpetual and will not be
> + * revoked by the Internet Society or its successors or assignees.
> + *
> + * This document and the information contained herein is provided on an
> + * "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
> + * TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
> + * BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
> + * HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> +
> + */
> +
> +/*
> + * This file contains structures and constants for RFC 3549 (Netlink)
> + * protocol. Some values have been taken from Linux implementation.
> + */
> +
> +#ifndef _NETLINK_NETLINK_H_
> +#define _NETLINK_NETLINK_H_
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +
> +struct sockaddr_nl {
> + uint8_t nl_len; /* sizeof(sockaddr_nl) */
> + sa_family_t nl_family; /* netlink family */
> + uint16_t nl_pad; /* reserved, set to 0 */
> + uint32_t nl_pid; /* desired port ID, 0 for auto-select */
> + uint32_t nl_groups; /* multicast groups mask to bind to */
> +};
> +
> +#define SOL_NETLINK 270
> +
> +/* Netlink socket options */
> +#define NETLINK_ADD_MEMBERSHIP 1 /* Subscribe for the specified group notifications */
> +#define NETLINK_DROP_MEMBERSHIP 2 /* Unsubscribe from the specified group */
> +#define NETLINK_PKTINFO 3 /* XXX: not supported */
> +#define NETLINK_BROADCAST_ERROR 4 /* XXX: not supported */
> +#define NETLINK_NO_ENOBUFS 5 /* XXX: not supported */
> +#define NETLINK_RX_RING 6 /* XXX: not supported */
> +#define NETLINK_TX_RING 7 /* XXX: not supported */
> +#define NETLINK_LISTEN_ALL_NSID 8 /* XXX: not supported */
> +
> +#define NETLINK_LIST_MEMBERSHIPS 9
> +#define NETLINK_CAP_ACK 10 /* Send only original message header in the reply */
> +#define NETLINK_EXT_ACK 11 /* Ack support for receiving additional TLVs in ack */
> +#define NETLINK_GET_STRICT_CHK 12 /* Strict header checking */
> +
> +
> +/*
> + * RFC 3549, 2.3.2 Netlink Message Header
> + */
> +struct nlmsghdr {
> + uint32_t nlmsg_len; /* Length of message including header */
> + uint16_t nlmsg_type; /* Message type identifier */
> + uint16_t nlmsg_flags; /* Flags (NLM_F_) */
> + uint32_t nlmsg_seq; /* Sequence number */
> + uint32_t nlmsg_pid; /* Sending process port ID */
> +};
> +
> +/*
> + * RFC 3549, 2.3.2 standard flag bits (nlmsg_flags)
> + */
> +#define NLM_F_REQUEST 0x01 /* Indicates request to kernel */
> +#define NLM_F_MULTI 0x02 /* Message is part of a group terminated by NLMSG_DONE msg */
> +#define NLM_F_ACK 0x04 /* Reply with ack message containing resulting error code */
> +#define NLM_F_ECHO 0x08 /* (not supported) Echo this request back */
> +#define NLM_F_DUMP_INTR 0x10 /* Dump was inconsistent due to sequence change */
> +#define NLM_F_DUMP_FILTERED 0x20 /* Dump was filtered as requested */
> +
> +/*
> + * RFC 3549, 2.3.2 Additional flag bits for GET requests
> + */
> +#define NLM_F_ROOT 0x100 /* Return the complete table */
> +#define NLM_F_MATCH 0x200 /* Return all entries matching criteria */
> +#define NLM_F_ATOMIC 0x400 /* Return an atomic snapshot (ignored) */
> +#define NLM_F_DUMP (NLM_F_ROOT | NLM_F_MATCH)
> +
> +/*
> + * RFC 3549, 2.3.2 Additional flag bits for NEW requests
> + */
> +#define NLM_F_REPLACE 0x100 /* Replace existing matching config object */
> +#define NLM_F_EXCL 0x200 /* Don't replace the object if exists */
> +#define NLM_F_CREATE 0x400 /* Create if it does not exist */
> +#define NLM_F_APPEND 0x800 /* Add to end of list */
> +
> +/* Modifiers to DELETE requests */
> +#define NLM_F_NONREC 0x100 /* Do not delete recursively */
> +
> +/* Flags for ACK message */
> +#define NLM_F_CAPPED 0x100 /* request was capped */
> +#define NLM_F_ACK_TLVS 0x200 /* extended ACK TLVs were included */
> +
> +/*
> + * RFC 3549, 2.3.2 standard message types (nlmsg_type).
> + */
> +#define NLMSG_NOOP 0x1 /* Message is ignored. */
> +#define NLMSG_ERROR 0x2 /* reply error code reporting */
> +#define NLMSG_DONE 0x3 /* Message terminates a multipart message. */
> +#define NLMSG_OVERRUN 0x4 /* overrun detected, data is lost */
> +
> +#define NLMSG_MIN_TYPE 0x10 /* < 0x10: reserved control messages */
> +
> +/*
> + * Definition of numbers assigned to the netlink subsystems.
> + */
> +#define NETLINK_ROUTE 0 /* Routing/device hook */
> +#define NETLINK_UNUSED 1 /* not supported */
> +#define NETLINK_USERSOCK 2 /* not supported */
> +#define NETLINK_FIREWALL 3 /* not supported */
> +#define NETLINK_SOCK_DIAG 4 /* not supported */
> +#define NETLINK_NFLOG 5 /* not supported */
> +#define NETLINK_XFRM 6 /* (not supported) PF_SETKEY */
> +#define NETLINK_SELINUX 7 /* not supported */
> +#define NETLINK_ISCSI 8 /* not supported */
> +#define NETLINK_AUDIT 9 /* not supported */
> +#define NETLINK_FIB_LOOKUP 10 /* not supported */
> +#define NETLINK_CONNECTOR 11 /* not supported */
> +#define NETLINK_NETFILTER 12 /* not supported */
> +#define NETLINK_IP6_FW 13 /* not supported */
> +#define NETLINK_DNRTMSG 14 /* not supported */
> +#define NETLINK_KOBJECT_UEVENT 15 /* not supported */
> +#define NETLINK_GENERIC 16 /* Generic netlink (dynamic families) */
> +
So, really fun thing here: we also have `#define NETLINK_GENERIC 0` in
sys/net/if_mib.h. (That's exposed to userspace and used there, so we
can't just change it.)
That leads to much fun if we decide to, say, include the
netlink_generic header from other headers so that we can define
messages that contain the genlmsghdr struct: both NETLINK_GENERIC
definitions (0 and 16) can then end up in the same translation unit.
I ran into that experimenting with netlink for carp(4). I think I can
work around it by adding a separate ip_carp_nl.h header for the netlink
stuff, but sooner or later this is going to bite us.
Kristof