Re: git: 7e5bf68495cc - main - netlink: add netlink support

From: Kristof Provost <kp_at_FreeBSD.org>
Date: Thu, 09 Mar 2023 15:38:14 UTC
On 1 Oct 2022, at 16:19, Alexander V. Chernikov wrote:
> The branch main has been updated by melifaro:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=7e5bf68495cc0a8c9793a338a8a02009a7f6dbb6
>
> commit 7e5bf68495cc0a8c9793a338a8a02009a7f6dbb6
> Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
> AuthorDate: 2022-01-20 21:39:21 +0000
> Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
> CommitDate: 2022-10-01 14:15:35 +0000
>
>     netlink: add netlink support
>
>     Netlink is a communication protocol currently used in the Linux kernel to modify,
>      read and subscribe to nearly all networking state. Interfaces, addresses, routes,
>      firewall, fibs, vnets, etc. are controlled via netlink.
>     It is an async, TLV-based protocol, providing 1-1 and 1-many communications.
>
>     The current implementation supports a subset of the NETLINK_ROUTE
>     family. To be more specific, the following is supported:
>     * Dumps:
>      - routes
>      - nexthops / nexthop groups
>      - interfaces
>      - interface addresses
>      - neighbors (arp/ndp)
>     * Notifications:
>      - interface arrival/departure
>      - interface address arrival/departure
>      - route addition/deletion
>     * Modifications:
>      - adding/deleting routes
>      - adding/deleting nexthops/nexthop groups
>      - adding/deleting neighbors
>      - adding/deleting interfaces (basic support only)
>     * Rtsock interaction
>      - route events are bridged both ways
>
>     The implementation also supports the NETLINK_GENERIC family framework.
>
>     Implementation notes:
>     Netlink is implemented as a loadable/unloadable kernel module,
>      not touching many kernel parts.
>     Each netlink socket uses a dedicated taskqueue to support async operations
>      that can sleep, such as interface creation. All message processing is
>      performed within these taskqueues.
>
>     Compatibility:
>     Most of the Netlink data models specified above map to FreeBSD concepts
>      nicely. An unmodified ip(8) binary works correctly with
>     interfaces, addresses, routes, nexthops and nexthop groups. Some
>     software, such as net/bird, requires header-only modifications to compile
>     and work with FreeBSD netlink.
>
>     Reviewed by:    imp
>     Differential Revision: https://reviews.freebsd.org/D36002
>     MFC after:      2 months
> ---
>  etc/mtree/BSD.include.dist           |    4 +
>  sys/modules/Makefile                 |    1 +
>  sys/modules/netlink/Makefile         |   17 +
>  sys/net/route.c                      |   11 +
>  sys/net/route/route_ctl.h            |    7 +
>  sys/net/rtsock.c                     |   42 ++
>  sys/netlink/netlink.h                |  257 +++++++++
>  sys/netlink/netlink_ctl.h            |  102 ++++
>  sys/netlink/netlink_debug.h          |   82 +++
>  sys/netlink/netlink_domain.c         |  689 +++++++++++++++++++++++
>  sys/netlink/netlink_generic.c        |  472 ++++++++++++++++
>  sys/netlink/netlink_generic.h        |  112 ++++
>  sys/netlink/netlink_io.c             |  528 ++++++++++++++++++
>  sys/netlink/netlink_linux.h          |   54 ++
>  sys/netlink/netlink_message_parser.c |  472 ++++++++++++++++
>  sys/netlink/netlink_message_parser.h |  270 +++++++++
>  sys/netlink/netlink_message_writer.c |  686 +++++++++++++++++++++++
>  sys/netlink/netlink_message_writer.h |  250 +++++++++
>  sys/netlink/netlink_module.c         |  228 ++++++++
>  sys/netlink/netlink_route.c          |  135 +++++
>  sys/netlink/netlink_route.h          |   43 ++
>  sys/netlink/netlink_var.h            |  142 +++++
>  sys/netlink/route/common.h           |  213 ++++++++
>  sys/netlink/route/iface.c            |  857 +++++++++++++++++++++++++++++
>  sys/netlink/route/iface_drivers.c    |  165 ++++++
>  sys/netlink/route/ifaddrs.h          |   90 +++
>  sys/netlink/route/interface.h        |  245 +++++++++
>  sys/netlink/route/neigh.c            |  571 +++++++++++++++++++
>  sys/netlink/route/neigh.h            |  105 ++++
>  sys/netlink/route/nexthop.c          | 1000 ++++++++++++++++++++++++++++++++++
>  sys/netlink/route/nexthop.h          |  102 ++++
>  sys/netlink/route/route.c            |  972 +++++++++++++++++++++++++++++++++
>  sys/netlink/route/route.h            |  366 +++++++++++++
>  sys/netlink/route/route_var.h        |  101 ++++
>  34 files changed, 9391 insertions(+)
>
> diff --git a/sys/netlink/netlink.h b/sys/netlink/netlink.h
> new file mode 100644
> index 000000000000..6a68dcec1382
> --- /dev/null
> +++ b/sys/netlink/netlink.h
> @@ -0,0 +1,257 @@
> +/*-
> + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> + *
> + * Copyright (c) 2021 Ng Peng Nam Sean
> + * Copyright (c) 2022 Alexander V. Chernikov <melifaro@FreeBSD.org>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Copyright (C) The Internet Society (2003).  All Rights Reserved.
> + *
> + * This document and translations of it may be copied and furnished to
> + * others, and derivative works that comment on or otherwise explain it
> + * or assist in its implementation may be prepared, copied, published
> + * and distributed, in whole or in part, without restriction of any
> + * kind, provided that the above copyright notice and this paragraph are
> + * included on all such copies and derivative works.  However, this
> + * document itself may not be modified in any way, such as by removing
> + * the copyright notice or references to the Internet Society or other
> + * Internet organizations, except as needed for the purpose of
> + * developing Internet standards in which case the procedures for
> + * copyrights defined in the Internet Standards process must be
> + * followed, or as required to translate it into languages other than
> + * English.
> + *
> + * The limited permissions granted above are perpetual and will not be
> + * revoked by the Internet Society or its successors or assignees.
> + *
> + * This document and the information contained herein is provided on an
> + * "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
> + * TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
> + * BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
> + * HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> +
> + */
> +
> +/*
> + * This file contains structures and constants for RFC 3549 (Netlink)
> + * protocol. Some values have been taken from Linux implementation.
> + */
> +
> +#ifndef _NETLINK_NETLINK_H_
> +#define _NETLINK_NETLINK_H_
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +
> +struct sockaddr_nl {
> +	uint8_t		nl_len;		/* sizeof(sockaddr_nl) */
> +	sa_family_t	nl_family;	/* netlink family */
> +	uint16_t	nl_pad;		/* reserved, set to 0 */
> +	uint32_t	nl_pid;		/* desired port ID, 0 for auto-select */
> +	uint32_t	nl_groups;	/* multicast groups mask to bind to */
> +};
> +
> +#define	SOL_NETLINK			270
> +
> +/* Netlink socket options */
> +#define NETLINK_ADD_MEMBERSHIP		1 /* Subscribe for the specified group notifications */
> +#define NETLINK_DROP_MEMBERSHIP		2 /* Unsubscribe from the specified group */
> +#define NETLINK_PKTINFO			3 /* XXX: not supported */
> +#define NETLINK_BROADCAST_ERROR		4 /* XXX: not supported */
> +#define NETLINK_NO_ENOBUFS		5 /* XXX: not supported */
> +#define NETLINK_RX_RING			6 /* XXX: not supported */
> +#define NETLINK_TX_RING			7 /* XXX: not supported */
> +#define NETLINK_LISTEN_ALL_NSID		8 /* XXX: not supported */
> +
> +#define NETLINK_LIST_MEMBERSHIPS	9
> +#define NETLINK_CAP_ACK			10 /* Send only original message header in the reply */
> +#define NETLINK_EXT_ACK			11 /* Ack support for receiving additional TLVs in ack */
> +#define NETLINK_GET_STRICT_CHK		12 /* Strict header checking */
> +
> +
> +/*
> + * RFC 3549, 2.3.2 Netlink Message Header
> + */
> +struct nlmsghdr {
> +	uint32_t nlmsg_len;   /* Length of message including header */
> +	uint16_t nlmsg_type;  /* Message type identifier */
> +	uint16_t nlmsg_flags; /* Flags (NLM_F_) */
> +	uint32_t nlmsg_seq;   /* Sequence number */
> +	uint32_t nlmsg_pid;   /* Sending process port ID */
> +};
> +
> +/*
> + * RFC 3549, 2.3.2 standard flag bits (nlmsg_flags)
> + */
> +#define NLM_F_REQUEST		0x01	/* Indicates request to kernel */
> +#define NLM_F_MULTI		0x02	/* Message is part of a group terminated by NLMSG_DONE msg */
> +#define NLM_F_ACK		0x04	/* Reply with ack message containing resulting error code */
> +#define NLM_F_ECHO		0x08	/* (not supported) Echo this request back */
> +#define NLM_F_DUMP_INTR		0x10	/* Dump was inconsistent due to sequence change */
> +#define NLM_F_DUMP_FILTERED	0x20	/* Dump was filtered as requested */
> +
> +/*
> + * RFC 3549, 2.3.2 Additional flag bits for GET requests
> + */
> +#define NLM_F_ROOT		0x100	/* Return the complete table */
> +#define NLM_F_MATCH		0x200	/* Return all entries matching criteria */
> +#define NLM_F_ATOMIC		0x400	/* Return an atomic snapshot (ignored) */
> +#define NLM_F_DUMP		(NLM_F_ROOT | NLM_F_MATCH)
> +
> +/*
> + * RFC 3549, 2.3.2 Additional flag bits for NEW requests
> + */
> +#define NLM_F_REPLACE		0x100	/* Replace existing matching config object */
> +#define NLM_F_EXCL		0x200	/* Don't replace the object if it exists */
> +#define NLM_F_CREATE		0x400	/* Create if it does not exist */
> +#define NLM_F_APPEND		0x800	/* Add to end of list */
> +
> +/* Modifiers to DELETE requests */
> +#define NLM_F_NONREC		0x100	/* Do not delete recursively */
> +
> +/* Flags for ACK message */
> +#define NLM_F_CAPPED		0x100	/* request was capped */
> +#define NLM_F_ACK_TLVS		0x200	/* extended ACK TLVs were included */
> +
> +/*
> + * RFC 3549, 2.3.2 standard message types (nlmsg_type).
> + */
> +#define NLMSG_NOOP		0x1	/* Message is ignored. */
> +#define NLMSG_ERROR		0x2	/* reply error code reporting */
> +#define NLMSG_DONE		0x3	/* Message terminates a multipart message. */
> +#define NLMSG_OVERRUN		0x4	/* overrun detected, data is lost */
> +
> +#define NLMSG_MIN_TYPE		0x10	/* < 0x10: reserved control messages */
> +
> +/*
> + * Definition of numbers assigned to the netlink subsystems.
> + */
> +#define NETLINK_ROUTE		0	/* Routing/device hook */
> +#define NETLINK_UNUSED		1	/* not supported */
> +#define NETLINK_USERSOCK	2	/* not supported */
> +#define NETLINK_FIREWALL	3	/* not supported */
> +#define NETLINK_SOCK_DIAG	4	/* not supported */
> +#define NETLINK_NFLOG		5	/* not supported */
> +#define NETLINK_XFRM		6	/* (not supported) PF_SETKEY */
> +#define NETLINK_SELINUX		7	/* not supported */
> +#define NETLINK_ISCSI		8	/* not supported */
> +#define NETLINK_AUDIT		9	/* not supported */
> +#define NETLINK_FIB_LOOKUP	10	/* not supported */
> +#define NETLINK_CONNECTOR	11	/* not supported */
> +#define NETLINK_NETFILTER	12	/* not supported */
> +#define NETLINK_IP6_FW		13	/* not supported  */
> +#define NETLINK_DNRTMSG		14	/* not supported */
> +#define NETLINK_KOBJECT_UEVENT	15	/* not supported */
> +#define NETLINK_GENERIC		16	/* Generic netlink (dynamic families) */
> +
So, really fun thing here, we also have `#define NETLINK_GENERIC 0` in 
sys/net/if_mib.h. (And that’s exposed to userspace, and used there, so 
we can’t just change that.)
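To make the hazard concrete, here is a small self-contained C sketch. It does not include the real FreeBSD headers: IFMIB_NETLINK_GENERIC, NL_NETLINK_ROUTE and NL_NETLINK_GENERIC are hypothetical renamed stand-ins for NETLINK_GENERIC from sys/net/if_mib.h (0) and for NETLINK_ROUTE/NETLINK_GENERIC from sys/netlink/netlink.h (0 and 16), and nl_proto_name() is an illustrative helper, not an existing API. When both real headers are included, the compiler warns about the redefinition and whichever definition comes last wins; if that is the if_mib.h one, a program asking for a generic netlink socket actually asks for protocol 0.

```c
#include <assert.h>	/* static_assert (C11) */

/*
 * Hypothetical stand-ins, renamed so both values can coexist in one file:
 *   IFMIB_NETLINK_GENERIC - NETLINK_GENERIC as defined in sys/net/if_mib.h
 *   NL_NETLINK_ROUTE      - NETLINK_ROUTE from sys/netlink/netlink.h
 *   NL_NETLINK_GENERIC    - NETLINK_GENERIC from sys/netlink/netlink.h
 */
#define	IFMIB_NETLINK_GENERIC	0
#define	NL_NETLINK_ROUTE	0
#define	NL_NETLINK_GENERIC	16

/* The if_mib.h value is numerically the NETLINK_ROUTE protocol number. */
static_assert(IFMIB_NETLINK_GENERIC == NL_NETLINK_ROUTE,
    "if_mib.h NETLINK_GENERIC aliases NETLINK_ROUTE");

/* Illustrative helper: name the netlink protocol a socket would really use. */
static const char *
nl_proto_name(int proto)
{
	switch (proto) {
	case NL_NETLINK_ROUTE:
		return ("NETLINK_ROUTE");
	case NL_NETLINK_GENERIC:
		return ("NETLINK_GENERIC");
	default:
		return ("unsupported");
	}
}
```

With the if_mib.h value in scope, socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) silently opens a NETLINK_ROUTE socket; in the sketch, nl_proto_name(IFMIB_NETLINK_GENERIC) returns "NETLINK_ROUTE".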

That leads to much fun if we decide to do something like include the 
netlink_generic header in other headers, so we can define messages that 
contain the genlmsghdr struct.
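Why the header cannot simply be forward-declared in that case: a message layout that embeds struct genlmsghdr by value needs the complete type wherever it is declared. The sketch below copies struct nlmsghdr and the NLM_F_REQUEST/NLM_F_ACK values from the patch quoted above; the genlmsghdr layout (cmd, version, reserved) mirrors Linux and is an assumption here, and struct carp_nl_req, carp_nl_req_init(), the family id and the version number are hypothetical.

```c
#include <assert.h>	/* assert, static_assert */
#include <stddef.h>	/* offsetof */
#include <stdint.h>
#include <string.h>	/* memset */

/* Copied from sys/netlink/netlink.h as quoted in the patch above. */
struct nlmsghdr {
	uint32_t nlmsg_len;	/* Length of message including header */
	uint16_t nlmsg_type;	/* Message type identifier */
	uint16_t nlmsg_flags;	/* Flags (NLM_F_) */
	uint32_t nlmsg_seq;	/* Sequence number */
	uint32_t nlmsg_pid;	/* Sending process port ID */
};
#define	NLM_F_REQUEST	0x01
#define	NLM_F_ACK	0x04

/* Assumed layout (mirrors Linux); normally from netlink_generic.h. */
struct genlmsghdr {
	uint8_t		cmd;
	uint8_t		version;
	uint16_t	reserved;
};

/*
 * Hypothetical carp request layout. Because genlmsghdr is embedded by
 * value, any header declaring this struct needs the complete type.
 */
struct carp_nl_req {
	struct nlmsghdr		hdr;
	struct genlmsghdr	ghdr;
	/* netlink attributes (TLVs) would follow */
};

/* The generic header starts right after nlmsghdr, with no padding. */
static_assert(offsetof(struct carp_nl_req, ghdr) == sizeof(struct nlmsghdr),
    "genlmsghdr must directly follow nlmsghdr");

/* Fill in a request; family_id is the dynamically assigned genl family. */
static void
carp_nl_req_init(struct carp_nl_req *req, uint16_t family_id, uint8_t cmd,
    uint32_t seq)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));	/* nlmsg_pid and reserved stay 0 */
	req->hdr.nlmsg_len = sizeof(*req);
	req->hdr.nlmsg_type = family_id;
	req->hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->hdr.nlmsg_seq = seq;
	req->ghdr.cmd = cmd;
	req->ghdr.version = 1;	/* hypothetical protocol version */
}
```

Any public header that wants to ship a struct like carp_nl_req therefore has to pull in the genl header, and with it netlink.h's NETLINK_GENERIC.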

I ran into that while experimenting with netlink for carp(4). I think I can 
work around it by adding a separate ip_carp_nl.h header for the netlink 
stuff, but sooner or later this is going to bite us.

Kristof