BGP Unnumbered on NixOS

This is a follow-up to yesterday’s investigation into broadcast OSPF on NixOS.

A year ago I was researching BGP unnumbered on Debian - and now that I’ve built up enough infrastructure to rapidly deploy and configure NixOS boxes, it’s time to revisit it on NixOS.


BGP unnumbered remains, in my opinion, the holy grail of resilient, high-performance host networking. We get true multipathing and fast convergence if a cable is pulled - without dropping TCP connections - all without L2 ‘hacks’ such as floating IPs, STP, LACP, or MLAG. ‘Unnumbered’ means participating interfaces don’t need IP addresses; hosts simply have /32 loopback addresses.

Why does all this matter? Every line of configuration costs something. With BGP unnumbered, we get a lot of advanced functionality for extremely minimal configuration.


In this lab, I’m deploying two nodes with two direct links between them. Each node also has a single link to a gateway router.

Here’s the inventory I’m using with nixa:

routers:
  hosts:
    10.0.0.1:
      loopback: 10.0.0.1
      as_number: 64790
      bgp_interfaces:
        - enp4s0
        - enp0s20f0
        - enp0s20f1
        - enp0s20f2
        - enp0s20f3
    10.0.0.2:
      loopback: 10.0.0.2
      as_number: 64791
      bgp_interfaces:
        - enp4s0
        - enp0s20f0
        - enp0s20f1
        - enp0s20f2
        - enp0s20f3
  templates:
    - bgp.nix
  nix-channel: nixos-24.11

And here are the relevant bits of the bgp.nix template:

  networking = {
    hostName = "{{attrs.hostname}}";
    nameservers = [ "1.1.1.1" ];
    firewall.enable = false;
    dhcpcd.enable = false;
    interfaces.lo.ipv4.addresses = [{
      address = "{{hostvars["loopback"]}}";
      prefixLength = 32;
    }];
{% for i in hostvars["bgp_interfaces"] %}
    interfaces.{{i}}.useDHCP = false;
{% endfor %}
  };

  boot.kernel.sysctl = {
    "net.ipv4.conf.all.forwarding" = 1;
    "net.ipv4.fib_multipath_hash_policy" = 1;
    "net.ipv4.fib_multipath_use_neigh" = 1;
  };

  services.frr = {
    bgpd.enable = true;
    config = ''
      log syslog
      debug bgp
      frr defaults datacenter

      router bgp {{hostvars["as_number"]}}
        bgp router-id {{hostvars["loopback"]}}
        bgp fast-convergence
        bgp bestpath compare-routerid
        bgp bestpath as-path multipath-relax
{% for i in hostvars["bgp_interfaces"] %}
        neighbor {{i}} interface remote-as external
{% endfor %}
        address-family ipv4 unicast
          redistribute connected
    '';
  };

enp0s20f2 and enp0s20f3 on each node are directly connected to the other node.

By the way, the interfaces.{{i}}.useDHCP = false; bit is the trick to get interfaces to come ‘up’ with no IP in NixOS.
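To make the two {% for %} loops in the template concrete, here is a small Python sketch of the lines they expand to for one host. This is an illustration only, not how nixa renders templates (that is not shown in this post); render_interface_lines and render_neighbor_lines are hypothetical helper names, and the interface list is copied from the inventory above.

```python
# Hypothetical helpers that mimic the two template loops in bgp.nix.
# Illustration only: this is not nixa's actual rendering code.

BGP_INTERFACES = ["enp4s0", "enp0s20f0", "enp0s20f1", "enp0s20f2", "enp0s20f3"]


def render_interface_lines(interfaces):
    """One NixOS option per interface (the 'up with no IP' trick from the post)."""
    return [f"interfaces.{name}.useDHCP = false;" for name in interfaces]


def render_neighbor_lines(interfaces):
    """One FRR unnumbered eBGP neighbor statement per interface."""
    return [f"neighbor {name} interface remote-as external" for name in interfaces]


if __name__ == "__main__":
    for line in render_interface_lines(BGP_INTERFACES):
        print(line)
    for line in render_neighbor_lines(BGP_INTERFACES):
        print(line)
```

The point is that the per-host configuration grows by one short line per interface in each loop, and nothing in it names a peer address.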


With all that in place, vtysh will show us routes to the other nodes:

vtysh <<<'show ip route'
...
B>* 10.0.0.2/32 [20/0] via fe80::260:e0ff:fe8a:2ca3, enp0s20f2, weight 1, 00:26:17
  *                    via fe80::260:e0ff:fe8a:2ca4, enp0s20f3, weight 1, 00:26:17
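
The fe80:: next hops are IPv6 link-local addresses: with unnumbered peering, the session runs over each interface's link-local address and the IPv4 routes are installed with IPv6 next hops. The addresses above carry the ff:fe pattern of EUI-64 addresses derived from the NIC's MAC. Assuming that is how they were generated here, the following sketch shows the derivation. The MAC is inferred from the first next hop in the output, not taken from the hardware, and mac_to_link_local is a hypothetical helper name.

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the EUI-64 style IPv6 link-local address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to get 8 bytes.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return str(ipaddress.IPv6Address(prefix + interface_id))


if __name__ == "__main__":
    # Assumed MAC, inferred from the next hop shown in the vtysh output.
    print(mac_to_link_local("00:60:e0:8a:2c:a3"))
```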

And ip r will show the ECMP routes installed in the kernel:

[root@5cbb4880-da5b-5fb6-b685-4acb5ed54f34:~]# ip r
default nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.0 nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.1 nhid 49 proto bgp metric 20
	nexthop via inet6 fe80::260:e0ff:fe8a:2e06 dev enp0s20f3 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e05 dev enp0s20f2 weight 1
172.30.190.0/24 nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
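
For checking this on more than a couple of hosts, a rough Python sketch like the one below can pull the outgoing devices per destination out of the text output. It assumes the layout shown above (a route line, optionally followed by indented nexthop lines); nexthop_devices is a hypothetical helper, and SAMPLE is the output from this post.

```python
import re
from collections import defaultdict

SAMPLE = """\
default nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.0 nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.1 nhid 49 proto bgp metric 20
\tnexthop via inet6 fe80::260:e0ff:fe8a:2e06 dev enp0s20f3 weight 1
\tnexthop via inet6 fe80::260:e0ff:fe8a:2e05 dev enp0s20f2 weight 1
172.30.190.0/24 nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
"""


def nexthop_devices(ip_route_text):
    """Map each destination to the list of devices its next hops use."""
    routes = defaultdict(list)
    current = None
    for line in ip_route_text.splitlines():
        if not line.strip():
            continue
        dev = re.search(r"\bdev (\S+)", line)
        if line[0].isspace():
            # Indented line: one next hop of the preceding multipath route.
            if current is not None and dev:
                routes[current].append(dev.group(1))
        else:
            current = line.split()[0]
            routes[current]  # create the entry even if no dev is on this line
            if dev:
                routes[current].append(dev.group(1))
    return dict(routes)


if __name__ == "__main__":
    for dst, devs in nexthop_devices(SAMPLE).items():
        kind = "ECMP" if len(devs) > 1 else "single path"
        print(f"{dst}: {kind} via {', '.join(devs)}")
```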

And here’s iperf showing almost 2 Gbit/s across four parallel streams:

[root@50ae2383-89c0-547c-9f56-5df657915a83:~]# iperf -c 10.0.0.2 -P4 -t2
Connecting to host 10.0.0.2, port 5201
[  5] local 10.0.0.1 port 35648 connected to 10.0.0.2 port 5201
[  7] local 10.0.0.1 port 35662 connected to 10.0.0.2 port 5201
[  9] local 10.0.0.1 port 35678 connected to 10.0.0.2 port 5201
[ 11] local 10.0.0.1 port 35694 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  29.0 MBytes   243 Mbits/sec    0    222 KBytes
[  7]   0.00-1.00   sec  57.4 MBytes   481 Mbits/sec    0    284 KBytes
[  9]   0.00-1.00   sec   114 MBytes   959 Mbits/sec    0    399 KBytes
[ 11]   0.00-1.00   sec  29.2 MBytes   245 Mbits/sec    0    222 KBytes
[SUM]   0.00-1.00   sec   230 MBytes  1.93 Gbits/sec    0
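
The uneven per-stream rates are what per-flow hashing looks like. With fib_multipath_hash_policy set to 1, the kernel picks a next hop by hashing the L4 5-tuple, so all packets of one TCP connection stay on one link and a single stream cannot exceed one link's speed. That is why -P4 is needed to see both links in use. The sketch below is a toy model only: it uses CRC32 as a stand-in for the kernel's real hash, pick_link is a hypothetical name, and the source ports are the ones from the iperf output above.

```python
import zlib

LINKS = ["enp0s20f2", "enp0s20f3"]


def pick_link(src_ip, dst_ip, src_port, dst_port, proto="tcp", links=LINKS):
    """Toy per-flow ECMP: hash the 5-tuple, then index into the link list."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    # CRC32 is only a placeholder; the kernel uses a different hash.
    return links[zlib.crc32(key) % len(links)]


if __name__ == "__main__":
    for port in (35648, 35662, 35678, 35694):
        print(port, pick_link("10.0.0.1", "10.0.0.2", port, 5201))
```

Because the choice depends only on the 5-tuple, the same flow always maps to the same link, while different source ports may or may not spread evenly across the links.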

If I plug in more links, we get more routes - no config change:

[root@5cbb4880-da5b-5fb6-b685-4acb5ed54f34:~]# ip r show 10.0.0.1
10.0.0.1 nhid 60 proto bgp metric 20
	nexthop via inet6 fe80::260:e0ff:fe8a:2e06 dev enp0s20f3 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e05 dev enp0s20f2 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e04 dev enp0s20f1 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e02 dev enp4s0 weight 1

On the gateway, the config is just slightly different:

  services.frr = {
    bgpd.enable = true;
    config = ''
      log syslog
      debug bgp
      frr defaults datacenter

      router bgp {{hostvars["as_number"]}}
        bgp router-id {{hostvars["loopback"]}}
        bgp fast-convergence
        bgp bestpath compare-routerid
        bgp bestpath as-path multipath-relax
{% for i in hostvars["bgp_interfaces"] %}
        neighbor {{i}} interface remote-as external
{% endfor %}
        network 0.0.0.0/0
        address-family ipv4 unicast
          redistribute connected
    '';
  };

network 0.0.0.0/0 is added to inject the default route into BGP.

I also had to add internalIPs to my NAT config to successfully translate traffic sourced from 10.0.0.1 and 10.0.0.2:

  networking.nat = {
    enable = true;
    externalInterface = "enp2s0f0";
    internalInterfaces = [ "br0" ];
    internalIPs = [ "10.0.0.0/24" ];
  };
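
As a quick sanity check on the prefix chosen for internalIPs, this sketch uses Python's ipaddress module to confirm that both loopbacks fall inside 10.0.0.0/24. It only tests prefix membership and says nothing about how NixOS builds the actual NAT rules; covered_by_internal_ips is a hypothetical helper name.

```python
import ipaddress

# Prefix from the internalIPs setting above.
INTERNAL = ipaddress.ip_network("10.0.0.0/24")


def covered_by_internal_ips(source, network=INTERNAL):
    """Return True if the source address falls inside the internal prefix."""
    return ipaddress.ip_address(source) in network


if __name__ == "__main__":
    for addr in ("10.0.0.1", "10.0.0.2"):
        print(addr, covered_by_internal_ips(addr))
```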

Nathan Hensel


2024-12-23