This is a follow-up to my investigation into broadcast OSPF on NixOS yesterday.
A year ago I was researching BGP unnumbered on Debian, and now that I’ve built up enough infrastructure to rapidly deploy and configure NixOS boxes, it’s time to revisit it on NixOS.
BGP unnumbered remains, in my opinion, the holy grail of resilient, high-performance host networking. We get true multipathing and fast convergence if a cable is pulled, without dropping TCP connections, and all without L2 ‘hacks’ such as floating IPs, STP, LACP, or MLAG. ‘Unnumbered’ means participating interfaces don’t need IP addresses: peers find each other over IPv6 link-local addresses, and hosts simply have /32 loopback addresses.
Why does all this matter? Every line of configuration costs something. With BGP unnumbered, we get a lot of advanced functionality for very little configuration.
In this lab, I’m deploying two nodes, each with two direct links to the other node and a single link to a gateway router.
Here’s the inventory I’m using with nixa:
routers:
  hosts:
    10.0.0.1:
      loopback: 10.0.0.1
      as_number: 64790
      bgp_interfaces:
        - enp4s0
        - enp0s20f0
        - enp0s20f1
        - enp0s20f2
        - enp0s20f3
    10.0.0.2:
      loopback: 10.0.0.2
      as_number: 64791
      bgp_interfaces:
        - enp4s0
        - enp0s20f0
        - enp0s20f1
        - enp0s20f2
        - enp0s20f3
  templates:
    - bgp.nix
  nix-channel: nixos-24.11
And here are the relevant bits of the bgp.nix template:
networking = {
  hostName = "{{attrs.hostname}}";
  nameservers = [ "1.1.1.1" ];
  firewall.enable = false;
  dhcpcd.enable = false;
  # Each node's only IPv4 address: a /32 on the loopback.
  interfaces.lo.ipv4.addresses = [{
    address = "{{hostvars["loopback"]}}";
    prefixLength = 32;
  }];
  {% for i in hostvars["bgp_interfaces"] %}
  interfaces.{{i}}.useDHCP = false;
  {% endfor %}
};
boot.kernel.sysctl = {
  "net.ipv4.conf.all.forwarding" = 1;       # route between links
  "net.ipv4.fib_multipath_hash_policy" = 1; # hash multipath traffic on the L4 5-tuple
  "net.ipv4.fib_multipath_use_neigh" = 1;   # avoid next hops whose neighbor entry has failed
};
services.frr = {
  bgpd.enable = true;
  config = ''
    log syslog
    debug bgp
    frr defaults datacenter
    router bgp {{hostvars["as_number"]}}
     bgp router-id {{hostvars["loopback"]}}
     bgp fast-convergence
     bgp bestpath compare-routerid
     bgp bestpath as-path multipath-relax
     {% for i in hostvars["bgp_interfaces"] %}
     neighbor {{i}} interface remote-as external
     {% endfor %}
     address-family ipv4 unicast
      redistribute connected
  '';
};
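To make the ‘minimal configuration’ point concrete, here’s a rough Python sketch of what the FRR part of the template expands to for one host. It is not how nixa renders templates (the template uses Jinja-style syntax; this is plain string building), and INVENTORY is a hypothetical Python mirror of the YAML inventory above:

```python
# Hypothetical Python mirror of the YAML inventory; nixa itself is not involved here.
INVENTORY = {
    "10.0.0.1": {
        "loopback": "10.0.0.1",
        "as_number": 64790,
        "bgp_interfaces": ["enp4s0", "enp0s20f0", "enp0s20f1", "enp0s20f2", "enp0s20f3"],
    },
    "10.0.0.2": {
        "loopback": "10.0.0.2",
        "as_number": 64791,
        "bgp_interfaces": ["enp4s0", "enp0s20f0", "enp0s20f1", "enp0s20f2", "enp0s20f3"],
    },
}


def render_frr(hostvars: dict) -> str:
    """Expand the FRR section of the template for one host (plain string building)."""
    lines = [
        "log syslog",
        "debug bgp",
        "frr defaults datacenter",
        f"router bgp {hostvars['as_number']}",
        f" bgp router-id {hostvars['loopback']}",
        " bgp fast-convergence",
        " bgp bestpath compare-routerid",
        " bgp bestpath as-path multipath-relax",
    ]
    # One unnumbered neighbor per interface: no peer IPs and no per-peer ASNs.
    lines += [f" neighbor {i} interface remote-as external" for i in hostvars["bgp_interfaces"]]
    lines += [" address-family ipv4 unicast", "  redistribute connected"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_frr(INVENTORY["10.0.0.1"]))
```

The only per-host inputs are the ASN, the loopback and a list of interface names. Because of remote-as external, FRR accepts whatever eBGP ASN the peer presents, so neither side needs to know the other’s ASN.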
enp0s20f2 and enp0s20f3 on each node are directly connected to the other node.
By the way, the interfaces.{{i}}.useDHCP = false; line is the trick to get interfaces to come ‘up’ with no IP address in NixOS.
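With two equal-cost paths between the nodes, the sysctls in the template decide how traffic is spread across them. fib_multipath_hash_policy = 1 tells the kernel to hash on the L4 5-tuple (addresses, protocol, ports) rather than just the addresses, so separate TCP connections between the same two loopbacks can land on different links; fib_multipath_use_neigh = 1 makes it avoid next hops whose neighbor entry has failed. The Python sketch below models the selection. The hash function is a made-up stand-in (the kernel uses its own), and pick_nexthop assumes hash-threshold selection, where each next hop owns a slice of the hash space proportional to its weight:

```python
import hashlib
from typing import NamedTuple, Sequence, Tuple


class Flow(NamedTuple):
    src: str
    dst: str
    proto: int  # 6 = TCP
    sport: int
    dport: int


def flow_hash(flow: Flow, policy: int) -> int:
    """Stand-in 31-bit flow hash (NOT the kernel's).

    policy 0 mimics an L3 hash (addresses only); policy 1 an L4 hash (5-tuple).
    """
    if policy == 0:
        key = f"{flow.src}|{flow.dst}"
    elif policy == 1:
        key = f"{flow.src}|{flow.dst}|{flow.proto}|{flow.sport}|{flow.dport}"
    else:
        raise ValueError("only policies 0 and 1 are modelled here")
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") >> 1  # keep 31 bits


def pick_nexthop(h: int, nexthops: Sequence[Tuple[str, int]]) -> str:
    """Hash-threshold selection over (device, weight) pairs for a 31-bit hash h."""
    total = sum(weight for _, weight in nexthops)
    upper = 0
    for dev, weight in nexthops:
        upper += weight
        # Each next hop owns the hash values below its cumulative upper bound.
        if h < (1 << 31) * upper // total:
            return dev
    raise ValueError("hash out of range")


if __name__ == "__main__":
    nexthops = [("enp0s20f2", 1), ("enp0s20f3", 1)]
    # Source ports of the four iperf streams; 5201 is the iperf3 server port.
    for sport in (35648, 35662, 35678, 35694):
        flow = Flow("10.0.0.1", "10.0.0.2", 6, sport, 5201)
        l3 = pick_nexthop(flow_hash(flow, 0), nexthops)
        l4 = pick_nexthop(flow_hash(flow, 1), nexthops)
        print(sport, "L3:", l3, "L4:", l4)
```

With policy 0, every stream between 10.0.0.1 and 10.0.0.2 hashes to the same link no matter how many parallel connections are opened, which is why the L4 policy matters for a multi-stream iperf test.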
With all that in place, vtysh will show us routes to the other nodes:
vtysh <<<'show ip route'
...
B>* 10.0.0.2/32 [20/0] via fe80::260:e0ff:fe8a:2ca3, enp0s20f2, weight 1, 00:26:17
  *                    via fe80::260:e0ff:fe8a:2ca4, enp0s20f3, weight 1, 00:26:17
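Note that the next hops are IPv6 link-local addresses. This is what lets the interfaces stay unnumbered: FRR finds the peer’s link-local address on each interface and carries IPv4 routes with an IPv6 next hop (the extended next hop mechanism from RFC 5549). The ff:fe in the middle of those addresses suggests they are EUI-64 addresses derived from the neighbor’s MAC, which makes it easy to tell which physical port a route points at. A small Python sketch of that mapping, assuming EUI-64 generation (if a peer uses stable-privacy or random link-local addresses, the reverse mapping won’t apply):

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive an EUI-64 based IPv6 link-local address from a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    interface_id = int.from_bytes(bytes(eui64), "big")
    return str(ipaddress.IPv6Address((0xFE80 << 112) | interface_id))


def link_local_to_mac(addr: str) -> str:
    """Reverse the mapping; raises if the interface ID is not EUI-64 shaped."""
    iid = list(ipaddress.IPv6Address(addr).packed[8:])
    if iid[3:5] != [0xFF, 0xFE]:
        raise ValueError(f"{addr} does not look like an EUI-64 address")
    iid[0] ^= 0x02  # undo the universal/local bit flip
    return ":".join(f"{b:02x}" for b in iid[:3] + iid[5:])


if __name__ == "__main__":
    for nh in ("fe80::260:e0ff:fe8a:2ca3", "fe80::260:e0ff:fe8a:2ca4"):
        print(nh, "->", link_local_to_mac(nh))
```

For the two next hops above, this gives MACs ending in 2c:a3 and 2c:a4, consecutive addresses that are consistent with two ports on the same multi-port NIC on the other node.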
And ip r will show the ECMP routes installed in the kernel:
[root@5cbb4880-da5b-5fb6-b685-4acb5ed54f34:~]# ip r
default nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.0 nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.1 nhid 49 proto bgp metric 20
	nexthop via inet6 fe80::260:e0ff:fe8a:2e06 dev enp0s20f3 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e05 dev enp0s20f2 weight 1
172.30.190.0/24 nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
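When checking multipathing across several boxes, it helps to count next hops per prefix instead of eyeballing the output. A rough Python sketch that parses the plain-text ip r output shown above; it relies on multipath next hops being printed as indented nexthop lines under their route (ip -j route emits JSON, which would be a sturdier thing to parse in anything serious):

```python
import re
from typing import Dict, List, Optional

DEV_RE = re.compile(r"\bdev (\S+)")


def nexthops_by_prefix(ip_route_output: str) -> Dict[str, List[str]]:
    """Map each destination in `ip route` text output to its egress devices."""
    routes: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in ip_route_output.splitlines():
        if not line.strip():
            continue
        match = DEV_RE.search(line)
        if line[0].isspace():
            # Indented "nexthop via ... dev IFACE weight N" line of a multipath route.
            if current is not None and match:
                routes[current].append(match.group(1))
            continue
        current = line.split()[0]  # destination: "default", "10.0.0.1", "172.30.190.0/24", ...
        routes[current] = [match.group(1)] if match else []
    return routes


# Excerpt of the output above (nexthop lines are tab-indented, as ip prints them).
SAMPLE = """\
default nhid 32 via inet6 fe80::290:bff:fea5:e2d0 dev enp0s20f0 proto bgp metric 20
10.0.0.1 nhid 49 proto bgp metric 20
\tnexthop via inet6 fe80::260:e0ff:fe8a:2e06 dev enp0s20f3 weight 1
\tnexthop via inet6 fe80::260:e0ff:fe8a:2e05 dev enp0s20f2 weight 1
"""

if __name__ == "__main__":
    for prefix, devs in nexthops_by_prefix(SAMPLE).items():
        print(f"{prefix}: {len(devs)} next hop(s) via {', '.join(devs)}")
```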
And here’s iperf showing almost 2 Gbps:
[root@50ae2383-89c0-547c-9f56-5df657915a83:~]# iperf -c 10.0.0.2 -P4 -t2
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 35648 connected to 10.0.0.2 port 5201
[ 7] local 10.0.0.1 port 35662 connected to 10.0.0.2 port 5201
[ 9] local 10.0.0.1 port 35678 connected to 10.0.0.2 port 5201
[ 11] local 10.0.0.1 port 35694 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 29.0 MBytes 243 Mbits/sec 0 222 KBytes
[ 7] 0.00-1.00 sec 57.4 MBytes 481 Mbits/sec 0 284 KBytes
[ 9] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 399 KBytes
[ 11] 0.00-1.00 sec 29.2 MBytes 245 Mbits/sec 0 222 KBytes
[SUM] 0.00-1.00 sec 230 MBytes 1.93 Gbits/sec 0
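The per-stream numbers add up: 243 + 481 + 959 + 245 = 1928 Mbit/s, the 1.93 Gbit/s on the SUM line. They also hint at how the four streams were hashed, though iperf doesn’t report which link each stream used, so this is only an inference. Assuming the two direct links are 1 GbE (link speeds aren’t shown here), the most even way to split the streams into two groups is stream 9 on its own versus streams 5, 7 and 11, at 959 vs 969 Mbit/s, which looks like one link per group. A quick brute-force check in Python:

```python
from itertools import combinations
from typing import Dict, Tuple

# Per-stream bitrates in Mbit/s from the 0.00-1.00 sec interval above, keyed by iperf stream ID.
STREAMS: Dict[int, int] = {5: 243, 7: 481, 9: 959, 11: 245}


def most_even_split(rates: Dict[int, int]) -> Tuple[Tuple[int, ...], Tuple[int, ...], int]:
    """Brute-force the two-group split of streams with the smallest load difference."""
    ids = sorted(rates)
    best = None
    for k in range(1, len(ids)):
        for group in combinations(ids, k):
            other = tuple(i for i in ids if i not in group)
            diff = abs(sum(rates[i] for i in group) - sum(rates[i] for i in other))
            if best is None or diff < best[2]:
                best = (group, other, diff)
    if best is None:
        raise ValueError("need at least two streams")
    return best


if __name__ == "__main__":
    print("total Mbit/s:", sum(STREAMS.values()))
    print("most even split:", most_even_split(STREAMS))
```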
If I plug in more links, we get more next hops, with no config change:
[root@5cbb4880-da5b-5fb6-b685-4acb5ed54f34:~]# ip r show 10.0.0.1
10.0.0.1 nhid 60 proto bgp metric 20
	nexthop via inet6 fe80::260:e0ff:fe8a:2e06 dev enp0s20f3 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e05 dev enp0s20f2 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e04 dev enp0s20f1 weight 1
	nexthop via inet6 fe80::260:e0ff:fe8a:2e02 dev enp4s0 weight 1
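The flip side is that changing the set of next hops, whether by plugging a link in or pulling one, also moves some flows that weren’t on the affected link. Those connections survive, since both ends keep their /32 loopback addresses and packets just take another path, but it helps to know how much reshuffling to expect. A rough Python sketch, assuming equal weights, a uniform hash, hash-threshold selection, and that surviving next hops keep their order in the group (the kernel’s actual ordering is an assumption here):

```python
from fractions import Fraction
from typing import List, Tuple


def ranges(nexthops: List[str]) -> List[Tuple[str, Fraction, Fraction]]:
    """Equal-weight hash-threshold slices [lo, hi) of a normalized hash space [0, 1)."""
    n = len(nexthops)
    return [(nh, Fraction(i, n), Fraction(i + 1, n)) for i, nh in enumerate(nexthops)]


def unchanged_share(old: List[str], new: List[str]) -> Fraction:
    """Share of flows (uniform hash assumed) whose next hop is the same before and after."""
    new_ranges = {nh: (lo, hi) for nh, lo, hi in ranges(new)}
    kept = Fraction(0)
    for nh, lo, hi in ranges(old):
        if nh in new_ranges:
            new_lo, new_hi = new_ranges[nh]
            # Overlap of the old and new slices owned by the same next hop.
            kept += max(Fraction(0), min(hi, new_hi) - max(lo, new_lo))
    return kept


if __name__ == "__main__":
    four = ["enp0s20f3", "enp0s20f2", "enp0s20f1", "enp4s0"]
    print("pull enp4s0:", unchanged_share(four, four[:3]))
    print("grow from 2 to 4 links:", unchanged_share(four[:2], four))
```

Under these assumptions, pulling one of four links leaves half of all flows where they were: the quarter on the dead link has to move, and another quarter on healthy links gets reshuffled along with it.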
On the gateway, the config is just slightly different:
services.frr = {
  bgpd.enable = true;
  config = ''
    log syslog
    debug bgp
    frr defaults datacenter
    router bgp {{hostvars["as_number"]}}
     bgp router-id {{hostvars["loopback"]}}
     bgp fast-convergence
     bgp bestpath compare-routerid
     bgp bestpath as-path multipath-relax
     {% for i in hostvars["bgp_interfaces"] %}
     neighbor {{i}} interface remote-as external
     {% endfor %}
     network 0.0.0.0/0
     address-family ipv4 unicast
      redistribute connected
  '';
};
The network 0.0.0.0/0 line is added to inject the default route into BGP.
I also had to add internalIPs to my NAT config so that traffic sourced from 10.0.0.1 and 10.0.0.2 is translated:
networking.nat = {
  enable = true;
  externalInterface = "enp2s0f0";
  internalInterfaces = [ "br0" ];
  internalIPs = [ "10.0.0.0/24" ];
};
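My understanding of why internalInterfaces alone wasn’t enough: the NixOS NAT module masquerades traffic leaving through externalInterface if it either entered through one of internalInterfaces or is sourced from one of internalIPs. Traffic from 10.0.0.1 and 10.0.0.2 presumably reaches the gateway over its BGP links rather than br0, so only the source-address match catches it. Here’s a simplified Python model of that decision; it is not the module’s actual firewall rules, and the interface name bgp0 and the address 192.0.2.10 are hypothetical:

```python
import ipaddress
from typing import Collection


def should_masquerade(src: str, in_iface: str, out_iface: str, external_iface: str,
                      internal_ifaces: Collection[str], internal_ips: Collection[str]) -> bool:
    """Simplified model: masquerade only traffic leaving via the external interface that
    either arrived on an internal interface or comes from an internal source range."""
    if out_iface != external_iface:
        return False
    if in_iface in internal_ifaces:
        return True
    addr = ipaddress.ip_address(src)
    return any(addr in ipaddress.ip_network(net) for net in internal_ips)


if __name__ == "__main__":
    ext, ifaces = "enp2s0f0", ["br0"]
    # Node traffic arriving on a hypothetical BGP-facing interface "bgp0".
    print(should_masquerade("10.0.0.1", "bgp0", ext, ext, ifaces, []))
    print(should_masquerade("10.0.0.1", "bgp0", ext, ext, ifaces, ["10.0.0.0/24"]))
    # A hypothetical host behind br0 is covered by internalInterfaces alone.
    print(should_masquerade("192.0.2.10", "br0", ext, ext, ifaces, []))
```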