I’ve been wondering if different kernels affect the IP forwarding performance of Linux routers, particularly the RT or ‘realtime’ kernel. NixOS makes it really easy to try different kernels, so I tried a few.
The devices under test are two c2758 boxes with lots of 1000BASE-T interfaces. The source and target are c3558 boxes, each with a link to each router. All are running BGP. The c3558 boxes therefore have two routes to each other:
[root@lanner-c8f0:~]# ip r show 10.0.0.0
10.0.0.0 nhid 101 proto bgp metric 20
nexthop via inet6 fe80::260:e0ff:fe8a:2e02 dev enp2s0f0 weight 1
nexthop via inet6 fe80::260:e0ff:fe8a:2ca0 dev enp2s0f1 weight 1
I’ll be using iperf(3) to measure throughput of small packets:
iperf -c 10.0.0.0 -P8 -u -l36 -b0 -t 60
There is a misconception that to measure 64-byte pps forwarding capability with iperf you can just use -l64. This is not the size of the IP packet; it is the size of the UDP payload.
To get a packet with Total Length: 64, a 36-byte UDP payload is correct (64 bytes minus the 20-byte IPv4 header and the 8-byte UDP header). Here it is in Wireshark:
Internet Protocol Version 4, Src: 172.30.190.200, Dst: 10.0.0.1
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
Total Length: 64
Identification: 0xbdad (48557)
010. .... = Flags: 0x2, Don't fragment
...0 0000 0000 0000 = Fragment Offset: 0
Time to Live: 64
Protocol: UDP (17)
Header Checksum: 0x0818 [validation disabled]
[Header checksum status: Unverified]
Source Address: 172.30.190.200
Destination Address: 10.0.0.1
User Datagram Protocol, Src Port: 57727, Dst Port: 5201
Source Port: 57727
Destination Port: 5201
Length: 44
Checksum: 0x7525 [unverified]
[Checksum Status: Unverified]
[Stream index: 9]
[Timestamps]
UDP payload (36 bytes)
This can also be verified in Wireshark under ‘Statistics -> Packet Lengths’. With -l64, they’re all 92 bytes.
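The header arithmetic is easy to double-check. Here is a minimal sketch, assuming a 20-byte IPv4 header with no options (as in the capture above) and the standard 8-byte UDP header; the function names are just illustrative:

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options, matching "Header Length: 20 bytes"
UDP_HEADER_BYTES = 8    # fixed-size UDP header


def udp_payload_for_ip_length(ip_total_length: int) -> int:
    """Return the iperf -l value that yields the given IP Total Length."""
    payload = ip_total_length - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    if payload < 0:
        raise ValueError("IP total length is smaller than the headers alone")
    return payload


def ip_length_for_udp_payload(payload_bytes: int) -> int:
    """Return the IP Total Length produced by a given UDP payload size."""
    return payload_bytes + IPV4_HEADER_BYTES + UDP_HEADER_BYTES


print(udp_payload_for_ip_length(64))  # 36, the -l value used here
print(ip_length_for_udp_payload(64))  # 92, what -l64 actually sends
```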
I’ll be using my nixa config management tool to easily reconfigure and bounce the routers:
nixa > nix-shell --run 'python3 nixa --limit datto --action boot'
10.0.0.2 is reachable
10.0.0.3 is reachable
applying template datto.nix to datto: ['10.0.0.2', '10.0.0.3']
10.0.0.2:
---
+++
@@ -9,6 +9,7 @@
nix.optimise.automatic = true;
nix.gc.automatic = true;
system.stateVersion = "24.11";
+ boot.kernelPackages = pkgs.linuxPackages-rt_latest;
networking = {
hostName = "spine-green-476d";
rebuilding NixOS on 10.0.0.2
Rebooting 10.0.0.2
10.0.0.2 is reachable
10.0.0.3:
---
+++
@@ -9,6 +9,7 @@
nix.optimise.automatic = true;
nix.gc.automatic = true;
system.stateVersion = "24.11";
+ boot.kernelPackages = pkgs.linuxPackages-rt_latest;
networking = {
hostName = "spine-blue-db5d";
rebuilding NixOS on 10.0.0.3
Rebooting 10.0.0.3
10.0.0.3 is reachable
Now that we have the right -l value for 64-byte packets, the Lost/Total Datagrams column in iperf corresponds 1:1 with 64-byte packets. When you flood a device with UDP in iperf, there will be packet loss, so I’ll be taking the received Mbits/sec and the successfully received datagrams. The following example would be (25799702-3584655)/60, or 370,250 pps:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
...
[SUM] 0.00-60.00 sec 1.54 GBytes 220 Mbits/sec 0.000 ms 0/25806379 (0%) sender
[SUM] 0.00-60.00 sec 1.32 GBytes 190 Mbits/sec 0.016 ms 3584655/25799702 (14%) receiver
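To avoid doing that subtraction by hand for every run, something like the following could pull the numbers out of the receiver summary line. It is only a sketch: the regular expression assumes the Lost/Total column looks exactly like it does in the output above, and received_pps is a made-up helper name, not part of iperf.

```python
import re


def received_pps(summary_line: str, duration_s: float) -> float:
    """Compute successfully received packets per second from an iperf summary line."""
    # Match the "lost/total (" part of the Lost/Total Datagrams column.
    match = re.search(r"(\d+)/(\d+)\s+\(", summary_line)
    if match is None:
        raise ValueError("no Lost/Total Datagrams column found")
    lost, total = int(match.group(1)), int(match.group(2))
    return (total - lost) / duration_s


line = ("[SUM] 0.00-60.00 sec 1.32 GBytes 190 Mbits/sec 0.016 ms "
        "3584655/25799702 (14%) receiver")
print(int(received_pps(line, 60)))  # 370250
```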
Measurements
First off, here’s the standard kernel over the two paths:
384,003 pps
111 Mbits/sec
Here’s the standard kernel with one path shut down:
219,324 pps
63.2 Mbits/sec
Here’s boot.kernelPackages = pkgs.linuxPackages-rt_latest:
377,069 pps
109 Mbits/sec
boot.kernelPackages = pkgs.linuxPackages_5_4:
382,472 pps
110 Mbits/sec
Here’s two direct cables between source and dest, no routers:
360,602 pps
104 Mbits/sec
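As a sanity check on the figures above, the pps and Mbits/sec values should agree with each other if the bitrate counts only the 36-byte UDP payload. That is an assumption about how iperf reports bitrate, though the numbers do line up with it. A rough sketch:

```python
PAYLOAD_BYTES = 36  # the -l value used for every run

# (received pps, reported Mbits/sec), copied from the measurements above
results = {
    "standard kernel, two paths": (384_003, 111),
    "standard kernel, one path": (219_324, 63.2),
    "linuxPackages-rt_latest": (377_069, 109),
    "linuxPackages_5_4": (382_472, 110),
    "direct cables, no routers": (360_602, 104),
}


def payload_mbits(pps: float) -> float:
    """Convert packets per second to payload megabits per second."""
    return pps * PAYLOAD_BYTES * 8 / 1e6


for name, (pps, reported) in results.items():
    print(f"{name}: computed {payload_mbits(pps):.1f} Mbits/sec, reported {reported}")
```

Each computed value lands within half a megabit of the reported one, so the two columns are telling the same story.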
Conclusion
Looking at the performance of the direct crossover, I’d say I need to find more capable hosts to generate packets. I was surprised to see this, as the 14nm c3558 boxes are a generation newer than the 22nm c2758 routers and have dual-channel DDR4 instead of single-channel DDR3. The results here show that these c2758 routers can forward packets at least as fast as these particular endpoints can generate them.