nftables vs pf IPv4 filtering tests as a firewall
A service of the Universitat de València

This is a basic benchmark that does not closely reflect real-world traffic.

That's because in the "real world" most of the packets that go through a firewall are related to a previous connection, and here I'm testing packets not related to any previous connection.

Every packet must go through the whole ruleset, down to the last rule.
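A minimal sketch of how such a worst-case ruleset could be generated for nftables. The rules actually used in this benchmark are not shown here, so everything below is an assumption made for illustration: the table name `bench`, the function name `gen_nft_rules`, the 10.0.x.y filler addresses and the final accept rule are all hypothetical.

```shell
#!/bin/sh
# Hypothetical generator, NOT the ruleset used in the benchmark.
# gen_nft_rules N prints an nftables file with N rules that can never match
# the test traffic (198.18.10.x -> 198.19.10.x), followed by a single accept
# rule, so every test packet is compared against all N rules first.
gen_nft_rules() {
    n=$1
    echo 'table ip bench {'
    echo '    chain forward {'
    echo '        type filter hook forward priority 0; policy drop;'
    i=0
    while [ "$i" -lt "$n" ]; do
        # Spread the index over the last two octets of 10.0.x.y,
        # a range the generator never uses as a source address.
        echo "        ip saddr 10.0.$((i / 256 % 256)).$((i % 256)) udp dport 9 drop"
        i=$((i + 1))
    done
    echo '        ip saddr 198.18.0.0/16 ip daddr 198.19.0.0/16 accept'
    echo '    }'
    echo '}'
}

# Print a tiny 3-rule example to stdout.
gen_nft_rules 3
```

The output could be saved to a file and loaded with `nft -f`. The filler addresses stay unique up to 65536 rules.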

Linux vyos 1.2.3 with nftables

Without any rules, Linux forwards only 6Mpps, whereas FreeBSD can forward 12Mpps.

That seems to be a Mellanox driver queue-balancing issue, because Linux only runs 12 cores at 100% while the other cores are idle. The FreeBSD tests use all 24 cores.

The number of cores that Linux uses depends on the range of source and destination IPs in a test. With 10 IPs in the range I only get 6 cores in use, but adding more IPs to the range brings more cores into use.
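One way to build intuition for this: NICs with receive-side scaling pick a queue by hashing the packet's addresses and ports, so a handful of flows can easily collide on a few queues. The sketch below is an idealized model, not the Mellanox driver's real hash. It assumes every flow lands on a uniformly random queue and prints the expected number of queues that receive at least one flow. The function name `expected_busy_queues` is made up for this example, and the model is not meant to reproduce the exact core counts measured above.

```shell
#!/bin/sh
# Idealized model (assumption): each flow is hashed to one of QUEUES queues
# uniformly at random. The expected number of non-empty queues is then
#   QUEUES * (1 - (1 - 1/QUEUES)^FLOWS)
# expected_busy_queues FLOWS QUEUES
expected_busy_queues() {
    LC_ALL=C awk -v f="$1" -v q="$2" \
        'BEGIN { printf "%.1f\n", q * (1 - (1 - 1/q)^f) }'
}

expected_busy_queues 10 24      # few flows: well under 24 queues are busy
expected_busy_queues 63500 24   # many flows: effectively all queues are busy
```

Under this model, a small number of distinct flows leaves many queues idle, and the gap closes quickly as the number of flows grows.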

I think that in the real world, with many source and destination IPs, the queues will be better balanced.

Forwarding performance depends heavily on the number of rules: with just 1000 rules it drops to 250Kpps, and with 10K rules it drops to 21Kpps.
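A quick back-of-the-envelope check on those two figures suggests the cost grows roughly linearly with the number of rules. The sketch divides the wall-clock time per forwarded packet (1 / pps, for the whole box and all busy cores together) by the number of rules. It ignores the fixed per-packet forwarding cost, and the function name `per_rule_ns` is made up for this example.

```shell
#!/bin/sh
# per_rule_ns PPS RULES
# Wall-clock nanoseconds per forwarded packet (1e9 / PPS), divided by the
# number of rules each packet is compared against. This is an aggregate
# figure for the whole machine, not a per-core cost.
per_rule_ns() {
    LC_ALL=C awk -v pps="$1" -v rules="$2" \
        'BEGIN { printf "%.2f\n", 1e9 / pps / rules }'
}

per_rule_ns 250000 1000    # prints 4.00
per_rule_ns 21000 10000    # prints 4.76
```

Both cases come out at roughly 4 to 5 ns of aggregate time per rule per packet, which is what a linear walk over the ruleset would look like.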

Note that every packet must go through the whole ruleset. That is not a common scenario, because established and related connections should match at the first rule.

vyos_rules_pps.png

BSDRP 1.96 using pf

Without any rules I got 12Mpps. FreeBSD queue balancing works really well: all 24 cores are close to 100% in use.

But as you can see, when pf is enabled the throughput drops to 1.8Mpps, regardless of how many rules you have. So if your ruleset is quite small, you may prefer nftables.

On the other hand, the number of rules doesn't seem to affect pf.

I ran a test comparing "set ruleset-optimization" set to "none" and to "basic", but found no difference in performance.
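For reference, a minimal sketch of what such a comparison could look like in pf.conf terms. The actual pf ruleset used in the test is not reproduced here, so the filler rules, the final pass rule and the function name `gen_pf_conf` are hypothetical. Only the `set ruleset-optimization` option and its `none` and `basic` values come from the test described above.

```shell
#!/bin/sh
# Hypothetical generator, NOT the pf.conf used in the benchmark.
# gen_pf_conf MODE N prints a pf.conf with the given ruleset-optimization
# mode (none or basic), N block rules that never match the test traffic,
# and a final pass rule for 198.18.0.0/16 -> 198.19.0.0/16.
gen_pf_conf() {
    mode=$1
    n=$2
    echo "set ruleset-optimization $mode"
    i=0
    while [ "$i" -lt "$n" ]; do
        # Filler source addresses in 10.0.x.y, unused by the generator.
        echo "block drop in quick inet proto udp from 10.0.$((i / 256 % 256)).$((i % 256)) to any port 9"
        i=$((i + 1))
    done
    echo 'pass in quick inet proto udp from 198.18.0.0/16 to 198.19.0.0/16'
}

# Print a tiny 3-rule example with optimization disabled.
gen_pf_conf none 3
```

A generated file can be syntax-checked without loading it using `pfctl -nf`. Running the same test once with `none` and once with `basic` keeps everything else in the ruleset identical.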

pf does not evaluate the whole ruleset for each packet (the test packets are UDP, so TCP-related rules are not evaluated): 1000 rules counters

bsdrp_rules_pps.png

nftables vs pf / rules vs packets per second

As you can see, Linux nftables filters faster than FreeBSD pf if you have fewer than 100 rules.

bsdrp_vyos_rules_pps.png

Who is the winner?

IMHO there is no clear winner. It depends on the number of rules and on how many packets traverse your ruleset versus how many belong to an established connection.

By the way, not all packets have to traverse the whole ruleset: if you have 1000 rules, some packets will match one of the first rules and others will match one of the last.

Description

The device under test is a dual-socket Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz with a dual-port Mellanox ConnectX-4 (Mellanox Technologies MT27630 Family).

-------------------------------
| SOURCE HOST                 |
| IP:   198.18.0.110          |
| ARP:  90:1b:0e:43:c6:3b     |
-------------------------------
              |
-------------------------------
|       cisco 9200            |
-------------------------------
              |
-------------------------------
| IF:   eth2                  |
| MAC:  98:03:9b:af:11:18     |
| IP:   198.18.0.22           |
| net:  198.18.0.0            |
| mask: 255.255.255.0         |
|                             |
|         THIS ROUTER         |
|                             |
| net:  198.19.0.0            |
| mask: 255.255.255.0         |
| IP:   198.19.0.22           |
| MAC:  98:03:9b:af:11:19     |
| IF:   eth3.4001             |
-------------------------------
              |
-------------------------------
|       cisco 9200            |
-------------------------------
              |
-------------------------------
| DESTINATION HOST            |
| IP:   198.19.0.110          |
| ARP:  90:1b:0e:43:c6:3c     |
-------------------------------

PCI info

dev.mlx5_core.1.hw.board_id: FJT2420110034
dev.mlx5_core.1.hw.fw_version: 14.22.4020


mlx5_core0@pci0:94:0:0: class=0x020000 card=0x009215b3 chip=0x101515b3 rev=0x00 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT27710 Family [ConnectX-4 Lx]'
    class      = network
    subclass   = ethernet
mlx5_core1@pci0:94:0:1: class=0x020000 card=0x009215b3 chip=0x101515b3 rev=0x00 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT27710 Family [ConnectX-4 Lx]'
    class      = network
    subclass   = ethernet

Packet generator

The generator is a BSDRP 1.9 machine with an Intel 10Gb NIC.

It generates 60-byte UDP packets with source addresses 198.18.10.1 to 198.18.10.254 and destination addresses 198.19.10.1 to 198.19.10.250, all on port 2000.

GENERATOR
pkt-gen -i ix0 -f tx -n 1000000000 -l 60 -d 198.19.10.1:2000-198.19.10.250 -D 98:03:9b:af:11:18 -s 198.18.10.1:2000-198.18.10.254 -w 4 -U

RECEIVER
pkt-gen -i ix1 -f rx -w 4
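Two numbers follow directly from the generator command above and help put the results in context: how many distinct source/destination address pairs the ranges allow, and the theoretical packet rate of a 10Gb link at this frame size. This is a small sketch. The function names are made up, and it assumes the 60 bytes passed to `-l` exclude the 4-byte Ethernet FCS, so each frame occupies 60 + 4 + 8 (preamble) + 12 (inter-frame gap) = 84 bytes on the wire.

```shell
#!/bin/sh
# flow_pairs SRC_COUNT DST_COUNT
# Upper bound on distinct source/destination address pairs in the test.
flow_pairs() {
    echo $(( $1 * $2 ))
}

# line_rate_pps LINK_BPS FRAME_BYTES
# Theoretical maximum packets per second on an Ethernet link.
# Assumption: FRAME_BYTES excludes the FCS, so 24 bytes of overhead are
# added per frame (4 FCS + 8 preamble/SFD + 12 inter-frame gap).
line_rate_pps() {
    echo $(( $1 / (($2 + 24) * 8) ))
}

flow_pairs 254 250               # prints 63500
line_rate_pps 10000000000 60     # prints 14880952
```

So the generator can offer at most about 14.88Mpps, and the 12Mpps forwarded by FreeBSD without rules is roughly 80% of that ceiling.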
