Survey of existing DNS tools

| Tool | UDP | TCP | TLS | Pipelining | Uses query file | Replay pcaps | Comments |
|---|---|---|---|---|---|---|---|
| dnsperf from DNS-OARC | Y | N | N | N | Y | N | https://github.com/DNS-OARC/dnsperf |
| resperf from DNS-OARC | Y | N | N | N | Y | N | For testing resolvers. https://github.com/DNS-OARC/dnsperf |
| dnsperf-tcp | Y | Y | N | Y | Y | N | https://github.com/Sinodun/dnsperf-tcp |
| dnsperf-tls | Y | Y | Y | Y | Y | N | https://github.com/Sinodun/dnsperf-tcp/tree/feature/tls_openssl A refactor was required to accommodate TLS usage within the threading model used here, and we believe this introduces a performance overhead at very low queries per connection (below 500). This is being investigated. |
| Flamethrower | Y | Y | Y | Y | Y | N | https://github.com/DNS-OARC/flamethrower |
| tcpscaler | Y | Y | Y | Y | Y | N | https://github.com/jonglezb/tcpscaler |
| perftcpdns | N | Y | N | N | Y (in hex format) | N | A performance testing tool for DNS over TCP, available in the contrib directory of all recent versions of BIND9 |
| queryperf++ | Y | Y | N | N | | | Open-source framework for testing DNS servers that uses both UDP and TCP: https://github.com/jinmei/queryperfpp |
| Drool | Y | Y | N | ? | Y | Y | drool can replay DNS traffic from packet capture (PCAP) files. https://github.com/DNS-OARC/drool |
| dns-benchmarking/ | Y | N | N | N | Y | ? | https://gitlab.labs.nic.cz/knot/dns-benchmarking/tree/master |
| ISC Performance lab | Y | N | N | N | Y | ? | https://github.com/isc-projects/perflab |

Investigation of HTTP benchmarking tools

Survey

We've also looked at some tools to see if we can reuse anything from the HTTP measurement world to help with benchmarking of DoT or DoH.

As a minimum requirement, a tool must be capable of making tests appear to come from at least 1000 individual clients (aka virtual users or VU in web server performance speak). Even then, some orchestration would still be required to reach (ideally) ~30,000 VU per test VM, which would minimise the total number of test VMs needed.

Others surveyed but not tested: 

Work on Tsung

After further work on Tsung we concluded that peak traffic generation for a single client instance of 30k clients was limited to 100k qps, as discussed below. This would mean many client VMs would be needed to achieve even moderate DNS traffic levels. We suspect the fundamental reason behind this is twofold:

  1. Each session is scripted, so you can mimic real traffic. But that scripting is at some level interpreted.
  2. HTTP testing happens at a lower order of magnitude of connections and traffic than DNS. Hitting a website with 10,000 HTTP GETs down the same connection just doesn't really happen.
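
To put the per-instance ceiling quoted above in context, the following is a rough sizing sketch. The 100k qps and 30k client figures come from the text; the target rates in the usage loop are hypothetical values chosen only for illustration, and the calculation assumes load divides evenly across instances.

```python
# Figures taken from the text above.
PEAK_QPS_PER_INSTANCE = 100_000   # observed ceiling for one Tsung client instance
CLIENTS_PER_INSTANCE = 30_000     # simulated clients on that instance


def per_client_qps() -> float:
    """Average query rate each simulated client contributes at the ceiling."""
    return PEAK_QPS_PER_INSTANCE / CLIENTS_PER_INSTANCE


def instances_needed(target_qps: int) -> int:
    """Client instances required for a target rate (integer ceiling division)."""
    return -(-target_qps // PEAK_QPS_PER_INSTANCE)


if __name__ == "__main__":
    print(f"{per_client_qps():.2f} qps per simulated client")
    # Hypothetical targets, not measurements from this survey.
    for target in (250_000, 1_000_000):
        print(f"{target} qps -> {instances_needed(target)} client instances")
```

At the ceiling each simulated client averages only a few queries per second, so raising the aggregate rate means adding instances rather than driving individual clients harder.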

We did do some development work on it to add DNS as a proof of concept; the result can be found in the dns-client branch in this GitHub repo. It proved reasonably easy to get a DNS plugin working that does synchronous queries over UDP, TCP and TCP/TLS, and even takes successive lookups from a queries file as used by dnsperf. Our specific observations are:

So, the problem is that we're unable to use Tsung to generate baseline performance numbers to compare with existing numbers. While it might (after some work) be useful for real-life type scenarios, there are concerns about how well it will scale horizontally (though that may well be the dynamic variable issue above). The original author designed it for many parallel runs of complex sessions, so it is probably good for real-life tests, but a problem if you're trying to establish a baseline of the maximum queries a server can handle.
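
For reference, the query path the proof-of-concept plugin exercises (read a lookup from a dnsperf-style queries file, encode it, send it over UDP or TCP/TLS) can be sketched in Python. This is not the Tsung plugin code, which is not reproduced here. It is a minimal illustration that assumes one `name type` pair per line with `;` starting a comment line, and the helper names and the small type table are hypothetical.

```python
import struct

# Small subset of RR type codes (RFC 1035 / RFC 3596); illustrative only.
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
          "MX": 15, "TXT": 16, "AAAA": 28}


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # handles the root name "."
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("label longer than 63 octets")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(line: str, msg_id: int) -> bytes:
    """Build a DNS query message from one 'name type' line."""
    name, qtype = line.split()[:2]
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPES[qtype.upper()], 1)
    return header + question


def frame_for_tcp(msg: bytes) -> bytes:
    """Prefix the message with its two-octet length, as used on TCP and TLS."""
    return struct.pack("!H", len(msg)) + msg


def queries_from_lines(lines):
    """Yield successive query messages, skipping blank and comment lines."""
    msg_id = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        yield build_query(line, msg_id & 0xFFFF)
        msg_id += 1


if __name__ == "__main__":
    sample = ["; sample queries", "example.com A", "example.org AAAA"]
    for msg in queries_from_lines(sample):
        print(len(msg), frame_for_tcp(msg)[:2].hex())
```

Over UDP the message from `build_query` is sent as a single datagram; over TCP or TLS the `frame_for_tcp` output is written to the stream, which is what allows many queries to share one connection.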