Advanced DNS Analytics and Measurements
The aim of the ADAM project is to develop software tools and methods for acquiring, storing, visualizing and analyzing DNS traffic.
The Domain Name System (DNS) is the most fundamental middleware service in today's Internet. Virtually every other Internet service uses DNS for translating domain names to IP addresses, and for several other purposes.
CZ.NIC, z.s.p.o. manages and operates the country code top-level domain .CZ, also known as the national domain of the Czech Republic. With approximately 1.33 million second-level domains, it is a relatively small ccTLD, yet its DNS servers in 10 countries around the world answer, on average, about 15 thousand queries from DNS resolvers every second.
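To put the average query rate into perspective, the short calculation below turns it into daily and yearly volumes. Only the 15,000 queries per second figure comes from the text. The per-record size (`ASSUMED_BYTES_PER_RECORD`) is a purely hypothetical value, included to show how quickly storage adds up.

```python
# Back-of-the-envelope estimate of .CZ DNS traffic volume, using the
# average rate quoted above (about 15,000 queries per second).
AVG_QPS = 15_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

# Hypothetical size of one stored transaction record (query + response
# metadata). The real figure depends on the format and compression used.
ASSUMED_BYTES_PER_RECORD = 100

def queries_in(seconds: int, qps: float = AVG_QPS) -> int:
    """Number of queries answered in `seconds` at a constant average rate."""
    return round(qps * seconds)

per_day = queries_in(SECONDS_PER_DAY)
per_year = per_day * 365
bytes_per_day = per_day * ASSUMED_BYTES_PER_RECORD

print(f"{per_day:,} queries per day")           # 1,296,000,000 queries per day
print(f"{per_year:,} queries per year")         # 473,040,000,000 queries per year
print(f"{bytes_per_day / 1e9:.1f} GB per day")  # 129.6 GB per day
```

Even under this modest record-size assumption, a single year of raw transaction data runs to tens of terabytes.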
For operational and security reasons, it is important to collect data about DNS traffic – both summary statistics and detailed information about individual DNS transactions. After appropriate processing, such data can be used for a number of purposes, including:
- monitoring of DNS servers
- analysis of security incidents and attacks
- effective planning of DNS infrastructure upgrades
- publishing information about DNS zone and registry status
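Summary statistics are typically derived from the detailed per-transaction data. The sketch below illustrates that reduction step under simple assumptions: the `Transaction` record, its fields, and the server names are hypothetical and do not reflect the actual ADAM data schema.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    """Hypothetical, minimal record of one DNS query/response pair."""
    server: str  # authoritative server that handled the query
    qname: str   # queried domain name
    qtype: str   # e.g. "A", "AAAA", "MX"
    rcode: str   # response code, e.g. "NOERROR", "NXDOMAIN"

def summarize(transactions: list[Transaction]) -> dict:
    """Reduce individual transactions to a handful of summary statistics."""
    total = len(transactions)
    rcodes = Counter(t.rcode for t in transactions)
    return {
        "total": total,
        "per_server": dict(Counter(t.server for t in transactions)),
        "per_qtype": dict(Counter(t.qtype for t in transactions)),
        "rcodes": dict(rcodes),
        # A sudden change in this ratio may point to misconfiguration
        # or to certain kinds of attacks.
        "nxdomain_ratio": rcodes["NXDOMAIN"] / total if total else 0.0,
    }

sample = [
    Transaction("ns-a", "example.cz.", "A", "NOERROR"),
    Transaction("ns-a", "example.cz.", "AAAA", "NOERROR"),
    Transaction("ns-b", "xq7z9.cz.", "A", "NXDOMAIN"),
    Transaction("ns-b", "example.cz.", "MX", "NOERROR"),
]
stats = summarize(sample)
print(stats["per_server"])      # {'ns-a': 2, 'ns-b': 2}
print(stats["nxdomain_ratio"])  # 0.25
```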
DNS traffic data is a prime example of Big Data and therefore demands sophisticated approaches in all processing phases – acquisition, storage, visualization and analysis.
The ADAM project aims to address all of the above phases. The entire data processing chain is depicted in the following diagram:
Information about DNS transactions is extracted directly on the DNS servers (top left) and sent to a central location, where it is stored in an Apache Hadoop cluster. Data in the Hadoop cluster can be queried via the Impala SQL interface. This data, together with data from other sources such as the domain registry and the DNS crawler, is processed with R and other tools and stored in the service database. The service database can be queried using SQL, and it also exposes all available data through a REST API. The REST API can be accessed directly as open data or used by other applications, for example for visualization.
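The last two stages of this chain, storing processed results in a SQL database and serving them as JSON, can be sketched as follows. This is an illustration under stated assumptions, not the ADAM implementation: `sqlite3` stands in for the service database, the `hourly_traffic` table and its columns are hypothetical, and the Hadoop, Impala and R stages are represented only by already-aggregated input rows.

```python
import json
import sqlite3

def build_service_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the service database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hourly_traffic ("
        " hour TEXT NOT NULL,"
        " server TEXT NOT NULL,"
        " queries INTEGER NOT NULL,"
        " PRIMARY KEY (hour, server))"
    )
    return conn

def load_aggregates(conn: sqlite3.Connection, rows) -> None:
    """Store pre-aggregated results, e.g. produced from the Hadoop data."""
    conn.executemany("INSERT INTO hourly_traffic VALUES (?, ?, ?)", rows)
    conn.commit()

def traffic_as_json(conn: sqlite3.Connection, server: str) -> str:
    """Return one server's hourly series as JSON, as a REST endpoint might."""
    cur = conn.execute(
        "SELECT hour, queries FROM hourly_traffic WHERE server = ? ORDER BY hour",
        (server,),
    )
    return json.dumps([{"hour": h, "queries": q} for h, q in cur])

conn = build_service_db()
load_aggregates(conn, [
    ("2024-01-01T00", "ns-a", 100),
    ("2024-01-01T01", "ns-a", 120),
    ("2024-01-01T00", "ns-b", 80),
])
print(traffic_as_json(conn, "ns-a"))
# [{"hour": "2024-01-01T00", "queries": 100}, {"hour": "2024-01-01T01", "queries": 120}]
```

Keeping SQL as the source of truth and generating JSON from it means that the SQL interface and the REST API can expose the same data.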
Software tools and methods developed by the ADAM project are primarily intended for internal use at CZ.NIC, but they are all public and open-source, so that other interested parties can adopt them and possibly modify them for their own purposes.
The ADAM project consists of several subprojects.
C++ library for generating and processing the Compacted-DNS (C-DNS) format defined in RFC 8618.
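Much of the compactness of C-DNS comes from its block structure. Values that repeat across many transactions, such as client addresses and domain names, are stored once per block in block tables, and each query/response item refers to them by index. The result is then CBOR-encoded. The Python sketch below shows only the table-and-index idea. It is not the ADAM library's API, it omits CBOR encoding entirely, and its class names and 0-based indexing are choices made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class BlockTable:
    """Stores each distinct value once; items refer to it by index."""
    values: list = field(default_factory=list)
    _index: dict = field(default_factory=dict)

    def add(self, value) -> int:
        """Return the index of `value`, appending it if it is new."""
        if value not in self._index:
            self._index[value] = len(self.values)
            self.values.append(value)
        return self._index[value]

@dataclass
class Block:
    """Simplified block: shared tables plus compact per-query items."""
    addresses: BlockTable = field(default_factory=BlockTable)
    names: BlockTable = field(default_factory=BlockTable)
    items: list = field(default_factory=list)

    def add_query(self, client_ip: str, qname: str, qtype: int) -> None:
        # Each item stores small integers instead of the repeated values.
        self.items.append((self.addresses.add(client_ip), self.names.add(qname), qtype))

    def expand(self) -> list:
        """Rebuild the original (client_ip, qname, qtype) tuples."""
        return [(self.addresses.values[a], self.names.values[n], t) for a, n, t in self.items]

queries = [
    ("192.0.2.1", "example.cz.", 1),
    ("192.0.2.1", "example.cz.", 28),
    ("198.51.100.7", "example.cz.", 1),
]
block = Block()
for ip, name, qtype in queries:
    block.add_query(ip, name, qtype)

print(block.addresses.values)  # ['192.0.2.1', '198.51.100.7']
print(block.names.values)      # ['example.cz.']
print(block.items)             # [(0, 0, 1), (0, 0, 28), (1, 0, 1)]
```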
High-speed probe able to process DNS traffic in real time and produce output in either C-DNS or Apache Parquet format. The data can then be saved on a local disk or transferred directly to a remote location.
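Any such probe must first decode the DNS messages it captures. The sketch below shows the basic step of parsing the fixed 12-byte header and the first question entry, following the message layout in RFC 1035. It is written in Python for readability and says nothing about how the actual probe is implemented. A high-speed probe would use far more efficient code and handle many more cases, such as truncated packets and TCP framing.

```python
import struct

def parse_header(msg: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(msg) < 12:
        raise ValueError("message shorter than the 12-byte DNS header")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    return {
        "id": msg_id,
        "qr": flags >> 15,              # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,
        "rcode": flags & 0xF,           # e.g. 3 = NXDOMAIN
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }

def parse_question(msg: bytes, offset: int = 12) -> tuple[str, int, int]:
    """Decode the first question entry as (QNAME, QTYPE, QCLASS).

    Only plain labels are handled; compression pointers, which the first
    question name normally does not contain, are rejected.
    """
    labels = []
    while True:
        length = msg[offset]
        offset += 1
        if length == 0:             # zero-length label terminates the name
            break
        if length > 63:
            raise ValueError("compression pointer or invalid label length")
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = struct.unpack("!HH", msg[offset:offset + 4])
    return ".".join(labels) + ".", qtype, qclass

# A query for example.cz, type A (1), class IN (1), with the RD flag set.
query = (struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)
         + b"\x07example\x02cz\x00" + struct.pack("!HH", 1, 1))
print(parse_question(query))  # ('example.cz.', 1, 1)
```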
A Python program that takes a list of DNS domains and performs predefined DNS queries and other actions on each domain. The tool is designed to run many queries in parallel, so that processing the whole .CZ zone takes a matter of hours on commodity hardware.
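Parallelism matters here. Processing 1.33 million domains in, say, four hours requires roughly 92 domains per second, which a strictly sequential crawler waiting on network round trips would struggle to reach. Because the work is I/O-bound, a thread pool is one straightforward approach. The sketch below assumes that each domain is handled by a pluggable `probe` function. `crawl`, `fake_probe` and the result format are hypothetical and are not the crawler's actual code. A real probe would send DNS queries through a resolver library instead of computing a dummy value.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, Iterable

Probe = Callable[[str], dict]

def _safe(probe: Probe, domain: str) -> dict:
    """Run one probe; record failures instead of aborting the whole crawl."""
    try:
        return probe(domain)
    except Exception as exc:  # a single broken domain must not stop the run
        return {"error": repr(exc)}

def crawl(domains: Iterable[str], probe: Probe, max_workers: int = 64) -> dict[str, dict]:
    """Apply `probe` to every domain using a pool of worker threads."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results can be zipped with domains.
        results = pool.map(partial(_safe, probe), domains)
        return dict(zip(domains, results))

def fake_probe(domain: str) -> dict:
    """Hypothetical stand-in for real DNS queries on one domain."""
    if domain.startswith("broken"):
        raise TimeoutError("no answer")
    return {"labels": domain.rstrip(".").count(".") + 1}

print(crawl(["example.cz", "broken.cz"], fake_probe))
# {'example.cz': {'labels': 2}, 'broken.cz': {'error': "TimeoutError('no answer')"}}
```

Catching exceptions per domain is a deliberate choice. In a zone-wide crawl, some domains will always time out or be misconfigured, and those failures should appear as results rather than interrupt the run.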
Interface for machine-based access to data in the service database.
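A client of such an interface typically builds a URL with query parameters and decodes a JSON response. The sketch below uses only the Python standard library. The base URL, the resource path and the parameter names are all hypothetical placeholders, since the actual endpoints are not described here.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://api.example.org/"  # hypothetical placeholder, not the real API

def build_url(resource: str, **params: str) -> str:
    """Join a (hypothetical) resource path with sorted query parameters."""
    url = urljoin(BASE_URL, resource)
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url

def fetch_json(url: str, timeout: float = 10.0):
    """GET `url` and decode its JSON body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

print(build_url("traffic/daily", start="2024-01-01", end="2024-01-31"))
# https://api.example.org/traffic/daily?end=2024-01-31&start=2024-01-01
```

Sorting the parameters yields the same URL for the same request, which makes responses easier to cache.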