This is a new look at intrusion detection and the monitoring of DNS network logs. Armed with hundreds of thousands of archived traffic patterns associated with benign and malicious network traffic, we’ll unravel traffic ranging from the predictable to the sporadic. Then, using state-of-the-art technologies to scale out Bayesian simulations, we model the underlying traffic distributions for millions of domain names seen in DNS logs, providing new insight into old attacks.
We begin by discussing methods for performing large-scale Bayesian inference on DNS logs aggregated into count data, where each count is the number of requests made by one of tens of millions of stub IPs to one of hundreds of millions of domains. We describe novel mixtures of common discrete distributions, or hidden Markov processes, that model some of the most sporadic traffic volumes to domain names. For example, we discuss how the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) distributions, and their more generalized forms, provide parameters we can use to differentiate the traffic volumes associated with threats ranging from day-to-day spam and malvertising to widespread botnet activity.
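To make the ZIP parameters concrete, here is a minimal sketch in plain Python. It is not the Rainier or Spark code used for the talk, and it swaps in a simple expectation-maximization (EM) fit in place of Bayesian simulation. The function names and the sample counts are hypothetical. The zero-inflation weight `pi` captures how often a domain receives no requests at all, while `lam` is the request rate during active intervals.

```python
import math


def zip_pmf(k, pi, lam):
    """P(X = k) for a zero-inflated Poisson; requires 0 <= pi <= 1 and lam > 0.

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam).
    """
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


def fit_zip_em(counts, iterations=200):
    """Point estimates of (pi, lam) via EM (an illustrative stand-in, not MCMC).

    Assumes counts holds non-negative integers with at least one nonzero value.
    """
    n = len(counts)
    n_zero = sum(1 for c in counts if c == 0)
    total = sum(counts)
    pi, lam = 0.5, total / n  # simple starting values
    for _ in range(iterations):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: re-estimate the mixture weight and the Poisson rate.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam


# Hypothetical hourly request counts to one sporadically queried domain.
hourly_counts = [0, 0, 0, 0, 0, 0, 3, 0, 5, 0, 0, 4,
                 0, 0, 0, 2, 0, 0, 6, 0, 0, 0, 4, 0]
pi_hat, lam_hat = fit_zip_em(hourly_counts)
```

A domain with a high `pi` and a large `lam` is mostly silent but bursts when active, which is the kind of sporadic pattern that a plain Poisson fit would blur into a single low average rate.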
Using Apache Spark and Stripe’s newly released Rainier, a powerful Bayesian inference library for the JVM, we run tens of thousands of simulations per domain to fit the underlying distribution of requests, then repeat this for millions of domains. We profile performance by fitting a variety of mixture distributions to different sporadic traffic volumes. Because the simulations are rerun frequently, we then show how to efficiently trend the parameter estimates using exponential moving averages to model day/night and weekday/weekend traffic distributions. With hundreds of thousands of simulated and archived traffic patterns associated with benign and malicious network traffic, we show how to reduce false alarms and effectively monitor evolving online threats and masquerading malicious traffic.
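As a rough illustration of trending parameter estimates, the sketch below keeps one exponential moving average per seasonal bucket, so that weekday daytime estimates are never blended with weekend night-time ones. It is plain Python rather than Spark, and the class name, the 08:00 to 20:00 definition of "day", and the smoothing factor `alpha` are all assumptions chosen for the example.

```python
from datetime import datetime


def seasonal_bucket(ts):
    """Map a timestamp to a (weekday|weekend, day|night) bucket.

    The 08:00-20:00 "day" window is an assumed boundary, not from the talk.
    """
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    period = "day" if 8 <= ts.hour < 20 else "night"
    return (day_type, period)


class SeasonalEma:
    """One exponential moving average of a parameter estimate per bucket."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight given to the newest estimate
        self.state = {}     # bucket -> current smoothed value

    def update(self, ts, estimate):
        key = seasonal_bucket(ts)
        previous = self.state.get(key)
        if previous is None:
            smoothed = estimate  # first observation seeds the average
        else:
            smoothed = self.alpha * estimate + (1.0 - self.alpha) * previous
        self.state[key] = smoothed
        return smoothed


# Hypothetical rate estimates from three successive simulation runs.
ema = SeasonalEma(alpha=0.5)
first = ema.update(datetime(2024, 1, 1, 10), 4.0)   # Monday, daytime
second = ema.update(datetime(2024, 1, 2, 10), 2.0)  # Tuesday, daytime
third = ema.update(datetime(2024, 1, 6, 23), 10.0)  # Saturday, night
```

Each update touches only one stored value per bucket, so the trend can be maintained across millions of domains without retaining the history of past estimates.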