Modern distributed systems need more than just monitoring. The importance of observability is growing in direct proportion to the explosion of data passing across today’s networks, and to the demand for a better understanding of how that data can be used, learned from and acted on. To this end we have been rethinking edge network observability and the role it plays in one of the greatest challenges of internet connectivity: extracting business intelligence, writes Shannon Weyrick, Vice President of Research at NS1 Labs.
When it comes to network observability, data is not particularly useful without context, especially considering the vast range of data encoded in network traffic sources. The aim of such monitoring must be to extract the ‘signal’ from the noise — the key piece or pieces of information that point to something that administrators can or should act on.
Bringing observability to the edge
We deliver our service to customers across a global edge network. Back in 2015, we experienced a DDoS attack that brought to light some gaps in our network visibility. This prompted us to develop pktvisor – an observability agent that could watch internet traffic, which we made available as an open-source project.
Our customers face the same challenge. They need a better understanding of what is happening on their networks and, specifically, of the traffic across their DNS. Often they have an unclear picture of where the traffic is coming from, whether it is legitimate, or which records it relates to. pktvisor addresses this by supplementing existing metrics with deeper insights to help understand traffic patterns, highlight query and response details, and identify malicious or unintended traffic.
With the advent of technology developments such as IoT, edge compute and high throughput stream processing concepts, we have been able to enhance the pktvisor edge sensor by giving it an edge control plane. Out of this we are now launching a new solution called Orb – an open source dynamic edge observability platform.
Orb is extensible, vendor-neutral and cloud-native, and it helps organisations to understand their networks, distributed applications and traffic flows in real time.
Rather than sending all edge data to Orb for central batch processing, extraction of signals takes place directly at the edge through the pktvisor agent, which can dynamically tap into network traffic streams. Using high throughput stream processing algorithms, pktvisor conducts deep analysis on data at the edge and then uses that information to extract the important signals – resulting in distributed business intelligence. These aggregated signals are then sent at a selected interval to Orb, and may include information such as top queries, top result codes, traffic rate percentiles or any number of other counters and statistics.
We call this approach “small on data, big on information” because it lets Orb users collect the needles in the haystack without sifting through all the hay.
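The edge summarisation described above can be sketched in a few lines of Python. This is an illustration only, not pktvisor's implementation: the `EdgeWindow` class, its field names and the shape of the summary are all hypothetical, and exact counters stand in for the high-throughput streaming algorithms a production agent would use. The point it shows is that only counters survive each interval, never the packets themselves.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EdgeWindow:
    """One reporting interval of DNS observations (hypothetical structure)."""
    top_n: int = 3
    qnames: Counter = field(default_factory=Counter)
    rcodes: Counter = field(default_factory=Counter)
    per_second: Counter = field(default_factory=Counter)

    def observe(self, second: int, qname: str, rcode: str) -> None:
        # Update counters only; the observed packet is not stored.
        self.qnames[qname.lower()] += 1
        self.rcodes[rcode] += 1
        self.per_second[second] += 1

    def _rate_percentile(self, p: float) -> int:
        # Nearest-rank percentile over seconds that saw at least one query.
        rates = sorted(self.per_second.values())
        if not rates:
            return 0
        rank = max(1, math.ceil(p / 100 * len(rates)))
        return rates[rank - 1]

    def summary(self) -> dict:
        # The small, fixed-shape record that would be sent upstream.
        return {
            "total": sum(self.qnames.values()),
            "top_qnames": self.qnames.most_common(self.top_n),
            "top_rcodes": self.rcodes.most_common(self.top_n),
            "rate_p50": self._rate_percentile(50),
            "rate_p95": self._rate_percentile(95),
        }


if __name__ == "__main__":
    window = EdgeWindow(top_n=2)
    for second, qname, rcode in [
        (0, "a.example", "NOERROR"),
        (0, "a.example", "NOERROR"),
        (0, "b.example", "NXDOMAIN"),
        (1, "A.example", "NOERROR"),
    ]:
        window.observe(second, qname, rcode)
    print(window.summary())
```

However many queries arrive in an interval, the summary keeps the same handful of fields, which is what makes the upstream payload small.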
Extracting the ‘signal’ from the noise
If we think back to where this started, DDoS mitigation is an excellent use case for Orb because it allows us to rapidly identify a set of signals that point to a DDoS attack taking place. Such an attack could be happening at multiple layers in the infrastructure and it needs to be found at speed. Collecting data alone cannot achieve that goal. Data cannot be described as ‘observed’ unless it has led to an insight or action, whether automated or reviewed by a human. In fact, a human review of the raw data without context would be impossible in a timely manner and is unlikely to yield any viable action.
Network observability administered through Orb has a number of other benefits:
- Data sets are smaller and higher quality. This means they can be acted on faster and retained longer.
- Post-processing and automation are faster. Because the data we do not need has already been removed and information summarised at the edge, getting to insights and action takes less time.
- Localised, real-time action. Where signals are identified at the edge, action can be taken at that point autonomously, removing the round-trip post-processing to the central system.
- Global, near real-time signal available centrally. A global view of network-wide signals and observations is available in near real-time.
- Collection systems are more reliable. This is because the workload is shared across the endpoints at the edge. In addition, because we are not collecting the raw data, the volume of information collected and processed by Orb is not a function of the traffic entering the networks, but rather a function of the number of agents in the system. This means that large spikes in traffic do not put pressure on downstream processing.
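The last point can be made concrete with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the function names and every number in it (packet size, summary size, agent count, interval) are assumptions chosen for the example, not measurements from Orb or pktvisor.

```python
def central_load_raw(packets_per_second: int, avg_packet_bytes: int, interval_s: int) -> int:
    """Bytes reaching the central system per interval if raw traffic is shipped."""
    return packets_per_second * avg_packet_bytes * interval_s


def central_load_summarised(agents: int, summary_bytes: int) -> int:
    """Bytes reaching the central system per interval if each agent sends one summary."""
    return agents * summary_bytes


if __name__ == "__main__":
    AGENTS = 50            # assumed fleet size
    SUMMARY_BYTES = 4096   # assumed size of one aggregated summary
    INTERVAL_S = 60        # assumed reporting interval
    # Compare a quiet period with a hundredfold traffic spike.
    for pps in (10_000, 1_000_000):
        raw = central_load_raw(pps, avg_packet_bytes=100, interval_s=INTERVAL_S)
        summarised = central_load_summarised(AGENTS, SUMMARY_BYTES)
        print(f"{pps} pps: raw={raw} bytes, summarised={summarised} bytes")
```

Under these assumptions the raw figure grows a hundredfold with the spike while the summarised figure does not move, because traffic volume does not appear in its formula at all.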
A new level of network observability control
Orb is database agnostic and designed to integrate with a host of network observability stacks that exist in the market today. This means that operators can use their preferred visualisation and database tools (such as Prometheus and Grafana) to understand and analyse the signals being collected, whether on-premises or on a SaaS platform. Each agent has a local web-server, and simple web queries are used to pull data (in JSON or Prometheus format) from the pktvisor agents into Orb.
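As a rough sketch of what pulling data from an agent's local web server might look like, the Python below fetches a text payload over HTTP and parses the common subset of the Prometheus text exposition format. The URL passed to `fetch_metrics` is left to the caller because the agent's actual endpoints are not described here, and the metric names in `SAMPLE_PAYLOAD` are invented for illustration.

```python
import re
import urllib.request

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch a metrics payload; the URL is supplied by the caller (assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_prometheus_text(text: str) -> list:
    """Return (name, labels, value) tuples; label escapes are left as-is."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this minimal parser does not understand
        try:
            value = float(match.group("value"))
        except ValueError:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, value))
    return samples


# Hypothetical payload; these metric names are made up for the example.
SAMPLE_PAYLOAD = """\
# TYPE dns_queries_total counter
dns_queries_total{instance="edge-1"} 1200
dns_rcode_total{instance="edge-1",rcode="NXDOMAIN"} 37
dns_rate_p95 850.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_prometheus_text(SAMPLE_PAYLOAD):
        print(name, labels, value)
```

In practice an existing Prometheus server would scrape the agent directly; the hand-written parser is only there to show how little structure the consumer needs to handle.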
One of the key factors that differentiates Orb is that it not only provides a way to collect data from the pktvisor network observability agents but can act as a control plane for a fleet of agents using the same principles as IoT systems. Users can develop policies using the Orb UI or directly through the Orb API, and these are then pushed out to agents. This means that the agents can be reprogrammed in real-time to monitor different traffic, and the resulting summaries can be sent to dynamic locations.
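The control-plane idea can be sketched as a policy that selects agents by tag and tells them what to watch and where to send summaries. Everything in this example is hypothetical: the `Policy` and `Agent` classes, their fields and the `push` function are illustrative names and do not reflect the Orb API or its data model.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """What to observe and where to report it (hypothetical shape)."""
    name: str
    selector: dict        # tags an agent must carry to receive the policy
    capture_filter: str   # description of the traffic to watch
    sink: str             # destination for the resulting summaries


@dataclass
class Agent:
    name: str
    tags: dict
    policies: dict = field(default_factory=dict)

    def apply(self, policy: Policy) -> None:
        # Re-applying a policy with the same name replaces the old version,
        # which is how an agent would be reprogrammed in place.
        self.policies[policy.name] = policy


def push(policy: Policy, fleet: list) -> list:
    """Apply the policy to every agent whose tags match; return their names."""
    matched = []
    for agent in fleet:
        if all(agent.tags.get(key) == value for key, value in policy.selector.items()):
            agent.apply(policy)
            matched.append(agent.name)
    return matched


if __name__ == "__main__":
    fleet = [
        Agent("edge-ams-1", {"region": "eu", "role": "dns"}),
        Agent("edge-nyc-1", {"region": "us", "role": "dns"}),
    ]
    policy = Policy("eu-dns", {"region": "eu"}, "udp port 53", "https://sink.example/eu")
    print(push(policy, fleet))
```

Selecting by tag rather than by agent name means a new agent that comes online with matching tags can pick up the same policy without anyone editing the policy itself.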
Future Innovation for Open Source
Orb brings to the open source community a new approach to network observability, allowing network operators to understand and utilise real-time data sets to gain new insights, protect their networks, and improve performance. Traffic pattern analysis, load balancing, domain record management and threat detection are all areas that will benefit from the architecture.
Orb is not a replacement for the big data approach to observability, which certainly still has a role to play in areas such as collecting data sets for machine learning training. But the complexity of network observability across the typical modern enterprise stack needed simplification. Orb, along with our other open source projects, addresses some of the biggest challenges, and we are eager to work with the open source community to add new data sources, features and capabilities that add value for users.
Note from The Stack: NS1 is on track to have both a self-hosted version and a SaaS version available for initial testing in October. Those wanting to get involved (whether through feature requests, use case discussions or more) can file an issue, follow the public work board, or start a discussion on GitHub.