The KubeCon caravan rolled into London this week, with the impact of AI on observability dominating the first tranche of keynotes at the cloud native fest.

That’s not to say it was panic stations over how enterprises are focusing attention – and budget – on AI rather than cloud native projects.

Rather, it was a more realistic appraisal of where genAI and LLMs start fitting into the practical development and operation of cloud platforms and applications.

Vijay Samuel, principal MTS, architect at eBay, explained how, rather than handing responsibility to AI, his team had developed explainers for telemetry signals.

Telemetry was critical at a company that generates 15 PB of logs per day and has 10 billion active time series, he explained.

“The fundamental problem with the site becoming more and more complex is that as humans, we have limits to how much we can comprehend at any given point in time.”

LLMs initially promised to solve much of this complexity, he said. The problem is, LLMs are probabilistic, which doesn’t sit well with observability.

When he asked ChatGPT to make sense of the company’s Prometheus postings, he recalled: “I tried, tried, tried, and eventually I learned about what hallucination actually meant, and then I gave up.”

The answer, he said, was “to realize one thing, that the probabilities need to work in our favor. If we prompt in a very deterministic way, give very crisp context, then it becomes a little bit more deterministic.”

“We came to the realization that we need to build what we like to call building block capabilities, capabilities that are of high quality, highly deterministic, that we can confidently rely on.”

So, for example, the team built a log explainer, which analyzes log lines, detects error or latency patterns that might need explaining, and produces a summary. It built similar tools for traces, metrics and changes.
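Samuel did not share implementation details, but a building block of that shape might look something like the sketch below: collapse raw log lines into a handful of deterministic patterns first, then hand only that crisp, bounded context to the model. The OpenAI-style client, model name and masking rules here are illustrative assumptions, not eBay’s actual stack.

```python
import re
from collections import Counter

from openai import OpenAI  # assumes the OpenAI Python SDK; any chat client would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_patterns(log_lines: list[str]) -> Counter:
    """Collapse log lines into templates by masking volatile tokens
    (hex IDs, numbers), so the model sees a handful of patterns
    rather than millions of raw lines."""
    templates = Counter()
    for line in log_lines:
        templates[re.sub(r"0x[0-9a-fA-F]+|\d+", "<*>", line)] += 1
    return templates


def explain_logs(log_lines: list[str], top_n: int = 10) -> str:
    """Summarize the dominant patterns with a tightly scoped,
    low-temperature prompt -- the 'crisp context' Samuel describes."""
    top = extract_patterns(log_lines).most_common(top_n)
    context = "\n".join(f"{count}x {tpl}" for tpl, count in top)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0,        # push the output toward deterministic
        messages=[
            {"role": "system",
             "content": "You summarize pre-aggregated log patterns. "
                        "Only describe the patterns given; do not speculate."},
            {"role": "user", "content": f"Explain these error patterns:\n{context}"},
        ],
    )
    return response.choices[0].message.content
```

The key design point is that the LLM never sees raw telemetry: the deterministic aggregation step bounds both the context size and the room for hallucination.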

The more data engineers piled into an LLM, Samuel said, “the more it will hallucinate over time.”

So, he said, “We leveraged LLMs for what they are actually good at, which is to summarize, do simple reasoning and then explain specifically the critical path.” Though, he added, the technology was advancing in leaps and bounds: “If I walk down the stage, something new would have come up.”

Christine Yen, CEO of observability vendor Honeycomb.io, also discussed how introducing LLMs into software stacks upends observability workflows and key tasks such as testing and debugging.

“LLMs take some of the ways that we're used to ensuring consistent, reliable behavior and make them a little more difficult,” she said. Because they are probabilistic, testing and debugging become more problematic given such a “long tail of possible inputs.”

But, she pointed out, there had always been “black boxes” in software architectures, such as APIs, and the industry had developed measures to work around them, such as service level objectives or evaluations.
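One way to read that: treat an evaluation suite’s pass rate like an SLO and alert when it degrades. The sketch below is a hedged illustration of that idea, not Honeycomb tooling; the OpenAI-style client and the simple `must_contain` check are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()


def run_eval(cases: list[dict], threshold: float = 0.95) -> bool:
    """Score a suite of prompt/expectation pairs and treat the pass
    rate like an SLO: fail (or alert) when it drops below target."""
    passed = 0
    for case in cases:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            temperature=0,        # reduce run-to-run variance
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        output = response.choices[0].message.content or ""
        if case["must_contain"].lower() in output.lower():
            passed += 1
    pass_rate = passed / len(cases)
    print(f"eval pass rate: {pass_rate:.1%} (target {threshold:.0%})")
    return pass_rate >= threshold


# Example: a tiny suite covering one behavior we want to stay stable.
cases = [{"prompt": "What does HTTP status 503 mean?",
          "must_contain": "unavailable"}]
```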

So, Yen said it was possible to leverage these approaches when working with LLMs. “Getting good observability into these systems is all about systematically tracking the inputs and outputs.”

“By capturing all of that, I can start to reason about how the inputs impact the outputs of my black box, how my application, my business logic, impacts all of that, and ultimately the impact on the experience the end user is having.”
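In practice, that capture can be as simple as wrapping each model call in a trace span that records the prompt, the response and token counts. A minimal sketch, assuming the OpenTelemetry Python API and an OpenAI-style client (the span name and attribute keys here are illustrative, not an official semantic convention):

```python
from openai import OpenAI
from opentelemetry import trace

client = OpenAI()
tracer = trace.get_tracer("llm-observability-sketch")


def instrumented_completion(model: str, prompt: str) -> str:
    """Record the inputs and outputs of the LLM 'black box' on a span,
    so its behavior can be reasoned about after the fact."""
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.prompt", prompt)
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        output = response.choices[0].message.content or ""
        span.set_attribute("llm.response", output)
        span.set_attribute("llm.tokens.total", response.usage.total_tokens)
        return output
```

With spans like these exported to an observability backend, the probabilistic component becomes queryable like any other service: which prompts produce slow, expensive or anomalous outputs, and how that correlates with user experience.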

Which might be reassuring for those companies that are only just getting their heads around open source and cloud native. And there are plenty of those, which is why the CNCF took the wraps off additional certifications aimed at enterprises, including the Certified Open Developer on the Enterprise Code Program, designed to provide “enterprise developers with essential open source training.”

CNCF CTO Chris Aniszczyk said: “It’s really meant for enterprises to try to figure out how to navigate open source usage, some of the crazy upcoming regulations that are coming, how to deal with licensing issues.” He added, “It’s sorely needed.”
