Zscaler is using 3 TRILLION customer logs weekly to train AI

CEO: "These transactions generate a vast quantity of proprietary logs that feed our massive data lake..."

Security firm Zscaler is processing over three trillion logs from customers’ IT estates every week and using them to train its defensive AI systems.

That’s according to CEO Jay Chaudhry, as he revealed that the company expects to hit a record $3 billion in ARR in its just-starting fiscal 2025.

The data runs through Zscaler's "Zero Trust Exchange" platform which handles the security of 47 million users across nearly 8,700 customers.

The ARR forecast comes despite what Chaudhry admitted was a challenging macro environment, with continued scrutiny of cybersecurity spending and a sharp focus on vendor resilience in the wake of CrowdStrike's outage. (Over 700 of Zscaler's largest customers registered for a recent briefing on its approach to resilience.)

Wrapping up Zscaler’s financial year on an earnings call, Chaudhry said: “Our cloud platform [is] surpassing over half a trillion transactions daily.

“These transactions generate a vast quantity of proprietary logs that feed our massive data lake… These are complete logs that have structured and unstructured data, including the full URL. We leverage this proprietary data to train AI models that power innovations throughout our platform.”

These are big numbers, but perhaps unclear ones to many. How is the data "proprietary"? Which elements are structured and which are unstructured? Is there a risk of data exposure? We asked a helpful engineer to explain.

Your security logs are now… proprietary data?

They told The Stack: “A firewall log generally has little information above a 5-tuple (Source IP, Destination IP, Source Port, Destination Port, Protocol). 

“It may have some other metadata like AppID (if it's NGFW), but that requires other licenses and hardware, and potentially some information on TLS like the certificate name. [But] as a proxy, Zscaler has information about all of the traffic. It intercepts TLS (based on policy) and validates all of the TLS certificates (who issued it, has it been revoked) and [has] the full URL/URI information. This would be the metadata about the transaction.

“With data-protection it scans inline traffic for Data Loss Prevention, as well as traffic out of path using CASB/API. This data might be unstructured (like a Word document) or could be structured (like an Excel file, or a CSV)."
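To make the engineer's contrast concrete, here is a minimal Python sketch of the two kinds of log record. It is an illustration only, not Zscaler's schema: the class names `FiveTuple` and `ProxyTransaction` and every field name are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FiveTuple:
    """Roughly what a basic firewall log carries (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class ProxyTransaction:
    """A proxy that intercepts TLS can add URL and certificate metadata
    on top of the 5-tuple (again, hypothetical field names)."""
    flow: FiveTuple
    url: str                             # full URL/URI seen after TLS interception
    tls_issuer: Optional[str] = None     # who issued the server certificate
    cert_revoked: Optional[bool] = None  # result of a revocation check, if done


flow = FiveTuple("10.0.0.5", "203.0.113.10", 51514, 443, "TCP")
txn = ProxyTransaction(
    flow=flow,
    url="https://example.com/reports/q4?id=7",
    tls_issuer="Example CA",
    cert_revoked=False,
)
print(flow)
print(txn)
```

The firewall record says only that one address talked to another on port 443; the proxy record also says which page was requested and whether the certificate checked out.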

They added: "Zscaler doesn't store this data, but it stores the metadata about the transaction. For example if someone sends a Word document containing credit card information, which gets blocked by Data Protection policy, it can log the SHA1 hash of the file and transmit the offending document securely to the Data Privacy officer of the customer.

“So it has full log information, and full traceability of all transactions.  It doesn't store customer data (the payload) but stores log information to identify all of the transactions which pass through [its cloud platform].”
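The "metadata, not payload" idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Zscaler's Data Protection engine works: the names `inspect_upload`, `DlpLogEntry` and `luhn_valid` are hypothetical, plain text stands in for a parsed Word document, and card detection is reduced to a regular expression plus a Luhn checksum.

```python
import hashlib
import re
from dataclasses import dataclass

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to weed out random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass(frozen=True)
class DlpLogEntry:
    """Hypothetical log record: describes the file but never contains it."""
    filename: str
    sha1: str
    size_bytes: int
    policy: str
    action: str


def inspect_upload(filename: str, payload: bytes) -> DlpLogEntry:
    # Real engines parse file formats; here the payload is treated as plain text.
    text = payload.decode("utf-8", errors="ignore")
    hit = any(
        luhn_valid(re.sub(r"\D", "", m.group()))
        for m in CARD_PATTERN.finditer(text)
    )
    return DlpLogEntry(
        filename=filename,
        sha1=hashlib.sha1(payload).hexdigest(),  # fingerprint only, no content
        size_bytes=len(payload),
        policy="credit-card" if hit else "none",
        action="blocked" if hit else "allowed",
    )


print(inspect_upload("notes.txt", b"card: 4111 1111 1111 1111"))
```

The resulting entry identifies the file by its SHA-1 hash and records the policy decision, while the bytes themselves are discarded once the function returns.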

Their explanation came after Zscaler continued to build out its secure access service edge (SASE) proposition, launching an SD-WAN offering that puts it on a level footing with the likes of Cato Networks as a single-vendor SASE provider. (SASE, in essence, involves having someone run your network, East to West and North to South, together with your security as a highly integrated managed service.)

Zscaler's FY2024 by the numbers.

Offering an SD-WAN element, so that its network and security proposition reaches right down to branch offices or industrial hubs, has been important for Zscaler. The company said a “top 10 pharmaceutical company” had bought this to “protect over 30 manufacturing sites, eliminating the need for firewalls and making each site like a Starbucks.”

Zscaler’s CEO also revealed that the company now serves 13 of the US government’s 15 executive departments, including the Pentagon, and 40% of the Fortune 500. Among other wins was a large SI with 300,000 users, which Chaudhry said is consolidating multiple point products, including secure web gateways, load balancers, VPNs, firewalls, and its MPLS network.

For the full year its revenue was $2.17 billion, up 34% year-on-year.

Data centre CapEx was approximately 8% of revenue.

Shares fell sharply, though, on an outlook that underwhelmed analysts. GAAP profit remains some years out, although Zscaler trimmed its net loss to $57.7 million, from $202.3 million in fiscal 2023.
