If ‘data is the new oil’ then organisations need to be aware of oil spills. Exposed data like employee credentials is responsible for 54% of all attacks on United States federal agencies, according to CISA – and poor controls around customer data can result in colossal fines for those who fail to stay attuned to the global regulatory regimes that govern them.
The flipside of this story: From global fast-food chains to pharmaceutical providers and multinational banks, putting first-party data to powerful use is a priority of CDOs and CIOs alike – not least with pressure mounting to unlock value from generative AI, something that requires harmonised and cleaned-up datasets from across these heterogeneous regulatory environments, as well as diverse IT databases and applications.
BigID’s Brenton Gumucio cares personally about security – a career in the Marine Corps, then working on secure authentication processes with the National Geospatial Intelligence Agency means it’s more than just a job.
Enter BigID...
BigID, a rapidly growing startup that names several large enterprises and retail giants among its customers, finds, analyses and ‘de-risks’ this identity data.
That might be exposed Personally Identifiable Information that could cause a GDPR (or Executive Order 14117) incident, or exposed passwords, private keys, and security tokens that are manna to the cybercriminals wreaking sustained havoc on major industries around the world.
As BigID Regional Vice President (RVP) Brenton Gumucio told The Stack: “We accurately identify and classify and categorise data at large scale for really complex organisations. This could be a large retailer looking for PII for regulatory compliance use cases, or pre-sanitising data that's going into AI or LLM models to make sure that no sensitive data gets uploaded.
“This is all about reducing risk – and where to derive value from data.”
BigID uses machine learning and other patented data mapping tools to discover and understand the context around personal data. It takes an approach based around its “4 Cs” of “catalog, classification, cluster analysis, and correlation” across not just traditional structured data like SQL database rows, but also semi-structured, and unstructured data.
Enter MongoDB…
It’s been able to build out that capability in large part because of the database that underpins it. BigID relies on MongoDB Atlas.
Sitting down with The Stack, Gumucio goes as far as to say that MongoDB is invaluable, noting “it’s not just an application database, it's also an analytical layer on top of the database.”
He adds: “We have a large European bank, where we're scanning all of their mailboxes for 800 million people's attributes and entities; hundreds of millions of entities… All of that data gets pushed back into MongoDB, and we do aggregate queries, and sorting and functions to derive insights and [understand] how sensitive this data is on top of MongoDB…”
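To make the “aggregate queries, and sorting” Gumucio describes more concrete, here is a minimal Python sketch. The field names (`source`, `classification`) and the sample records are hypothetical and are not BigID's schema. The `pipeline` uses MongoDB's `$group` and `$sort` aggregation stages; because running it needs a live database, a pure-Python function that computes the same counts is included so the example runs on its own.

```python
from collections import Counter

# Hypothetical scan findings. Field names are illustrative assumptions,
# not BigID's actual data model.
findings = [
    {"source": "mailbox", "classification": "email_address"},
    {"source": "mailbox", "classification": "passport_number"},
    {"source": "mailbox", "classification": "email_address"},
    {"source": "file_share", "classification": "email_address"},
]

# A MongoDB aggregation pipeline expressing "count findings per
# classification, most frequent first". With a driver such as PyMongo,
# this list would be passed to collection.aggregate(pipeline).
pipeline = [
    {"$group": {"_id": "$classification", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def count_by_classification(docs):
    """Pure-Python equivalent of the pipeline above, for illustration."""
    counts = Counter(doc["classification"] for doc in docs)
    # most_common() yields (value, count) pairs sorted by count, descending.
    return [{"_id": name, "count": n} for name, n in counts.most_common()]


summary = count_by_classification(findings)
```

The point of the design is that grouping and sorting happen where the data lives, rather than in a separate analytics system.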
The company chose MongoDB early on because of its incredibly flexible document model, he adds. Unlike SQL databases that require users to declare a table's schema before inserting data, MongoDB does not require its documents (data records stored in its “BSON” format) to have the same schema. For customers like BigID that need to adapt to different enterprises’ data discovery and mapping requirements, this is powerful: document structures can be updated flexibly and dynamically, for example to map new data records to an entity or an object.
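As a rough illustration of that flexible document model, the Python sketch below keeps two differently shaped records side by side and then adds a new field to just one of them. Everything here is an assumption made for the example: the `entities` records, the `linked_records` field and the `apply_set` helper are hypothetical, and the helper is only an in-memory stand-in for MongoDB's `$set` update operator so the code runs without a database.

```python
# Two records in the same hypothetical "entities" collection. Unlike rows
# in a SQL table, they do not need to share the same fields.
entities = [
    {"_id": 1, "type": "customer", "email": "a@example.com"},
    {"_id": 2, "type": "employee", "badge_id": "E-1042", "teams": ["payments"]},
]

# A MongoDB-style update document: "$set" adds or overwrites fields.
# With PyMongo this would be passed to collection.update_one(filter, update).
update = {"$set": {"linked_records": ["crm:8841"]}}


def apply_set(doc, update_doc):
    """In-memory stand-in for "$set" (top-level fields only)."""
    return {**doc, **update_doc.get("$set", {})}


# Map a newly discovered record to the first entity. No schema migration
# is needed, and the second document is left exactly as it was.
entities[0] = apply_set(entities[0], update)
```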
Gumucio said: “The relationship we have with MongoDB [is one of] great partners. We couldn't do what we do at the scale that we do it on a traditional database. Most databases were created when people listened to cassettes on a Walkman. Mongo came out with a data model that allows us to continually iterate, adapt, and update.
“We can add new functionality, change our data model with the times and run super complex queries on top of it, to do a lot of the heavy lifting that we need.”
Gumucio himself knows a few things about iteration and adaptation.
Despite being a hands-on engineer himself and speaking with the fluency of someone deeply comfortable with both BigID’s and MongoDB’s platforms, his was not the traditional journey into technology.
As he explains modestly to The Stack, he had “probably the most unorthodox journey to BigID that you can have” – leaving high school to join the Marine Corps; then later joining the French Foreign Legion.
Returning from some overseas stints (that no doubt deserve a lengthy article in their own right) and looking to up-skill, he took an intensive coding bootcamp (“seven months: coding day-in, day-out…” he recalls).
A handful of existing security passes from his military career plus his new-found engineering chops came in handy. Gumucio wound up doing software engineering for the National Geospatial Intelligence Agency and the FBI via a technology consultancy focused on Identity and Access Management (IAM), before ultimately joining rapidly growing BigID.
“In 2019 I read this story about a company called BigID that at the time was solely serving privacy (GDPR, CCPA, CPRA and all of that),” he recalls.
“But in the identity access management space, managing entitlements is complicated. No one really knows who has access to what; there’s often these very manual certifications that have to happen quarterly, or annually where people attest to ‘Brenton needs access to this data.’
“That made me really want to make the jump to BigID as I knew it was only a matter of time [before security also became a core part of its capabilities.] We've seen that come to fruition: Data context is now an incredibly big part of the security organisation in reducing risk,” he adds.
“People are sending messages on Slack or Teams,” he explains. “That represents a very real data risk that is ungoverned by traditional tools. We can iterate over scanned messages, identify where maybe you and I are in the same Scrum team, and I’m asked ‘Hey, can you send me the password for this database?’ That's what we find, identify, help protect and reduce risk from,” he says, adding that doing it at global enterprise scale would not be possible without the close partnership with MongoDB.
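For a sense of what iterating over scanned messages can look like in its very simplest form, here is a toy Python sketch. It is not BigID's method: the article describes machine learning and patented tooling, whereas this example swaps in two hand-written regular expressions. The pattern names, the `scan_messages` helper and the sample messages are all hypothetical.

```python
import re

# Toy detection patterns. These are assumptions for illustration only;
# real classifiers need far more context than a regular expression.
PATTERNS = {
    "password_share": re.compile(
        r"\b(?:password|passwd|pwd)\s*(?:is|:|=)\s*\S+", re.IGNORECASE
    ),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan_messages(messages):
    """Return (message index, finding label) for each pattern hit."""
    hits = []
    for i, text in enumerate(messages):
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append((i, label))
    return hits


messages = [
    "Hey, can you send me the password for this database?",
    "Sure, the password is hunter2",
    "Standup moved to 10am",
]
hits = scan_messages(messages)
```

Only the reply that actually contains a credential is flagged. The request itself is not, which hints at why context about who is talking to whom matters.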