The modern enterprise runs on a complex web of information services, applications and data. So interwoven are the threads within this digital layer that the industry has popularised the term ‘data fabric’ to express the textural nature of the technology architectures now being deployed, writes Shawn Rogers, VP Analytics Strategy, TIBCO
If we accept the fact that any given business will operate with different forms of data (be it transactional, time-stamped, geo-tagged, machine data or structured, semi-structured and unstructured) across multiple locations, all emanating from and feeding into a variety of applications and services, then a data fabric is a means of coalescing all those resources into a single, unified view of the data estate.
A leading analyst firm describes a data fabric as a ‘design concept’ for attaining reusable and augmented data integration services, data pipelines and semantics for flexible and integrated data delivery. Used effectively, a data fabric can enable data management and integration to be delivered across multiple deployments and orchestration platforms.
The benefits that come from using a data fabric include faster results and a more automated level of data access and sharing throughout an organisation. Combined with new and sometimes esoteric approaches to data management (such as DataOps, which plugs the DBA function more directly into day-to-day operations), a data fabric can work with still-nascent technologies like semantic enrichment, AI/ML-assisted active metadata and knowledge graphs.
The ‘so what’
Of course, the litmus test for any technology that operates at this kind of functional substrate level is the ‘so what’ factor. Data fabrics work to help shape data management and integration, but so what?
The real ‘so what’ here is the fact that modern IT systems have to channel an increasingly complex array of data sources. Traditional database channels are now joined by machine data streams from the Internet of Things (IoT) and all the log file data we now seek to work with as we turn up the dial on the digital instrumentation of devices at every level.
What all that comes down to is one word: expansiveness. Modern enterprise systems have to shoulder the management burden created by an expansive array of data: more users, more devices, more machine information streams and more third-party data sources, all forming connections to an increasing diversity of business-critical use cases.
Data fabrics are inherently not small; they are expansive by nature and by definition. That matters, because we need to be able to draw in data from an expansive range of sources and then combine and transform it. Some of it will be data-in-motion and some will be data-at-rest – and much of it will be diverse, distributed and disaggregated. Using a data fabric enables us to wrap ourselves around that breadth and complexity of data.
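To make the idea of wrapping around both data-in-motion and data-at-rest a little more concrete, the sketch below (plain Python with pandas, using entirely hypothetical table names, fields and a made-up fetch_events() helper rather than any particular fabric product’s API) joins a static reference table with a batch of incoming events to form one unified view.

```python
# A minimal, illustrative sketch only – not any particular product's API.
# Table names, fields and fetch_events() are hypothetical.
import pandas as pd

# Data-at-rest: a customer reference table (e.g. a warehouse extract).
customers = pd.DataFrame([
    {"customer_id": 42, "region": "EMEA"},
    {"customer_id": 77, "region": "APAC"},
])

def fetch_events():
    """Stand-in for a data-in-motion source such as an IoT or log-file stream."""
    return pd.DataFrame([
        {"customer_id": 42, "device": "sensor-7", "reading": 19.4},
        {"customer_id": 77, "device": "sensor-2", "reading": 22.1},
    ])

# Enrich each incoming batch against the at-rest reference data, producing the
# single, combined view an application would consume.
unified_view = fetch_events().merge(customers, on="customer_id", how="left")
print(unified_view)
```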
Data fabric validation... why should I?
So why does any given business need a data fabric? Simply put, a data fabric is a direct route to reducing risk. In a world of highly distributed and diverse data silos that have overstretched current data management, integration and delivery processes and broken traditional data architectures, risk is a real factor.
If business users know that there is less risk in using an application or service that draws upon a trusted set of data, then they can more readily make mission-critical decisions based upon those technologies. In a world where data compliance and privacy regulations demand greater data governance and security than ever before, no-risk (or quantifiably low-risk) data can be a prime advantage.

But what’s really different here is that this is not just an IT architectural play. We’ve built distributed architectures before, and simply creating a new form of connective tissue to thread an IT system together does not create the perfect beast. We know that distributed architectures can create a complex web of interconnected data, but that alone does not take care of aspects of data engineering such as data quality, or ensure data security and privacy.
We need to be able to bring together relevant data sets to provide the right combination of information streams to the right applications at the right time. This is where AI and ML delivered through the data fabric can make the difference. We can train data models over time by exposure to data streams and enable AI and ML to get smarter; but before that, we can start to perform ‘data discovery’ processes to identify which data sets and instances could and should be built into the logic of the system we are looking to create.
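As a rough illustration of that kind of data discovery step, the hypothetical Python sketch below profiles a couple of invented data sets, recording column types, null rates and distinct counts – the sort of metadata an AI/ML-assisted catalogue would feed on.

```python
# Hypothetical data-discovery pass: profile candidate data sets so we know what
# fields exist and how complete they are before deciding what feeds the models.
import pandas as pd

def profile(name: str, df: pd.DataFrame) -> pd.DataFrame:
    """Return simple per-column metadata: type, null rate and distinct count."""
    return pd.DataFrame({
        "dataset": name,
        "column": df.columns,
        "dtype": [str(t) for t in df.dtypes],
        "null_rate": df.isna().mean().values,
        "distinct": df.nunique().values,
    })

# Made-up example data sets standing in for real sources.
orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, None, 7.5]})
sensors = pd.DataFrame({"device": ["a", "b"], "reading": [19.4, 22.1]})

catalogue = pd.concat([profile("orders", orders), profile("sensors", sensors)])
print(catalogue)
```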
Will it hurt?
We must remember that a data fabric is not a single product. It combines an integrated collection of data management and data integration capabilities, as well as shared data assets, deployed in support of a distributed data architecture. It is not a rip-and-replace effort and it is not a cookie cutter implementation for the business to apply ‘carte blanche’ across its operational base.
A business might start by looking at the aspects of the data fabric that relate to its customer domain as a more strategic first-stage approach, which could allow some ‘Proof of Concept’ innovation to go on inside the data team itself.
Organisations will naturally wonder whether it is painful and expensive to implement a data fabric. The most prudent answer is: perhaps a little, at first... but the process gets progressively easier from day one onwards. A robust data fabric helps an enterprise connect to multiple data sources with pre-built connectors and pre-packaged components. It also supports wider data sharing and provides built-in data preparation and data quality controls. Add this to the data fabric’s more holistic ability to deliver data governance and it’s easier to see why the Return on Investment is solid.
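By way of illustration only, the short Python sketch below mimics the pattern of pre-built connectors plus built-in quality controls: two stand-in connector functions supply records, simple declarative rules check them, and only passing rows flow into the shared, governed view. The connector names, fields and rules are invented for the example.

```python
# Illustrative only: stand-in 'connectors' plus simple declarative quality rules.
# Connector names, fields and rules are invented for this example.
import pandas as pd

def from_crm():
    """Stand-in for a pre-built CRM connector."""
    return pd.DataFrame({"customer_id": [1, 2, 2],
                         "email": ["a@x.com", None, "b@x.com"]})

def from_billing():
    """Stand-in for a pre-built billing connector."""
    return pd.DataFrame({"customer_id": [1, 2], "balance": [120.0, -5.0]})

# Built-in quality controls expressed as named rules over a data set.
rules = {
    "email is present": lambda df: df["email"].notna(),
    "customer_id is unique": lambda df: ~df["customer_id"].duplicated(keep="first"),
}

crm = from_crm()
for rule, check in rules.items():
    print(f"{rule}: {(~check(crm)).sum()} failing row(s)")

# Only records passing every rule flow into the shared, governed view, where
# they can be combined with data from other connectors.
passing = pd.concat([check(crm) for check in rules.values()], axis=1).all(axis=1)
governed_view = crm[passing].merge(from_billing(), on="customer_id", how="left")
print(governed_view)
```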