Toward a True Data Fabric for the Modern Enterprise

Whitepaper

© 2022 Larsen & Toubro Infotech Limited
Global data creation is expected to surpass 180 zettabytes by 2025.¹ It's an embarrassment of riches, in more ways than one. Despite the vast amount of data that every organization creates, stores, shares, and consumes, the journey from data to decision-making remains a challenge.

New research from New Vantage Partners underscores the prevalence of data problems. The firm's Big Data and AI Executive Survey 2021 revealed that only 48.5% of leading companies are driving innovation with data.² What's more, only:

• 41% compete on analytics
• 39% manage data as a business asset
• 30% have a well-articulated data strategy for their company
• 24% have forged a data culture
• 24% have created a data-driven organization

It's not just minor players that are failing to become data driven. This survey includes major brands such as AIG, Pfizer, Starbucks, and Walmart. As for what's holding the world's leading brands back from building data-driven organizations, a close look at the data cataloging and data management ecosystems those organizations use provides some important insights.

In this whitepaper, we'll examine how the widespread use of multiple data cataloging systems is failing to fulfill data needs and deliver ROI for organizations. We'll explore how this limitation affects core data personas, as well as data accessibility and ROI across the organization. Finally, we'll outline current best practices for data cataloging, bringing in the latest research, analyst insights, and solutions to make recommendations for a better approach to building a true data fabric architecture.

¹ https://www.statista.com/statistics/871513/worldwide-data-created/
² https://www.newvantage.com/thoughtleadership
Contents

Why the Current Data Fabric Landscape Needs a Change
  Lack of Data Uniformity, Integrity, Access, and Activation Across the Organization
  Inability to Justify the ROI of Multiple Data Cataloging Tools
  Data Impermanence
Empowering Key Personas on Their Distinct Data Journeys
  Data Protection Officer (DPO)/Chief Data Officer (CDO)
  Data Owners
  Data Engineers
  Data Scientists and Data Analysts
  Business Analysts
Best Practices for Better Data Fabric Architecture
  1. Automated Intelligence
  2. Robust Collaboration
  3. Data Discovery
  4. Lineage Analysis
  5. Data Governance/Protection
  6. Metadata Curation
An Introduction to Fosfor Optic
  Key Value Drivers
  What Forrester Has to Say
  Where Optic Wins
  Case Study 1: Global Energy Management & Automation
  Case Study 2: Global Pharmaceutical Major
Toward a Unified, Governed, and Collaborative Intelligence System
Why the Current Data Fabric Landscape Needs a Change

In theory, the "end-to-end data journey" makes perfect sense. One begins with the raw materials (in most cases, large amounts of data from many different sources) and creates pipelines for turning that data into real business outcomes.

In practice, however, those journeys are myriad, and the personas interacting with them vary in terms of expertise, business use cases, and immediate needs. Without the right approach to data fabric architecture, most enterprises soon find that their end-to-end data journeys are broken. They end up using different tools for cataloging, data quality, data profiling, dashboarding, and so on. This multi-tool approach to data fabric creates a number of disadvantages, as discussed in the following sections.

What is a Data Fabric?

Gartner describes a data fabric as "an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms."³

³ https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration
Lack of Data Uniformity, Integrity, Access, and Activation Across the Organization

One of the most ubiquitous obstacles to creating a uniform data fabric layer is the data silo. Data silos, plural, to be more accurate: 451 Research found that 25% of organizations have more than 50 distinct data silos, including 39% of organizations that identify as "data-driven."⁴

Why is this such a problem? For one thing, diverse data users rely on data activation to unlock value from data, developing the insights that inform business decisions.⁵ However, fragmented data silos can lead to inconsistencies, inaccuracies, and other problems with data quality. By some estimates, poor data quality alone leads to an average of $15 million per year in losses.⁶

Additionally, it can be difficult to provide safe, organization-wide access to restricted or governed data. This leads to a lot of back-and-forth communication to grant safe access, which puts yet another drag on the data-to-decisions journey.

Inability to Justify the ROI of Multiple Data Cataloging Tools

One can make a strong business case for a data cataloging tool. A report from Eckerson Group suggests that 80% of self-service analysis time without a data catalog is spent preparing data for analysis; a data catalog can cut that percentage to 20.⁷

⁴ https://www.sinequa.com/press/sinequa-releases-451-research-pathfinder-report-information-driven-compliance-and-insight-two-sides-of-the-same-coin/
⁵ https://www.dnb.com/content/dam/english/economic-and-industry-insight/forrester-b2b-data-activation-priority-2018.pdf
⁶ https://www.gartner.com/smarterwithgartner/how-to-create-a-business-case-for-data-quality-improvement
⁷ https://www.eckerson.com/articles/the-business-value-of-a-data-catalog
Indeed, a data catalog can shorten the time to value for disparate data users. Benefits include:

• Streamlined data preparation and analysis
• Alignment with data governance efforts
• Deeper data insights and relationships across various data sources
• Semantic data discovery
• Dynamic evaluation of the data quality (DQ) index on each data load
• Quick access to already identified data
• Integrated ways to perform data lineage
• A data health check dashboard
• Seamless data consumption and integration with various tools
• Auto-generated tags, relationships, recommendations, and entity extractions

However, the investment in even a single data cataloging tool is significant, spanning software, data sourcing, and training, and managing multiple tools multiplies that effort. When this kind of investment must be made for several tools at once, the underutilization of each tool's features and capacity makes it difficult to justify the return on investment.

Data Impermanence

Adopting multiple substandard tools to fulfill the specific requirements of different business personas is not a permanent solution to the ongoing need to deliver data across every business level. An investment in a proper data catalog, with out-of-the-box features designed specifically for data fabric, can fulfill the requirements of every persona across the enterprise.
Empowering Key Personas on Their Distinct Data Journeys

Most data-driven enterprises need their data fabric architecture to enable the work of five core personas. Each persona encounters different interactions, checkpoints, and usage patterns that must be accounted for within the data fabric architecture, as follows:

Data Protection Officer (DPO)/Chief Data Officer (CDO)

The role of CDO continues to evolve and increase in prominence. Research from Forrester found that 58% of organizations had appointed CDOs.⁸ Yet the aforementioned research from New Vantage Partners indicates that only 33% of companies have well-established, successful CDO roles.

To make more informed decisions, CDOs need a consolidated, definitive view of data, including the health of data and frequency of input. Unfortunately, many CDOs today lack a comprehensive way to look at overall data health statistics. What's more, many CDOs struggle to apply organization-level compliance checks on their data, nor can they mask and unmask personally identifiable information (PII) in a particularly efficient way. This hobbles their decision-making capabilities, whether they realize it or not.

Organizational leadership relies on the CDO to uncover insights and business impact from various data operations. The CDO is asked to provide business justifications for investments in new data management practices and solutions, data fabric architecture included. As such, CDOs rely on the quality of data, compliance, governance policies, and data accessibility to inform their decision-making.

⁸ https://www.forrester.com/blogs/chief-data-officers-rule-and-deliver-results/
Data Owners

Comprehensive control over data visibility is essential to the core role of a data owner. Unfortunately, many find it cumbersome to manage data controls on a day-to-day basis, or lack granular access and governance controls altogether. To ensure the well-governed operation of the data domains they oversee, data owners need intuitive ways to publish data assets and control access. With respect to data cataloging, data owners will likely rely on profile-based policies for access control while overseeing data glossary and data quality activities.

Data Engineers

Many data engineers share an inability to trace data quality issues back to their source and pinpoint the root cause of data corruption. They lack a single view to evaluate data health statistics or identify relationships with existing datasets: two vital pieces of their daily responsibilities.

What data engineers need is a simple, integrated way to perform data lineage. Most would save a lot of time and effort if they could quickly narrow in on the table or column containing inappropriate data. Most data engineers would also benefit from a data health check dashboard, so they could keep tabs on the data coming into the system, as well as which data currently resides in silos.

Data Scientists and Data Analysts

For these roles, data quality and profiling are essential. As data scientists and analysts create reports on the fly, set up data pipelines, or push data to an ETL process, they need to first trust the data they're working with. This often requires a lot of cleaning and wrangling, tasks that can consume 80% of a data scientist's valuable time.⁹ Under these conditions, data scientists and analysts continually struggle to trust data quality on each data load. To remove these hindrances, they need a quick, semantic path to data discovery, including a dynamic way to evaluate the data quality index on each data load.
⁹ https://towardsdatascience.com/a-data-scientists-guide-to-identify-and-resolve-data-quality-issues-1fae1fc09c8d
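To make the idea of a per-load data quality index concrete, here is a minimal sketch of one. The function name, the three checks, and the weights are illustrative assumptions, not Optic's actual scoring; any real catalog would use a richer rule set. The sketch combines completeness, row uniqueness, and type consistency into a single 0-100 score with pandas:

```python
import pandas as pd

def data_quality_index(df: pd.DataFrame, weights=(0.5, 0.3, 0.2)) -> float:
    """Return a 0-100 quality score from three simple checks.

    completeness:     share of non-null cells
    uniqueness:       share of non-duplicate rows
    type_consistency: share of columns that are not a catch-all 'object' dtype
    """
    w_complete, w_unique, w_types = weights
    completeness = df.notna().to_numpy().mean()
    uniqueness = 1.0 - df.duplicated().mean()
    # 'object' columns often hide mixed or mistyped values after a messy load
    type_consistency = (df.dtypes != "object").mean()
    score = 100 * (w_complete * completeness
                   + w_unique * uniqueness
                   + w_types * type_consistency)
    return round(score, 1)

# A freshly loaded batch with a duplicate row and missing regions
loaded = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "region": ["EU", None, None, "US"],
    "revenue": [100.0, 250.0, 250.0, 90.0],
})
print(data_quality_index(loaded))  # → 77.5
```

Running such a check on every load is what turns "trust the data" from a hope into a measurable gate: a pipeline can refuse to promote a batch whose score falls below an agreed threshold.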
Business Analysts

This group of data users needs the ability to easily find and consume reliable data across data sets. Typically, business analysts face three primary challenges:

• Inadequate data presentation
• Problematic data integration that inhibits data identification
• A deficient (or non-existent) process for automatically generating tags, relationships, and suggestions

Ideally, analysts have a semantic way of discovering data assets, with multi-facet search support to ensure that no useful intelligence goes overlooked.
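The automatic tag generation mentioned above can be approximated with classic TF-IDF scoring over asset descriptions: terms that are frequent in one asset but rare across the catalog make good candidate tags. This is a hand-rolled sketch with an illustrative mini-catalog and stopword list, not a description of any product's tagging engine:

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "for", "and", "with", "by", "to", "in", "on"}

def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def suggest_tags(descriptions: dict[str, str], asset: str, k: int = 3) -> list[str]:
    """Rank candidate tags for one asset by TF-IDF over all asset descriptions."""
    docs = {name: tokenize(text) for name, text in descriptions.items()}
    n_docs = len(docs)
    df = Counter()  # document frequency: how many assets mention each term
    for tokens in docs.values():
        df.update(set(tokens))
    tf = Counter(docs[asset])
    # Smoothed IDF: terms shared by every asset score near zero
    scores = {term: count * math.log((1 + n_docs) / (1 + df[term]))
              for term, count in tf.items()}
    # Highest score first; alphabetical tiebreak keeps output deterministic
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

catalog = {
    "sales_orders": "monthly sales orders by region with customer revenue",
    "customer_master": "customer master records with region and segment",
    "hr_payroll": "payroll records for employees by month",
}
print(suggest_tags(catalog, "sales_orders"))  # → ['monthly', 'orders', 'revenue']
```

Note how shared terms like "region" and "records" are down-weighted automatically; the suggested tags are the ones that distinguish an asset, which is what a business analyst searching the catalog actually needs.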
Best Practices for Better Data Fabric Architecture

What breakaway enterprises understand is that the first step to becoming a data-driven organization is knowing and managing data assets. The closer an enterprise can bring its data users to the intelligence they need, the more efficient the data-to-insights pipeline becomes.

One approach to data fabric is to weave data across enterprise assets. That means ensuring that data is readily available at every step of the journey for specific data users to generate insights from. For example, can the data analyst create reports on the fly? When every step of the data journey is closely knit, stakeholders from top to bottom can go beyond just data cataloging.

Indeed, a robust data cataloging solution not only automates the recording of metadata, but makes possible persona-based, simple, intuitive, and contextual interfaces for faster discovery. This is now a requirement for many enterprises.¹⁰ As organizations accelerate toward more ambitious digital transformations, they need more than what traditional data cataloging solutions can offer, namely:

1. Automated Intelligence

Also known as intelligent cataloging, automated intelligence uses machine learning and AI to automatically tag, classify, and organize data assets. With the help of advanced cognitive engines, automated intelligence can provide relevant suggestions based on data asset popularity, content, user preferences, and so on.

¹⁰ https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/designing-data-governance-that-delivers-value
By mining and establishing connections between data sets, automated intelligence supports the discovery of similar or related assets in the catalog. Additionally, automated intelligence algorithms are used to generate the data trustability index (DTI) on the fly for both sample data and complete data. Data science teams often rely on KPIs related to DTI to get a better sense of the data.

2. Robust Collaboration

Democratize data consumption by providing an accessible data catalog that allows even non-technical users to locate and utilize data. The ability to share metadata in a collaborative way, using an appropriate access request flow, can be further enhanced by help desk and self-service features. Look for solutions that enable secure exposure of data to downstream activities, such as business intelligence reporting, as well as to third-party applications outside the catalog (using specific data APIs).

3. Data Discovery

Automated intelligence and collaboration are two key pillars of any data catalog. Together they help build trust and confidence in the data. A modern data catalog solution ought to support these pillars with robust discovery features, such as:

• Facets and contextual search
• Metadata, profile, and sample records
• A favorites list for frequently used assets

4. Lineage Analysis

Data professionals responsible for data lineage need reliable answers to two important questions: Where did the data originate? How (and why) has the data changed since its point of origin?
The answers to these questions help analysts see the end-to-end data journey, from point of origin to end-user consumption. When something goes wrong with data, such as data corruption or discrepancies, data lineage helps analysts quickly identify the root cause. Additionally, data lineage provides analysts with a complete field-level view of the different data types in play.

To understand data lineage in practice, let's say an analyst performing data lineage identifies incorrect values in a particular column of a dataset. Ideally, the analyst can double-click on each asset and trace it to the other data sets involved to understand how that column was incorrectly modified. Overall, strong data lineage can contribute to better data quality, streamline IT impact analysis, and improve business applications.

5. Data Governance/Protection

Integrating an automated data governance solution with a data catalog ensures data users can access data compliantly and securely, according to their needs. While all users access the same data catalog, only users with the appropriate permissions will have access to certain data sets, thereby protecting sensitive data. Key capabilities include:

• A user management system with robust integrations (SSO, LDAP, AD)
• Persona-based access control with support for a variety of categories
• Multi-hierarchical access approval flows
• Centralized activity notifications for each authenticated user

6. Metadata Curation

As organizations increasingly adopt hybrid multi-cloud environments, a data catalog tool that can connect to and extract metadata from multiple sources (data warehouses and lakes, ETL, and BI tools) is key to scaling data access in a centralized catalog. This requires a one-stop location to maintain the organization's metadata with proper controls and restrictions.
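As a concrete illustration of the metadata extraction described above, the sketch below harvests table and column metadata from a SQLite database using only the Python standard library. The tables and the shape of the returned catalog are illustrative; a production connector would do the equivalent against each warehouse, lake, or BI tool and merge the results into a central metadata store:

```python
import sqlite3

def harvest_metadata(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Collect (column, declared type) pairs for every table in one SQLite source."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    catalog = {}
    for (table,) in cur.fetchall():
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        catalog[table] = [(name, ctype) for _, name, ctype, *_ in cols]
    return catalog

# In-memory stand-in for one of many enterprise sources
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
print(harvest_metadata(conn))
```

One harvester per source plus a shared catalog schema is the essence of the "one-stop location" pattern: the controls and restrictions then only need to be enforced in one place, regardless of where the underlying data lives.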
An Introduction to Fosfor Optic

The right catalog solution can help support a data fabric architecture that delivers the kind of business value today's enterprises insist on. That's why actionable insights are such an essential deliverable of enterprise data management and the data catalog journey.

Fosfor Optic is an autonomous, intelligent data cataloging product with features that enable an enterprise data fabric out of the box. It empowers business users by creating a modern data culture with democratized data and intelligence assets, including an added layer of intelligent governance. As such, Optic is well-suited to enabling workplace productivity. It maximizes ROI by creating a data marketplace where people can access valuable insights in less time with the help of a unified data management architecture.

Key Value Drivers

• Cognitive data discovery from 50+ connectors for each defined persona
• Continuous execution of state-of-the-art analytics over existing data assets
• Auto-discovery of new insights, knowledge nuggets, and relationships
• A 360-degree experience from discovery to consumption, including data quality index, data lineage, and data traceability

While there's a healthy debate as to whether a single solution can handle every aspect of data fabric architecture, we agree with Gartner that the right solutions can help digital leaders build data fabric better.¹¹ Optic provides a unified platform to connect all data assets while facilitating superior data accessibility, governance, sharing, and privacy.

¹¹ https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration
What Forrester Has to Say

In "The BI Fabric Baby Is Slowly But Surely Growing Up," Forrester Vice President and Principal Analyst Boris Evelson underscores the importance of data catalogs in creating a more tightly integrated business intelligence fabric:

"A single place to catalog all data sources used by multiple enterprise BI platforms. This is also a single place (a common BI portal can also be used for this function) for data source governance — tagging data sources with the levels of data quality and approved use cases, promoting data sets from development to production, etc. [...] productized offerings from global system integrators like LTI's Mosaic Catalog (recently rebranded as LTI's Fosfor Optic) can be used for this purpose."¹²

Where Optic Wins

Today, 86% of organizations use two or more business intelligence platforms.¹³ Data sources are myriad, disparate, and often siloed. This leads to the issues of data quality, integrity, access, and collaboration that we've detailed earlier in this paper. Trust and governance issues tend to crop up as well. Optic helps alleviate these challenges with a sophisticated product architecture and a robust feature set, including:

• Enterprise data discovery that uses a powerful discovery engine to search across a variety of data assets

¹² https://www.forrester.com/blogs/the-bi-fabric-baby-is-slowly-but-surely-growing-up/
¹³ Ibid.
• Hybrid-cloud readiness to catalog metadata from different clouds into a single view
• A BI report catalog for easy discovery of reports, including a unified dashboard
• Intelligent cataloging that leverages ML for smart tagging and recommendations
• A virtual data mart providing a single view of structured, semi-structured, and unstructured data
• Self-service data consumption, so users can write queries on disjointed data sets while keeping data where it actually lives

With Optic, data engineers enjoy baked-in data lineage, including a detailed view of tables and fields, a quality dashboard, and automatic relationships. The state-of-the-art suggestion engine alone can save data engineers an estimated three hours of data exploration per occurrence. And Optic provides out-of-the-box KPIs and data metrics that can save engineers approximately 2.5 hours on data quality measurements per occurrence.

For data scientists and data analysts, Optic offers semantic data search and DQ dashboards out of the box. To quickly access already identified data, they can bookmark data or navigate to what they need using My Collection (including a robust search function). These cognitive data discovery capabilities alone can save up to two hours per occurrence on data search and exploration. As for evaluating data flows, data scientists and analysts can save up to two hours per occurrence thanks to integrated data lineage.

For business analysts, Optic allows for seamless consumption of data and integration with various analytics tools. Additionally, Optic offers automatic tagging, business terms, recommendations, and entity extractions. The data quality engine built into Optic can save approximately 2.5 hours on data quality and profiling per occurrence.

Finally, Fosfor Optic tackles the challenge of proving ROI, as the following case studies demonstrate.
Case Study 1: Global Energy Management & Automation

A global energy management and automation firm was facing issues with its IT landscape, which included 4,000 application systems and 60 different ERP implementations. Because these systems were functioning in silos, insight-based decision-making was a challenge. The firm's CIO was looking for a data marketplace solution that could:

• Bring autonomy of data access to global information seekers
• Ensure foundational information security, privacy, and governance
• Provide data-as-a-service capabilities to ingest and consume data on demand
• Integrate data from any data source (enterprise, structured/unstructured, social media, IoT, partner ecosystem)
• Build new services on top of the data platform to support and accelerate innovation
• Promote a culture of data-driven decision making

Working with LTI, the firm deployed Optic and Spectra on an AWS Cloud instance in just four weeks. The resulting intelligent data store became the foundation for the proposed data marketplace, leading to the following business benefits:

• 100+ information assets scanned and curated into the marketplace
• An 80% increase in accuracy when identifying related or duplicate information assets
• A centralized business glossary available to everyone
Case Study 2: Global Pharmaceutical Major

The R&D division of a U.S. leader in pharmaceuticals and clinical trials had gathered comprehensive data during its development of new drugs for oncology and immunology. The data was structured and unstructured, and comprised documents stored in a Microsoft Azure Data Lake. The client used Collibra and MarkLogic to capture data taxonomy and metadata to connect all the structured and unstructured data. However, since these two systems worked in silos, the R&D scientists found it difficult to find all associated data and documents on Azure using relevant keywords. The client needed a solution that could:

• Provide a search portal sitting on top of their existing data infrastructure that could scan metadata from structured and unstructured data residing in Azure BLOB storage
• Integrate with MarkLogic and Collibra to extract the data taxonomy and metadata on the fly, associate it with data from Azure BLOB storage, and make it all searchable

The pharmaceutical major deployed Optic on its Azure platform to create a data marketplace. Thanks to the 50+ data connectors inherent to Optic, the team brought the data assets together for cognitive, intuitive discovery, as well as comprehensive metadata scanning. The result was a unified data accessibility portal on top of the client's existing data infrastructure.

This was the first time that Optic was installed and configured on an Azure platform using a PaaS architecture. It was also the first instance of an enterprise integrating Optic with Collibra using RESTful API calls and scanning metadata from Azure BLOB storage and a MarkLogic server.
The results were remarkable:

• Helped R&D scientists avoid duplication of research and experimentation efforts
• Brought together diverse technologies and siloed data
• Delivered 100% coverage of metadata scans from Azure BLOB storage
• Ensured sub-second response times for API integration calls from Collibra and MarkLogic servers
• Achieved 85%+ accuracy in serving relevant search results for both keyword and faceted search
Toward a Unified, Governed, and Collaborative Intelligence System

If data-driven enterprises have any hope of living up to their name, they'll need better ways to extract business value from their data. Fortunately, a new way of data cataloging is here, one in which the design, deployment, and proper utilization of data, across all environments and platforms, can be enabled by a single data cataloging solution.

In other words: the right data, at the right time, for the right people. It's a simple concept that continues to elude many enterprises, as we've underscored in our scrutiny of the current data landscape. Yet it is possible to create a collaborative data workspace for diverse data users, for everyone from the CDO to business analysts, one that addresses the unique data problems these personas face in their everyday work. The prerequisites?

• Automated intelligence
• Robust collaboration
• Data discovery and lineage analysis
• Data governance
• Metadata curation

These are the five core pillars of a collaborative data workspace. And they form the foundation of the entire Fosfor product suite, Optic included. It's the enterprises that can optimize all aspects of the data-to-decisions lifecycle, that can put the right data in more hands as fast as possible, that will achieve a truly modern data fabric fit for the modern enterprise.
Author's Profile: Sajid Rashiyani

Sajid Rashiyani is Product Head for Optic, a data cataloging product based on data fabric architecture from Fosfor, a product suite by LTI. With 12+ years of industry experience in innovation-driven product design, development, and release, he has spearheaded several industry-specific data and analytics solutions for Fortune 500 clients globally.

Recently, he was named an Ambassador for the Queen Elizabeth Awards in Engineering (UK). In that role, he engages with the brightest early-career engineers from all fields around the world, inspiring them by transferring skills and knowledge and thereby enriching their personal and professional lives.

Sajid enjoys sharing his knowledge and is among the top product management coaches at Upgrad. He holds multiple certifications in product management and cloud, and brings deep expertise in solving complex business problems with out-of-the-box technology solutions. An acclaimed thought leader, Sajid is passionate about sharing his views on technology and AI-driven product innovation at many forums.
The Fosfor Product Suite is the only end-to-end suite for optimizing all aspects of the data-to-decisions lifecycle. Fosfor helps you make better decisions, ensuring you have the right data in more hands in the fastest time possible. The Fosfor Product Suite is made up of Spectra, a comprehensive DataOps platform; Optic, a data fabric to facilitate data discovery-to-consumption journeys; Refract, a Data Science and MLOps platform; Aspect, a no-code unstructured data processing platform; and Lumin, an augmented analytics platform. Taken together, the Fosfor suite helps businesses discover the hidden value in their data.

The Fosfor Data Products Unit is part of LTI, a global technology consulting and digital solutions company with hundreds of clients and operations in 31 countries. For more information, visit Fosfor.com.