In a data mesh architecture, data access is a critical component: it lets data consumers discover and access the data they need in a self-serve manner. Self-serve does not mean that data access is unregulated or uncontrolled, however. A data mesh architecture requires a different kind of control, one that is distributed and decentralized.
In a traditional centralized data architecture, data access is typically controlled by a centralized data team, which can create bottlenecks in the data pipeline and slow down data access. This can be particularly problematic in large organizations with many different data consumers with different needs and requirements.
In contrast, in a data mesh architecture, data access is decentralized and self-serve, which allows data consumers to access the data they need directly. This is achieved through data discovery, data catalogs, and data access APIs. With these capabilities, data consumers can search for and discover relevant data sets, understand their structure and quality, and access them using a standardized interface.
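To make the discover, understand, and access flow concrete, here is a minimal in-memory sketch. It is an illustration only: the `Catalog` and `DataProduct` classes, their method names, and the sample data sets are all hypothetical and do not represent any particular catalog product or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class DataProduct:
    """Catalog entry for one domain-owned data set (hypothetical schema)."""
    name: str
    owner: str
    description: str
    columns: Dict[str, str]               # column name -> type name
    reader: Callable[[], Iterable[dict]]  # how rows are fetched from the source


class Catalog:
    """Toy catalog: register, search, describe, and read data products."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, keyword: str) -> List[str]:
        # Discovery: case-insensitive match on name or description.
        kw = keyword.lower()
        return sorted(
            p.name for p in self._products.values()
            if kw in p.name.lower() or kw in p.description.lower()
        )

    def describe(self, name: str) -> Dict[str, str]:
        # Understanding: expose the structure of a data set.
        return dict(self._products[name].columns)

    def read(self, name: str) -> List[dict]:
        # Access: one standardized call, whatever the backing source is.
        return list(self._products[name].reader())


catalog = Catalog()
catalog.register(DataProduct(
    name="sales.orders",
    owner="sales-team",
    description="Confirmed customer orders",
    columns={"order_id": "int", "amount": "float"},
    reader=lambda: [{"order_id": 1, "amount": 19.5},
                    {"order_id": 2, "amount": 5.0}],
))
catalog.register(DataProduct(
    name="marketing.campaigns",
    owner="marketing-team",
    description="Campaign spend by channel",
    columns={"campaign": "str", "spend": "float"},
    reader=lambda: [{"campaign": "spring", "spend": 1000.0}],
))

matches = catalog.search("orders")
schema = catalog.describe("sales.orders")
rows = catalog.read("sales.orders")
```

The consumer never needs to know where `sales.orders` is stored; each domain team supplies its own `reader`, and the catalog presents a single interface on top.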
Current pain points
A data mesh architecture that lacks a data access layer can run into several pain points, including:
- Data Inconsistency: Each data product team may use its own data storage, which can lead to inconsistent data across the organization. Without a centralized data access layer, it can be challenging to ensure that the data is up-to-date, accurate, and consistent.
- Data Security: Data security is a critical concern in any organization. Without a data access layer, there may be limited control over who can access data, how data is accessed, and how it is protected. This can increase the risk of data breaches, leaks, and unauthorized access.
- Data Governance: Data governance is essential for ensuring that data is properly managed, tracked, and audited. Without a centralized data access layer, it can be challenging to enforce data governance policies consistently across the organization.
- Data Integration: Data mesh architecture emphasizes the need for data interoperability between different data products. Without a data access layer, integrating data from different sources can be a complex and time-consuming process.
- Scalability: As the organization grows and the number of data products increases, managing and scaling those products without a centralized data access layer can become challenging. It can be difficult to ensure that all data products are performing optimally and meeting the organization's needs.
- Complexity: Without a centralized data access layer, the complexity of managing data products and maintaining consistency can increase. This can result in higher costs, longer development cycles, and slower time-to-market.
Canner — The Data Access Layer of Data Mesh
Canner’s "Data Access Layer" is designed around four core principles:
1. Data Virtualization
The Data Access Layer is collaborative and distributed: each silo or data source can be scaled independently or aggregated with others.
2. Data Productization
Transform data models into domain-oriented datasets that are owned by data owners, shared, and governed through open APIs. Metadata and access rules become interchangeable, so the data speaks the language of your business.
3. Data Authorization
Implement a single data authorization framework that spans data sources through data applications and integrates with existing Identity and Access Management (IAM), so that authorization stays consistent across data sources, IAM, and data applications.
4. Data Consumption
Data consumers can generate queries and APIs with intent and contextual settings. These are applied to the corresponding datasets via intent declaration, and then delivered to target consumers for final analytics and display.
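As a rough illustration of how authorization and consumption might meet in one access layer, the sketch below checks a role-based rule before returning any rows. It is not Canner's API: the `AccessLayer` and `AccessRule` names, the column-level rule format, and the sample data are assumptions made for this example, and a real deployment would resolve roles through the organization's IAM rather than a plain string.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class AccessRule:
    """Columns a role may read from one dataset (hypothetical rule format)."""
    dataset: str
    role: str
    allowed_columns: FrozenSet[str]


class AccessLayer:
    """Toy access layer: every query passes through the same rule check."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[dict]] = {}
        self._rules: Dict[Tuple[str, str], AccessRule] = {}

    def publish(self, dataset: str, rows: List[dict]) -> None:
        # A domain team exposes its dataset through the layer.
        self._datasets[dataset] = rows

    def grant(self, rule: AccessRule) -> None:
        self._rules[(rule.dataset, rule.role)] = rule

    def query(self, dataset: str, role: str, columns: List[str]) -> List[dict]:
        # Authorization is enforced here, once, for every consumer.
        rule = self._rules.get((dataset, role))
        if rule is None:
            raise PermissionError(f"role {role!r} has no access to {dataset!r}")
        denied = sorted(set(columns) - rule.allowed_columns)
        if denied:
            raise PermissionError(f"role {role!r} may not read columns {denied}")
        # Return only the requested (and permitted) columns.
        return [{c: row[c] for c in columns} for row in self._datasets[dataset]]


layer = AccessLayer()
layer.publish("sales.orders", [
    {"order_id": 1, "customer_email": "a@example.com", "amount": 19.5},
    {"order_id": 2, "customer_email": "b@example.com", "amount": 5.0},
])
layer.grant(AccessRule("sales.orders", "analyst",
                       frozenset({"order_id", "amount"})))

# An analyst can read order totals but not customer e-mail addresses.
rows = layer.query("sales.orders", role="analyst",
                   columns=["order_id", "amount"])
```

Because the rule is evaluated inside `query`, every application that consumes the dataset gets the same answer to "who may see what", which is the consistency the authorization principle asks for.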
By enabling self-serve data access, a data mesh architecture can improve the speed and efficiency of data consumption while also promoting data democratization and collaboration. This means that different teams and individuals across the organization can access and use data in a more agile and flexible way, without relying on a centralized data team to provision and manage data access. Furthermore, allowing data consumers to access data directly can promote a culture of data-driven decision-making, where individuals and teams are empowered to make decisions based on data rather than gut instincts or incomplete information.