Data innovation has accelerated in recent years, driven largely by thriving cloud data warehouse communities such as Snowflake, Redshift, and BigQuery, whose primary focus is SQL users and business intelligence use cases. Lakehouse architecture brings analytics closer to data lakes, letting heterogeneous, distributed data processing engines work on the same sources and support diverse workloads such as data science, machine learning, and near real-time analytics. This growth has also spawned integrated data services that automate and unify data modeling, transformation, and metrics, such as dbt and LookML. Collectively, these tools lay the foundation on which next-generation operational and analytical data applications can be built for data consumers of different personas.
The Data Access Challenges
First of all, let's look at the ultimate goal for data access in a company:
Allow data consumers to find and access their data quickly and simply.
Now let's look at a company's current data access workflow. Here is what it looks like for a data consumer requesting and accessing data:
- Data consumers need to find who owns the data.
- They request access from the data owners, who must then filter or mask certain rows and columns for specific groups or users before exposing the data.
- Depending on the data application or tool (Excel, BI, AI, or a RESTful API), data owners need to evaluate the best data access method.
- Last is automation: the data must be updated and delivered to end applications periodically, and the delivery must be secure and auditable.
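The owner-side steps above (filter rows, mask columns, keep an audit trail) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Grant` class, the `serve` function, the `"***"` mask value, and the sample field names are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Grant:
    """Hypothetical grant issued by a data owner to one group of consumers."""
    group: str
    row_filter: Callable[[dict], bool]       # which rows the group may see
    masked_columns: frozenset = frozenset()  # which columns are hidden


def serve(rows, grant, audit_log, user):
    """Return the rows visible under the grant and record the access."""
    visible = [
        {col: ("***" if col in grant.masked_columns else val)
         for col, val in row.items()}
        for row in rows
        if grant.row_filter(row)
    ]
    # Every access leaves an audit entry, so delivery stays auditable.
    audit_log.append({"user": user, "group": grant.group, "rows": len(visible)})
    return visible


customers = [
    {"id": 1, "region": "EMEA", "email": "ana@example.com"},
    {"id": 2, "region": "APAC", "email": "bo@example.com"},
]
emea_grant = Grant(
    group="emea_analysts",
    row_filter=lambda row: row["region"] == "EMEA",
    masked_columns=frozenset({"email"}),
)
audit_log = []
result = serve(customers, emea_grant, audit_log, user="alice")
```

In this sketch the data owner defines the grant once, and every later request from the group is filtered, masked, and logged the same way, which is the part of the workflow that is repeated by hand today.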
Data access in an enterprise encounters several challenges.
First, heterogeneous data needs to be homogenized at the metadata and logical levels. Second, data authorization calls for sophisticated access control over domain-specific datasets and their associated data applications. Third, datasets need to be given semantic meaning through the process of data productization. Finally, endpoints must be provided to suit the personas of different data consumers.
A complete data access layer must solve data heterogeneity, usability, and authorization while enabling consistency and scalability across different data applications.
The Data Access Layer
As a company grows, requests from data consumers pile up by the hundreds and thousands, and the backlog can take days or even weeks to resolve.
The bottleneck is obvious: the "data access workflow". We need a data access layer that gives data consumers a secure, efficient, automated, and intelligent way to access data themselves, avoiding delays and misalignment, while data owners can authorize and audit datasets to make sure the right person accesses the right dataset.
We need a data access layer that is secure, efficient, and intelligent.
Ultimately, the goal is to achieve equilibrium between data, people, and applications.
Four core design principles
The Data Access Layer is built on four core design principles.
1. Data Virtualization
The Data Access Layer is collaborative and distributed by nature: each silo or data source can scale independently or together as an aggregate.
2. Data Productization
Transform data models into domain-oriented datasets. These datasets are owned by data owners and can be shared and governed through open APIs, with the flexibility of interchangeable metadata and access rules. Let data speak your business language.
3. Data Authorization
A consistent data authorization framework spans data sources through data applications and integrates with existing Identity and Access Management (IAM), so that authorization stays consistent across data sources, IAM, and data applications.
4. Data Consumption
Data consumers can generate queries and APIs by declaring their intent and contextual settings. These are applied to the corresponding datasets, and the results are delivered to the target consumers, where the final analytics are performed and displayed.
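To make the authorization and consumption principles more concrete, here is a minimal sketch of how a declared intent might be checked against access rules and turned into a parameterized query. Everything in it is an assumption made for illustration: the `Dataset` and `Intent` classes, the `build_query` function, the role names, and the use of `?` placeholders are hypothetical, and a real data access layer would resolve roles through the IAM system rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical domain-oriented dataset with per-role column rules."""
    name: str
    columns: tuple
    allowed: dict  # role -> set of column names that role may read


@dataclass
class Intent:
    """Hypothetical intent declaration: what the consumer wants, not how."""
    dataset: str
    columns: tuple
    filters: dict = field(default_factory=dict)  # column -> required value


def build_query(intent, dataset, role):
    """Check the intent against the access rules, then emit SQL and parameters."""
    if intent.dataset != dataset.name:
        raise ValueError(f"intent targets {intent.dataset!r}, not {dataset.name!r}")
    permitted = dataset.allowed.get(role, set())
    # Filtering on a hidden column can leak its values, so filters are checked too.
    requested = list(intent.columns) + list(intent.filters)
    denied = [col for col in requested if col not in permitted]
    if denied:
        raise PermissionError(f"role {role!r} may not read {denied}")
    sql = f"SELECT {', '.join(intent.columns)} FROM {dataset.name}"
    if intent.filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in intent.filters)
    # Values travel as parameters; only vetted column names reach the SQL text.
    return sql, tuple(intent.filters.values())


orders = Dataset(
    name="orders",
    columns=("order_id", "region", "amount", "customer_email"),
    allowed={"analyst": {"order_id", "region", "amount"}},
)
intent = Intent("orders", columns=("order_id", "amount"), filters={"region": "EMEA"})
sql, params = build_query(intent, orders, role="analyst")
```

The design choice worth noting is that the consumer never writes SQL directly: the same intent can be rejected, narrowed, or served depending on the role, which keeps authorization in one place regardless of the tool that issues the request.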
Through the data access layer, enterprises can significantly reduce data complexity and communication overhead and improve productivity.
- Data access from days to minutes: Reduce data integration cost by 60% with up-to-date data delivery.
- Reduce duplicate datasets: Create masked and filtered datasets without physically moving data.
- Achieve self-service analytics: Improve data productivity across analytical and operational data applications and tools.
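One common way to expose a masked and filtered dataset without copying it is a database view. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a real warehouse; the table, view, and column names are hypothetical, and the masking rule (keep the first character of the email) is an arbitrary choice for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "EMEA", "ana@example.com"), (2, "APAC", "bo@example.com")],
)

# The view stores only the query, not the rows: nothing is physically moved.
conn.execute(
    """
    CREATE VIEW customers_emea AS
    SELECT id, region, substr(email, 1, 1) || '***' AS email
    FROM customers
    WHERE region = 'EMEA'
    """
)

# A row added to the source table shows up in the view immediately.
conn.execute("INSERT INTO customers VALUES (3, 'EMEA', 'cy@example.com')")
rows = conn.execute(
    "SELECT id, region, email FROM customers_emea ORDER BY id"
).fetchall()
```

Because the view is evaluated at query time, consumers always see up-to-date data, and there is no second copy of the table to secure or keep in sync.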