
WhereScape

Data Vault 2.0: Best Practices and Modern Integration

Unlock the full potential of your data with our latest white paper, “Data Vault 2.0: Best Practices and Modern Integration.” Discover how the evolved Data Vault methodology can transform your data management, offering unparalleled flexibility, scalability, and agility to meet the demands of today’s dynamic data environments.

Download this report to learn: 

  • Evolving Data Vault: Move beyond the limitations of Data Vault 1.0 with advanced strategies for managing storage, complexity, performance, and scalability.
  • Foundational Principles: Master the core principles of Data Vault 2.0 to navigate the complexities of modern data architectures effortlessly.
  • Innovative Modeling Techniques: Dive into effective modeling techniques that utilize hubs, links, and satellites for a future-proof data warehouse.
  • Implementation Excellence: Apply best practices for a seamless Data Vault 2.0 implementation, enhancing automation, optimization, and ELT integration.
  • Future-Ready Integration: Leverage the synergy between Data Vault 2.0 and cutting-edge technologies, including cloud platforms, big data tools, and machine learning, to supercharge your data analytics capabilities.

Transform your organization’s data management strategy and advance innovation and growth. Take advantage of this essential guide for data professionals looking to elevate their data warehousing to the next level.

Download Now — Gain immediate access to insights and strategies that will set your data initiatives apart. Empower your data journey with Data Vault 2.0 today!




“It took the architects a day and a half to solve all four use cases. They built two Data Vaults on the host application data, linked the two applications together and documented the whole process. This was impressive by any standard. After that it was an easy process to get all the documents signed.”

Daniel Seymore, Head of BI, Investec South Africa


"At seven months into the project we can say it really worked out. We have been able to really quickly develop an initial MVP for our first country and that was really good. The automation and the changes we needed to do were rapidly applied. We had to remodel a few things and that was done within a day with the automation in WhereScape."

Carsten Griefnow, Senior BI Manager

"It’s like having five people with only really two people working on it."

Will Mealing, Head of Data & Analytics at L&G



Hybrid Architectures in Data Vault 2.0


Are you drowning in data? Feeling shackled by rigid data warehouses that can’t keep pace with your ever-evolving business needs? You’re not alone. Traditional data storage strategies are crumbling under the weight of diverse data sources, leaving you with limited analytics and frustrated decisions. But what if there was a better way? A way to embrace the vast ocean of data at your fingertips and unlock its limitless potential? Enter the game-changer: hybrid architectures.


In this article, we’ll show you how hybrid architectures can transform your data strategy from a sinking ship to a high-seas cruiser, ready to navigate the turbulent waters of your business landscape.

Components of a Hybrid Architecture

The Data Lake

Imagine a vast, ever-expanding repository, the Grand Canyon of your data. This is the data lake, the landing zone for raw, unfiltered data from all corners of your enterprise: structured (databases, logs) and unstructured (social media, sensor readings). Think of it as the raw material fueling your analytical engine.

  • Advantages: Scalability for massive data volumes, flexibility for diverse data types, cost-effectiveness for storing raw data.
  • Challenges:  Data governance to ensure quality and lineage, schema evolution to manage new data types, and query optimization for efficient exploration.

The Data Vault

Now, picture a meticulously crafted cathedral within the data lake, organized with a purpose. This is the Data Vault, the heart of your analytical power. It houses core business entities (customers, products, transactions) represented by “business keys,” independent of any specific source system. Data from the lake is cleansed, transformed, and enriched before entering the vault, becoming the building blocks for analysis.

  • Advantages: Historical analysis through event-driven data capture, efficient querying through a normalized data structure, agility, and adaptability through modular design.
  • Challenges: Maintaining data integrity during integration, balancing schema stability with evolving business needs, and ensuring data accessibility for diverse users.

The Links

Imagine intricate bridges connecting the cathedral to the surrounding landscape. These links connect data across the lake and vault, revealing relationships and context. They allow you to explore how customer orders connect to social media mentions or how sensor readings correlate with product performance.

  • Advantages: Unlocking deeper insights through cross-domain analysis, enriching the vault with context from the lake, and enabling flexible exploration of data relationships.
  • Challenges: Designing intuitive link structures for efficient querying, maintaining consistency between links and their corresponding data elements, and ensuring data security and access control across linked data sources.

The Tools and Techniques

Think of the architects, builders, and caretakers of this data ecosystem. Tools and techniques like ETL/ELT pipelines, data quality tools, data lake management platforms, and Data Vault modeling techniques are crucial in building, maintaining, and utilizing the hybrid architecture.

  • Advantages: Automation for efficient data flow, governance for data quality and security, and best practices for optimizing performance and scalability.
  • Challenges: Choosing the right tools for your specific needs, staying up to date with evolving technologies, and training and empowering data management teams.

Advantages of Hybrid Architectures

The promise of hybrid architectures in  Data Vault 2.0  extends far beyond simply throwing data into a lake and building a neat house on top:

1. Flexibility to dance with the data:  The data lake welcomes all data types, whether sensor readings, social media buzz, or traditional transaction logs, without forcing them into rigid schemas. This opens doors to unforeseen analyses, allowing you to discover hidden correlations and previously unimaginable insights. 

2. Scalability:  Hybrid architecture scales effortlessly. The data lake’s vastness accommodates data volumes that would make traditional systems choke, allowing you to capture every aspect of your business activity.

3. Cost-effectiveness:  Budget constraints often pinch data initiatives. Hybrid architectures offer a breath of fresh air. Raw data resides in the cost-effective data lake, while the curated core of the Data Vault minimizes storage needs for frequently accessed analysis. This intelligent allocation of resources lets you maximize your data ROI.

4. Agility: Hybrid architectures equip you to adapt and conquer in the data jungle. New data sources can be easily integrated into the lake, requiring minimal changes to the Data Vault structure. This translates to quicker analysis, swifter decision-making, and the ability to outmaneuver your competitors.

5. Deeper insights:  Traditional data warehouses often offer surface-level views. Hybrid architectures unlock hidden treasures. By connecting the dots between structured and unstructured data in the lake and the carefully curated Data Vault, you gain a 360-degree view of your business.

Challenges and Considerations in Hybrid Architectures

1. Data governance:  Clear policies and procedures are crucial for managing data flow between the lake and the vault, preventing inconsistencies, and maintaining trust in your data assets. Think data dictionaries, audit trails, and access control mechanisms – all essential tools for keeping your data intact.

2. Schema evolution:  While the core business entities should remain consistent, accommodating new data sources might require careful adjustments to the Data Vault schema. Striking the right balance between agility and data integrity requires thoughtful planning and collaboration between data architects and business stakeholders.

3. Query optimization:  Efficiently querying across the data lake and the Data Vault can be tricky due to their different structures. Utilizing tools like MPP query engines and optimizing link structures becomes paramount for navigating the vast data landscape and retrieving the insights you seek. 

4. Skills and training:  Implementing a hybrid architecture requires expertise in Data Vault modeling, data lake management, and data integration tools. Invest in training your teams or recruit individuals with the necessary skills. 

5. Tool selection: With many tools available, choosing the right ones can feel like navigating a minefield. ETL/ELT pipelines, data quality tools, data lake management platforms, and Data Vault modeling tools all play their part, but selecting the wrong ones can hinder your progress. Research, compare, and choose tools that seamlessly integrate and align with your needs and data landscape.

When implemented thoughtfully, hybrid architectures empower data-driven organizations to leverage the flexibility of data lakes alongside the analytical power of Data Vault 2.0. By carefully addressing the challenges and utilizing the right tools, organizations can unlock deeper insights and improved decision-making from their diverse data assets.


How to Build a Modern Data Platform Utilizing Data Vault

When looking to build out a new data lake, one of the most important steps is establishing the warehousing architecture that will serve as the foundation for the data platform.

While there are several traditional methodologies to consider when establishing a new data lake (from Inmon and Kimball, for example), one alternative presents a unique opportunity: a Data Vault.

Utilizing a Data Vault architecture allows companies to build a scalable data platform that provides durability and accelerates business value. 

Here’s how:

What is a Data Vault?

After trying and failing to implement a large-scale data warehouse with existing architectures, Dan Linstedt and his team at Lockheed Martin created the Data Vault methodology in the early ’90s to address the challenges they had faced. 

At its core, Data Vault is a complete system that provides a methodology, architecture, and model to successfully and efficiently implement a highly business-focused data warehouse. There are many ways these various components can be utilized and implemented; however, it's important to stick to the standard recommendations of the Data Vault system when building one. Projects can quickly become unsuccessful if the standards are not followed.

Pros and Cons of Using a Data Vault

Let’s take a closer look at a few of the reasons why we would want to design a data lake using Data Vault — and some of the potential drawbacks to consider.

Pros:

  • Insert-only architecture
  • Historical record tracking
  • Keeps all data, the good and the bad
  • Provides auditability
  • Can be built incrementally
  • Adaptable to changes without re-engineering
  • Model enables data loads with a high degree of parallelism
  • Technology agnostic
  • Fault-tolerant ingestion pipelines

Cons:

  • Models can be more complex
  • Teams need additional training to correctly implement a Data Vault
  • Amount of storage needed to maintain complete history
  • Data isn't immediately user-ready when it's ingested into the Data Vault (business vault and information marts need to be created to provide value to the business)

3 Core Data Vault Modeling Concepts

There are three core structures that make up a Data Vault architecture: Hubs, Links, and Satellites.

As we step through the structures below, take note of the required fields — these are mandated by the Data Vault architecture. While the hashing applications described below are not technically mandated, Data Vault 2.0 highly recommends them. Hashing provides many advantages over standard composite or surrogate keys, both for loading and for comparing data (a sketch of how these keys are computed follows this list):

  • Query Performance – Fewer comparisons to make when joining tables together.
  • Load Performance – Tables can be loaded in parallel because ingestion pipelines don’t need to wait for other surrogate keys to be created in the database. Every pipeline can compute all the needed keys.
  • Deterministic – Meaning that the key can be computed from the data. There are no lookups necessary. This is advantageous because any system that has the same data can compute the same key.
  • Business hashes can be used to better distribute data in many large distributed systems.
  • Content hashes can efficiently be used to detect changed records in a dataset no matter how many columns it contains.
  • Data Sharing – Hashed keys can enable high degrees of sharing for sensitive data. Relationships between datasets can be exposed through Link tables without actually exposing any sensitive data.
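To make this concrete, here is a minimal sketch of how both kinds of hash might be computed in a staging query. It assumes a dialect that provides MD5() and CONCAT_WS(), such as Snowflake or PostgreSQL; the table and column names are invented for illustration.

```sql
-- Hypothetical staging query computing a deterministic hub hash key from the
-- business key and a content hash over the contextual attributes.
SELECT
    product_code,
    -- Standardize the business key (trim, upper-case) before hashing so that
    -- any system holding the same data computes the same key.
    MD5(UPPER(TRIM(product_code))) AS product_hash,
    -- Hash all contextual columns with an explicit delimiter; COALESCE keeps
    -- NULLs from collapsing adjacent columns into ambiguous concatenations.
    MD5(CONCAT_WS('||',
        COALESCE(product_name, ''),
        COALESCE(category, ''),
        COALESCE(CAST(unit_price AS VARCHAR), '')
    )) AS content_hash
FROM source_products;
```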

Note: all of the structures listed below are non-volatile. This means that you can’t modify the data in the rows. If there needs to be an update to a row, a new row must be inserted into the table which would contain the change.

Hubs

A Hub represents a core business entity within a company. This can be something like a customer, a product, or a store.

Hubs don’t contain any context data or details about the entity. They only contain the defined business key and a few mandated Data Vault fields. A critical attribute of a Hub is that they contain only one row per key.

[Table: an example Data Vault Hub]
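For illustration, here is a minimal sketch of a Hub in SQL, continuing the hypothetical product example. The mandated fields shown are the hash key, the business key, the load date, and the record source; the names and types are assumptions, not a prescribed standard.

```sql
-- A minimal Hub: one row per business key, no contextual attributes.
CREATE TABLE hub_product (
    product_hash   CHAR(32)      NOT NULL,  -- hash of the business key
    product_code   VARCHAR(50)   NOT NULL,  -- the business key itself
    load_date      TIMESTAMP     NOT NULL,  -- when the key first arrived
    record_source  VARCHAR(100)  NOT NULL,  -- originating system
    PRIMARY KEY (product_hash)
);
```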

Links

A Link defines the relationship between business keys from two or more Hubs.

Just like the Hub, a Link structure contains no contextual information about the entities. There should also be only one row representing the relationship between two entities. In order to represent a relationship that no longer exists, we would need to create a satellite table off this Link table which would contain an is_deleted flag; this is known as an Effectivity Satellite.

[Figure: an example Data Vault Link]
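A matching sketch of a Link table, again with hypothetical names: it carries only the hashed relationship key, the hash keys of the connected Hubs, and the mandated Data Vault fields.

```sql
-- A minimal Link: one row per unique relationship between two Hubs.
CREATE TABLE link_store_product (
    store_product_hash  CHAR(32)      NOT NULL,  -- hash of both business keys combined
    store_hash          CHAR(32)      NOT NULL,  -- references hub_store
    product_hash        CHAR(32)      NOT NULL,  -- references hub_product
    load_date           TIMESTAMP     NOT NULL,
    record_source       VARCHAR(100)  NOT NULL,
    PRIMARY KEY (store_product_hash)
);
```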

One huge advantage Data Vault has over other data warehousing architectures is that relationships can be added between Hubs with ease. Data Vault focuses on being agile and implementing what is needed to accomplish the current business goals. If relationships aren't currently known or data sources aren't yet accessible, that is fine, because Links are easily created when they are needed. Adding a new Link in no way impacts existing Hubs or Satellites.

Often, with more traditional approaches, these kinds of changes can lead to larger impacts on the existing model and data reloads. This is one of the factors that makes Data Vault modeling an agile and iterative process. Models don’t have to be developed in a “big bang” approach.

Satellites

In Data Vault architecture, a Satellite houses all the contextual details regarding an entity.

In my business, data changes very frequently. How can non-volatile contextual tables work for me?

When there is a change in the data, a new row must be inserted with the changed data. These records are differentiated from one another by utilizing the hash key and one of the Data Vault mandated fields: the load_date. For a given record, the load_date enables us to determine what the most recent record is.

[Table: an example Data Vault Satellite]

In the example above, we see two records for the same product_hash. The most recent record, which is defined by the load_date, corrects a spelling error in the product_name field. 
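A minimal sketch of such a Satellite (hypothetical names, as above): the hash key and load_date together form the primary key, so every change lands as a new row rather than an update.

```sql
-- A minimal Satellite: insert-only context history for hub_product.
CREATE TABLE sat_product_details (
    product_hash   CHAR(32)      NOT NULL,  -- references hub_product
    load_date      TIMESTAMP     NOT NULL,  -- versions the record over time
    record_source  VARCHAR(100)  NOT NULL,
    content_hash   CHAR(32)      NOT NULL,  -- hash of all attributes below
    product_name   VARCHAR(200),            -- contextual attributes
    category       VARCHAR(100),
    unit_price     DECIMAL(10,2),
    PRIMARY KEY (product_hash, load_date)   -- latest row = MAX(load_date) per key
);
```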

But won’t that take forever to determine what has changed between the source and the Data Vault? 

No — this is very performant with the use of a content_hash. While it's optional with a Data Vault model, it provides a huge advantage when examining records that have changed between source and target systems.

The content_hash is computed when populating the Data Vault Staging area (more on this below), and it would utilize all relevant contextual data fields. When any of these contextual data fields are updated, a different content_hash would be computed. This allows us to detect changes very quickly. Depending on the technology in use, this would most commonly be accomplished with an Outer Join, although some systems offer even more optimized techniques.
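Here is a sketch of that change-detection pattern against the hypothetical tables above: a single left outer join against the current satellite rows picks up both brand-new keys and records whose content hash has changed.

```sql
-- Insert staging rows whose content differs from the current satellite
-- version, or whose key has never been seen. ROW_NUMBER() isolates the most
-- recent satellite row per hash key.
INSERT INTO sat_product_details
SELECT stg.product_hash, stg.load_date, stg.record_source,
       stg.content_hash, stg.product_name, stg.category, stg.unit_price
FROM staging_products stg
LEFT JOIN (
    SELECT product_hash, content_hash,
           ROW_NUMBER() OVER (PARTITION BY product_hash
                              ORDER BY load_date DESC) AS rn
    FROM sat_product_details
) cur
  ON cur.product_hash = stg.product_hash
 AND cur.rn = 1
WHERE cur.product_hash IS NULL                -- new business key
   OR cur.content_hash <> stg.content_hash;   -- changed contextual data
```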

To help with differentiation, Satellites are created based on data source and its rate of change. Generally, you would design a new Satellite table for each data source and then further separate data from those sources that may have a high frequency of change. Separating high and low-frequency data attributes can assist with ingestion throughput and significantly reduce the space that historical data consumes. Separating the attributes by frequency isn’t required, but it can offer some advantages. 

Another common consideration when creating Satellites is data classification. Satellites enable data to be split apart based on classification or sensitivity. This makes it easier to handle special security considerations by physically separating data elements.

What Does a Data Vault Look Like?

If you’ve worked with a more traditional data warehousing model, dimension modeling and star schemas will be very familiar to you. 

In a simplified example of a star schema (a sales order for a retail establishment), you may end up with something like this:

[Figure: a simple example of a Star Schema model]

Taking our Data Vault concepts from above and applying them to our same example, we may instead end up with something like this:

[Figure: an example Data Vault model]

We can see right away that we gained a few tables over our star schema example. We no longer have a single Dimension table representing Customer, but we have replaced that with a Customer Hub table and two Satellite tables. One Satellite contains data from a retailer’s Salesforce instance, while the other contains data from the retailer’s webstore. 

In short, the Data Vault methodology permits teams to ingest new data sources very quickly.  

Instead of reengineering the model and wasting valuable cycles determining the impact of those changes, data from a new source can be ingested into a completely new Satellite table. This speed also enables data engineers to iterate rapidly with business users on the creation of new information marts. 

Need to integrate completely new business entities into the Data Vault? You can add new Hubs at any time, and you can define new relationships by building new Link tables between Hubs. This process has zero impact on the existing model.

A Modern Data Platform Architecture

[Diagram: an example modern data platform architecture built around a Data Vault]

Staging

Staging is essentially a landing zone for the majority of the data that will enter the Data Vault.

It often doesn't contain any historical data, and the data mirrors the schema of the source systems. We want to ingest data from the source system as fast as possible, so only hard business rules are applied to the data (i.e., anything that doesn't change the content of the data); a sketch of what this can look like follows.
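For example, hard rules are limited to things like type casting and trimming, which preserve the data's meaning. The view below is a hedged illustration; the object and column names are invented.

```sql
-- Staging view applying only hard rules: the data's content is unchanged.
CREATE VIEW stg_orders AS
SELECT
    CAST(order_id AS BIGINT)     AS order_id,       -- enforce the correct type
    TRIM(customer_code)          AS customer_code,  -- strip source-system padding
    CAST(order_ts AS TIMESTAMP)  AS order_ts,       -- text timestamp to TIMESTAMP
    order_total                  AS order_total     -- passed through as-is
FROM src_orders_landing;
```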

The Staging area can also be implemented as what is known as a Persistent Staging Area (PSA). Here, historical data can be kept for some time in case it is needed to resolve issues or to be referred back to. A PSA is also a great option to use as the foundation for a data lake! You won't want all use cases hitting your enterprise data warehouse (EDW), so having a PSA/data lake is a great capability to enable data science, data mining, and other machine learning use cases.

Ideally, the pipelines that ingest data into Staging should be generatable and as automated as possible. We shouldn’t be wasting a lot of time ingesting data into the Data Vault. Most of our time should be spent working with the business and implementing their requirements in Information Marts.

Enterprise Data Warehouse

Raw Data Vault

Raw is where our main Data Vault model lives (Hubs, Links, Satellites).

Data is ingested into the Raw layer directly from the Staging layer, or potentially directly into the Raw layer when handling real-time data sources. When ingesting into the Raw layer, there should also be no business rules applied to the data.

Ingesting data into Raw is a crucial step in the Data Vault architecture and must be done correctly to maintain consistency. As mentioned earlier for Staging, these Raw ingestion pipelines should be generatable and as automated as possible. We shouldn't be handwriting SQL statements to do the source-to-target diffs: one incorrect SQL statement and you will have unreliable and inconsistent tables. The genius of Data Vault is that it enables highly repeatable and consistent patterns that can be automated, making our lives a lot easier and more efficient.

Business Vault

The Business Vault is an optional tier in the Data Vault where the business can define common business entities, calculations, and logic. This could be things like Master Data or business logic that is used across the business in various Information Marts. These things shouldn't be implemented differently in every information mart; they should be implemented once in the Business Vault and used multiple times through the Information Marts.

Metrics Vault

The Metrics Vault is an optional tier used to hold operational metrics data for the Data Vault ingestion processes. This information can be invaluable when diagnosing potential problems with ingestion. It can also act as an audit trail for all the processes that are interacting with the Data Vault.

Information Delivery

Information Marts

The Information Marts are where business users finally have access to the data. All business rules and logic are applied in these Marts.

For implementing business rules and logic, the Data Vault methodology also leans heavily on the use of SQL Views over creating pipelines. Views enable developers to very rapidly implement and iterate with the business on requirements when implementing Information Marts (see the sketch below). More pipelines are also simply more things to maintain and worry about rerunning. Business users can query Views knowing they are always accessing the latest data.
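As a hedged illustration of that virtualization approach, here is a dimension-style view over the hypothetical vault tables sketched earlier; because it is a view, consumers always see the most recent satellite rows.

```sql
-- A dimension-style Information Mart view: hub joined to the most recent
-- satellite version, with no data copied out of the vault.
CREATE VIEW mart_dim_product AS
SELECT
    h.product_code,
    s.product_name,
    s.category,
    s.unit_price
FROM hub_product h
JOIN sat_product_details s
  ON s.product_hash = h.product_hash
WHERE s.load_date = (
    SELECT MAX(s2.load_date)               -- current version only
    FROM sat_product_details s2
    WHERE s2.product_hash = s.product_hash
);
```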

So does this mean I have to fit all my business logic into Views now?

No — the preference of the Data Vault methodology leans toward using Views, but there are certain things Views aren't the right fit for (e.g., extremely complex logic, machine learning, etc.). If it feels like a struggle trying to get SQL to perform your business logic, a View probably isn't the right tool. For these cases, a traditional pipeline is going to be your best bet.

If all my business logic is in Views, isn’t that going to slow down my BI reports?

Like anything else, it depends. There are many considerations, from the size and volume of the data and the complexity of the business logic to the database technology and the capabilities of that system.

Most times, Views will perform just fine and meet most business needs. If, however, Views aren't performing for your use case, the Data Vault methodology offers more advanced structures known as Point-In-Time (PIT) and Bridge tables, which can greatly improve join performance; a sketch follows below. As a last resort, the Data Vault methodology states we can materialize our data (i.e., a materialized view or a new table).
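A sketch of the PIT idea, continuing the hypothetical product example: the PIT table pre-computes, per hub key and snapshot date, which satellite version was current, so queries replace the MAX(load_date) search with plain equi-joins.

```sql
-- A Point-In-Time table: snapshot-aligned pointers into the satellite.
CREATE TABLE pit_product (
    product_hash      CHAR(32)   NOT NULL,
    snapshot_date     DATE       NOT NULL,
    sat_details_ldts  TIMESTAMP  NOT NULL,  -- load_date current at the snapshot
    PRIMARY KEY (product_hash, snapshot_date)
);

-- Queries join on equality instead of searching for the latest version:
SELECT p.snapshot_date, s.product_name, s.unit_price
FROM pit_product p
JOIN sat_product_details s
  ON s.product_hash = p.product_hash
 AND s.load_date    = p.sat_details_ldts;
```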

The concept of an Information Mart is also a logical boundary. Your Data Vault can be used to populate other platform capabilities such as NoSQL, graph, and search systems. These can still be considered a form of information mart. Such external tools would typically be populated using ETL pipelines.

Error Marts

Error Marts are an optional layer in the Data Vault that can be useful for surfacing data issues to the business users. Remember that all data, correct or not, should remain as historical data in the Data Vault for audit and traceability.

Metrics Marts

The Metrics Mart is an optional tier used to surface operational metrics for analytical or reporting purposes.

Moving Forward With Your Data Platform

Choosing the right warehousing architecture for your enterprise isn’t only about ease of migration or implementation. 

The foundation you build will either support or inhibit business users and drive or limit business value. Utilizing Data Vault may not be traditional, but it could be exactly what you need for your business.

Looking for More Information on Implementing a Data Vault?

We know it can be a challenge to build a data platform that maintains clean data, caters to business users, and drives efficiency.

If you have more questions or don't quite know where to get started with building or managing a data platform using Data Vault, please reach out and talk to one of our experts!


Data Vault: What is it and when should it be used?


Evan Pearce

The Data Vault 2.0 methodology offers not only a modeling technique but an entire methodology for data warehouse projects.

For many years, business intelligence (BI) projects have operated, and continue to operate, under a waterfall model. It's defined by a long-stretched sequence of phases that demands an exhaustive list of upfront requirements and a complete data model design, followed by codifying all hard and soft business rules into ETL processes. The visualization layer is then built and presented to end users for sign-off – months or even years from the original start date.

[Figure: the waterfall sequence of requirements gathering, data modeling, data processing, and data visualization]

Fairly often we also see teams adopt a “reduced scope” version of waterfall that aims to break large BI initiatives into smaller projects. While this helps to reduce overall complexity, this approach, when applied to BI, is still quite risky because of two primary concerns:  

  • the business requirements are now changing faster than the ability to deliver;  
  • and budget holders are unwilling to spend into long-term projects with no materialized short-term results.  

The above reasons are why we've seen a shift in project methodologies from waterfall to the iterative, nimble approach of agile, which recognizes and provides some answers to these issues.

Within the data analytics domain, agile alone does not address the significant challenges we encounter at the more detailed levels of Data Warehouse or BI projects. These include:  

  • iterating over data modeling
  • minimizing refactoring
  • designing ETL or ELT routines that enable rapid response to changes in business logic or new additions of data
  • an approach to gathering business requirements that ties closely to the input required for design decisions

In response to these challenges, Daniel Linstedt, author of Building a Scalable Data Warehouse with Data Vault 2.0, defines a methodology that focuses on getting the most out of agile practices, combined with other proven disciplines and techniques, to deliver what seems to be the most iterative approach to BI yet.

Introducing Data Vault

Contrary to popular belief, Data Vault (DV) is not just a modeling technique; it's an entire methodology for data warehouse projects. It binds together aspects of agile, BEAM requirements gathering, CMMI, TQM, Six Sigma and Data Vault Modelling to define an approach targeted at improving both the speed and quality of BI projects. I refer to it as the “guided missile approach” since it promotes both adaptation and accuracy.

DV also encompasses agile methods on Data Warehouse project estimation and agile task sizing to determine the traditionally overlooked complexity or work effort involved across the common Data Warehouse components. At the lower levels, it also presents a very concise and iterative approach to tackling common technical deliverables (within the BI world) with new or changing feature requests. These include thought-out, repeatable, step-by-step and agile based processes to accomplish frequent tasks.  

These tasks include (but are not limited to) adding data attributes, slices, new sources, augmented sources, historical tracking, deprecating sources and source structure changes at both the ETL and Modelling phases.  

The DV model, in a nutshell, is a layer that exists between regular dimensional modeling (OLAP, Star Schema) and Staging that provides scaling with growing business requirements and serves to break down complexities of both the modeling and ETL. It’s composed of hubs (business entities), links (relationships) and satellites (descriptive attributes) which are modeled somewhere between the 3NF and star schema. The model is positioned inside the data integration layer of the Data Warehouse, commonly referred to as the Raw Data Vault, and is effectively used in combination with Kimball’s model.  

[Figure: the Data Vault positioned between Staging and the Star Schema/OLAP layer]

Tip: If you are interested in understanding the model and its underlying rules, I suggest grabbing a copy of Dan's book mentioned above.

Data Vault 2.0 Benefits

Here is an overview of some key benefits from the Data Vault 2.0 Approach:  

  • It assumes the worst-case scenario for data modeling relationships: N:M relationships between business objects eliminate the need for updates if a 1:M turns into an M:M, requiring virtually no additional work within Data Vault when the degree of a relationship changes.
  • It is designed for historical tracking of all aspects of data - relationships and attributes, as well as where the data is being sourced from over time. Satellites, which are similar to dimensions, operate similarly to SCD Type 2.
  • It puts forth a set of design principles and structures for increasing historical tracking performance within the Vault (PIT and Bridge). The Data Vault model is flexible enough to adopt these structures at any point in time within the iterative modeling process and does not require advanced planning.
  • It is designed to logically separate spaces containing raw vs. altered data. The Raw Data Vault is the basis for data that is auditable to source systems, and the Business Vault provides a place for power users who need access to data one step down from the information mart.
  • It separates soft and hard business rules into different parts of the data integration. This enables reusability of data across multiple end uses. For example, raw data is only sourced once within the Data Vault (less re-integrating into staging) and can be fed multiple times to downstream needs.
  • For each agile iteration, the Data Vault model, which stores all the historical tracking of data, is easily extensible without having to worry about losing historical data. Also, historical tracking is stored independently from the dimensional model.
  • Data Vault 2.0 advocates hash key implementation of business keys to reduce lookups and therefore increase loading parallelization. This results in fewer sequential loading dependencies.
  • The Raw Data Vault is designed to be completely auditable.
  • As a whole, the processing involved in going from Staging to Star Schema and OLAP is made much smoother and more iterative with Data Vault.
  • It provides a well-thought-out approach to combining data with multiple different business keys from heterogeneous data sources (a common problem when integrating data within the warehouse across multiple source systems). Business keys are not always 1:1 or in the same format.
  • The “just in time” modeling mentality is a good match with the agile approach.

The Drawbacks

While there are many advantages to Data Vault, it also has its shortcomings, such as:  

  • Data Vault is essentially a layer between the information mart / star schema and staging. There is some additional overhead that comes with developing this layer, both in terms of ETL development and modeling. If the project is small in scale or short-lived, it may not be worth pursuing a Data Vault model.
  • One of the main driving factors behind using Data Vault is audit and historical tracking. If neither of these is important to you or your organization, it can be difficult to justify the overhead of introducing another layer into your modeling. Speaking from long-term requirements, however, it may be a worthwhile investment upfront.
  • Data Vault represents a decomposed approach to relationships, business keys and attributes, so the number of tables created is high compared to denormalized structures such as star schema. (Consider, though, that Data Vault complements star schema, so this comparison is for contrast only.) For this reason, many joins are required to view data within the DV.
  • At the time of writing, DV resources are limited, and information about complex projects using DV 2.0 is not widespread.
  • The modeling approach, in general, can be very unconventional for those who have been operating under Kimball and (less so) Inmon's models.

Should You Pursue Data Vault?

The answer depends on a few variables.  

We see the Data Vault modeling as a very viable approach to meet the needs of Data Warehouse projects, where both historical tracking and auditability are two important factors.  

Additionally, if relationships across business entities are constantly evolving in your data (example 1:M to M:M), Data Vault simplifies the capture of those relationships and lets you focus more on delivering real value.  

If your organisation plans on storing PII data within the warehouse and is subject to GDPR, HIPAA or other regulations, Data Vault will help with data audits and traceability.

Weigh both the benefits and drawbacks listed above to decide whether a Data Vault approach is advantageous for your use case.


A data vault, warehouse, lake, and hub explained

Let's cut right to the chase: you're reading this blog because you want expertise on storing data for your current data-driven business needs. Well, you've come to the right place!

Throughout this post, we'll look at the definitions and sample use cases for data vaults, warehouses, lakes and hubs. The differences between them are subtle, but they all serve a different purpose in the data world today. 


What is a Data Vault?

Data vault definition.

A data vault is a system made up of a model, methodology and architecture that is specifically designed to solve a complete business problem as requirements change. So, as your business requirements morph over time, the data vault will maintain the historical system of reference or archive of your data and easily relate it to the new standard of data that you have defined. I like to think of the data vault as a customized, dynamic solution that gives business users access to all data (current and historical).

Data Vault Use Case

The biggest data vault use case is when a business, such as a bank, needs to audit its data.

Let’s say you decide you need to update your security model to include additional fields and new applications in your enterprise. Using a data vault, you are able to checkpoint the time you made the security model changes and update your infrastructure with the changes, including all associated applications. This means the business team continues receiving the full view of historical and current information regarding the audit trail.


What is a Data Warehouse?

Data warehouse definition.

A data warehouse is a consolidated, structured repository for storing data assets. Data warehouses store data in one of two ways: star schema or 3NF, though these are only fundamental principles for how you structure your data model. We have seen, advised on, and implemented both principles, but the one major flaw is that everything must be strictly defined (both in schema and integration).


Data Warehouse Use Case

The most common use case for creating and using a data warehouse is to consolidate data and answer a business-related question. This question may be, 'How many users are visiting my product pages from North America?' This ties together the information you're receiving from your end users with a business question that needs to be answered from a structured data set. This is what most would identify as the cookie cutter business intelligence solution.

Read more: Data Warehousing with CloverDX

But, there is an alternative approach that is becoming more popular, especially when you are talking about cloud and more powerful warehouses.

Organizations are adopting the ELT approach. This entails “staging” their data in their warehouse (such as HP Vertica), and then letting the power of the database perform the traditional transformation. Essentially, you are performing the most expensive operations with a system where you have more resources.



What is a Data Lake?

Data lake definition.

A data lake is a term that represents a methodology of storing raw data in a single repository. The type of data that’s stored in the lake does not matter and could be unstructured, structured, semi-structured, or binary. The fundamental idea for a data lake is to make available any/all data from applications so your data team can provide insights on a business problem or value proposition.

But the challenge begins when you want to try to make sense of your data. If you are dumping data into a data lake, how do you know what data you need and what data you don’t need? How do you determine where the data resides in the lake? This very quickly can become a data swamp if not managed correctly.

Data Lake Use Case

The use cases we see for creating a data lake revolve around reporting, visualization, analytics, and machine learning.

Learn more about data lakes in our guide to enterprise data architecture

Here is the architecture we see evolving:

[Diagram: the evolving data lake architecture]

What is a Data Hub?

Data hub definition.

A data hub is a centralized system where data is stored, defined, and served from. We like to think of it as a hybrid of a data lake and a data warehouse, as it provides a central repository for your applications to dump data. It also adds a level of harmonization at ingest, so the data is indexed and can easily be queried.

Please note that this is not the same as a data warehouse architecture, as the ETL processing is merely for indexing the data you have rather than mapping it into a strict structure. The challenge comes when you have to implement the data hub and harmonize all of your siloed data sources.

Data Hub Use Case

In general, we see the same use cases for a data hub as we would for a data lake: reporting, visualization, analytics, and machine learning.


Hopefully, you have learned a little bit about each of these data models, as well as their individual values in dealing with multi-structured data. 

At the end of the day, there is not one model or technology that's superior to the other. It varies for each use case.

This means that you must analyze your requirements, needs, and budget before deciding which approach to use. Technology is constantly evolving, and each of these models will evolve with it.



Data Vault Use Cases Beyond Classical Reporting – Part 1

To put it simply, an Enterprise Data Warehouse (EDW) collects data from your company's internal and external data sources, to be used for reporting and dashboarding purposes. Often, analytical transformations are applied to that data to make the reports and dashboards more useful and valuable. That said, there are additional valuable use cases that organizations often miss when building a data warehouse. In truth, EDWs can tap potential beyond simply reporting statistics of the past, and Data Vault brings a high degree of flexibility and scalability to make this possible in an agile manner.

Data Vault Use Cases

To begin, the data warehouse is often used to collect data as well as preprocess the information for reporting and dashboarding purposes only. When only utilizing this single aspect of an EDW, users are missing opportunities to take advantage of their data by limiting the EDW to such basic use cases.

A whole variety of use cases can be realized by using the data warehouse: optimizing and automating operational processes, predicting the future, pushing data back to operational systems as new input, or triggering events outside the data warehouse, to name just a few of the new opportunities available.

Data Cleansing (within an operational system)

In Data Vault, we differentiate between raw and business data: raw data is stored within the Raw Data Vault, and business data within the Business Vault. In Data Vault 2.0, the Raw Data Vault stores the good, the bad, and the ugly data exactly as it is delivered from the source system. The Business Vault, on the other hand, can create any truth, for example calculating a KPI such as profit according to a business rule defined by the information subscriber.

For reporting and dashboarding purposes, data cleansing rules are typically applied to make the data more useful for the task, processing the raw data into useful information. These business rules for data cleansing can also be used to write the cleansed data back into the operational system. In the best-case scenario, the business rules are applied using virtualized tables and views within the Business Vault. The cleansed data can then be pushed back into the operational system to implement the concept of Total Quality Management (TQM), in which errors are fixed at the root cause, which is often the source system itself.

Using the EDW for data cleansing has several advantages. Dedicated data cleansing tools, for example, cannot always execute complex scripts; most offer only predefined lists (of countries, etc.) to cleanse selected attributes. Most tools are also built to cleanse data from a single operational source system, ignoring inconsistencies between multiple operational systems.

From the Data Vault perspective, data cleansing rules are ordinary business rules. That means they are implemented using business satellites, often with the help of reference tables. The following figure shows an example of data cleansing using a Data Vault 2.0 architecture, as it is utilized internally at Scalefree.

The Scalefree EDW is the central library of data cleansing rules, which can be used by multiple systems, both the EDW itself and the operational systems. The data cleansing process shown is used, among other things, to cleanse customer records and standardize phone numbers and the accompanying addresses. Besides the Information Marts, there is also an Interface Mart, 'Sales Interface', which implements the API of the sales source system and applies data cleansing rules from the Business Vault. A scheduled interface script loads the data from the Interface Mart into the source system's API; in this particular case, the script is written in Python.
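As a hedged sketch of what such a cleansing rule could look like (illustrative only, not Scalefree's actual implementation), here is a virtualized Business Vault satellite that standardizes phone numbers, written with PostgreSQL's REGEXP_REPLACE; the table and column names are invented.

```sql
-- A data cleansing rule as an ordinary soft business rule: a virtualized
-- Business Vault satellite over the Raw Vault (PostgreSQL syntax).
CREATE VIEW bv_sat_customer_cleansed AS
SELECT
    customer_hash,
    load_date,
    -- Keep only digits and a leading '+'; the 'g' flag replaces every match.
    REGEXP_REPLACE(phone_number, '[^0-9+]', '', 'g') AS phone_number_std,
    UPPER(TRIM(country_code))                        AS country_code_std
FROM sat_customer_details;
```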

A critical aspect of this process is the proper documentation of the data cleansing rules. An internal knowledge platform is used to store the documentation of every single rule, so that every employee accessing the documentation understands which cleansing rules are applied to the operational data. This also has value for business users, who can then understand why their data was corrected overnight.

With the flexibility of Data Vault, organizations can take on new capabilities that go beyond standard reporting and dashboarding. The data warehouse can thus be used to apply data cleansing within the operational systems, following centralized cleansing rule standards.

If you would like to learn more about Data Vault use cases and the latest technologies on the market, a good opportunity is the World Wide Data Vault Consortium (WWDVC), where you will be able to interact with the most experienced experts in the field. This year, from the 9th to the 13th of September, the conference will be held in Hanover, Germany for the first time.

There, Ivan Schotsmans will talk about information quality in the data warehouse. He will present how to handle current and future challenges in data warehouse architecture, how to move to an agile approach when implementing a new data warehouse, and how to increase business involvement. Register today so you don't miss his presentation and the others held by WhereScape, VaultSpeed, and many other vendors and speakers!


Join the Discussion (2 Comments)

Interesting approach, thanks for the post. I have a question on where those data cleansing rules are being applied:

1) Between the source systems and staging systems? I assume not, as we want the staging area to be loading-only, to offload the source system ASAP. No? What are the 'hard rules' between the source and staging?

2) The data cleansing between the interface marts and source systems seems to be outside the vault? Does this mean you do not keep the 'cleaned up' data in the vault directly? (But it would automatically come in after the next load from source.)

Thank you for your comment!

Under the term "hard business rules" we classify technical rules that enforce a correct data format while loading data from source systems into the staging layer. For instance, if an attribute provides information about points in time, then it should be of data type TIMESTAMP or equivalent. However, source systems sometimes don't deliver the correct type; in this case, the attribute should be remapped to the correct data type. Additionally, hard business rules can be normalization rules that handle complex data structures, e.g. flattening nested JSON objects.

About your question regarding data cleansing: in our example, the data cleansing operation takes place both inside and outside the Data Vault. The process starts with sourcing data from the Raw Vault; then soft rules that correct phone number and address formats are applied to the raw data, and the results are written into Business Vault structures. The Interface Mart selects only those cleansed records from the Business Vault that haven't yet been written back into the source operational system. From there, an external script loads the data from the Interface Mart into the source system's API to update the original records with the correct, standardized format for phone numbers and addresses.

And you are correct – this external script doesn’t load the data from the Interface Mart into the Raw Vault, since the updated records in the source system should show up in the next staging operation and will be picked up into the Raw Vault then.

I hope that answers your questions.

Thank you kindly, Trung Ta



Unlock the Data Vault: A Case Study

If you haven’t read my past few articles covering the purpose, process, and benefits of unlocking your data vault, it’s worth getting caught up before continuing:

  • (Part 1) Unlock the Data Vault: Getting the Most of What You Already Have
  • (Part 2) Unlock the Data Vault: Establishing Your Hypothesis
  • (Part 3) Unlock the Data Vault: More Than Stats and Spreadsheets

As mentioned in my previous articles, the main objective of unlocking your data vault is leveraging the insights you already have to establish a solid foundation moving forward. While there is A LOT of standalone value gained through this process, our sights are set on bigger things. So, unlocking the vault is meant to be the first step in a broader brand development and positioning strategy.

For me, it’s always helpful to see examples of how concepts translate into real-world activity. The following case study is based on a real client’s experience, but I have created a fictitious brand name to keep focused on the process rather than the brand itself.

Riale, an up-and-coming retail company, was growing faster than expected before 2020 brought things to a grinding halt. Rather than pause and wait for the dust to settle, Riale decided to be proactive by treating the period of uncertainty as an opportunity to strategize for when their industry reopens.

Looking for strategic and tactical support, Riale approached us with ideas for a customer persona development project. Their objective was to learn as much as they could about their customers and prospects to ensure their brand positioning strategy maintains alignment with the market’s changing demands and expectations. Of course, we were excited for the opportunity and introduced the MacKenzie way of approaching their goals. The first step was to “Unlock the Data Vault,” and while the breadth of topics covered may seem like this is a huge undertaking, it’s a streamlined process that is as efficient as it is effective.

Objective #1: Tell us about your company

Process: Our own investigation provided some high-level details about Riale, but it was important to hear directly from their team. Using MacKenzie's guided discussion worksheet, we interviewed leadership, staff, and key stakeholders to gain a deeper understanding of who this brand is and – just as importantly – who they want to be. In addition to these high-level organizational attributes, we dug into details regarding the project itself. We asked each team member to share their vision of a successful project, what they hoped to accomplish, and how they anticipated the resulting insights would be applied.

Outcome: Through our discussions, we identified a few specific brand attributes that make Riale unique. These attributes were not apparent after our initial investigation, so hearing from their team directly shifted our perspective regarding the company’s present situation. Having gained this new perspective, we started building a project strategy that was more aligned with Riale’s stated project goals. Had we not had these discussions, our understanding of who this brand is might have remained slightly off, which would have detracted from the overall impact of the ensuing project.

Furthermore, we identified a few inconsistencies in Riale’s definition of a successful project. Each interview uncovered varying project goals and objectives depending on the team member’s role or position within the company. This was a pivotal moment in the process because we were able to regroup and clarify the project’s overall purpose and the team’s collective vision before moving forward.

Objective #2: Tell us about your customers

Process: As part of the stakeholder interviews mentioned in Objective #1, we explore customer-centric topics. Primarily, we want to know who the customers are and how Riale sees itself delivering value to these customers.

Outcome: On the surface, Riale was confident they had strong customer profiles and a firm understanding of their customers’ decision drivers. Throughout our discussions, it became apparent that the existing customer profiles were not as clear and detailed as originally thought. This was another pivotal moment in the project strategy process because it identified insight gaps and a few missing pieces that needed to be addressed. Fortunately, these missing pieces were available within a separate information source; it simply required breaking down silos and organizing data in a way that allowed easier access.

Perhaps the biggest realization within these conversations was the limited factual support backing the existing customer profiles. In other words, Riale’s current understanding of their customers was largely based on theories and hypotheses rather than verifiable facts. Not much market research had been conducted up to this point, which made it nearly impossible to validate the accuracy and reliability of the customer profiles. It is surprisingly common for this to be the case, especially for brands with a customer-centric operational approach. So much attention and energy are spent focused on the customer that ideas and opinions start taking the place of data and facts. Shining a spotlight on this reality enabled Riale to loosen its grip on current presumptions and allowed existing customer theories to be challenged through ensuing market research and customer feedback.

Objective #3: Introduce your top competitors

Process: Listing direct and indirect competitors sets the scene for a detailed discussion regarding brand positioning and market share.

Outcome: Receiving the list of Riale’s self-described competitors gave us a better understanding of how it saw itself within the competitive marketplace. Just as with the customer profiles, this list turned out to be largely based on theories and hypotheses. Riale’s perceived competitors were chosen based on industry connection or target audience overlap, but there was no hard evidence that Riale’s customers were considering any brands on the competitor list as viable alternatives.

Additionally, the list of indirect competitors was short. This is not a bad thing; rather, it highlights an opportunity to think outside the immediate circle of influence to consider how else customers might be receiving the value offered by Riale. In doing so, a new list of brand differentiators and questions about customer decision drivers began to emerge. This once again shifted the focus and purpose of the current project based on insights that would have otherwise remained hidden under the surface.

Objective #4: Paint the picture of your market today

Process: Brands often get hyper-focused on their own goals and objectives to the point where external variables are overlooked. Without a clear picture of the environment as it exists today, roadblocks and pitfalls will not be noticed until it is too late. Together, we map the surrounding environment in as much detail as possible based on the information currently held, including market variables, emerging trends, potential threats, and areas of opportunity.

Outcome: Having explored the internal elements of Riale, taking a step back to examine the surrounding environment added a fresh layer of context. Not only did we become more informed and educated, but we also inspired Riale to think critically about external factors that may have been flying under the radar.

One such external factor was an emerging consumer behavior trend being written about by industry publications. While Riale itself had not seen or felt the impact of this trend, it was occasionally discussed during team strategy meetings. Through further discussion, it was decided that this specific trend should be included as part of the project’s focus because it could have a significant impact on the future of the company. This topic was not part of the original project scope, and its inclusion would require a shift in the overall project’s structure and schedule. Again, without this process and discussion, there would have been a missed opportunity to maximize the impact of ensuing efforts.

Recap: Unlock the Data Vault

This process was as valuable to Riale as a brand as it was to MacKenzie as a project partner. Both sides gained additional insights and understandings that shaped and influenced the collective next steps.

It was determined that some light analysis of Riale’s existing data inventory would eliminate the need for several facets of the original project request, thereby strengthening impact and increasing ROI. The team clarified its overall definition of project success and refined the project’s scope in ways that would address a wider range of organizational objectives. Having gained a new perspective on external market variables and primary competitors, Riale began revisiting its brand positioning strategy in ways that would offer both short-term and long-term benefits.

Having unlocked the data vault, Riale was better prepared to launch a project focused on producing actionable insights and had refined its organizational alignment in ways that could not have been foreseen.

What is there to gain by unlocking your data vault? Let’s get started and find out!


Jenny Dinnen is President of Sales and Marketing at MacKenzie Corporation. Driven to maximize customers’ value and exceed expectations, Jenny carries a can-do attitude wherever she goes. She maintains open communication channels with both her clients and her staff to ensure all goals and objectives are being met in an expeditious manner. Jenny is a big-picture thinker who leads MacKenzie in developing strategies for growth while maintaining a focus on the core services that have made the company a success. Basically, when something needs to get done, go see Jenny. Before joining MacKenzie, Jenny worked at HD Supply as a Marketing Manager and at Household Auto Finance in their marketing department. Jenny received her undergrad degree in Marketing from the University of Colorado (Boulder) and her MBA from the University of Redlands.


Customers that leave a brand for a competitor usually have specific reasons for doing so, whether tangible or intangible. In the case of a lost customer, it’s extremely valuable to understand WHY and WHAT (if anything) would bring that customer back. Sometimes nothing can be done and the customer is simply lost, but other times a simple adjustment can make a big difference. It might feel awkward at first, but reaching out to lost customers is beneficial in more than a few ways. For starters, you show customers that you care about them and that you still value their opinion. This is a strong gesture, and it’s a way to passively reengage consumers who might consider returning.

Digital platforms, both social and shopping, have forever changed the consumer experience. At the same time, in-person experiences are still relevant and need to be aligned with online experiences to provide a seamless, fluid flow between the real and digital worlds. Furthermore, consumers want personalized experiences that adjust as quickly as their preferences and favorite trends. This can be an overwhelming ordeal, or it can be as straightforward as direct and open two-way communication with your customers. This is the power of an ongoing Voice of Customer satisfaction program. By understanding what your customers want, listening to how they feel your brand is performing, and being open to improvement suggestions, you will have all the data-driven insights you need to win customer loyalty.

In addition to marketing reach and visibility, it’s important to consider the impact of specific brand messages. If the goal is to shape or reinforce brand image, customer feedback is the only way to measure marketing effectiveness. If the goal is to motivate action, such as visiting a website or purchasing a particular item, then sales tracking is one way to measure marketing effectiveness. However, transaction data doesn’t address purchase decision drivers, so it’s difficult to attribute a sale to a particular marketing campaign. Again, direct customer feedback is required to identify their decision drivers.

For up-and-coming brands, or even established brands looking to expand their products/services, it’s important to answer the question, “Are consumers aware of our brand?” This is the first step in developing a strong, long-term marketing strategy and will provide a reliable benchmark for future reference. By segmenting awareness and perception data, you’ll have a clear indication of your current standing within specific markets and customer demographics.

It’s easy to become narrow-sighted and overly focused on one objective to the point where surrounding activity goes unnoticed, especially with the rapid progression of technology and emerging consumer trends. The most successful brands occasionally take a step back for a high-level, impartial look at the overall market landscape to refresh their perspective on how their business fits into the bigger picture. Detailed Market Mapping not only acts as a preventative measure by exposing threats and potential danger, it also uncovers secondary variables that may be flying under the radar. This dual benefit will swing the competitive balance in your favor by equipping your organization with the insights needed to make confident decisions, both short-term and long-term.

There’s no way to predict the future with absolute certainty, but there are ways of applying market insights to improve success probabilities. Rather than conducting a series of trial-and-error initiatives, conducting research prior to forward-thinking strategic development will support informed decision making today and establish benchmarks for comparative analysis in the future. Just like having a map on a road trip, Market Mapping defines the surrounding area so decision makers can see where they’re going, stay on track, and safely reach their destination. A clearly illustrated framework of your competitive environment enables your team to run through possible scenarios and how the market might react under specific circumstances.

Consumers are complex and unique individuals whose purchase decisions are influenced by a broader spectrum of attributes than the traditional price, product, placement, and positioning. Of course these are still relevant, but modern consumers are looking for more. Factors that may not seem to align at first glance are now connected (e.g., health concerns and retail). The Market Mapping process will not only uncover the pieces but also act as a guide to reveal how the pieces fit together. Brands that understand their surroundings and create action plans based on market insights are best equipped to take proactive rather than reactive measures.

Understanding the past is a great place to start when strategically planning for the future. Consider market conditions and consumer preferences as they exist today, then examine the historical timeline of economic, social, political and technological shifts that have impacted your industry. By highlighting the environmental factors that have shaped consumer behavior, your brand will be equipped with consumer insights explaining the why, how and when of significant market shifts.

Imagine each existing dataset is a brick. If the bricks are haphazardly tossed in a pile, then standing on that pile won’t be easy. But if the bricks are organized and positioned so that they work together, then there’s a foundation not only to stand on but to build on. Treat your existing data inventory as building materials rather than the byproducts of past efforts. No matter when or why data was collected, it still holds considerable value. After completing the Data Inventory Assessment, ensuing efforts will continue building on a solid foundation rather than merely adding more bricks to the pile.

Every company has its missing puzzle pieces, but not every company is willing to do something about it. Actively searching to identify the gaps in what you know about your customers and your market is an important part of strategic brand development. There’s also the benefit of ensuring that any steps taken are covering new ground rather than repeating what’s already been done. Pursuing the missing pieces will build upon the bigger picture and add cohesive value, but the only way to pursue the missing pieces is by knowing with certainty which pieces are missing.

By organizing and analyzing existing customer data, brands gain a clear picture of what is quantifiably known at that point in time. This process will also itemize the types of data being held and establish benchmarks for future comparative analytics. Identifying the metrics that are most beneficial to current goals and objectives will produce the actionable insights needed to guide decision making.

Market research and data analytics are commonly applied on a “need-to-know” basis, meaning individual projects or initiatives are launched to address a specific set of questions for short-term application. After serving their intended purpose, these data files usually go into the vault and are rarely (if ever) seen again. But our Data Inventory Assessment digs into this data vault and offers a second life to the dormant files within.


Data Vault Modeling with ER/Studio Data Architect

by Sultan Shiffa, Apr 18, 2018

I have been asked multiple times whether ER/Studio supports Data Vault modeling. The answer is yes: ER/Studio supports multiple approaches to data modeling, including Data Vault. This article will introduce some key facts behind the Data Vault architecture and explain why it has become so prominent in the Enterprise Data Warehouse world today.

Data vault architecture

Organizational data management and architectural needs have changed massively in recent decades. Today’s enterprise data solutions must be agile, flexible to rapid changes, scalable, reliable, and cheaper than ever.

Dan Linstedt, creator of the Data Vault method, gives the following formal definition (source: Wikipedia) of the Data Vault methodology:

“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.”

Business changes drive most of the data modeling projects in an enterprise. These changes have a huge impact on related downstream integrated systems and are associated with large development, test, and implementation costs.

The Data Vault modeling approach was introduced to address the agility, flexibility, compliance, auditing, and scalability issues that exist in the traditional Kimball and Inmon approaches to Data Warehouse modeling, and to reduce large change-related costs.

Data Vault differentiates three core types of entities and is based on the concept of decoupling business keys from their descriptions and context:

  • Hub – represents a unique list of business keys
  • Link – represents a unique list of relationships/interactions between business keys
  • Satellite – contains the descriptions and context of a business key or link (a minimal code sketch follows)
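To make these three building blocks concrete, here is a minimal sketch in Python (plain dataclasses, nothing ER/Studio-specific; the table and column names are illustrative assumptions, not part of the tool) of the row shape each entity type typically carries:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class HubCustomer:
    """Hub: one row per unique business key."""
    customer_hkey: str   # surrogate key, internal to the warehouse
    customer_no: str     # the business key from the source system
    load_dts: datetime   # when the key was first seen
    record_source: str   # which system supplied it

@dataclass
class LinkCustomerOrder:
    """Link: one row per unique relationship between business keys."""
    link_hkey: str
    customer_hkey: str   # references HubCustomer
    order_hkey: str      # references an order hub (not shown)
    load_dts: datetime
    record_source: str

@dataclass
class SatCustomerDetails:
    """Satellite: descriptive context for a hub, snapshotted over time."""
    customer_hkey: str   # references HubCustomer
    load_dts: datetime   # start of validity of this snapshot
    record_source: str
    name: str
    segment: str
```

Note that every structure carries the same housekeeping columns (load timestamp and record source), and only the Satellite carries descriptive attributes; this separation is what lets keys, relationships, and context evolve independently.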

ER/Studio objects for data modeling

ER/Studio provides a number of components like business data objects, shapes, text blocks, and data dictionary elements to represent Data Vault patterns.

Diagram 1 below shows a Data Vault model with Business Data Objects and text blocks. ER/Studio’s Business Data Objects (BDOs) are containers for describing a business concept. Business Data Objects allow you to combine entities into groups that can be used to describe and graphically represent entities/tables that share a common characteristic. Different text fonts and background colors can be used to visualize Hubs, Satellites and Links.

Additional descriptions and notes can be added to the BDO container to document the purpose of the grouping. Notes are included as part of the HTML report when users generate a report for the model, and you can format them using standard HTML tags. You can collapse a BDO containment frame by clicking the minus sign in the top right-hand corner of the frame to show or hide the entities in the respective hub, link, or satellite group. ER/Studio attachments can be used to bind an external piece of information, such as the core architecture of the data warehouse, to the BDO and to document the enterprise data warehouse architecture.

Diagram 1: BDO representing containers of Data Vault related objects

Diagram 2 below shows the explorer view of ER/Studio showing where the BDOs and shapes are created. These items can be easily found in the navigation tree within the models in which they are created.

Diagram 2: Explorer view of objects

Diagram 3 below depicts a functional view of a Data Vault model where Hubs and Links and their respective satellites are functionally grouped to show the different domains or functional areas. This has been achieved by using different shapes, text blocks, and coloring.

Diagram 3: Functional views of entities

ER/Studio data dictionaries and macros

Additionally, ER/Studio has local and enterprise data dictionary components (shown in Diagram 4) where domains can be defined as reusable attributes for common Data Vault attributes/columns like Surrogate_ID, Business_Key, Record_Source, and Date_Time_Stamp. These attributes can be defined once and then dragged and dropped into an entity or table.

Diagram 4: Data dictionary objects related to Data Vault

Last but not least, ER/Studio provides an automation interface and ships with macros that allow you to create Hub, Link, and Satellite template entities with predefined Data Vault attributes and to reuse those templates when creating recurring objects.

Diagram 5: Macros to create reusable entities
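As a rough code analogue of that define-once, reuse-everywhere idea (my own illustration in Python, not ER/Studio's automation interface or macro language), the common Data Vault columns can be declared a single time and inherited by every template entity:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class VaultStandardColumns:
    """Housekeeping columns shared by every vault table; the code
    equivalent of defining Record_Source and Date_Time_Stamp once
    as reusable data dictionary domains."""
    record_source: str
    load_dts: datetime

@dataclass
class HubProduct(VaultStandardColumns):
    """A 'template entity': standard columns inherited, specifics added."""
    product_hkey: str
    product_code: str  # business key

@dataclass
class SatProductPrice(VaultStandardColumns):
    product_hkey: str
    list_price: float

# Usage: the shared columns come first in the generated constructor.
hub = HubProduct("erp_system", datetime(2024, 1, 1), "hk-001", "P-001")
```

The payoff is the same as with dictionary domains: if the standard changes (say, a new audit column is required), it is changed in one place and every dependent structure picks it up.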


Data Mesh vs Data Vault: Key Differences, Practical Examples, Use Cases & What Suits Your Business


Data mesh is a new architectural paradigm that treats data as a product. On the other hand, a data vault is a specific type of data modeling methodology for designing agile, scalable data warehouses.


In this blog, we will see how data mesh and data vault are two very different concepts, serving different purposes within a data architecture.

Table of contents

  • Data mesh vs data vault: Navigating through the data maze
  • Steps to consider before implementing data mesh or data vault
  • Data mesh vs data vault: Practical examples and use cases
  • Navigating the depths: Unraveling data mesh and data vault
  • Comparing data mesh and data vault: Unveiling the contrasts in a comparative table
  • Bringing it all together
  • Data mesh vs data vault: Related reads

Data mesh vs data vault: Navigating through the data maze

Let us begin with a brief comparison:

What is a data mesh?

Data mesh decentralizes data into domains, where each domain is treated as a full product, owned by a cross-functional team of data product owners, engineers, and analysts. It was created to address the complexities that arise from scaling data, especially in large organizations.

The core principles of data mesh include:

  • Domain-oriented decentralized data ownership and architecture
  • Data as a product
  • Self-serve data infrastructure as a platform
  • Federated computational governance

The main goal of data mesh is to democratize data and allow for better data governance and compliance. It is especially applicable to organizations that have massive amounts of data spread across multiple domains or teams, and where the monolithic, centralized data lake or warehouse has become a bottleneck.

What is a data vault?

Data vault is based on the principles of flexibility, scalability, and adaptability, which makes it particularly well-suited for dealing with large, complex data environments or rapidly changing systems.

Data vault consists of three types of tables: Hubs, Links, and Satellites.

  • Hubs represent business keys or unique lists of values
  • Links represent the associations or relationships between the Hubs
  • Satellites hold descriptive data or context about the Hubs or Links

Steps to consider before implementing data mesh or data vault

To understand the differences and how they can be applied to your business, here are a few steps to consider:

  • Assess your current state
  • Define your goals
  • Understand the concepts
  • Consult with experts
  • Run a pilot

Now, let us look into each of the above steps in brief:

Assess your current state: Start by taking a deep dive into your current architecture, understanding where the data is coming from, who is using it, and how it is being used.

Define your goals: Identify your data goals. This could include things like speeding up the delivery of insights, reducing costs, ensuring data security and privacy, or scaling your data infrastructure.

Understand the concepts: Dive deep into both Data Mesh and Data Vault concepts. Research online resources, read case studies, and consult with peers in the industry.

Consult with experts: Engage with experts who have implemented both Data Mesh and Data Vault. You can also attend conferences or webinars on these topics.

Run a pilot: If feasible, consider running a pilot on a small scale to understand the practical implications of implementing each approach.

Remember, these are not mutually exclusive concepts. In some cases, you might find that a combination of data mesh and data vault could serve your needs better. It’s important to ensure that the approach you choose aligns with your overall business strategy and goals.

Data mesh vs data vault: Practical examples and use cases

Now, let us better understand the use cases for data mesh and data vault using practical examples.

Data mesh use cases

  • Consider a large multinational bank. They have departments like retail banking, corporate banking, finance, risk management, compliance, etc.
  • Each of these departments has its own set of applications and generates massive amounts of data that needs to be processed and analyzed.
  • In the traditional centralized data architecture approach, all this data would be pulled into a central data lake or warehouse.
  • But as the data grows, managing this monolith becomes increasingly complex and unwieldy. Data may be stale by the time it’s ingested and processed.
  • Furthermore, each department might have unique data needs that the centralized model doesn’t cater to effectively.

Enter data mesh. In a data mesh architecture, each department would become a data product owner. They are responsible for their data from creation to consumption, ensuring its quality, timeliness, and relevance.

  • The retail banking department can handle its own data related to account transactions, customer profiles, loan data, etc.
  • Similarly, the risk management department handles data about credit risk, market risk, operational risk, etc.
  • Each of these domains can use the technology stack that best suits its needs. Yet, they can adhere to the company-wide data platform standards and interfaces, fostering innovation while ensuring interoperability.

Data vault use cases

  • Now let’s consider a telecom company. Over the years, they have merged with or acquired several other companies, each with its own IT systems and data formats.
  • They need a single source of truth, but given the complexity of their data landscape, traditional data warehousing methods aren’t flexible or adaptable enough.

This is where the data vault shines. Using data vault modeling, they can create a scalable data warehouse that can easily adapt to changes.

  • Each company’s unique identifiers for customers, services, etc., can be represented as Hubs.
  • The relationships between them, perhaps a customer subscribing to a service, can be represented as Links.
  • Additional information about customers or services, such as a customer’s address or a service’s pricing, can be represented as Satellites.
  • As the telecom company acquires a new company, new Hubs, Links, and Satellites can be added to the data vault without disturbing the existing structure. If a new source system provides additional information about a customer, a new Satellite can be added to the existing customer hub (see the sketch below).
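To make the telecom scenario concrete, here is a toy Python sketch (the table layout, names, and fields are my own assumptions, not a reference implementation) showing that integrating an acquired company's customers only appends to the existing hub and creates a brand-new satellite, leaving everything already in the vault untouched:

```python
from datetime import datetime, timezone

# A toy vault: each table is a list of row dicts.
vault = {
    "hub_customer": [],          # one row per unique customer key
    "link_subscription": [],     # customer <-> service relationships
    "sat_customer_billing": [],  # context from the original system
}

def integrate_acquired_source(vault, source_name, customers):
    """Back-load customers from a newly acquired company's system.
    Existing tables are only appended to; the new context lands in
    a brand-new satellite, so nothing existing is disturbed."""
    known = {r["customer_id"] for r in vault["hub_customer"]}
    sat_name = f"sat_customer_{source_name}"
    vault.setdefault(sat_name, [])           # new satellite table
    now = datetime.now(timezone.utc)
    for c in customers:
        if c["customer_id"] not in known:    # new business key -> hub row
            vault["hub_customer"].append({
                "customer_id": c["customer_id"],
                "load_dts": now,
                "record_source": source_name,
            })
            known.add(c["customer_id"])
        vault[sat_name].append({             # context -> new satellite
            "customer_id": c["customer_id"],
            "load_dts": now,
            "record_source": source_name,
            "plan": c.get("plan"),
            "address": c.get("address"),
        })

integrate_acquired_source(
    vault, "acme_telco",
    [{"customer_id": "C-1001", "plan": "prepaid", "address": "1 Main St"}],
)
```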

So, in summary, data mesh is a great fit for organizations dealing with large-scale, domain-diverse data that want to democratize data ownership and processing.

On the other hand, data vault is a robust solution for companies dealing with complex, evolving data landscapes that need a flexible, adaptable data warehousing solution.

Navigating the depths: Unraveling data mesh and data vault

Now that we know the basics of data mesh and data vault, let’s go a bit deeper into these concepts:

Additional factors to keep in mind for data mesh

1. Cultural shift

One of the biggest challenges with data mesh isn’t the technology but the cultural and organizational change. It requires a shift from centralized data teams to decentralized domain-oriented data product teams.

This means changing how people work, think about data, and even how they’re organized and rewarded.

2. Data product thinking

Data mesh requires thinking about data as a product, which means data must provide value to its consumers. This includes aspects like data quality, data freshness, data discoverability, and data security.

3. Platform teams

While data ownership is decentralized in a data mesh, a centralized team typically still exists. But instead of owning all data, they are responsible for providing the data infrastructure platform that the data product teams use.

This could include data storage, data processing, data observability, and data security tooling.

4. Cross-functional teams

Data product teams in a data mesh are typically cross-functional. They can include data engineers, data analysts, data scientists, and data product owners, allowing for full lifecycle data ownership within the team.

Additional factors to keep in mind for data vault

1. Complexity

One of the biggest criticisms of data vault is its complexity. The model consists of many different types of tables (Hubs, Links, Satellites), and the relationships between them can become quite complex, especially in large systems.

2. Modeling

Data vault requires a specific modeling technique that can be quite different from traditional data modeling methods. It requires a good understanding of the business and its entities and their relationships.

3. Automation

Given the complexity of the data vault model, automation is often recommended for creating and maintaining the data vault. This typically involves using specific data vault modeling and ETL tools.

4. Historical tracking

One of the key strengths of data vault is its ability to keep historical data, even when the underlying source systems change. This is due to the separation of business keys (Hubs), relationships (Links), and descriptive data (Satellites).

Because of its modular design, data vault can easily adapt to changes in business requirements or underlying systems. New source systems can be added, or existing ones can be modified, without significant impact on the existing data vault.
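One common way this historical tracking shows up in loading code (a hashdiff-style pattern often used in Data Vault 2.0 pipelines; the function and field names here are illustrative, not from this article) is to append a new satellite row only when the descriptive payload actually changes:

```python
import hashlib
from datetime import datetime, timezone

def hashdiff(attributes: dict) -> str:
    """Fingerprint of the descriptive payload, used to detect change."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def load_satellite(sat_rows, hub_key, attributes, record_source):
    """Append a snapshot only if something changed, so the satellite
    accumulates full history without duplicating unchanged loads."""
    new_diff = hashdiff(attributes)
    history = [r for r in sat_rows if r["hub_key"] == hub_key]
    latest = max(history, key=lambda r: r["load_dts"], default=None)
    if latest is None or latest["hashdiff"] != new_diff:
        sat_rows.append({
            "hub_key": hub_key,
            "load_dts": datetime.now(timezone.utc),
            "record_source": record_source,
            "hashdiff": new_diff,
            **attributes,
        })

sat = []
load_satellite(sat, "H-1", {"address": "1 Main St"}, "crm")
load_satellite(sat, "H-1", {"address": "1 Main St"}, "crm")  # unchanged: no row
load_satellite(sat, "H-1", {"address": "9 Oak Ave"}, "crm")  # change: new row
assert len(sat) == 2
```

Because old rows are never updated or deleted, source systems can change shape while the satellite quietly keeps every prior snapshot.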

Understanding these aspects will help you determine how best to apply data mesh and data vault in your organization, and what kind of challenges you might face in their implementation.

Comparing data mesh and data vault: Unveiling the contrasts in a comparative table

Now, let us look at a comparative table for a high-level overview of the key differences and similarities between data mesh and data vault. Remember that each approach has its strengths and weaknesses, and the best choice depends on your business’s specific needs, existing architecture, and future plans.

| | Data Mesh | Data Vault |
|---|---|---|
| Definition | An architectural paradigm that treats data as a product, decentralizing data domains and delegating them to cross-functional teams. | A specific type of data modeling methodology for designing agile, scalable data warehouses. |
| Use Cases | Works best for large-scale, domain-diverse organizations that aim to democratize data ownership and processing. | Ideal for companies dealing with complex, evolving data landscapes who need a flexible, adaptable data warehousing solution. |
| Ownership | Data is owned by cross-functional, domain-specific teams. | Data is owned by a centralized team, although it's flexible enough to accommodate decentralized access and management. |
| Handling Changes | Agile and adaptable, as each data domain is handled independently, yet under company-wide standards. | Agile and adaptable due to its modular design, which allows for easy integration of new systems or changes. |
| Historical Tracking | Depends on the implementation by each data domain team. | Built-in historical tracking due to the separation of Hubs, Links, and Satellites. |
| Complexity | Organizational and cultural complexity due to the shift towards decentralization; technical complexity can vary based on each team's implementation. | Higher technical complexity due to the specific data modeling technique and the relationships between different entities. |
| Infrastructure | Decentralized data domains can choose their own infrastructure, as long as they adhere to company-wide standards and interfaces. | Typically implemented in a centralized data warehouse, but the design is flexible and can be implemented on different infrastructures. |
| Best For | Businesses with a diverse set of data products that need a high degree of autonomy and speed. | Businesses undergoing rapid changes, or those needing to integrate a variety of systems and data sources. |

Bringing it all together

Data mesh is a decentralized data architecture paradigm that treats data as a product. This means data is owned, maintained, and used by cross-functional, domain-specific teams, known as data product teams. It’s best for organizations with large, diverse sets of data where a monolithic, centralized architecture would be unwieldy or inefficient.

Data vault, on the other hand, is a specific type of data modeling methodology for creating scalable, flexible data warehouses. It’s best suited to complex, evolving data environments, where agility and adaptability are key.

In comparison, while both approaches aim to handle complex, large-scale data, they serve different purposes. Data mesh is about the organization of teams and ownership of data, whereas data vault is about the technical design of the data warehouse. The best approach depends on your organization’s specific context, needs, and goals.

Data mesh vs data vault: Related reads

  • What is Data Mesh? - Examples, Case Studies, and Use Cases
  • Data Mesh Principles - 4 Core Pillars & Logical Architecture
  • Data Mesh Architecture: Core Principles, Components, and Why You Need It
  • Data Mesh Setup and Implementation - An Ultimate Guide
  • How to Implement Data Mesh from a Data Governance Perspective
  • Snowflake Data Mesh: Step-by-Step Setup Guide
  • Data Mesh vs. Data Lake - Differences & Use Cases for 2023
  • Data Mesh vs. Data Fabric: How do you choose the best approach for your business needs?

Universal Data Vault: Case Study in Combining “Universal” Data Model Patterns with Data Vault Architecture – Part 1

There are many reasons for considering “universal” data model patterns for your enterprise, and there are also many reasons for considering the Data Vault (DV) architecture for your data warehouse or operational data integration initiative – auditability, scalability, resilience, and adaptability.

But if your enterprise wants the benefits of both, does their combination (as a “Universal Data Vault”) cancel out the advantages of each? A case study looks at why a business might want to try this approach, followed by a brief technical description of how this might be achieved.


Why Would Anyone Want a “Universal Data Vault”?

Patterns at work: that’s the theory, and I’m pleased to say it works in practice. I smile when I remember one interview for a consulting job. The interviewer reminded me that another telecommunications company we both knew had spent years developing an enterprise logical data model (ELDM). He then threw me a challenge, saying, “John, I want you to create the same sort of deliverable, on your own, in two weeks – we’ve got three projects about to kick off, and we need an ELDM to lay the foundations for data integration across all three.”

I cut a deal. I said that if I could use the telecommunications industry data model pattern from one of Len Silverston’s data model patterns books, modified minimally based on time-boxed interviews with the company’s best and brightest, I could deliver a fit-for-purpose model that could be extended and refined over a few additional weeks. And I did.

I’ve done this sort of thing many times now. I commonly use “patterns”, but it’s the way they’re applied that varies. Sometimes an ELDM is used to shape an Enterprise Data Warehouse (EDW). Other times it is used to define the XML schema for an enterprise service bus (ESB). Or maybe mold a master data management (MDM) strategy, or define the vocabulary for a business rules engine, or provide a benchmark for evaluation of the data architecture of a commercial-off-the-shelf package…

There’s a key message here. If you use a pattern-based ELDM for any of these initiatives, the job of information integration across them is much easier, as they are founded on a common information model. This is exactly what an IT manager at another of my clients wanted. But his challenges were daunting.

Some of you will recognize the data administration headaches when one company acquires another, and the two IT departments have to merge. It can be painful. Now try to imagine the following. As of June 30th, 2010, there were 83 autonomous, but related, organizations across Australia, each with something like 5 fundamental IT systems, and each of these systems had maybe 20 central entities. If they were all in relational databases, that might add up to something like 8,000 core tables. Then on July 1st, under orders of our national government, they became one organization. How do you merge 8,000 tables?

They had some warning that the organizational merger was going to happen, and they did amazingly well, given the complexity, but after the dust settled, they wanted to clean up one or two (or more!) loose ends.

Firstly, I created an ELDM (pattern-based, of course) as the foundation for data integration. It’s a bit of a story in its own right, but I ran a one-day “patterns” course for a combined team of business representatives and IT specialists, and the very next day, I facilitated their creation of a framework for an ELDM. It did take a few weeks for me to flesh out the technical details, but the important outcome was business ownership of the model. It reflected their concepts, albeit expressed as specializations of generic patterns. This agreed expression of core business concepts was vital, as we will shortly see.

A Data Vault Solution Looks Promising

The second major activity was the creation of an IT strategy that considered Master Data Management, Reference Data Management, an Enterprise Service Bus and, of course, Enterprise Data Warehouse components. The major issue identified was the need to consolidate the thousands of entities of historical data, sourced from the 83 disparate organizations. Further, the resulting data store needed to not only support Business Intelligence (BI) reporting, but also operational queries. If a “client” of this new, nationalized organization had a history recorded in several of the 83 legacy organizations, one consistent “single-view” was required to support day-to-day operations. An EDW with a services layer on top was nominated as part of the solution.

The organization had started out with a few Kimball-like dimensional databases. They worked well for a while, but then some cross-domain issues started to appear, and work commenced on an Inmon-like EDW. For the strategic extension to the EDW, I suggested a Data Vault (DV) approach. This article isn’t intended to provide an authoritative description of DV – there are a number of great books on the topic (with a new reference book by Dan Linstedt hopefully on its way soon), and the TDAN.com website has lots of great material. It is sufficient to say that some of the arguments for DV were very attractive. This organization needed:

Auditability: If the DV was to contain data from hundreds of IT systems across the 83 legacy organizations, it was vital that the source of the EDW’s content was known.

Flexibility/Adaptability to change: The EDW had to be sufficiently flexible to accommodate the multiple data structures of its predecessor organizations. Further, even during the first few years of operation, this new organization kept having its scope of responsibility extended, resulting in even more demand for change.

I presented a DV design, and I was caught a bit off guard by a comment from one of the senior BI developers who said, “John, what’s the catch?” He then reminded me of the saying that if something looks too good to be true, it probably is! He was skeptical of the DV claims. Later, he was delighted to find DV really did live up to its promises.

But there was one more major challenge.

A Marriage of Convenience? The Generalization/Specialization Dilemma

Data model patterns are, by their very nature, generic. Len Silverston produced a series of 3 books on data model patterns (with Paul Agnew joining him for the third). Each volume has the theme of “universal” patterns in their titles. Len argues that you can walk into any organization in the world, and there is a reasonable chance they will have at least some of the common building blocks such as product, employee, customer, and the like. Because the patterns are deliberately generalized, their core constructs can often be applied, with minor tuning, to many situations. In Volume 3, Len & Paul note that data model patterns can vary from more specialized through to more generalized forms of any one pattern, but that while the more specialized forms can be helpful for communication with non-technical people, the more generalized forms are typically what is implemented in IT solutions. The bottom line? We can assume pattern implementation is quite generalized.

Conversely, DV practitioners often recommend that the DV Hubs, by default, avoid higher levels of generalization. There are many good reasons for this position. When you talk to people in an enterprise, you will likely discover that their choice of words to describe the business appears on a continuum, from highly generalized to very specialized. As an example, I did some work many years ago with an organization responsible for emergency response to wildfires, floods, and other natural disasters. Some spoke sweepingly of “deployable resources” – a very generalized concept. Others were a bit more specific, talking of fire trucks and aircraft, for example. Yet others spoke of “water tankers” (heavy fire trucks) and “slip-ons” (portable pump units temporarily attached to a lighter 4WD vehicle) rather than fire trucks. Likewise, some spoke of “fixed-wing planes” and “helicopters” rather than just referring to aircraft.

In practice, if much of the time an enterprise uses language that reflects just one point on the continuum, life’s easier; the DV implementation can pick this sweet spot, and implement these relatively specialized business concepts as Hubs. This may be your reality, too – if so, you can be thankful. But within my recent client’s enterprise involving 83 legacy organizations, there was no single “sweet spot”; like with the wildfire example, different groups wanted data represented at their sweet spot!

Here was the challenge. By default, data model patterns, as implemented in IT solutions are generalized, while DV Hub designs are, by default, likely to represent a more specialized view of the business world. Yet my client mandated that:

The DV implementation, while initially scoped for a small cross-section of the enterprise, had to have the flexibility to progressively accommodate back-loading the history from all 83 legacy organizations.

As a principle, any DV implementation should seek to create Hubs that are not simplistic, mindless transformations from the source systems’ tables. Rather, each Hub should represent a business concept, and any one Hub may be sourced from many tables across many systems. So even though the combined source systems contained an estimated 8,000 tables, we did not expect we would need 8,000 Hubs. Nonetheless, if we went for Hubs representing somewhat specialized business concepts, we feared we might still end up with hundreds of Hubs. There was a need for a concise, elegant, but flexible data structure capable of accepting back-loaded history from disparate systems. I met this need by designing a DV solution where the Hubs were based on “universal” data model patterns.

The DV implementation of the new EDW had to align with other enterprise initiatives (the enterprise service bus and its XML schema, MDM and RDM strategies, and an intended business rules engine). All of these, including the DV EDW, were mandated to be consistent with the one generalized, pattern-based ELDM. Interestingly, the business had shaped the original ELDM framework, and it was quite generalized, but with a continuum of specialization as well. I needed to create a DV solution that could reflect not just more generalized entities, but this continuum.

I called the resultant solution a “Universal Data Vault”. The name reflects my gratitude to data model pattern authors such as Len Silverston (with his series on “universal” data model patterns) and David Hay (who started the data model patterns movement, and whose first book had a cheeky “universal data model” based on thing and thing type!), and to Dan Linstedt, for his Data Vault architecture.

We have now considered why I combined generalized data model patterns with Data Vault to gain significant business benefits. Next, we look at how we design and build a “Universal Data Vault”, from a more technical data model perspective.

Introducing the “Universal Data Vault” (UDV)

Foundations: a brief review of data vault – hubs, links and satellites.


Without going into details of the transformation, we can note:

Each business concept (an aircraft, a schedule) becomes a Hub, with a business key (e.g. the aircraft’s registration number), plus a surrogate key that is totally internal to the DV. Each Hub also records the date and time when the DV first discovered the business instance (e.g. an aircraft registered as ABC-123), and the operational system that sourced the data.

Each relationship between business concepts (e.g. the assignment of an aircraft to an emergency) becomes a Link, with foreign keys pointing to the participating Hubs. Like a Hub, the Link also has its own internal surrogate key, a date and time noting when the relationship was first visible in the DV, and the source of this data. Note that while an operational system may restrict a relationship to being one-to-many, Links allow many-to-many. In this case, this is essential to be able to record the history of aircraft assignments to many emergencies over time.

Hubs (and Links – though not shown here) can have Satellites to record a snapshot of data values at a point in time. For example, the emergency schedule Hub is likely to have changes over time to its severity classification, and likewise, its responsible incident controller.
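A minimal sketch of that loading discipline (an in-memory Python illustration with assumed names; production vaults would typically use hash keys or database sequences for the surrogates) might look like this:

```python
import itertools
from datetime import datetime, timezone

_surrogates = itertools.count(1)  # stand-in for a sequence or hash key

def ensure_hub_row(hub, business_key, record_source):
    """Insert the business key only if the vault has never seen it.
    load_dts records first discovery; the surrogate key stays
    entirely internal to the DV."""
    if business_key not in hub:
        hub[business_key] = {
            "surrogate_key": next(_surrogates),
            "business_key": business_key,
            "load_dts": datetime.now(timezone.utc),
            "record_source": record_source,
        }
    return hub[business_key]["surrogate_key"]

hub_aircraft, hub_schedule, link_assignment = {}, {}, []

aircraft_sk = ensure_hub_row(hub_aircraft, "ABC-123", "ops_system")
schedule_sk = ensure_hub_row(hub_schedule, "EMERG-0042", "dispatch")

# Links are many-to-many by design: the same aircraft can be
# assigned to any number of emergencies over time.
link_assignment.append({
    "surrogate_key": next(_surrogates),
    "aircraft_sk": aircraft_sk,
    "schedule_sk": schedule_sk,
    "load_dts": datetime.now(timezone.utc),
    "record_source": "dispatch",
})
```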

Fundamentals of the UDV “How-to” Solution

One solution is to have a Hub for the generalized concept of a Deployable Resource, as well as Hubs for Aircraft (and Fixed-Wing Plane and Helicopter), Fire Truck (and Water Tanker and Slip-On), Plant (Generator, Water Pump), and so on. The inheritance hierarchies between the generalized and specialized Hubs can be recorded using “Same-As” Links. This approach is most definitely something you should consider, and it is explained in a number of publications on Data Vault. However, for this particular client, there was a concern that this approach may have resulted in hundreds of Hubs.

I created the “Universal Data Vault” (UDV) approach as an alternative. A sample UDV design follows. At first glance, it may appear a bit more complicated than the previous diagram. However, the generalized Asset Hub not only represents the incorporation of Aircraft; it can additionally accommodate a large number of specialized resource Hubs for Fire Trucks, Plant & Equipment, and so on. Likewise, the generalized Activity Hub not only handles the Emergency Response Schedule Hub but also other specialized Hubs, such as activities for preventative planned burns.
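The individual UDV components are covered in the next part; purely as a first approximation of the generalization idea (the names and the simple type discriminator below are my own assumptions, not the design from this paper), a generalized Asset Hub that absorbs many specialized concepts could be sketched as:

```python
from datetime import datetime, timezone

hub_asset = {}  # one generalized hub instead of hundreds of specialized ones

def load_asset(business_key, asset_type, record_source):
    """Land water tankers, helicopters, pumps, etc. in a single Asset
    hub. Each group's preferred point on the generalization continuum
    is preserved through the type discriminator, which a full UDV
    would model far more richly (e.g. as a type hierarchy)."""
    hub_asset.setdefault(business_key, {
        "business_key": business_key,
        "asset_type": asset_type,  # e.g. "water_tanker", "helicopter"
        "load_dts": datetime.now(timezone.utc),
        "record_source": record_source,
    })

load_asset("FT-042", "water_tanker", "fleet_db")
load_asset("REG-ABC-123", "helicopter", "aviation_db")
```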

In the next part of this paper, we will look in more detail at the individual UDV components.


John Giles

John Giles is an independent consultant, focusing on information architecture, but with a passion for seeing ideas taken to fruition. He has worked in IT since the late 1960s, across many industries. He is a Fellow in the Australian Computer Society, and completed a Master’s degree at RMIT University, with a minor thesis comparing computational rules implementations using traditional and object-oriented platforms. He is the author of “The Nimble Elephant: Agile Delivery of Data Models Using a Pattern-based Approach”.



WHAT IS DATA VAULT? – ALL YOU NEED TO KNOW

If you want to know what Data Vault is, or want to learn how to use it, then you have come to the right place. We specialise in Analytics and Data Warehousing using the Data Vault method. We have seen considerable benefits from using the Data Vault approach with our clients and have become passionate evangelists.

FREQUENTLY ASKED QUESTIONS ABOUT DATA VAULT

What is Data Vault?

Data Vault is a method and architecture for delivering a Data Analytics Service to an enterprise, supporting its Business Intelligence, Data Warehousing, Analytics and Data Science requirements. At its core, it is a modern, agile way of designing and building efficient, effective Data Warehouses.


Why haven’t I heard of Data Vault?

Data Vault might not be widely known due to its niche focus within data management. While traditional data warehousing methods (such as Kimball and Inmon) have been around for decades, Data Vault emerged in the 2000s as a solution to modern data platform requirements.

It is an open method that has grown purely organically, and its popularity varies by industry and region.

Where did Data Vault come from?

The origins of Data Vault go back to the 1990s, when Dan Linstedt, the inventor of the method, developed his ideas while working for Lockheed Martin. Having published papers, he refined the approach throughout the 2000s before publishing his first book, “The Business of Data Vault Modeling”, in 2010. Dan’s second book, “Building a Scalable Data Warehouse with Data Vault 2.0”, has become the definitive text on the method.


Are Data Vaults compatible with Star Schemas?

Data Vault 2.0 has staging, vault and mart layers. Star schemas live in the mart layer; each star schema exposes a subset of the vault for a particular group of users. Typically, hubs and their satellites form dimensions, while links and their satellites form facts. Additional features (such as bridging tables and PIT tables) can be used to virtualise these, enabling them to run as performant views, as sketched below.
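As a rough illustration of that virtualisation idea (a sketch under assumed table and column names; the QUALIFY clause is Snowflake/Databricks-style syntax, and a real mart would also handle multiple satellites and PIT tables), a dimension view over a hub and the latest rows of its satellite could be generated like this:

```python
def dimension_view_sql(hub: str, sat: str, key: str, sat_cols: list) -> str:
    """Render a virtual star-schema dimension: the hub supplies the
    durable key, the satellite supplies the latest descriptive row."""
    cols = ",\n       ".join(f"s.{c}" for c in sat_cols)
    return (
        f"CREATE VIEW dim_{hub} AS\n"
        f"SELECT h.{key},\n       {cols}\n"
        f"FROM hub_{hub} h\n"
        f"JOIN sat_{sat} s ON s.{key} = h.{key}\n"
        f"QUALIFY ROW_NUMBER() OVER (\n"
        f"  PARTITION BY s.{key} ORDER BY s.load_dts DESC) = 1;"
    )

print(dimension_view_sql("customer", "customer_details",
                         "customer_hkey", ["name", "segment"]))
```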

Who owns Data Vault?

The Data Vault method is open for anyone to use. Books, training, and coaching are available for all to access.

Dan Linstedt, Data Vault’s creator, owns the trademark Data Vault 2.0.

Is Data Vault suitable for my business?

Data Vault is designed specifically for organisations that need to run agile data projects where scalability, integration of multiple source systems, development speed and business orientation are important.

Is Data Vault free?

Yes. Data Vault is an openly published approach to running enterprise data platform projects.

Do data lakes work with Data Vault?

Yes, data lakes are frequently used as part of a data architecture that includes Data Vault.

The data lake forms a persistent staging area as part of the solution.


STRUCTURING YOUR DATA VAULT PROJECT

Have you started a Data Vault project? Are you thinking about how to structure your Data Vault system? Our e-book sets out a framework that scales to a full enterprise analytics solution.


LEARN ABOUT DATA WAREHOUSE MIGRATION

Do you feel trapped by your legacy Data Warehouse? Do you know where to start? Our framework is an Agile migration solution that just works.


DATA VAULT TECHNOLOGY LANDSCAPE GUIDE

Get ideas from our infographic of tools and technologies being used to support Data Vault projects and download the accompanying guide with links to the relevant websites.


BUSINESS INTELLIGENCE STRATEGY: 6 STEPS TO SUCCESS

Improve your organisation’s planning and business case for Business Intelligence. Our extensive experience is distilled into six steps in this white paper.


MICROSOFT WEBINAR: DATA VAULT WITH AZURE SYNAPSE ANALYTICS

Watch our webinar with Microsoft on how to build a Data Warehouse to work with your data, all the time.


MICROSOFT WHITE PAPER: DEPLOYING DATA VAULT ON AZURE SYNAPSE ANALYTICS

Download our white paper written for Microsoft: “10 Best Practices for Deploying Data Vault on Azure Synapse Analytics”.

DATA VAULT VIDEOS

A brief introduction to Data Vault - playlist on YouTube

Learn About Data Vault 2.0 Part 1

Learn About Data Vault 2.0 Part 2

Learn About Data Vault 2.0 Part 3

Learn About Data Vault 2.0 Part 4

Learn About Data Vault 2.0 Part 5

Learn About Data Vault 2.0 Part 6

Learn About Data Vault 2.0 Part 7

Learn Business Data Vault Secrets

Learn more about Data Vault on the Datavault YouTube channel


DATA VAULT BOOKS


Building a Scalable Data Warehouse with Data Vault 2.0

by Dan Linstedt & Michael Olschimke

Dan Linstedt’s book “Building a Scalable Data Warehouse with Data Vault 2.0” is the definitive text on the subject.  


The Elephant in the Fridge: Guided Steps to Data Vault Success through Building Business-Centered Models

by John Giles

John Giles’ book is an easy-to-read introduction to Data Vault from a business perspective and is highly recommended.


The Nimble Elephant: Agile Delivery of Data Models using a Pattern-based Approach

If you are interested in Data Governance, John Giles’ book is a practical and readable introduction to agile data modelling.


The Data Vault Guru: a pragmatic guide on building a Data Vault

by Patrick Cuba

Patrick Cuba’s book includes some of the latest thinking on the practicalities of implementing a Data Vault 2.0 solution.

DATA VAULT COMMUNITIES


The Data Vault User Group meets regularly as a forum for learning about Data Vault related topics, sharing experiences and informal discussions. Their website has free content available.

Q&A FORUM

The Data Vault User Group Q&A Forum is the place to get answers from other users and industry experts. Join free today.


A global community which seeks to unite Data Vault experts, vendors and practitioners and share best practices for Data Vault 2.0 with organisations worldwide.


IMAGES

  1. Data Vault Case Study

    data vault case study

  2. Data Vault Automation

    data vault case study

  3. Is your data warehouse holding you back?

    data vault case study

  4. Design and build a Data Vault model in Amazon Redshift from a

    data vault case study

  5. Universal Data Vault: Case Study in Combining “Universal” Data Model

    data vault case study

  6. Data Vault Use Cases Beyond Classical Reporting: Part 3

    data vault case study

COMMENTS

  1. Data Vault Best Practice on the Lakehouse

    Explore best practices for implementing Data Vault modeling on the Databricks Lakehouse Platform using Delta Live Tables for scalable data warehousing.

  2. Data Vault 2.0 using Databricks Lakehouse Architecture on Azure

    This Article is about Data Vault 2.0 using Databricks Lakehouse Architecture on Azure and is presented in partnership with VaultSpeed and Scalefree our..

  3. Data Vault: Scalable Data Warehouse Modeling

    A data vault is a data modeling design pattern used to build a data warehouse for enterprise-scale analytics. The data vault has three types of entities: hubs, links, and satellites. Hubs represent core business concepts, links represent relationships between hubs, and satellites store information about hubs and relationships between them.

  4. Implementing Data Vault 2.0 on Fabric Data Warehouse

    Data Vault was developed by Dan Linstedt, the inventor of Data Vault and co-founder of Scalefree. It was designed to meet the challenging requirements of its first users in the U.S. government, and those requirements led to the design principles that remain the basis for Data Vault’s features and characteristics today.

  5. The Definitive Intro To Data Vault

    The Data Vault is a detail-oriented, history-tracking, and uniquely linked set of normalized tables that support one or more functional areas of business. With these principles in mind, let’s look at the two different types of vault that exist in the Data Vault domain: the Raw Vault and the Business Vault.

  6. Data Vault 2.0

    Data Vault 2.0 is a data modelling methodology and system architecture specifically designed to address the challenges of managing and integrating large volumes of data in a scalable and flexible way.

  7. Data Vault 2.0: Best Practices and Modern Integration

    This white paper explores the core principles, best practices, and integration possibilities of Data Vault 2.0, highlighting its relevance in contemporary data warehousing.

  8. Hybrid Architectures in Data Vault 2.0

    When implemented thoughtfully, hybrid architectures empower data-driven organizations to leverage the flexibility of data lakes alongside the analytical power of Data Vault 2.0. By carefully addressing the challenges and utilizing the right tools, organizations can unlock deeper insights and improved decision-making from their diverse data assets.

  9. Agile Development in Data Warehousing with Data Vault 2.0

    Data Vault 2.0 is not only a scalable and flexible modeling technique but a complete methodology for realising the enterprise vision in data warehousing and information delivery by following an agile approach and focusing on business value. Using agile methods in data warehousing keeps projects focused on delivering business value.

  10. Data Vault: Much-needed breather for the Data Warehouse

    Data Vault 2.0 is real-time ready, cloud ready, NoSQL ready, and big data friendly. Data Vault 2.0 is getting ready to rock the world with its success stories and case studies.

  11. How to Build a Modern Data Platform Utilizing Data Vault

    Building a new data lake? Consider using a data vault architecture for optimal business value. Learn more about the pros and cons.

  12. Data Vault: What is it and when should it be used?

    The raw data vault is the basis for data that is auditable back to source systems, while the business vault provides a place for power users who need access to data one step down from the information mart. Data Vault separates soft and hard business rules into different parts of the data integration, which enforces reusability of data across multiple end uses.

  13. A data vault, warehouse, lake, and hub explained

    In this blog post, we discuss the definition of and sample use cases for a data vault, warehouse, lake, and hub, and the differences between them.

  14. Data Vault Use Cases Beyond Classical Reporting

    To put it simply, an Enterprise Data Warehouse (EDW) collects data from your company’s internal as well as external data sources, to be used for simple reporting and dashboarding purposes. Often, analytical transformations are applied to that data to create the reports.

  15. Unlock the Data Vault: A Case Study

    The first step was to “Unlock the Data Vault,” and while the breadth of topics covered may seem like a huge undertaking, it is a streamlined process that is as efficient as it is effective. Objective #1: Tell us about your company. Process: Our own investigation provided some high-level details about Riale.

  16. Data Mesh and Data Vault working together

    Data Mesh and Data Vault can be compared to onions and apples: both look the same from the outside, but the properties of each are far different. We were really interested to hear Paul Rankin, head of data management platforms at Roche Diagnostics, speak at the Data Vault User Group on the question of how Data Mesh and Data Vault can work together.

  17. Data Vault Modeling with ER/Studio Data Architect

    The Data Vault modeling approach was introduced to address the agility, flexibility, compliance, auditing, and scalability issues that exist in traditional approaches to data warehouse modeling according to Kimball and Inmon, and to reduce large change-related costs. Data Vault differentiates three core types of entities: hubs, links, and satellites.

  18. Data Mesh vs Data Vault: Key Differences, Use Cases, & Examples

    Learn the differences between data mesh vs data vault, and discover which approach best aligns with your organization's data needs and goals.

  19. Universal Data Vault: Case Study in Combining "Universal" Data Model

    There are plenty of reasons you might want to use proven “universal” data model patterns to improve the outcome of your enterprise initiative or your agile project: the patterns are robust, efficient now, and flexible for the future. There are also many reasons for considering the Data Vault (DV) architecture.

  20. What is Data Vault

    If you want to know what Data Vault is or want to learn about it, you have come to the right place. We specialise in analytics and data warehousing using the Data Vault method, have seen considerable benefits from using the approach with our clients, and have become passionate evangelists.
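
To make the hub, link, and satellite pattern described in entry 3 concrete, here is a minimal, illustrative Python sketch. It is not taken from any of the articles above: the record shape, the entity names, and the “crm_system” source name are assumptions made for this example. It shows the common Data Vault 2.0 practice of deriving surrogate keys by hashing business keys, and how a single source record fans out into one hub row, one link row, and one satellite row.

import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    # Data Vault 2.0 style surrogate key: a hash of the business key(s).
    # Multi-part keys are joined with a delimiter before hashing.
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# One source record (assumed shape, for illustration only).
record = {"customer_id": "C-1001", "order_id": "O-777", "city": "Cape Town"}
load_date = datetime.now(timezone.utc)
record_source = "crm_system"  # assumed source system name

# Hub: one row per unique business key (a core business concept).
customer_hub = {
    "customer_hash_key": hash_key(record["customer_id"]),
    "customer_id": record["customer_id"],
    "load_date": load_date,
    "record_source": record_source,
}

# Link: a relationship between hubs, keyed on the combined business keys.
customer_order_link = {
    "link_hash_key": hash_key(record["customer_id"], record["order_id"]),
    "customer_hash_key": hash_key(record["customer_id"]),
    "order_hash_key": hash_key(record["order_id"]),
    "load_date": load_date,
    "record_source": record_source,
}

# Satellite: descriptive attributes for a hub, historised by load_date.
customer_satellite = {
    "customer_hash_key": hash_key(record["customer_id"]),
    "load_date": load_date,
    "record_source": record_source,
    "city": record["city"],
}

print(customer_hub, customer_order_link, customer_satellite, sep="\n")

In a real warehouse each of these dictionaries would become an insert into its own table; the point of the sketch is simply that hubs carry business keys, links carry relationships between hubs, and satellites carry the descriptive history.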
