The true value of data can only be achieved by ensuring that data is relevant to your real business. Data Vault (or DV for short) is a methodology built around the idea that data needs to represent reality, not technical systems. At its core is the Data Vault model.
A Data Vault model that doesn’t represent reality is useless.
Your data model needs to contain the entities, attributes, and relationships that are familiar to the people who work in the real business. For example, customers buy products, product purchases are billed to customers, companies store product inventory, etc. These are not technical details - far from it!
Capturing this reality is sometimes a challenging task. One excellent and well-tested way of doing that is building a Conceptual Data Model (or CDM for short). A CDM that really reflects your business has a lot of value - even if it’s not always perfect, it helps all stakeholders to communicate with each other, and it guides you towards building actually useful data structures.
A conceptual model can be put into practice by incorporating it into the physical Data Vault model. The physical DV model is the actual design blueprint for your relational database, including columns, column lengths, primary keys, and foreign keys. Its core structure should be derived from the CDM, and its details filled in from the source systems that actually contain all that data. Here, automation is vitally important.
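To make the step from conceptual entity to physical structure more concrete, here is a minimal sketch of how a hub table definition could be derived from a CDM entity name plus key details taken from a source system. The `hub_` prefix, the `_hkey` suffix, the column types, and the `BusinessKey` and `hub_ddl` names are illustrative assumptions. They are not the output of VaultSpeed or any other automation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessKey:
    """A business key column; its SQL type comes from the source system."""
    name: str
    sql_type: str


def hub_ddl(entity_name: str, keys: list[BusinessKey]) -> str:
    """Render a CREATE TABLE statement for a hub named after a CDM entity."""
    # Assumed naming convention: "hub_" + snake_case entity name.
    table = "hub_" + entity_name.strip().lower().replace(" ", "_")
    columns = [
        f"{table}_hkey CHAR(32) NOT NULL",  # hash of the business key(s)
        *[f"{k.name} {k.sql_type} NOT NULL" for k in keys],
        "load_date TIMESTAMP NOT NULL",
        "record_source VARCHAR(100) NOT NULL",
        f"PRIMARY KEY ({table}_hkey)",
    ]
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n);"


if __name__ == "__main__":
    print(hub_ddl("Customer", [BusinessKey("customer_number", "VARCHAR(20)")]))
```

The entity name comes from the business side, while the key column and its length come from the source. That is the division of labor described above.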
This article explains how to easily build a conceptual data model and then incorporate it into the physical DV model using two efficient and modern data solutions - Ellie and VaultSpeed.
Ellie is an intuitive data modeling tool with enterprise-level data modeling and information architecture features. Moreover, it’s cloud-native, so it’s easy to set up and use, and to collaborate with your team.
The point of Ellie in this process is to capture the real business needs in a structured way, so that the DV design is built on solid ground. Ensuring a shared understanding of the core business entities and relationships is crucially important - otherwise, we might be building things that are in the end difficult or even impossible to get value from.
Taxonomy is a word you might encounter in Data Vault methodology. In simple terms, it means capturing the structure of how the business identifies and categorizes things of importance.
Consider, for example, the business case of a car-motor-bike products store. Below you can see how they categorize their offering. This is an example of a business taxonomy:
The same structure built in Ellie looks like this:
The “boxes” on this canvas are entities - reusable business objects that have definitions and other metadata in Ellie’s repository. They are connected to each other in different ways, representing the reality of this store’s business.
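For readers who think in code, the canvas can be pictured as a small graph of entities and relationships. The sketch below is only an illustration of that idea: the class names, the `"subtype of"` relationship kind, and the example entity names are assumptions, and none of it reflects how Ellie stores models internally.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A reusable business object with a definition and free-form metadata."""
    name: str
    definition: str = ""
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    kind: str  # e.g. "subtype of" (illustrative label)


@dataclass
class ConceptualModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, target: str, kind: str) -> None:
        # Refuse relationships that point at entities missing from the model.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.append(Relationship(source, target, kind))

    def subtypes_of(self, parent: str) -> list[str]:
        """Return the names of entities declared as subtypes of `parent`."""
        return sorted(
            r.source
            for r in self.relationships
            if r.target == parent and r.kind == "subtype of"
        )


if __name__ == "__main__":
    model = ConceptualModel()
    for name in ("Product", "Car Product", "Motorbike Product"):
        model.add_entity(Entity(name))
    model.relate("Car Product", "Product", "subtype of")
    model.relate("Motorbike Product", "Product", "subtype of")
    print(model.subtypes_of("Product"))
```

Keeping entities in a dictionary keyed by name mirrors the idea that each business object is defined once and reused wherever it appears.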
Describing this structure as a conceptual model in Ellie is useful for all kinds of projects, but here we have a clear path forward: we want to build our Data Vault based on this.
Building integrations between tools is made so much easier when you can use REST APIs. Fortunately, both Ellie and VaultSpeed have well-documented REST APIs in place. In this example, we’re using a Python script that does the following:
This results in a pre-filled canvas for hub groups in VaultSpeed, saving us the time it would have taken to create them manually in VaultSpeed - and ensuring we’re actually following the business model!
Ellie allows you to create custom metadata in the glossary. For this particular integration, 3 fields were added:
Python is a convenient way to translate the Ellie model JSON coming from the Model Export API into input that VaultSpeed’s API endpoints can read.
The script does several things:
Find more details and the full script in this VaultSpeed community post.
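As a rough illustration of the translation step only, the sketch below groups entities from an exported model by a custom metadata field and builds one payload per group. Everything specific here is an assumption made for the example: the JSON layout, the `hub_group` field name, and the payload shape are hypothetical and are not taken from Ellie’s or VaultSpeed’s API specifications. The calls to the actual REST endpoints are deliberately left out.

```python
import json

# Hypothetical export layout, NOT the real Ellie Model Export format.
SAMPLE_EXPORT = """
{
  "model": {
    "name": "Store taxonomy",
    "entities": [
      {"name": "Product", "metadata": {"hub_group": "product"}},
      {"name": "Car Product", "metadata": {"hub_group": "product"}},
      {"name": "Customer", "metadata": {}}
    ]
  }
}
"""


def hub_groups_from_export(export: dict, field_name: str = "hub_group") -> list[dict]:
    """Group entity names by a custom metadata field.

    Returns one dict per group; the dict shape is an invented stand-in for
    whatever the target API actually expects.
    """
    groups: dict[str, list[str]] = {}
    for entity in export["model"]["entities"]:
        group = entity.get("metadata", {}).get(field_name)
        if not group:
            continue  # entities without the custom field are skipped
        groups.setdefault(group, []).append(entity["name"])
    return [
        {"name": name, "hubs": sorted(members)}
        for name, members in sorted(groups.items())
    ]


if __name__ == "__main__":
    payloads = hub_groups_from_export(json.loads(SAMPLE_EXPORT))
    print(json.dumps(payloads, indent=2))
```

Keeping the transformation as a pure function, separate from any HTTP calls, makes it easy to test against a saved export before touching either tool.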
When facing enterprise-level data integration challenges, lots of data models need to be joined together. The scale of this effort requires automation. Data Vault is great for both integration and automation.
It might be tempting to build the DV model purely from source metadata, but you would end up with a model that represents only the sources and not your actual business. This anti-pattern is often referred to as “fake Data Vault”. The result is that the business does not receive an integrated picture of its situation - merely a repeat of what happens to exist across its system landscape.
To deliver true integration, you need to start from a business perspective and blend that perspective with the reality that exists in various data sources.
That means there are 2 main criteria for building a good Data Vault model:
An integration that brings the CDM into the automation process adds value by reducing manual modeling work and leaving less room for error.
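One simple way to see where the business view and the source landscape diverge is a coverage check between the two. The sketch below assumes a hand-maintained mapping from source tables to CDM entities. The function name, the mapping format, and the example table names are hypothetical, and this is not a feature of either tool.

```python
from typing import Optional


def coverage_report(
    cdm_entities: set[str],
    source_mapping: dict[str, Optional[str]],
) -> dict[str, list[str]]:
    """Compare CDM entities against a source-table-to-entity mapping."""
    mapped = {entity for entity in source_mapping.values() if entity is not None}
    return {
        # Business concepts that no source table feeds yet.
        "entities_without_source": sorted(cdm_entities - mapped),
        # Source tables that are unmapped or point at an unknown entity.
        "tables_without_entity": sorted(
            table
            for table, entity in source_mapping.items()
            if entity is None or entity not in cdm_entities
        ),
    }


if __name__ == "__main__":
    report = coverage_report(
        {"Customer", "Product", "Invoice"},
        {"crm.cust": "Customer", "erp.item": "Product", "erp.tmp_load": None},
    )
    print(report)
```

An entity without a source is a gap in the data. A table without an entity is a candidate for either extending the CDM or leaving that table out of the vault.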
This use case is just one example of how Ellie and VaultSpeed can interact with each other. Other examples where integration would reduce manual work could be:
The beauty of this is that as both tools have flexible and powerful API endpoints, nearly anything can be done in between.
Ellie’s overall integration philosophy is to provide (and keep adding) APIs that allow users and partners to utilize the models and other metadata in all kinds of tools. Ellie’s API specifications are publicly available here. File-based exports and imports from the Business Glossary are provided for human-friendly purposes.
VaultSpeed’s approach to integration depends on the type of integration:
Using these two excellent tools in unison ensures a business-focused design that is fast and efficient to implement. It means that you are always doing the “right thing”, and at the same time doing it the “right way”!
Juha Korpela, Ellie CPO & Jonas De Keuster, VP - Product Marketing, VaultSpeed