January 8, 2025

The Three Rules for (Modern) Data Modeling

Data Culture
Data Modeling
Data Industry
Hannu Järvi
Co-Founder & Chief Success Officer

Isaac Asimov, frustrated with the recurring science fiction trope of robots turning against their masters, crafted the “Three Laws of Robotics”. His aim was to ensure that robots would be seen as a force for good.

I encounter a similar frustration within the domain of data modeling. Quality data models are crucial for success in any data-driven endeavor.

Yet data modeling is often treated as a box-checking exercise.

A model is created to satisfy process requirements rather than to serve its true masters: engineers, users of data, and analytics solutions. As a result, data models are frequently ignored.

My “Three Laws of Data Modeling” start with the premise that the role of data modeling is to ensure organizations succeed in data and analytics, all the more so in the age of AI.

These laws are designed to serve the people whose job is to make this happen, whether they are technical experts, analysts, or business professionals.

1. Data Models Must Reflect Reality

A good data model mirrors the real world it aims to analyze, because what is the purpose of modeling data if it cannot help solve the complexities of the real world?

Data models must focus on business over IT.

However, the data housed in operational systems still represents reality, even if the business reality it reflects is not immediately visible. Your business activities happen based on this data.

A data model must accurately capture the essential structure of this data, without getting stuck in the details of how the source system organizes it.

Don’t lean too far in the other direction either. Don’t discard a model simply because some parts of it represent a simplified business perspective rather than the reality reflected in source system data. The real world is often incredibly complex, and capturing all its nuances is necessary only when they are relevant to our objectives.

Data analysts and engineers can navigate such mixed designs as long as they can distinguish what part is accurate and what part is a simplified representation of reality.

In practice, this might mean that your data model is a blend of conceptual and physical representations, with different parts defined at varying levels of detail.

Regardless of the level of detail, keep the following principles in mind:

  • Accurate Representation of Entities and Relationships: The entities in the model should correspond to tangible elements of the business, such as customers, products, or transactions. The relationships between these entities should capture the hierarchies, interactions, and dependencies they form in real life.
  • Alignment with Business Processes: If a model doesn't align with how a business operates, it can cause confusion and result in a misinterpretation of the business. Collaborate closely with business stakeholders to capture nuances and edge cases that might otherwise be overlooked.
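As a minimal sketch of these principles, the snippet below records entities and relationships explicitly and marks each entity as either accurate to the source data or a simplified business view, so readers can tell the two apart. All names here (`Fidelity`, `Entity`, `DataModel`, and the sample entities) are hypothetical and chosen only for illustration; they are not taken from any specific modeling tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Fidelity(Enum):
    """How closely a part of the model follows the source system data."""
    ACCURATE = "accurate"
    SIMPLIFIED = "simplified"


@dataclass
class Entity:
    name: str
    fidelity: Fidelity
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    source: str
    target: str
    description: str


@dataclass
class DataModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, target: str, description: str) -> None:
        # Refuse relationships that point at entities the model does not define.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"Unknown entity: {name}")
        self.relationships.append(Relationship(source, target, description))

    def simplified_parts(self) -> list[str]:
        # List the entities that are simplified representations of reality.
        return sorted(
            e.name for e in self.entities.values()
            if e.fidelity is Fidelity.SIMPLIFIED
        )


# Hypothetical example: customers and products follow the source data,
# while transactions are modeled as a simplified business view.
model = DataModel()
model.add_entity(Entity("Customer", Fidelity.ACCURATE, ["customer_id", "name"]))
model.add_entity(Entity("Product", Fidelity.ACCURATE, ["product_id", "title"]))
model.add_entity(Entity("Transaction", Fidelity.SIMPLIFIED, ["transaction_id", "amount"]))
model.relate("Customer", "Transaction", "places")
model.relate("Transaction", "Product", "contains")
```

Calling `model.simplified_parts()` gives analysts and engineers a quick way to see which parts of the design should not be read as a literal picture of the source systems.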

2. Data Models Must Be Easy to Understand

Analytics is a bridge between data and decision-making. For this bridge to function, the data model must be easily understood by IT professionals, the business, and, of course, analysts.

‘Easy’ is relative, so ensure the data model communicates what is being represented as uniformly as possible. This includes presenting hierarchies, dependencies, and interactions of the same type in a consistent manner throughout the model.

  • Clarity and Simplicity: Avoid unnecessary complexity. Use consistent naming conventions, intuitive structures, and standardized formats.
  • Documentation and Communication: Document the model’s components, assumptions, and purpose.
  • Stakeholder Engagement: Regularly involve both technical and non-technical stakeholders in discussions about the model. Their feedback ensures the model remains intuitive and practical for diverse audiences.
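Consistent naming is one of the few items above that can be checked automatically. The sketch below assumes a team has agreed on snake_case for table and column names; that convention, the function name `find_naming_violations`, and the sample tables are assumptions made for illustration, not a recommendation from this post.

```python
import re

# Assumed convention: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def find_naming_violations(columns_by_table: dict[str, list[str]]) -> list[str]:
    """Return table and column names that do not follow the assumed convention."""
    violations = []
    for table, columns in columns_by_table.items():
        if not SNAKE_CASE.fullmatch(table):
            violations.append(table)
        for column in columns:
            if not SNAKE_CASE.fullmatch(column):
                # Qualify columns with their table so the report is unambiguous.
                violations.append(f"{table}.{column}")
    return violations


# Hypothetical model with two inconsistent names.
sample = {
    "customer": ["customer_id", "FirstName"],
    "OrderLine": ["order_id"],
}
violations = find_naming_violations(sample)
```

A check like this covers only the mechanical part of clarity. Whether the names are intuitive to the business still has to come from the stakeholder discussions described above.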

3. The Modeling Process Must Support Creation of Good Data Models Regardless of Scope

Everything I’ve discussed so far is relatively easy as long as the scope of data modeling remains fairly small.

The challenges begin as the scope grows.

Larger scopes introduce difficulties of their own, since large systems are naturally harder to manage than smaller ones.

However, the biggest challenge arises because no single person creates the enterprise data model for a large organization. It’s a collaborative effort involving a diverse group of people with varying backgrounds.

This is where the demands for clarity, simplicity, and especially repeatability become crucial.

In my future posts, I’ll share my experiences on how to create good data models that meet these requirements at scale. Follow our Medium profile for more content on data modeling.