In this guide, we are going to delve into the world of the Data Model to understand the role it plays, why they are essential, and how to get thinking more strategically about your data.
What is a Data Model?
In most organisations, we store information about both physical and digital items in computer systems - typically a database of sorts. Whether it is the latest stock numbers, content for our website, or knowing who is due to turn up at next weeks conference, we store information digitally.
When we are storing this information, we need to consider what the things are, how they relate and any business rules which apply to them
Conceptual Data Model
If we took a look at an incredibly simple example of the old fashioned Lemonade Stand
We have a simple structure here, we have beautiful, crisp, refreshing Lemonade to sell and we are at the start of something big, so we're tracking and reporting on sales for our next big VC Pitch.
In the example above we can say that Product, Sales and Customer are all entities. Apart from Sales, they are representing real-world 'things'.
The final bit of information that we can see is the relationships between the entities. The Product is sold to the Customer through a Sale.
The example we have just seen is for a Conceptual Data Model. The role of the Conceptual Data Model is to give an overview without the detail needed to implement a real solution. With this model, you can have conversations with Business Stakeholders and be agnostic about technical detail and have a shared understanding of the Process and Data Requirements. There are two more models we will look at next.
Logical Data Model
Next on the list is the Logical Data Model which provides another level of detail for the entities and their relationships. Just like the Conceptual Data Model, we don't yet care about how it will be physically represented in our Database
We know from our first Model that a Product is sold and that a Customer is sold a Product. The Logical Data Model describes which attributes are used to build relationships. The Name and Cost are also attributes of the Product - just like width and height would be attributes of a Picture Frame.
Here we can see that each Product now has an ID (Identifier), as does Customer and Sales. These IDs are what we call Primary Keys (PK), they are an attribute which will uniquely identify them from all of the other data for that entity.
Where we have a relationship between two entities we hold the Primary Key from where the relationship is coming from, the Source, within a Foreign Key (FK) on the Destination. As per our diagram, we can see that a Sale has the Product ID of what was sold.
The Logical Data Model fleshes out all of the attributes used to describe the data with an entity. This becomes a bit more interesting these days with GDPR and the need to validate and justify why particular data is being stored.
Data Model Normalisation
The Model is also a key time to improve Logical representation through Normalisation. Normalisation is a process to reduce data redundancy and improve data integrity.
Reducing Data Redundancy is about ensuring you are not storing the same data in multiple places, something which causes great pains when not thought about in the design process. If you store customer information in 8 different places and they change their address, it is a pain to make sure all systems then get updated. A way around this could be a CustomerAddress Entity which relates a CustomerID, any place the CustomerID is used and needs an Address it will relate back to a single record in the CustomerAddress Entity.
Data Integrity is making sure related data is stored together and their relationships are maintained correctly. If I am running a Sales Report for all Orders, and then I decide to delete a Product, then I could end up with lots of Orders all pointing to a missing Product, and no detail - disaster! Maybe a way around this is an Attribute on Product called IsActive - someone can only buy a Product if IsActive is enabled, but my reporting will show all items.
To summarise the role of the Logical Data Model is to improve the design, dive deeper into the entities and their attributes and to see how they relate. Business Stakeholders should see their requirements being implemented in the design at this stage to meet their Business Requirements. The Model is still agnostic of how it will be actually built.
Physical Data Model
The last Model is the Physical Data Model. We traditionally thought about Relational Database Management Systems (RDMS) such as MS SQL or MySQL, there many different ways to store data - such as NoSQL. At this stage, we are diving into the real detail of how our entities, relationships and attributes will be implemented - We could pick our previous two models up and use them anywhere, now we are down to specifics of a system or platform.
The Physical Data Model is the conversation between those that will implement the solution and the requirements which have come from the Business.
There are many important decisions to be made, such as the correct DataType to be used for certain attributes - for example, if something can be either TRUE or FALSE, you wouldn't store that as Text as that could be any number of things other than TRUE or FALSE. There are details like if an attribute is NULLABLE, which means can it not have a value at all. If attributes hold Text, then how much text is enough, too much etc.
Considerations such as scalability (will it work with 100, 500, 10k, 1m records), maintainability (can we rename a product easily), performance (does it take less than 2s to return a product) and security (can Bob add and delete orders?) should all be there.
Why go through all this effort for a Data Model?
It is your business, and I am pretty sure it runs on data. The way which you work today is most likely not the one you will be working in 10, 5 or even 2 years time. You should take care to understand what data you are creating, where the data is used and be able to have conversations with your Business and IT colleagues about changes.
If someone comes along with a new way of processing Sales, how confident are you that you understand where you are today to adopt a new optimal process? If Finance add another datastore for Customers, are you able to figure out how to bring the data in line or integrate it into your ecosystem?
The Data that exists now will help drive so many decisions for the future, ensuring that you are able to always get the most benefit from it is essential, you don't want lack of design and forethought to leave you stuck in the mud.