With information flooding into businesses at an ever-increasing rate, companies around the globe must find efficient ways to keep up with new data while still managing the data they already track. By some estimates, 95% of companies struggle to manage the vast quantities of data they face, which shows how widespread the problem has become.
To manage incoming data, businesses use data warehouses, which act as huge storage facilities for information. However, not all incoming data is completely new; much of it updates or reinforces existing datasets. A lot can change within a business, and the methods that companies use to capture and store data need to reflect these ongoing developments.
For example, if a dataset stored the names and roles of every employee in a company, it would continually need edits and additions. Equally, if someone was promoted, their title within the database would have to be changed. Keeping up with these alterations can be overwhelming, which is why businesses are making the most of the SQL data management tools at their disposal.
One of the primary methods that businesses employ when dealing with changing data is slowly changing dimensions (SCDs). SCDs allow data engineers to handle a constant flow of information that will replace, edit, or add to existing datasets.
What are Slowly Changing Dimensions?
Slowly changing dimensions are the main way that businesses can track changes in their own data. Whether it be changes to company policy, an edit that’s needed to reflect more current information, or any other in-cell change to data, slowly changing dimensions are one of the most efficient methods of documenting this change.
There are many types of SCDs, some of which simply replace one set of data with another, while others will create additional fields. With the latter option, a business has the added advantage of being able to then track change over time, plotting the movement of a particular data point throughout its history.
Depending on what data you’re working with, the past figures may not be particularly important. The flexibility of choice when it comes to slowly changing dimensions is one of their most effective qualities – a business can decide whether they want a real-time view of data, or a historical approach.
There are six main types of SCDs, each of which takes a slightly different approach to changing attributes within a data warehouse.
SCD Type 1 – Overwriting Data
The first type of SCD is the most straightforward: it simply overwrites data in a particular dimension. Instead of monitoring change, this SCD format just gives you the most up-to-date data point.
With this format, any data that is edited is lost, because it is directly replaced instead of being moved to a different location. This cut-and-dried approach makes Type 1 SCD very easy to implement, as it needs nothing beyond the new data and the place where it will be written.
One of the main advantages of this simplistic approach to data management is that a business is able to retain up-to-date information without having to take up lots of space in their database.
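As a rough illustration of the overwrite behaviour, here is a minimal Python sketch. The `employees` dictionary, the `apply_type1` function, and the sample values are all hypothetical stand-ins for a dimension table and an update routine; a real warehouse would typically do this with an SQL UPDATE.

```python
def apply_type1(dimension, key, changes):
    """Type 1: overwrite attributes in place. The old values are discarded."""
    dimension.setdefault(key, {}).update(changes)
    return dimension


# Hypothetical employee dimension keyed by employee id.
employees = {101: {"name": "Ana", "title": "Analyst"}}

# A promotion simply replaces the stored title.
apply_type1(employees, 101, {"title": "Senior Analyst"})
print(employees[101])
```

After the call, nothing in the table records that the title was ever "Analyst", which is exactly the trade-off Type 1 makes for simplicity and low storage use.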
SCD Type 2 – Creating a New Record
With a Type 2 SCD, new data does not replace the old value as it does with Type 1. Instead, a whole new record is added to the dimension. This new record contains the new data, while the old record is kept, giving the company access to both versions. To keep track of when each record applied, these records often carry timestamps giving the specific times and dates of each version.
Using this system of SCD, a business is able to compare versions of a record over time, moving between them to contextualize how certain elements have changed over several weeks, months, or years.
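The versioning idea can be sketched in a few lines of Python. This is only an illustration under assumed names: the `valid_from`, `valid_to`, and `is_current` fields and both functions are hypothetical, chosen to mirror the timestamps mentioned above rather than any particular tool's schema.

```python
from datetime import date


def apply_type2(rows, employee_id, new_title, effective):
    """Type 2: close the current row (if any) and append a new current row."""
    for row in rows:
        if row["employee_id"] == employee_id and row["is_current"]:
            if row["title"] == new_title:
                return rows  # nothing changed, so no new version is needed
            row["valid_to"] = effective
            row["is_current"] = False
    rows.append({
        "employee_id": employee_id,
        "title": new_title,
        "valid_from": effective,
        "valid_to": None,  # open-ended: this is the current version
        "is_current": True,
    })
    return rows


def title_as_of(rows, employee_id, when):
    """Return the title that was valid on a given date, or None."""
    for row in rows:
        if row["employee_id"] != employee_id:
            continue
        started = row["valid_from"] <= when
        not_ended = row["valid_to"] is None or when < row["valid_to"]
        if started and not_ended:
            return row["title"]
    return None


history = []
apply_type2(history, 101, "Analyst", date(2022, 1, 10))
apply_type2(history, 101, "Senior Analyst", date(2023, 6, 1))
print(title_as_of(history, 101, date(2022, 12, 31)))
```

Because old rows are closed rather than deleted, a query can ask what the data looked like on any past date, at the cost of a table that grows with every change.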
SCD Type 3 – Previous Two Values
Within SCD Type 3, instead of simply replacing the data, each tracked attribute holds two values, usually stored in separate columns: the current value and the one before it. These are the two most recent data points, allowing someone reading through the data to get a snapshot of the recent period.
This is commonly used when comparing different quarters. For example, a company could use SCD Type 3 data tracking to see how they are currently performing in terms of revenue when compared to their previous quarter or year. This format of tracking is excellent for short-term data analysis and will allow data engineers to quickly understand the context of a specific data point’s movements.
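A minimal Python sketch of the two-value idea follows. The `previous_` column naming and the `apply_type3` function are assumptions made for illustration only.

```python
def apply_type3(record, attribute, new_value):
    """Type 3: keep only the current value and the one immediately before it."""
    previous_column = f"previous_{attribute}"
    if record.get(attribute) != new_value:
        # Shift the current value into the "previous" column, then overwrite.
        record[previous_column] = record.get(attribute)
        record[attribute] = new_value
    return record


# Hypothetical employee record with a current and a previous title.
employee = {"employee_id": 101, "title": "Analyst", "previous_title": None}

apply_type3(employee, "title", "Senior Analyst")
apply_type3(employee, "title", "Manager")
print(employee)
```

After the second change the record holds "Manager" and "Senior Analyst"; the original "Analyst" value is gone, which is why this type suits short-term comparisons rather than full history.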
Cloud data warehouses differ in how they support SCD tracking. For example, Snowflake provides a MERGE statement, which can be used to implement this form of data tracking. If you're unsure which approach a data warehouse favors, a comparison of top data warehouses, such as a comparison of Druid vs ClickHouse, can show how different platforms approach SCD.
SCD Type 4 – Two Tables Approach
SCD Type 4 is also known as the historical table approach, because it relies on a separate table to record change. Within this type of SCD, users see the most recent data in one fixed table. This table shows only the most recent data point, giving a current view of a specific attribute.
However, linked to this dataset in another table is the full historical documentation of this data point. Every single time the point is updated, a new row listing the new data is added to that table. The history table therefore has a row for every single change, allowing business analysts to quickly move down the table and chart how change has occurred over a certain period.
While this is a popular approach, since it provides easily accessible current data while keeping a full history available, it consumes a lot of space. Because it also means maintaining two different datasets, businesses that use it as their main SCD type will put a lot more work on their data department.
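The two-table arrangement can be sketched as follows. The names `current_titles`, `title_history`, and `apply_type4` are hypothetical; the point is only that one structure holds the latest value while the other gains a row on every change.

```python
from datetime import date


def apply_type4(current, history, key, new_value, changed_on):
    """Type 4: overwrite the current table and append a row to the history table."""
    if current.get(key) == new_value:
        return  # no change, so neither table is touched
    current[key] = new_value
    history.append({"key": key, "value": new_value, "changed_on": changed_on})


current_titles = {}  # one entry per employee: the latest title only
title_history = []   # one row per change, oldest first

apply_type4(current_titles, title_history, 101, "Analyst", date(2022, 1, 10))
apply_type4(current_titles, title_history, 101, "Senior Analyst", date(2023, 6, 1))

print(current_titles[101])
print([row["value"] for row in title_history])
```

Lookups against `current_titles` stay small and fast, while `title_history` carries the storage and maintenance cost described above.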
Other Types of SCD
While there are more than four types of SCD, the other formats are simply modifications or combinations of the earlier approaches. For example, Type 6 SCD combines Types 1, 2, and 3: a new row is added for every change, as in Type 2, while each row also carries a current-value column, as in Type 3, which is overwritten on all of that record's rows whenever the value changes, as in Type 1.
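One common way to picture the Type 6 combination is sketched below in Python. The column names (`title`, `current_title`, `valid_from`, `valid_to`, `is_current`) and the function are assumptions for illustration, not a fixed standard.

```python
from datetime import date


def apply_type6(rows, employee_id, new_title, effective):
    """Type 6 sketch: Type 2 rows plus a Type 3 style column kept fresh Type 1 style."""
    matching = [r for r in rows if r["employee_id"] == employee_id]
    current = next((r for r in matching if r["is_current"]), None)
    if current is not None and current["title"] == new_title:
        return rows  # nothing changed
    if current is not None:
        current["valid_to"] = effective
        current["is_current"] = False
    # Type 1 part: overwrite current_title on every existing row for this key.
    for row in matching:
        row["current_title"] = new_title
    # Type 2 part: append a new versioned row.
    # Type 3 part: the row carries both its own title and the current title.
    rows.append({
        "employee_id": employee_id,
        "title": new_title,
        "current_title": new_title,
        "valid_from": effective,
        "valid_to": None,
        "is_current": True,
    })
    return rows


rows = []
apply_type6(rows, 101, "Analyst", date(2022, 1, 10))
apply_type6(rows, 101, "Senior Analyst", date(2023, 6, 1))
print(rows[0]["title"], "->", rows[0]["current_title"])
```

Each historical row keeps the title that applied at the time, while `current_title` lets an analyst group old activity under the present-day value without a second lookup.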
The other types of SCD are all more complicated than these first four, but the added complexity provides other ways of analyzing, comparing, and contrasting datasets. It's also worth mentioning that data engineers use SCD Type 0 to refer to data points that ignore any changes. For example, the date that someone joined a company, their birthday, or their first name is unlikely to change.
Why should my business use slowly changing dimensions in its data warehouses?
Slowly changing dimensions are designed specifically with the fast-paced change that businesses can go through in mind. Instead of treating data like a fixed entity, they acknowledge that information can change, things move on, and companies morph over time. By focusing on facilitating the collection of data and containing it in a format that acknowledges that change, SCDs can become fantastic tools for analysis.
After collecting data over time with Type 2 or Type 3 SCD, a business can begin to analyze these data points. Instead of simply knowing what a data point is, the business gains an understanding of how that data has developed and changed over time, thanks to the history that SCDs capture.
With an understanding of this change, a business can then use the additional context to inform their company decisions.
Many ETL tools now include the ability to implement and deploy SCD fields. Whether you want to simply replace data, as with Type 1 SCD, or create a historical table to accompany the most recent data point, as with Type 4, these tools provide a range of ways to interact with and collect data within a business.
Considering how quickly change can occur in the world of business, it's no wonder that these tools have become so widely used. From plotting a visualization of how a data point has changed over time to simply preparing your data warehouse for the inevitable shift of data, SCDs are now vital in data ecosystems.