- Navid Nassiri
Why you need business logic in your ETL pipeline
Digital media companies are used to handling vast amounts of data every day. But harnessing that data for useful analytical insights is often more complex than it first appears.
This could be for a number of reasons, including:
a growing number of data sets
growing volumes of data within each data set
different data requirements for different teams across the organization
To manage any or all of these factors across all of your data pipelines, you need a sophisticated automation engine that can handle the heavy lifting - or a heck of a lot of custom engineering in-house.
However, even with automation, you need to ensure the process is as efficient and effective as possible, which involves unifying disparate datasets and storing that data in a data warehouse, a process known as ETL.
What is automated ETL?
ETL stands for ‘Extract, Transform, Load’, and describes the process of turning raw data from your incoming sources into foundational data you can use for analysis. Each of these steps helps to form a data pipeline that should continuously update in real time:
Extract. This is the first stage of the data pipeline, in which raw data is collected, moved or copied from its original source to an interim storage solution.
Transform. At this stage, the raw data is converted into foundational data by applying standardization rules according to its intended purpose, as well as cleaning, verifying, formatting, sorting and labeling the data.
Load. Finally, the transformed data is loaded into a target destination such as a data warehouse or data lake. If this is the first time the pipeline runs, it’ll involve a ‘full loading’, followed by ‘incremental loading’ thereafter.
Why ETL alone isn’t enough to drive analytics
The problem with only using an ETL process is you have lots of data to look at, but no way to organize it in a way that’s useful for BI. The data needs to be ‘told’ what to do, and what to look for, whenever you need to make a specific query or request. For that, you also need a layer of business logic that sits between the presentation layer (UI) and the data access layer.
Business logic refers to the programming or algorithm that actions business rules that allow a database to communicate or connect with the end-user interface. The goal of well-written business logic is to enable a non-technical business person to query or ask questions using the data with terms they’ll understand.
For example, an ad campaign manager wants to know how many impressions version A of an ad creative got versus version B. Without a business logic layer, they would likely need to know the code required to fetch the necessary data from the database (e.g. using SQL) even with a presentation layer (UI). Or they'd have to wait for a data engineer to do it for them.
But with a business logic layer, Rev Ops and Marketing teams can generate requests through the UI and the business logic will transfer the query to the data access layer. The business logic will determine suitable results from the query and make them visible on the UI.
As a result, having a business logic layer helps to make data accessible to anyone in the business that needs it, for much faster analysis.
To learn more about how to automate your ETL pipelines effectively, check out our ultimate guide to ETL.