What is ‘extract’ in ETL?
In this blog series, we're focusing on ETL (extract, transform, and load), a process for unifying data. For some DataOps teams, the order is instead ELT (extract, then load, then transform).
However, ETL is not as simple as E, T and L. DataOps involves far too many moving parts to be described as a three-step process. The fact is, for each of the steps represented by these three letters, regardless of the number of data pipelines involved, there is a plethora of questions that need to be answered before you can get started.
Let’s start by looking at the letter E for Extract. What exactly does this part of the process involve and what questions do we need to ask before we can start extracting the data?
E for Extract
When people hear the term ‘extract’, they assume it simply refers to the process by which raw data is exported or copied from its original source, ready to be transformed, before it is loaded into its final destination. In reality, successful extraction of useful, actionable data refers to a whole lot more than this.
Factors to consider in data extraction
Timing is important: When do you extract the data? What timezone are the timestamps in the dataset?
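One common way to keep timezones from becoming a downstream problem is to normalise every timestamp to UTC at extraction time. Here is a minimal sketch of that idea; the `to_utc` helper and the assumption that the source emits naive ISO-8601 strings in a known timezone are illustrative, not a prescription:

```python
from datetime import datetime, timezone

def to_utc(ts: str, source_tz: timezone) -> datetime:
    """Parse a naive ISO-8601 timestamp from the source and return it in UTC.

    Assumes the source's timezone is known and fixed; if the source already
    emits timezone-aware timestamps, the replace() step should be dropped.
    """
    naive = datetime.fromisoformat(ts)          # e.g. "2024-01-01T00:00:00"
    aware = naive.replace(tzinfo=source_tz)     # pin it to the source's zone
    return aware.astimezone(timezone.utc)       # convert once, at the edge
```

Doing this once at the edge means transforms and loads never have to guess which zone a row came from.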
Sometimes a data source fails or is missing: Do you then re-pull data from previous periods and backfill? How often do you check for errors? Backfilling is a critical, yet in our experience often overlooked, requirement for data integrity. If glitches occur frequently, you need to consider how to backfill a) with the correct data, b) with data at scale, and c) immediately upon the connection being restored, not hours, days or weeks after the event. These considerations all fall under the extraction process, but when handled manually, they soon become repetitive, inefficient and error-prone, consuming person-hours to repair ever-growing volumes of data.
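The backfill considerations above can be sketched as a gap check over daily partitions: compare what should exist against what was actually extracted, then re-pull only the missing days. The `fetch_day` callback and the in-memory ledger here are hypothetical stand-ins for a real pipeline's state store:

```python
from datetime import date, timedelta

def find_gaps(extracted_days: set, start: date, end: date) -> list:
    """Return every day in [start, end] that has no extracted partition."""
    gaps = []
    day = start
    while day <= end:
        if day not in extracted_days:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

def backfill(extracted_days: set, start: date, end: date, fetch_day) -> None:
    """Re-pull each missing day, oldest first, and record it once extracted."""
    for day in find_gaps(extracted_days, start, end):
        fetch_day(day)           # re-extract that day's data from the source
        extracted_days.add(day)  # mark the gap as closed
```

Running the gap check on every extraction cycle, rather than waiting for someone to notice missing rows, is what lets backfilling happen immediately after a connection is restored.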
Sometimes extraction involves API calls to a data source: Do these have a rate limit? Does the API have a quota? What happens if the API is down? Do you know how the API works? Who maintains the code used to connect to the API? Do you have engineers who know how to manage sudden or unexpected API changes?
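When an API imposes rate limits or goes down mid-extraction, the usual defence is retry with exponential backoff. A minimal sketch, assuming a `call_api` callable that raises `ConnectionError` on transient failures (real clients will surface rate limiting differently, often as an HTTP 429):

```python
import time

def extract_with_backoff(call_api, max_retries: int = 5, base_delay: float = 1.0):
    """Call the API, backing off exponentially on transient failures.

    Waits base_delay, then 2x, 4x, ... between attempts, so a briefly
    unavailable or rate-limited API is given room to recover.
    """
    for attempt in range(max_retries):
        try:
            return call_api()
        except ConnectionError:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("API still unavailable after %d retries" % max_retries)
```

This only covers transient outages; quota exhaustion and breaking API changes still need the human ownership the questions above point to.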
Security and regulatory compliance: What credentials are required to extract the data? Data extraction needs to be carried out securely - how is this security maintained? Who is responsible?
The scale of the data: How large are the datasets? Do you have the resources to extract that much, not only in terms of compute, storage, and scale, but also time and engineering resources? What happens when they grow in variety or volume?
How will these factors affect the overall cost of your DataOps?: Have you budgeted enough for the engineering work involved in extracting all your data? Are you extracting too much data, leading to unnecessary warehousing costs?
All these questions – and many more – must be answered comprehensively for the extraction of your data to be successful. Is your team equipped to answer all of them effectively? If you're still attempting to manage this process manually, then possibly not, and it may be time to consider automating your ETL.
In our next post, we'll explore the ins and outs of the 'T' or 'transform' phase in ETL.