A Batch Pipeline moves data from one system to another. More broadly, a pipeline pulls data from a source, applies the required transformation widgets, and then pushes the processed data to a sink.
A Batch Pipeline is also called a regular pipeline. These pipelines are run either manually or repeatedly. During each run, data is extracted from the source, various transformations are applied to it, and the refined data is pushed to the sink. A run is marked complete once all the data has been processed.
The execution time of a batch pipeline typically depends on the size of the source data. The primary purpose of a batch pipeline is to keep data flowing so that it can be used to solve problems and support decision-making.
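The run lifecycle described above (extract from the source, apply transformations, push to the sink, then mark the run complete) can be sketched in Python. This is a minimal illustration under stated assumptions, not the Dextrus implementation: the `BatchRun` class, its `execute` method, and the in-memory list used as a sink are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict


@dataclass
class BatchRun:
    """Hypothetical model of a single batch pipeline run (not a product API)."""
    extract: Callable[[], Iterable[Record]]          # reads all rows from the source
    transforms: list[Callable[[Record], Record]]     # applied in order to every row
    load: Callable[[list[Record]], None]             # pushes refined rows to the sink
    complete: bool = False

    def execute(self) -> int:
        processed = []
        for record in self.extract():
            for transform in self.transforms:
                record = transform(record)
            processed.append(record)
        self.load(processed)
        # The run is marked complete only after every extracted row was processed and loaded.
        self.complete = True
        return len(processed)


# Usage: a hypothetical source with two raw rows and a list acting as the sink.
sink: list[Record] = []
run = BatchRun(
    extract=lambda: [{"name": " alice ", "amount": "10"}, {"name": "bob", "amount": "5"}],
    transforms=[
        lambda r: {**r, "name": r["name"].strip().title()},  # clean up names
        lambda r: {**r, "amount": int(r["amount"])},         # cast amounts to integers
    ],
    load=sink.extend,
)
count = run.execute()
print(count, run.complete, sink)
# 2 True [{'name': 'Alice', 'amount': 10}, {'name': 'Bob', 'amount': 5}]
```

Running the pipeline manually corresponds to calling `execute()` once; a repeated pipeline would call it again for each run, for example from a scheduler.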
ETL: Stands for Extract, Transform, and Load. In ETL, data ingestion is slower because the data is transformed on a separate processing server before it is loaded. ETL is best suited for cases where source data must be refined and manipulated before it is loaded into the target system.
ELT: Stands for Extract, Load, and Transform. In ELT, data ingestion is faster because the data is not sent to a separate server for restructuring; it is loaded first and then transformed within the target system. ELT is flexible and efficient for ingesting large amounts of data and can process both structured and unstructured datasets.
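The practical difference between ETL and ELT is where the transformation happens. The sketch below assumes an in-memory SQLite database as the target system and uses the Python process itself to stand in for a separate processing server; the source rows, table names, and the `etl`/`elt` helpers are all hypothetical.

```python
import sqlite3

# Hypothetical raw source rows: (name, amount as text, possibly empty).
SOURCE_ROWS = [(" alice ", "10"), ("bob", "5"), ("carol", "")]


def clean(name: str, amount: str) -> tuple[str, int]:
    """Transformation performed outside the target (the ETL 'T')."""
    return name.strip().lower(), (int(amount) if amount else 0)


def etl(conn: sqlite3.Connection) -> None:
    # Extract + Transform in the pipeline process first...
    cleaned = [clean(n, a) for n, a in SOURCE_ROWS]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    # ...then Load only the refined rows into the target.
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    # Extract + Load the raw rows into the target unchanged...
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", SOURCE_ROWS)
    # ...then Transform inside the target using its own SQL engine.
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT lower(trim(name)) AS name,
               COALESCE(CAST(NULLIF(amount, '') AS INTEGER), 0) AS amount
        FROM sales_raw
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, amount FROM sales_elt ORDER BY name").fetchall()
print(etl_rows == elt_rows, etl_rows)
# True [('alice', 10), ('bob', 5), ('carol', 0)]
```

Both paths produce the same rows. ETL loads only refined data, while ELT keeps the raw table in the target and relies on the target's own engine (here, SQL) to perform the transformation.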
A pipeline defines the flow of data from a source system to a target system and describes how the data is transformed along the way.
To develop a pipeline, you use one or more source systems, apply one or more transformations, and write the results to one or more target systems. The data in the target systems can then be used to analyze data, generate reports, and synchronize data.
Prerequisites to create a pipeline:
- Source
- Transformation
- Target
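A pipeline built from these three prerequisites, each of which may have one or more entries, can be modeled roughly as follows. This is a hypothetical sketch: the `PipelineDefinition` class, its `validate` and `run` methods, and the use of Python callables and lists as sources and targets are illustrative assumptions, not how the product stores or runs pipelines.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineDefinition:
    """Hypothetical pipeline: one or more sources, transformations, and targets."""
    sources: dict[str, Callable[[], list[dict]]] = field(default_factory=dict)
    transformations: list[Callable[[dict], dict]] = field(default_factory=list)
    targets: dict[str, list[dict]] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return the names of any prerequisites that are still missing."""
        missing = []
        if not self.sources:
            missing.append("Source")
        if not self.transformations:
            missing.append("Transformation")
        if not self.targets:
            missing.append("Target")
        return missing

    def run(self) -> None:
        missing = self.validate()
        if missing:
            raise ValueError(f"Missing prerequisites: {', '.join(missing)}")
        # Combine rows from every source, tagging each with its origin.
        records = []
        for name, extract in self.sources.items():
            for record in extract():
                records.append({**record, "_source": name})
        # Apply each transformation to all rows, in order.
        for transform in self.transformations:
            records = [transform(r) for r in records]
        # Write a separate copy of the output to every target.
        for target in self.targets.values():
            target.extend(dict(r) for r in records)


# Usage: an empty definition reports every missing prerequisite.
draft = PipelineDefinition()
print(draft.validate())  # ['Source', 'Transformation', 'Target']

# Two hypothetical sources feeding two in-memory targets.
reports: list[dict] = []
sync: list[dict] = []
pipeline = PipelineDefinition(
    sources={"crm": lambda: [{"id": 1}], "erp": lambda: [{"id": 2}]},
    transformations=[lambda r: {**r, "id": r["id"] * 100}],
    targets={"reports": reports, "sync": sync},
)
pipeline.run()
print(reports)  # [{'id': 100, '_source': 'crm'}, {'id': 200, '_source': 'erp'}]
```

Copying each row per target keeps the targets independent, so a later change in one target does not silently alter another.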
Refer to the link below to learn how to configure a Batch Pipeline:
https://dextrushelp.zendesk.com/hc/en-us/articles/6650721882388