The Snowflake Merge sink node allows the user to perform merge operations based on the selected columns in the snowflake merge table (The source table can be fetched from any data warehouse). This merge node includes Insert, Delete and Update operations where all three operations can be compiled at once/individually. The Snowflake Merge node can perform the following actions based on the merge key matching criteria:
- Update records when the merge key value is matched
- Insert records when the merge key value is not matched
- Delete records when the given condition is met
The Snowflake Merge node has two ports. One input port and one output port.
Input Port -> Data that needs to be merged is connected to the input port of the Snowflake Merge node.
Output Port -> The output port is the logical port. Which can be further fed into Analytics nodes if required.
Note: The output port of the Snowflake Merge node cannot be connected to any of the Source/Transformation nodes.
Configure the Snowflake Merge Node:
Snowflake Merge Node can be found in the Sink Palette. The node can also be found through the search box next to Palette.
- Drag and drop the Snowflake Merge node onto the canvas.
- The Configuration option (radio button) is enabled by default.
- The Configuration menu consists of Connection, Field Mapping, Operations, Compile SQL, and Description.
- The Connection tab comprises three options displayed vertically namely, Existing Connection, New Connection, and Recently used.
- Existing Connection: Displays the list of already existing connections with Type, Name, and Category details specified.
- New Connection: A new connection enables you to establish a connection to any datasource available in DataFactory by providing relevant information such as Account, Region, Warehouse, Connection JDBC URL, User Id, Password, Driver class name, Connection Status, and then click Test Connection to establish a new connection (Adhoc). The user is facilitated to even use existing connections by clicking Reusable Connection.
- Recently Used: The most recently used connections are displayed here.
- The Connection tab comprises three options displayed vertically namely, Existing Connection, New Connection, and Recently used.
-
- The Field Mapping tab displays three columns namely, Merge Key, Source Column, and Target column.
-
-
- Merge Key: The Merge Key acts as a Primary key to match records between two selected tables. Hover on the list of columns to see the Merge key against each column name. Click the
key icon against the required column name to make it a merge key. Then the
merge key icon turns orange color with the checkbox unselected.
- Source Column: Displays the list of column names populated from the source node.
- Target Column: Displays the list of column names populated from the selected Snowflake connection.
- Merge Key: The Merge Key acts as a Primary key to match records between two selected tables. Hover on the list of columns to see the Merge key against each column name. Click the
-
-
- The Field Mapping tab displays three columns namely, Merge Key, Source Column, and Target column.
-
- The Operations tab displays three checkboxes namely, Delete, Update and Insert operations.
-
-
- Delete: Enabling the Delete checkbox displays the Delete Expression snippet, provisioning the user to create a condition based on which the records are deleted at the target.
- Update: Enabling the Update checkbox displays the Update Expression snippet, provisioning the user to create an update condition, apart from automatically updating records when the merge key is matched between source and target tables.
-
-
- The Operations tab displays three checkboxes namely, Delete, Update and Insert operations.
Note: In the Value textbox, a string should be declared in single quotes, whereas numerical values can be declared without specifying any quotes.
-
-
-
-
- Insert: Enabling the Insert checkbox doesn’t display any Expression snippet like Delete and Update operations. By default, the insert operation inserts records into the target table when the merge key is not matched between the source and target tables.
-
-
-
- The Compile SQL tab displays Merge SQL query along with Pretty Print, Designer Mode, and Expert Mode options.
-
- Merge SQL: The Merge SQL query is formed based on the user selection in previous Field Mapping, and Operations tabs.
- Designer Mode: The Designer mode is selected by default. This readable query cannot be edited in this mode.
-
-
-
- Expert Mode: The Expert mode allows to make changes in the query.
-
-
-
- Pretty Print: Click the
Pretty print icon to view the merge SQL query in a structured format. This helps the user to edit the query easily.
- Pretty Print: Click the
-
- The Description tab displays various independent components.
- The Packet Size and Parallelism can be declared here to achieve better performance.
- Annotation can be used to mention brief details of the functionality achieved in the considered node.
- Description can be used to provide more details of the node conditions and can also be used to maintain a log or audit trail of all the changes done to the node over some time.
- The PreSQL and PostSQL query text boxes are available to type SQL queries to get desired results.
Note: All the Description tab options are optional for the user. All these options' functionality is independent of each other.
To demonstrate the Snowflake Merge sink node functionality, let us create a pipeline using “RDBMS (Snowflake) – Snowflake Merge” nodes. Where the data is read from the RDBMS data source connected to Snowflake Merge sink node to merge source and target tables based on Merge key matching criteria, then perform INSERT, UPDATE, and DELETE operations before data is finally written into Snowflake Merge node.
- Navigate to the Sink available in the palette
- Drag and drop the Snowflake Merge node onto the canvas.
- Create a pipeline by connecting the output port of the RDBMS (Snowflake) node to the input port of the Snowflake Merge node.
- Select the “Snowflake_Connection” option from the existing connection list.
- Ensure the connection status is active if a green tick mark is seen.
- Click the Field Mapping tab.
- The list of all source and target columns is displayed.
- Click the
(Merge Key) icon against the required column name to make it a merge key.
- The
(Merge Key) icon turns orange color with the checkbox unselected.
- Click the Source/Target columns drop-down list to change the column name as required.
Note: The merge key can be applied on single or multiple columns.
- In our use case, we selected the merge key on the “O_ORDERKEY” column name.
- Next, click the Operations tab.
- Click the Delete, Update and Insert operation checkboxes to perform all these operations on the selected source and target tables.
- A new Delete and Update Expression snippets are created.
- Under Delete Expression snippet, select “O_CUSTKEY” column name, then select “Greater Than Equal” operator, and type a value “1000” in the value textbox.
- Under Update Expression snippet, select “O_ORDERSTATUS” column name, then select “Equal” operator, and finally, type a value ‘O’ in the value textbox within single quotes.
- Click the
Add icon to add more expressions or click the
delete icon to delete existing expressions under Delete and Update Expression snippets.
- Click the Compile SQL tab.
- The Merge SQL query is displayed here describing the merge process applied on the two selected tables.
- Before running the created pipeline, the pipeline should be saved. Click the Save button.
- A Save Pipeline pop-up window is populated on the screen.
- Name the pipeline with a unique name and click the Save button to save the pipeline.
- Click the
icon to save the pipeline in the desired folder path.
- Click the Run button to compile the pipeline.
- The moment when the compilation process begins the log window appears at the bottom of the screen with the Log radio button selected.
- The status (green/red) of each compiled time log is captured and displayed on the screen.
- The number of output records processed at each node is displayed on the top of the node connected in the pipeline.
- Hover on the log message to understand how the ingestion process is completed.
Note: 9,894 records are obtained from the output port of the source node. These records are fed to the input port of the Snowflake Merge target node, where the below results are obtained based on the Merge Key match criteria:
- Total Records Processed: 4822
- Records Deleted: 0
- Records Updated: 4822
- Records Inserted: 0