The Binary Split node splits the incoming source data according to a user-defined condition. Records that satisfy the split condition pass through the green output port, and records that do not satisfy it pass through the red output port.
The binary split node has three ports: one input port and two output ports.
Input Port -> Data that needs to be split is connected to the input port of the binary split node.
Output Port -> There are two output ports for this node: the green output port and the red output port. The data is split according to the criteria defined in the split condition and passed through the corresponding output port for further processing in the downstream nodes of the pipeline.
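The routing described above can be sketched in a few lines of plain Python. This is only an illustration of the green/red semantics, not the node's actual implementation; the `binary_split` helper, the sample `orders` records, and the `amount >= 100` condition are all hypothetical.

```python
def binary_split(records, condition):
    """Route each record to 'green' (condition holds) or 'red' (it does not)."""
    green, red = [], []
    for record in records:
        # Every record goes to exactly one of the two outputs.
        (green if condition(record) else red).append(record)
    return green, red


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "amount": 250.0},
    {"order_id": 2, "amount": 40.0},
    {"order_id": 3, "amount": 120.0},
]

# Hypothetical split condition: amount >= 100.
green, red = binary_split(orders, lambda r: r["amount"] >= 100)

print("green:", [r["order_id"] for r in green])
print("red:  ", [r["order_id"] for r in red])
```

In this sketch, orders 1 and 3 reach the green output and order 2 reaches the red output; no record is dropped or duplicated.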
Configure the Split Node:
The Split node can be found in the Transformations palette. It can also be located through the search box next to the palette.
- Drag and drop the split node onto the canvas.
- The Configuration option (radio button) is enabled by default.
- The Configuration menu consists of Binarysplit Details and Options/Description tabs.
- Binarysplit Details
  - Input Fields: The column IDs populated from the source system data.
  - Operator: The available logical operators are displayed in a horizontal list. Select one to apply it to the chosen input field.
  - Functions: A list of functions that can be used to refine the split condition. The Spark SQL functions are grouped into String, Numeric, Date&Time, and Custom categories.
For more information about Spark SQL functions, refer to the Apache Spark SQL functions documentation.
Binarysplit details can be defined using the expression editor. There are two modes in the expression editor:
- Designer Mode: This mode allows you to drag and drop, or double-click, input fields, operators, and functions to place them on the expression canvas and build a Where clause condition.
- Expert Mode: This mode allows you to type the condition directly into the query space.
Note: A toggle switch is available to change from one mode to the other.
The Eraser option enables the user to clear the split conditions defined in the expression editor.
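As a rough illustration of the kind of Where clause condition that could be typed in Expert Mode, the sketch below applies one condition and its negation to a small table. It uses Python's built-in `sqlite3` module as a stand-in for Spark SQL, which is an assumption made only so that the example runs without a Spark installation. The table, column names, and condition are hypothetical, and the sample data contains no NULL values, so every row lands in exactly one result.

```python
import sqlite3

# Hypothetical split condition, written as a Where clause expression.
# upper() and AND are available in both SQLite and Spark SQL.
condition = "amount >= 100 AND upper(country) = 'US'"

conn = sqlite3.connect(":memory:")  # stand-in engine, not Spark
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 250.0, "us"), (2, 40.0, "US"), (3, 120.0, "DE")],
)

# Rows that satisfy the condition (what the green port would carry).
green = conn.execute(
    f"SELECT order_id FROM orders WHERE {condition} ORDER BY order_id"
).fetchall()

# Rows that do not satisfy it (what the red port would carry).
red = conn.execute(
    f"SELECT order_id FROM orders WHERE NOT ({condition}) ORDER BY order_id"
).fetchall()
conn.close()

print("green:", green)
print("red:  ", red)
```

Only order 1 meets both parts of the condition here, so it is the only row in the first result; orders 2 and 3 fall into the second.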
Options / Description:
- Packet Size and Parallelism can be set here to achieve better performance.
- Annotation can be used to give a brief summary of what the split node does.
- Description can be used to provide more details about the split conditions. It can also serve as a log or audit trail of the changes made to the split conditions over time.