Filter conditions are defined in the Filter Node. Only records that satisfy the filter condition pass through the node's output port to the downstream nodes in the pipeline for further processing.
The Filter Node has two ports: one input port and one output port.
Input Port -> Receives the data that needs to be filtered.
Output Port -> Outputs only those records that match the filter condition.
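The port behavior described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not the product's implementation: the `filter_node` function, the sample records, and the filter condition are all hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # one row of data, keyed by column ID


def filter_node(records: Iterable[Record],
                condition: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only the records for which the filter condition holds."""
    for record in records:
        if condition(record):
            yield record  # passed on through the output port
        # records that fail the condition are dropped


# Hypothetical data arriving at the input port.
input_port = [
    {"order_id": 1, "country": "US", "amount": 120.0},
    {"order_id": 2, "country": "DE", "amount": 45.5},
    {"order_id": 3, "country": "US", "amount": 15.0},
]

# Hypothetical filter condition: US orders above 100.
output_port = list(filter_node(
    input_port,
    lambda r: r["country"] == "US" and r["amount"] > 100,
))

print(output_port)
```

Only the first record satisfies both parts of the condition, so it is the only one that reaches downstream nodes.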
Configure the Filter Node:
The Filter Node is available in the Transformations Palette. You can also find it using the search box next to the Palette.
- Drag and drop the filter node onto the canvas.
- The Configuration option (radio button) is enabled by default.
- The Configuration menu consists of Filter Details and Options/Description.
- Filter Details
- Input Fields: The column IDs populated from the source system data.
- Operator: The available logical operators are displayed in a horizontal row. Select one to apply it to the selected input field.
- Functions: A list of functions that help refine the resulting data. The Spark SQL functions are grouped into String, Numeric, Date&Time, and Custom categories.
For more information about Spark SQL functions, refer to the Apache Spark SQL Functions documentation.
Filter conditions are defined in the filter expression editor, which has two modes:
- Designer Mode: Drag and drop, or double-click, the required input fields, operators, and functions to place them on the expression canvas and build a Where clause condition.
- Expert Mode: Type the query directly into the query space.
Note: A toggle switch is available to switch between the two modes.
The Eraser option clears the filter conditions defined in the expression editor.
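As a rough illustration of a Where clause condition built from an input field, an operator, and a function, the sketch below evaluates one such expression. The table, the column names, and the expression itself are hypothetical. SQLite from the Python standard library stands in for Spark SQL so that the example runs without a cluster; `UPPER`, `=`, `>=`, and `AND` give the same result in Spark SQL for this data, but other functions may differ between the two engines.

```python
import sqlite3

# A Where clause condition as it might be typed in Expert Mode.
# Input fields: country, amount. Function: UPPER. Operators: =, >=, AND.
filter_expression = "UPPER(country) = 'US' AND amount >= 100"

# In-memory SQLite database used only as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "us", 120.0), (2, "de", 45.5), (3, "US", 15.0), (4, "Us", 100.0)],
)

# Apply the filter expression as the WHERE clause of a query.
matching = conn.execute(
    f"SELECT order_id FROM orders WHERE {filter_expression} ORDER BY order_id"
).fetchall()
conn.close()

print(matching)
```

Wrapping the field in `UPPER` makes the comparison case-insensitive, so the rows with `us` and `Us` both match as long as the amount is at least 100.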
Options / Description:
- Packet Size and Parallelism can be set here to tune performance.
- Annotation can be used to briefly note what the Filter Node does.
- Description can be used to provide more detail about the filter conditions and to maintain a log or audit trail of the changes made to them over time.
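Packet Size and Parallelism are not defined further here. The sketch below shows one plausible reading, assuming that packet size is the number of records handled per batch and that parallelism is the number of workers filtering batches concurrently. The function names, the sample data, and the filter condition are hypothetical, and the product may implement these settings differently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def packets(records, packet_size):
    """Split records into consecutive packets of at most packet_size items."""
    iterator = iter(records)
    while True:
        packet = list(islice(iterator, packet_size))
        if not packet:
            return
        yield packet


def filter_packet(packet):
    # Hypothetical filter condition: keep even values.
    return [value for value in packet if value % 2 == 0]


def run_filter(records, packet_size, parallelism):
    """Filter records packet by packet using `parallelism` worker threads."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map returns results in the order the packets were submitted.
        filtered = pool.map(filter_packet, packets(records, packet_size))
        return [value for packet in filtered for value in packet]


result = run_filter(range(10), packet_size=4, parallelism=2)
print(result)
```

Under these assumptions, a larger packet size reduces per-batch overhead, while higher parallelism lets more batches be filtered at the same time.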