The derived fields node creates new fields (columns) in the selected table and allows to edit pre-populated fields fed from the incoming source data. The creation of new fields helps to perform various calculations and generate results as desired.
The derived fields node has two ports. One input port and one output port.
Input Port -> Data that needs to be processed is connected to the input port of the derived fields node.
Output Port -> One output port is populated with modified table details. This updated table data is passed through the output port for further processing in the downstream nodes of the pipeline.
Configure the Derived Fields Node:
Derived Fields Node can be found in the Transformations Palette. The node can also be found through the search box next to Palette.
- Drag and drop the derived fields node onto the canvas.
- The Configuration option (radio button) is enabled by default.
- The Configuration menu consists of Derived Details and Options/Description tabs.
- Derived Details
The derived details include Column Name, Alias, and Datatype.
-
-
- Column Name: A list of all column names of the selected table is retrieved and displayed.
- Alias: Alias provisions to rename the column name and this can be done by double-tapping on the text field beside the column name.
- DataType: Displays the data type of the column name.
-
-
-
- F(x): Click the icon to edit existing datatype and expression.
-
Note: Column name cannot be edited by clicking icon, but alternatively the column name can be changed by aliasing option.
Click the icon to edit the newly created field (column name), datatype, and expression.
-
-
- Plus icon: Click the to create a new Column Name by declaring its Datatype, Input Fields, Operator, and Functions.
- Delete icon: Click the to delete the newly created column name.
-
The Derived Fields pop-up window is displayed by clicking F(x) and Plus icons.
Derived Fields can be defined using the expression editor after providing a Column Name and selecting the appropriate Datatype. There are two modes in the expression editor:
- Designer Mode: This mode allows you to drag and drop/ double-click on the specific input fields, operator, and functions onto the expression canvas to create a Where clause condition.
- Expert Mode: This mode allows you to type in the queries in the query space.
Note: To switch from one mode to the other mode. Click either the Designer Mode/Expert Mode tab. The selected Mode tab is highlighted with a background color.
-
-
- Input Fields: The input fields consist of column IDs that are populated from incoming source node data.
-
- Operator: A list of all logical operators is displayed horizontally, where the user can choose any of them to apply an operator on the input field selected.
-
Note: Input operator allows to type value/string on the expression canvas.
-
-
- Functions: A list of functions is displayed. The usage of any function helps the user to refine the resultant data accurately. The Spark SQL functions are categorized under String, Numeric, Date&Time, and Custom functions.
-
For more information about Spark SQL functions, refer to Apache Spark SQL Functions documentation.
- Click the Save button to save the created expression by typing/ dragging and dropping the required Input fields, Operator, and Functions.
Options / Description:
- Packet Size and Parallelism can be maintained here to achieve better performance.
- Annotation can be used to mention brief details of the functionality achieved in the filter node.
- Description can be used to provide more details of the filter conditions and can also be used to maintain a log or audit trail of all the changes done to the filter conditions over some time.