New Features and Bug Fixes in Version 1.8
JOB CICD:
Job CICD development is in progress. 85% of the work has been completed as of this sprint.
Batch:
Data profiling - Descriptive stats Config:
The descriptive stats node has been enabled as part of the Batch pipeline. This node can be mapped to source and transformation nodes. When it is attached to a source or transformation node, the user can calculate 6 dataset-level metrics and 16 field-level metrics, including data domain prediction and data domain adherence, on the data of the node it is attached to. The configuration and execution parts were completed in the current sprint. The ability to view the metrics visually and drill down to the underlying KPIs will be implemented in the next sprint. The ability to map this node to target nodes will be available in upcoming sprints.
Re-run and Resume:
When a batch pipeline execution fails midway, this capability lets the user either re-run the pipeline from the failed node, starting at that node's first packet, or resume the execution from the packet that failed. The backend work is done. UI work will be done in upcoming sprints.
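The difference between the two recovery modes can be sketched as follows. This is a minimal illustration, not the product's implementation: the `Failure` record, the `restart_point` function, and the mode names `"rerun"` and `"resume"` are hypothetical, and packets are assumed to be numbered from 0 within each node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Where a pipeline execution stopped (hypothetical record)."""
    node: str    # name of the node that failed
    packet: int  # index of the packet that failed, counted from 0


def restart_point(failure: Failure, mode: str) -> tuple[str, int]:
    """Return (node, first packet to process) for the chosen recovery mode."""
    if mode == "rerun":
        # Re-run: start again at the failed node, from its first packet.
        return failure.node, 0
    if mode == "resume":
        # Resume: continue at the failed node, from the packet that failed.
        return failure.node, failure.packet
    raise ValueError(f"unknown mode: {mode!r}")


failure = Failure(node="transform_1", packet=7)
print(restart_point(failure, "rerun"))   # ('transform_1', 0)
print(restart_point(failure, "resume"))  # ('transform_1', 7)
```

In both modes the nodes before the failed one are not executed again; the modes differ only in how much of the failed node's work is repeated.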
Jobs (Exposing Jobs as REST API (Backend)):
Jobs can be triggered and their progress monitored via REST API. Only the backend work is done in this sprint.
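A client of such an API would typically trigger a job and then poll its status until it finishes. The sketch below shows only that polling pattern. These notes do not specify endpoints, payloads, or status values, so the status names and the `fetch_status` callable (a stand-in for an HTTP request to a job-status endpoint) are assumptions.

```python
import time
from typing import Callable

# Assumed terminal states; the actual API may use different names.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status stands in for an HTTP call to a hypothetical
    job-status endpoint and must return the current status string.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Passing the status lookup as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.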
BSP:
BSP View Configuration Mode:
The configuration of the Source and Target of a BSP can now be viewed, in view-only mode, while the BSP is running.
Because BSP is a continuous process, this option lets the user inspect the Source and Target system configuration without stopping the BSP execution.
Session History for BSP:
Session history is implemented for BSP so that users can see the history of all executions of the BSP. This feature enables audit capability.
BSP LogsView - Visual Logs - Implementation:
Monitor mode for BSP execution shows KPIs such as data size and data count at the BSP and individual table levels. The user can view the data processing KPIs as bar graphs that are updated in real time. The user can also plot these graphs over different past time intervals, ranging from the last 5 minutes to the last 1 day.
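The windowed KPIs described above amount to aggregating processing events per table over a chosen look-back interval. The following is a minimal sketch of that aggregation, not the product's implementation: the event tuple layout and the `kpis_by_table` function are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def kpis_by_table(events, now, window):
    """Aggregate record count and byte size per table within the window.

    events: iterable of (timestamp, table_name, size_in_bytes) tuples.
    Returns {table_name: {"count": int, "bytes": int}}.
    """
    cutoff = now - window
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, table, size in events:
        # Keep only events inside the look-back window.
        if cutoff <= ts <= now:
            totals[table]["count"] += 1
            totals[table]["bytes"] += size
    return dict(totals)


now = datetime(2024, 1, 1, 12, 0, 0)
events = [
    (now - timedelta(minutes=2), "orders", 100),
    (now - timedelta(minutes=30), "orders", 999),
    (now - timedelta(minutes=1), "users", 10),
]
# Last 5 minutes: the 30-minute-old "orders" event is excluded.
print(kpis_by_table(events, now, timedelta(minutes=5)))
```

Summing the per-table results gives the BSP-level figures, and changing `window` (for example to `timedelta(days=1)`) corresponds to selecting a different interval in the graph.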
Data Mesh:
Business Modeler Phase 1: work in progress.
R&D Progress:
The DataFactory Custom JDBC POC has been completed. This POC will help in implementing the consumption of, or subscription to, data products in the data marketplace via custom JDBC.
DATA WRANGLER:
- KPI Calculations:
- Upon enabling the sample quality switch, column-level KPI information is provided, such as consistency percentage, mean, median, standard deviation, null value count, unique value count, min, max, and p-value.
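Several of these column-level KPIs can be illustrated with Python's standard library. This is a minimal sketch under stated assumptions, not the Data Wrangler implementation: the `column_kpis` function is hypothetical, nulls are represented as `None`, and the standard deviation is assumed to be the sample standard deviation. Consistency percentage and p-value are omitted because these notes do not define how they are computed.

```python
import statistics


def column_kpis(values):
    """Compute basic KPIs for one numeric column; None marks a null."""
    present = [v for v in values if v is not None]
    kpis = {
        "null_count": len(values) - len(present),
        "unique_count": len(set(present)),
    }
    if present:
        kpis.update(
            mean=statistics.mean(present),
            median=statistics.median(present),
            min=min(present),
            max=max(present),
        )
    # Sample standard deviation needs at least two data points.
    if len(present) >= 2:
        kpis["std_dev"] = statistics.stdev(present)
    return kpis


print(column_kpis([1, 2, 2, None, 5]))
```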
AI-ML:
- Forecasting:
- Ingestion of time series data
- Univariate forecasting capability
- Clean and replace nulls in the dataset
- Using algorithms like Exponential Smoothing, AUTO ARIMA, and Polynomial Trend for model training
- Tuning a selected model and creating multiple versions of it based on different hyperparameters
- Predicting the next instances using the predict model feature
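As an illustration of univariate forecasting with one of the listed algorithms, here is a minimal sketch of simple exponential smoothing. It is not the product's implementation: the function name is hypothetical, the level is assumed to be seeded with the first observation, and the smoothing factor `alpha` is the kind of hyperparameter that could differ between model versions.

```python
def exponential_smoothing_forecast(series, alpha, horizon=1):
    """Forecast with simple exponential smoothing.

    The level is updated as
        level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    and seeded with the first observation. The forecast is flat:
    every future step equals the last smoothed level.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


# Levels: 10 -> 11.0 -> 12.5, so both forecast steps are 12.5.
print(exponential_smoothing_forecast([10, 12, 14], alpha=0.5, horizon=2))
```

A larger `alpha` weights recent observations more heavily, while a smaller one produces a smoother, slower-moving forecast.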