New Features and Bug Fixes in Version 1.5
BSP:
- Edit and re-run functionality has been implemented; all BSP configuration settings are retained when a pipeline is edited and re-run.
- A summary page has been added that shows the selected source and target tables along with their connection details.
- Handling of Save and Cancel in each step:
- Save: saves the current page's configuration settings and retains the settings configured in previous steps.
- Cancel: displays a pop-up alert asking whether the user wants to save the configured pipeline or discard the completed configuration settings.
Streaming:
- Delta Lake version support in source: users can start streaming from a particular version of the source table.
- Delta Lake merge target -- Type 1: ability to merge a source table into the target table, performing all three operations (INSERT, UPDATE, DELETE) at once.
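The Type 1 merge described above can be illustrated with a small plain-Python sketch. It is not the DataFactory implementation: the function name merge_type1 is hypothetical, and the sketch assumes the source is a full snapshot keyed by primary key, so any target row whose key is missing from the source is deleted. A real pipeline might instead rely on an explicit delete flag, or on Delta Lake's MERGE INTO statement with its WHEN MATCHED / WHEN NOT MATCHED clauses.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def merge_type1(target: Dict[Any, Row], source: Dict[Any, Row]) -> Dict[str, int]:
    """Apply a Type 1 merge of `source` into `target` in place.

    Hypothetical semantics: `source` is a full snapshot keyed by primary key,
    so target keys that are missing from it are deleted.
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    # DELETE: keys that exist in the target but not in the source snapshot.
    for key in [k for k in target if k not in source]:
        del target[key]
        counts["deleted"] += 1
    for key, row in source.items():
        if key not in target:
            # INSERT: the key is new to the target.
            target[key] = dict(row)
            counts["inserted"] += 1
        elif target[key] != row:
            # UPDATE: Type 1 overwrites the old values and keeps no history.
            target[key] = dict(row)
            counts["updated"] += 1
    return counts


if __name__ == "__main__":
    target = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}
    source = {1: {"name": "a"}, 2: {"name": "B"}, 4: {"name": "d"}}
    print(merge_type1(target, source))
    print(target)
```

In the usage example, one pass reports one insert (key 4), one update (key 2) and one delete (key 3). Because Type 1 overwrites in place, the previous value of key 2 is not kept anywhere.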
Batch:
- Optimizations
The backend has been optimized by removing iterative DB calls, improving the performance of the asset list view in the project module.
- CICD Closure
The CI/CD flow enables a review and approval process for assets created in DataFactory. It allows assets in DataFactory to be versioned, along the same lines as a Git workflow.
As part of 1.5, phase 2 of the CI/CD flow for batch pipelines has been initiated, and the following capabilities are complete:
1. Notifications on the review process
2. Listing the session history by version
3. UI restrictions in the review process according to the state of the assets
4. Ability to list the version history of an asset
With these capabilities, together with those delivered in 1.4, users can use the CI/CD flow end-to-end for batch pipelines.
Features like rollback and version comparison will be covered in upcoming sprints.
- REST API Enhancements:
For the REST API source, the prior version flattened only one level of the response JSON. The current version fully flattens nested response JSON, so every nested level of the response is taken into account.
Bug fixes and functionality gaps identified in the exit criteria have been addressed.
Request parameters and pagination capabilities are also implemented.
- SAP as a source -- lean version: the first version of SAP as a source has been developed.
We have enabled the SAP application layer as a source, so data can now be ingested from it. We are working towards a lean version in this sprint.
Users can specify an SAP table name and create a DEXTRACTOR containing all the columns of that table, with an optional filter condition.
Updating connections and DEXTRACTORs is not covered in the current version; it will be part of the next sprint.
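The full flattening of nested REST API responses described under REST API Enhancements can be sketched as a short recursive function. This is an illustration under stated assumptions, not the product's code: the name flatten_json is hypothetical, nested keys are joined with "." (an assumed separator), and list elements are addressed by their index rather than being exploded into separate rows.

```python
from typing import Any, Dict


def flatten_json(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Recursively flatten nested dicts and lists into a single-level dict."""
    if isinstance(obj, dict):
        items = [(str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        # Assumption: list elements are keyed by their position.
        items = [(str(i), v) for i, v in enumerate(obj)]
    else:
        # Scalar leaf: store it under the accumulated key path.
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten_json(value, path, sep))
    return flat
```

For example, {"id": 7, "user": {"address": {"city": "Pune"}}} flattens to {"id": 7, "user.address.city": "Pune"}, whereas one-level flattening would stop at "user.address" and leave a nested object in that column.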
BDP:
The Spark version has been upgraded to 3.3.
RBAC:
Work on role-based access has been initiated in this sprint, and all the framework-related groundwork is done. Users cannot consume this capability in the current sprint; it will be ready for consumption in the next sprint.
DATA WRANGLER:
- Dataset as an input for the wrangler recipe
- Added the capability to use datasets from different databases as input for a wrangler recipe.
- Stratified sampling
- Added stratified sampling as a new sampling method.
- Configure datatype and data domain
- Added the capability to change metadata configuration (datatype and data domain).
- Change the date format
- Added a new transformation to change the date format.
- Outlier graph in column details
- Added an outlier graph (IQR box plot) to the column details section to help identify outliers in the data.
- Outlier transformation
- Added two methods to detect outliers: percentile outliers and rule-based outliers.
- Pivot Table
- Added a pivot transformation, which turns multiple rows of data into one, denormalizing a dataset into a more compact form by rotating the input data on a column value.
- Recipe details page enhancement
- Enhanced the recipe details page with additional information: dataset name, created by, created on, and last updated.
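Stratified sampling, listed above, draws the same fraction from every group (stratum) so that the sample preserves the class proportions of the full dataset. The plain-Python sketch below illustrates the technique only; the function name, the fixed seed, and the rule of keeping at least one row per stratum are assumptions, not the wrangler's documented behaviour.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List


def stratified_sample(rows: List[Dict[str, Any]], strata_col: str,
                      fraction: float, seed: int = 42) -> List[Dict[str, Any]]:
    """Sample `fraction` (0 < fraction <= 1) of the rows from every stratum."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[strata_col]].append(row)
    sample: List[Dict[str, Any]] = []
    for members in groups.values():
        # Assumption: keep at least one row so small strata are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

With 80 rows of class A and 20 of class B, a 10% stratified sample contains 8 A rows and 2 B rows, preserving the 80/20 split that a simple random sample would only approximate.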
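An IQR box plot, such as the one in the column details section, conventionally flags values outside the Tukey fences Q1 - 1.5·IQR and Q3 + 1.5·IQR. The sketch below computes those fences with Python's standard statistics module. The multiplier k = 1.5 is the usual convention and an assumption here; the wrangler's own quantile method may differ.

```python
import statistics
from typing import List, Tuple


def iqr_outliers(values: List[float], k: float = 1.5) -> Tuple[float, float, List[float]]:
    """Return the lower fence, upper fence, and values outside them."""
    # "inclusive" treats the data as the whole population; needs >= 2 values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]
```

For the values 1 through 9 plus 100, Q1 is 3.25 and Q3 is 7.75, so the fences are -3.5 and 14.5 and only 100 is reported as an outlier.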
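The pivot transformation described above rotates the distinct values of one column into new columns. A minimal plain-Python sketch follows; the function name, its parameters, and the choice to sum duplicate cells are illustrative assumptions rather than the wrangler's actual options.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def pivot(rows: List[Dict[str, Any]], index: str, columns: str, values: str,
          agg: Callable[[List[Any]], Any] = sum) -> List[Dict[str, Any]]:
    """Produce one output row per `index` value, one column per `columns` value."""
    cells: Dict[Any, Dict[Any, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        cells[row[index]][row[columns]].append(row[values])
    out: List[Dict[str, Any]] = []
    for key, by_col in cells.items():
        record: Dict[str, Any] = {index: key}
        # Several input rows may land in one (index, column) cell; aggregate them.
        record.update({col: agg(vals) for col, vals in by_col.items()})
        out.append(record)
    return out
```

Pivoting rows of (store, month, sales) on month yields one row per store with a column per month. Combinations with no input rows are simply absent from the output record instead of being filled with a null.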
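The change-date-format transformation listed above amounts to parsing each value with a source format and re-rendering it with a target format. The sketch uses the standard datetime.strptime and strftime calls. Mapping unparseable values to None is an assumption; the actual transformation may reject them or leave them unchanged.

```python
from datetime import datetime
from typing import List, Optional


def change_date_format(values: List[Optional[str]], src_fmt: str,
                       dst_fmt: str) -> List[Optional[str]]:
    """Re-render date strings from `src_fmt` into `dst_fmt`."""
    out: List[Optional[str]] = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        try:
            out.append(datetime.strptime(v, src_fmt).strftime(dst_fmt))
        except ValueError:
            # Assumption: values that do not match src_fmt become None.
            out.append(None)
    return out
```

For example, converting ["2023-03-15"] from "%Y-%m-%d" to "%d/%m/%Y" gives ["15/03/2023"].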
AI-ML:
- Graph for evaluation of the model
- Added graphs to analyze the model using the ROC curve, prediction distribution, and feature impact.
- Correlation Table
- Added a table to check the correlation between variables.
- Replace Null
- Added functionality to replace null values in the data.
- Remove or Cap Outliers
- Added functionality to remove or cap outliers in the data, with box plots and histograms to help find them.
- Class Imbalance
- Added functionality to perform class imbalance operations on unbalanced datasets.
- Synthetic Data
- Added a feature to generate synthetic data using techniques such as Gaussian Copula and Tabular Preset.
- Recommendation for scaling on transform
- Added a feature that recommends a scaling method for transforming numerical data.
- Dimensionality Reduction
- Added a feature to reduce dimensionality using techniques such as PCA, SVD, Gaussian Random Projection, Sparse Random Projection, and Feature Agglomeration.
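A correlation table, as listed above, holds one coefficient per pair of variables. The sketch below computes pairwise Pearson coefficients in plain Python. Pearson is assumed here because the notes do not say which coefficient the table uses, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant column
    return cov / (sx * sy)


def correlation_table(data: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Coefficient for every unordered pair of columns."""
    return {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}
```

Columns that rise together score close to 1, columns that move in opposite directions score close to -1, and values near 0 indicate no linear relationship.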
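Replacing nulls, as listed above, usually means filling missing values with a statistic computed from the values that are present. The notes do not name the strategies offered, so the mean and median options and the function name below are assumptions used for illustration.

```python
import statistics
from typing import List, Optional


def replace_nulls(values: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Fill None entries with the mean or median of the non-null values."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        # The median is less sensitive to outliers than the mean.
        fill = statistics.median(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]
```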
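Removing or capping outliers, as listed above, can be illustrated with percentile bounds. Capping, also called winsorizing, clips values to the bounds, while removal drops them. The 5th and 95th percentile defaults and the function name are assumptions, not the product's settings.

```python
import statistics
from typing import List


def handle_outliers(values: List[float], lower_pct: int = 5, upper_pct: int = 95,
                    mode: str = "cap") -> List[float]:
    """Cap or remove values outside the given percentiles (1..99)."""
    # 99 cut points; cut i - 1 is the i-th percentile. Needs >= 2 values.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    if mode == "cap":
        # Winsorize: clip each value into [low, high].
        return [min(max(v, low), high) for v in values]
    if mode == "remove":
        return [v for v in values if low <= v <= high]
    raise ValueError(f"unknown mode: {mode}")
```

Capping keeps the row count unchanged, which matters when rows must stay aligned with other columns. Removal shrinks the dataset but leaves the surviving values untouched.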
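One common class imbalance operation is random oversampling: rows from the minority classes are duplicated at random until every class matches the size of the largest class. The notes do not say which operations the feature offers, so this technique and the function name are assumptions chosen for illustration. SMOTE-style methods, which synthesize new points instead of copying existing ones, are a common alternative.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List


def random_oversample(rows: List[Dict[str, Any]], label_col: str,
                      seed: int = 0) -> List[Dict[str, Any]]:
    """Duplicate minority-class rows until all classes have equal counts."""
    rng = random.Random(seed)
    by_class: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_class[row[label_col]].append(row)
    target = max(len(members) for members in by_class.values())
    out: List[Dict[str, Any]] = []
    for members in by_class.values():
        out.extend(members)
        # Draw with replacement to fill the gap to the majority class size.
        out.extend(rng.choices(members, k=target - len(members)))
    return out
```

A dataset with 90 rows of class 0 and 10 rows of class 1 comes out with 90 rows of each class. Oversampling should be applied only to the training split so that duplicated rows do not leak into evaluation.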