New Features and Bug Fixes in Version 1.2
Date of Publication: 04/05/2022
Batch Pipeline
Processor Creation: Create a processor on an existing connection to enable the Pushdown feature. In the current build, Pushdown is supported for SQL Server and Redshift sources.
- SQL Server Pushdown: When data moves within the same SQL Server system, DataFactory only orchestrates the job and pushes the processing logic down to SQL Server for higher throughput.
- Redshift Pushdown: When data moves within the same Redshift system, DataFactory only orchestrates the job and pushes the processing logic down to Redshift for higher throughput.
- Azure CSV and Parquet as Source: Query and ingest CSV and Parquet files on Azure Data Lake and use each of them as a Source node in the Batch Pipeline.
- S3 CSV and Parquet as Source: Query and ingest CSV and Parquet files on AWS S3 and use each of them as a Source node in the Batch Pipeline.
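The Pushdown idea described above for SQL Server and Redshift can be illustrated with a minimal sketch. It is not DataFactory's implementation: it uses Python's built-in `sqlite3` as a stand-in database, and the table names, columns, and function names are hypothetical. The point is only the contrast between running the transform inside the database and pulling rows out to process them in the orchestrator.

```python
import sqlite3

def copy_with_pushdown(conn, source, target, predicate):
    """Pushdown: one INSERT ... SELECT runs entirely inside the database.
    The orchestrator only issues the statement; no rows pass through it."""
    conn.execute(
        f"INSERT INTO {target} (id, amount) "
        f"SELECT id, amount FROM {source} WHERE {predicate}"
    )
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]

def copy_without_pushdown(conn, source, target, min_amount):
    """No pushdown: rows are fetched, filtered in Python, then written back."""
    rows = conn.execute(f"SELECT id, amount FROM {source}").fetchall()
    kept = [row for row in rows if row[1] > min_amount]
    conn.executemany(f"INSERT INTO {target} (id, amount) VALUES (?, ?)", kept)
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]

# Hypothetical sample data in an in-memory stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE big_orders_pushdown (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE big_orders_pulled (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50), (2, 150), (3, 300)])

pushed = copy_with_pushdown(conn, "orders", "big_orders_pushdown", "amount > 100")
pulled = copy_without_pushdown(conn, "orders", "big_orders_pulled", 100)
```

Both paths produce the same target rows; the pushdown path avoids moving data out of the database and back, which is where the throughput gain comes from.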
Streaming Pipeline
- RDBMS Source and Loader: Optimized RDBMS Source and RDBMS Load to make them more generic, with no impact on the end-user experience.
- Optimized MSSQL Merge as a Sink node in Streaming: The Merge process now completes in significantly less time.
- Checkpoint Reset option on UI: While data is streaming, the system tracks the current state, that is, how far the records have been streamed. Resetting the checkpoint forces the system to stream the data again from the initial point.
- Displaying the whole count in UI logs: The count of records processed is now displayed in the UI logs instead of on the Sink widget.
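The Checkpoint Reset behavior can be sketched as follows. This is a simplified illustration, not DataFactory's code: the `CheckpointedStream` class, its methods, and the use of a list offset as the checkpoint are all assumptions made for the example.

```python
class CheckpointedStream:
    """Toy stream that remembers how far it has read (the checkpoint)."""

    def __init__(self, records):
        self.records = list(records)
        self.checkpoint = 0  # offset of the next record to stream

    def poll(self, max_records):
        """Return the next batch and advance the checkpoint past it."""
        batch = self.records[self.checkpoint:self.checkpoint + max_records]
        self.checkpoint += len(batch)
        return batch

    def reset_checkpoint(self):
        """Forget progress so the next poll starts from the initial point."""
        self.checkpoint = 0

stream = CheckpointedStream(["r1", "r2", "r3", "r4"])
first = stream.poll(3)      # starts at the beginning
resumed = stream.poll(3)    # continues after the checkpoint
stream.reset_checkpoint()
replayed = stream.poll(3)   # streams from the initial point again
```

Without a reset, each poll resumes where the previous one stopped; after a reset, records that were already streamed are delivered again, so the sink should tolerate duplicates.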
Bulk Streaming Pipeline (BSP)
- MongoDB as Source: MongoDB is now available as a Source in the Bulk Streaming Pipeline, which allows data from multiple MongoDB tables (collections) to be streamed to Kafka topics.
Projects
- COPY TO: Copy To is enabled for Streaming Pipelines, BSPs, and Connections, so these Assets can now be migrated from project to project.
- Project-level Pause All/Resume All for scheduled jobs: Ability to pause all scheduled jobs and resume all paused jobs within a project.
Jobs
- Sending Email at the Job Level: Emails can be sent at the job level using a system-defined template that covers the necessary information for all the assets within that job. The user can add further information on top of the pre-defined template.
- Bug Fixes:
- Scheduling (Schedule Now and Schedule Later), rescheduling, and Edit schedule now work.
- Asset-level and job-level Email issues are fixed; emails are now received.
- In Job Dashboard: Upcoming Run details, Frequency, and Last three sessions now work.
- The issue with editing asset-level Email templates is fixed.
- In Edit schedule, the Category id was hardcoded as 1; it now displays the category id assigned by the user while creating the job.
- In the Configure Email Notification window, selecting a group from View Groups for the Notify field no longer duplicates the selected group.
- The schedule window loader no longer appears while saving a job from the More option.
Reporting
- Bug Fixes: Delineation of datasets by project while creating a visual.
- Extended Analytics hub support to Docker and Kubernetes modes of deployment.
DB Explorer
- Bug Fixes: Dataset creation issues are fixed.
Wrangler
AI/ML Model
- Cockpit (first version) for Admin and Non-Admin users: A bird's-eye view of the DataFactory tool for the selected project: System metrics, Drill downs, Notifications, Graphical representation, and many more features.
- GaussianNB, KNeighborsClassifier, and SVC estimators were added to the AI/ML Model.
- Added Tuning History.
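The estimator names above match classes in scikit-learn. Assuming that is the underlying library (the release notes do not say so), the sketch below shows how the three estimators might be tuned and how a simple tuning history could be recorded. The parameter grids, the Iris dataset, and the `tuning_history` structure are illustrative choices, not DataFactory's actual behavior.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical search spaces, one per newly added estimator.
candidates = {
    "GaussianNB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "KNeighborsClassifier": (KNeighborsClassifier(), {"n_neighbors": [3, 5]}),
    "SVC": (SVC(), {"C": [0.1, 1.0]}),
}

# Assumed shape of a tuning history: one row per parameter setting tried.
tuning_history = []
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=3)
    search.fit(X, y)
    results = search.cv_results_
    for params, score in zip(results["params"], results["mean_test_score"]):
        tuning_history.append(
            {"estimator": name, "params": params, "mean_cv_accuracy": float(score)}
        )
```

Each entry records which estimator and parameters were tried and the mean cross-validated accuracy, which is the kind of information a tuning history view would typically surface.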