Efficient Ways of Large Data Processing and Scanning in Azure Databricks

Working with data and database, writing query is daily routine, query running time might be big different for different queries but serve the same purpose, which shows query performance is not 100 percent determined by database system configuration, server hard ware, table index and statistics etc. but how well do you use SQL to construct your query.

To satisfy user strong demand, we built sandbox database on production for business partners to server their needs of BI reporting and business analytical, in the meanwhile, we are also in charge of database maintenance and monitoring so that we got chance to collect all kinds of user queries. Some of them even crash the system or drag the database performance down apparently, here I want to share my thinking of query optimization on large amount of data.

Posted 2024-04-09Updated 2024-04-119 minutes read (About 1288 words)

An Application of Microsoft Purview Authentication Workflow for API and Data Sources

Microsoft Purview is a powerful data catalog and governance solution that helps organization discover metadata and manage their data assets. In front end, it provides web portal for business user to browse metadata and manage data mapping; in the back end, it provides API to access data source for scanning and Purview instance for business metadata updating through data process pipeline in batch job so that one critical aspect of using Purview is API authentication, especially when integrating with other services and applications.

In this article, we’ll take a deep dive into how Purview uses OAuth2.0 authentication for service principals and the secure way to access and manage data source for scanning, providing a comprehensive understanding of the process.