This post is part 2 of the series. Here we focus on designing and building a data process that extracts, transforms and loads (ETL) a large volume of data into partitioned tables, so that it can be integrated into an SSIS package and run automatically as a scheduled job.
In this case, suppose transaction data with 100 columns and 100 million records needs to be loaded into the data warehouse from a staging table. Technically, we could do partition switching straight from the staging table (a non-partitioned table) to the data warehouse table (a partitioned table), but in business reality we can't simply do that, because:
- The source table and target table usually live in different databases, so we can't switch data directly: that would violate the same-filegroup condition for partition switching covered in part 1 (see the sketch after this list).
- The staging table would be empty after partition switching, yet most data transformations are applied to the staging table before the final results are loaded into the target table in the data warehouse. So from a business point of view, switching from staging to target would break the entire daily, weekly or monthly data process.
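To make the same-filegroup condition concrete, here is a minimal, hypothetical sketch; the table, partition function and filegroup names are all assumptions (not the real objects from this process), and the filegroups are assumed to already exist in the data warehouse database.

```sql
-- Assumes filegroups FG_2024_01 and FG_2024_02 already exist in this database.
CREATE PARTITION FUNCTION pfTxnDate (date)
    AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01');

CREATE PARTITION SCHEME psTxnDate
    AS PARTITION pfTxnDate TO ([PRIMARY], [FG_2024_01], [FG_2024_02]);

-- Partitioned target table (the real one has 100 columns).
CREATE TABLE dbo.FactTransaction (
    TxnID   bigint        NOT NULL,
    TxnDate date          NOT NULL,
    Amount  decimal(18,2) NULL
) ON psTxnDate (TxnDate);

-- Staging table lives on [PRIMARY], not on the target partition's filegroup.
CREATE TABLE dbo.FactTransaction_Stg (
    TxnID   bigint        NOT NULL,
    TxnDate date          NOT NULL,
    Amount  decimal(18,2) NULL
) ON [PRIMARY];

-- Rejected: partition 2 of the target maps to [FG_2024_01], but the source
-- table sits on [PRIMARY], so the same-filegroup condition is violated.
-- ALTER TABLE dbo.FactTransaction_Stg
--     SWITCH TO dbo.FactTransaction PARTITION 2;

-- Cross-database switching fails even earlier: SWITCH only accepts tables in
-- the same database, so staging data from another database must first be
-- copied into a table inside the data warehouse database.
```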
There might be another question: why don't we just bulk load the data from source to target, and what do we actually gain from partition switching? Technically, yes, we can do a bulk insert. However, for such a large volume of data the load takes hours (typically 2-3), and if the process ran during business hours it would cause major disruption that business users can hardly tolerate for critical data like transactions. So from an efficiency and performance perspective, we leverage partition switching to deliver transaction data in a short period of time without interrupting BI reports, dashboard refreshes and business analysis.
What is the main idea for production operation?
To guarantee that transaction data is always available, we create a middle table to hold the transaction data and act as the source table for partition switching; we call this middle table the switch table. To satisfy all the requirements for partition switching, the switch table has to be created on the same filegroup as the target transaction table, with identical columns and indexes, using the same partition column. We bulk insert from the staging table into the switch table, then partition switch into the target transaction table in the data warehouse; as part 1 mentioned, this step finishes in a flash as long as there is no blocking. Finally, we drop the switch table and the entire data process is complete.
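Here is a hedged end-to-end sketch of that flow, continuing the hypothetical objects from the sketch above; the staging database name and the loaded month are illustrative assumptions.

```sql
-- 1) Create the switch table on the SAME partition scheme as the target,
--    mirroring its columns and indexes, so every SWITCH requirement
--    (filegroup, structure, partition column) is met by construction.
CREATE TABLE dbo.FactTransaction_Switch (
    TxnID   bigint        NOT NULL,
    TxnDate date          NOT NULL,
    Amount  decimal(18,2) NULL
) ON psTxnDate (TxnDate);

-- 2) Bulk insert the transformed rows from staging into the switch table.
--    This is the slow step, but the target table stays fully available.
INSERT INTO dbo.FactTransaction_Switch WITH (TABLOCK) (TxnID, TxnDate, Amount)
SELECT TxnID, TxnDate, Amount
FROM   StagingDB.dbo.FactTransaction_Stg;   -- assumed staging database/table

-- 3) Metadata-only switch into the target: near-instant when nothing blocks.
--    $PARTITION resolves which partition the loaded month belongs to;
--    the target partition must be empty before the switch.
DECLARE @p int = $PARTITION.pfTxnDate('2024-01-15');
ALTER TABLE dbo.FactTransaction_Switch
    SWITCH PARTITION @p TO dbo.FactTransaction PARTITION @p;

-- 4) Drop the switch table; the data process is complete.
DROP TABLE dbo.FactTransaction_Switch;
```

In the SSIS package, steps 1, 3 and 4 map naturally onto Execute SQL Tasks around the data-flow load in step 2, which is what lets the whole process run as a scheduled job. With that skeleton in place, let's dig a little deeper into the details of each step.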