Python Environment Setup for Implementation

Python Environment Setup for Implementation

Python is a good scripting language to boost your productivity on data analysis and BI reporting. As open source language, you can easily get the binary installation file from python official site for windows and source code on vary versions for Linux, in production, it’s better choose installation approach by source code.

We also need to setup python environment after installation so that we can not only use python interpreter to develop but also make it executable by CLI and even some ETL tool such as Microsoft SSIS.

Python environment variable configuration and local folder set up for your file, package and library

If python is installed in system-wise, then you need to create some new folders to store you python file, package and library, e.g. python install path is “D:\Python36", then you need to add python executable interpreter to be a part of the PATH variable. Next create python environment variable PYTHONPATH with the following paths and create empty file __init__.py file in each of these folders:

  • create a new folder under D drive “D:\pyLib” and set that directory as value of PYTHONPATH and create __init__.py file in “D:\pyLib”
  • you can also create subfolder to assign different permissions for different user group
    • create a subfolder “D:\pyLib\AD-group1” and create the __init__.py file in it.
    • create a subfolder “D:\pyLib\AD-group2” and create the __init__.py file in it.

For Linux, if you install python3 by source code and directory is /usr/local/python3, then edit ~/.bash_profile file, append the python directory into PATH

1
2
# Python3
export PATH=/usr/local/python3/bin/:$PATH

then run source ~/.bash_profile let setting take effect

if your system pre-installed python2 then it’s necessary to make a soft link

1
2
ln -s /user/local/python3/bin/python3 /user/bin/python3
ln -s /user/local/python3/bin/pip3 /user/bim/pip3
Read more
Generate non comma delimiter CSV file

Generate non comma delimiter CSV file

This is some tip but sometime makes you life easier when you work with business team and deal with data from business input. Excel spreadsheet is kind of standard file format to communicate with business team, but in data manipulation side, it’s not a ideal input data format, technically, we usually convert spreadsheet to csv file. But default comma delimiter might cause some trouble because business data might contains quite a lot of , in attributes such as comments, suggestions, reasons etc. In order to better identify the column, non-comma delimiter should be used like | pipe.

How do we generate the | delimiter csv file. There are two ways

Read more
Sandbox solution for BI reporting and business analytics

Sandbox solution for BI reporting and business analytics

At beginning, I’d like to share a story from my client and business partner. One day, my team worked on a big marketing project, data from all kinds of source like spreadsheet, csv, mainframe and SQL Server, we had to do cross reference all those data to generate analysis reports but the headache thing was they were isolated and no way to put them into a single unique environment to run one time query then return the results so we could only open multiple windows to compare them by eyeballs.

During the project, we often composed some complex queries then ran for a long time to return the result dataset, those datasets were quite important for future further analysis and share with the whole team, but the another panic thing was we could not save those dataset into database due to insufficient access so what we did was copy and paste everything in excel spreadsheet, after for a while, we found number of excel file explode and hard to find the report among those huge files, we feed up with the tedious work and decided created a bunch of views in database but that was also not controlled by us but infrastructure team, all we could do was submit the request then followed infrastructure team’s schedule and waited for month end deployment, no matter how urgent those reports would be.

That is the story, I think if you had ever experienced that, that solution might be right for you.

Read more
A little thought on SQL query performance optimization 1

A little thought on SQL query performance optimization 1

Working with data and database, writing query is daily routine, query running time might be big different for different queries but serve the same purpose, which shows query performance is not 100 percent determined by database system configuration, server hard ware, table index and statistics etc. but how well do you use SQL to construct your query.

To satisfy user strong demand, we built sandbox database on production for business partners to server their needs of BI reporting and business analytical, in the meanwhile, we are also in charge of database maintenance and monitoring so that we got chance to collect all kinds of user queries. Some of them even crash the system or drag the database performance down apparently, here I want to share my thinking of query optimization on large amount of data.

Before writing a query

Just like a project, a query is able to achieve the things with backend business logics even if it is small, so make a plan before creation is very necessary to be able to let the query return result in flash and only consume minimum system resources.

  • What level detail of data do you want to query from? Business data is built on certain level, e.g. in common scenario, there are customer level, account level and transaction level data. It’s better clarify target data is on which level or mix different level.
  • What the time scope? The data amount volume would be big if you want to query multiple years even months, so if you evaluate the data is big, it’s better use loop or paging technic instead of returning outcome in one single page.
  • What are query tables look like? A table contains a lot of information, not only business attributes or columns but data type, keys, index, constraints, triggers. Familiar with table structure is going to give additional benefits when you write your queries against those tables.
Read more
How to let local images display on your hexo blog website

How to let local images display on your hexo blog website

Hexo blog framework friendly supports markdown which is the most popular syntax for technical blog writing. When you try to illustrate some ideas, it’s common to use screenshot photos in your articles to get reader better understanding so that you have to insert images into your blog, unlike typing text content that is straight forward, inserting images and deploying them to public blog site might be some tricks there, let’s see some common scenarios on images display issue and how to resolved them.

Use absolute path to insert image

The intuitive way for junior blogger is referencing absolute path + file name, I would say there is no any problem if the blog is only for local review or your local machine is a web server to host your blog website, otherwise, your images won’t be displayed on your public blog site after deployment if without any configuration. Like below shows when you use that way inert on your local blog, images can be displayed as expected

Read more
Remote Connect MySQL Server from Client Machine Setup

Remote Connect MySQL Server from Client Machine Setup

You can only connect to MySQL Server from localhost after MySQL installation by default, but in production, all MySQL clients remotely connect to server, for simulating real production environment in your home network, some configurations need to be made to be able to let you connect MySQL from client machine other than localhost.

Revise or create MySQL configuration file (RHEL or Centos 7)

Modify or create /etc/my.cnf file

1
vim /etc/my.cnf

add a configuration item bind-address and let it value to be your MySQL server host ip address (eg. 192.168.1.114)

1
bind-address=192.168.1.114
Read more
Linux network auto boot and restart

Linux network auto boot and restart

Centos 7 network is disabled by default after installation and initialization, which causes network connection can not be made until you manually turn on it

centos7_network_icon.png

That is such annoying when you usually use SSH tool remote connect to centos workstation or server, so it’s quite necessary to turn on the network automatically every time reboot machine or VM. To accomplish that, we need to modify the network configuration file.

Read more