Hadoop Data Side Load from SQL Server

Agile development and DevOps bring flexibility and quick solutions that support business intelligence in a timely manner. A technical platform and its associated applications and tools can be turned around very quickly, so business analysts and data scientists can use them for data modeling or machine learning. On the other hand, unlike building out functionality, syncing data across different platforms is neither easy nor quick, especially in large organizations.

Background and the data modeling gap for early Hadoop adopters

In recent years, big data and Hadoop have become a trend in next-generation data technology. Many companies have adopted Hadoop as their major data platform, but most of their data still resides in RDBMS data warehouses. The business intention is to leverage the high-quality data in the SQL database for analytical work in Hadoop. Data consistency is the first consideration from a data perspective, but it is not an easy task because the data has to migrate to a different platform with a different operating system (from Windows to Linux). Technically, the best solution for the project is to build a direct connection from SQL Server and SSIS to Hive by using Apache Sqoop, or to use the JVM to build a JDBC connection in Java. For a large organization, however, introducing a new tool in production requires a lot of approval work, and developing a JDBC connection facility requires multiple levels of testing. Both take a long time.

The solution therefore goes back to the foundation of Hadoop: the file system. Because SSIS cannot write to Hive directly using ODBC (in versions before 2015), the alternative is to create a file in the appropriate file format, copy it directly to the Hadoop file system, and then use a Hive command to write the metadata to the Hive metastore. The data then shows up in the Hive table and is also available in Cloudera Impala.
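The file-based side load described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the project: the staging directory, the table name, the choice of a Ctrl-A delimited text file, and the use of LOAD DATA INPATH as the Hive command are all illustrative, because the post does not name them. The sketch only builds the command lines; it does not run them.

```python
from pathlib import Path

# Assumption: the target is a Hive text table using Hive's default
# field delimiter (Ctrl-A, \x01). Adjust if the table is defined otherwise.
FIELD_DELIMITER = "\x01"


def write_delimited_file(rows, local_path):
    """Write rows to a delimited text file with Unix line endings."""
    local_path = Path(local_path)
    with open(local_path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write(FIELD_DELIMITER.join(str(value) for value in row) + "\n")
    return local_path


def build_side_load_commands(local_path, hdfs_dir, table):
    """Return the two command lines: copy to HDFS, then register in Hive.

    hdfs_dir and table are hypothetical names supplied by the caller.
    """
    hdfs_dir = hdfs_dir.rstrip("/")
    file_name = Path(local_path).name
    put_cmd = ["hdfs", "dfs", "-put", "-f", str(local_path), hdfs_dir]
    hive_sql = f"LOAD DATA INPATH '{hdfs_dir}/{file_name}' INTO TABLE {table}"
    hive_cmd = ["hive", "-e", hive_sql]
    return [put_cmd, hive_cmd]


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a SQL Server extract.
    path = write_delimited_file([(1, "widget"), (2, "gadget")], "orders.txt")
    for command in build_side_load_commands(path, "/staging/orders", "sales.orders"):
        print(" ".join(command))
```

On a machine with the Hadoop and Hive clients installed, each returned list could be passed to subprocess.run. Impala usually needs a REFRESH of the table before it sees newly loaded files.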

Oracle client side configuration for 12C

After the server is upgraded from Oracle 11g to 12C, the client side needs some configuration before it can connect to the database server. Before you start configuring your clients, get the following new server information from your DBA:

  1. Host name
  2. Port number
  3. Service name
  4. Your user name (usually it is unchanged and replicated from the old version)
  5. Password (an initial password for testing the connection, which you then need to update)

Then create a new connection string and add it to the ORA file (*.ora).

[Image: oracle12cinfo.png]
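As an illustration of how the host name, port number and service name fit together, the following sketch generates an entry in the usual tnsnames.ora layout. It assumes the ORA file in question is a tnsnames.ora-style file. The alias, host, port and service name shown are placeholders, not values from the original post.

```python
def build_tns_entry(alias, host, port, service_name):
    """Return a tnsnames.ora-style entry for a TCP connection by service name."""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service_name})\n"
        f"    )\n"
        f"  )\n"
    )


if __name__ == "__main__":
    # All four values are placeholders; use the ones supplied by your DBA.
    print(build_tns_entry("ORCL12C", "dbhost.example.com", 1521, "orcl12c.example.com"))
```

The printed text can be pasted into the ORA file. The user name and password are not part of the entry and are supplied when connecting.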

Python Environment Setup for Implementation

Python is a good scripting language for boosting your productivity in data analysis and BI reporting. Because it is an open source language, you can easily get the binary installer for Windows and the source code of various versions for Linux from the official Python site. In production, it is better to install from source code.

After installation we also need to set up the Python environment, so that we can not only develop with the Python interpreter but also run Python from the CLI and even from ETL tools such as Microsoft SSIS.

Python environment variable configuration and local folder setup for your files, packages and libraries

If Python is installed system-wide, you need to create some new folders to store your Python files, packages and libraries. For example, if the Python install path is “D:\Python36”, first add the directory of the Python interpreter to the PATH variable. Next, create the Python environment variable PYTHONPATH with the following paths and create an empty __init__.py file in each of these folders:

  • create a new folder “D:\pyLib” on the D drive, set that directory as the value of PYTHONPATH, and create an __init__.py file in “D:\pyLib”
  • you can also create subfolders to assign different permissions to different user groups
    • create a subfolder “D:\pyLib\AD-group1” and create an __init__.py file in it.
    • create a subfolder “D:\pyLib\AD-group2” and create an __init__.py file in it.
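The folder layout above could be created with a short script instead of by hand. This is a sketch under the assumption that the library root and group folder names match the examples above. The function name create_library_tree is hypothetical, and the script does not set PYTHONPATH or folder permissions, which still have to be configured separately.

```python
from pathlib import Path


def create_library_tree(root, group_folders=()):
    """Create the library root and its subfolders, each with an empty __init__.py.

    Returns the list of __init__.py paths that now exist.
    """
    root = Path(root)
    init_files = []
    for folder in [root, *(root / name for name in group_folders)]:
        folder.mkdir(parents=True, exist_ok=True)
        init_file = folder / "__init__.py"
        init_file.touch(exist_ok=True)  # leaves an existing file untouched
        init_files.append(init_file)
    return init_files


if __name__ == "__main__":
    # Folder names taken from the example above; change them to suit your setup.
    for created in create_library_tree(r"D:\pyLib", ["AD-group1", "AD-group2"]):
        print(created)
```

Running the script a second time is harmless, because existing folders and files are kept as they are.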

On Linux, if you install python3 from source code into the directory /usr/local/python3, edit the ~/.bash_profile file and add the Python bin directory to PATH:

# Python3
export PATH=/usr/local/python3/bin/:$PATH

Then run source ~/.bash_profile so the setting takes effect.

If your system has python2 pre-installed, it is necessary to create soft links:

ln -s /usr/local/python3/bin/python3 /usr/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
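After changing PATH or creating the links, it can be useful to confirm which executables the shell would actually pick up. The following is a minimal sketch using the standard library's shutil.which. The helper name resolve_tools and its default tool list are assumptions for illustration.

```python
import shutil


def resolve_tools(names=("python3", "pip3"), search_path=None):
    """Map each tool name to the executable found on the search path, or None.

    search_path=None means the current PATH environment variable is used.
    """
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    for name, location in resolve_tools().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If a tool is reported as not found, the PATH change has not taken effect in the current session or the link points at a wrong location.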