Python Development Bootstrap on Managing Environment and Packages

If you have basic Python experience and want to develop a real Python application or product, this post is for you. Let’s recap how most of us start with Python: we either download Python from the official site, install it system-wide, and jump straight to writing “Hello, world!”, or we install Anaconda to get a long list of pre-installed libraries and then do the same thing. Using Python globally is fine at the beginning, but as we accumulate scripts and move on to real projects, a few things become painful:

  • Version confusion: different Python versions live side by side, each with its own pip
  • Package redundancy: too many libraries in one place, some installed only for a one-off exercise and never used again
  • Package version conflict: this is the most severe headache. Imagine that
    • your early development was based on Python 2 and you later moved to Python 3, but your early applications broke after you upgraded the packages they depend on, because the newer versions are not backwards compatible
    • even if all your development is on Python 3, a library in your new project needs a higher version of a sub-dependency, while your earlier project relies on a lower version of the same package
    • when you work in a team and hand your work to a teammate to test, your application breaks because your machines do not share the same environment and package versions

Therefore, for real Python production development, the first step before any coding should be to set up a proper virtual environment, so that projects stay compatible with each other and collaboration is reproducible. Thanks to community contributors, there are several options for different needs. This post covers three:

  1. The standard-library solution: venv (or its third-party predecessor virtualenv) with pip
  2. A newer tool maintained under the Python Packaging Authority (PyPA): pipenv
  3. Conda, an environment and package management tool

Now let’s see how to configure each of them.
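
As a first taste of option 1, the standard library's venv module can be driven from Python itself as well as from the command line (`python -m venv .venv`). The sketch below is only an illustration under a few assumptions: the helper name `create_env` and the `.venv` directory name are hypothetical, and `with_pip=False` is used so the demo runs quickly and offline, whereas a real project would normally want pip installed in the environment.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=False):
    """Create an isolated environment at env_dir and return its path.

    create_env is a hypothetical helper; venv.create is the stdlib call.
    """
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Demo: build the environment inside a throwaway project directory.
project_dir = Path(tempfile.mkdtemp())
env_path = create_env(project_dir / ".venv")

# pyvenv.cfg records which base interpreter the environment was built from.
print((env_path / "pyvenv.cfg").read_text())
```

Keeping one such environment per project is what separates the package versions of one application from those of another.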

Data Manipulation and ETL with Pandas

Pandas is the most widely used Python library for data manipulation and transformation. With its in-memory DataFrame and rich set of built-in functions, it can handle almost any kind of ETL task. This post walks through a data process that reads input data from an Excel spreadsheet, applies transformations driven by business requirements, and then loads the reporting data into a SQL Server database.
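
The overall shape of such a pipeline can be sketched in a few lines. Everything specific here is an assumption for illustration: the column names `region` and `amount`, the table name `sales_report`, and the aggregation rule are hypothetical, and an in-memory SQLite database stands in for SQL Server so the sketch runs without a server. Against SQL Server one would typically pass a SQLAlchemy engine to `to_sql` instead of the SQLite connection.

```python
import sqlite3

import pandas as pd


def extract(path, sheet_name=0):
    """Read one worksheet into a DataFrame (.xlsx files need openpyxl)."""
    return pd.read_excel(path, sheet_name=sheet_name)


def transform(raw):
    """Hypothetical business rule: drop rows with no amount, total per region."""
    cleaned = raw.dropna(subset=["amount"])
    report = cleaned.groupby("region", as_index=False)["amount"].sum()
    return report.rename(columns={"amount": "total_amount"})


def load(report, connection, table="sales_report"):
    """Write the report table, replacing any earlier version of it."""
    report.to_sql(table, connection, if_exists="replace", index=False)


# Demo with in-memory data, so no Excel file or SQL Server is required.
# In a real run, raw would come from extract("some_workbook.xlsx").
raw = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West"],
        "amount": [100.0, 250.0, None, 50.0],
    }
)
conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server target
load(transform(raw), conn)

rows = conn.execute(
    "SELECT region, total_amount FROM sales_report ORDER BY region"
).fetchall()
print(rows)
```

Splitting the work into extract, transform, and load functions keeps the business rule testable on its own, independent of the spreadsheet and the database.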
