Posted Updated Data / BI11 minutes read (About 1687 words)
Automation Process for Email Attachment Excel in Python
Working with business data, Excel spreadsheet is the most common file type you might deal with in daily basis, because Excel is a dominated application in the business world. What is the most used way to transfer those Excel files for business operation team, obviously, it’s Outlook, because email attachment is the easiest way for business team to share data and reports. Following this business common logic and convention, you may get quite a lot Excel files from email attachment when you involved into a business initiative or project to design a data solution for BI reporting and business analysis.
The pain points is too much manual work dragging down efficiency of data availability and also increasing the possibility of human error. Imagine, every day get data from email attachment, you need to check your inbox every once for a while, then download those files from attachment, open Excel to edit data or rename file to meet data process requirement such as remove the protected password, after those preparation works all done, push data to NAS drive, finally, launch the job to proceed the data. It’s not surprise that how easily you might make mistake because any single step contains error would cause the whole process failed.
It’s very necessary to automate the whole data process if business project turns to BAU (Business As Usual) program and you have to proceed data in regular ongoing basis. Python and Windows Task Scheduler provides a simple and fast way to solve this problem, now let’s take a look.
Overall speaking, this task can be broken down by a couple of steps:
Access Outlook to read email and download the attachment
Remove Excel file protected password (it’s common in business file to protect data privacy)
Manipulate and edit Excel file
Copy file to NAS drive
Setup the Windows Task Scheduler to run python script automatically.
Interact Outlook and Excel with Python
In this case, we are focusing on local operation because server setting is vary for different production environment. Python standard library can’t meet our need on this request, we have to leverage its third-party libraries to access outlook and interact with Excel so we are going to import win32com, openpyxl and shutil. We firstly work out workflow to guide us to compose the python scripts
From the workflow chart, we can tell there are four functions in the streamline. The first one is read email and download attachment readmail(), this step is critical and the most import business logic build-in it.
we are going to apply two business logics, one is the latest email, the other one is if the email is latest (current date) then check if the current date -1 equals to data date in the subject line. we will process file when all the requirements are satisfied.
For Excel manipulation and spreadsheet decryption are pretty straight forward, no additional business logic so just apply the single function. We create 3 python module files then encapsulate them into the main readmail() function script.
fileTransfer_module.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
import win32com.client as win32 import os import os.path import openpyxl as xl from datetime import datetime, timedelta import sys import shutil
defcpfile(src_file, tgt_file): if os.path.exists(tgt_file): os.remove(tag_file) shutil.copy(src_file, tgt_file) else: shutil.copy(src_file, tgt_file) if __name__ == "__main__": cpfile(src_file, tgt_file)
import win32com.client as win32 import os import os.path from datetime import datetime import sys
defxlpwd(): ''' function xlpwd() is aiming to peel off attachment excel file password to be able to be ready by program arg: opt1_opt2 value: "opt1" to deal with one excel file; "opt2" to deal with another excel file ''' opt1_opt2 = sys.argv[1] excel = win32.Dispatch('Excel.Application') mon = '0' + str(datetime.today().month) ## suppose password is letters and 2 digits month combination if opt1_opt2 = "opt1": pwd = "randomletters" + mon fn = "yourExcelFile_1.xlsx" sn = 2## sheet number st = "tab name 1" elif opt1_opt2 = "opt2": pwd = "otherrandomletters" + mon fn = "yourExcelFile_2.xlsx" sn = 1 st = "other tab name" prd_dir = r"\\your NAS UNC address" file = prd_dir + os.sep + fn wb_tgt = excel.Workbooks.open(file,0,False,5,pwd) ## open encrypted Excel file wb_tgt.Password = ""## remove password wb_tgt.Worksheets(sn).Name = st ## define the tagart worksheet order and name wb_tgt.Save() wb_tgt.Close() excel.Quit() if __name__ == "__main__": xlpwd()
import os, os.path import sys from datetime import datetime, timedelta import win32com.client as win32 from excelOp_mudule import xltemp
defreadMail(): ''' function readMail is aiming to read outlook email attachment and download them arg: lob, app, c_type value: lob == 'you line of business' and app == 'application name generate the source file' and c_type == 'business category' ''' lob = sys.argv[1] app = sys.argv[2] c_type = sys.argv[3] outlook = win32.Dispatch("Outlook.Application").GetNameSpace("MAPI") type1 = outlook.Folders["your email address"].Folders["your inbox subfolder1"] type2 = outlook.Folders["your email address"].Folders["your inbox subfolder2"] type3 = outlook.Folders["your email address"].Folders["your inbox subfolder3"] ## if your inbox subfolder name has convention you can use while loop inbox = outlook.GetDefaultFolder(6) ## Microsoft Outlook API number for inbox is 6 type1_msgs = type1.Items type2_msgs = type2.Items type3_msgs = type3.Items inbox_msgs = inbox.Items file_ind = 0 folder = "" if lob == "finance"and app == "app1"and c_type == "consumer": mail_items, file_ind, folder = type1_msgs, 0, "finance_files" elif lob == "marketing"and app == "app2"and c_type == "small business": mail_items, file_ind, folder = type2_msgs, 1, "small business files" elif lob == "inventory"and app == "app3"and c_type == "cooperate": mail_items, file_ind, folder = type3_msgs, 2, "cooperate files" else: mail_items = None path = r"\\your file staing folder directory" + os.sep + folder num_mails = len(mail_items) lst_mails = list(reversed(range(num_mails))) id_mail = lst_mail[0] email = mail_items[id_mail] subject = email.Subject ## Outlook API Subject line object if file_ind in (0, 2): num = -13## depends on your own situation date_mail_str = subject[num:] if date_mail_str[0] != ' ': date_mail_dt = datetime.strptime(date_mail_str, "%b. %d, %Y") else: date_mail_dt = datetime.strptime(date_mail_str, " %b. %d, %Y") elif file_ind == 1and datetime.today().strftime("%a") != "Mon"and datetime.today().day <= 10: date_mail_str = subject[-8:-1] + '2020' date_mail_dt = datetime.strptime(date_mail_str, " %B %d %Y") elif file_ind == 1and datetime.today().strftime("%a") != "Mon"and datetime.today().day <= 10: date_mail_str = subject[-8:-1] + '2020' date_mail_dt = datetime.strptime(date_mail_str, "%B %d %Y") received_time = email.ReceivedTime ## Outlook API receive time object today = datetime.today() ## check availiability of the latest file if today.year == received_time.year and today.month == received_time.month and today.day == received_time.day: avail_ind = 1 else: raise AttributeError("the latest file is not available! check with business team") ## check if the file is right copy if avail_ind == 1and (file_ind == 0or file_ind == 1): val_dt = received_time - timedelta(days = 1) if date_mail_dt.day == val_dt.day and date_mail_dt.month == val_dt.month and date_mail_dt.year == val_dt.year: valid_copy_ind = 1## usually business file for current date is yesterday's data if data is daily basis else: raise AttributeError("file copy is not right! check with business team") elif avail_ind == 1and file_ind == 2: val_dt = received_time if date_mail_dt.day == val_dt.day and date_mail_dt.month == val_dt.month and date_mail_dt.year == val_dt.year: valid_copy_ind = 1## sometime current date file is current date data depends on business process else: raise AttributeError("file copy is not right! check with business team") if valid_copy_ind == 1: attachment = email.Attachment.item(1) report_name = attachment.FileName os.chdir(path) input_file = os.getcwd() + os.sep + date_mail_str + report_name ifnot os.path.exists(input_file): attachment.SaveAsFile(input_file) xltemp(input_file, file_ind)
if __name__ == "__main__": readMail()
Schedule jobs to auto check your inbox and execute python scripts
If you use Linux as your local machine then there are so many scheduling tools such as crontab, for Windows users, you can use GUI tool Task Scheduler to to the automation scheduling task. In this case, we use Task Scheduler.
simply follow the wizard to create local jobs to implement above readMail.py and pwdDery_module.py scripts. After decrypted Excel files push to your server then trigger server jobs so that make the whole data process automated, no more manual work.