Automation at Work: Windows File Sorting

Automationwork.jpg

Learning Python has been an extremely rewarding experience and not only because it helps me achieve my goal of a future career in tech. In fact it is already helping me in my current career! Since learning how to code I have been able to find ways to make my non-tech supply chain work more productive by implementing Python programming solutions.

I am going to discuss my favourite of these solutions but first I’ll have to highlight the problem.

Let me take you in to the chaotic nonsense world of small business supply chain and logistics. You see far away from your logistics giants like Coca Cola, Unilever, Amazon etc. there is a great ocean of small businesses shouting “we want growth at all costs!” followed by a limping whisper of “as long as those costs directly contribute to revenue.”

In a small business this leaves non-revenue generating operations and finance a bit out in the cold. However it’s not hard to see why this view is taken. You can have the best functioning mega-efficient rockstar operations team in the world and it won’t matter a lick if you don’t generate revenue.

All this is to say that in a small business with a physical supply chain, operations is typically only funded to the point of functionality and no further. This results in small teams that spend so much time on firefighting and urgent day to day that there often isn’t time for finding higher level efficiencies. As such, to finally get to the point, there is a lot of room for task automation as it is a kind of efficiency won’t have been implemented yet.

As the title gives away I wrote a Python script to do the glamorous job of file sorting. Our warehouse sends 20+ reports and documents a day, all via email. These are stock reports, sales notifications, outbound reports, inbound reports, movement reports - basically if it can be reported on you better believe it’s coming to my inbox. Previously no one ever had time to manually download these and sort them into folders. If you needed something in a report you had to use outlooks less than stellar search function. Of course in doing this my team were also having to sift through the thousands of standard emails in the inbox around these reports. Having our vital day to day reporting hidden amongst a mountain of clutter wasn’t very efficient or in any way good practice.

Okay so we’ve identified the problem, that’s the first step. Great job!

Discover & share this Fish GIF with everyone you know. GIPHY is how you search, share, discover, and create GIFs.

Well nemo and friends, now we write Python in our spare time to fix this issue. The script had two problems to solve:

  1. Get the report file attachments off of outlook and onto our file system.

  2. Sort the downloaded report files into folders for each report type.

This ended up being two separate Python scripts, one for the first task and another for the second. This seemed the better solution as one required internet and Outlook access while the other works entirely on the local machine.

Automating Outlook proved tricky and the solution that I decided on was using Outlooks internal rules system to pull as much of the weight as possible. Setting up detailed rules to forward a copy of any email containing a report attachment to a subfolder called ‘reports’ in my email. I backdated this to search Outlook for any pre-existing emails matching these rules resulting in the immediate forwarding of 4,000 emails to the reports inbox.

After this I could be pretty sure that any email in the ‘reports’ subfolder contained an attached warehouse report. Using the Pywin32 module for python I wrote a simple script to access my outlook and download all the attachments from any emails in the ‘reports’ inbox to a folder called ‘To Be Sorted’. After which it would then delete the emails to prevent reports from being downloaded multiple times.

Great! Now I had the reports off of Outlook and the only problem left to solve was the monstrous ‘To Be Sorted’ folder which had 4,500+ files in it with more coming in every day. This is where the second Python script came in. A script you can view on my GitHub: Windows File Sorter.

The first task was to establish how many different types of reports were in the ‘To Be Sorted’ folder. This would be the number of separate folders / directories I needed to create and sort these reports into. 18 was the magic number in the end, 18 different reports requiring 18 different sets of rules in the Python script for sorting. This was of course done through the overzealous use of ‘if statements’:

The examples if statements in my GitHub repo. The version I use has 17 elif statements.

The examples if statements in my GitHub repo. The version I use has 17 elif statements.

The script sorts files based on their filenames (and extensions), the if statements can get as complex as necessary to single out a single type of file:

elif-statement-textprocessing

This is an example from the script I use to sort work reports. It essentially checks, Is it an excel file? If it is an excel file then when you remove “-” and “_” from the file name are you only left with numbers? If yes then this must be an Ecommerce stock report, assign it to the “Ecom Stock\\” destination.

The more rules you have the more creative you have to get. This is because you have to make sure that each rule only applies to that one specific file type.

Any one can use this sorting script as I have generalised it for my GitHub, you only need to tell it where to find the files and add your own if statement rules on where certain files should be sent on your system. The actual moving of the files is done by a more complex ‘move’ function that will also delete duplicate files based on rules around filename, file size and file creation date. If you want to see how that works then check out the script on my aforementioned GitHub.

This isn’t a one off problem, we still receive new warehouse reports every day. As it is an ongoing issue I wanted to have these new reports sorted pretty much as soon as they came in. This way we would always have access to the most up to date information. I knew I therefore had to also automate running the scripts themselves as I couldn’t just sit at work all day firing off Python.

On Linux this is a pretty simple problem to have, you can just create a cron job for the script and it will run according to the timing rules you set. On Windows the best solution I could find was to use the built in ‘Windows Task Scheduler’ a joyless piece of software that breaks if you so much as whisper too loudly at it. However I did eventually get it working and running my scripts every 10 minutes, downloading anything in the ‘reports’ inbox and sorting it into it’s correct folder.

So there it is! A bit of automation at work, this has already saved me an untold amount of time. Searching for the right report in emails was a massive slog. Now it’s easy, I just open the designated folder for the report type I need and it’s right there. This has also benefitted the rest of my team as well because the script runs in a shared folder so everyone can have access to the sorted reports!

Previous
Previous

How to Learn Git From Scratch

Next
Next

Is It Worth Learning C?