When Excel Doesn’t Cut It: Using R and Python for Advanced Data Tasks

When Excel Doesn’t Cut It: Using R and Python for Advanced Data Tasks

If you know anyone passionate about Microsoft Excel, then you know conversations about its uses can get emotional very quickly. People who’ve mastered every plugin and pivot table sometimes have a hard time believing there’s anything Excel can’t do. This podcast, titled “A Beehive of Excel Devotees” by Roger Peng and Hilary Parker, is a perfect illustration of that mentality.

It underscores a point that is missing in most discussions: you need to pick the right tools for the job at hand. Sometimes, Excel is the right tool. Sometimes, it isn’t. If you’re looking to build a career as a business analyst or data scientist, having a plethora of tools in your back pocket as well as the skill set needed to use them, is a must.

And it’s okay to use Excel. It is okay. Excel is a little tool that is suited to fulfill many needs. But professionals working with data must also understand that it’s not the best solution for every task. Let’s dive deeper and look at three scenarios in which Excel may be the best option.

1. You Want a Quick Look at Some Data

Excel is ubiquitous in the business world. If someone in another department sends you data to analyze, it’s probably an Excel spreadsheet. If so, well, why would you spend time converting that data to another format when it’s in Excel already? Chances are they sent it to you in Excel because that's all they know how to use. Excel is their hammer, and every data set looks like a nail.

Even if they send you a text file (most likely tab-delimited or comma-delimited), it is probably more efficient to open it up in Excel to take a look.

2. You Need to Share Your Results With Other People

Even if you don’t use Excel to perform your data exploration and analysis, once again, the chances are high that you will need to share those results with people who only know Excel. At my last job, I was the only person out of 300 or so employees who knew and used R. Everyone else used Excel (or maybe Google Sheets or Open Office). So whatever data I analyzed, I almost always had to “send it back” in an Excel format (size constraints notwithstanding).

It often was not sufficient to send a report. The stakeholders wanted to “touch” the data and formulas. So it was more efficient to do everything that I could in Excel as per the stakeholders’ request.

3. You Need to Perform an Ad Hoc Analysis

A few years ago, I was contracted to write a data science business plan for a Software as a Service (SaaS) organization. What tool do you think I used for the financial analysis and projections? Excel, of course.

Excel was my first and only choice for the financials. And it lends itself well to not only performing the number crunching but also formatting the results for easy inclusion in the final report (MS Word document).

Considering that I would only need to do this analysis once and that Excel was perfectly fitted to the financial job at hand, this was a no-brainer.

No matter what level of proficiency you’re at with Excel, if you find it to be a useful tool, it’s worthwhile to know it’s capabilities fully. Foundational, Intermediate, and Advanced Microsoft Excel classes cover every tool and strategy. 

When Should You Use a Tool Other Than Excel?

That depends on the situation. Recently, I was asked to do an analysis of IT trouble tickets. I was sent a spreadsheet of data and specifically asked to find out why the mean-time-to-resolution (MTTR) was so high. It was no small task. (For those of you with a statistics background, the mean-time is probably not appropriate. The median-time-to-resolution is a better gauge of efficiency, in this case).

I did not use the Excel spreadsheet given to me. I instead used a Business Intelligence tool, Qlik Sense.

Here’s why I went with Qlik Sense over Excel:

  1. This would be repeated every month—maybe even more frequently—and I didn’t want to have to redo the analysis over and over each time.
  2. I could query the data automatically from the various sources and perform the Extract-Transform-Load (ETL) using scripts.
  3. I could create a visualization dashboard that was easily shared and would eliminate the need for me to “babysit” the delivery of the report.

Excel was not the best tool for this project.

Some other considerations would be dataset size and types of analysis. If the data set is too large for Excel, then some other tool is mandated. In this case, you should consider online tools or R or Python. You might also want to look at PowerPivot, a very powerful add-in to Excel that can mitigate dataset size constraints.

If the project will be repeatable, then a scripting language like R or Python would be more appropriate. And some analyses cannot even be done in Excel. Try and do natural language processing with a spreadsheet, and you’ll see what I mean. 

If the project needs to be peer-reviewed, then you must carefully document your code and thinking process to conform to reproducibility standards. Trying to add documentation that outlines a linear methodology in Excel is almost impossible for other people to consume and review. Of course, you can use cell comments, but what if that means adding a sheet of documentation for every sheet in the workbook, or adding a cell of documentation for every cell formula? Your documentation can quickly turn into spaghetti. The only thing worse would be no documentation. 

And if you are trying to learn a new language or tool, stop falling back to Excel every time you have a new task or project. Make sure you’re learning how to use new tools. It’s so easy to double-click that file and open it in Excel, but you will never fully understand why other tools like R or Python are necessary unless you use them.

Comments

Post a Comment

Popular posts from this blog

How to Prepare for Machine Learning Security Risks