How do you deal with large data sets in Excel? by Rob Collie
Answer by Rob Collie:
You see that? That's an Excel workbook with 300M rows of data in it. Yes, 300 million rows of data – and it's running right now on my laptop. Furthermore, the data "came" to me as multiple tables – 160 M in one table, 140M in another, and then two "reference" tables that are much smaller. Normally this would require tons of VLOOKUP, SUMIF, and/or array formulas in order to produce a cohesive analysis – and those things tend to bring "traditional" Excel to its knees even with tens of thousands of rows, much less hundreds of millions. But this workbook is FAST.
So what's the secret? I'm using Excel PLUS another piece of technology that:
- Is also built by Microsoft.
- Was designed from the ground up to be an extension to Excel, and learnable by Excel users.
- Is the most amazing thing I ever saw in my 13+ years working at Microsoft.
- Is available for free for most users, and for a small price (around $100) for everyone else.
- Has been on the market for 6 yrs!
- But Microsoft has done a terrible job marketing it. TERRIBLE.
It's called Power Pivot. And you absolutely should be using it. Every dollar I have made as a professional since leaving Microsoft in 2010 has come from Power Pivot – and today I make a much better living USING it, and helping others use it, than I ever did BUILDING software for MS.
(It's not just useful for large data sets BTW – I use it for ALL of my analysis work, even 50 rows of survey data.)
Couple extra tidbits about that 300M row workbook:
- The original data set occupies approx 40 GB in the database
- The workbook is 800 MB on disk – so Power Pivot yielded about 50x compression – and remains compressed even in memory!
- The workbook takes a moment to load from disk, but once it's loaded, it is lightning fast – recalcs in about 1.5 seconds.
For more information, here are two articles I've written: