SUMIF vs. GETPIVOTDATA in terms of performance in #Excel

Did you experience a slow workbook? How slow? Let’s say it takes more than 2 minutes to just open the file.

This is actually what a colleague asked me for help. When I opened the workbook, I saw LOTS of SUMIFS across tens of worksheets. And all SUMIFS refer to a common dataset on the workbook.

Then I asked: “Would you consider using Pivot Table instead of so many SUMIFS?”

The answer was “NO” as she needed to use formula to return results that fit into the “pre-set” table layouts, which I totally understood… such a common workplace’s scenario!

Then I proposed GETPIVOTDATA instead of SUMIF(S).

Her responses told me that she didn’t know about GETPIVOTDATA. Maybe I should write a blogpost about GETPIVOTDATA. Before that, let’s test if GETPIVOTDATA is more efficient than SUMIF.

To do the test, I have created two simple columns, with 100,000 rows (large enough to see the time difference) as shown below:

Column A contains 1,000 unique items spread over the range A2:A100001; while Column B are random numbers.

Then on Column E, I generate a list of the first 200 items, i.e. A0000001 to A000200.

On column F, I calculate the total for the (200) items using SUMIF; while on column G, I calculate the same using GETPIVOTDATA.

Let’s see in more details.

In F2:F201, I input the following formula:

=SUMIF($A$2:$A$100001,E2,$B$2:$B$100001)

This is very straight-forward.

In order to use GETPIVOTDATA to obtain the same results as if using SUMIF(s), I have to create a pivot table first.

I inserted the pivot table in J1 on the same worksheet, as shown below:

This pivot table simply summarizes the values of the 1,000 items in the range of A1:B100001

Now, with this pivot table inserted in J1, I would get the following auto-generated GETPIVOTDATA function when I input a = sign in G2 followed by clicking the cell K2. See below:

This formula will look into the “Value” from the pivot table located in $J$1, returning the corresponding item of “A0000001” that is “hard-coded” in the formula. As I need the formula to be copied all the way down to G201, I have to change hard-coded argument into a variable, like this:

In short, I input the following formula into G2:G201:

=GETPIVOTDATA("Value",$J$1,"Item",E2)

Let’s recap what we have prepared so far

  1. A range of 200 cells using SUMIF
  2. A range of 200 cells using GETPIVOTDATA via a Pivot Table

Both ranges of data return exactly the same results, i.e. total of the 200 items listed in E2:201 based on the dataset in A2:B100001

To test which function is more efficient, we simply calculate the time required to calculate the range, with the help of some VBA codes. If you are interested in how, take a look at this article. Indeed, I strongly recommend you read the entire article.

Let’s see the result

Five tests were run separately on the range F2:F201 and G2:201, where SUMIF and GETPIVOTDAT is used respectively.

Here’s the result for SUMIF: On average it takes 0.76 second.

Here’s the result for GETPIVOTDATA: On average it takes 0.017 second.

0.76s vs. 0.017s…

K.O.

GETPIVOTDATA runs 40+ times faster than SUMIF in this scenario.

Why is that?

I am not an expert in performance optimization nor do I have the technical knowledge to answer this question. My wild guess is the huge number of cells involved in the two functions. Think about the case for SUMIF, each formula looks into a range of 100,000 cells; while for GETPIVOTDATA, each formula looks into only 1,000 cells. Is that a big difference already?

Having said that, there is one major drawback using GETPIVOTDATA. The result is not instantly recalculated with updates. See below:

When the data source is updated, SUMIF recalculates instantly. However, the result returned by GETPIVOTDATA remains unchanged until the related pivot table has got refreshed, manually.

At this point, you may be thinking… that’s not ideal. No, it is not. But think twice. For a slow workbook because of lots of inefficient formulas, you would probably turn the calculation option to manual; and every time you need updated results, you have to press F9 to recalculate. IF this is what you are doing, I’d say GETPIVOTDATA wins. 😉

Conclusion:

GETPIVOTDATA can be much more efficient than SUMIF(s). Although it’s not perfect, it would be a good alternative to tones of expensive formulas like SUMIF(S).

In case you would like to do the test by yourself, you may download the sample file.

Advertisement

About MF

An Excel nerd who just transition into a role related to data analytics at current company......😊 Recently in love with Power Query and Power BI.😍 Keep learning new Excel and Power BI stuffs and be amazed by all the new discoveries.
This entry was posted in Excel Tips and tagged . Bookmark the permalink.

3 Responses to SUMIF vs. GETPIVOTDATA in terms of performance in #Excel

  1. Deniz says:

    why not doing it with VBA? it will be both automatically and not leading excel to increase its size.

    Like

  2. Fiaafly says:

    Did your test include the time the pivot needs to refresh? 🙂 Really interesting though! Thanks

    Like

    • MF says:

      No it didn’t. But in my experience, pivot table refresh is much quicker than the time needed for calculating thousands of sumif. 😄

      Like

Comments, suggestions, corrections are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.