Filter: It’s easy and not easy

Applying filter in Excel is a very common task. We used to filter a dataset and focus on a subset of data that we are interested in. It is super easy to filter. The problem is, are you sure you have the right dataset being filtered?

Situation

I have a list of files stored in a folder. All files are generated automatically with the same filename convention: “FiscalYearWeekyyyyww.csv”, where yyyy = year; ww = week. The file started from week 01 of fiscal year 2018, all the way to the latest week. The latest file, say, is “FiscalYearWeek202105.csv”.

Consistent filename convention. Looks good! Right?

The task is to combine all files but those for FiscalYear 2018, using Power Query of course. (note this post is not about Power Query, but you may refer to other related post here).

Easy! We can exclude files of fiscal year 2018 by filter rows where ‘Name’ does not contain 2018, as shown in the following screenshot.

If this is what you do, please pause for 10 seconds and think carefully…

Do you really want to do this?

1…

2…

3…

10

Time’s up. I hope you have spotted the potential danger and revised your filtering rule.

What’s wrong with filtering where ‘Name’ does not contain 2018?

At the first glance, it makes perfect sense to filter out “2018” because we do not want data file of fiscal year 2018. Nevertheless, since we have already gone through 2020, the filename convention becomes a trap awaiting you to fall…

Let’s take a closer look at the files we have:

There’s the trap! 😨

We have the file “FiscalYearWeek202018” for the week 18 of fiscal year 2020. When we simply filter ‘Name’ where does not contain 2018, this file will be excluded too.

So how to deal with it given the situation we have? A simple twist to the filtering rule would do. We can revise the filtering rule to “does not contain k2018“.

As simple as this. 😉

If the filename convention is set in a more robust (or user-friendly) way, e.g. “FYyyyy_ww“, user can do the filtering task in a more intuitive way (i.e. does not contain FY2018) without worrying about the trap. This is something got ignored very often but could be very dangerous. Be cautious!

While we can rely on Excel to perform the tasks in an efficient way, it's our responsibility to construct an effective instruction.

What do you think? Did you set a trap for yourself unintentionally? Please share with us by leaving comments.

About MF

An Excel nerd who just transition into a role related to data analytics at current company......😊 Recently in love with Power Query and Power BI.😍 Keep learning new Excel and Power BI stuffs and be amazed by all the new discoveries.
This entry was posted in Excel Tips and tagged , . Bookmark the permalink.

Comments, suggestions, corrections are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.