Append Queries + Group By = Another alternative for a better Full Join
In the previous post, we explored an alternative way to perform FULL OUTER JOIN with Left Outer Join + Right Anti Join, followed by Append Queries in order to get a result with better layout. Well, as it’s Excel, that means there are different ways to achieve the same. In this post, let’s see how we can do the same using Append Queries + Group By.
Again a picture tells thousand words. The following diagram illustrates the concept:
In the end of this post, I will post a video (work in progress 😅) riding on the example in previous post to demonstrate it. From which you will see the two different approaches yield the exactly the same result.
For illustration purpose, two simple tables are used for this post. Let’s see how to do it step by step in Power Query!
You may download a Sample File to follow along.
A. Load the two tables into Power Query
Select a table, then
- Go to Data tab
- From Table/Range
Close and Load to Connection only. Then repeat the above steps for the other table.
Now we should have the two tables in Power Query Editor.
B. Append Queries as New
In the Power Query Editor
- Go to Home tab
- Click the drop-down menu of Append Queries
- Append Queries as New
- Select the primary table
- Select the table to append
- OK
C. Group By
- Select the columns (Attribute 1 and 2) we want to group
- Go to Transform tab
- Click Group By
- There is where we need to input the new columns and the operations we need, one by one:
- New Column Name – manual input. In our case, we want to keep the original headers. That’s why we input Value 1, 2, 3 respectively.
- Operation – Select from drop-down menu. There are common operations like SUM, AVERAGE, MIN, MAX, etc there. In our case, we select SUM.
- Column – Select from drop-down menu. There are all the columns in the current table. Select the column where you want to apply the Operation to. In our case, we select Value 1, 2, 3 respectively.
- When we are done with one column (Value1), click the Add aggregation and repeat the above for Value2 and Value3.
- OK
In summary, we ask Power Query to create a new column called “Value1” by summing the values in the original column “Value1“; group them by Attribute1 and Attribute2.
In plain language: Give me a summary table by grouping Attribute1 and Attribute2; sum the values as summary. Think about what we do with Pivot Table. 😉
Here we go!
Let’s Close and Load the result to worksheet and see!
The two tables are JOINED, FULLY! 🙌
Isn’t it cool?
Some considerations:
- If there are lots of values in the two tables, it’s a bit tedious to do the Add aggregation in the Group By step.
- I guess there may be performance issue on BIG table with lots of values. I haven’t tested it nor compared it with the approach using two merges + append. If you know the answer, please leave your comments below.
- It works only if you want to pull the values (numbers that we can apply SUM operation to) of the two tables together. It doesn’t work if you want to put attributes (text cannot be summed).
- Any other thoughts? Please share with us by leaving comments below. 🙂
Video is now ready coming soon. Please enjoy stay tuned. 😉
You may down a Sample File with the previous example here.
Another way: Merge Sales and Traffic with Full Outer join, add two conditional columns, remove most of the other columns:
let
Source = Sales,
MergedQueries = Table.NestedJoin(Source, {“Store”, “Date”}, Traffic, {“Store”, “Date”}, “Traffic”, JoinKind.FullOuter),
ExpandedTraffic = Table.ExpandTableColumn(MergedQueries, “Traffic”, {“Date”, “Store”, “Traffic In”}, {“Date.1”, “Store.1”, “Traffic In”}),
AddedConditionalColumn = Table.AddColumn(ExpandedTraffic, “Custom”, each if [Date] = null then [Date.1] else if [Date.1] = null then [Date] else [Date]),
AddedConditionalColumn1 = Table.AddColumn(AddedConditionalColumn, “Store1”, each if [Store] = null then [Store.1] else if [Store.1] = null then [Store] else [Store]),
RemovedOtherColumns = Table.SelectColumns(AddedConditionalColumn1,{“Custom”, “Store1”, “Trans Count”, “Unit Sold”, “Net Sales”, “Traffic In”}),
RenamedColumns = Table.RenameColumns(RemovedOtherColumns,{{“Custom”, “Date”}, {“Store1”, “Store”}}),
SortedRows = Table.Sort(RenamedColumns,{{“Store”, Order.Ascending}, {“Date”, Order.Ascending}}),
ReplacedValue = Table.ReplaceValue(SortedRows,null,0,Replacer.ReplaceValue,{“Trans Count”, “Unit Sold”, “Net Sales”})
in
ReplacedValue
LikeLike