Data Blending. Joins in details

Data Blending. Joins in details

I am back with a second chapter of my Data Blending trilogy. 

I have recorded a tutorial, so if you prefer video format, here you are!

 

You can find the links to the previous tutorials here: first part, second part.

In the previous part we’ve already talked about what is a blending data source, how to create a blended data source in Google Data Studio. Today I would like to draw your attention to different types of joins. So, as we remember, we have 5 types of joins.

gds_joins_gaille

We have Left Outer, Right Outer, Inner, Full Outer and Cross joins. Let’s see the examples of how they work.

google_facebook_ads_blending_gaiile

Here we have two tables with data from Google Ads (left table) and Facebook Ads (right table) that help us visually notice and see the difference between different types of joins. 

 

Left Outer join

left_join_gaille

The main principle of Left Outer join is that we take all the data from the left table and try to find matches in the right table. I’ve decided to sort it by date.

left_join_google_facebook_gaille

Both of the Google and Facebook tables have four rows. Google Ads table has data from 5, 6, 9 and 4 of August, and as we’ve chosen the Left Join, Facebook Ads table has 5, 6, 9 and 4 of August as well.. We have a cost parameter here. You see, in the Google Ads table we have the cost for all the dates, and here we have the cost for all the dates too. In the Facebook column in the Left Join table for 9th of August we have null, because in Facebook ads table we don’t have 9th of August – we have only 7, 6, 5 and 4. So, Google didn’t find any row for 9th of August in the Facebook table and added null. Null means “don’t have information”.

Right Outer Join

The logic of creating a blended data using Right Outer Join is pretty much the same. The only difference is that we take all the rows from the right table, not from the left. It’s sorted by date.

Right_join_gaille

 

 

Our Facebook is a right table, so we have 4 rows here again, because the Facebook table has 4 rows, and it is 7, 6, 5 and 4 of August, like the Facebook has 7, 6, 5 and 4 of August. In the Right Join table we have a cost for all Facebook for all the days. In Facebook Ads table we have cost for 7th of August and here we have this data whereas Google doesn’t have information for 7th of August, so we have null here. 

 

Inner join

Inner Join matches only the rows we can find both in the left and in the right table. 

 

So, we can see that in the Google table we have 9 (that Facebook doesn’t have) and 6, 5, 4 of August. And the Facebook table has 7 and 6, 5, 4 of August. That means that 6, 5 and 4 of August are common for both of the tables, whereas 9th and 7th are not. So, in the Inner Join table we will have only 3 rows that we can find in both tables. 

The result is here! We have only 3 rows – 6, 5, 4 of August. We exclude data from 7th and the 9th. 

 

Full Outer join

Full Outer Join works in an opposite way. Here we have all the data from the left table and all the data from the right table. If we cannot find some data to match, we put null

full_outer_join_gaille

And here we will have 5 rows. Here for Google Date we have 4 rows and for Facebook Date 4 rows. But for 9th of August we have null for Facebook Date and null for Facebook Cost. And for 7th of August we have null for Google Date and null for Google Cost. But in total, we have all the rows.  I have sorted it by Google Date.

 

Cross join

It is the most rare to use a join type. I almost haven’t used it in practice.Still, it’s better to know how it works.

cross_join_gaille

In a Cross join we take the first row from the Google table and multiply it for the number of the rows in a Facebook table. So, we will have the 9th of August from Google and the 7th of August from the Facebook table. Then again the 9th of August from Google with 6th of August from the Facebook table. Again 9th of August from Google and then 5th of August from Facebook. All rows from one table will be multiplied with all the rows from other tables. And we have 4 and 4, in total we will have 16 rows. 4 rows for 9th of August, 4 rows for 6th of August etc. 

In our example we can see that we have 9th — 7th of August, 9th — 6th, 9th — 4th. And again for the next day from Google we have 4 rows from Facebook and all the same. Because of that the numbers are a bit weird but sometimes this Cross Join works.

 

Charts

After creating a join, we can use it not only for the tables, we can create nice charts like this.

chart_blending_one_gaiile chart_blending_two_gaille

For example, it lets us see the dates, both Google and Facebook data in one chart, or even we can create a filter with this data, so basically we can use it as a normal data source but with the information from different ads platforms.

 

***

Hope you liked the review! You can find the previous part by the link. The last chapter is here

Leave the comments below. Which of the joins do you use the most?

Anna Prymak editor

Оставить ответ