Data Extensions. System Data Views. With the JOIN statement, use one query to rule them all.
There will be some cases where all you need is to pull (and maybe transform) data
FROM a single Data Extension. But the real magic of SQL is visible when you need to combine information from multiple data points. Here comes the
There are three things to unpack.
In the example above, you can see
JOIN word. There are many
JOIN statement types available in SQL. The appropriate joining will allow you to segment your records precisely the way you want and without the need for extensive
WHERE clauses. Understanding differences between various types and using the correct one for the task at hand will make your code shorter, easier to read and optimized. Let's check what is in the quiver:
The most straightforward join is
INNER JOIN. It takes records from the first table (
Welcome-Campaign-Segment Data Extension in the above example) and second table (
_Open System Data View in this case) to output records available in both of them (based on selected relation).
In our query, it will be contacts that are in the Data Extension and in the same time have some email opens in
_Open System Data View. It will ignore all contacts that are not visible in the
_Open System Data View, and all the email opens that are not related to contacts in the
Welcome-Campaign-Segment Data Extension.
Of course, this query in its current state is not perfect and could output wrong results (like tracked opens for the contacts from our Data Extension, but coming from the entirely different campaign) as it doesn't limit the scope to just one campaign or Job.
INNER JOIN, you can write just the word
JOIN, as in the first example, to have the same result. However, it is best practice to state this type explicitly by writing the full name. It makes it easier to differentiate from the following types of
Left and Right Joins
Next types of joins are
LEFT JOIN (also known as
LEFT OUTER JOIN) and
RIGHT JOIN (also known as
RIGHT OUTER JOIN). Instead of limiting the output to only those records that are available in both tables, they are taking full data from one table and append the additional information from the second one (if it is available).
LEFT JOIN will take complete data of the first table (in our example, Data Extension selected by
FROM) and extend it with information from the second one (
_Open System Data View joined with
You can go the other way around with
RIGHT JOIN (in our example it would output everything in
_Open System Data View and append to it the data from the Data Extension - not that useful).
As you can see, the only thing we changed in the above query is the type of
JOIN. This simple change will alter our outcomes to show a full list of all Subscriber Keys and Email Addresses from
Welcome-Campaign-Segment Data Extension with the date of email open for those contacts who engaged with the communication. Unengaged ones would have
null values in the
OpenDate column. Again, it is not a proper way of building such a report. There will be the ready-to-use snippet in the bottom.
Due to the similarity of
LEFT JOIN and
RIGHT JOIN, I highly recommend to choose one type and use it consistently to lower the risk of an error.
You Should Know
If you would want to see all contacts from our Data Extension that did not open the email, you could use
LEFT JOIN with exclusion:
Find more about it in the WHERE statement guide.
The last classic SQL
JOIN is the
FULL JOIN (also known as
FULL OUTER JOIN) that allows you to get data from both sources.
In our example, it would mean all contacts from
Welcome-Campaign-Segment Data Extension and all records from
_Open System Data View, with
null values in the columns without a match. Contacts that did not open and tracked opens nor related to contacts available in the Data Extension we selected.
FULL JOIN is an awesome tool when you, for example, want to create a master segment from multiple smaller ones. Another use case - checking multiple tracking data points with a clear view which are missing (for example, which users have a tracked email click without open and vice versa).
Then there is Self-join. It is not a separate type of
JOIN statement, but rather a specific use case for the above types. It joins the table to itself. Useful when you have columns in your data extension that point to other columns in the same extension.
A use case I can think of in the world of Salesforce Marketing Cloud could be, for example, a Data Extension that stores product categories and subcategories.
Input Data Extension:
In such a scenario, you can use Self-join to build a more readable table that might be useful for personalization:
Output Data Extension:
Another use case? You might have a product table for your e-commerce wherein one column store recommended products that are often purchased together. To personalize an email with such offers, self-join on that column would be perfect.
You won't be using Self-join a lot, but it might be a colossal timesaver when needed.
I was writing a lot about the various
JOIN statements, but there is also the
ON element. We use it to define the relationship between the joined data sources. It tells the query which value should it should consider to decide whether the record is available in both columns.
In our example, we used
ON de.SubscriberKey = o.SubscriberKey that takes the SubscriberKey from the Data Extension and the System Data View and matches the rows based on it. You can use any other but always think about edge cases. Is the email address always unique? It is a very similar experience to what you do in the Salesforce Marketing Cloud Data Designer.
This column does not have to be the Primary Key of the Data Extension or System Data View. You can select any as long as its values have a match in the second source.
You can also add a bit of logic to it, by leveraging
OR statements. It is especially useful when either:
- You are not sure in which column you will have a match. For example, you have separate columns for 15 characters Salesforce ID and 18 characters one and want to match on either one.
- You want to match on multiple criteria. For example, not only the
EmailAddressfield but also
LastNameto cover shared email address.
- You want to join more than two data sources together using convoluted logic. You will find an example soon at the bottom of the Multiple Various Joins section.
Table Name Prefix
When you are joining multiple data sources that have columns of the same name, you will have to tell the SQL from which source you select each column. You can do it by using the table name prefix before the column name:
However, you probably saw in the previous examples that there is another way. You don't have to write the full name of the data source every time. You can leverage labels just as with columns to make it shorter:
All you need to do is to write your label after a data source name. Now you can leverage the shortened name to prefix the selected columns. It's especially useful when pulling data from Data Extensions.
Using table name prefixes is not always required (for example when the two sources you
JOIN don't have any columns of the same name). You might see some examples without it. However, it is best practice to leverage this feature for consistency, readability and error-deferring purposes.
Joining two tables is just the beginning. Working with multiple columns can be much more mind-boggling, and in some cases taking a paper and pencil to draw some Venn Diagrams can make a huge difference. Let's check some of the most popular use cases.
Multiple Inner Joins
In many cases, you will want to get a segment of people that fulfil multiple requirements simultaneously. For example, contacts that were part of three different ToFu campaigns related to a particular product. Such data might make for a great audience for your next campaign further down in the marketing funnel. For such a scenario, you will use multiple
INNER JOIN statements chained in one query:
This SQL will give you only those contacts that were in all three Data Extensions.
Multiple Left Joins
Another huge group of cases you might encounter are queries with multiple
LEFT JOIN statements. They are great when you want to extend the data for your main data source with multiple other data points without losing records that do not have a match. Awesome for building a segment for a campaign from Data Extensions with extensive personalization:
Notice that in this case, we joined additional Data Extensions on different columns. What is more, we don't need to
SELECT those columns to be able to filter on them. Thanks to using
LEFT JOIN instead of
INNER JOIN this query will return to us also contacts that do not have any custom offer applied (we can hide appropriate content block with personalization for them).
Multiple Various Joins
You are not limited to just one type of join when creating your query. You can mix and match different types of joins to extract precisely the data you need. However, the more complexity you add (either by the number of sources or by the number of various joins), the more recommended drawing the Venn Diagrams will be.
In the example above, we took the same query as for the Multiple Left Joins, but we changed first
JOIN type to
INNER. What is the impact? This time, we will get only those contacts from
Up-Sell-Campaign-Segment that have a matching product in
Product-Details-List and only then we would extend them with optional information about the custom offer.
This approach might be better for our needs, as we would be sure that all contacts have available data for Product Name and Product Price. A great way to make our content better and more comfortable to create.
You Should Know
How complex a
JOIN can get? Very. You will encounter such monstrosities especially in the world of reporting queries that requires information from multiple System Data Views:
Here you can see ready-to-use reporting query for all email sends from the last month from the current Business Unit.
We are matching the System Data Views by multiple criteria to be sure that the data is for each particular send (that's the issue we were highlighting in query examples in the top of this page). We also limit the
JOIN ON selection to only the Unique behavioural events - so the dates are showing the first engagement with our content, even if there were multiple ones.
One thing to remember - the more complex query you create, the more error-prone it is and the closer you are to the 30 minutes AutoKill limit for the query execution. Always minimize your query to only the data you truly need.