SFMC SQL Join

Data Extensions. System Data Views. With the JOIN statement, use one query to rule them all.

There will be some cases where all you need is to pull (and maybe transform) data FROM a single Data Extension. But the real magic of SQL is visible when you need to combine information from multiple data points. Here comes the JOIN statement.

Check who opened the email and when by querying the _Open System Data View
SELECT
de.SubscriberKey,
de.EmailAddress,
o.EventDate as OpenDate
FROM [Welcome-Campaign-Segment] de
JOIN _Open o ON de.SubscriberKey = o.SubscriberKey

There are three things to unpack.

JOIN Types

In the example above, you can see JOIN word. There are many JOIN statement types available in SQL. The appropriate joining will allow you to segment your records precisely the way you want and without the need for extensive WHERE clauses. Understanding differences between various types and using the correct one for the task at hand will make your code shorter, easier to read and optimized. Let's check what is in the quiver:

Inner Join

The most straightforward join is INNER JOIN. It takes records from the first table (Welcome-Campaign-Segment Data Extension in the above example) and second table (_Open System Data View in this case) to output records available in both of them (based on selected relation).

In our query, it will be contacts that are in the Data Extension and in the same time have some email opens in _Open System Data View. It will ignore all contacts that are not visible in the _Open System Data View, and all the email opens that are not related to contacts in the Welcome-Campaign-Segment Data Extension.

Of course, this query in its current state is not perfect and could output wrong results (like tracked opens for the contacts from our Data Extension, but coming from the entirely different campaign) as it doesn't limit the scope to just one campaign or Job.

SELECT
de.SubscriberKey,
de.EmailAddress,
o.EventDate as OpenDate
FROM [Welcome-Campaign-Segment] de
INNER JOIN _Open o ON de.SubscriberKey = o.SubscriberKey

When using INNER JOIN, you can write just the word JOIN, as in the first example, to have the same result. However, it is best practice to state this type explicitly by writing the full name. It makes it easier to differentiate from the following types of JOIN.

Left and Right Joins

Next types of joins are LEFT JOIN (also known as LEFT OUTER JOIN) and RIGHT JOIN (also known as RIGHT OUTER JOIN). Instead of limiting the output to only those records that are available in both tables, they are taking full data from one table and append the additional information from the second one (if it is available).

LEFT JOIN will take complete data of the first table (in our example, Data Extension selected by FROM) and extend it with information from the second one (_Open System Data View joined with LEFT JOIN).

You can go the other way around with RIGHT JOIN (in our example it would output everything in _Open System Data View and append to it the data from the Data Extension - not that useful).

The same query, but this time joined with LEFT JOIN
SELECT
de.SubscriberKey,
de.EmailAddress,
o.EventDate as OpenDate
FROM [Welcome-Campaign-Segment] de
LEFT JOIN _Open o ON de.SubscriberKey = o.SubscriberKey

As you can see, the only thing we changed in the above query is the type of JOIN. This simple change will alter our outcomes to show a full list of all Subscriber Keys and Email Addresses from Welcome-Campaign-Segment Data Extension with the date of email open for those contacts who engaged with the communication. Unengaged ones would have null values in the OpenDate column. Again, it is not a proper way of building such a report. There will be the ready-to-use snippet in the bottom.

Due to the similarity of LEFT JOIN and RIGHT JOIN, I highly recommend to choose one type and use it consistently to lower the risk of an error.

You Should Know

If you would want to see all contacts from our Data Extension that did not open the email, you could use LEFT JOIN with exclusion:

Example of exclusion applied to LEFT JOIN
SELECT
de.SubscriberKey,
de.EmailAddress,
o.EventDate as OpenDate
FROM [Welcome-Campaign-Segment] de
LEFT JOIN _Open o ON de.SubscriberKey = o.SubscriberKey
WHERE o.SubscriberKey IS NULL

Find more about it in the WHERE statement guide.

Full Join

The last classic SQL JOIN is the FULL JOIN (also known as FULL OUTER JOIN) that allows you to get data from both sources.

In our example, it would mean all contacts from Welcome-Campaign-Segment Data Extension and all records from _Open System Data View, with null values in the columns without a match. Contacts that did not open and tracked opens nor related to contacts available in the Data Extension we selected.

Not a best use case for the FULL JOIN
SELECT
de.SubscriberKey,
de.EmailAddress,
o.EventDate as OpenDate
FROM [Welcome-Campaign-Segment] de
FULL JOIN _Open o ON de.SubscriberKey = o.SubscriberKey

FULL JOIN is an awesome tool when you, for example, want to create a master segment from multiple smaller ones. Another use case - checking multiple tracking data points with a clear view which are missing (for example, which users have a tracked email click without open and vice versa).

Self-join

Then there is Self-join. It is not a separate type of JOIN statement, but rather a specific use case for the above types. It joins the table to itself. Useful when you have columns in your data extension that point to other columns in the same extension.

A use case I can think of in the world of Salesforce Marketing Cloud could be, for example, a Data Extension that stores product categories and subcategories.

Input Data Extension:

CategoryNameCategoryIDParentCategoryID
Wristwatches1
Diver Watches21
Pilot Watches31

In such a scenario, you can use Self-join to build a more readable table that might be useful for personalization:

Self-join using INNER JOIN will output only categories that have a parent category
SELECT
c.CategoryName AS Category,
p.CategoryName AS ParentCategory
FROM [Wristwatches-DE] c
INNER JOIN [Wristwatches-DE] p ON p.ParentCategoryID = c.CategoryID

Output Data Extension:

CategoryParentCategory
Diver WatchesWristwatches
Pilot WatchesWristwatches

Another use case? You might have a product table for your e-commerce wherein one column store recommended products that are often purchased together. To personalize an email with such offers, self-join on that column would be perfect.

You won't be using Self-join a lot, but it might be a colossal timesaver when needed.

Joining ON

I was writing a lot about the various JOIN statements, but there is also the ON element. We use it to define the relationship between the joined data sources. It tells the query which value should it should consider to decide whether the record is available in both columns.

In our example, we used ON de.SubscriberKey = o.SubscriberKey that takes the SubscriberKey from the Data Extension and the System Data View and matches the rows based on it. You can use any other but always think about edge cases. Is the email address always unique? It is a very similar experience to what you do in the Salesforce Marketing Cloud Data Designer.

This column does not have to be the Primary Key of the Data Extension or System Data View. You can select any as long as its values have a match in the second source.

You can also add a bit of logic to it, by leveraging AND/OR statements. It is especially useful when either:

  1. You are not sure in which column you will have a match. For example, you have separate columns for 15 characters Salesforce ID and 18 characters one and want to match on either one.
  2. You want to match on multiple criteria. For example, not only the EmailAddress field but also LastName to cover shared email address.
  3. You want to join more than two data sources together using convoluted logic. You will find an example soon at the bottom of the Multiple Various Joins section.

Table Name Prefix

When you are joining multiple data sources that have columns of the same name, you will have to tell the SQL from which source you select each column. You can do it by using the table name prefix before the column name:

Example of SQL with full table name prefixes
SELECT
_Sent.SubscriberKey,
_Job.EmailName
FROM _Sent
LEFT JOIN _Job ON _Sent.JobID = _Job.JobID

However, you probably saw in the previous examples that there is another way. You don't have to write the full name of the data source every time. You can leverage labels just as with columns to make it shorter:

Example of SQL with labelled table name prefixes
SELECT
de.SubscriberKey,
de.EmailAddress,
o.EventDate as OpenDate
FROM [Welcome-Campaign-Segment] de
INNER JOIN _Open o ON de.SubscriberKey = o.SubscriberKey

All you need to do is to write your label after a data source name. Now you can leverage the shortened name to prefix the selected columns. It's especially useful when pulling data from Data Extensions.

Using table name prefixes is not always required (for example when the two sources you JOIN don't have any columns of the same name). You might see some examples without it. However, it is best practice to leverage this feature for consistency, readability and error-deferring purposes.

Multiple Joins

Joining two tables is just the beginning. Working with multiple columns can be much more mind-boggling, and in some cases taking a paper and pencil to draw some Venn Diagrams can make a huge difference. Let's check some of the most popular use cases.

Multiple Inner Joins

In many cases, you will want to get a segment of people that fulfil multiple requirements simultaneously. For example, contacts that were part of three different ToFu campaigns related to a particular product. Such data might make for a great audience for your next campaign further down in the marketing funnel. For such a scenario, you will use multiple INNER JOIN statements chained in one query:

Contacts used for three different ToFu campaigns that are ready for a MoFu campaign
SELECT
de1.SubscriberKey,
de1.EmailAddress,
de2.PhoneNumber,
de3.EbookName
FROM [Welcome-Campaign-Segment] de1
INNER JOIN [Newsletter-Promo-Campaign-Segment] de2 ON de1.SubscriberKey = de2.SubscriberKey
INNER JOIN [Ebook-Download-Campaign-Segment] de3 ON de1.SubscriberKey = de3.SubscriberKey

This SQL will give you only those contacts that were in all three Data Extensions.

Multiple Left Joins

Another huge group of cases you might encounter are queries with multiple LEFT JOIN statements. They are great when you want to extend the data for your main data source with multiple other data points without losing records that do not have a match. Awesome for building a segment for a campaign from Data Extensions with extensive personalization:

To the list of contacts add the product details we want to use in the email content along with special offer description
SELECT
de.SubscriberKey,
de.EmailAddress,
prod.ProductName
prod.ProductPrice
promo.Offer
FROM [Up-Sell-Campaign-Segment] de
LEFT JOIN [Product-Details-List] prod ON de.OfferedProduct = prod.ProductId
LEFT JOIN [Custom-Offer-List] promo ON de.OfferedPromotion = promo.PromotionId

Notice that in this case, we joined additional Data Extensions on different columns. What is more, we don't need to SELECT those columns to be able to filter on them. Thanks to using LEFT JOIN instead of INNER JOIN this query will return to us also contacts that do not have any custom offer applied (we can hide appropriate content block with personalization for them).

Multiple Various Joins

You are not limited to just one type of join when creating your query. You can mix and match different types of joins to extract precisely the data you need. However, the more complexity you add (either by the number of sources or by the number of various joins), the more recommended drawing the Venn Diagrams will be.

Same query as above, but this time with a mix of INNER JOIN and LEFT JOIN
SELECT
de.SubscriberKey,
de.EmailAddress,
prod.ProductName
prod.ProductPrice
promo.Offer
FROM [Up-Sell-Campaign-Segment] de
INNER JOIN [Product-Details-List] prod ON de.OfferedProduct = prod.ProductId
LEFT JOIN [Custom-Offer-List] promo ON de.OfferedPromotion = promo.PromotionId

In the example above, we took the same query as for the Multiple Left Joins, but we changed first JOIN type to INNER. What is the impact? This time, we will get only those contacts from Up-Sell-Campaign-Segment that have a matching product in Product-Details-List and only then we would extend them with optional information about the custom offer.

This approach might be better for our needs, as we would be sure that all contacts have available data for Product Name and Product Price. A great way to make our content better and more comfortable to create.

You Should Know

How complex a JOIN can get? Very. You will encounter such monstrosities especially in the world of reporting queries that requires information from multiple System Data Views:

Reporting query for each sent email within the last month
SELECT
s.SubscriberKey,
j.EmailName,
j.DeliveredTime AS DeliveryDate,
s.EventDate AS SentDate,
o.EventDate AS OpenDate,
c.EventDate AS ClickDate,
b.EventDate AS BounceDate,
b.BounceCategory,
u.EventDate AS UnsubscribeDate
FROM _Sent s
LEFT JOIN _Job j ON s.JobID = j.JobID
LEFT JOIN _Open o ON s.JobID = o.JobID
AND s.ListID = o.ListID
AND s.BatchID = o.BatchID
AND s.SubscriberID = o.SubscriberID
AND o.IsUnique = 1
LEFT JOIN _Click c ON s.JobID = c.JobID
AND s.ListID = c.ListID
AND s.BatchID = c.BatchID
AND s.SubscriberID = c.SubscriberID
AND c.IsUnique = 1
LEFT JOIN _Bounce b ON s.JobID = b.JobID
AND s.ListID = b.ListID
AND s.BatchID = b.BatchID
AND s.SubscriberID = b.SubscriberID
AND b.IsUnique = 1
LEFT JOIN _Unsubscribe u ON s.JobID = u.JobID
AND s.ListID = u.ListID
AND s.BatchID = u.BatchID
AND s.SubscriberID = u.SubscriberID
AND u.IsUnique = 1
WHERE CONVERT(DATE, s.EventDate) >= DATEADD(MONTH, -1, CONVERT(DATE, GETUTCDATE()))

Here you can see ready-to-use reporting query for all email sends from the last month from the current Business Unit.

We are matching the System Data Views by multiple criteria to be sure that the data is for each particular send (that's the issue we were highlighting in query examples in the top of this page). We also limit the JOIN ON selection to only the Unique behavioural events - so the dates are showing the first engagement with our content, even if there were multiple ones.

One thing to remember - the more complex query you create, the more error-prone it is and the closer you are to the 30 minutes AutoKill limit for the query execution. Always minimize your query to only the data you truly need.

Last updated on by Mateusz Dąbrowski