SFMC SQL Join
Data Extensions. System Data Views. With the JOIN statement, use one query to rule them all.
There are cases where all you need is to pull (and maybe transform) data FROM
a single Data Extension. But the real magic of SQL is visible when you need to combine information from multiple data points. Here comes the JOIN
statement.
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
Let's dive in the details of what you see above.
JOIN Types
In the example above, you can see the JOIN
word. There are many JOIN
statement types available in SQL. The appropriate joining will allow you to segment your records precisely the way you want and without the need for extensive WHERE
clauses. Understanding differences between various types and using the correct one for the task at hand will make your code shorter, easier to read and optimised. Let's check what is in the quiver:
Inner Join
The most straightforward join is INNER JOIN
. It takes records from the first table (Welcome-Campaign-Segment
Data Extension in the above example) and the second table (_Open
System Data View in this case) to output records available in both of them (based on selected relation).
In our query, it will be contacts that are in the Data Extension and at the same time have some email opens in _Open
System Data View. It will ignore all contacts that are not visible in the _Open
System Data View, and all the email opens that are not related to contacts in the Welcome-Campaign-Segment
Data Extension.
Of course, this query in its current state is not perfect and could output wrong results (like tracked opens for the contacts from our Data Extension, but coming from the entirely different campaign) as it doesn't limit the scope to just one campaign or Job.
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
INNER JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
When using INNER JOIN
, you can write just the word JOIN
, as in the first example, to have the same result. However, it is best practice to state this type explicitly by writing the full name. It makes it easier to differentiate from the following types of JOIN
.
Left and Right Joins
The following joins are LEFT JOIN
(also known as LEFT OUTER JOIN) and RIGHT JOIN
(also known as RIGHT OUTER JOIN). Instead of limiting the output to records available in both tables, they take complete data from one table and append the additional information from the second one (if it is available).
LEFT JOIN
will take complete data of the first table (in our example, Data Extension selected by FROM
) and extend it with information from the second one (_Open
System Data View joined with LEFT JOIN
).
You can go the other way around with RIGHT JOIN
(in our example, it would output everything in _Open
System Data View and append to it the data from the Data Extension - not that useful).
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
LEFT JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
As you can see, the only thing we changed in the above query is the type of JOIN
. This simple change will alter our outcomes to show a complete list of all Subscriber Keys and Email Addresses from Welcome-Campaign-Segment
Data Extension with the email open date for those contacts who engaged with the communication. Unengaged ones would have null
values in the OpenDate
column. Again, it is not a proper way of building such a report. There will be the ready-to-use snippet at the bottom.
Due to the similarity of LEFT JOIN
and RIGHT JOIN
, I highly recommend choosing one type and use it consistently to lower the risk of an error.
If you would want to see all contacts from our Data Extension that did not open the email, you could use LEFT JOIN
with exclusion (called LEFT EXCLUDING JOIN):
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
LEFT JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
WHERE o.SubscriberKey IS NULL
Find more about it in the WHERE statement guide.
Full Join
The last classic SQL JOIN
is the FULL JOIN
(also known as FULL OUTER JOIN
) that allows you to get data from both sources.
In our example, it would mean all contacts from Welcome-Campaign-Segment
Data Extension and all records from _Open
System Data View, with null
values in the columns without a match. Contacts that did not open and tracked opens nor related to contacts available in the Data Extension we selected.
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
FULL JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
FULL JOIN
is an excellent tool when you, for example, want to create a master segment from multiple smaller ones. Another use case - checking multiple tracking data points with a clear view which are missing (for example, which users have a tracked email click without open and vice versa).
Self-join
Then there is Self-join. It is not a separate type of JOIN
statement but rather a specific use case for the above types. It joins the table to itself. Useful when you have columns in your data extension that point to other columns in the same extension.
A use case I can think of in the world of Salesforce Marketing Cloud could be, for example, a Data Extension that stores product categories and subcategories.
Input Data Extension:
CategoryName | CategoryID | ParentCategoryID |
---|---|---|
Wristwatches | 1 | |
Diver Watches | 2 | 1 |
Pilot Watches | 3 | 1 |
In such a scenario, you can use Self-join to build a more readable table that might be useful for personalisation:
SELECT
c.CategoryName AS Category
, pc.CategoryName AS ParentCategory
FROM WristwatchesDE AS c
INNER JOIN WristwatchesDE AS pc
ON pc.ParentCategoryID = c.CategoryID
Output Data Extension:
Category | ParentCategory |
---|---|
Diver Watches | Wristwatches |
Pilot Watches | Wristwatches |
Another use case? You might have a product table for your e-commerce wherein one column store recommended products that are often purchased together. To personalise an email with such offers, self-join on that column would be perfect.
You won't be using Self-join a lot, but it might be a colossal timesaver when needed.
Self-join might be the perfect tool for working with data coming from Salesforce through Marketing Cloud Connect to Synchronized Data Extensions.
Very often, you might encounter Accounts that are related to another Account. This hierarchical structure uses the ParentId field. You can flatten this relationship for use in personalisation and journeys with self-join.
SELECT
a1.Id AS Level1AccountId
, a1.Name AS Level1AccountName
, a2.Id AS Level2AccountId
, a2.Name AS Level2AccountName
, a3.Id AS Level3AccountId
, a3.Name AS Level3AccountName
FROM Account_Salesforce AS a1
LEFT JOIN Account_Salesforce AS a2
ON a2.Id = a1.ParentId
LEFT JOIN Account_Salesforce AS a3
ON a3.Id = a2.ParentId
Thinking about how much easier it would be with a neat visual cheat sheet? Perfect - I prepared one for you at the bottom :)
Joining ON
I was writing a lot about the various JOIN
statements, but there is also the ON
element. We use it to define the relationship between the joined data sources. It tells the query which value it should consider to decide whether the record is available in both columns.
In our example, we used ON de.SubscriberKey = o.SubscriberKey
that takes the SubscriberKey from the Data Extension and the System Data View and matches the rows based on it. You can use any other but always think about edge cases. Is the email address always unique? It is a very similar experience to what you do in the Salesforce Marketing Cloud Data Designer.
This column does not have to be the Primary Key of the Data Extension or System Data View. You can select any as long as its values have a match in the second source.
You can also add a bit of logic to it by leveraging AND
/OR
statements. It is handy when either:
- You are not sure which column will have a match. For example, you have separate columns for 15 characters Salesforce ID and 18 characters one and want to match on either one.
- You want to match multiple criteria. For example, not only the
EmailAddress
field but alsoLastName
to cover shared email address. - You want to join more than two data sources together using convoluted logic. You will find an example soon in the Multiple Various Joins section.
You are not limited to the simple ON ColumnA = ColumnB
. If needed, you can go for more dynamic joins that leverage SQL functions:
SELECT
wel.SubscriberKey
, sms.OptInStatusID
, sms.OptOutStatusID
FROM WelcomeCampaignSegment AS wel
LEFT JOIN _SMSSubscriptionLog AS sms
ON sms.MobileNumber LIKE CONCAT('%', wel.MobilePhone)
Remember that such an approach will have a significant impact on performance.
Table Name Prefix
When you are joining multiple data sources with columns of the same name, you will have to tell the SQL from which source you select each column. You can do it by using the table name prefix before the column name:
SELECT
_Sent.SubscriberKey
, _Job.EmailName
FROM _Sent
LEFT JOIN _Job
ON _Job.JobID = _Sent.JobID
However, you probably saw in the previous examples that there is another way. You don't have to write the full name of the data source every time. You can leverage aliases just as with columns to make it shorter:
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
INNER JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
All you need to do is write your alias after a data source name. For readability purposes, I recommend using the AS
keyword between. Now you can leverage the shortened name to prefix the selected columns. It's instrumental when pulling data from multiple Data Extensions.
Using table name prefixes is not always required (for example, when the two sources you JOIN
don't have any columns of the same name). You might see some examples without it. However, it is best practice to leverage this feature for consistency, readability and error-deferring purposes.
Multiple Joins
Joining two tables is just the beginning. Working with multiple columns can be much more mind-boggling, and in some cases taking paper and pencil to draw some Venn Diagrams can make a huge difference. Let's check some of the most popular use cases.
Order of joining the tables is always important, but it is crucial with multi-table JOINs as it can have enormous impact on the final outcome:
As you can see, changing the order of second and third table creates significantly different outcome. And there is more - based on what you use in the ON
, the order might also impact to which records enriching data is applied.
For example, if you enrich data with the first joined table and then add some new records with the another - only the records from the initial table will be enriched. The ones added in last step will not have this added context. Changing order of JOIN
s can solve it.
Multiple Inner Joins
In many cases, you will want to get a segment of people who simultaneously fulfil multiple requirements. For example, contacts that were part of three different ToFu campaigns related to a particular product. Such data might make an ideal audience for your next campaign further down in the marketing funnel. For such a scenario, you will use multiple INNER JOIN
statements chained in one query:
SELECT
wel.SubscriberKey
, wel.EmailAddress
, nsl.PhoneNumber
, ebo.EbookName
FROM WelcomeCampaignSegment AS wel
INNER JOIN NewsletterPromoCampaignSegment AS nsl
ON nsl.SubscriberKey = wel.SubscriberKey
INNER JOIN EbookDownloadCampaignSegment AS ebo
ON ebo.SubscriberKey = wel.SubscriberKey
This SQL will only give you those contacts that were in all three Data Extensions.
Multiple Left Joins
Another huge group of cases you might encounter are queries with multiple LEFT JOIN
statements. They are great when you want to extend the data for your main data source with multiple other data points without losing records that do not have a match. Awesome for building a segment for a campaign from Data Extensions with extensive personalisation:
SELECT
up.SubscriberKey
, up.EmailAddress
, prod.ProductName
, prod.ProductPrice
, promo.Offer
FROM UpSellCampaignSegment AS up
LEFT JOIN ProductDetailsList AS prod
ON prod.ProductId = up.OfferedProduct
LEFT JOIN CustomOfferList AS promo
ON promo.PromotionId = up.OfferedPromotion
Notice that we joined additional Data Extensions on different columns in this case. What is more, we don't need to SELECT
those columns to filter on them. Thanks to using LEFT JOIN
instead of INNER JOIN
, this query will return to us also contacts that do not have any custom offer applied (we can hide appropriate content block with personalisation for them).
Multiple Various Joins
You are not limited to just one type of join when creating your query. You can mix and match different types of joins to extract precisely the data you need. However, the more complexity you add (either by the number of sources or by the number of various joins), the more recommended drawing the Venn Diagrams will be.
SELECT
up.SubscriberKey
, up.EmailAddress
, prod.ProductName
, prod.ProductPrice
, promo.Offer
FROM UpSellCampaignSegment AS up
INNER JOIN ProductDetailsList AS prod
ON prod.ProductId = up.OfferedProduct
LEFT JOIN CustomOfferList AS promo
ON promo.PromotionId = up.OfferedPromotion
In the example above, we took the same query as for the Multiple Left Joins, but we changed the first JOIN
type to INNER
. What is the impact? This time, we will get only those contacts from Up-Sell-Campaign-Segment
that have a matching product in Product-Details-List
and only then we would extend them with optional information about the custom offer.
This approach might be better for our needs, as we would be sure that all contacts have available data for Product Name and Product Price. A great way to make our content better and more comfortable to create.
How complex can a JOIN
can get? Very. You will encounter such monstrosities, especially in the world of reporting queries that require information from multiple System Data Views - for example, when you want to debug your email sends.
One thing to remember - the more complex query you create, the more error-prone it is and the closer you are to the 30 minutes AutoKill limit for the query execution. Always minimise your query to only the data you truly need.
UNION
Apart from the JOIN
, a UNION
operator also serves a similar purpose. It concatenates the outcomes of multiple SELECT
statements.
SELECT
jan.SubscriberKey
, jan.EmailAddress
FROM JanuaryEventParticipants AS jan
UNION
SELECT
feb.SubscriberKey
, feb.EmailAddress
FROM FebruaryEventParticipantsSegment AS feb
As you can see in the above example, the UNION
doesn't require any relationship between the two queries. There is no ON
element. On the other hand, the UNION
cannot add any new columns.
UNION
works only if:
- Both queries have the same number of columns.
- Both queries have the same order of columns.
- Columns in both queries have the same data types.
The names do not have to be the same. You don't even have to add aliases for static values. However, it is best practice to match the aliases across all queries for readability purposes.
To simplify the difference between JOIN
and UNION
:
- If you want to add columns - use
JOIN
. - If you want to add rows - use
UNION
. - If you want to add both - use both (or
FULL JOIN
)
By default, the UNION
will ignore duplicates. If you want to keep them, use UNION ALL
.
SELECT
jan.SubscriberKey
, jan.EmailAddress
FROM JanuaryEventParticipantsSegment AS jan
UNION ALL
SELECT
feb.SubscriberKey
, feb.EmailAddress
FROM FebruaryEventParticipantsSegment AS feb
The obvious use case for UNION
is gathering subscribers from multiple Data Extensions to build a master segment.
There is, however, another use case that I find perfect for UNION
- attaching Seedlists.
If you are using a deliverability monitoring suite, you probably are using seedlist. If not - a seedlist is a list of bot email addresses used to estimate your communication inbox placement.
In most cases, you will store your seedlist in a separate Data Extension and will have only Email Address and fake Subscriber Key columns filled in. How to push them quickly into your segment? With UNION
, of course:
SELECT
wel.SubscriberKey
, wel.EmailAddress
, wel.FirstName
, wel.Interest
FROM WelcomeCampaignSegment AS wel
UNION
SELECT
sl.SubscriberKey
, sl.EmailAddress
, 'Seed' AS FirstName
, 'Estimating' AS Interest
FROM Seedlist AS sl
In the example above, we add the Seedlist to the master segment and provide static value placeholders for columns used in personalisation.
You can mix and match JOIN
and UNION
to create your perfect segment. And even add arbitrary data without a FROM
.
SELECT
wel.SubscriberKey
, wel.EmailAddress
, wel.FirstName
, i.Interest
FROM WelcomeCampaignSegment AS wel
LEFT JOIN InterestsDataPoint AS i
ON i.SubscriberKey = wel.SubscriberKey
UNION
SELECT
sl.SubscriberKey
, sl.EmailAddress
, 'Seed' AS FirstName
, 'Estimating' AS Interest
FROM Seedlist AS sl
UNION ALL
SELECT
'123456789987654321' AS SubscriberKey
, '[email protected]' AS EmailAddress
, 'SalesDepartment' AS FirstName
, 'Leads' AS Interest
INTERSECT
The INTERSECT
operator is very similar in usage to UNION
, but instead of concatenating outcomes, it outputs only those that exist in both queries.
SELECT bf.SubscriberKey
FROM BlackFridayPromoSegment AS bf
INTERSECT
SELECT p.SubscriberKey
FROM PurchasesDataPoint AS p
As with UNION
, INTERSECT
works only if:
- Both queries have the same number of columns.
- Both queries have the same order of columns.
- Columns in both queries have the same data types.
The check for rows existing in both queries is done considering data in all provided columns.
You can have the same outcome with more control over comparison logic with a mix of JOIN
and WHERE
.
EXCEPT
The EXCEPT
operator is very similar in usage to UNION
, but instead of concatenating outcomes, it outputs only those that exist in the first query.
SELECT bf.SubscriberKey
FROM BlackFridayPromoSegment AS bf
EXCEPT
SELECT p.SubscriberKey
FROM PurchasesDataPoint AS p
As with UNION
, EXCEPT
works only if:
- Both queries have the same number of columns.
- Both queries have the same order of columns.
- Columns in both queries have the same data types.
The check for rows existing in both queries is done considering data in all provided columns.
You can have the same outcome with more control over comparison logic with a mix of JOIN
and WHERE
.
SFMC SQL JOIN Cheat Sheet
All standard SQL JOINs in a single image with name, example SFMC-focused use case, key part of the SQL query and diagrams visualising the output. Ready to bookmark or download. Full size version linked below the preview.
View full size