Skip to main content

SFMC SQL Join

Data Extensions. System Data Views. With the JOIN statement, use one query to rule them all.

There are cases where all you need is to pull (and maybe transform) data FROM a single Data Extension. But the real magic of SQL is visible when you need to combine information from multiple data points. Here comes the JOIN statement.

Check who opened the email and when by querying the _Open System Data View
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey

Let's dive in the details of what you see above.

JOIN Types

In the example above, you can see the JOIN word. There are many JOIN statement types available in SQL. The appropriate joining will allow you to segment your records precisely the way you want and without the need for extensive WHERE clauses. Understanding differences between various types and using the correct one for the task at hand will make your code shorter, easier to read and optimised. Let's check what is in the quiver:

Inner Join

The most straightforward join is INNER JOIN. It takes records from the first table (Welcome-Campaign-Segment Data Extension in the above example) and the second table (_Open System Data View in this case) to output records available in both of them (based on selected relation).

In our query, it will be contacts that are in the Data Extension and at the same time have some email opens in _Open System Data View. It will ignore all contacts that are not visible in the _Open System Data View, and all the email opens that are not related to contacts in the Welcome-Campaign-Segment Data Extension.

Of course, this query in its current state is not perfect and could output wrong results (like tracked opens for the contacts from our Data Extension, but coming from the entirely different campaign) as it doesn't limit the scope to just one campaign or Job.

SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
INNER JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey

When using INNER JOIN, you can write just the word JOIN, as in the first example, to have the same result. However, it is best practice to state this type explicitly by writing the full name. It makes it easier to differentiate from the following types of JOIN.

Left and Right Joins

The following joins are LEFT JOIN (also known as LEFT OUTER JOIN) and RIGHT JOIN (also known as RIGHT OUTER JOIN). Instead of limiting the output to records available in both tables, they take complete data from one table and append the additional information from the second one (if it is available).

LEFT JOIN will take complete data of the first table (in our example, Data Extension selected by FROM) and extend it with information from the second one (_Open System Data View joined with LEFT JOIN).

You can go the other way around with RIGHT JOIN (in our example, it would output everything in _Open System Data View and append to it the data from the Data Extension - not that useful).

The same query, but this time joined with LEFT JOIN
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
LEFT JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey

As you can see, the only thing we changed in the above query is the type of JOIN. This simple change will alter our outcomes to show a complete list of all Subscriber Keys and Email Addresses from Welcome-Campaign-Segment Data Extension with the email open date for those contacts who engaged with the communication. Unengaged ones would have null values in the OpenDate column. Again, it is not a proper way of building such a report. There will be the ready-to-use snippet at the bottom.

Due to the similarity of LEFT JOIN and RIGHT JOIN, I highly recommend choosing one type and use it consistently to lower the risk of an error.

You Should Know

If you would want to see all contacts from our Data Extension that did not open the email, you could use LEFT JOIN with exclusion (called LEFT EXCLUDING JOIN):

Example of exclusion applied to LEFT JOIN
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
LEFT JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey
WHERE o.SubscriberKey IS NULL

Find more about it in the WHERE statement guide.

Full Join

The last classic SQL JOIN is the FULL JOIN (also known as FULL OUTER JOIN) that allows you to get data from both sources.

In our example, it would mean all contacts from Welcome-Campaign-Segment Data Extension and all records from _Open System Data View, with null values in the columns without a match. Contacts that did not open and tracked opens nor related to contacts available in the Data Extension we selected.

Not a best use case for the FULL JOIN
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
FULL JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey

FULL JOIN is an excellent tool when you, for example, want to create a master segment from multiple smaller ones. Another use case - checking multiple tracking data points with a clear view which are missing (for example, which users have a tracked email click without open and vice versa).

Self-join

Then there is Self-join. It is not a separate type of JOIN statement but rather a specific use case for the above types. It joins the table to itself. Useful when you have columns in your data extension that point to other columns in the same extension.

A use case I can think of in the world of Salesforce Marketing Cloud could be, for example, a Data Extension that stores product categories and subcategories.

Input Data Extension:

CategoryNameCategoryIDParentCategoryID
Wristwatches1
Diver Watches21
Pilot Watches31

In such a scenario, you can use Self-join to build a more readable table that might be useful for personalisation:

Self-join using INNER JOIN will output only categories that have a parent category
SELECT
c.CategoryName AS Category
, pc.CategoryName AS ParentCategory
FROM WristwatchesDE AS c
INNER JOIN WristwatchesDE AS pc
ON pc.ParentCategoryID = c.CategoryID

Output Data Extension:

CategoryParentCategory
Diver WatchesWristwatches
Pilot WatchesWristwatches

Another use case? You might have a product table for your e-commerce wherein one column store recommended products that are often purchased together. To personalise an email with such offers, self-join on that column would be perfect.

You won't be using Self-join a lot, but it might be a colossal timesaver when needed.

You Should Know

Self-join might be the perfect tool for working with data coming from Salesforce through Marketing Cloud Connect to Synchronized Data Extensions.

Very often, you might encounter Accounts that are related to another Account. This hierarchical structure uses the ParentId field. You can flatten this relationship for use in personalisation and journeys with self-join.

Flattening three levels of Salesforce Account hierarchy
SELECT
a1.Id AS Level1AccountId
, a1.Name AS Level1AccountName
, a2.Id AS Level2AccountId
, a2.Name AS Level2AccountName
, a3.Id AS Level3AccountId
, a3.Name AS Level3AccountName
FROM Account_Salesforce AS a1
LEFT JOIN Account_Salesforce AS a2
ON a2.Id = a1.ParentId
LEFT JOIN Account_Salesforce AS a3
ON a3.Id = a2.ParentId

Thinking about how much easier it would be with a neat visual cheat sheet? Perfect - I prepared one for you at the bottom :)

Joining ON

I was writing a lot about the various JOIN statements, but there is also the ON element. We use it to define the relationship between the joined data sources. It tells the query which value it should consider to decide whether the record is available in both columns.

In our example, we used ON de.SubscriberKey = o.SubscriberKey that takes the SubscriberKey from the Data Extension and the System Data View and matches the rows based on it. You can use any other but always think about edge cases. Is the email address always unique? It is a very similar experience to what you do in the Salesforce Marketing Cloud Data Designer.

This column does not have to be the Primary Key of the Data Extension or System Data View. You can select any as long as its values have a match in the second source.

You can also add a bit of logic to it by leveraging AND/OR statements. It is handy when either:

  1. You are not sure which column will have a match. For example, you have separate columns for 15 characters Salesforce ID and 18 characters one and want to match on either one.
  2. You want to match multiple criteria. For example, not only the EmailAddress field but also LastName to cover shared email address.
  3. You want to join more than two data sources together using convoluted logic. You will find an example soon in the Multiple Various Joins section.
You Should Know

You are not limited to the simple ON ColumnA = ColumnB. If needed, you can go for more dynamic joins that leverage SQL functions:

Joining on a normalised phone number from Data View and potentially prefixed phone of Data Extension
SELECT
wel.SubscriberKey
, sms.OptInStatusID
, sms.OptOutStatusID
FROM WelcomeCampaignSegment AS wel
LEFT JOIN _SMSSubscriptionLog AS sms
ON sms.MobileNumber LIKE CONCAT('%', wel.MobilePhone)

Remember that such an approach will have a significant impact on performance.

Table Name Prefix

When you are joining multiple data sources with columns of the same name, you will have to tell the SQL from which source you select each column. You can do it by using the table name prefix before the column name:

Example of SQL with full table name prefixes
SELECT
_Sent.SubscriberKey
, _Job.EmailName
FROM _Sent
LEFT JOIN _Job
ON _Job.JobID = _Sent.JobID

However, you probably saw in the previous examples that there is another way. You don't have to write the full name of the data source every time. You can leverage aliases just as with columns to make it shorter:

Example of SQL with aliased table name prefixes
SELECT
wel.SubscriberKey
, wel.EmailAddress
, o.EventDate AS OpenDate
FROM WelcomeCampaignSegment AS wel
INNER JOIN _Open AS o
ON o.SubscriberKey = wel.SubscriberKey

All you need to do is write your alias after a data source name. For readability purposes, I recommend using the AS keyword between. Now you can leverage the shortened name to prefix the selected columns. It's instrumental when pulling data from multiple Data Extensions.

Using table name prefixes is not always required (for example, when the two sources you JOIN don't have any columns of the same name). You might see some examples without it. However, it is best practice to leverage this feature for consistency, readability and error-deferring purposes.

Multiple Joins

Joining two tables is just the beginning. Working with multiple columns can be much more mind-boggling, and in some cases taking paper and pencil to draw some Venn Diagrams can make a huge difference. Let's check some of the most popular use cases.

You Should Know

Order of joining the tables is always important, but it is crucial with multi-table JOINs as it can have enormous impact on the final outcome:

As you can see, changing the order of second and third table creates significantly different outcome. And there is more - based on what you use in the ON, the order might also impact to which records enriching data is applied.

For example, if you enrich data with the first joined table and then add some new records with the another - only the records from the initial table will be enriched. The ones added in last step will not have this added context. Changing order of JOINs can solve it.

Multiple Inner Joins

In many cases, you will want to get a segment of people who simultaneously fulfil multiple requirements. For example, contacts that were part of three different ToFu campaigns related to a particular product. Such data might make an ideal audience for your next campaign further down in the marketing funnel. For such a scenario, you will use multiple INNER JOIN statements chained in one query:

Contacts used for three different ToFu campaigns that are ready for a MoFu campaign
SELECT
wel.SubscriberKey
, wel.EmailAddress
, nsl.PhoneNumber
, ebo.EbookName
FROM WelcomeCampaignSegment AS wel
INNER JOIN NewsletterPromoCampaignSegment AS nsl
ON nsl.SubscriberKey = wel.SubscriberKey
INNER JOIN EbookDownloadCampaignSegment AS ebo
ON ebo.SubscriberKey = wel.SubscriberKey

This SQL will only give you those contacts that were in all three Data Extensions.

Multiple Left Joins

Another huge group of cases you might encounter are queries with multiple LEFT JOIN statements. They are great when you want to extend the data for your main data source with multiple other data points without losing records that do not have a match. Awesome for building a segment for a campaign from Data Extensions with extensive personalisation:

To the list of contacts, add the product details we want to use in the email content along with a special offer description
SELECT
up.SubscriberKey
, up.EmailAddress
, prod.ProductName
, prod.ProductPrice
, promo.Offer
FROM UpSellCampaignSegment AS up
LEFT JOIN ProductDetailsList AS prod
ON prod.ProductId = up.OfferedProduct
LEFT JOIN CustomOfferList AS promo
ON promo.PromotionId = up.OfferedPromotion

Notice that we joined additional Data Extensions on different columns in this case. What is more, we don't need to SELECT those columns to filter on them. Thanks to using LEFT JOIN instead of INNER JOIN, this query will return to us also contacts that do not have any custom offer applied (we can hide appropriate content block with personalisation for them).

Multiple Various Joins

You are not limited to just one type of join when creating your query. You can mix and match different types of joins to extract precisely the data you need. However, the more complexity you add (either by the number of sources or by the number of various joins), the more recommended drawing the Venn Diagrams will be.

Same query as above, but this time with a mix of INNER JOIN and LEFT JOIN
SELECT
up.SubscriberKey
, up.EmailAddress
, prod.ProductName
, prod.ProductPrice
, promo.Offer
FROM UpSellCampaignSegment AS up
INNER JOIN ProductDetailsList AS prod
ON prod.ProductId = up.OfferedProduct
LEFT JOIN CustomOfferList AS promo
ON promo.PromotionId = up.OfferedPromotion

In the example above, we took the same query as for the Multiple Left Joins, but we changed the first JOIN type to INNER. What is the impact? This time, we will get only those contacts from Up-Sell-Campaign-Segment that have a matching product in Product-Details-List and only then we would extend them with optional information about the custom offer.

This approach might be better for our needs, as we would be sure that all contacts have available data for Product Name and Product Price. A great way to make our content better and more comfortable to create.

How complex can a JOIN can get? Very. You will encounter such monstrosities, especially in the world of reporting queries that require information from multiple System Data Views - for example, when you want to debug your email sends.

One thing to remember - the more complex query you create, the more error-prone it is and the closer you are to the 30 minutes AutoKill limit for the query execution. Always minimise your query to only the data you truly need.

UNION

Apart from the JOIN, a UNION operator also serves a similar purpose. It concatenates the outcomes of multiple SELECT statements.

Group subscribers from two Data Extensions
SELECT
jan.SubscriberKey
, jan.EmailAddress
FROM JanuaryEventParticipants AS jan

UNION

SELECT
feb.SubscriberKey
, feb.EmailAddress
FROM FebruaryEventParticipantsSegment AS feb

As you can see in the above example, the UNION doesn't require any relationship between the two queries. There is no ON element. On the other hand, the UNION cannot add any new columns.

UNION works only if:

  1. Both queries have the same number of columns.
  2. Both queries have the same order of columns.
  3. Columns in both queries have the same data types.

The names do not have to be the same. You don't even have to add aliases for static values. However, it is best practice to match the aliases across all queries for readability purposes.

To simplify the difference between JOIN and UNION:

  • If you want to add columns - use JOIN.
  • If you want to add rows - use UNION.
  • If you want to add both - use both (or FULL JOIN)
You Should Know

By default, the UNION will ignore duplicates. If you want to keep them, use UNION ALL.

Group subscribers from two Data Extensions
SELECT
jan.SubscriberKey
, jan.EmailAddress
FROM JanuaryEventParticipantsSegment AS jan

UNION ALL

SELECT
feb.SubscriberKey
, feb.EmailAddress
FROM FebruaryEventParticipantsSegment AS feb

The obvious use case for UNION is gathering subscribers from multiple Data Extensions to build a master segment.

There is, however, another use case that I find perfect for UNION - attaching Seedlists.

If you are using a deliverability monitoring suite, you probably are using seedlist. If not - a seedlist is a list of bot email addresses used to estimate your communication inbox placement.

In most cases, you will store your seedlist in a separate Data Extension and will have only Email Address and fake Subscriber Key columns filled in. How to push them quickly into your segment? With UNION, of course:

Append Seedlist to your Segment
SELECT
wel.SubscriberKey
, wel.EmailAddress
, wel.FirstName
, wel.Interest
FROM WelcomeCampaignSegment AS wel

UNION

SELECT
sl.SubscriberKey
, sl.EmailAddress
, 'Seed' AS FirstName
, 'Estimating' AS Interest
FROM Seedlist AS sl

In the example above, we add the Seedlist to the master segment and provide static value placeholders for columns used in personalisation.

You Should Know

You can mix and match JOIN and UNION to create your perfect segment. And even add arbitrary data without a FROM.

AutoKill is the limit of your imagination
SELECT
wel.SubscriberKey
, wel.EmailAddress
, wel.FirstName
, i.Interest
FROM WelcomeCampaignSegment AS wel
LEFT JOIN InterestsDataPoint AS i
ON i.SubscriberKey = wel.SubscriberKey

UNION

SELECT
sl.SubscriberKey
, sl.EmailAddress
, 'Seed' AS FirstName
, 'Estimating' AS Interest
FROM Seedlist AS sl

UNION ALL

SELECT
'123456789987654321' AS SubscriberKey
, '[email protected]' AS EmailAddress
, 'SalesDepartment' AS FirstName
, 'Leads' AS Interest

INTERSECT

The INTERSECT operator is very similar in usage to UNION, but instead of concatenating outcomes, it outputs only those that exist in both queries.

Let's check who converted from our Black Friday promotion
SELECT bf.SubscriberKey
FROM BlackFridayPromoSegment AS bf

INTERSECT

SELECT p.SubscriberKey
FROM PurchasesDataPoint AS p

As with UNION, INTERSECT works only if:

  1. Both queries have the same number of columns.
  2. Both queries have the same order of columns.
  3. Columns in both queries have the same data types.

The check for rows existing in both queries is done considering data in all provided columns.

You can have the same outcome with more control over comparison logic with a mix of JOIN and WHERE.

EXCEPT

The EXCEPT operator is very similar in usage to UNION, but instead of concatenating outcomes, it outputs only those that exist in the first query.

Let's check who did not convert from our Black Friday promotion
SELECT bf.SubscriberKey
FROM BlackFridayPromoSegment AS bf

EXCEPT

SELECT p.SubscriberKey
FROM PurchasesDataPoint AS p

As with UNION, EXCEPT works only if:

  1. Both queries have the same number of columns.
  2. Both queries have the same order of columns.
  3. Columns in both queries have the same data types.

The check for rows existing in both queries is done considering data in all provided columns.

You can have the same outcome with more control over comparison logic with a mix of JOIN and WHERE.

SFMC SQL JOIN Cheat Sheet

All standard SQL JOINs in a single image with name, example SFMC-focused use case, key part of the SQL query and diagrams visualising the output. Ready to bookmark or download. Full size version linked below the preview.



View full size