Selected information on the SELECT statement in SFMC SQL
SELECT statement opens each query. It picks the columns from the source data extensions and system data views and allows you to create entirely new data points.
The simplest version of this statement is
SELECT *, which targets all available columns from the source. I do not recommend this approach, as it is terrible for your query's readability and performance. As Salesforce Marketing Cloud caches the source Data Extension column names when you save SQL Query Activity, you won't have added value of flexible selection. You will still have to edit and save the activity. Also, Query Studio does not support this shortcut. Don't use
The prefered method is targeting columns explicitly by their names:
SELECT ContactKey , EmailAddres
If a column name contains a space, hyphen or is a SQL's reserved keyword, you need to capture it with squared brackets:
SELECT ContactKey , EmailAddres , [External Contact Id]
There are also some exceptional cases related to selected System Data Views in Salesforce Marketing Cloud, where you need to use an underscore prefix before a name (for example, when taking Subscriber ID from Enterprise Attributes):
You can also utilise
SELECT as a tool to rename the source columns to match your needs in an outcome data extension. It is especially useful when pulling the data from Synchronized Data Extension with names coming from Salesforce custom fields.
SELECT ContactKey , EmailAddress , [External Contact Id] AS ExternalContactId , Gender__c AS Gender
Use aliases also for passing new values directly from your query. Need to add a custom string to outcome data extension? Or maybe synchronisation timestamp for debugging purposes? You can leverage SQL for it.
SELECT ContactKey , EmailAddress , [External Contact Id] AS ExternalContactId , Gender__c AS Gender , 'Lead' AS RecordType , GETUTCDATE() AS ImportDate
You Should Know
You can even go further by creating new values by leveraging existing data! For example, if you have a first name and last name, you can use SQL to create a FullName column from scratch:
SELECT SubscriberKey , EmailAddress , FirstName + ' ' + LastName AS FullName
It becomes mighty with the SQL String functions.
CASE statement evaluates a list of conditions and returns one of the multiple possible result expressions. For Marketing Cloud purposes, it is especially useful when you want to select a column and modify the values within it. Perfect for translating numeric codes to descriptive ones for personalisation purposes. Excellent for standardising data (change those
on values of opt-in to a single format).
SELECT ContactKey , EmailAddress , [External Contact Id] AS ExternalContactId , Gender__c AS Gender , 'Lead' AS RecordType , GETUTCDATE() AS ImportDate , CASE MemberType__c WHEN 1 THEN 'Bronze' WHEN 2 THEN 'Silver' WHEN 3 THEN 'Gold' WHEN 4 THEN 'Platinum' ELSE 'Unregistered' END
You can have as many
THEN conditions as you want. Also, the
ELSE statement is optional but a good idea to catch unexpected values.
Feel free also to leverage aliases to manipulate the column name at the same time:
SELECT ContactKey , EmailAddress , [External Contact Id] AS ExternalContactId , Gender__c AS Gender , 'Lead' AS RecordType , GETUTCDATE() AS ImportDate , CASE MemberType__c WHEN 1 THEN 'Bronze' WHEN 2 THEN 'Silver' WHEN 3 THEN 'Gold' WHEN 4 THEN 'Platinum' ELSE 'Unregistered' END AS MemberType
The above examples are already helpful in segmentation, but there is also one more way to write and use a
CASE statement that gives even more power. The search
The difference is that you are not passing the column name on which you will be building conditions. Instead, you create a boolean expression from scratch for each scenario.
SELECT ContactKey , EmailAddress , [External Contact Id] AS ExternalContactId , Gender__c AS Gender , 'Lead' AS RecordType , GETUTCDATE() AS ImportDate , CASE WHEN MemberType__c = 1 THEN 'Bronze' WHEN MemberType__c = 2 THEN 'Silver' WHEN MemberType__c = 3 THEN 'Gold' WHEN MemberType__c = 4 THEN 'Platinum' ELSE 'Unregistered' END AS MemberType
The example above doesn't show why it is worth using the search
CASE. It is a bit longer than the standard approach! But there is a reason to use this form of
CASE statement that is visible once you want to create more elaborate conditions that are possible only with this version:
SELECT ContactKey , CASE WHEN IsAppInstalled = 1 THEN 'Push' WHEN EmailOptIn = 1 AND EmailAddress IS NOT NULL THEN 'Email' WHEN PhoneOptIn = 1 AND Mobile IS NOT NULL THEN 'Phone' END AS PreferredChannel
You Should Know
CASE statement returns
THEN value of the first condition that evaluated to
TRUE. In the above example, if the
IsAppInstalled = 1 is true, other conditions are not considered. Even if for selected contact all of them will be true, it will still have the value
Push in PreferredChannel. It is perfect, as it allows us to prioritise cheaper channels.
Always consider the order of the conditions in your
CASE statement. It both might lead to errors if you don't expect the behaviour mentioned above and make your life much easier if you know it works that way.
As you can see, thanks to search
CASE statements, we can check conditions on multiple different columns and connect them with
OR statements for better control. You can use this version of the
CASE statement for all your needs (even those covered by the basic one) to not worry about two ways of formatting.
You Should Know
You can use
CASE statements inside a
CASE statement to have an even more dynamic query.
SELECT ContactKey , CASE WHEN MemberType = 'Platinum' THEN 'Key Account Manager' WHEN MemberType = 'Gold' OR MemberType = 'Silver' THEN CASE WHEN IsAppInstalled = 1 THEN 'Push' WHEN EmailOptIn = 1 AND EmailAddress IS NOT NULL THEN 'Email' WHEN PhoneOptIn = 1 AND Mobile IS NOT NULL THEN 'Phone' END WHEN MemberType = 'Bronze' THEN CASE WHEN IsAppInstalled = 1 THEN 'Push' WHEN EmailOptIn = 1 AND EmailAddress IS NOT NULL THEN 'Email' END END AS PreferredChannel
LIKE is not only valuable for the
WHERE Statement. It is also an incredible tool empowering the
CASE statement. It allows you to build flexible conditions on free text values:
SELECT ContactKey , CASE WHEN EmailAddress LIKE '%@mateuszdabrowski.%' THEN 'Internal' ELSE 'External' END AS ContactType
SELECT statement, the
LIKE operator is handy for fixing data coming from the free text fields. To learn more about the possibilities, check out the
CASE statement is a fantastic tool but can be overkill for simple conditions. The
IIF comes to the rescue.
It takes three arguments:
- The value returned if the condition evaluates to true
- The value returned if the condition evaluates to false
With it, instead of multiline
SELECT _Subscriber.SubscriberKey , CASE WHEN _Subscriber.Status = 'active' THEN 'TRUE' ELSE 'FALSE' END AS Sendable
You can have the same outcome in a single, more readable
SELECT _Subscriber,SubscriberKey, , IIF(_Subscriber.Status = 'active', 'TRUE', 'FALSE') AS Sendable
Whenever you have a single condition - go with
You Should Know
You can take both
IIF to the next level with dynamic values. Don't limit yourself to a string, integer or boolean - pass a column name.
Use case? Think of a Master Data Extension that stores a massive amount of contact information, including preferred channel (Email or Mobile). With dynamic values and
IIF in just a few lines, we can create an optimised Data Extension with crucial information:
SELECT SubscriberKey , PreferredChannel , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact
Now, even with just three columns in your Entry Point, you can create a Journey that adapts the channel to customer preferences.
You can add a
TOP clause to your
SELECT statement to limit the number of rows returned by your query.
SELECT TOP 10 SubscriberKey , PreferredChannel , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact
TOP clause is not limited to an integer; you can also ask for a specific per cent of rows with
SELECT TOP 10 PERCENT SubscriberKey , PreferredChannel , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact
The number of rows will be rounded up to the nearest integer (so
TOP 50 PERCENT from Data Extension with 101 rows will return 51 rows).
You Should Know
PERCENT modifier doesn't work in Query Studio and will produce an error there. However, it works perfectly in Automation Studio Query Activity.
TOP clause best practice is to always use it with
ORDER BY. Only with sorted data, you will understand rows selection logic.
To sort outcomes of your query, add
ORDER BY at the end - along with column(s) that used for sorting.
SELECT TOP 10 PERCENT SubscriberKey , QuizScoreORDER BY QuizScore
You can control the sort order by applying
ASC (default) and
DESC after the column.
SELECT TOP 10 PERCENT SubscriberKey , QuizScoreORDER BY QuizScore DESC
You Should Know
ORDER BY sorts Text fields alphabetically. It might surprise you if you store numbers in such field type, as it will order
The best solution is to use the appropriate field type for numbers. However, if for whatever reason you cannot do it, leverage
SELECT TOP 10 PERCENT SubscriberKey , QuizScoreORDER BY CONVERT(INT, QuizScore) DESC
ORDER BY can operate not only on the selected column. You already saw above that you can use expression (even
CASE!) to calculate the sorting value, but it's much more flexible.
You can sort using a column not picked in
SELECT TOP 10 PERCENT SubscriberKeyORDER BY QuizScore DESC
You can sort using an alias defined in the
SELECT TOP 10 PERCENT SubscriberKey , QuizScore AS PointsORDER BY Points DESC
And you can even sort using multiple values - each can have different sorting order.
SQL will sort using the first provided column and then, while maintaining the order of the previous column, will sort again using the next value. Let's jump to an example:
SELECT TOP 10 PERCENT SubscriberKey , QuizScoreORDER BY QuizScore DESC, QuizFilledDate ASC
The query will first sort using QuizScore to find people with the most points. Next, it will apply the second sort on the date of quiz completion. As an outcome, we will have the highest scorers sorted from first to last within each high score.
Sometimes even such complex sorting might not be enough.
For example, let's capture three top scorers of our quiz:
SELECT TOP 3 SubscriberKey , QuizScoreORDER BY QuizScore DESC
As an outcome, we got:
However, if we check our Data Extension, we could see more people with the same score:
What can we do to capture everyone with a score good enough for the Top 3 status?
WITH TIES will help:
SELECT TOP 3 WITH TIES SubscriberKey , QuizScoreORDER BY QuizScore DESC
With those two words added after the
TOP clause, our query will return all rows that match the sorted value in the last row. It, however, means that the number of outcomes is not set in stone, so be sure to prepare your next steps for that uncertainty.
You Should Know
WITH TIES modifier doesn't work in Query Studio and will produce an error there. However, it works perfectly in Automation Studio Query Activity.
DISTINCT clause is the most straightforward deduplication tool.
When added to the
SELECT statement, it will check all available values and leave unique ones.
SELECT DISTINCT DomainORDER BY Domain
It's perfect for single-column deduplication. However, when you apply it to a select with multiple columns, it will leverage all of them for deduplication.
SELECT DISTINCT SubscriberKey , DomainORDER BY Domain
In the above case, you will get a separate row for each Subscriber Key - Domain pair. It means that you might receive multiple rows with the same Subscriber Key (but each with a different Domain). And you will undoubtedly see numerous rows with the same domain (but each with a different Subscriber key).
You can use both
TOP in the same query.
SELECT DISTINCT TOP 10 PERCENT DomainORDER BY Domain
SQL will first deduplicate the results using
DISTINCT and then limit output with
You Should Know
There is an important exception from stadard rules if you are using
ORDER BY in the same query. You have to add all columns used for sorting to the
SELECT, which might have painful impact on your deduplication rules.
Additional elements that you will be using in many of your queries are table prefixes for selecting columns from multiple joined Data Extensions.
SELECT _Subscriber.SubscriberKey , _EnterpriseAttribute.LastName
I'm covering this topic in-depth in
JOIN statement documentation.
Despite official SFMC SQL documentation stating that it supports the
IF constructs, I could not make it work.
Unfortunately, SFMC SQL does not support variables.