Skip to main content

SFMC SQL Select

Selected information on the SELECT statement in SFMC SQL

The SELECT statement opens each query. It picks the columns from the source data extensions and system data views and allows you to create entirely new data points.

The simplest version of this statement is SELECT *, which targets all available columns from the source. I do not recommend this approach, as it is terrible for your query's readability and performance. As Salesforce Marketing Cloud caches the source Data Extension column names when you save SQL Query Activity, you won't have added value of flexible selection. You will still have to edit and save the activity. Also, Query Studio does not support this shortcut. Don't use *.

The prefered method is targeting columns explicitly by their names:

Select ContactKey and EmailAddress columns
SELECT      ContactKey    , EmailAddres

If a column name contains a space, hyphen or is a SQL's reserved keyword, you need to capture it with squared brackets:

Let's add another column with whitespace in its name
SELECT      ContactKey    , EmailAddres    , [External Contact Id]

There are also some exceptional cases related to selected System Data Views in Salesforce Marketing Cloud, where you need to use an underscore prefix before a name (for example, when taking Subscriber ID from Enterprise Attributes):

There are some special column names with an underscore prefix
SELECT _SubscriberId

Aliasing with AS#

You can also utilise SELECT as a tool to rename the source columns to match your needs in an outcome data extension. It is especially useful when pulling the data from Synchronized Data Extension with names coming from Salesforce custom fields.

Improve readability and consistency with aliasing
SELECT      ContactKey    , EmailAddress    , [External Contact Id] AS ExternalContactId    , Gender__c             AS Gender

Custom values#

Use aliases also for passing new values directly from your query. Need to add a custom string to outcome data extension? Or maybe synchronisation timestamp for debugging purposes? You can leverage SQL for it.

Add custom column aliased 'ImportDate' with a value equal to the UTC of the script execution for each record
SELECT      ContactKey    , EmailAddress    , [External Contact Id] AS ExternalContactId    , Gender__c             AS Gender    , 'Lead'                AS RecordType    , GETUTCDATE()          AS ImportDate
You Should Know

You can even go further by creating new values by leveraging existing data! For example, if you have a first name and last name, you can use SQL to create a FullName column from scratch:

Let's create a concatenated column for easier personalisation
SELECT      SubscriberKey    , EmailAddress    , FirstName + ' ' + LastName AS FullName

It becomes mighty with the SQL String functions.

Conditional values with CASE#

CASE statement evaluates a list of conditions and returns one of the multiple possible result expressions. For Marketing Cloud purposes, it is especially useful when you want to select a column and modify the values within it. Perfect for translating numeric codes to descriptive ones for personalisation purposes. Excellent for standardising data (change those 1, true and on values of opt-in to a single format).

SELECT      ContactKey    , EmailAddress    , [External Contact Id] AS ExternalContactId    , Gender__c             AS Gender    , 'Lead'                AS RecordType    , GETUTCDATE()          AS ImportDate    , CASE MemberType__c        WHEN 1 THEN 'Bronze'        WHEN 2 THEN 'Silver'        WHEN 3 THEN 'Gold'        WHEN 4 THEN 'Platinum'        ELSE 'Unregistered'      END

You can have as many WHEN/THEN conditions as you want. Also, the ELSE statement is optional but a good idea to catch unexpected values.

Feel free also to leverage aliases to manipulate the column name at the same time:

SELECT      ContactKey    , EmailAddress    , [External Contact Id] AS ExternalContactId    , Gender__c             AS Gender    , 'Lead'                AS RecordType    , GETUTCDATE()          AS ImportDate    , CASE MemberType__c        WHEN 1 THEN 'Bronze'        WHEN 2 THEN 'Silver'        WHEN 3 THEN 'Gold'        WHEN 4 THEN 'Platinum'        ELSE 'Unregistered'      END                   AS MemberType

Search version of CASE#

The above examples are already helpful in segmentation, but there is also one more way to write and use a CASE statement that gives even more power. The search CASE.

The difference is that you are not passing the column name on which you will be building conditions. Instead, you create a boolean expression from scratch for each scenario.

The same query as the previous one, but written with Search Case
SELECT      ContactKey    , EmailAddress    , [External Contact Id] AS ExternalContactId    , Gender__c             AS Gender    , 'Lead'                AS RecordType    , GETUTCDATE()          AS ImportDate    , CASE        WHEN MemberType__c = 1 THEN 'Bronze'        WHEN MemberType__c = 2 THEN 'Silver'        WHEN MemberType__c = 3 THEN 'Gold'        WHEN MemberType__c = 4 THEN 'Platinum'        ELSE 'Unregistered'      END                   AS MemberType

The example above doesn't show why it is worth using the search CASE. It is a bit longer than the standard approach! But there is a reason to use this form of CASE statement that is visible once you want to create more elaborate conditions that are possible only with this version:

Let's calculate what channel we should use in our campaign for a given user
SELECT      ContactKey    , CASE        WHEN IsAppInstalled = 1 THEN 'Push'        WHEN EmailOptIn = 1 AND EmailAddress IS NOT NULL THEN 'Email'        WHEN PhoneOptIn = 1 AND Mobile IS NOT NULL THEN 'Phone'      END AS PreferredChannel
You Should Know

CASE statement returns THEN value of the first condition that evaluated to TRUE. In the above example, if the IsAppInstalled = 1 is true, other conditions are not considered. Even if for selected contact all of them will be true, it will still have the value Push in PreferredChannel. It is perfect, as it allows us to prioritise cheaper channels.

Always consider the order of the conditions in your CASE statement. It both might lead to errors if you don't expect the behaviour mentioned above and make your life much easier if you know it works that way.

As you can see, thanks to search CASE statements, we can check conditions on multiple different columns and connect them with AND/OR statements for better control. You can use this version of the CASE statement for all your needs (even those covered by the basic one) to not worry about two ways of formatting.

You Should Know

You can use CASE statements inside a CASE statement to have an even more dynamic query.

Let's add another dimension of membership level to differentiate available channels
SELECT      ContactKey    , CASE        WHEN MemberType = 'Platinum' THEN 'Key Account Manager'        WHEN MemberType = 'Gold' OR MemberType = 'Silver' THEN            CASE                WHEN IsAppInstalled = 1 THEN 'Push'                WHEN EmailOptIn = 1 AND EmailAddress IS NOT NULL THEN 'Email'                WHEN PhoneOptIn = 1 AND Mobile IS NOT NULL THEN 'Phone'            END        WHEN MemberType = 'Bronze' THEN            CASE                WHEN IsAppInstalled = 1 THEN 'Push'                WHEN EmailOptIn = 1 AND EmailAddress IS NOT NULL THEN 'Email'            END      END AS PreferredChannel

LIKE in CASE#

LIKE is not only valuable for the WHERE Statement. It is also an incredible tool empowering the CASE statement. It allows you to build flexible conditions on free text values:

Great way to split your database between Internal and External users for different communication
SELECT      ContactKey    , CASE        WHEN EmailAddress LIKE '%@mateuszdabrowski.%' THEN 'Internal'        ELSE 'External'      END AS ContactType

In the SELECT statement, the LIKE operator is handy for fixing data coming from the free text fields. To learn more about the possibilities, check out the LIKE guide.

IIF Shorthand#

The CASE statement is a fantastic tool but can be overkill for simple conditions. The IIF comes to the rescue.

It takes three arguments:

  1. Condition
  2. The value returned if the condition evaluates to true
  3. The value returned if the condition evaluates to false

With it, instead of multiline CASE:

SELECT      _Subscriber.SubscriberKey    , CASE        WHEN _Subscriber.Status = 'active' THEN 'TRUE'        ELSE 'FALSE'      END AS Sendable

You can have the same outcome in a single, more readable IIF line:

SELECT      _Subscriber,SubscriberKey,    , IIF(_Subscriber.Status = 'active', 'TRUE', 'FALSE') AS Sendable

Whenever you have a single condition - go with IIF.

You Should Know

You can take both CASE and IIF to the next level with dynamic values. Don't limit yourself to a string, integer or boolean - pass a column name.

Use case? Think of a Master Data Extension that stores a massive amount of contact information, including preferred channel (Email or Mobile). With dynamic values and IIF in just a few lines, we can create an optimised Data Extension with crucial information:

SELECT      SubscriberKey    , PreferredChannel    , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact

Now, even with just three columns in your Entry Point, you can create a Journey that adapts the channel to customer preferences.

TOP#

You can add a TOP clause to your SELECT statement to limit the number of rows returned by your query.

Returns 10 rows
SELECT TOP 10      SubscriberKey    , PreferredChannel    , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact

TOP PERCENT#

The TOP clause is not limited to an integer; you can also ask for a specific per cent of rows with PERCENT modifier:

SELECT TOP 10 PERCENT      SubscriberKey    , PreferredChannel    , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact

The number of rows will be rounded up to the nearest integer (so TOP 50 PERCENT from Data Extension with 101 rows will return 51 rows).

You Should Know

PERCENT modifier doesn't work in Query Studio and will produce an error there. However, it works perfectly in Automation Studio Query Activity.

TOP with ORDER BY#

The TOP clause best practice is to always use it with ORDER BY. Only with sorted data, you will understand rows selection logic.

To sort outcomes of your query, add ORDER BY at the end - along with column(s) that used for sorting.

Top 10% with the lowest Quiz Score
SELECT TOP 10 PERCENT      SubscriberKey    , QuizScoreORDER BY QuizScore

You can control the sort order by applying ASC (default) and DESC after the column.

Top 10% with highest Quiz Score
SELECT TOP 10 PERCENT      SubscriberKey    , QuizScoreORDER BY QuizScore DESC
You Should Know

ORDER BY sorts Text fields alphabetically. It might surprise you if you store numbers in such field type, as it will order 10 before 2.

The best solution is to use the appropriate field type for numbers. However, if for whatever reason you cannot do it, leverage CONVERT:

Top 10% with highest Quiz Score if QuizScore is a Text field
SELECT TOP 10 PERCENT      SubscriberKey    , QuizScoreORDER BY CONVERT(INT, QuizScore) DESC

ORDER BY can operate not only on the selected column. You already saw above that you can use expression (even CASE!) to calculate the sorting value, but it's much more flexible.

You can sort using a column not picked in SELECT:

Top 10% with highest Quiz Score, without returning the score
SELECT TOP 10 PERCENT      SubscriberKeyORDER BY QuizScore DESC

You can sort using an alias defined in the SELECT statement:

Top 10% with highest Quiz Score, using an alias
SELECT TOP 10 PERCENT      SubscriberKey    , QuizScore AS PointsORDER BY Points DESC

And you can even sort using multiple values - each can have different sorting order.

SQL will sort using the first provided column and then, while maintaining the order of the previous column, will sort again using the next value. Let's jump to an example:

Top 10% with highest Quiz Score, first to solve
SELECT TOP 10 PERCENT      SubscriberKey    , QuizScoreORDER BY QuizScore DESC, QuizFilledDate ASC

The query will first sort using QuizScore to find people with the most points. Next, it will apply the second sort on the date of quiz completion. As an outcome, we will have the highest scorers sorted from first to last within each high score.

TOP WITH TIES#

Sometimes even such complex sorting might not be enough.

For example, let's capture three top scorers of our quiz:

Three top scorers
SELECT TOP 3    SubscriberKey    , QuizScoreORDER BY QuizScore DESC

As an outcome, we got:

SubscriberKeyQuizScore
XYZ97
ACE97
PIL96

However, if we check our Data Extension, we could see more people with the same score:

SubscriberKeyQuizScore
XYZ97
ACE97
PIL96
UDO96
SDJ96

What can we do to capture everyone with a score good enough for the Top 3 status? WITH TIES will help:

Despite asking for TOP 3, it will return 5 rows due to ties
SELECT TOP 3 WITH TIES    SubscriberKey    , QuizScoreORDER BY QuizScore DESC

With those two words added after the TOP clause, our query will return all rows that match the sorted value in the last row. It, however, means that the number of outcomes is not set in stone, so be sure to prepare your next steps for that uncertainty.

You Should Know

WITH TIES modifier doesn't work in Query Studio and will produce an error there. However, it works perfectly in Automation Studio Query Activity.

DISTINCT#

The DISTINCT clause is the most straightforward deduplication tool.

When added to the SELECT statement, it will check all available values and leave unique ones.

Unique domains in your database sorted alphabetically
SELECT DISTINCT      DomainORDER BY Domain

It's perfect for single-column deduplication. However, when you apply it to a select with multiple columns, it will leverage all of them for deduplication.

One row for each SubscriberKey - Domain pair, sorted alphabetically by domain
SELECT DISTINCT      SubscriberKey    , DomainORDER BY Domain

In the above case, you will get a separate row for each Subscriber Key - Domain pair. It means that you might receive multiple rows with the same Subscriber Key (but each with a different Domain). And you will undoubtedly see numerous rows with the same domain (but each with a different Subscriber key).

DISTINCT with TOP#

You can use both DISTINCT and TOP in the same query.

SELECT DISTINCT TOP 10 PERCENT      DomainORDER BY Domain

SQL will first deduplicate the results using DISTINCT and then limit output with TOP.

You Should Know

There is an important exception from stadard rules if you are using DISTINCT and ORDER BY in the same query. You have to add all columns used for sorting to the SELECT, which might have painful impact on your deduplication rules.

Table Prefixes#

Additional elements that you will be using in many of your queries are table prefixes for selecting columns from multiple joined Data Extensions.

SELECT      _Subscriber.SubscriberKey    , _EnterpriseAttribute.LastName

I'm covering this topic in-depth in JOIN statement documentation.

Flow control#

Despite official SFMC SQL documentation stating that it supports the IF constructs, I could not make it work.

Variables & Functions#

Unfortunately, SFMC SQL does not support variables.

Fortunately, it does support (selected) functions. Learn more about them in SFMC SQL Functions documentation pages related to Strings, Dates, Numbers, Conversion, Aggregation.