SFMC SQL Select

Selected information on the SELECT statement in SFMC SQL

The SELECT statement opens each query. It picks the columns from the source data extensions and system data views and allows you to create entirely new data points.

The simplest version of this statement is SELECT *, which targets all available columns from the source. I do not recommend this approach, as it is terrible for your query's readability and performance. As Salesforce Marketing Cloud caches the source Data Extension column names when you save SQL Query Activity, you won't have added value of flexible selection. You will still have to edit and save the activity. Also, Query Studio does not support this shortcut. Don't use *.

The prefered method is targeting columns explicitly by their names:

Select ContactKey and EmailAddress columns
SELECT
      ContactKey
    , EmailAddres

If a column name contains a space, hyphen or is a SQL's reserved keyword, you need to capture it with squared brackets:

Let's add another column with whitespace in its name
SELECT
      ContactKey
    , EmailAddres
    , [External Contact Id]

There are also some exceptional cases related to selected System Data Views in Salesforce Marketing Cloud, where you need to use an underscore prefix before a name (for example, when taking Subscriber ID from Enterprise Attributes):

There are some special column names with an underscore prefix

SELECT _SubscriberId

Aliasing with AS

You can also utilise SELECT as a tool to rename the source columns to match your needs in an outcome data extension. It is especially useful when pulling the data from Synchronized Data Extension with names coming from Salesforce custom fields.

Improve readability and consistency with aliasing
SELECT
      ContactKey
    , EmailAddress
    , [External Contact Id] AS ExternalContactId
    , Gender__c             AS Gender

Custom values

Use aliases also for passing new values directly from your query. Need to add a custom string to outcome data extension? Or maybe synchronisation timestamp for debugging purposes? You can leverage SQL for it.

Add custom column aliased 'ImportDate' with a value equal to the UTC of the script execution for each record
SELECT
      ContactKey
    , EmailAddress
    , [External Contact Id] AS ExternalContactId
    , Gender__c             AS Gender
    , 'Lead'                AS RecordType
    , GETUTCDATE()          AS ImportDate

You Should Know

You can even go further by creating new values by leveraging existing data! For example, if you have a first name and last name, you can use SQL to create a FullName column from scratch:

Let's create a concatenated column for easier personalisation
SELECT
      SubscriberKey
    , EmailAddress
    , FirstName + ' ' + LastName AS FullName

It becomes mighty with the SQL String functions.

TOP

You can add a TOP clause to your SELECT statement to limit the number of rows returned by your query.

Returns 10 rows
SELECT TOP 10
      SubscriberKey
    , PreferredChannel
    , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact

TOP PERCENT

The TOP clause is not limited to an integer; you can also ask for a specific per cent of rows with PERCENT modifier:

SELECT TOP 10 PERCENT
      SubscriberKey
    , PreferredChannel
    , IIF(PreferredChannel = 'Email', EmailAddress, MobileNumber) AS PointOfContact

The number of rows will be rounded up to the nearest integer (so TOP 50 PERCENT from Data Extension with 101 rows will return 51 rows).

You Should Know

PERCENT modifier doesn't work in Query Studio and will produce an error there. However, it works perfectly in Automation Studio Query Activity.

TOP with ORDER BY

The TOP clause best practice is to always use it with ORDER BY. Only with sorted data, you will understand rows selection logic.

To sort outcomes of your query, add ORDER BY at the end - along with column(s) that used for sorting.

Top 10% with the lowest Quiz Score
SELECT TOP 10 PERCENT
      SubscriberKey
    , QuizScore
ORDER BY QuizScore

You can control the sort order by applying ASC (default) and DESC after the column.

Top 10% with highest Quiz Score
SELECT TOP 10 PERCENT
      SubscriberKey
    , QuizScore
ORDER BY QuizScore DESC

You Should Know

ORDER BY sorts Text fields alphabetically. It might surprise you if you store numbers in such field type, as it will order 10 before 2.

The best solution is to use the appropriate field type for numbers. However, if for whatever reason you cannot do it, leverage CONVERT:

Top 10% with highest Quiz Score if QuizScore is a Text field
SELECT TOP 10 PERCENT
      SubscriberKey
    , QuizScore
ORDER BY CONVERT(INT, QuizScore) DESC

For example, it will be needed when you ORDER BY COUNT within GROUP BY.

ORDER BY will also sort dates in Marketing Cloud alphabetically using month(start with Apr, follow up with Aug and so on). You can fix it by using CONVERT to DATE (if you to order only by date) or DATETIME2 (if time is important for ordering as well), just as in previous example.

ORDER BY can operate not only on the selected column. You already saw above that you can use expression (even CASE!) to calculate the sorting value, but it's much more flexible.

You can sort using a column not picked in SELECT:

Top 10% with highest Quiz Score, without returning the score
SELECT TOP 10 PERCENT
      SubscriberKey
ORDER BY QuizScore DESC

You can sort using an alias defined in the SELECT statement:

Top 10% with highest Quiz Score, using an alias
SELECT TOP 10 PERCENT
      SubscriberKey
    , QuizScore AS Points
ORDER BY Points DESC

And you can even sort using multiple values - each can have different sorting order.

SQL will sort using the first provided column and then, while maintaining the order of the previous column, will sort again using the next value. Let's jump to an example:

Top 10% with highest Quiz Score, first to solve
SELECT TOP 10 PERCENT
      SubscriberKey
    , QuizScore
ORDER BY QuizScore DESC, QuizFilledDate ASC

The query will first sort using QuizScore to find people with the most points. Next, it will apply the second sort on the date of quiz completion. As an outcome, we will have the highest scorers sorted from first to last within each high score.

TOP WITH TIES

Sometimes even such complex sorting might not be enough.

For example, let's capture three top scorers of our quiz:

Three top scorers
SELECT TOP 3
    SubscriberKey
    , QuizScore
ORDER BY QuizScore DESC

As an outcome, we got:

SubscriberKey	QuizScore
XYZ	97
ACE	97
PIL	96

However, if we check our Data Extension, we could see more people with the same score:

SubscriberKey	QuizScore
XYZ	97
ACE	97
PIL	96
UDO	96
SDJ	96

What can we do to capture everyone with a score good enough for the Top 3 status? WITH TIES will help:

Despite asking for TOP 3, it will return 5 rows due to ties
SELECT TOP 3 WITH TIES
    SubscriberKey
    , QuizScore
ORDER BY QuizScore DESC

With those two words added after the TOP clause, our query will return all rows that match the sorted value in the last row. It, however, means that the number of outcomes is not set in stone, so be sure to prepare your next steps for that uncertainty.

You Should Know

WITH TIES modifier doesn't work in Query Studio and will produce an error there. However, it works perfectly in Automation Studio Query Activity.

DISTINCT

The DISTINCT clause is the most straightforward deduplication tool.

When added to the SELECT statement, it will check all available values and leave unique ones.

Unique domains in your database sorted alphabetically
SELECT DISTINCT
      Domain
ORDER BY Domain

It's perfect for single-column deduplication. However, when you apply it to a select with multiple columns, it will leverage all of them for deduplication.

One row for each SubscriberKey - Domain pair, sorted alphabetically by domain
SELECT DISTINCT
      SubscriberKey
    , Domain
ORDER BY Domain

In the above case, you will get a separate row for each Subscriber Key - Domain pair. It means that you might receive multiple rows with the same Subscriber Key (but each with a different Domain). And you will undoubtedly see numerous rows with the same domain (but each with a different Subscriber key).

DISTINCT with TOP

You can use both DISTINCT and TOP in the same query.

SELECT DISTINCT TOP 10 PERCENT
      Domain
ORDER BY Domain

SQL will first deduplicate the results using DISTINCT and then limit output with TOP.

You Should Know

There is an important exception from stadard rules if you are using DISTINCT and ORDER BY in the same query. You have to add all columns used for sorting to the SELECT, which might have painful impact on your deduplication rules.

Table Prefixes

Additional elements that you will be using in many of your queries are table prefixes for selecting columns from multiple joined Data Extensions.

SELECT
      _Subscriber.SubscriberKey
    , _EnterpriseAttribute.LastName

I'm covering this topic in-depth in JOIN statement documentation.

Flow control

Despite official SFMC SQL documentation stating that it supports the IF constructs, I could not make it work.

Variables & Functions

Unfortunately, SFMC SQL does not support variables.

Fortunately, it does support (selected) functions. Learn more about them in SFMC SQL Functions documentation pages related to Strings, Dates, Numbers, Conversion, Aggregation, CASE and LIKE.

Aliasing with AS​

Custom values​

TOP​

TOP PERCENT​

TOP with ORDER BY​

TOP WITH TIES​

DISTINCT​

DISTINCT with TOP​

Table Prefixes​

Flow control​

Variables & Functions​

Aliasing with AS

Custom values

TOP

TOP PERCENT

TOP with ORDER BY

TOP WITH TIES

DISTINCT

DISTINCT with TOP

Table Prefixes

Flow control

Variables & Functions