This is a very stupid problem. I am not writing this post because it’s a fascinating topic—rather, I’m writing it in the hopes that you avoid the headaches that I’ve gone through scouring the internet for the best answer to this question:

Let’s say I have a Redshift table users. This table gets loaded via some process I don’t control. It contains a field, amount, that gets loaded as a varchar when it should really be an int. There are a very small number of records (< .01%) that are not valid integers, and so simple ::int fails. How do I convert it?

This situation happens constantly for analysts. We don’t control our source data: data is being loaded into Redshift by products like Stitch and Fivetran and by the data engineers we work with. We have to deal with the data that comes to us, not the data we wish we had.

I’ve spent several hours searching for an elegant solution. This seems like a problem that people would run into all the time, right? There must be an elegant way.

As it turns out, BigQuery and Snowflake both have a function called try_cast. This function does exactly what you would want: if it’s able to convert from one format to another, it does so. If not, it returns a null. Sweet. Unfortunately, Redshift doesn’t have a try_cast function. I’m going to have to hack one together myself.

This is very solvable with Redshift UDFs, but I want to impose the constraint that the solution not involve a UDF. UDFs are annoying to install and manage; I’d prefer a solution that was pure SQL.

Here’s my answer, in several steps:

1. Trim out spaces: trim(amount)

2. Convert non-integer fields to nulls via regex:

when amount ~ ‘^[0–9]+$’ then amount
else null

3. Cast the resulting value: amount::int

Here’s the whole thing put together:

    when trim(amount) ~ '^[0-9]+$' then trim(amount)
    else null 
end::int as amount

Wrapping it up in a macro

Did you really want to think through how to solve that problem? Are you a regex expert? For me, the answers are no, and no.

That’s why programming has functions: one programmer can figure out how to do something and wrap that logic up in a function so that other programmers can easily use the functionality without understanding the implementation. And if you’re a dbt user, you can use and build your own SQL functions too.

I was annoyed at the amount of time I invested in this problem, so I wrote a try_cast macro for the dbt Redshift package (see it here). Now you can just write {{try_cast('amount', 'int')}} and you’ll get back a safely-converted integer.

What annoying Redshift hacks do you use that should really be turned into macros? We’d love contributions :)

⚡️Ready to improve your analytics engineering workflow?  Get started with dbt today. ⚡️