Automating my job by using GPT-3 to generate database-ready SQL to answer business questions

As an analyst, I spend a lot of my time writing SQL (or other code) to answer questions about our business. These questions can range from simple customer support queries ("Does user X have the right plan?") to evaluating KPIs and growth metrics ("How many users signed up in the last month and what percent of those converted to paid?") to the more open-ended ("How much revenue will we have in 12 months?").

To make my job easier, I try to automate as many of these questions as I can. My company, SeekWell, builds awesome tools to help with this, like a unified team SQL repository and seamless scheduling of queries, reports, and alerts.

Many things, though, like actually writing SQL code, are difficult to automate—or at least have been.

Enter GPT-3

Openai's GPT-3 is starting to break the conventional wisdom of which tasks can and can't be automated. At the most basic level, GPT-3 is a text-completion engine, trained on huge swaths of the internet. It takes inputted text and returns the text that it thinks would appear next.

Many have already used it to generate HTML and CSS code from specific design instructions. Others have made #1 trending blog posts generated mostly by GPT-3 with some creative prompts.

In my case, since so much of my job is writing SQL, I want to be able to describe a question in plain English and have GPT-3 convert it into the SQL code that, if executed on my Postgres database, would answer the question.

To accomplish this, I found more success using GPT-3 Instruct (still in beta) than the more generalized GPT-3 engines. GPT-3 Instruct (in particular, davinci-instruct) lets you give specific instructions, like "Only respond in correct SQL syntax", that guides GPT-3's responses.

If you're interested in trying it out, you can sign up for the waitlist for the GPT-3 API here.

Below, I'll detail my experience with the API. In each example, I'll include a gif of asking GPT-3 a question. Below the gif will be the input question (generated by me, in green) and GPT-3's response translating it into SQL (generated by GPT-3, in blue). I'll also include my instructions to GPT-3 (in yellow) and examples I fed GPT-3 (in orange).

Here's the first instruction I gave it, a question, and the response from GPT-3:

Instruction: Given an input question, respond with syntactically correct PostgreSQL. Be creative but the SQL must be correct.

Input: how many users signed up in the past month?

GPT-3 Response: SELECT COUNT(*) FROM users WHERE signup_time > now() - interval '1 month'

Woah. I never gave it my database schema but it assumes I have a table called "users" (which is accurate) and that there's a timestamp field called "signup_time" for when a user signed up. Pretty logical.

It also knew the correct way to subtract a month from today's date in PostgreSQL (now() - interval '1 month') without seeing an example. Pretty cool!

To give GPT-3 a little more context, I'll add information about my database schema to the instruction.

My new Instruction:

Instruction: Given an input question, respond with syntactically correct PostgreSQL. Be creative but the SQL must be correct. Only use tables called "users" and "charges". The "users" table has columns: id (integer), signup_dt (timestamp), email (character varying), and plan_type (character varying). The "charges" table has columns: amount (bigint), user_id (integer), and charge_dt (timestamp).

Now I'll try asking another question, and see if GPT-3 can respond based on my actual database:

Input: how much revenue did we have in the past 7 days?

GPT-3 Response: SELECT SUM(amount) FROM charges WHERE charge_dt > now() - interval '7 days'

Nice. I gave it a little more information on the tables and columns in my database, and it correctly inferred that "amount" = "revenue" and knew to SUM() that amount for the past 7 days of charges to answer the question. Clearly, it already knows some basic SQL syntax, and can make inferences about what I mean.

Priming GPT-3 with examples

I'll try another question but first, it might make sense to explicitly tell GPT-3 of the columns in my tables via a simple example. Examples are a great way to give GPT-3 context and flesh out subtleties that might not be obvious in just an instruction. I found this helpful for really hammering into it which columns and tables are eligible to use (GPT-3 sometimes likes to "invent" columns to make answering the question easier).

Here are the examples I added to the prompt:

Input: select all available columns from users table. Output: SELECT id, signup_dt, email, plan_type FROM users

Input: select all available columns from charges table. Output: SELECT amount, user_id, and charge_dt FROM charges

Now with those examples fed to GPT-3, I'll ask another question:

Input: what was the largest payment amount made in the past 30 days?

GPT-3 Response: SELECT MAX(amount) FROM charges WHERE charge_dt > NOW() - INTERVAL '30 days'

It knew to use MAX(amount) to get the largest payment and set the date condition correctly without seeing an example on how to subtract a number of days from the current date.

It's doing pretty well, but let's try something a little trickier now, using the same instruction and examples:

Input: when did user with email [email protected] make his last payment?

GPT-3 Response: SELECT MAX(charge_dt) FROM charges WHERE user_id = 1 AND email = '[email protected]'

Okay, not quite right.

The "charges" table doesn't have a column for email, so this query would fail. I also never said the user with email = "[email protected]" has user_id = 1—don't know where it got that idea from. To answer the question correctly, you would have to join the "charges" table with the "users" table since only "users" has an email column.

Since it's a big leap to know to join those tables on users.id = charges.user_id (even if that seems perfectly logical to a human analyst), I'll give GPT-3 an example of a "join" so it has something to go off of:

Input: when did jane@seekwell.io make her first payment? Output: SELECT MIN(charge_dt) as last_payment_dt from charges LEFT JOIN users ON users.id = charges.user_id WHERE users.email = 'jane@seekwell.io'

Now I'll try again:

Input: when did user with email [email protected] make his last payment?

GPT-3 Response: SELECT MAX(charge_dt) as last_payment_dt from charges LEFT JOIN users ON users.id = charges.user_id WHERE users.email = '[email protected]'

So maybe this made it a little too easy based on the example, but it correctly changed MIN() → MAX() and replaced the email with the one I asked for—so it wasn't a total gimme.

Let's try another level of difficulty, but first, I'll add in a few more examples to provide more context and improve the responses. New examples:

Input: how many users signed up in the past 30 days? Output: SELECT COUNT(*) FROM users WHERE signup_dt >= now() - interval '30 days'

Input: when did user with email [email protected] sign up? Output: SELECT signup_dt FROM users WHERE email = '[email protected]'

Input: how much revenue did we have in the past 7 days? Output: SELECT SUM(amount) from charges WHERE charge_dt >= now() - interval '7 days'

Input: how much revenue did we have from 10-01-20 through 11-15-20? Output: SELECT SUM(case when charge_dt>= '10-01-20'::date and charge_dt < '11-15-20'::date then amount else 0 end) as revenue FROM charges

Input: how much revenue have we had from users that signed up in the last 6 months? Output: SELECT SUM(charges.amount) FROM users INNER JOIN charges ON users.id = charges.user_id WHERE users.signup_dt>= now() - interval '6 months'

Input: when did user with email [email protected] make his first payment? Output: SELECT MIN(charge_dt) as last_payment_dt from users INNER JOIN charges ON users.id = charges.user_id WHERE users.email = '[email protected]'

Input: how many new users signed up in each of the last 2 months? Output: SELECT sum(case when signup_dt>= now() - interval '1 month' then 1 else 0 end) as signups_this_month, sum(case when signup_dt>= now() - interval '2 months' and signup_dt < now() - interval '1 month' then 1 else 0 end) as signups_last_month FROM users

These examples give it some basic queries so it's more familiar with my database and the specific terms and concepts of my use case (like that a "paid user" means their plan_type="pro"). I also gave it a taste of some more advanced concepts like aggregating a metric (revenue) between 2 dates.

Now I'll try something harder:

Input: how many payments has user with email [email protected] made?

GPT-3 Response: SELECT COUNT(*) FROM users INNER JOIN charges ON users.id = charges.user_id WHERE users.email = '[email protected]'

Cool! GPT-3 intuited that COUNT() was the appropriate expression here, knew to join the "users" with the "charges" table, and correctly set the email to the one that I asked about.

Another question for GPT-3:

Input: how much revenue have we received from users with an email ending in 'seekwell.io' in the last 3 months?

GPT-3 Response: SELECT SUM(charges.amount) FROM users INNER JOIN charges ON users.id = charges.user_id WHERE users.email LIKE '%seekwell.io%' AND users.signup_dt>= now() - interval '3 months'

Perfect! It knew that "email ending in seekwell.io" translated in SQL to "users.email LIKE '%seekwell.io%' " (even though it hadn't seen that before) and took the sum of all charges for users matching that condition. And another:

Input: how much revenue did we have between June 1st and August 1st 2020?

GPT-3 Response: SELECT SUM(case when charge_dt >= '06-01-2020'::date and charge_dt < '08-01-2020'::date then amount else 0 end) as revenue FROM charges

This one was a little easier since I already taught it how to get revenue from 10-01-20 through 11-15-20, but it did know to convert June 1st and August 1st to their appropriate date formats in SQL ('06-01-2020' and '08-01-2020' respectively). If it had just replaced the dates from my example with "June 1st" and "August 1st", then I'd know it wasn't getting the concept of dates but just pattern matching to the closest example.

With this result, I can tell GPT-3 is at least somewhat getting the concept of dates.

Taking GPT-3 to the next level

Thus far, I've been asking pretty basic questions about users, revenue, and growth metrics, which GPT-3 has been handling well. So let's make it a little more challenging.

I want to know what percent of users converted to paid. GPT-3, your go:

Input: what percent of users who signed up converted to paid?

GPT-3 Response: SELECT COUNT() as total_users, COUNT() as total_paid_users, 100.0*(COUNT() as total_paid_users) / COUNT() as total_users AS percent_converted FROM users

Okay, not quite what I was going for, and definitely not correct SQL. I don't think it really understands the concept of "percent"—just that it involves COUNT() and multiplying by 100.

I'll add an example that implements that concept and see if GPT-3 can catch on. My new example using "percent":

Input: what percent of users have an email ending in 'gmail.com'? Output: SELECT SUM(CASE WHEN email like '%gmail.com' then 1 else 0 end)/COUNT(*) as percent_gmail FROM users

Now when I try the same question I get:

Input: what percent of users who signed up converted to paid?

GPT-3 Response: SELECT SUM(CASE WHEN plan_type = 'pro' THEN 1 else 0 end)/COUNT(*) as percent_paid FROM users

Much cleaner. It knew to divide the number of users with plan_type = 'pro' by the total number of users to get the percent I asked for. I can also ask the same thing for a subset of users (e.g. users that signed up in the last 6 months) and get an accurate response:

Input: what percent of users that signed up in the past 6 months converted to paid?

GPT-3 Response: SELECT SUM(CASE WHEN plan_type='pro' THEN 1 ELSE 0 END)/COUNT(*) AS percent_paid FROM users WHERE users.signup_dt>= now() - interval '6 months'

Tuning GPT-3's parameters

Besides giving GPT-3 an instruction and some examples, there are a few parameters that can be tuned to increase GPT-3's effectiveness.

An important one is "temperature" which translates essentially to how much randomness GPT-3 incorporates into its response. If you just want the highest probability response (i.e. no randomness) then you can set the temperature = 0.

Take, for example, the (somewhat tricky) prompt, "how much revenue did we have in each of the last 2 months?". Setting the temperature to .8 gives you something utterly nonsensical:

Input: how much revenue did we have in each of the last 2 months?

.8 Temperature response GPT-3 Response: SELECT SUM(case when charge_dt>= now() - interval '1 month' then charges.amount else 0 end) as revenue_this_month, CASE WHEN signup_dt>= now() - interval '2 months' AND signup_dt < now() - interval '1 month' THEN SUM(charges.amount) else 0 end as revenue_last_month FROM charges LEFT OUTER JOIN users ON users.id = charges.user_id \ WHERE users.signup_dt >= now() - interval '1 month'

Meanwhile, setting the temperature to a cooler .3 gives you a working query that answers the question:

Input: how much revenue did we have in each of the last 2 months?

.3 Temperature GPT-3 Response: SELECT SUM(case when charge_dt>= now() - interval '1 month' then amount else 0 end) as revenue_this_month, SUM(case when charge_dt>= now() - interval '2 months' and charge_dt < now() - interval '1 month' then amount else 0 end) as revenue_last_month FROM charges

The higher temperature GPT-3 is being more "creative", using expressions and techniques it hasn't seen before, but also overcomplicates it and gives a mostly nonsensical response. The .3 temperature response, on the other hand, correctly converted the question to SQL.

In my experience, if you're asking a question that follows a pattern already established by an example, cooler temperatures tend to perform better. But if you're asking something that demands more creativity, higher temperatures are optimal since GPT-3 is more likely to try something it hasn't been explicitly taught.

In the case below I ask, "how much revenue have we had from users that signed up in the last 6 months?" (after removing it from my examples). With a lower temperature, GPT-3 tried to invent a "signup_dt" column in the "charges" table so it didn't have to join the "users" and "charges" tables together. With a higher temperature, it did join them, which was necessary to answer the question correctly.

Here is the .8 temperature response:

Input: how much revenue have we had from users that signed up in the last 6 months?

.8 Temperature GPT-3 Response: SELECT SUM(charges.amount) FROM users INNER JOIN charges ON users.id = charges.user_id WHERE signup_dt >= DATE_SUB(now(), INTERVAL '6 months')

To be sure, "DATE_SUB(now(), INTERVAL '6 months')" is not a valid Postgres expression (one of the drawbacks of higher temperatures is it tries things that might not work), so this query would technically fail. But structurally, it's on the right path by joining "charges" and "users" so it can condition on "signup_dt".

Meanwhile, the .2 temperature response was totally inaccurate in using a "signup_dt" column that doesn't actually exist in the "charges" table:

Input: how much revenue have we had from users that signed up in the last 6 months?

.2 Temperature GPT-3 Response: SELECT SUM(CASE WHEN signup_dt >= now() - interval '6 months' THEN amount ELSE 0 END) AS revenue FROM charges

Conclusion

Now, I've got a GPT-3 instance that takes a plain English question and translates it to SQL that really works on my database. While not always perfect, and still needs some handholding for more complex concepts like "growth rate" or "percent", it's definitely useful. I get to save a little time when I have a simple question that needs to be asked about my database, and don't feel like writing the SQL myself.

Even just the fact that GPT-3 knew SQL concepts like adding or subtracting time intervals from dates—without having seen an example of it first—means that it can be useful for beginners unfamiliar with SQL syntax. Simply asking, "GPT-3, how do you subtract 30 days from today's date in SQL?" seems easier than Googling or reading documentation.

As for actually acting on the answers you get from GPT-3—that's still a human job, for now.

Quick plug: We're currently testing a feature for SeekWell customers that uses GPT-3 to automatically generate SQL from plain English, customized to your database schema. If you're interested in trying this out, let us know at [email protected].

Code used for this project can be found on my github.

By Brian Kane @SeekWell.