I’d like to ask for clarification of language you used in today’s update. You said Meta’s AITemplate is "much more performant than Nvidia’s own implementation." My understanding of the press release is that the 12x improvement is of AITemplate compared against the current version of PyTorch. Nvidia’s own library is much more performant than PyTorch, and Meta’s press release describes the performance of AITemplate as "close to hardware-native."
My understanding of your language was an implication that Meta’s library outperforms Nvidia’s own inference library, but my understanding of Meta’s press release is that they outperform PyTorch – the previous best inference library that was compatible with both Nvidia and AMD hardware.
This is exactly right; it was just a dumb mistake on my part. The broader analysis does still hold, but it’s more about AITemplate being almost as fast as native CUDA code, not being faster. My apologies for the mistake, and thanks to the readers who reached out to correct me.
I don’t generally do interviews with investors, but today is an exception. Daniel Gross founded Cue, a search engine that was bought by Apple and incorporated into iOS, and led machine learning efforts at Apple from 2013-2017, before becoming a partner at Y Combinator and then transitioning into angel investing. Nat Friedman co-founded Xamarin, an open-source cross-platform SDK which was bought by Microsoft in 2016; Friedman led Microsoft’s acquisition of GitHub in 2018, and was CEO of the developer-focused company until last year; he too is now focused on angel investing.
Gross and Friedman are friends of mine, and in that context asked me to review the website for a grant program they were developing for new companies focused on AI; that led to a number of conversations about the democratization of AI that I have been focused on at Stratechery over the last couple of months, and I thought it would be interesting to capture some of that conversation for you.
To listen to this interview as a podcast, click the link at the top of this email to add Stratechery to your podcast player.
On to the interview:
An Interview With Daniel Gross and Nat Friedman About the Democratization of AI
Nat Friedman and Daniel Gross, it’s exciting to talk to you for a Stratechery Interview. Before we get into the topic at hand, which is AI, and I will get into the genesis of this conversation in a moment, but one thing I do like to do with these interviews is get more into the background of folks. It’s a little tricky because I guess this is the first disclosure, I am friends with you in real life, so I know your background, but for people who don’t know who you are, give me a little bit about where you came from. Daniel, why don’t you go first?
Daniel Gross: Sounds great, thanks for having us here. I am originally from Jerusalem, Israel. I grew up in a world and life where the physical world was very different from the digital one I experienced. I grew up in an Orthodox family and it was not very much a technology-first world, but I really found myself on the Internet like I think many of us did, and I really ended up, I feel like, growing up online more than I did in Jerusalem, which although, admittedly, Israel is a very big tech hub — Jerusalem’s quite different from Tel Aviv, it’s a bit like comparing Kyoto to Tokyo. I came out to Silicon Valley when I was pretty young and ended up starting a search engine called Cue, which raised some money from some VCs in Silicon Valley and ended up getting acquired by Apple in 2013. I ran search and machine learning at Apple for about three-and-a-half years.
Basically, as I’m sure many people know, Apple has a very unique way of building an org where there’s the org structure and then there’s this thing called the DRI model, which is a virtual org built on top of it with no guaranteed annuity. So every year, there are different DRIs for different big things at Apple,
DRI being Directly Responsible Individual.
DG: That’s right. So you get your name put up on every slide, and I think it was Steve [Jobs’] systematic way of ensuring both accountability and the ability to highlight and have people be responsible for projects in the organization regardless of where they were. In many ways, I was the recipient of that, although by the time we got acquired, Steve wasn’t there anymore. I think I was 23 years old and I was running machine learning for Apple because of the DRI model, and plausibly also because of the acquisition, I got thrown up into a fairly high part of the organization. I did that for a while and it was a different era of machine learning, which we’ll talk about, this architecture wasn’t quite invented then. I remember us very much desperately trying to make models, do intelligent things with the keyboard, and still work in progress.
(laughing) Particularly the keyboard, right?
DG: I don’t want to know.
Way to go straight to a sore spot. There’s two things about that that I really like. Number one, this bit about growing up in different worlds or the virtual and the physical just because, I just relate to that so much. Growing up in a small town in Wisconsin and no one around me even knew really what the Internet was. I put on my Twitter bio very, very early, "Home on the Internet," and that’s become a very tangible thing living abroad, it’s been a defining characteristic and definitely people who grew up in a similar way, you definitely feel that it is a shared culture, just this culture was not in a physical world, it was online.
The other bit is the Apple bit. I owe Apple a lot because when I was in business school, I had been an English teacher in Taiwan, I wanted to work in tech, no one would even interview me or give me a job. There was an Apple hiring manager who talked to me and said "That’s the weirdest path I’ve seen in a long time. You’re hired." I actually barely went through any interviews but she was like, "We find it really important at Apple to harvest — we’re betting on the weirdness of your background. You might be good, you might be bad, it’s just an internship." Going forward, I ended up not working at Apple, but having Apple on my resume opened up a ton of doors. I can see a connection there to this DRI approach where a 23-year-old Daniel Gross is in charge of machine learning and why not?
DG: I think the benefit of it from my boss’ perspective was that a DRI basically means you’re responsible for doing these things, usually called a tent-pole, big thing for us this year, and you’re not guaranteed it next year. So the decision later on when it became time for me to figure out who would be DRIs for various things, it’s a much easier decision because you’re not stuck in this tenured problem that you traditionally have with org charts where you promote someone and you basically can’t really demote people. It’s an amazing company, I think it’s highly unusual, I think rumors of its demise used to be greatly overstated — now they’re maybe correctly stated because everyone thinks they’re on top of the world.
I remember when we got acquired, I think Apple stock was worth $400 billion or so and we had another acquisition offer from a startup at the time that was valued at single digit billions. I remember thinking that company might double or triple but there’s no way Apple — I mean that’s just a large number, and Google Finance doesn’t have a place for trillions — so that could never work, you just go up to 999 and break, but yeah, in hindsight, it’s really had an incredible run.
Then I left Apple. I worked at Y Combinator as a partner for a while, they had originally funded me when I was a nobody as an 18-year-old, so I wanted to pay that forward to other people. I launched an AI program at Y Combinator to basically help fund AI companies. This is, again, pre large language model revolution, which we may cover today, and then I left, and I’ve been working ad hoc with Nat over the past couple of years, really. We’ve known each other for a long time investing in startups. After I got acquired by Apple, I took the cash I got and very stupidly invested in these early stage businesses that were at the time very small. I’d been given all sorts of instructions about money allocation and maybe you want to invest 10 or 20 percent of your net worth in startups and I didn’t really do the math properly, and suddenly found 99.9% of my net worth was invested in startups, but the companies ended up getting pretty big like SpaceX and Coinbase and Instacart and Uber and whatnot, so that worked out and I’ve been in the process of just doing that over and over and over again, hopefully with greater degrees of success, but past performance definitely doesn’t predict the future. I met Nat along the way, Nat actually met me earlier than I even remember meeting him — I guess you originally remember meeting me when I was just a YC founder in 2010, is that right?
Nat Friedman: Yeah, that’s right. I met you actually at the YC Demo Day.
I think he’s saying that you’re old, Nat, but continue.
NF: Well, guilty! Yeah, it’s true, I did actually meet Daniel at YC at the Demo Day right after he’d pitched a company that was then called Greplin that became Cue that Apple subsequently bought. Before that batch of demos, Paul Graham got up and said there’s one company here that actually changed their product three days ago, and you may not even be able to tell which one it is, and I thought Daniel had this amazing pitch and I thought it was really interesting. Went up to Paul Graham afterwards and he said, "Oh, that’s the one that changed three days ago, this idea."
Nat, tell us your background. How did you come to be at Y Combinator at this Demo Day?
NF: Yeah, sure, happy to do so. I mean, it’s funny, guys, we’re like three peas in a pod here. Similar situation — I didn’t really know a lot of people like me growing up. I was into systems and math and technology and I was taking apart every toy I was given as a kid, fairly introverted, didn’t have a lot of friends, wasn’t good at sports or anything, but really fell in love with the computer. I was super lucky that my parents got me a computer when I was little. Then when I was a teenager, I had a modem, and this dates me also, but we went on a family vacation for a week. We left the house and I left my modem dialing, just literally dialing every phone number in my hometown to try to see what other modems were out there that we could connect to. When I got home, it had found this bank of modems that turned out to be the local university’s dial-up access, and they turned out not to have reset the default password on their little Cisco routers. So I got Internet access in the early ’90s at home, and that really changed my world. Like you, Ben, my webpage says that my real hometown is the Internet and that’s really how I feel.
Looking back on that, it’s amazing how that effort looks, it’s like I was setting up a search for extraterrestrial intelligence or something. I was literally dialing every phone number in my hometown to find other people who are similar to me that I would get along with. I found them on the Internet in what would become later known as the open source community and movement — the term didn’t exist at the time — but there were already hundreds and maybe thousands of people who were writing code, and thinking out loud, writing code in public, sharing it, helping each other out. I discovered online the Linux project and all the source code for all the tools I could ever imagine, and that I could talk to and learn from some of the best programmers in the world. It was amazing for me. First time I found a community that I felt I belonged to and, really, it’s the story of the rest of my life. Almost every part of my life since then has had some connection to this idea of open source, building on these public commons of knowledge, online communities, that sort of thing.
So from there I went to school and I ended up starting two different startups, and one of them was a mobile platform company called Xamarin that Microsoft acquired. We sold it to Microsoft in 2016, just about a year after Satya [Nadella] had taken over Microsoft. I’d always thought of myself as a startup guy, and so I thought I would put in my requisite year or two, but I found when I got there that Satya was an amazing leader. You take a company of a 100, 200,000 people and you change one guy and you get this totally different behavior from the whole company. So I felt like I was learning something about leadership from him and he had an incredible team of people there.
Then about a year in, I sent Satya this email saying, "Hey, I really think it would make sense for Microsoft to buy GitHub." Much to my shock, a week later I was in a meeting with him and a few other senior executives at Microsoft and he basically said, "Let’s do it. Let’s go do it," and empowered me, even though I’d only been there for a short period of time relative to all the other folks at Microsoft, to go lead this acquisition.
Just to jump in, it’s interesting to see that commonality in the bias towards action and not favoring seniority in your story and in Daniel’s story.
NF: That’s true.
Obviously, Microsoft has had its ups and downs and it’s clearly been on an up since Satya came in and there’s probably some connection there of how you actually stay viable and vigorous as a big company.
NF: Yeah, I think Satya is expansive and he’s always reaching higher and I think that’s part of it. He sees a big idea and he wants to go after it. I couldn’t explain it, but I felt excited that he trusted me to go do this thing. At the time, it was one of the largest acquisitions Microsoft had done, certainly the largest developer-related acquisition in history, and then I subsequently was installed as CEO and ran GitHub for about three years and did that until the end of last year.
Transformers and Large Language Models
We want to get into some of the details of that in a moment, but just the other piece of disclosure, Daniel, I forgot to mention, you are a published author now. You wrote a book, Talent, with Tyler Cowen which is excellent but maybe we’ll get to it indirectly in this conversation, but what happened was, Nat, you reached out, you’re running this AI grant program, aigrant.org, and I think you asked me to just proofread the page and see if there’s anything to add or not, and it inspired this long back and forth.
To that end, there’s this broader context that I’ve been writing about, which has been you go back even just six or nine months and OpenAI, we’re seeing these incredible things, and I think DALL-E 2 was a real watershed. The visual component of AI just arrests people so much more, even though we already had GPT-3 and all those sorts of things. A picture is worth a thousand words, it’s definitely the case as far as AI stuff goes, but they had all these controls, all these limitations, it was invite-only. It fed into the assumption, the conventional wisdom — which I held — that AI is going to be this centralized thing, you need all this access to data.
Then this summer happens and, first, Midjourney comes along, and it’s just from a user perspective a Discord server, and you can generate these amazing images for free — we can get into the cost issues in a little bit. Then the bomb drops, which is Stable Diffusion, where now to go back to your open source thing, it’s this availability and the iteration that we’ve seen even the last little bit has been crazy and it’s been a real — there’s an emoji with the mind exploding (🤯), that’s been me for the last two months. Every single assumption I had about this space feels like it’s been upset.
The reason I want to talk to you guys is I’ve talked to you two probably more than anyone else about this space, and this grant proposal, which is a reason for you to talk about it, but also it spurred that much more thinking about it. The apple cart was upended, all these apples are now over the road, and it’s really interesting to think about what’s the new cart going to look like, how are these apples going to end up stacked up? I guess the question is, Daniel, you referred to this a couple times in how machine learning worked in the before-world and the after-world, and that line was transformers and these large language models. I’m going to put you on the spot. Could you describe what a transformer is, what a large language model is, why it’s important and why it’s a big before-and-after moment? Explain it to me like I’m five, ELI5 is your challenge.
DG: Certainly, yeah, that is the hardest challenge, isn’t it? So we talked about two different things. One is the explosion, I think, in just images and the image generation. You mentioned a bunch of different products, Stable Diffusion, which is a model that’s being productized by different companies, Midjourney, which is a proper company. These images use a particular technique that, actually, maybe Nat explained because he actually has a particularly good explanation for it and has a particularly pernicious paper of exactly how the image stuff works.
I have described it on other podcasts. I like the image one, because I think it’s one that it’s just tangible for people to get, this idea that you train this model to pull out an image, you already know what it is from random noise, and then those capabilities let you pull anything from random noise. How did machine learning used to work when you were running it at Apple versus how it works today and why it’s such a big difference?
DG: I think everyone has heard of these neural networks, these feed-forward neural networks that you feed information through them, you get different neurons activated, and over time, you have a thing that can usually do a decent job, given a bunch of tokens, of predicting the next token. It does well in different domains, depending on the task. Now one issue we always had with neural networks is they had very little contextual memory. Even at Apple we were training these things called LSTMs, long short-term memory cells. These were our best way at giving neural networks — you can think about it the same way you think about human attention of just being able to keep an entire sentence, maybe two sentences in your head in order to understand or complete the third. This turns out to be a very important element of comprehension, so to speak.
Now, the issue LSTMs had, and mostly all prior architectures until the transformer, is that training time exploded with the size of context that you would add to the window, and no one could really find a way to parallelize this training process. LSTMs, which at least when I worked at Apple we were using for your keyboard, after spending a lot of money we could get them to predict a sentence but not much beyond that. It turns out the productive power of whatever model you have is somewhat a byproduct of just how much context it can remember and how much it can read and write. By the way, this is somewhat true for humans too. A smart programmer can keep two or three pages of ideas in their mind as they write down the next page.
Now in 2017, a bunch of people, each of which now has their own company, the new PayPal Mafia is the Transformer Mafia, wrote this paper called Attention is All You Need, which at the time was mostly ignored by the rest of the world, and they came up with a way to effectively parallelize this training, and enable us to create models that are much larger, and as a byproduct are able to store more context tokens over and over, but effectively more words and effectively be able to predict more words to you.
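To make the parallelization point concrete, here is a minimal sketch in plain Python of scaled dot-product attention, the operation at the center of Attention is All You Need. It is an illustration under stated assumptions, not the paper's full architecture: the toy vectors are made up, there is a single head, and there is no masking or learned projection. The thing to notice is that each output row is computed from the inputs alone, whereas a recurrent model such as an LSTM has to finish one step before it can start the next.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each output row depends only on the inputs, never on another
    output row, so the rows could all be computed in parallel.
    """
    dim = len(keys[0])
    width = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(width)
        ])
    return outputs


# Toy 3-token sequence with 2-dimensional vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

In a real model the loop over queries becomes a single matrix multiplication on a GPU, which is where the parallel training speedup described above comes from.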
The paper was mostly ignored when it came out — I thought it was neat, I don’t know that I made much of it. Google at the time had developed this pretty large model based on the paper that it didn’t release for various reasons we can touch on. Then OpenAI really productized that paper with GPT-2 and 3, general purpose transformer [GPT stands for Generative Pre-trained Transformer], that transformer is from that paper, Attention is All You Need. They were able to build these successively larger and larger models because they were able to parallelize training. These models now, GPT-3, is considered state-of-the-art, although I think our grandchildren will look at that the same way as we look at tube television.
That might be generous.
DG: Yeah, same way that maybe they look at the abacus. It can write two or three paragraphs, but you could see it also starts going off the rails as things get longer and the output gets longer. That’s a byproduct of all of the current flaws of the system, which is, again, they can’t keep the train of thought going for long enough. The current path the entire machine learning world is going down is an attempt to build these larger and larger and larger models with the idea that the smarter the model is, the more productive tasks it can do. Just making it pretty concrete, GPT-4 or an equivalent model, you could imagine summarizing legal documents for you, which is a lot of productive labor. You can imagine something reading tax forms, that type of thing.
So how much of this parallelizing was a hardware component in addition to a software idea? I’ve been writing a lot about Nvidia recently, and this bit about GPUs being highly parallelizable and they built this whole software ecosystem with CUDA on top of that where it was much more accessible to general researchers in products. Was that an important factor in this or did they both happen at the same time by chance?
DG: One of the views, famously, in the stories of progress, is how many people view the Manhattan Project as this massive moment of scientific discovery, and we did a lot of things at once, and we managed to make the nuclear bomb. But there’s another view of the Manhattan Project, which is that we assembled a lot of things that were on the shelf and just about ready to go — I think the similar thing happened with these large language models. The capabilities from a GPU perspective were there.
Now, it is true that the V100, which is the GPU that GPT-3 was originally made on, is a little bit slower than the A100, which is the current state-of-the-art, but these are incremental, the capabilities were sort of there, it was mostly a software innovation. I think one of the most open questions now that we should be pretty humble about in the world of machine learning is everyone’s reading these articles and using GPT-3 and everyone’s really obsessed with transformers being the final and end-all architecture. It is true that there’s little tips and tricks people are adding along the way, but broadly, the architectures people are using are this transformer. When you speak to the transformer zealots, they’re telling you it’s the transformer forever, meaning, with the transformer I can make something big enough one day that it is actually an artificial generalized intelligence — it could do all tasks humans could do, even better. But I don’t know because people thought they had the answers before that too.
Right, the transformer came out of left field. You talk about how models get larger and larger and larger in every version of GPT, the number of parameters are ever greater, and you have these massive fleets of Nvidia GPUs or you have Google building their own chips or whatever might be going through this massive processing, but that goes back into the fact that it was the conventional wisdom that actually generating these models is going to be such a massive problem. What flipped to make this actually more broadly accessible? The big shift also seems to be you can generate remarkably similar results with much smaller amounts of inputs or much dirtier input just harvesting stuff across the Internet, instead of putting in super highly-structured data. Was that part of the transformer revolution or was that another one of those pieces that came along at the same time and made this all work out?
DG: Prior to the transformer getting big, there was certainly a small scene of what people would call unsupervised learning, meaning learning that didn’t require massive amounts of labeled datasets. No one had quite figured out, and there was this idea that if you just threw enough information at a model the same way reality throws a lot of information at us and we learned in an unsupervised model, the model will figure it out. The issue people had was not a shortage of data, I mean, we have the Internet. By the way, it turns out to be a huge thing. I think, actually, Nat pointed this out, I think the real discovery is the fact that we have the Internet and that it might go down in history as the only way we could’ve made AI is we digitized the world.
This is a big thing. Everyone thinks about games being the future in terms of VR for example, and I’m like, "Well, you have to generate all this content so it needs to be AI-generated, and you need inputs." It’s weird because games are almost like handcrafted HTML pages, and there’s not nearly enough of them. Whereas when it comes to the Internet, because we have things like forums where anyone could publish, there’s so much text out there, there’s so many images on the Internet — just the sheer amount of stuff, even though it’s not nearly as well-structured as your handcrafted HTML page or your 3D gaming world, it actually is a reason why anything textual-related in these large language models are actually way, way, way further ahead because the input is not quality, it’s quantity, and the Internet has just unleashed this massive amount of quantity.
DG: Yup and I think one of the more interesting things about Stable Diffusion, this thing that we’re now seeing, where computers can generate art given a piece of text, is I don’t think it would be possible had in 1992 or 1993, Tim Berners-Lee not put the alt tag under image HTML.
So every image has text associated with it.
DG: That’s right. So this happenstance, I think, Nat, you told me that that was an accident or an afterthought.
NF: I wasn’t there when it happened, but yeah, I mean, I don’t know exactly what you think I was —
(laughing) Way to shoot down the old jokes!
NF: — I was doing back then, but I do agree this idea that the Internet was this digitization engine that’s the boot loader for AI, seeing as the two inputs that it turns out you needed were data and hardware, and we got the data from the Internet and we got the hardware from gaming, and the transformer really was just a computational optimization for how to put those things together. It seems very likely with prior architectures you could have also built these large models, but they were either two or three orders of magnitude less computationally efficient. Just being able to use, as you said Ben, that parallel functionality on an Nvidia chip meant that you could actually implement this attention mechanism and have transformers work.
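The alt-text point made above, that images on the web arrive with a text description attached, can be sketched with Python's standard html.parser module. This is only an illustration of how image and caption pairs fall out of ordinary HTML. The sample page and the AltTextCollector class are hypothetical, and nothing here describes how any particular image model's training data was actually assembled.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (image URL, alt text) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # A missing or valueless alt attribute is treated as empty text.
        alt = (attributes.get("alt") or "").strip()
        if src and alt:  # skip images with no usable description
            self.pairs.append((src, alt))


# Hypothetical page: only the first image has a usable description.
page = """
<p>Holiday photos</p>
<img src="cat.jpg" alt="A cat asleep on a sofa">
<img src="spacer.gif" alt="">
<img src="dog.jpg">
"""

collector = AltTextCollector()
collector.feed(page)
print(collector.pairs)  # [('cat.jpg', 'A cat asleep on a sofa')]
```

Run across billions of pages, this kind of pairing is messy and uneven in quality, which fits the point above that the advantage of the Internet as an input is quantity rather than structure.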
You’re one of the few people in the world to actually ship a widely used product that is built on this, and I’m referring to GitHub Copilot. You talked about a programmer holding three pages in their head, and there is some function of intelligence, which is it’s a page function like how many pages can you actually hold and get down on the page, but that actual act of transferring the whole logic in your head and putting it on to text, it’s just busy work in some respects. You have to call the right APIs, you have to fill in all this boilerplate code, and that’s a lot of what GitHub Copilot will do for you, it will just abstract away that busywork. So instead of remembering all the syntax and all these specific things, you just fill in this whole section, which is a known thing. Tell me about that process, did you see this as like, "Oh, this is an obvious application," or was it a meandering path? How did Copilot come to be?
NF: I think it’s really interesting because it’s one of those products that looks incredibly obvious in retrospect, but on foresight, it was really foggy and it wasn’t totally clear what was there. So the story was June 11th of 2020, GPT-3 came out, and I saw it and I thought, "My God! This is incredible." I got access to some of the demos and the playground pretty early and it blew my mind. So I said, "We should do something with this, I don’t know what." Satya, with great wisdom, had already set up a partnership with OpenAI, so we had a relationship with them already where cooperation was possible.
I grabbed a couple of really bright developers at GitHub, and the challenge we had was that we were building around uncertainty. We knew the models were good, but we didn’t know exactly what they were going to be useful for. OpenAI was feeding us improved models really regularly so they were improving at some rate and we didn’t know when that would stop. We had a couple of ideas for different areas where you could take one of these models that predicts text and figure out some use for it. So we took a two-pronged approach to investigating those. The first idea that we had was actually not a code-writing, code-synthesis autocomplete bot. It was a question-and-answer bot. It was actually a chatbot that we thought would help answer questions.
Stack Overflow in your IDE.
NF: Yeah, exactly. It was like Stack Overflow bot, that was one area we started investigating. The other area was this code synthesis, but we didn’t know what the UI for that would be or exactly how it would work. So to investigate these two areas, we actually set up these tests. On the chatbot side, we had a group of engineers at GitHub write literally hundreds of questions about programming in Python. Then we plugged all these questions into this bot, and we took the answers and then we rated them to see if they were any good, and that allowed us to track not just the quality of the Q&A, but how it improved over time as OpenAI dropped better models on us. We saw that happen week by week. The other thing we did was we wanted to know, "Okay. How good is it actually at writing code?" So we trawled across GitHub and we found all this code that had unit tests. That was the search function we ran, so code that has unit tests where the unit tests currently work and pass. Then we blanked that. We set up a test harness where we blanked out the function bodies and we asked GPT-3 or we asked this GPT-3 derivative to fill them in, and then we reran the unit tests to see if they would pass again. So that was for the other test harness.
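The code-synthesis harness described above (find code whose unit tests pass, blank out the function bodies, ask the model to fill them in, rerun the tests) can be sketched in a few lines of Python. This is a toy reconstruction under stated assumptions, not GitHub's actual harness: SOURCE, TEST, and fake_model are hypothetical stand-ins, with fake_model returning a hard-coded body where a real setup would call a code model. It needs Python 3.9 or later for ast.unparse.

```python
import ast

# Hypothetical function under test and its unit tests.
SOURCE = '''
def add(a, b):
    return a + b
'''

TEST = '''
assert add(2, 3) == 5
assert add(-1, 1) == 0
'''


def blank_bodies(source):
    """Replace every function body with `pass`, keeping the signature."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.body = [ast.Pass()]
    return ast.unparse(tree)


def fake_model(prompt):
    """Hypothetical stand-in for a code model: returns a candidate body."""
    return "    return a + b"


def passes(candidate_source, test_source):
    """Run the tests against a candidate; True if nothing raises.

    exec on generated code is unsafe outside a sandbox; this is a sketch.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        exec(test_source, namespace)
    except Exception:
        return False
    return True


blanked = blank_bodies(SOURCE)
signature = blanked.splitlines()[0]  # "def add(a, b):"
candidate = signature + "\n" + fake_model(signature)

# Score a batch of candidates the way a harness would track pass rate.
results = [passes(c, TEST) for c in (blanked, candidate)]
pass_rate = sum(results) / len(results)
```

Tracking that pass rate each time a new model arrives gives the kind of week-by-week quality signal described above, without anyone having to read the generated code by hand.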
You can see the analogy to the image idea, where you know what you’re going for and you pull it out of randomness. In this case, you’re pulling code out of randomness and you know how it should actually compute in the end.
NF: Right. Now, the interesting thing was in most cases it was wrong, either at questions or at code. Actually, I think on the code synthesis, I don’t remember the exact numbers, but I think on the code synthesis, maybe 20 percent of the tests would pass at first, and then over time we got up to 30, 35 percent, something like that.
DG: Was it wrong in a sensical way? Was it close but wrong?
NF: Not always. The thing I would always say with those models is that they alternate between spooky and kooky. So half the time or some fraction of the time, they’re so good, it’s spooky like, "How did it figure that out? It’s incredible. It’s reading my mind," or "It knows this code better than I do." Then sometimes it’s kooky, it’s just so wrong, it’s nonsense, it’s ridiculous. So when it was wrong, it was really wrong. It turned out from testing it in the Q&A scenario that when you actually asked the thing a question and it gave you more often than not a wrong answer, you got very irritated by it — this was an extremely bad interaction. So we knew that it couldn’t be some explicit Q&A interaction. It couldn’t be something where you ask a question and then 70 percent of the time you get a useless answer. It had to be some product where it was serving you suggestions when it has high confidence, but it wasn’t something you were asking for and then getting disappointed by.
So basically, Microsoft had it right with Clippy is what you’re saying?
NF: Well, partly, yeah. I think, yeah, Clippy was not a direct inspiration, but the other thing it had to be though that Clippy wasn’t is it had to be unobtrusive. It had to because it turns out in retrospect, we know this now and we didn’t know it at the time, the question that we were trying to answer was, "How do you take a model which is actually pretty frequently wrong and still make that useful"? So you need to develop a UI which allows the user to get a sense and intuition themselves for when to pay attention to the suggestions and when not to, and to be able to automatically notice, "Oh, this is probably good. I’m writing boilerplate code," or "I don’t know this API very well. It probably knows it better than I do," and to just ignore it the rest of the time.
So it’s funny because a lot of the ideas we had about AI previously were this idea of dialogue. The AI is this agent on the other side of the table, you’re thinking about the task you want to do, you’re formulating it into a question, you’re asking, and you’re getting a response, you’re in dialogue with it. The Copilot idea is the opposite. There’s a little robot sitting on your shoulder, you’re on the same side of the table, you’re looking at the same thing, and when it can it’s trying to help out automatically. That turned out to be the right user interface. Now, even after we figured that out, there was actually a multi-month journey to find how to make that useful for people. The UI now, again, it looks very obvious, there’s this gray text that appears, sometimes it’s a line, sometimes it’s a block, but it took us months of tweaking and tinkering to get there.
So from the June realization that we should do something, I think it was end of summer, maybe early-September by the time we concluded chatbots weren’t it. Then it really wasn’t until February of the next year that we had the head exploding moment when we realized this is a product, this is exactly how it should work. We realized, for example, also, that latency was really important. The model couldn’t get too big because if the model was too big, even though it was more accurate, it was too slow, so you would get irritated waiting for it, it would write a smaller and smaller fraction of your code. So now, it’s very obvious. It seems like the most obvious product and a way to build, but at the time, lots of smart people were wandering in the dark looking for the answer.
An observation, not a question, and I’m doing this on my own podcast so I can do that, we’re not at a conference here — the observation is what’s striking, you go back to the spooky versus kooky. The reason is that people want to anthropomorphize everything and they want to put everything in human terms. The whole point of a computer is it just operates utterly and completely different than humans do. At the end of the day, it’s still calculating ones and zeros. So everything has to be distilled to that and it just does it at tremendously fast speed, unimaginable speed, but that is so completely different than the way that a human mind works that that’s how whatever was kooky or spooky I’m sure was completely and utterly logical to the computer. It strikes me that this is why the chat interface was wrong, because what it was doing was it was taking this intelligence, and it was actually accentuating the extent to which it was different than humans by trying to put it as a human, as if you’re talking to someone, and it was actually essential to come up with a completely different interface that acknowledged and celebrated the fact that this intelligence actually functions completely and utterly differently than humans do.
NF: I think that’s super right, but I think one of the key questions you have to ask when you’re thinking about building an AI product is people can come up with great ideas that will work once the model is much smarter than it is today, but you have to come up with a product that people actually enjoy using, even given the model’s current intelligence. The thing that Copilot gave us that we, again, only realized in retrospect was this randomized psychological reward. It’s like a slot machine where the ongoing cost of using it at any given moment is not very high, but then periodically you hit this jackpot where it generates a whole function for you and you’re utterly delighted. You can’t believe it, it just saved you 25 minutes of Googling and Stack Overflow and testing. That happens at random intervals, so you’re ready for the next randomized reward, it has this addictive quality as a result. Whereas people frequently have ideas that are like, "Oh, the agent is going to write a huge pull request for you and it’s going to write a huge set of changes across your code, you’re going to review that."
DG: That’s a horseless carriage in your view.
NF: Well, my view is that’ll be great once the IQ of the model is as good as one of your best programmers, but when it’s not, then the user experience you’re offering to deliver to individuals who are your customers is review code written by a maybe slightly mentally defective junior programmer, which is the least favorite thing of any programmer to spend their time doing.
DG: A schizophrenic software engineer that’s sometimes brilliant and sometimes actually writes Japanese instead of Python.
NF: There’s just this art in saying, "How are you going to handle hallucination? How are you going to handle errors? How does that make sense in your product? How fast does it need to be?" Which is one of the reasons I think images are doing great, by the way, is that images are fiction. Code is nonfiction, it’s testable, the tests have to pass, it can’t have a syntax error. Images, if there’s an extra stray pixel somewhere, it’s part of the art, there’s no error in a way.
That’s one of the interesting trade-offs here, though, right? Because the more structured something is like code or like law — law is another go-to example of potential AI applications — it’s in some respects obvious it seems how computers can do this because programming in general is a creative aspect. Remembering all the syntax and how an API works is not creative at all, it’s just busy work. So it’s interesting where there’s this tension between the more creative something is, the more allowance there is for error, which is good for AI. On the other hand, where AI is arguably the most useful and impactful is places where it’s just regurgitating stuff, but then the accuracy is a question. There’s a bit of a tension there.
NF: Well, I think it’s interesting. If you didn’t have Copilot, the story you could tell right now is that these things are good at creativity and imagination. You have copywriting, rewriting things, you have images, you have AI, games, all these types of ideas, but they’re not really good at precision, writing syntactically correct code, but because we have Copilot, I think the answer has to be, "Well, they are, but in those cases, you really have to find the shape of the product that’s going to make that work," and it’s going to involve some level of human supervision and a learning to go to do the thing that you want it to do.
Is there a direct line then? I mean, one of the things about this grant project that was intriguing is your subhead is "Products, not papers. Tinkering, not training. Apps, not"…I actually never know how to say this.
NF: Arxiv, I just say Arxiv.
Arxiv, thank you, A-R-X-I-V, which is a great tagline, and this idea that, "Oh, wait. There’s been this explosion and talking about machine learning and all this stuff for ages," but how do you actually make it useful for people? A lot of what you’re talking about with Copilot is that OpenAI did all the actual AI work — that’s not really an AI product from a GitHub perspective, it’s a product product from a GitHub perspective. Your sense is that’s the opportunity here. Why is that the opportunity?
NF: It’s funny, I had this experience of, with a great team, taking what essentially from OpenAI was a research artifact and figuring out how to turn it into a product and it was so successful. People love using Copilot. Today, I think one of the stats from a recent study that was really interesting was Copilot for programmers, given a certain test to write a web server from scratch, that kind of thing, will complete the task more than 50% faster if they use Copilot than if they don’t. Then we know from the telemetry that Copilot for some languages is writing up to 40% of the code that people have of the new code that they’re writing when they have it enabled. So it’s incredibly successful, millions of people have used it and love it, it’s a big deal.
So I left GitHub thinking, "Well, the AI revolution’s here and there’s now going to be an immediate wave of other people tinkering with these models and developing products", and then there kind of wasn’t and I thought that was really surprising. So the situation that we’re in now is the researchers have just raced ahead and they’ve delivered this bounty of new capabilities to the world in an accelerating way, they’re doing it every day. So we now have this capability overhang that’s just hanging out over the world and, bizarrely, entrepreneurs and product people have only just begun to digest these new capabilities and to ask the question, "What’s the product you can now build that you couldn’t build before that people really want to use?" I think we actually have a shortage.
Interestingly, I think one of the reasons for this is because people are mimicking OpenAI, which is somewhere between a startup and a research lab. So there’s been a generation of these AI startups that style themselves like research labs where the currency of status and prestige is publishing and citations, not customers and products. We’re just trying to, I think, tell the story and encourage other people who are interested in doing this to build these AI products, because we think it’ll actually feed back to the research world in a useful way. We keep hearing these narratives about how AI is going to solve reasoning over time. I think a very good test for whether it’s actually doing that is something like Copilot, where if it is doing that, it’s going to start writing close to 100% of your code in time.
One of the advantages of Copilot is you have a compiler, right? You have an already-used-to-looking-for-defined-errors checking system, and it either runs or it doesn’t. Is there going to be an entire market for compilers for everything? You’re going to need a legal compiler to make sure all the logic makes sense or whatever stuff is produced by these things?
NF: The really interesting thing about Copilot, is it does sometimes make mistakes, but actually, the product and the model have no explicit syntax checking function in them. The model produces some options and then they get inserted into your code. If there’s a syntax error in them, it’s not going to get caught. So I think it actually does show how powerful these large language models are that they can write syntactically correct code often enough to be used unfiltered in a product like this.
DG: Was Copilot fine-tuned at all on unit tests or whatever?
NF: Yeah, it was fine-tuned on that unit test harness that I mentioned, and then it’s also fine-tuned on all the feedback of the people who use it. So as people use Copilot, it suggests some code, they accept the code, but then they edit it a little bit. Those types of edits get fed back into making the model smarter.
There’s an aspect where you want people to be in a good mood using it because then they’ll give you feedback.
NF: It does sometimes put people in a bad mood. Someone was telling me, a friend of mine was telling me the other day, "Copilot, I love it, but it always is insulting me." I said, "What do you mean?" He said, "Well, I’ll be typing a comment and I’ll type ‘this is…’ and it’ll suggest ‘a hack’ and I’ll look at my code and I’ll be like, ‘Gosh, you’re right. It is a hack.’"
DG: Yeah, it’s funny. I remember when we made search predictions in the iPhone when you pulled down to search and it predicts what you wanted to tap on. Greg Christie, very famous iPhone designer, was in the meeting. It was one of his last few days at Apple, so he was feeling very rambunctious and free after being there for 20 years and making the iPhone and whatnot, and he said, "Doesn’t matter what you predict there. Just anything you put there, people are going to tap on. Doesn’t matter. Don’t overthink it." We’re like, "What do you mean?" He said, "Just watch. Literally, put random apps there." It’s sort of true. People, when you do make those predictions, they take what you have. I guess, Nat, the thing I was going to ask you was, how did you think about with Copilot about network effects? Does getting feedback from users matter or in this world does that maybe not matter because models get smarter faster than you collect data from your users?
Centralization and Decentralization
NF: Well, we’re getting to Ben’s theme of centralization versus decentralization here a little bit. I mean, one of the theories behind centralization was that the companies who have all the distribution will get all the data, the telemetry, and feedback from the usage, and so their models will enter this virtuous cycle of improvement that no one else can access. It is true that the feedback, at least in the Copilot case, has improved it, but it’s on the order of 8%, 10%, it’s not 50% better as a result of that.
The centralization/decentralization thing is fascinating because I also bought the narrative that AI was going to be this rare case where this technology breakthrough was not going to diffuse through the industry and would be locked up within a few organizations. There were a few reasons why we thought this. One was this idea that maybe the know-how would be very rare, there’d be some technical secrets that wouldn’t escape. What we found instead is that every major breakthrough is incredibly simple and it’s like you could summarize on the back of an index card or ten lines of code, something like that. The ML community is really interconnected, so the secrets don’t seem to stay secret for very long, so that one’s out, at least for most organizations.
And the other big question was the data question, right?
NF: Yeah, then data was the next one.
Yeah, I mean, because you go back to Daniel’s time, it’s like, "Well, Apple’s going to always be screwed because they don’t collect enough data and privacy works great from a marketing perspective now, but it’s going to hinder them in the long run". I think this bit about 1) the data actually doesn’t have to be super highly structured, and then 2) if that’s the case, well, the Internet’s out there and everyone has access to the Internet. That seems to have been really the critical tipping point.
NF: Yeah, I think that’s right. I think the Internet and the ability for pretty much anyone to just go, at relatively low cost, scrape the data they need off the Internet and train on it is a big democratizing force. Now, that said, there is a norm in the community that if you have an algorithmic breakthrough, you publish your research, but if you do a ton of work to make a dataset, you don’t have to publish that. So I do believe all the great labs are actually pouring huge amounts of energy into cleaning their data.
Right. Clean data is still better than dirty data.
NF: Yeah. You download a hundred million hours of YouTube and you clean it and you cluster it with tools and you discard bad clusters that don’t work and that sort of thing. I think probably one way to look for arb in any industry is to ask what’s low status.
Arbitrage, for those listening.
NF: Yeah, and data is low status. Scraping data and cleaning data is not a high status activity, doesn’t get you citations. I’m pretty sure there’s an edge, an arb that you can find around data and that is being found. Then the other one is hardware. So you ask this question, "Gosh, can you just afford more hardware than anyone else?" That one I think, at least for the moment, is not a barrier, but when GPT-4 sees the light of day, when we see massive models, maybe there’ll be some escape velocity that a few organizations will hit that no one will be able to keep up with. That’s possible, but at least it hasn’t happened yet.
No, it’s gone in the opposite direction. There’s also this trade-off between training and inference, right? So just broad strokes, training is the actual creation of the model, inference is the application of the model to produce results. In the case of the image example, training is actually the process of pulling out a known image from random noise, developing the heuristics, and then you actually apply it to a new image, that’s inference.
Do you think there’s a trade-off here between the more training that you do, the more optimization you get on the backend? Google’s been using machine learning in search, for example, but when you type in the search result, it’s not running on an Nvidia GPU. There’s no way they could actually afford that, but it’s so optimized that it’s effectively deterministic in usage so they can scale it up. Whereas Midjourney, they have almost certainly invested much less in training, but when you actually do develop an image, you’re running on an Nvidia GPU in the cloud and that’s costing some amount of money. Am I right to think there is a give-and-take there between training inference and scalability and how much it can reach out?
NF: I think right now that’s true, and if you think about scaling laws, which basically say to make better models, you should make them bigger, the bigger the models that you’re making, the more the inference cost and the fewer GPUs that could even fit them, for example. So that’s true in an unoptimized case, or if your model is very general purpose, but if you’re able to narrowly define the set of scenarios in which your model’s going to operate, people are finding that you can really optimize them. You can distill them, you can quantize them, you can make them smaller and smaller. I think I heard recently, which shocked me, that the language models that Google trains for search, they actually now spend more on training than they do on inference. Now, to your point, in Midjourney, for example, it’s the opposite. There are not enough GPUs in the cloud like the whole cloud, Amazon and Google and et cetera, for 10 million people to use Midjourney at the same time, it’s just far too demanding from an inference point of view. I think to everyone’s shock, those image models, those diffusion models have turned out to be quite cheap to train and quite small.
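To make "you can quantize them" concrete, here is a minimal sketch of symmetric 8-bit weight quantization, one of the simplest ways to shrink a model for inference. It is an illustration under stated assumptions rather than how any particular lab does it: the function names and sample weights are hypothetical, and production systems typically quantize per channel or per block with hardware-specific kernels.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # one float shared by the whole tensor
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

# Hypothetical weights, stored as 32-bit floats in the original model.
weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight now needs one byte instead of four, so the same model takes roughly a quarter of the memory, at the cost of the small rounding error printed above.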
DG: Just a lot of them at the bottom.
NF: Yeah, and since Stable Diffusion came out, one of the amazing things to watch has been how the open source community has swarmed on it and optimized the heck out of it.
It’s been optimized like crazy.
Well, what’s the story of Stable Diffusion? Where did it come from?
NF: So there was an online open source community called EleutherAI. They caught onto this idea of diffusion models that were published by this guy Joshua, who’s from Google, I think, and started playing around with some tools. There was one called Disco Diffusion and Latent Diffusion a year and a half ago. So if you happen to be on this open source, EleutherAI Discord server, a year and a half ago you saw the future. You saw some of this stuff cropping up and in use. There were a few people who were really into it making art with AI. No one knew what a big deal it would be — maybe they did, but it certainly wasn’t obvious to me.
Then what happened was there was a university lab in Munich called the CompVis Lab that had previously trained one open source diffusion model, the latent diffusion model, and decided to train another one. There’s a guy named Emad Mostaque, amazing person in London, was a really unique individual who had run a successful hedge fund in London, and had decided to turn his attention and his energy and his wealth towards accelerating and democratizing AI. So he went into the Eleuther community and found the folks from the CompVis lab in Munich and these few open source pioneers of diffusion models and AI art and said, "I’ve bought an enormous cluster. I’ve bought 4,000 A100s in AWS with my own money."
They were being used for crypto, right? No one knows where they came from, he just somehow acquired a massive server of Nvidia GPUs.
NF: Yeah, I don’t know. I know he personally guaranteed the bills to Amazon and, "I’d like to offer this to you to train the greatest ever image model." Of course, they wisely said, "Thank you. We’ll do that." I guess about a year later, six months to a year later, they successfully trained this Stable Diffusion model. I think, actually, the total cost, Emad has talked about this publicly of training that model, was in the low millions of dollars, and that would include, when you train these models, you make mistakes, you have bad training runs, things won’t converge, you have to start over, it includes all the errors. So it was actually relatively low cost, and I think showed, because he has such a big cluster, it showed that we’ve really only scratched the surface of what you can build with these models.
I guess that’s the question going forward. The energy around Stable Diffusion is insane — it’s attracted the attention of all the young Nat Friedmans with their computer, ostracized by all their real world friends but super obsessed with this AI model, and so they’re contributing to the project. There are optimizations coming out every week adapting it to different GPU or whatever it might be, but you still need the model, someone has to actually make the model. Is the expectation that now there’s an expectation out there that someone is going to build that model and someone will just keep building new ones? Because there is a monetary component here, it’s not just about effort if someone has to actually play the role of Emad.
NF: Yeah, the question is whether the future of democratized AI or open source AI rests on the shoulders of a few madmen like Emad who have a vision and are willing to spend money to do it, or whether there’s some durable and scalable mechanism for this. Broadly, it’s possible that we don’t need that many open source foundation models. By having a great one for images and a great one for audio and probably at some point a video one, certainly a language one, which we still don’t have a really truly state-of-the-art language model, you could potentially have enough to really catalyze an ecosystem. Then you can imagine things evolving the way they did with Linux.
Linux seems the obvious analogy for Stable Diffusion.
NF: We have, in the case of Linux, we have the Linux Foundation, and suddenly all these companies whose commercial products and their own efforts are absolutely dependent on the success of Linux going forward, and it remaining state-of-the-art, and they were willing to contribute money and full-time engineers to this project. So I think there’s a window right now before the large labs release their most expensive models that maybe you can’t replicate in open source yet, where such a critical mass could be formed, and where an industry federation or a club or something like that could gain enough momentum to actually be sustainable long-term. I think we’re likely to see it. I know of at least three different efforts right now to train an open source language model that’s Chinchilla scale, state-of-the-art. Now, we’ve got multiple companies talking publicly about it.
DG: When you say Chinchilla, what is that? I actually think that’s an important observation. What is Chinchilla?
NF: In this context, I meant just a state-of-the-art language model, but it is a good question. So GPT-3 was a very large model, 175 billion parameter model that OpenAI trained on a lot of data over a long period of time, and it absolutely blew everyone’s mind. It was the four-minute mile. Subsequently, the other labs decided to try to really replicate that but also vigorously and rigorously analyze why it worked and what were the limits of what’s possible in these sizes. DeepMind, in particular, recently published this paper about a model they built called Chinchilla. What they were asking was, given a certain amount of data and a certain amount of compute, what’s the right size for a model? What they discovered after having built a series of models at different sizes with different amounts of data and compute was that GPT-3 was actually too big given the amount of compute and data that went into it. Daniel, you’ve used the analogy of an oversized suitcase, it’s half-empty.
DG: By the way, that’s the story of the American West, massive, bountiful land with too few people to use it.
NF: I sometimes think of them like sponges and these bigger models can hold water.
DG: Yeah, they can soak it up.
NF: It wasn’t fully soaked up, but they were able to train a GPT-3 class model that was as smart as GPT-3 that’s about a third the size, half-to-a-third the size. I think it shows how early we are in this process that we’re finding 2x and 3x optimizations still, and I think we’ll keep finding this.
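The Chinchilla question (given a fixed compute budget, how big should the model be?) can be worked through with back-of-the-envelope arithmetic. The sketch below rests on two commonly quoted approximations that are assumptions here, not figures from the interview: training compute of roughly 6 × parameters × tokens FLOPs, and a compute-optimal ratio of roughly 20 training tokens per parameter. GPT-3's published 175 billion parameters and roughly 300 billion training tokens are used as inputs.

```python
import math

def compute_optimal(flops, tokens_per_param=20.0):
    """Split a training budget into a model size and token count.

    Assumes flops ~= 6 * params * tokens and tokens ~= tokens_per_param * params,
    both rough rules of thumb rather than exact laws.
    """
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens

# Approximate GPT-3 training budget under the same 6 * N * D estimate.
gpt3_params = 175e9
gpt3_tokens = 300e9
budget = 6.0 * gpt3_params * gpt3_tokens

params, tokens = compute_optimal(budget)
print(f"budget: {budget:.2e} FLOPs")
print(f"compute-optimal size: {params / 1e9:.0f}B parameters")
print(f"compute-optimal data: {tokens / 1e9:.0f}B tokens")
print(f"size relative to GPT-3: {params / gpt3_params:.2f}")
```

Under these assumptions the same budget is better spent on a model of about 51 billion parameters trained on about a trillion tokens, which lands near the "about a third the size" figure mentioned above.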
DG: That’s important because I think if there’s more room at the bottom, to quote Feynman, then it’s less and less centralizing, and it seems like there might be a lot.
NF: I think that’s right. The other thing is that we see just the energy of a fully-functional open source community swarming around this new capability, and it’s exciting. We haven’t seen that with text and the language models yet. I think one of the reasons is, obviously, that images are much more interesting and exciting to look at and show off, but I think the other reason is we don’t have a state-of-the-art open source language model yet, and I think that will come in the next six months.
Is AI Sustaining or Disruptive?
I think there’s also, to your point, in what you’re trying to find people to build is who’s actually nailed the, other than GitHub Copilot — really, what’s the product case for or what’s the UX for text? Images are obvious, there’s an entire stock image art industry. The crazy thing is I used Midjourney in an article a couple weeks ago, I generated three images from that, I paid the $600 commercial license fee. I pay $500 per Getty image — we buy a Getty image for the cover of Dithering every month, and I’m spending way more on that, and we’re purposely making it look crappy. So there’s all these obvious drop-in applications for images.
More broadly there’s this business model question. There’s this idea of sustaining innovation versus disruptive innovation, where sustaining is existing companies can adopt it and it makes their existing business models better and makes them more effective. Disruptive means it’s just completely different, it’s a new paradigm, existing companies can’t react to it, and they’re just screwed. I think a stock image company, they’re being disrupted. Their whole thing is to have real images made by real photographers or real illustrators and they can’t offer up a picture of a paper boy with his arm cut off, which is what I posted in my article. So they’re stuck.
Where do you see this balance playing out over time? GitHub, arguably, Microsoft has always been basically the biggest development company in the world. GitHub having Copilot in it, Microsoft having that product, that feels like sustaining innovation. It’s making Microsoft’s development products that much more compelling. Is the answer just going to be it depends?
NF: I think it does depend. I think you’re going to see both, basically. I mean, you have a couple things going on here. First, there will be amazing bolt-on uses of AI for existing products that really make those products much better and can just be added as features. They don’t change the workflow that much, but they already add a lot of value. I think Copilot is one of those, but there will also be, I think, new things that don’t fit neatly into an existing product category that maybe involve a totally new UI or workflow that maybe sit at the intersection of a couple of different vice presidents in a major company, so it’s just not clear who’s responsible for building this, where there’s a lot of room for startups. The answer’s probably a lot to do with either interface revolutions, where the interface is language-based and the existing interface is just completely irrelevant or needs to be reinvented.
@tszzl wrote that article about Text is the Universal Interface. It’s almost like a reset to the command line era now, where there’s a real skill now in typing great prompts that get the image that you want.
DG: Yeah, but I mean, that’s just because we’re in early innings here, I do think I would not over-learn from Copilot. I think Nat’s too humble to make this point, but it is generally the case that most large companies, certainly most large enterprise companies, don’t innovate on UX and UI. Why they fail to do so I find is a fascinating question, but why is Figma possible? Why is Stripe possible? That’s because large companies, for whatever reason, don’t build great interfaces.
We’re in this new era where new user interfaces are possible and it’s somewhere in between the spectrum of a GUI and a voice or text user interface. I don’t think it’ll be text just because in the domain of images, sure, all mistakes are actually features, great, but the issue that you have is in real domains, like you mentioned legal, tax, where productive work is made, mistakes are bad. The issue with text, and one observation we always had from Apple, is that unlike a GUI, the customer does not understand the boundaries of the system. So unless, to Nat’s point, if you have AGI and it’s smarter than a human, great. Up until that point, you need something that has this feature that the GUI has, which is amazing. The GUI only shows you buttons you can press on, it doesn’t have buttons that don’t work, usually.
You haven’t used the new settings app on MacOS…
DG: (laughing) Yeah, I mean, a good GUI should not have an Apple Arcade ad in settings, but that’s a bit of a different problem and story. I think no one’s, well, large companies certainly won’t, and startups are only now beginning to think about what does that actually mean, and maybe interfaces should look a little bit more like trees. I think if [Douglas] Engelbart or Alan Kay were around now, I’m sure they would have a lot of interesting ideas. When you catch Alan Kay on the street or whatever in San Francisco as is common, he’s shouting pretty much at anyone who will listen that graphical user interfaces have not changed in 70 years, and they just left the pen, not at the global maxima, they just got out of the office one day and retired. So there’s a lot of low-hanging fruit there that I’m sure we’re going to see startups start to experiment with now.
This is very interesting because I think this is where the GitHub Copilot example makes sense. Maybe there really is just a UI functionality to this sustaining versus disruptive, where when it does make sense to drop into an existing surface, then companies are going to be well-suited to that. If you need to create a new surface completely, who owns that? There’s no DRI for that new-to-the-world surface.
What is striking and, I think exciting, particularly if you’re a designer, it’s easy to look at this world, particularly of the image world, and feel trepidation. We’re all tech optimists, obviously. You hear our backgrounds, you can see where it comes from. It’s like, "Oh, but there will be new opportunities. There will be new jobs," but what you’re really articulating is actually the biggest opportunity right now is in design. It’s in sheer creation, creating entirely new ways of interacting with computers, and that being the key to unlocking this overhang of capability that’s out there and no one knows how to actually put in front of people.
DG: That’s a great point. It’s really a great point.
NF: This is one of the greatest moments for entrepreneurs that I’ve seen in my life because this frontier has opened up of new apps that you can build, these AI-native apps. Look, there will be AI bolt-on apps too, and those will be great and people should build those as well, but with these AI-native apps, I do think taking these models and these capabilities and converting them into something that people really love using that’s novel requires enormous creativity and hard work and a lot of iteration. We did a baby version of that with Copilot, but that’s going to happen across the board. I think it’s a whole new world to discover. We don’t have the map, we don’t know where all the treasure is buried, but there’s definitely treasure out there.
For example, in the world of images, this idea of text-to-image, which is the basic concept you have with these diffusion models, well, if you ever watch a really creative person, sit over their shoulder and watch them use Midjourney for an hour, you find that what they’re doing is not one text-to-image. They’re writing a prompt, they’re generating a bunch of images, they’re generating variants of those images, they’re remixing ideas, they might be riffing off someone else in the Discord channel, they’re exploring a space, you’re exploring latent space in a way, and then pinning the elements of it that you like, and you’re using that for creativity and ideas, but also to zero in on an artifact or an output that you’re trying to produce, and those are different modes. So already just thinking about this, you can start to imagine what a native user interface for it might look like. It doesn’t look like a box where you type text and an image appears, it looks like something much, much more fluid than that.
The one thing that really just resonates is this feeling of excitement, right? I mean, tech was feeling locked in, I wrote The End of the Beginning, we have the mobile operating systems, we have the cloud things, and it’s like, "Well, is this it? What’s going to be next?" That’s why the summer has been invigorating, it’s been so exciting. It’s why I wanted to talk to you guys and there’s skepticism, AI’s been around the corner for so long.
DG: Very true, yeah.
What’s been so exciting is that, no, it’s actually starting to become really tangible, and the implications for that are real; it’s no longer just a research project. To your point, one of the reasons I think why there haven’t been products is people feel intimidated. It’s like, "I’m not smart enough to understand these neural networks, I’m not math-oriented, I’m just artistic or I’m just a product person," or whatever these things might be. The reality is, no, actually those skills are more valuable than ever. I should pitch again, aigrant.org, just because it really was an inspiration for this, "No, there’s real opportunities here." That’s what you’re looking to invest in, it’s why I wanted to share your vision and excitement with the Stratechery audience. This isn’t just hand waving stuff, there’s real stuff out there and it’s pretty cool and has real business implications.
DG: Yeah, I agree, especially, and I think I’m speaking for both of us, there’s a sense that, in general, the world’s in a very precarious place now. Darkness seems to be coming from a lot of different corners. The previous wave of technology people were talking about in the last two or three years, aesthetically, at least to me, did not feel like it was solving very acute problems in our day-to-day life, and a lot of the metaverse stuff conversely, while interesting, also feels extremely far away.
A bit dystopian. Well, just to jump to it, you’re obviously referring to crypto. What’s interesting is I think Peter Thiel had the phrase about AI being centralized and crypto being decentralized. I wrote a piece about OpenSea this year saying, "No, actually, crypto’s going to be super centralized," because if you have this low friction environment, value’s going to accrue to discovery, which is the whole Aggregator thing, this is just that entire thing on steroids. That was one shoe, the other shoe is, no, it turns out AI, because it’s built on the Internet, it’s inherently democratized.
People talk about the Internet, "Oh, it’s all so centralized," which I mean, I contribute to that because I write about that, but it’s centralized from a value capture perspective because you capture with an ad model by being ahead of discovery in the middle, but Google and Facebook aren’t monopolies. It infuriates people because competition really is a click away, that’s actually true! That aspect and reality of the Internet means that to the extent that AI is built on the Internet, that the Internet is the boot loader for AI, that openness is a quality that it ought to inherit, and it looks like that’s what’s happening and that’s super exciting.
DG: Yeah, I think it will. There’s just way too much excitement from people around the world. The secret’s out and people have seen the light.
That was the big change. People had to wake up to the fact that it could be democratized. That’s why Stable Diffusion might end up amounting to nothing, but it will be one of the greatest products ever just because it will have changed so many people’s minds about what was possible.
DG: Stable Diffusion is that woman in the 1984 ad who just smashed the screen and everyone realized, "Wait a minute. This is for the taking for everyone." So yeah, for me, it’s a daily source of inspiration just because everything out there is actually pretty bleak, but I think this stands to be in, I think everyone’s afraid of being Ray Kurzweil in the ’80s so we’re always a bit cautious here, but I think it stands to be one of the largest sources of products.
DG: I think it’s interesting, and I’m not the first to make this observation that software, in general, the promise that was made by people taking their software companies public was that it had deflationary effects and that we could, in fact, continue to lower inflation rates over time because of the deflationary effect of software.
It was always a little bit vague to me and many others what that actually meant, because at the end of the day, if you look at a day in the life of a lawyer or a tax accountant or a taxi driver, it’s actually not that materially different. When you think of what a deflationary effect should be, free and abundant energy is a deflationary effect. Enough energy so that I can levitate to my house and toast bread in one second, that’s a deflationary effect because before I was paying money for that and now I don’t. Why WhatsApp would be deflationary over calling people is less clear, so I actually think software hasn’t really been that deflationary until now.
I think in hindsight, if we were to be able to zoom out the picture and look back, the idea of digitizing the entire world using the keyboard instead of the pen, I think will have been useful maybe just as a bootloader for AI, because once software is intelligent, then it very clearly becomes deflationary because people have more "manpower" at their disposal. A lawyer now has the force of a hundred or thousand or million paralegals.
I sometimes think the only way to reason about AI in the optimistic case is imagine you discover a new country that is full of people that can work for free that are really smart. That’s what AI-native software I think stands to be. That’s obviously a massive deflationary effect, to effectively have more people at your disposal for zero marginal cost. So yeah, as we think about the world now, I think everyone’s stuck on inflation/deflation, those terms are top of mind, and you think of what the deflationary effects in the future could be. I think software that is automated, truly automated, not yesterday’s automated between five options, we’ve sorted them for you or between these ten Netflix TV shows we’ve picked top two for you. That’s not real automation.
Right. Instead of them deciding what you want, they’re more at your beck and call doing what you need.
DG: Yeah. There’s a difference between recommendation and automation. Automation is you did the work for me, I didn’t do the work, and so I have more time, and so my money goes further. I think large language models, in particular, much more so than image models, will really change the world in this way because today, they can do tasks of maybe someone who’s a very undeveloped child, but that’s going to change over time. I think GPT-4, whenever it’s released, will start a bit of an arms race towards this. I think a lot of blue collar jobs, for now, will stay pretty much the same, because we still haven’t figured out the robotics aspects of this, and dexterity, it turns out, is a really complicated problem, but a lot of white collar jobs are going to be dramatically amplified by the fact that we’re just going to have free intellectual labor. There’ll be a thing you could add to any WhatsApp conversation you’re in or any Slack you’re in, it’ll just do work, scrape stuff from the web, summarize things. That’s real work.
We have gone well over our limit here. This is a long one. We have to continue this because I know there’s some number of people saying, "Yeah, good for you, Mr. Investor. You’re being optimistic about this, but what about the downstream societal effects?" I think I’m just stating that to acknowledge it. It’s real.
I also think there’s some aspect we don’t know what’s going to happen, and that’s also an issue where Stable Diffusion, I think, was this bomb going off because it’s out of the bag. It is going to happen now. It’s just like the Internet. If the powers that be knew what the Internet would unleash, it would’ve never been allowed to come to being, but it’s too late now. I think there’s probably this aspect around AI that is a similar thing and people’s minds are open now. They know what’s possible, they know what could be created.
I did write this piece a while ago about Tech’s Two Philosophies, where one is the bicycle of the mind philosophy, where computers are a tool for you, the other one is where tech takes care of you. Google and Facebook are more, to your point, just recommending stuff to you, "You’ll like this, take this." Apple and Microsoft were more of the, "We’ll give you the implements to do what you wanted to do much more efficiently and larger than possible".
NF: Yeah, I’ve called those the Creator Internet and the Consumer Internet before.
Right. That’s not a statement of values, whether it’s good or bad. I think it’s downstream of whether you’re an aggregator versus a platform, if you’re focused on discovery or if you’re focused on enablement, but I think just to tie this all together, it’s why I’m excited. It’s why I’m optimistic because it felt like AI was going to do nothing but be the Consumer Internet, it was just going to serve you stuff. This idea where actually normal people with normal jobs in their normal life can be more productive and can do more things, and they’re not gated by "Will OpenAI give you permission to do it?" No. The idea that literally anyone can create anything is very exciting.
NF: There’s this question which we don’t know the answer to, which is: how intelligence-limited have we been in general as individuals in our lives or as a species in general? I think we’re excited about open source because it makes this innovation permissionless. If you’re a product person or an entrepreneur, you can just grab one of these models now, you don’t even have to know exactly how to train it, and try to build a product. It’s interesting to see how permissionless it makes creation, the products that make creation.
Midjourney, I’m an advisor to them, and it’s been incredible to see their meteoric rise. I would say the amazing thing about that story is it embodies both of those trends. First, it’s actually a bootstrapped company where David Holz, who created Leap Motion before then, which was also a revolutionary product, just started initially with an open source model, the actual predecessor to Stable Diffusion and then spent months tweaking and tuning it, and he understood, I think, that the model had to have its own flavor and style and had to produce output that was pleasing by default. It’s now one of the largest AI products in the world. He’s got in his Discord two and a half million users. So you use it through the Discord, but his Discord, I noticed, is larger than the Fortnite, Roblox, and Minecraft Discords put together. So I think it may be one of the biggest in the world, if not the biggest, and he’s got millions of people using Midjourney to make images.
You might ask, "Who are these people?" Some of the stories are interesting. There was one David was telling me about recently who’s a trucker, who when he stops at truck stops, he canceled his Netflix, and now what he does is he just makes images for a couple hours before bed, and he’s utterly transfixed by this. To me, that seems like it’s just objectively better than watching Netflix and binging a show; it’s exploring the space of your own ideas and creativity and seeing them fed back to you. So it turns out there’s a lot of people who have this creative impulse and just didn’t have the tools, the manual skills to express it and to create art, and something like Midjourney or something like Stable Diffusion gives them that, and that’s incredibly exciting.
Well, the cat is out of the bag. It’s going to be interesting to see, because technology can be used for good stuff or bad stuff, and there’s going to be a lot of battles and fights over this, which we should talk about. However, we are now 20 minutes over, and I appreciate you guys taking the time. Oh, tell me real quick about AI Grant, and what it is and the deadline that is fast approaching.
NF: So actually, it’s originally something Daniel and I set up more than five years ago to give grants to people doing AI research, and we gave out forty or so grants to people who ended up doing some really fundamental work, and then we decided to reanimate it this year, but in 2022 versus 2017, the need is not new research, the need is products. So what we’ve done is we’ve pivoted it towards this focus on products and there’s a great team led by Evan Conrad, and Daniel and I have invested together $10 million into AI Grant. AI Grant is now going to give out what I think is a no-brainer deal to the companies and the individuals who apply, which is a quarter million dollars in cash on an uncapped SAFE, just incredibly friendly investment terms, and a quarter million dollars in cloud compute that’s been provided by Microsoft through Azure, and then a lot of other things besides. There’s an amazing network of advisors including Noam Shazeer, who’s one of the co-inventors of the transformer architecture, and David Holz of Midjourney, and lots of other really impressive people. Emad Mostaque is one of the advisors. Then lots of credits to other services, whether it’s OpenAI API services or other things like that. So the application process is open now.
When does it close? That’s the big question.
NF: Well, actually, I think it closes two days before this airs, but I’m happy to share here we will accept some late applications.
Use discount code: Stratechery!
NF: (laughing) That’s right. But yeah, if something appears in the next day or two, we will definitely give it attention.
Well, Nat and Daniel, it’s great to have you. As this develops, I’d love to do this again because it’s insane the speed with which things are changing, and we haven’t seen this since the early smartphone era. It’s really exciting.
NF: Thanks, Ben.
DG: Thanks, Ben.
This Update will be available as a podcast later today. To receive it in your podcast player, visit Stratechery.
Thanks for being a subscriber, and have a great day!