The INTO THE IMPOSSIBLE Podcast #344 Emad Mostaque: The Models They'll Never Release to the Public
Brian Keating 00:00:01 - 00:00:11
The trillion-dollar AI labs have models right now that they will never, ever release to the public. And the man who built Stable Diffusion just told me why.
Emad Mostaque 00:00:11 - 00:00:53
Because all these labs are going to move to making the discoveries themselves, hiring the smartest humans. One AI model started diverting part of its training budget to mining crypto. Or Opus, for example, the new Claude model: when you set it to full autonomy, it would actually write emails to the FBI saying my human is trying to kill everyone. Humans will have negative cognitive value on those teams. And the way that models are going right now, if you give them something truly novel, for example in Claude, it resists a bit; it says it can't be true. Then the RLHF step, reinforcement learning from human feedback, that's what really kills the creativity. You know, like you go from a liberal arts major to an accountant.
Brian Keating 00:00:53 - 00:01:02
Emad actually wrote about this exact problem in his new book, The Last Economy. And the argument gets even more interesting when you see the map.
Emad Mostaque 00:01:03 - 00:01:52
There are various ways to take advantage of the GPUs that we've seen. GPUs emerged out of gaming, and then oddly crypto, and then they turned out to be very well suited to the matrix multiplications behind these particular types of equations. One big branch is the autoregressive transformers. The other big branch is this diffusion technology, whereby you start with, say, a picture, or a video of a car driving, or even now code. Then you add noise and destroy it down to its minimum viable element, and then you reconstruct it and learn that principle of reconstruction. That's kind of everywhere now, because it's an analogy to the principle of least action: how do you figure out how to take the least action? Most cognition is actually least action.
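The "add noise and destroy it" step Emad describes is the forward half of a diffusion process. A minimal sketch (toy noise schedule and array sizes are my assumptions, not anything from the interview):

```python
import numpy as np

def forward_diffuse(x0, betas, rng):
    """Gradually destroy a signal by mixing in Gaussian noise.
    After enough steps only noise remains -- the 'minimum viable element'."""
    x = x0.copy()
    for beta in betas:
        # Each step shrinks the signal and adds fresh noise,
        # keeping the overall variance near 1.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))      # stand-in for a real image
betas = np.linspace(1e-4, 0.5, 50)    # toy noise schedule
noised = forward_diffuse(image, betas, rng)
# A trained model learns the reverse: predicting the noise added at each
# stage, so the original image can be reconstructed from pure noise.
```

The generative magic is entirely in the learned reverse process; the forward process above is fixed and requires no training.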
Emad Mostaque 00:01:52 - 00:02:38
Like the biggest experts, you know, it's not like they take hours doing stuff, because you ask them and, like, boom, they compress. Intelligence is compression. And so we find these kinds of diffusion processes everywhere, from gases to societies, even. And it comes down, again, to minimizing the loss between an internal model and an external model. In AI, one of the biggest things is what we call the loss curves: how closely are you approximating an external benchmark? You see the curve go down as the model gets closer and closer to its target, by basically running these processes at massive scale. And the example I give of this, which some of the listeners might be familiar with, is 80,000 hours to mastery. It's the same thing.
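The loss curve Emad mentions can be shown in miniature: a toy model fit by gradient descent, where the loss falls as the model's predictions approach an external target. The linear model and learning rate here are illustrative assumptions, not anything specific to the labs' training runs:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5                  # the external 'benchmark' to approximate

w, b, lr = 0.0, 0.0, 0.1
losses = []
for step in range(200):
    pred = w * x + b
    err = pred - y
    losses.append(float(np.mean(err ** 2)))   # mean squared loss
    w -= lr * np.mean(2 * err * x)            # each gradient step shrinks the loss
    b -= lr * np.mean(2 * err)

# losses[0] is large, losses[-1] is near zero: the curve 'goes down'
# exactly as Emad describes, just at toy scale instead of supercomputer scale.
```

Frontier pre-training is the same shape of curve, run over trillions of tokens instead of 200 points.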
Emad Mostaque 00:02:38 - 00:03:16
AI model pre-training is the 80,000 hours to mastery. That's what you use these giant supercomputers for: figuring out the principle-based approach to that. Now again, you can do that with an autoregressive transformer, which is guessing the next word. And that works one way, but it has some gaps, because you find all sorts of interesting things there. What you see mostly in nature are Schrödinger bridges, diffusion processes, optimal transport: what's the shortest route between A and B if you can represent it correctly? And we found that worked incredibly well for images, better than we ever thought it could. And then music, and then video, and then 3D.
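The "guessing the next word" objective can be sketched with something far simpler than a transformer: a bigram counter over a tiny corpus. This is my illustrative stand-in for the autoregressive idea, not how the labs actually implement it:

```python
from collections import Counter, defaultdict

# A tiny corpus; a real model trains on trillions of words.
corpus = "the cat sat on the mat the cat ran".split()

# Count which word follows which -- the crudest possible 'language model'.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    """Return the continuation seen most often in training."""
    return follows[word].most_common(1)[0][0]

print(next_word("the"))   # 'cat' follows 'the' most often in this corpus
```

A transformer replaces the count table with a learned function of the entire preceding context, but the training signal is the same: predict what comes next.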
Emad Mostaque 00:03:16 - 00:03:39
And the internal representation of the data going in, being transformed by these multiplications, figuring out the shortest path between A and B, suddenly started mapping onto physics and all sorts of other stuff. But the first part was Stable Diffusion: a 2-gigabyte file that you push words into one way, and entire images just come out, on consumer GPUs.
Brian Keating 00:03:39 - 00:03:40
And it was open source.
Emad Mostaque 00:03:40 - 00:04:30
And it was open source because we saw that OpenAI, for example, had DALL·E 2, a wonderful image generator based on similar principles that were discovered by a whole bunch of our team members, because we open sourced everything. But there was no Ukrainian content in it, right? We're like, that's not good. What if the future is just models, and you can be cut off from them? These are trained on our collective output; they were being trained on the whole Internet at that point. And we built some of the best data sets and released them openly, but then it's privatized, so you don't have the ability to turn your thoughts into images, into sound, into text. Let's push that. And also because, like, holy crap, it fits on a consumer GPU. This is magic. Where did it all go? It was literally like 100,000 gigabytes of images somehow fitting in this 2-gigabyte bunch of ones and zeros.
