All The Fun You Can Have with Foundations
The Arrival of ChatGPT (plus Stable Diffusion 2 and Galactica)
We were at AWS re:Invent this week (with 50K+ attendees it’s basically like Woodstock for IT professionals), and fully expected to be writing about all the new offerings from AWS in the Data, AI and ML space (Amazon Aurora zero-ETL integration with Amazon Redshift, Amazon Athena for Apache Spark, AWS Glue Data Quality, and many other product announcements).
That all changed with the arrival of ChatGPT late Wednesday morning, which has been dominating AI Twitter for the past 48 hours:
ChatGPT is billed as a “conversational AI system that listens, learns, and challenges”, but that hardly does it justice. The experience of asking virtually any question in natural language and receiving an articulate, nuanced response (with practically zero latency!) is downright magical. In fact, ChatGPT could probably write a better definition for this post itself than we could…
ChatGPT is only the latest in a bevy of interesting application releases spawned from foundation models recently. In the past three weeks alone we’ve seen the launch of Stable Diffusion 2, Galactica (a 120B-parameter model from Meta AI and Papers with Code), and now ChatGPT. Each of these releases is very different from the others and varies in its level of controversy (which we’ll address more below), but together they offer a good opportunity to talk about what foundation models are and why they’re important for intelligent applications.
So what even are Foundation Models?
Our colleague Jon Turow co-wrote an excellent (no, we’re not biased) deep-dive on foundation models recently which we won’t regurgitate here, but to quickly summarize: foundation models can be thought of as models trained on broad data sets that can be adapted to a wide range of general-purpose tasks (e.g., text, video, code). Some of the most well-known FMs today include GPT-3 and DALL-E 2 from OpenAI, Stable Diffusion from Stability AI, BERT, and Megatron-Turing.
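To make the “one model, many tasks” idea concrete, here is a minimal sketch of our own (not from any of the work cited here) using Hugging Face pipelines; the model ids below are small, illustrative stand-ins for much larger foundation models.

```python
# A minimal sketch of reusing pretrained models across tasks with Hugging Face
# pipelines. The model ids are illustrative stand-ins, not anything prescribed here.
from transformers import pipeline

# Zero-shot classification: no task-specific fine-tuning, just a pretrained model
# plus a natural-language description of the candidate labels.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(classifier(
    "ChatGPT writes surprisingly good poetry.",
    candidate_labels=["technology", "sports", "cooking"],
))

# The same "pretrain once, adapt broadly" idea applied to open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Foundation models are", max_new_tokens=30)[0]["generated_text"])
```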
While versions of self-supervised models have been around for some time, FMs have grown rapidly in scale and sophistication over just the past few years (and at an exponential clip). For example, the model size (measured in number of parameters) of Megatron-Turing NLG, released by Microsoft and NVIDIA in October 2021, was roughly 3x that of GPT-3, which came out in 2020, and more than 1,000x that of BERT, released just a couple of years earlier. This is one of those stats that even the 🤯 emoji doesn’t fully capture…
There are a number of innovations behind this mind-boggling growth (better hardware in the form of GPUs, cutting-edge distributed training software, the availability of large training data sets, etc.) which we won’t go into here, but suffice it to say that FMs have reached a size and scale that has broken through into the public consciousness, and we are likely to continue to see amazing breakthroughs in both the near and long term.
If you want to read more on the topic, there are plenty of great publications on FMs out there; two we like are Stanford’s “On the Opportunities and Risks of Foundation Models” and “Reflections on Foundation Models.”
What’s happened recently? Stable Diffusion 2, Galactica, and ChatGPT
In the past three weeks alone, we’ve seen the release of Stable Diffusion 2, Galactica, and ChatGPT.
Stable Diffusion
Stable Diffusion is a text-to-image model built on prior work on Latent Diffusion Models, with breakthroughs in speed and quality that allow it to run on consumer GPUs. The first version of Stable Diffusion was released in August and quickly racked up 33K stars on GitHub within three months. However, it was not without controversy, particularly around copyright infringement. Stability AI released Stable Diffusion 2 last week; key updates (with a short usage sketch after the list) included:
Models trained with a new text encoder, OpenCLIP, that improves the quality of generated images
An Upscaler Diffusion model that enhances image resolution by 4x
New text-guided inpainting models to quickly swap out parts of an image
New filters to restrict adult content
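For the curious, here is a minimal sketch of trying Stable Diffusion 2 with the Hugging Face diffusers library; the model id “stabilityai/stable-diffusion-2” and a CUDA-capable GPU are assumptions on our part, not details from the release itself.

```python
# A minimal sketch of generating an image with Stable Diffusion 2 via the
# Hugging Face diffusers library. The model id and the presence of a CUDA GPU
# are assumptions; half precision keeps memory usage within consumer-GPU range.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("an astronaut riding a horse on mars, photorealistic, 4k").images[0]
image.save("astronaut.png")
```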
While Stable Diffusion 2 gives users the ability to generate even more lifelike and creative images, some have complained that the new version makes it more difficult to generate images in the style and tone of specific artists. It will be interesting to see how Stability AI and other text-to-image model makers walk the delicate line between creative expression and copyright infringement.
Galactica
Meta AI announced Galactica, a new 120B-parameter large language model trained on scientific literature. Papers with Code open sourced the models and highlighted the demo here. They describe a model that “can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” Impressive!
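Since the model weights are open sourced, you can poke at them directly. Below is a minimal sketch of prompting a Galactica checkpoint with Hugging Face Transformers; the model id “facebook/galactica-1.3b” is an assumed smaller sibling of the 120B model, chosen so the example fits on modest hardware.

```python
# A minimal sketch of prompting an open-sourced Galactica checkpoint with
# Hugging Face Transformers. The model id "facebook/galactica-1.3b" is an
# assumption -- pick whatever size your hardware can hold.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-1.3b")

prompt = "The attention mechanism in the Transformer architecture"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```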
While Meta released a demo of Galactica to the public on November 15th, it lasted a grand total of three days before being taken down. Galactica highlights a fundamental problem with foundation models: their inability to distinguish truth from falsehood.
To be fair, the Galactica team highlights the model’s limitations on its website, and despite the Twitter and public outrage, the model is actually quite good, outperforming peers like BLOOM and Chinchilla on scientific benchmarks (e.g., mathematical reasoning on MMLU). Nonetheless, the rather unsuccessful launch of Galactica reminds us how important it is to understand the risks of these models and the ways they can be dangerous vehicles for creating and spreading false information.
ChatGPT
As mentioned above, ChatGPT was announced by OpenAI on Wednesday and has quickly taken AI Twitter by storm. It allows users to input any question in natural language, and the “chatbot” interface spits out an answer that makes you think you are talking to a (very erudite) human. ChatGPT is essentially an application layer sitting on top of OpenAI’s GPT-3.5 series of foundation models, fine-tuned with reinforcement learning from human feedback (RLHF), which allows it to take advantage of the enormous memory and computing power behind OpenAI’s signature foundation models. Unlike general-purpose GPT-3, ChatGPT is far more user-friendly and has filters in place to avoid some of the thorny issues that have befuddled other foundation models.
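ChatGPT itself has no public API as of this writing, but here is a rough sketch of how you might approximate a bare-bones chat loop on top of OpenAI’s existing completions API; the model name “text-davinci-003” and the pre-1.0 openai client interface are our assumptions, not how ChatGPT is actually served.

```python
# A rough approximation of a chat loop using OpenAI's completions API.
# "text-davinci-003" and the pre-1.0 openai client interface are assumptions;
# this is not ChatGPT's own implementation, just a sketch of the pattern.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

history = "The following is a conversation with a helpful, erudite AI assistant.\n"
while True:
    user = input("You: ")
    history += f"Human: {user}\nAI:"
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=history,
        max_tokens=256,
        temperature=0.7,
        stop=["Human:"],
    )
    answer = resp["choices"][0]["text"].strip()
    history += f" {answer}\n"
    print(f"AI: {answer}")
```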
Ben Tossell assembled a list of amazing examples of what you can do with ChatGPT:
To capture the breadth and depth of ChatGPT’s capabilities, Riley Goodside asked ChatGPT to explain a complex computer science algorithm, with accompanying examples in code, but all in the specific style of a 1940s gangster. The results are both stunning and hilarious:
We highly encourage you to play around with it here - it’s fun!
Funding News
Below we highlight select private funding announcements across the Intelligent Applications sector. These deals include private Intelligent Application companies that raised in the last two weeks, are HQ’d in the U.S. or Canada, and raised a Seed through Series E round.
Despite the Thanksgiving break, there was no shortage of activity, with 14 new deal announcements totaling $310M of capital raised. A special shoutout (and shameless plug) to our newest Madrona portfolio company, Deepgram, which announced its $47M funding round earlier this week. We share more of our insights on why we believe Deepgram will unlock conversational intelligence here.
We hope you enjoyed this edition of Aspiring for Intelligence, and we will see you again in two weeks! This is a quickly evolving category, and we welcome any and all feedback around the viewpoints and theses expressed in this newsletter (as well as what you would like us to cover in future writeups). And it goes without saying but if you are building the next great intelligent application and want to chat, drop us a line!