Dawn of the Agents
AutoGPT, BabyAGI, and the arrival of autonomous AI
Please share the love with your friends and colleagues who want to stay up to date on the latest in artificial intelligence and generative apps!🙏🤖
What Caught Our Attention In AI?
One of the biggest highlights in the past two weeks was how AutoGPT and BabyAGI have taken the world by storm. Imagine being able to ask your computer to do something like “order me a pizza” or “book me the cheapest flights to Maui” and then having the program accomplish the task, no additional steps required.
Well, that’s the promise of autonomous AI agents. Autonomous agents can be programmed to iteratively start, run, and complete different tasks, employing “human-like” notions such as prioritizing steps and determining how best to accomplish a task. While the concept has been around for several years, going back to early “AI assistants” (remember the original x.ai?), agents are now seeing a resurgence thanks to the recent explosion of large language models and the technical capabilities of GPT-4.
While we’re still in the early innings of seeing autonomous agents being deployed at scale or in enterprise production, we’re excited by the promise of the technology. One only has to look at the massive growth in star count, essentially GitHub’s version of a “like” button, to get a sense of the popularity of these projects.
How do agents differ from “classic” AI?
In the context of AI and foundation models, agents refer to computer programs or systems designed to perform specific tasks without human intervention. Today, the way most of us interact with AI is by prompting ChatGPT. Let’s go back to the example above of booking a flight from New York City to Maui. Today, a user cannot actually complete the action of booking a flight through ChatGPT. Instead, they would likely enter several prompts into ChatGPT, such as:
What are the best times of the year to go to Maui?
What airlines have direct flights from New York City to Maui?
Where can I book the cheapest direct flight?
After querying ChatGPT for relevant information, the user would then have to go to an external travel site to book the flight themselves. If they have access to the Expedia ChatGPT plugin, they will be presented with the relevant link within their conversation window to book a flight on Expedia.
Agents take this experience one step further by automating the entire workflow.
The promise of autonomous agents is that they only require an initial prompt (i.e. the objective) from the user: “book me the cheapest flight from New York City to Maui”. From there, the agent will build a series of tasks that it can self-execute to accomplish the objective. For example, armed with the initial prompt above and enough information, an agent may create a task list like this:
Google the best months of the year to visit Maui
Survey the five airlines that have direct flights from NYC to Maui during those months
Select the lowest-cost direct flight across those airlines for a week-long stay
Input traveler’s information and credit card details into the airline website to book a flight
Send flight confirmation details to the user’s email address
Of course, this example assumes the agent has access to the user’s payment details and other relevant information (which we believe will increasingly happen), but the key is that the agent was able to define a list of tasks and priorities on its own, and fulfilled the original objective without needing any further prompting from the user.
How do agents actually work?
I. Introduction of Agents:
The concept of agents has been around for over a decade, typically in the software context. Agents can be classified into different types based on their characteristics, such as whether they are reactive or proactive, whether they have a fixed or dynamic environment, and whether they are single or multi-agent systems (working together to achieve tasks, or operating as a single agent). Importantly, not all agents are autonomous agents, so let’s quickly level set on what an agent is and how different types of software agents work.
Reactive agents: Respond to their environment and take actions based on stimuli.
Proactive agents: Take initiative and perform an action based on prediction.
Environments in which Agents Operate:
Fixed environment: Static set of rules that do not change.
Dynamic environment: Rules are changing and the agent needs to adapt to new situations.
Some Examples of Types of Agents:
Conversational Agents: Simulate human conversation and can be used to answer questions, provide information, schedule appointments, set reminders, etc.
Recommendation Agents: Designed to provide personalized recommendations based on user data and behavior.
Task-oriented Agents: Perform specific tasks like booking a reservation. Knowledge-based agents are also task-oriented and designed to provide answers to questions based on a database of knowledge.
Autonomous Agents or “Zero-Shot” Agents: AI systems that can act independently of human control, making decisions and taking actions based on their own internal state and the external environment.
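The reactive/proactive distinction above can be made concrete with a toy sketch. Both agents, their stimuli, and the prediction rule here are invented purely for illustration:

```python
class ReactiveAgent:
    """Acts only in direct response to a stimulus from the environment."""

    def perceive(self, stimulus: str) -> str:
        # No internal state: the same stimulus always yields the same action.
        if stimulus == "obstacle":
            return "turn"
        return "continue"


class ProactiveAgent:
    """Takes initiative based on a prediction built from past observations."""

    def __init__(self):
        self.history = []

    def perceive(self, stimulus: str) -> str:
        self.history.append(stimulus)
        # Toy prediction rule: if obstacles have been frequent,
        # slow down preemptively, even before seeing the next one.
        if self.history.count("obstacle") >= 2:
            return "slow down"
        return "continue"


r_action = ReactiveAgent().perceive("obstacle")
p = ProactiveAgent()
p_actions = [p.perceive(s) for s in ["obstacle", "obstacle", "clear"]]
```

The key difference is the internal state: the proactive agent acts on an expectation about its environment, not just the current stimulus.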
II. Putting Agents to Work:
Let’s take BabyAGI as an example. BabyAGI is the brainchild (brainbaby?) of Yohei Nakajima, a venture capitalist and AI builder based in the Seattle area. Originally spawned from a desire to create an “AI founder,” with inspiration from the #HustleGPT movement, Yohei built BabyAGI as a simplified AI agent: a Python script that leverages OpenAI’s GPT-4 language model, an open-source embedding database called Chroma (it previously used Pinecone vector search), and the LangChain framework to perform a wide range of tasks across different domains.
The BabyAGI script works by running an infinite loop that follows the below steps:
Pulls the first task from the task list.
Sends the task to the execution agent, which uses OpenAI's API to complete the task based on the context.
Enriches the result and stores it in Chroma.
Creates new tasks and reprioritizes the task list based on the objective and the result of the previous task.
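The four steps above can be sketched as a Python loop. Everything here is illustrative rather than BabyAGI’s actual code: `call_llm` is a stand-in for a real OpenAI API call, the `memory` list stands in for the Chroma store, and the loop is capped rather than infinite so it terminates:

```python
from collections import deque

def call_llm(prompt: str) -> str:
    """Stand-in for a real OpenAI API call (hypothetical stub)."""
    return f"result of: {prompt}"

def run_baby_agi(objective: str, first_task: str, max_steps: int = 3):
    tasks = deque([first_task])
    memory = []  # stands in for the Chroma vector store
    while tasks and max_steps > 0:
        # 1. Pull the first task from the task list.
        task = tasks.popleft()
        # 2. Send it to the execution agent, which calls the LLM with context.
        result = call_llm(f"Objective: {objective}\nTask: {task}")
        # 3. Enrich the result and store it.
        memory.append({"task": task, "result": result})
        # 4. Create new tasks based on the objective and the previous result.
        tasks.append(f"follow-up to '{task}'")
        max_steps -= 1
    return memory

history = run_baby_agi("book the cheapest NYC-Maui flight",
                       "find direct flights from NYC to Maui")
```

In the real script the new-task and reprioritization steps are themselves LLM calls, which is what lets the loop adapt its plan as results come in.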
BabyAGI’s system works based on three agents that are working together:
Execution Agent - core system that utilizes OpenAI’s APIs to process the tasks. Two key parameters: the objective and the task.
Task Creation Agent - uses OpenAI’s API to create new tasks based on the current objective and previous tasks. Four key parameters: the objective, the result of the previous task, the task description, and the current task list.
Prioritization Agent - uses OpenAI’s API to prioritize the task list. One key parameter: the ID of the current task.
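A rough sketch of how the three agents might be structured around the parameters listed above. The `call_llm` stub and the toy reprioritization rule are assumptions for illustration, not BabyAGI’s actual implementation:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for an OpenAI API call (hypothetical stub)."""
    return f"[llm] {prompt.splitlines()[0]}"

def execution_agent(objective: str, task: str) -> str:
    # Two key parameters: the objective and the task.
    return call_llm(f"Complete this task for objective '{objective}': {task}")

def task_creation_agent(objective, prev_result, task_description, task_list):
    # Four key parameters: objective, previous result, task description, task list.
    prompt = (f"Create new tasks toward the objective.\n"
              f"Objective: {objective}\nLast result: {prev_result}\n"
              f"Last task: {task_description}\nPending: {task_list}")
    return [call_llm(prompt)]

def prioritization_agent(current_task_id: int, task_list):
    # One key parameter: the ID of the current task.
    # Toy rule: tasks created after the current one come first;
    # the real agent asks the LLM to reorder the list.
    return sorted(task_list,
                  key=lambda t: (t["id"] <= current_task_id, t["id"]))
```

In the real system each of these functions wraps a carefully worded prompt template, and the three agents pass their outputs to one another through the main loop.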
III. The Stack:
In the context of BabyAGI, the stack contains a few key components that allow the agents to work. We expect these systems will be relevant across any use of agents.
The Model - An LLM sits at the core of the stack, responsible for completing tasks and generating new tasks based on the completed results. BabyAGI calls OpenAI’s GPT-4 API.
The Vector Database - The vector search platform provides search and storage capabilities for retrieving task-related data and results. Storing in the database also allows agents to reference for context in future tasks. Vector DBs include Chroma, Pinecone, Weaviate, and others.
The Tooling Framework - LangChain is the framework used to enhance system capabilities around task completion, agent-based decision-making, and contextual awareness of data.
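The vector database piece of the stack stores each result as an embedding and retrieves context by similarity. A minimal in-memory sketch, using a toy bag-of-words “embedding” in place of a real embedding model (Chroma, Pinecone, and Weaviate expose a conceptually similar add/query interface, though their actual APIs differ):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MiniVectorStore:
    def __init__(self):
        self.items = []  # list of (embedding, text) pairs

    def add(self, text: str):
        self.items.append((embed(text), text))

    def query(self, text: str, top_k: int = 1):
        q = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [t for _, t in ranked[:top_k]]

store = MiniVectorStore()
store.add("direct flights from NYC to Maui on Hawaiian Airlines")
store.add("best months to visit Maui are April and May")
```

This is why storage matters for agents: a later task like “pick the cheapest flight” can pull back the earlier flight-search result as context without re-running it.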
Other Agents in the Wild
Today there are a handful of projects that have catalyzed resurgent interest in AI agents, most notably AutoGPT and BabyAGI (referenced above), but also more “niche” projects like Westworld and Camel.
The first autonomous agent to burst onto the scene was AutoGPT, which was released as an open-source project on March 30th by game developer Toran Bruce Richards. Billed as an “experimental open-source attempt to make GPT-4 fully autonomous”, AutoGPT builds on ChatGPT’s framework, but essentially pairs GPT with a companion robot that instructs GPT on what actions to take. The companion robot receives instructions from the user and uses GPT and several APIs to carry out the necessary steps to achieve the desired goal. In this way, AutoGPT relies on self-prompting to chain together “multiple LLM thoughts” to achieve a desired goal.
AutoGPT is not yet an off-the-shelf application that any person can just start using. It requires some technical know-how, and users need to be able to connect with OpenAI’s API and create a token-based payment arrangement. That hasn’t slowed down its popularity among developers however; AutoGPT surpassed 100K stars on GitHub in just a few weeks, and amassed a large and passionate community across Discord and Twitter.
Generative Agents (“Westworld”)
Researchers from Stanford and Google created an interactive sandbox environment with 25 generative AI agents that can simulate human behavior. This is made possible by an agent architecture that extends an LLM with three important elements:
Memory and Retrieval: A memory stream contains a list of observations for each agent, each with a timestamp. Which memories get retrieved is based on recency, importance, and relevance.
Reflection: High-level abstract thoughts to help agents make inferences. Synthesizes memories into higher-level inferences over time, enabling the agent to draw conclusions about itself and others to better guide its behavior.
Planning: Translates conclusions to the current environment and creates action plans. Agents can create actions based on the plan and can react and update the plan according to the other observations in the memory stream.
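The memory-retrieval element above scores each memory on recency, importance, and relevance and surfaces the highest-scoring ones. A minimal sketch of that scoring, where the equal weighting, the decay constant, and the example memories are all illustrative assumptions rather than the paper’s exact implementation:

```python
def retrieval_score(memory, query_relevance, now, decay=0.99):
    """Combine recency, importance, and relevance into one score.
    Equal weights are an assumption for illustration."""
    recency = decay ** (now - memory["timestamp"])  # exponential decay
    importance = memory["importance"] / 10          # normalize a 1-10 scale
    return recency + importance + query_relevance

memories = [
    {"text": "saw Klaus reading at the cafe", "timestamp": 95, "importance": 3},
    {"text": "planned a Valentine's Day party", "timestamp": 50, "importance": 8},
]
# Relevance would come from embedding similarity; hard-coded here.
relevances = [0.2, 0.9]
scores = [retrieval_score(m, rel, now=100)
          for m, rel in zip(memories, relevances)]
best = memories[max(range(len(scores)), key=scores.__getitem__)]
```

Note how the older but more important and more relevant memory wins over the fresher, trivial one, which is exactly the behavior the recency/importance/relevance blend is designed to produce.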
CAMEL (Communicative Agents for “Mind” Exploration of Large Scale Language Model)
CAMEL proposes a role-playing agent framework where two agents can communicate with one another. Typically solving tasks in the real world requires multiple steps, so this framework involves a few key components:
AI user agent: Gives instructions to the AI assistant agent.
AI assistant agent: Follows the AI user’s instructions and responds with solutions to the task.
In this setup there is also a task-specifier agent that brainstorms tasks for the AI user and the AI assistant. This task-specifier agent also helps write task prompts without the user having to define them.
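The user/assistant exchange can be sketched as a simple alternating loop. The `llm_reply` stub and canned replies are assumptions; in CAMEL each turn is a real chat-model call with a role-specific system prompt:

```python
def llm_reply(agent: str, incoming: str) -> str:
    """Stand-in for a role-conditioned LLM chat call (hypothetical stub)."""
    return f"{agent} -> {incoming[:40]}"

def role_play(task: str, turns: int = 2):
    transcript = []
    # The AI user agent opens with an instruction derived from the task...
    instruction = f"Work on: {task}"
    for _ in range(turns):
        transcript.append(("user", instruction))
        # ...the AI assistant agent responds with a solution...
        solution = llm_reply("assistant", instruction)
        transcript.append(("assistant", solution))
        # ...and the user turns that solution into the next instruction.
        instruction = llm_reply("user", solution)
    return transcript

log = role_play("develop a trading bot")
```

The point of the framework is that this back-and-forth decomposes a multi-step task without the human having to script each step.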
What are the limitations today?
While the promise of autonomous agents is immense, we have yet to see any fully autonomous agents being deployed in major enterprise use cases. There are a few reasons behind this:
Specifying the objective and reliably translating the prompt into action: Turning natural language into executable steps still introduces many points of failure, and the challenge compounds as agents have to interpret and order multiple complex tasks.
Security and authorization: In order for agents to be ready for production use cases, there will need to be strong security and authorization layers. What should the agent have access to, and how do you integrate across different layers?
Hallucinations: Models still have hallucination problems, which means information retrieval and interpretation can remain inaccurate.
Cost: Assuming agents are running prompts at $0.05 per 1K tokens and each prompt carries ~10K tokens of context, each prompt costs ~$0.50. This can meaningfully add up if an agent is running 8 hours per day, 365 days per year (think a customer service agent). At 2,920 work hours per year and 1 prompt per second, this would be ~$5.3M/year to run - much more expensive than a human customer success agent!
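The arithmetic behind that estimate, spelled out (the pricing and usage figures are the same assumptions as above, not measured data):

```python
# Assumed GPT-4-class pricing and usage, matching the estimate in the text.
price_per_1k_tokens = 0.05
tokens_per_prompt = 10_000          # ~10K tokens of context per prompt

cost_per_prompt = price_per_1k_tokens * tokens_per_prompt / 1_000  # ~$0.50

hours_per_year = 8 * 365            # 2,920 hours
prompts_per_year = hours_per_year * 3600  # 1 prompt per second
annual_cost = prompts_per_year * cost_per_prompt  # ~$5.3M
```

Token pricing, context length, and prompt frequency are the three levers here; halving any one of them halves the annual bill.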
Building out ready-to-use agents / APIs: In order for autonomous agents to take action in the real world, they will need more APIs to interact with. Pre-built agents and APIs will help speed up this process.
What innovation can we expect to see in the years ahead?
The Copilot becomes the Pilot → We’ve talked about AI pair programming at length, such as how GitHub Copilot is already being used to write billions of lines of code. With AI Agents, the “copilot” may become the “pilot” itself, with the agent not only generating code but modifying it, recursively debugging, and moving code into production. Here’s an early example.
“Prompt Engineering” gets replaced → There has been much discussion of the role that “prompt engineers” will play in an AI-first world, and the six-figure salaries they can command. However, if agents are able to coax the best possible results out of the model without human intervention, the need for this type of role would lessen (or be altered to a “guider” vs. a “prompter”).
Integrations with Plugins → OpenAI’s Plugin ecosystem allows third-party apps like Expedia and Instacart to unlock the last-mile “action” in a conversation. These actions will become more important as autonomous agents take shape, as they cannot actually complete most tasks without access to user data, payment information, etc., making integrations between agents and third-party tools more important.
In the AI world, we may remember April 2023 as the month that autonomous agents took their first steps. While there are still plenty of limitations and obstacles to keep in mind, we are excited for the role agents will play in the AI ecosystem. Check out other great pieces on this topic, including from Krishna and Sophia Yang.
Below we highlight select private funding announcements across the Intelligent Applications sector. These deals include private Intelligent Application companies who have raised in the last two weeks, are HQ’d in the U.S. or Canada, and have raised a Seed - Series E round.
New Deal Announcements - 04/14/2023 - 04/17/2023:
Special shoutout to Madrona portfolio companies, Bobsled, Groundlight, and Lexion on their recent financings! Bobsled is revolutionizing data sharing across platforms and is a critical part of the modern data stack. Groundlight is building a world-class AI-computer vision system platform, making high-quality CV as simple as integrating an API service like Twilio. Lexion is a contract management software that is leveraging AI to help operations teams get deals done faster.
We hope you enjoyed this edition of Aspiring for Intelligence, and we will see you again in two weeks! This is a quickly evolving category, and we welcome any and all feedback around the viewpoints and theses expressed in this newsletter (as well as what you would like us to cover in future writeups). And it goes without saying but if you are building the next great intelligent application and want to chat, drop us a line!