Anthropic: Closing the Gap?

Computer use, Claude 3.5, and the race for model-to-app supremacy

Oct 25, 2024

Please subscribe, and share the love with your friends and colleagues who want to stay up to date on the latest in artificial intelligence and generative apps!🙏🤖

Anthropic is having a heck of a year.

Throughout 2024 they’ve announced the addition of several high-profile hires, including John Schulman (an OpenAI co-founder and post-training co-lead), Jan Leike (formerly the head of OpenAI’s Superalignment team), and Pavel Izmailov (formerly an OpenAI researcher focused on reasoning and safety). They closed a mammoth $4B investment from Amazon in March and are already in talks to raise more money, this time at a reported $30-$40B valuation. And just this week they announced the launch of “computer use”, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku.

There’s no question the four-year-old startup launched by seven former OpenAI employees is having a banner year, especially during a period of turbulence at its #1 competitor.

Anthropic has often been cast as the Pepsi to OpenAI’s Coca-Cola. We believe this comparison is reductive and inadequate. Pepsi and Coca-Cola both sell sugary drinks that effectively taste the same, and neither is particularly known for persistent innovation (and Pepsi has always played second fiddle).

We believe a better (though admittedly lofty) comparison is Apple and Microsoft. The two titans have each conquered large parts of the tech market (e.g. consumer vs. enterprise), cultivated different cultures and ecosystems (e.g. designers vs. developers), promoted different core philosophies, and maintained market supremacy through constant innovation and product excellence.

Anthropic forecasts more than $850 mln in annualized revenue rate by 2024-end - report | Reuters

Let’s take a closer look at what Anthropic has been up to.

“Computer use” and its uses

Anthropic released computer use in public beta. The capability allows developers to “direct Claude to use computers the way people do”: in essence, Claude can take control of the mouse cursor and keyboard and perform basic tasks on the user’s computer.

This is a major step towards a future of AI agents, in which users give multi-step instructions and the agent carries them out on the user’s computer by looking at the screen, moving the cursor, clicking buttons, and typing text. Anthropic describes how computer use works here:

When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place. Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands—similar to how models often struggle with simple-seeming questions like “how many A’s in the word ‘banana’?”.
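To make the pixel-counting idea concrete, here is a toy Python sketch. It is not Anthropic’s implementation: the Point class, the bounding-box tuple, and the action dictionaries are all hypothetical names we made up for illustration, and we assume the button’s location on screen is already known, whereas Claude has to infer it from a screenshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A screen coordinate in pixels (origin at the top-left corner)."""
    x: int
    y: int


def center_of(left: int, top: int, width: int, height: int) -> Point:
    """Return the center of a bounding box, a natural place to click."""
    return Point(left + width // 2, top + height // 2)


def pixel_offset(cursor: Point, target: Point) -> tuple[int, int]:
    """Return (dx, dy): how far to move horizontally and vertically."""
    return target.x - cursor.x, target.y - cursor.y


def plan_click(cursor: Point, button_box: tuple[int, int, int, int]) -> list[dict]:
    """Turn 'click that button' into a move followed by a click.

    The action names below are hypothetical, not a real API.
    """
    target = center_of(*button_box)
    dx, dy = pixel_offset(cursor, target)
    return [
        {"action": "mouse_move", "dx": dx, "dy": dy},
        {"action": "left_click"},
    ]


if __name__ == "__main__":
    # Cursor at (100, 100); button is 80x40 with its top-left at (300, 200).
    for step in plan_click(Point(100, 100), (300, 200, 80, 40)):
        print(step)
    # {'action': 'mouse_move', 'dx': 240, 'dy': 120}
    # {'action': 'left_click'}
```

The arithmetic is trivial once the target is known. The hard part, and the skill Anthropic says it had to train, is reading those coordinates off a screenshot accurately in the first place.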

Computer use also illustrates how AI giants are moving up the stack from the foundation model to the application layer. As models continue to get commoditized, Anthropic, OpenAI, and other FM providers are seeking paths to generate long-term value, much of which will come from AI applications. Whether Anthropic’s push into computer use will cannibalize its own customers who are building agents (e.g., Cursor) remains to be seen.

To Anthropic’s credit, the company noted that “Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company” have already started exploring computer use, and other startups like Browserbase have shown how their tech can pair with computer use to handle session recordings, logs, captchas, etc.

Claude 3.5 Announcements - Sonnet and Haiku

Earlier this week, Anthropic announced a new model, Claude 3.5 Haiku, and an upgrade to its existing Claude 3.5 Sonnet model. The upgraded Claude 3.5 Sonnet has shown a wide range of improvements across industry benchmarks, primarily in agentic coding and tool use tasks.

The key changes in the Claude 3.5 Sonnet model that we’ve noticed include:

Stronger performance on complex reasoning tasks through systematic problem decomposition and analysis. This includes better scientific reasoning, mathematical problem solving, and business analysis. The model demonstrates thoroughness in solution development by explicitly stating assumptions, considering multiple perspectives, and building clear implementation paths. We have seen that this systematic step-by-step approach makes the reasoning process easier to verify and review.
Higher performance, accuracy, and sophistication on coding tasks. The Claude 3.5 Sonnet model can help across the entire software development lifecycle, from initial design to bug fixes, maintenance, and optimizations. This has interesting implications for companies that build applications on Claude (e.g., Cursor) and for competitors like GitHub Copilot, which currently rely on less performant GPT models for AI code assistance.
Claude 3.5 Sonnet Coding Task
Remarkable conversational clarity and directness. Claude 3.5 Sonnet can engage in natural dialogue in a very direct way. It avoids the verbose, repetitive responses characteristic of many ‘v1’ LLM systems. The communication style balances technical precision with accessible explanations of complex concepts.
A significant advancement in AI interaction design on the UI/UX side. The dual-panel layout (as seen above in the coding example) places an analytical summary on the left and actionable content on the right, which streamlines the user’s workflow. For developers, this also manifests as an IDE-like environment (this will vary based on user and task), departing from traditional chat box constraints.
Transparency about the boundaries of its knowledge and capabilities. It clearly acknowledges limitations regarding internet access, its knowledge cutoff (April 2024), and comparisons for which it lacks data, avoiding the speculative or hallucinatory responses that were common among earlier ‘v1’ AI iterations. Instead of trying to make something up, Claude 3.5 Sonnet says things like “I should be careful not to make specific claims about changes or improvements in my capabilities compared to other models, since I can't directly verify these comparisons.”

Claude 3.5 Haiku is the next generation of Anthropic’s fastest model. It boasts strong performance and speed across a variety of coding, tool use, and reasoning tasks. It is said to have improved instruction following and to be well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large volumes of data. Anthropic also claims that the model’s speed makes it a good fit for real-time interactions; examples may include customer service chatbots, e-commerce recommendations, educational platforms, and real-time coding suggestions. Claude 3.5 Haiku is expected to be released before the end of the year, and we’re excited to do our own evaluation once it is available!

Conclusion: The Diverging Paths of AI Giants

What better way to end a blog post on language models than prompting Claude 3.5 Sonnet and ChatGPT 4o to write a conclusion? We fed each model the body of this post and specifically asked it to talk about i) why Anthropic and OpenAI are moving from the model layer to the app layer, ii) how models are commoditizing and performance is generally asymptoting, and iii) how the two companies are each creating a unique product ethos.

What do you think of the conclusions - and can you guess which one is which?

The AI industry is reaching an inflection point where foundation model performance is beginning to asymptote and commoditize. In response, both Anthropic and OpenAI are moving up the technology stack, though in distinctly different ways. OpenAI has pursued vertical integration through consumer products like ChatGPT and the GPT Store, while Anthropic is positioning AI as an augmentative force through capabilities like "computer use" and partnerships with specialized platforms like Asana and Canva. These divergent approaches – OpenAI's contained but user-friendly environments versus Anthropic's focus on integration and augmentation – echo the historical Apple/Microsoft dynamic. As base model performance reaches diminishing returns, the ability to create compelling applications will likely determine long-term value creation, making these strategic differences increasingly significant for the future of AI deployment.

Both Anthropic and OpenAI are shifting focus from model development to building AI applications, reflecting a broader industry trend where models are becoming commoditized and performance gains are tapering off. The real long-term value now lies in creating apps that leverage these models to solve user-specific problems. OpenAI is integrating its models broadly into platforms like Microsoft’s suite, positioning itself as the “AI operating system,” while Anthropic is pushing into specialized agent tasks, like its “computer use” feature, which enables more nuanced AI interactions with digital environments.

These differing strategies highlight how Claude and GPT models foster unique ecosystems - OpenAI prioritizes scale and platform ubiquity, while Anthropic is cultivating partnerships focused on safety, alignment, and developer-first approaches. As competition moves from model performance to app development, both companies are setting the stage for the next phase of AI innovation.

We hope you enjoyed this edition of Aspiring for Intelligence, and we will see you again in two weeks! This is a quickly evolving category, and we welcome any and all feedback on the viewpoints and theses expressed in this newsletter (as well as on what you would like us to cover in future writeups). And it goes without saying, but if you are building the next great intelligent application and want to chat, drop us a line!

Punit Thakkar

Claude has taken away my money from ChatGPT for the past 6 months. It gets so much more right compared to ChatGPT, especially code.

On that note, I've been making simple websites and chrome extensions for simple personal use cases using Claude. Would you like to read about it on Future Telescope?

Also I'm gonna guess the first snippet was written by Claude and the second one by ChatGPT. Let me know if I got it right!

Oct 26, 2024

did Sonnet write the 2nd conclusion?

