Hope everybody enjoyed the Fourth of July holiday! Thank you to our now 1K+ subscribers; we appreciate all the continued support! Please continue to share with colleagues and friends :)
Last week was a big one in the data and AI world: two of the most significant players in the space, Databricks and Snowflake, hosted their annual conferences (Databricks’ Data and AI Summit in San Francisco and Snowflake’s Summit in Las Vegas). It’s likely no coincidence that the two giants held their marquee events in the same week. Over the past decade, Snowflake and Databricks have been both friend and foe, but last week made it glaringly obvious that they are now arch-competitors, and the new battleground is AI.
It should be no surprise that the majority of the discussions and announcements at both conferences centered on Generative AI. The major theme was that to have a Generative AI strategy, every company first needs a data strategy. Unsurprisingly, Databricks and Snowflake are each making the case that they are best positioned to help customers on that journey.
How did two companies that began life at different parts of the value chain, and at one time even enjoyed a strategic partnership, evolve into such fierce competitors in this new age of AI?
Let’s dig in.
[Quick disclaimer: Madrona invested in Snowflake’s Series C and still holds some shares in the company.]
Snowflake: From Data Warehouse to Data Cloud
Snowflake was founded in 2012 by Benoît Dageville and Thierry Cruanes, two database experts who had previously spent many years at Oracle, where they made the astute observation that most data warehouses were “rigid, expensive and difficult to use.” Dageville and Cruanes teamed up with Marcin Zukowski, former CEO of Vectorwise (now Actian Vector), to build the data warehouse of the future, based on three key premises: 1) a fully cloud-based architecture; 2) the separation of compute from storage to allow near limitless scaling; and 3) elasticity in how computing resources are used, resulting in unprecedented query speeds and flexibility.
Today, Snowflake has evolved from “simply” a cloud data warehouse into a “Data Cloud”, a single platform for customers to access, build, collaborate on, and monetize their data. In just over a decade it has grown into a $55B market cap public company serving 6,000+ customers, including much of the Fortune 500. Having muscled its way alongside the major hyperscalers (Azure, AWS, and GCP), Snowflake has now clearly set its sights on gaining more mindshare in artificial intelligence.
To do so, they have made a number of acquisitions and product launches in AI and ML, including:
Snowpark, which allows data scientists to work with their preferred programming languages to enable end-to-end ML workload development, deployment, and orchestration. Through Snowpark, customers can ingest, analyze, and transform their data to train ML models and run more predictive analytics.
Streamlit, an open-source Python framework for building data apps that Snowflake acquired for $800M in March 2022, allows customers to develop data-intensive apps with only a few lines of code. Streamlit makes it easier to present data analytics and ML model outputs through front-end web applications.
Neeva, which Snowflake acquired earlier this year in a push to accelerate how businesses search and interact with their data, particularly in a more conversational way.
Databricks: Building The Lakehouse
Databricks was founded in 2013, just one year after Snowflake. Unlike Benoît and Thierry, who were industry practitioners, Databricks was founded by a group with deep roots in academia and the open-source community. Its seven original cofounders, including current CEO Ali Ghodsi, were researchers at UC Berkeley’s AMPLab, where they conceived Apache Spark, an open-source unified analytics engine for large-scale data processing. Spark has grown into one of the largest and most widely used data processing frameworks, powering data engineering, data science, and machine learning at scale.
Databricks was initially formed to commercialize Spark, introducing an enterprise-grade version of Spark with all the features (governance, support, hosting, etc.) that large organizations required. Databricks has since evolved into the novel “Lakehouse Platform” unifying data, analytics, and AI. The unified Lakehouse concept brings together “one platform for integration, storage, processing, governance, sharing, analytics, and AI.”
In the past ten years, Databricks has become one of the most highly valued private companies in the world, last valued at $38B in 2021 and recently crossing the $1B revenue milestone. They serve thousands of enterprise customers and open-source users, and are considered one of the most hotly anticipated IPOs. Throughout all of this growth, they are increasingly positioning themselves as a leader in AI, and recently made key acquisitions and announcements, including acquiring MosaicML for $1.3B (covered more below), and open-sourcing Dolly, an instruction-tuned LLM trained for less than $30.
Colliding in AI
Snowflake and Databricks are both well-positioned to continue capitalizing on long-term secular trends as enterprises prepare for the Generative AI paradigm shift. With the proliferation of Generative AI applications, both companies are trying to establish themselves as strategic multiproduct data platforms. Below, we highlight a few of the major announcements from the respective conferences and our thoughts on each company’s overall AI strategy.
Snowflake Major Announcements:
Developer Announcements:
Snowflake’s Native App Framework: A new way of putting data to work that allows developers to build, distribute, and monetize applications that run and scale within Snowflake’s Data Cloud.
Snowpark Container Services: Extends Snowflake’s data programmability and compute infrastructure to support additional programming languages and third-party software, with enhanced security and governance for hosting full-stack apps and LLMs. It adds flexibility by generalizing Snowflake’s compute platform so that customers can run a full end-to-end application, from the bottom of the stack (the data layer) all the way up to the UI layer.
Other Notable Announcements: Snowpipe Streaming capabilities; Dynamic Tables (formerly called Materialized Tables); Document AI (a new service for extracting data from unstructured documents); and Iceberg Tables.
Partnership Announcements: Snowflake announced several notable partnerships, including with Nvidia, Microsoft, and Weights & Biases.
With Nvidia, Snowflake is planning to embed Nvidia’s NeMo enterprise developer framework into its Data Cloud, which will allow Snowflake customers to build and deploy LLMs and AI-driven applications using proprietary data that resides in Snowflake.
With Microsoft, Snowflake is extending its Azure partnership to focus on new product integrations with Azure OpenAI and Azure AI/ML services. The partnership has the potential to bring more workloads and customers into the Data Cloud.
With Weights & Biases, a leading MLOps platform, Snowflake’s Container Services will let Weights & Biases accelerate the iterative development of ML models, LLMs, and LLM-powered applications inside the Snowflake Data Cloud. Ultimately, this partnership should help enterprises and users more easily build and leverage generative AI.
Beyond these three, Snowflake announced a number of other partnerships with companies like Alteryx, Hex, Dataiku, RelationalAI, Pinecone, and more.
Our Take
Until very recently, Snowflake had not revealed any plans for adding generative AI to its existing capabilities, and many investors expressed concern that Snowflake was being out-competed in this space (particularly by Databricks). However, at the 2023 Summit, Snowflake landed a strong story around its vision to be a platform for Generative AI, positioning itself as the trusted data cloud provider.
Snowflake’s partnership with Nvidia, along with the announcement of Snowpark Container Services, helps give them a foothold as a more viable player in the AI data stack. Their core message is that they can enable customers to securely access, develop, and deploy LLMs and AI-driven applications within the Snowflake Data Cloud while providing access to accelerated computing with Nvidia GPUs and AI software.
While their story and messaging are impressive, we believe they are still behind the eight ball relative to Databricks in AI…
Databricks Major Announcements:
Developer Announcements:
LakehouseIQ: An LLM-powered natural language interface for searching and querying data. It builds an understanding of each customer’s data, internal jargon, and usage patterns by drawing on their schemas, documents, queries, lineage, and more.
Lakehouse AI: Databricks announced new Databricks ML capabilities spanning the LLMOps lifecycle, including bringing data together, preparing datasets for ML, fine-tuning and curating models, and deploying the models themselves. Databricks also announced features for vector search, feature serving, and the MLflow AI Gateway.
MosaicML: Just before the Summit kickoff, Databricks announced a $1.3B acquisition of MosaicML, which during the Summit was positioned as the “machine to build your GenAI models.”
Other Notable Announcements: Delta Lake 3.0, MLflow 2.5 with support for different backend LLMs, Lakehouse Apps, and intelligent monitoring with Databricks Lakehouse Monitoring.
Our Take
Databricks has taken a unified approach to AI by bringing together data, AI models, and monitoring and governance capabilities in the Lakehouse platform. As a result, Databricks enables customers to develop their GenAI solutions more efficiently, and customers view it as a trusted partner that is, on average, faster, cheaper, and easier to use for ML development.
While already considered a key player in the AI stack, Databricks has strengthened its position as a leader in GenAI through investments in models like Dolly (an open-source instruction-following LLM) and its big-ticket acquisition of MosaicML. Databricks continues to stress that the Lakehouse is the best way for gen-native startups to train and deploy their own AI models on their own proprietary data, cost-effectively and without being tied to Big Tech.
What Can We Expect Going Forward?
While the generative AI craze has continued unabated for 8+ months, this past week clearly signaled that Snowflake and Databricks are taking the gloves off to compete for both mind and market share in this space.
So what can we expect from this heightened rivalry moving forward?
Acquisitions will continue → Snowflake and Databricks are both well-positioned to keep acquiring smaller companies that complement their overall strategies. Snowflake has ~$4B of cash on its balance sheet, while Databricks maintains a rich valuation that doubles as usable currency. Meanwhile, hundreds of startups across AI and data tools are yearning for an exit in a dry IPO market. We don’t expect Neeva and MosaicML to be the last acquisitions these giants make, and we anticipate further consolidation.
Customers will benefit → One of the clear winners in the emerging battle between Snowflake and Databricks should be their customers. Both of these giants are rapidly adding new and novel products and services to their platforms, building “one-stop shops” for their customers to build data applications and take advantage of LLMs. This platform augmentation will help democratize access to artificial intelligence and allow data scientists, data engineers, and AI practitioners to collaborate more meaningfully.
Azure and AWS will make even more $$$ → As Snowflake and Databricks push further into the AI market, they will require massive compute capacity, primarily served by Azure and AWS, a point data engineer Anant Packidurali astutely observed. Just as Nvidia is a secular beneficiary regardless of who “wins” in AI, the hyperscalers that underpin the compute needs of Snowflake and Databricks stand to gain no matter who emerges victorious in the AI battle.
As enterprises become more heavily reliant on data to bolster their GenAI strategies, we believe both Snowflake and Databricks are well-positioned to take advantage of this generational shift. Although they came from different parts of the value chain and their relationship has evolved over the past decade, they are now squarely locked in a race where the rewards are enormous.
We are closely watching how new gen-native start-ups are thinking about their data, AI, and ML strategy and carefully deciding which data partner they should work with.
Funding News:
Below we highlight select private funding announcements across the Intelligent Applications sector. These deals include private Intelligent Application companies that have raised in the last two weeks, are HQ’d in the U.S. or Canada, and have raised a Seed through Series E round.
In advance of the holiday week, there was a slew of new announcements in the Intelligent and Generative App space. Congratulations to Madrona’s newest portfolio company Typeface (read more about our investment here) and to our existing portfolio company Runway on their $100M extension round!
New Deal Announcements - 06/23/2023 - 07/05/2023:
We hope you enjoyed this edition of Aspiring for Intelligence, and we will see you again in two weeks! This is a quickly evolving category, and we welcome any and all feedback on the viewpoints and theses expressed in this newsletter (as well as what you would like us to cover in future writeups). It goes without saying, but if you are building the next great intelligent application and want to chat, drop us a line!