Thanks to our now 600 subscribers! Are your friends/family/colleagues interested in intelligent or generative apps? Please share!
For years, traditional software as a service (SaaS) has been heralded as a great business due to its high gross margin profile. Top SaaS companies at scale routinely achieve 80%+ gross margins, like Adobe (88%), Gitlab (88%), and Paycom (85%). In software, the saying goes, the marginal cost of making one additional unit of product is virtually zero.
But in this current AI boom, where virtually every traditional software company is rushing to incorporate AI or leverage large language models, how are their Cost of Goods Sold (COGS) impacted? If every SaaS app becomes “AI-enabled”, should we expect industry gross margins to stay the same?
Is Buck right in believing that AI makes for a worse business model than SaaS?
Or are we already seeing the cost of leveraging FMs decrease dramatically?
Let’s dive in.
A Refresher on Software COGS
The median public SaaS company maintains a ~75% gross margin (per Clouded Judgment from Jamin Ball). Compare this with other industries such as consumer retail (~54% average GM), manufacturing (~30% average GM), or even online marketplaces (generally 50-60% GM), and you can see why software is such an attractive market.
While SaaS companies still have to pile money into Sales & Marketing and Research & Development to remain competitive, these costs are “below the line”, meaning they are drawn from the pool of dollars available after paying for the cost of making the “goods” (i.e., the software) itself.
So what are the main components of COGS? There are typically three:
Hosting → These are the cloud infrastructure costs that are paid to third-party providers (such as AWS or Azure) directly related to running and supporting the software. These costs include payments for storage, compute, provisioning, monitoring, and any other infrastructure-related support required to run the SaaS application. Hosting is typically the lion’s share of a SaaS company’s total COGS (often 60-70%).
Customer Support / Success → This includes the portion of the overall customer support and success costs (such as salaries and training) directly related to the creation and delivery of the product. It can often be a tricky endeavor to figure out exactly how much of CS should be included above the line (in COGS) vs. below the line (in Operating Expenses), but in general CS can account for 20%+ of total COGS.
Professional Services → If the production, implementation, or ongoing support of the software requires professional services, those direct costs should be included in COGS. PS is often the smallest portion of total COGS, especially for modern cloud-native companies that run entirely on subscription without the need for any services.
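To put rough numbers on this, here's a minimal sketch of the gross margin math using a hypothetical company and the approximate COGS proportions above:

```python
# A minimal sketch of SaaS gross margin math, using hypothetical numbers
# and the rough COGS proportions described above.

revenue = 100_000_000  # hypothetical $100M of revenue

# Hypothetical COGS breakdown: ~60-70% hosting, 20%+ customer support,
# with professional services as the smallest slice.
cogs = {
    "hosting": 16_000_000,
    "customer_support": 6_000_000,
    "professional_services": 3_000_000,
}

total_cogs = sum(cogs.values())        # $25M
gross_profit = revenue - total_cogs    # $75M
gross_margin = gross_profit / revenue  # 0.75

print(f"Gross margin: {gross_margin:.0%}")  # Gross margin: 75%
```

Anything new that lands "above the line" eats directly into that 75%, which is exactly what foundation models threaten to do.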
What Changes with LLMs?
With the advent of foundation models, how does all of this change? Three new costs are introduced into the equation:
The Model - Costs associated with leveraging FMs (e.g., OpenAI)
Training and Finetuning - Costs associated with training, RLHF, etc.
FM Ops - Costs associated with deploying, monitoring, operating, optimizing, and maintaining models
1. The Model(s)
At the heart of AI lies the foundation model. The foundation model is the basic engine that can transform a specific input into a completed output. Whether a company decides to build an AI product from scratch, or bake in AI capabilities (a difference we refer to as “generative native” vs. “generative enhanced”), they need to decide which model to use, and how many.
Companies can decide whether they want to leverage a proprietary closed-source model (like OpenAI, co:here, or Anthropic) or an open-source model (like Stable Diffusion or Bloom). There is a rich debate being waged on the merits of open-source vs. closed-source models which we won’t go into here, but at a minimum, startups should be aware of the costs involved (which can vary greatly by volume).
Let’s say a travel planning app wants to incorporate text generation capabilities on top of Davinci (the most powerful GPT-3 model). It would have to call OpenAI’s API and pay based on how much text is being generated. Davinci currently costs $0.02 per 1,000 tokens, where each “token” is basically a piece of a word (1,000 tokens = ~750 words). The more words generated, the more tokens required, and thus the higher the cost paid to OpenAI. Of course, costs can vary based on modality (images cost more than text) and model type. As companies grow and generate more text/images, the cost of scaling the business also meaningfully increases.
At the same time, the large foundation models are becoming more competitive. Just two days ago, OpenAI announced API support for both ChatGPT and Whisper, at a fraction of previous costs. Thanks to “a series of system-wide optimizations”, the ChatGPT API costs 1/10th that of Davinci. On the whole, this is great for startups and businesses leveraging OpenAI models today, but as models grow larger over time, token prices may rise in parallel, or new monetization strategies (such as ChatGPT Plus) may emerge.
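To make the unit economics concrete, here's a minimal sketch of a cost estimate using the list prices cited above ($0.02 per 1,000 tokens for Davinci, and 1/10th of that for the ChatGPT API); the travel-app usage numbers are hypothetical:

```python
# A minimal sketch of LLM inference cost estimation, using the list
# prices cited above. Usage assumptions are hypothetical.

DAVINCI_PRICE_PER_1K_TOKENS = 0.02    # $0.02 per 1,000 tokens
CHATGPT_PRICE_PER_1K_TOKENS = 0.002   # 1/10th of Davinci
TOKENS_PER_WORD = 1000 / 750          # 1,000 tokens = ~750 words

def monthly_cost(words_per_request: int, requests_per_month: int,
                 price_per_1k_tokens: float) -> float:
    """Estimate monthly API spend for a text-generation feature."""
    tokens = words_per_request * TOKENS_PER_WORD * requests_per_month
    return tokens / 1000 * price_per_1k_tokens

# Hypothetical: a travel app generating 500-word itineraries,
# 1 million requests per month.
print(f"Davinci: ${monthly_cost(500, 1_000_000, DAVINCI_PRICE_PER_1K_TOKENS):,.0f}/mo")
print(f"ChatGPT: ${monthly_cost(500, 1_000_000, CHATGPT_PRICE_PER_1K_TOKENS):,.0f}/mo")
# Davinci: $13,333/mo vs. ChatGPT: $1,333/mo for the same feature
```

At meaningful volume, inference becomes a recurring COGS line that scales linearly with usage, quite unlike the near-zero marginal cost of traditional SaaS.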
If you are using OpenAI (GPT-3 or DALL-E) to generate content and want to play around with a pricing calculator, check out a simple one by DeepakNess here!
2. Training and Finetuning
Off-the-shelf models such as Davinci are rarely enough for an AI-native or AI-enhanced product to deliver a great customer experience. Part of the “magic” of AI is personalization, or how the use case can be customized to a specific user. The model needs to be trained and finetuned on specific data sets in order to be truly valuable to the product.
Let’s think back to our AI-powered travel planning app which uses GPT-3 Davinci for text generation. While GPT-3 has been pre-trained on a large corpus of data from the open internet, fine-tuning can further improve the model and tailor the results more specifically to the use case in question (for example, trip suggestions based on the user’s exact location, budget, hobbies, etc.). Fine-tuned models reduce the need for example prompts, saving costs and enabling lower-latency requests. However, the process requires valuable data that has been properly prepped, classified, and categorized. Companies like Snorkel, Scale, V7, and DataLoop accelerate training data development by labeling data and then using the results to train models. Some of the fine-tuning may also happen through the model provider itself, or through a third-party product like MosaicML, HumanLoop, PyTorch, or LightningAI.
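As an illustration, here's a minimal sketch of what preparing fine-tuning data for our hypothetical travel app might look like, using OpenAI's prompt/completion JSONL format (the examples and file name are our own invention):

```python
import json

# A minimal sketch of preparing fine-tuning data in OpenAI's
# prompt/completion JSONL format. The travel examples are hypothetical;
# a real dataset would need hundreds or thousands of curated pairs.
training_examples = [
    {
        "prompt": "Suggest a 3-day itinerary for Lisbon on a $500 budget ->",
        "completion": " Day 1: Alfama walking tour and a Fado dinner. "
                      "Day 2: Belem pastries and the MAAT museum. "
                      "Day 3: Day trip to Sintra by train. END",
    },
    {
        "prompt": "Suggest a weekend of outdoor activities near Seattle ->",
        "completion": " Saturday: Hike Rattlesnake Ledge at sunrise. "
                      "Sunday: Kayak Lake Union, then the Ballard Locks. END",
    },
]

with open("travel_finetune.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")

# The resulting file can then be submitted to OpenAI's fine-tuning
# endpoint, e.g. via the CLI:
#   openai api fine_tunes.create -t travel_finetune.jsonl -m davinci
```

Note that the labeling, prepping, and curation of those pairs is where most of the cost sits, not the API call itself.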
Reinforcement learning from human feedback (RLHF) is another type of instruction fine-tuning that is a crucial component in improving the accuracy of FM responses. Human feedback is required to teach the model to generate responses that align with user expectations. In practice, RLHF involves generating many candidate responses for a given prompt and having human labelers rank them. The more feedback collected on the model’s outputs, the better the model can learn which response patterns to avoid. These labeling “services” are commonly provided through offerings like Amazon Mechanical Turk or Scale’s RLHF products.
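Under the hood, the raw material of RLHF is ranked comparisons that a reward model learns from. Here's a minimal sketch of what one labeled comparison might look like; the structure and field names are our own illustration, not any vendor's schema:

```python
from dataclasses import dataclass

# A minimal sketch of the preference data RLHF pipelines collect:
# several candidate responses to one prompt, ranked by human labelers.
# Field names are illustrative, not any vendor's actual schema.

@dataclass
class PreferenceRecord:
    prompt: str
    responses: list[str]  # candidate completions sampled from the model
    ranking: list[int]    # indices into `responses`, best answer first

record = PreferenceRecord(
    prompt="Plan a romantic evening in Paris under 100 euros.",
    responses=[
        "Visit the Eiffel Tower.",                            # terse
        "Picnic on the Pont des Arts at sunset, then a walk "
        "along the Seine to a creperie in the Marais.",       # detailed
        "Paris is the capital of France.",                    # off-task
    ],
    ranking=[1, 0, 2],  # labelers prefer the detailed, on-task answer
)

# Thousands of records like this train a reward model, which in turn
# steers the base model toward responses humans actually prefer.
```

The human labor behind those rankings is precisely why RLHF shows up as a real cost line rather than a free model improvement.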
Training and finetuning costs can vary greatly depending on what kind of models are being used underneath. For companies using a single model (or a small number of them) shared across their entire customer base (like GitHub Copilot using Codex), a significant degree of training and finetuning is likely needed to get the models in the right spot. Given how “general purpose” most large language models are, there is usually quite a bit of experimentation required to create accurate results specific to the user.
3. FM Ops
The third major bucket of costs associated with AI is what we refer to as FM Ops, or the infrastructure required to deploy, monitor, operate, and maintain models in production. Selecting the model is only the first step; figuring out how to squeeze the most out of it over time is an ongoing process, and this is where FM Ops comes in.
We believe companies that incorporate AI and foundation models fall into two buckets:
Companies that use a single model
Companies that leverage many models
FM Ops costs will likely be more significant for the second bucket of companies, where a high degree of infrastructure is required to handle and operate these models. While there are benefits to incorporating multiple models, the right tools are needed to avoid becoming the farmer who chases two rabbits and catches neither.
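For companies in the second bucket, even deciding which model handles which request is new infrastructure. As a minimal sketch (with hypothetical task names and model choices), a per-task router might look like this:

```python
# A minimal sketch of a per-task model router, one small piece of the
# "FM Ops" machinery multi-model companies need. Task names, model IDs,
# and the call signature are hypothetical.

MODEL_ROUTES = {
    "itinerary_text": "gpt-3.5-turbo",        # cheap, fast text generation
    "destination_image": "stable-diffusion",  # open-source image model
    "review_classification": "in-house-bert", # small fine-tuned classifier
}

def route_request(task: str, payload: dict) -> dict:
    """Dispatch a request to the model configured for this task."""
    model = MODEL_ROUTES.get(task)
    if model is None:
        raise ValueError(f"No model configured for task: {task}")
    # In production this dispatch would also log latency, token usage,
    # and cost per model: the monitoring half of FM Ops.
    return {"model": model, "payload": payload}
```

Multiply this by versioning, rollbacks, and per-customer data isolation, and the operational surface area grows quickly.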
FM Ops is still an emerging category, so we expect many new companies to emerge that focus specifically on FM Ops vs. MLOps. A few of these subsectors include:
Monitoring: WhyLabs, Arize, Fiddler (currently more MLOps focused)
Versioning: Neptune, Weights & Biases (currently more MLOps focused)
Classification: MindsDB
Some of these costs are directly tied to revenue and would be considered a COGS line item, while others will be considered more as R&D. Either way, they are new expenses that will be introduced into the system.
Additionally, there are privacy and security considerations to keep the data in each of the models separate. So while companies using a single (or few) models have higher training/finetuning costs, they can see some savings from requiring less ML infrastructure (and vice-versa).
Emad Elwany, co-founder and CTO at Lexion, sums it up nicely:
“ML Infra needs a ton of machinery (whether you build it in house or use MLOps products) to be able to train hundreds of models on a weekly basis, deploy them, monitor them, obtain and use customer feedback, roll them back if they cause issues in production, etc. Like inference, the cost of training starts really making a dent at very large scales (when you go into supporting too many users or too many models). You also need a lot of machinery to keep customer data separate (imagine if ChatGPT relays one customer’s data to the other).”
What can startups do to adjust to changing COGS?
As we’ve stated previously, we believe software applications of the future will either be intelligent or cease to exist. In other words, we believe virtually ALL apps will have some intelligent or generative capabilities built in.
So what can companies do to reduce their AI COGS and continue to maintain strong software-like gross margins?
Use open-source models → Similar to open-source software like Linux, open-source models can be a useful tool for companies who want to constrain costs. Instead of paying for GPT-3, a startup could leverage GPT-J from Eleuther, a free alternative text generation model. For images, they could access Stable Diffusion or Blue Willow instead of DALL-E (see the sketch following this list). While open-source models can help reduce AI COGS, drawbacks may include being less performant and requiring more technical expertise to implement.
Model optimization → Language models are constantly increasing in size because model quality scales remarkably well with model size. However, delivering these models to end users is becoming increasingly challenging, so startups may consider optimization frameworks that make models faster and more cost-effective to serve, as well as tools that reduce cost and deployment time by maximizing utilization across a broad set of hardware platforms.
Change from seat-based to usage-based pricing plans → Most SaaS tools today charge based on the number of seats sold. For example, implementing Salesforce can range from $25-$300 per user per month depending on tier. However, as AI becomes more widely deployed across the enterprise, a better pricing plan may be oriented around USAGE vs. seats. With AI, fewer human workers are required to achieve the same output of tasks. Additionally, the value delivered to end customers more closely aligns with the costs associated with delivering that value (think back to Davinci charging more depending on how much text is being generated).
Raise prices → If all else fails, raise prices! If baking in AI capabilities truly leads to a better/faster/higher ROI customer experience, there may be justification to pass along some of the AI COGS to the customer. Obviously, this should be considered a last resort, as in a competitive marketplace, users can flock to the lower-priced product where available. But if the product becomes 10x better, the willingness to pay may increase also.
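To illustrate the first two levers above, here's a minimal sketch of loading an open-source model (GPT-J from Eleuther, via the Hugging Face transformers library) and applying dynamic quantization to shrink its serving cost; the prompt is hypothetical, and a real deployment would involve far more tuning:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A minimal sketch of two cost levers: swapping in an open-source model
# (GPT-J from Eleuther) and applying dynamic quantization to cut CPU
# inference cost. Note: GPT-J has 6B parameters and needs ~24GB of RAM
# to load in full precision.

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Dynamic quantization converts Linear layers to int8, shrinking the
# memory footprint and speeding up CPU inference (with some quality loss).
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Suggest a weekend itinerary for Kyoto:",
                   return_tensors="pt")
outputs = quantized_model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

There is no per-token API bill here, but the trade-off is real: the company now owns the hosting, optimization, and maintenance burden itself.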
Conclusion
Overall, “baking in” AI or generative capabilities means that in addition to their standard COGS, software companies also have to account for the additional costs associated with foundation models, training and finetuning data, and related infrastructure costs. Without proper planning, this could lead to a significant reduction in gross margins (possibly a 20%+ drop!).
While using off-the-shelf FMs to add generative features is generally less expensive than hiring AI/ML teams and training models in-house, it is critical for companies to understand the purpose of these capabilities. Before implementing generative capabilities, companies should carefully consider the associated costs, as there are many use cases where it may not make economic sense to use FMs.
As Milton Friedman would often say, there’s no such thing as a free lunch.
Funding News
Below we highlight select private funding announcements across the Intelligent Applications sector. These deals include private Intelligent Application companies who have raised in the last two weeks, are HQ’d in the U.S. or Canada, and have raised a Seed - Series E round.
New Deal Announcements - 02/17/2023 - 03/02/2023:
Special thanks to Emad Elwany for helping us with our thinking and framing some of the costs referenced above.
We hope you enjoyed this edition of Aspiring for Intelligence, and we will see you again in two weeks! This is a quickly evolving category, and we welcome any and all feedback around the viewpoints and theses expressed in this newsletter (as well as what you would like us to cover in future writeups). And it goes without saying but if you are building the next great intelligent application and want to chat, drop us a line!