The week after Thanksgiving brought some activity as investors looked to wrap up deals before the holiday season kicked into gear. Most notably, Together AI raised a significant Series A just over a year after being founded. Additional details below.
Together AI Raises Series A
Generative AI infrastructure provider Together AI announced a $102.5 million Series A funding round. Founded in 2022, Together AI specializes in open-source infrastructure for AI development, claiming to democratize AI by making it more accessible to developers. The round, led by Kleiner Perkins, attracted a mix of investors including Nvidia, Emergence Capital, Lux Capital, and Definition.
Founded in 2022 by Vipul Ved Prakash, Ce Zhang, Percy Liang, and Chris Re, Together AI is building a comprehensive cloud platform intended to let developers create and customize AI models freely. The company positions itself as an open alternative to the proprietary services of the major cloud vendors (AWS, Azure, and GCP), and more specifically against model-plus-compute-plus-community providers like Hugging Face, Replicate, and Kaggle. It intends to use the funds to expand its cloud platform, with a stated goal of building the fastest cloud for generative AI applications.
The Series A financing adds further fuel to the growing interest in generative AI infrastructure. As part of the deal, Bucky Moore of Kleiner Perkins will join Together AI’s board, reinforcing the firm’s focus on open-source alternatives for AI development.
Together AI’s platform is designed to support training, fine-tuning, and running AI models in production, promising scalability at competitive prices. Its strategic approach includes optimization across various tech stacks, which has already reduced costs for interactive inference workloads with large language models.
The startup’s infrastructure includes a network of data centers in the U.S. and Europe, offering over 20 exaflops of computing power. Together partners with Crusoe and Vultr for its server infrastructure. The company’s client roster features other AI startups such as Nexusflow.ai and Voyage.ai, and it has seen deployment of flagship projects such as Pika Labs Inc.’s text-to-video model (side note: definitely check out Pika Labs’ demo – it is super cool; also, Pika announced a $35 million raise this week, led by Lightspeed).
I have been trying to run LLMs on my 2017 iMac and relearning a few things along the way. As a non-gamer and hobbyist programmer, I have not cared much about compute power for at least a decade. My last two laptops have been MacBooks (circa 2010 and 2016), both of which are still functional, and the latter of which is in active use. Trying to run open-source models like Meta’s Llama 2 on the iMac (which has an AMD GPU) has only gotten me so far. The creaking performance drove me first to look at more fully powered, gaming-focused machines. The winner there seems to be the Nvidia RTX 4090, a beast of a GPU with so much processing power that it has been banned from export to China. Listed at $1,599, the RTX 4090 is notoriously hard to get your hands on, and certainly not at that price. Given my hobbyist leanings, a better option seems to be renting compute and storage from the cloud – far cheaper and more flexible. Researching the best options for generative AI compute brings us to the second part of today’s read. Below is a quick summary of the key players and options, not just for hobbyists but scaling all the way up to large enterprises.
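The rent-vs-buy question above lends itself to a quick back-of-the-envelope break-even calculation. The sketch below uses the RTX 4090’s $1,599 list price; the $0.50/hour rental rate is a hypothetical placeholder, not a quoted price from any provider:

```python
# Rough rent-vs-buy break-even for a hobbyist GPU budget.
# Assumptions (illustrative, swap in real quotes):
#   - RTX 4090 list price: $1,599 (street prices run well above MSRP)
#   - cloud GPU rental: $0.50/hour (hypothetical placeholder rate)

GPU_PRICE_USD = 1599.00
RENTAL_RATE_USD_PER_HR = 0.50  # hypothetical

break_even_hours = GPU_PRICE_USD / RENTAL_RATE_USD_PER_HR
print(f"Break-even at ~{break_even_hours:,.0f} GPU-hours of rental")
```

At a few hours of tinkering per week, that break-even point is years away – which is why renting tends to win for hobbyist workloads, before even counting electricity or the rest of the rig.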
Key Players in Generative AI Infrastructure Cloud
Large Public Cloud (enterprise focus) – Public cloud players have jumped all-in, with a range of home-grown platforms coupled with a sometimes confusing array of partnerships to build, test, scale, and deploy AI-enabled applications in the enterprise. The TAM for generative AI is large and expected to grow exponentially. According to IDC:
Enterprise spending on generative AI services, software and infrastructure will skyrocket over the next four years, jumping from $16 billion this year to $143 billion in 2027.
Spending on generative AI over the four-year period to 2027 is expected to grow at a compound annual growth rate (CAGR) of 73.3%, which IDC says is more than twice the rate of growth in overall AI spending and almost 13 times the CAGR for worldwide IT spending over the same period.
IDC has forecast that by the end of 2027, spending on generative AI will account for 28.1% of overall AI spending, with generative AI infrastructure — which includes hardware, infrastructure as a service (IaaS), and system infrastructure software (SIS) — representing the largest area of investment during the so-called buildout phase of this period.
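IDC’s headline growth rate can be sanity-checked from the endpoints above. Taking $16 billion in 2023 growing to $143 billion in 2027 as four compounding years, a quick computation lands close to the quoted 73.3% (the small gap presumably reflects IDC’s more granular underlying figures):

```python
# Compound annual growth rate (CAGR) from IDC's endpoint figures:
# $16B in 2023 growing to $143B in 2027, i.e. 4 compounding years.
start_usd_b, end_usd_b, years = 16.0, 143.0, 4

cagr = (end_usd_b / start_usd_b) ** (1 / years) - 1
print(f"CAGR over {years} years: {cagr:.1%}")  # roughly 73%, in line with IDC
```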
Below is a round-up of the key generative AI infrastructure offerings across the major public clouds –
- Amazon Web Services – Bedrock, SageMaker Jumpstart, etc.
- Azure AI – simpler branding, closely coupled with OpenAI
- Google Cloud – Vertex Platform, Colab, Kaggle, etc.
- also, IBM Watson and Oracle Cloud
Communities, Models, Compute (data scientists, researchers, hobbyists) – The more interesting and exciting work in generative AI is actually happening on platforms like Hugging Face (what a great name!), which provide access to a combination of models, datasets, libraries, and frameworks that make generative AI highly accessible. I’d be remiss if I did not mention social platforms X, YouTube, Discord, and Reddit, where a lot of community building takes place. Additionally, GitHub hosts hundreds of thousands of repositories where data scientists and tinkerers build, test, debug, and improve applications leveraging the latest open-source generative AI software.
Below is an overview of the key players enabling the next generation of AI applications through a communal, let’s-learn-and-build-together approach.
Hugging Face – Hugging Face is a French-American company headquartered in New York City. It has actively contributed to the open-source AI community since its founding in 2016. Originating as a chatbot app, the company shifted its focus to developing a platform for machine learning after open-sourcing its algorithm.
Founders Clément Delangue, Julien Chaumond, and Thomas Wolf have guided Hugging Face to become a major player in the field, now boasting a community of over 50,000 organizations that use its hub.
Hugging Face has attracted substantial venture capital. Its Series C round in May 2022 raised $100 million, valuing the company at $2 billion, with participation from Coatue and Sequoia Capital. Following this, a Series D funding round in August 2023 added $235 million to the total, more than doubling the company’s valuation to $4.5 billion. Strategic investors include Salesforce, Nvidia, Google, and Amazon.
Hugging Face’s value proposition has attracted a wide range of customers—including Intel, Bloomberg, and Microsoft—with paid features such as secured corporate tools over its widely-utilized model hub. With this financial support, the company is expanding its team and services, maintaining a commitment to democratizing AI technology.
The platform’s greatest strength lies in its community – a machine learning and data science analogue to what GitHub has done for software development. It serves as a repository for more than a million AI models and datasets, fostering an environment where developers can jointly develop, test, and refine AI applications. Hugging Face’s tools, such as AutoTrain and the Inference API, reduce the complexity of training AI models and managing infrastructure for developers.
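To illustrate how little infrastructure the hosted Inference API asks of a developer, here is a minimal request-building sketch using only the Python standard library. The model ID and token are placeholders, and actually sending the request (and handling cold-start or rate-limit responses) is left to the caller:

```python
import json
import urllib.request

# Hugging Face's serverless Inference API routes requests by model ID.
API_BASE = "https://api-inference.huggingface.co/models"

def build_inference_request(model_id: str, token: str, prompt: str):
    """Build (but do not send) an HTTP request for a hosted model.

    model_id and token are caller-supplied placeholders here.
    """
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Usage sketch – "hf_xxx" stands in for a real access token:
req = build_inference_request("gpt2", "hf_xxx", "Hello, world")
```

The point is the shape of the workflow: no GPUs to provision, no serving stack to run – just an authenticated HTTP call against a model the hub already hosts.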
The company has delved into various AI applications, from language models that challenge OpenAI’s GPT-3 to code-generating tools like StarCoder. Their open-source contributions and strategic ties with cloud providers like Nvidia and AWS promise further advancements in AI accessibility and innovation.
Kaggle – While not obvious from a cursory review of the website, Kaggle is actually owned by Google, which acquired the company in March 2017. Kaggle operates as a competitive forum and a collaborative community where individuals can publish and find datasets, build models in a web-based data science environment, and enter competitions to address various challenges in the data science sector.
Kaggle’s stated mission is to fuel progress in data science by bringing together a community to tackle real-world problems. It achieves this by running data science competitions in which participants strive to produce the most effective models and solutions, using datasets provided by businesses and academic institutions.
Over the years, Kaggle has evolved beyond a competition host, enriching its ecosystem with public datasets that span various industries and offering tools for sharing code and insights. These shared resources have fostered a collaborative culture, leading to some notable advancements.
Educational content has gradually been integrated into the platform, with tutorials and courses available to both newcomers and experts. These resources provide continuous learning opportunities in data science.
Participation in Kaggle competitions is often the first practical exposure many individuals have to data science challenges. Established data scientists also frequent the platform, participating in discussions, sharing notebooks, and harnessing public datasets for research and experimentation. Competing on Kaggle helps build practical skills while offering the chance to contribute to meaningful projects across various industries.
The platform’s model has proven to be effective, pushing the boundaries of data science and, in turn, providing solutions that have real-world impact. Kaggle’s free access has been a cornerstone of its wide-scale adoption, offering public datasets and the opportunity to engage with a worldwide community of data enthusiasts. While paid courses are available for faster skill acquisition, the core offerings of Kaggle remain open to all users.
In recent years, Kaggle has established a progression system to recognize the contributions and achievements of its users, ranging from Novices to Grandmasters. This system provides an additional incentive for users to improve their data science capabilities and engage more deeply with the Kaggle community.
Replicate – In February 2023, Replicate announced its emergence from stealth mode, having secured $17.8 million in funding to facilitate the development of artificial intelligence (AI) applications. The startup, co-founded by Ben Firshman and Andreas Jansson, offers a platform that simplifies the integration of AI models for software developers, aiming to minimize the need for specialized machine learning skills.
The company’s financial foundation was laid with a Series A round that brought in $12.5 million, led by Andreessen Horowitz, with contributions from investors including Y Combinator and Sequoia Capital. This followed a seed round of $5.3 million, supported by angel investors affiliated with organizations such as Figma and Vercel.
Replicate’s platform is tailored for software engineers who may not be versed in machine learning, enabling them to incorporate AI features into their applications with reduced coding requirements. CEO Firshman has highlighted the challenges developers often encounter with AI adoption and how Replicate’s platform targets these issues.
An essential component of Replicate’s platform is Cog, an open-source tool that standardizes the packaging of machine learning models for production deployment. Compatible with several operating systems such as macOS and Windows 11, Cog facilitates the production readiness of machine learning applications.
Among its offerings, Replicate boasts an AI model library that includes diffusion models similar to Stable Diffusion, alongside models catered to video creation, image upscaling, and image-text conversion—tools that expand the potential for developers to enhance their projects with AI.
Replicate seeks to distinguish itself from the competition by prioritizing the developer experience and maintaining a comprehensive open-source AI library.
Growth indicators show a significant uptake in Replicate’s platform, with marked increases in active user engagement and the acquisition of high-profile enterprise customers. Replicate aligns itself with the industry’s movement towards broader automation applications.
Replicate aims to refine its platform to meet the demands of transforming business practices through AI automation. By integrating large AI models into software, Replicate foresees developers unlocking new potential and driving the evolution of the digital environment with innovative AI-powered applications.
OctoML – OctoML is a Seattle-based technology company specializing in AI and machine learning (ML) optimization. Founded in 2019 as a spin-out from the University of Washington, OctoML has established itself by providing solutions that target the efficiency and accessibility of AI deployment.
The company originated from the creators of Apache TVM, an open-source software used by companies including Amazon, Microsoft, and Facebook. OctoML’s launch was motivated by the opportunity to build upon the capabilities of TVM and other technologies to enhance the model execution lifecycle.
With a focus on AI sustainability, OctoML has developed a suite of products aimed at making AI more attainable for a wider audience. The OctoAI Platform is the company’s primary service, catering to developers with tools for running, fine-tuning, and scaling AI models such as Llama 2, WhisperX, and SDXL. This service is designed to streamline the deployment process, providing a reliable and fast solution for integrating AI into applications.
With OctoAI, the company has expanded beyond merely optimizing models to enabling businesses to use and customize existing open-source models. Notably, OctoML has improved the efficiency of models like Stable Diffusion, achieving faster run times and lower operational costs than standard implementations.
The OctoAI Text Gen Solution, another product from OctoML, offers a similar functionality for language models, providing accelerated access to open-source Large Language Models (LLMs) and the option for users to fine-tune models with their own data. This provides a flexible alternative to the one-size-fits-all models that are commonly available.
Additionally, OctoML has introduced OctoAI Image Gen, which supports developers in customizing image generation applications. This solution allows for modifications through a single API, using assets such as LoRAs and Checkpoints to provide bespoke image generation that can scale accordingly without compromising efficiency.
OctoML has secured $132 million, including an $85 million Series C round led by Tiger Global Management. The company’s expanding user base includes notable clients such as Toyota, and partnerships have been established with hardware vendors including Qualcomm, AMD, and Arm.
Together AI – see above.
Bare Metal, GPUs (hobbyists, researchers, DIYers, cost-focused)
There is a vast network of cloud infrastructure providers that have jumped on the generative AI bandwagon, offering varying levels of service with flexibility and affordability as the key pitch. These range from fairly established global data center providers to much smaller players offering slices of compute and storage sourced from gamers selling idle GPU time on their high-end rigs. I won’t spend too much time on these, except to link to the ones I found to be relevant –
Vast.ai, CoreWeave, Paperspace (by DigitalOcean), Lambda Labs, RunPod, Jarvislabs.ai, Vultr, Crusoe Cloud, OVHcloud, LeaderGPU, Cudo Compute, Hostkey, GPU Mart, Immers Cloud, Fluidstack and TensorDock.
That’s a wrap for today! I will be posting a bit more on the higher layers of the Generative AI stack in subsequent editions, so make sure to like, subscribe and share with others who may be interested.