Hybrid AI

Ankesh Bharti, draft

(This essay is an early draft of a four-part essay series that explains the motivation behind tiles.run. See part 1, "The New Intelligence", and part 2, "Squishy Software".)

[Figure "Compute": illustration by Christopher Fleetwood]

I.

The artificial intelligence landscape is undergoing a significant transformation, driven by rapid advancements in hardware, software, and the increasing demand for AI-powered applications. This evolution is characterized by a bifurcation between on-device narrow AI and cloud-scale general-purpose AI. However, a new paradigm is emerging that bridges these two worlds: hybrid AI architectures.

On the edge side, the compute performance of on-device AI accelerators has grown exponentially. Apple's Neural Engine, for example, has increased its performance by 60x between 2017 and 2024[1]. This growth, coupled with algorithmic progress, is enabling smaller AI models to achieve impressive capabilities on narrow tasks while running directly on edge devices like smartphones and laptops.

In AI model development, a concurrent trend is emerging whereby the largest models continue to grow in size, while the smallest commercially relevant models are becoming more compact over time. A striking example of this is Gemma 2 2B, which surpasses all GPT-3.5 models on the Chatbot Arena despite having only 2 billion parameters compared to GPT-3.5's 175 billion[2]. This remarkable efficiency gain showcases the rapid pace of development in AI model architectures and training techniques. Such advancements are making it increasingly feasible to deploy powerful AI models directly on edge devices, opening up new possibilities for on-device AI applications that previously required cloud resources.

II.

[Figure "Hybrid": illustration by Qualcomm]

Meanwhile, on the cloud side, the performance of GPUs has been steadily doubling every 2.3 years[3], with a further 10x boost[4] from the adoption of new number formats like 16-bit floating point. Memory bandwidth and capacity are also increasing, enabling the training of massive AI models with hundreds of billions of parameters. These cloud-scale models are pushing the boundaries of artificial general intelligence (AGI). While these two tracks of AI development have been largely separate, the concept of hybrid AI is bringing them together. As outlined in Qualcomm's paper[5], hybrid AI architectures leverage a combination of edge and cloud computing resources to deliver optimal performance, efficiency, and user experiences.

Apple's recently announced Apple Intelligence[6] initiative is a prime example of a hybrid AI architecture in action. Instead of relying solely on cloud-based models, Apple is integrating its own foundation models directly into various features across its devices and services. This approach treats AI as a technology rather than a standalone product. By running smaller, specialized AI models on-device, Apple can deliver features like email prioritization, document summarization, and Siri enhancements with low latency and high privacy. At the same time, more complex tasks are seamlessly offloaded to large cloud models when necessary. This hybrid approach allows Apple to offer deeply integrated, personalized AI experiences while leveraging the collective processing power of millions of edge devices.

Apple Intelligence showcases the benefits of hybrid AI architectures that Qualcomm and others envision. By intelligently distributing workloads between edge and cloud, hybrid AI reduces strain on cloud infrastructure, improves energy efficiency, and enables functionality even with limited connectivity. Sensitive user data can be processed locally, enhancing privacy. This intelligent workload distribution is reminiscent of the principles behind RouteLLM[7], an open-source framework for cost-effective LLM routing, which aims to optimize the use of different AI models based on task requirements and resource availability.
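To make the idea concrete, here is a minimal sketch of the kind of routing policy a hybrid system might apply. The request fields, the threshold, and the decision rules are illustrative assumptions on my part, not Apple's or RouteLLM's actual logic; a production router would learn these decisions from preference data rather than hard-code them.

```python
# Minimal sketch of edge-vs-cloud routing in a hybrid AI system.
# The fields, threshold, and policy below are illustrative assumptions,
# not Apple's or RouteLLM's actual implementation.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    contains_personal_data: bool  # e.g. mail, messages, health data
    estimated_complexity: float   # 0.0 (trivial) .. 1.0 (hard), from a cheap classifier

COMPLEXITY_THRESHOLD = 0.6  # above this, assume the on-device model will struggle

def route(request: Request, device_online: bool) -> str:
    """Decide where a request should run.

    Illustrative policy: privacy-sensitive or simple requests stay on
    device; complex requests go to the cloud model when connectivity allows.
    """
    if request.contains_personal_data:
        return "on_device"   # keep sensitive data local
    if not device_online:
        return "on_device"   # degrade gracefully when offline
    if request.estimated_complexity > COMPLEXITY_THRESHOLD:
        return "cloud"       # offload hard tasks to the large model
    return "on_device"       # default: cheap and low-latency

# Example usage
req = Request("Summarize this email thread",
              contains_personal_data=True,
              estimated_complexity=0.3)
print(route(req, device_online=True))  # -> "on_device"
```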

Moreover, hybrid AI allows for continuous improvement of models based on real-world usage, benefiting from both local and global insights through lightweight training techniques like LoRA adapters[8]. The interplay between edge and cloud creates opportunities for adaptive, personalized AI experiences across industries, from smartphones and IoT devices to vehicles. As more companies adopt hybrid AI strategies, following the path paved by Apple Intelligence, I expect a proliferation of powerful, efficient, and user-centric AI applications. However, this shift also presents challenges, including managing the complexity of distributed AI systems, ensuring data privacy and security, and developing standards for edge-cloud interoperability. As the technology matures, regulatory frameworks and industry best practices must evolve accordingly.
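For readers unfamiliar with LoRA, the sketch below shows the core idea from [8]: freeze the pretrained weight matrix and train only a small low-rank correction, so a device can personalize a shared model by updating a few thousand parameters instead of millions. The rank, scaling, and layer sizes here are illustrative, and the wrapper is a simplified stand-in for a real adapter library.

```python
# Minimal sketch of a LoRA adapter (Hu et al. [8]) wrapped around a frozen
# linear layer. Rank, scaling factor, and dimensions are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        # Low-rank factors: only these rank * (in + out) values are trained.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Example: adapt a 512 -> 512 projection with roughly 8k trainable parameters
layer = LoRALinear(nn.Linear(512, 512), rank=8)
out = layer(torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```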

While hybrid edge-cloud architectures offer a promising path forward, the underlying AI models powering these systems must also evolve. The current generative AI paradigm, exemplified by large language models like ChatGPT, has captured the public imagination but faces inherent limitations in reliability and reasoning capabilities.

III.

[Figure "Neurosymbolic": illustration by Stephen Wolfram]

The future of AI likely lies in neurosymbolic approaches that combine the pattern recognition strengths of neural networks with the logical rigor and interpretability of symbolic reasoning. This "best of both worlds" approach mirrors Daniel Kahneman's distinction between System 1 (fast, intuitive thinking, the province of neural networks) and System 2 (slow, deliberative reasoning, the province of symbolic logic), and it has the potential to overcome the limitations of pure neural network systems. The integration of Wolfram Alpha with ChatGPT[9] exemplifies this synergy: ChatGPT's natural language prowess combined with Wolfram Alpha's computational precision creates a system more powerful than either could achieve alone. This hybrid approach not only enhances AI capabilities but also paves the way for more robust, versatile, and trustworthy AI systems.

Recent research has shown that large language models themselves can serve as effective neurosymbolic reasoners. For instance, a study by Fang et al.[10] demonstrated that LLMs, when properly prompted and integrated with external symbolic modules, can successfully tackle complex symbolic reasoning tasks in text-based games. This approach achieved impressive results without the need for extensive training data or reinforcement learning, suggesting that LLMs may already possess latent symbolic reasoning capabilities that can be unlocked through clever system design.
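The pattern behind both the Wolfram Alpha plugin and the approach in Fang et al. can be sketched in a few lines: the neural model translates a fuzzy natural-language request into a formal expression, and a symbolic engine evaluates it exactly. In the sketch below, call_llm is a hypothetical placeholder for whatever model API is in use, and SymPy stands in for the symbolic module; this illustrates the delegation pattern, not either system's actual implementation.

```python
# Sketch of neural-to-symbolic delegation: the LLM parses intent and emits
# a formal expression; a symbolic engine does the exact computation.
# `call_llm` is a hypothetical placeholder, and SymPy stands in for a
# computational engine like Wolfram Alpha.
import sympy as sp

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would query an on-device or cloud model here,
    # prompted to answer with a bare SymPy expression.
    return "integrate(sin(x) * x, x)"

def answer(question: str) -> str:
    # System 1: the neural model turns fuzzy language into a formal query.
    expression = call_llm(
        f"Rewrite the following question as a single SymPy expression:\n{question}"
    )
    # System 2: the symbolic engine evaluates it exactly and verifiably.
    x = sp.symbols("x")
    result = sp.sympify(expression, locals={"x": x, "integrate": sp.integrate})
    return f"{question}\n= {result}"

print(answer("What is the integral of x*sin(x) with respect to x?"))
# last line printed: "= -x*cos(x) + sin(x)"
```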

Realizing this neurosymbolic future will require overcoming technical challenges in integrating discrete symbolic representations with continuous neural networks, as well as developing more efficient reasoning algorithms. But the potential benefits – AI systems that are not only powerful but also reliable, interpretable, and trustworthy – make this a crucial direction for the field.

As I look ahead to the next era of AI, it's clear that hybrid architectures, spanning edge and cloud, neural and symbolic, will be key to unlocking new possibilities. By combining the strengths of these approaches, we can build AI systems that augment and empower human intelligence in transformative ways. The path forward is not simply bigger language models, but thoughtful integration of diverse AI paradigms to create more robust, adaptive, and user-centric experiences.

Read on to part 4, "Intent Router".

References

[1] Apple (2024). Apple introduces M4 chip.

[2] Google for Developers (2024). Smaller, Safer, More Transparent: Advancing Responsible AI with Gemma.

[3] EpochAI (2022). Predicting GPU Performance.

[4] EpochAI (2023). Trends in Machine Learning Hardware.

[5] Qualcomm (2023). The future of AI is hybrid - Part I: Unlocking the generative AI future with on-device and hybrid AI.

[6] Evans, B. (2024). Apple Intelligence. Benedict Evans.

[7] Ong, I., Almahairi, A., Wu, V., Chiang, W., Wu, T., Gonzalez, J. E., Kadous, M. W., & Stoica, I. (2024). RouteLLM: Learning to Route LLMs with Preference Data. arXiv.

[8] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv.

[9] Wolfram, S. (2023). Instant Plugins for ChatGPT: Introducing the Wolfram ChatGPT Plugin Kit.

[10] Fang, M., Deng, S., Zhang, Y., Shi, Z., Chen, L., Pechenizkiy, M., & Wang, J. (2024). Large Language Models Are Neurosymbolic Reasoners. arXiv.


CC BY-NC 4.0 2024 © Ankesh Bharti