Hybrid AI

Ankesh Bharti, draft

(This essay is an early draft of a four-part essay series that explains the motivation behind tiles.run. See part 1, "The New Intelligence", and part 2, "Squishy Software".)

[Figure "Compute": illustration by Christopher Fleetwood]

I.

The artificial intelligence landscape is undergoing a significant transformation, driven by rapid advancements in hardware, software, and the increasing demand for AI-powered applications. This evolution is characterized by a bifurcation between on-device narrow AI and cloud-scale general-purpose AI. However, a new paradigm is emerging that bridges these two worlds: hybrid AI architectures.

On the edge side, the compute performance of on-device AI accelerators has grown exponentially. Apple's Neural Engine, for example, has increased its performance by 60x between 2017 and 2024[1]. This growth, coupled with algorithmic progress, is enabling smaller AI models to achieve impressive capabilities on narrow tasks while running directly on edge devices like smartphones and laptops.

In AI model development, a concurrent trend is emerging whereby the largest models continue to grow in size, while the smallest commercially relevant models are becoming more compact over time. A striking example of this is Gemma 2 2B, which surpasses all GPT-3.5 models on the Chatbot Arena despite having only 2 billion parameters compared to GPT-3.5's 175 billion[2]. This remarkable efficiency gain showcases the rapid pace of development in AI model architectures and training techniques. Such advancements are making it increasingly feasible to deploy powerful AI models directly on edge devices, opening up new possibilities for on-device AI applications that previously required cloud resources.

II.

[Figure "Hybrid": illustration by Qualcomm]

Meanwhile, on the cloud side, the performance of GPUs has been steadily doubling every 2.3 years[3], with a further 10x boost[4] from the adoption of new number formats like 16-bit floating point. Memory bandwidth and capacity are also increasing, enabling the training of massive AI models with hundreds of billions of parameters. These cloud-scale models are pushing the boundaries of artificial general intelligence (AGI). While these two tracks of AI development have been largely separate, the concept of hybrid AI is bringing them together. As outlined in Qualcomm's paper[5], hybrid AI architectures leverage a combination of edge and cloud computing resources to deliver optimal performance, efficiency, and user experiences.

Apple's recently announced Apple Intelligence[6] initiative is a prime example of a hybrid AI architecture in action. Instead of relying solely on cloud-based models, Apple is integrating its own foundation models directly into various features across its devices and services. This approach treats AI as a technology rather than a standalone product. By running smaller, specialized AI models on-device, Apple can deliver features like email prioritization, document summarization, and Siri enhancements with low latency and high privacy. At the same time, more complex tasks are seamlessly offloaded to large cloud models when necessary. This hybrid approach allows Apple to offer deeply integrated, personalized AI experiences while leveraging the collective processing power of millions of edge devices.

Apple Intelligence showcases the benefits of hybrid AI architectures that Qualcomm and others envision. By intelligently distributing workloads between edge and cloud, hybrid AI reduces strain on cloud infrastructure, improves energy efficiency, and enables functionality even with limited connectivity. Sensitive user data can be processed locally, enhancing privacy. This intelligent workload distribution is reminiscent of the principles behind RouteLLM[7], an open-source framework for cost-effective LLM routing, which aims to optimize the use of different AI models based on task requirements and resource availability.
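To make the idea concrete, here is a minimal sketch of the kind of routing policy a hybrid system might apply. The request fields, the threshold, and the decision rules are illustrative assumptions on my part, not Apple's or RouteLLM's actual logic; a production router would learn these decisions from preference data rather than hard-code them.

```python
# Minimal sketch of edge-vs-cloud routing in a hybrid AI system.
# The fields, threshold, and policy below are illustrative assumptions,
# not Apple's or RouteLLM's actual implementation.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    contains_personal_data: bool  # e.g. mail, messages, health data
    estimated_complexity: float   # 0.0 (trivial) .. 1.0 (hard), from a cheap classifier

COMPLEXITY_THRESHOLD = 0.6  # above this, assume the on-device model will struggle

def route(request: Request, device_online: bool) -> str:
    """Decide where a request should run.

    Illustrative policy: privacy-sensitive or simple requests stay on
    device; complex requests go to the cloud model when connectivity allows.
    """
    if request.contains_personal_data:
        return "on_device"   # keep sensitive data local
    if not device_online:
        return "on_device"   # degrade gracefully when offline
    if request.estimated_complexity > COMPLEXITY_THRESHOLD:
        return "cloud"       # offload hard tasks to the large model
    return "on_device"       # default: cheap and low-latency

# Example usage
req = Request("Summarize this email thread",
              contains_personal_data=True,
              estimated_complexity=0.3)
print(route(req, device_online=True))  # -> "on_device"
```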

Moreover, hybrid AI allows for continuous improvement of models based on real-world usage, benefiting from both local and global insights through lightweight training techniques like LoRA adapters[8]. The interplay between edge and cloud creates opportunities for adaptive, personalized AI experiences across industries, from smartphones and IoT devices to vehicles. As more companies adopt hybrid AI strategies, following the path paved by Apple Intelligence, I expect a proliferation of powerful, efficient, and user-centric AI applications. However, this shift also presents challenges, including managing the complexity of distributed AI systems, ensuring data privacy and security, and developing standards for edge-cloud interoperability. As the technology matures, regulatory frameworks and industry best practices must evolve accordingly.
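For readers unfamiliar with LoRA, the sketch below shows the core idea from [8]: freeze the pretrained weight matrix and train only a small low-rank correction, so a device can personalize a shared model by updating a few thousand parameters instead of millions. The rank, scaling, and layer sizes here are illustrative, and the wrapper is a simplified stand-in for a real adapter library.

```python
# Minimal sketch of a LoRA adapter (Hu et al. [8]) wrapped around a frozen
# linear layer. Rank, scaling factor, and dimensions are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        # Low-rank factors: only these rank * (in + out) values are trained.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Example: adapt a 512 -> 512 projection with roughly 8k trainable parameters
layer = LoRALinear(nn.Linear(512, 512), rank=8)
out = layer(torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```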

While hybrid edge-cloud architectures offer a promising path forward, the underlying AI models powering these systems must also evolve. The current generative AI paradigm, exemplified by large language models like ChatGPT, has captured the public imagination but faces inherent limitations in reliability and reasoning capabilities.

III.

[Figure "Neurosymbolic": illustration by Stephen Wolfram]

The future of AI likely lies in neurosymbolic approaches that combine the pattern recognition strengths of neural networks with the logical rigor and interpretability of symbolic reasoning. This "best of both worlds" approach mirrors Daniel Kahneman's distinction between System 1 (fast, intuitive thinking, the province of neural networks) and System 2 (slow, deliberative reasoning, the province of symbolic logic), and it has the potential to overcome the limitations of pure neural network systems. The integration of Wolfram Alpha with ChatGPT[9] exemplifies this synergy: ChatGPT's natural language prowess combined with Wolfram Alpha's computational precision creates a system more powerful than either could achieve alone. This hybrid approach not only enhances AI capabilities but also paves the way for more robust, versatile, and trustworthy AI systems.

Recent research has shown that large language models themselves can serve as effective neurosymbolic reasoners. For instance, a study by Fang et al.[10] demonstrated that LLMs, when properly prompted and integrated with external symbolic modules, can successfully tackle complex symbolic reasoning tasks in text-based games. This approach achieved impressive results without the need for extensive training data or reinforcement learning, suggesting that LLMs may already possess latent symbolic reasoning capabilities that can be unlocked through clever system design.
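The pattern behind both the Wolfram Alpha plugin and the approach in Fang et al. can be sketched in a few lines: the neural model translates a fuzzy natural-language request into a formal expression, and a symbolic engine evaluates it exactly. In the sketch below, call_llm is a hypothetical placeholder for whatever model API is in use, and SymPy stands in for the symbolic module; this illustrates the delegation pattern, not either system's actual implementation.

```python
# Sketch of neural-to-symbolic delegation: the LLM parses intent and emits
# a formal expression; a symbolic engine does the exact computation.
# `call_llm` is a hypothetical placeholder, and SymPy stands in for a
# computational engine like Wolfram Alpha.
import sympy as sp

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would query an on-device or cloud model here,
    # prompted to answer with a bare SymPy expression.
    return "integrate(sin(x) * x, x)"

def answer(question: str) -> str:
    # System 1: the neural model turns fuzzy language into a formal query.
    expression = call_llm(
        f"Rewrite the following question as a single SymPy expression:\n{question}"
    )
    # System 2: the symbolic engine evaluates it exactly and verifiably.
    x = sp.symbols("x")
    result = sp.sympify(expression, locals={"x": x, "integrate": sp.integrate})
    return f"{question}\n= {result}"

print(answer("What is the integral of x*sin(x) with respect to x?"))
# last line printed: "= -x*cos(x) + sin(x)"
```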

Realizing this neurosymbolic future will require overcoming technical challenges in integrating discrete symbolic representations with continuous neural networks, as well as developing more efficient reasoning algorithms. But the potential benefits – AI systems that are not only powerful but also reliable, interpretable, and trustworthy – make this a crucial direction for the field.

As I look ahead to the next era of AI, it's clear that hybrid architectures, spanning edge and cloud, neural and symbolic, will be key to unlocking new possibilities. By combining the strengths of these approaches, we can build AI systems that augment and empower human intelligence in transformative ways. The path forward is not simply bigger language models, but thoughtful integration of diverse AI paradigms to create more robust, adaptive, and user-centric experiences.

Read on to part 4, "Intent Router".

References

[1] Apple (2024). Apple introduces M4 chip.

[2] Google for Developers (2024). Smaller, Safer, More Transparent: Advancing Responsible AI with Gemma.

[3] EpochAI (2022). Predicting GPU Performance.

[4] EpochAI (2023). Trends in Machine Learning Hardware.

[5] Qualcomm (2023). The future of AI is hybrid - Part I: Unlocking the generative AI future with on-device and hybrid AI.

[6] Evans, B. (2024). Apple Intelligence. Benedict Evans.

[7] Ong, I., Almahairi, A., Wu, V., Chiang, W., Wu, T., Gonzalez, J. E., Kadous, M. W., & Stoica, I. (2024). RouteLLM: Learning to Route LLMs with Preference Data. arXiv.

[8] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv.

[9] Wolfram, S. (2023). Instant Plugins for ChatGPT: Introducing the Wolfram ChatGPT Plugin Kit.

[10] Fang, M., Deng, S., Zhang, Y., Shi, Z., Chen, L., Pechenizkiy, M., & Wang, J. (2024). Large Language Models Are Neurosymbolic Reasoners. arXiv.


CC BY-NC 4.0 2024 © Ankesh Bharti