Squishy Software

Ankesh Bharti, draft

(This essay is an early draft of a four-part essay series that explains the motivation behind tiles.run. See part 1, "The New Intelligence".)

Illustration: Andrej Karpathy

I.

The history of artificial intelligence stretches back decades, with progress marked by distinct eras of computational power and algorithmic advancement. Based on an analysis of training compute trends, we can identify three major eras in the development of machine learning:

  1. The Pre-Deep Learning Era (1950s to 2010): This period saw the birth of AI and its early development. Beginning with Claude Shannon's Theseus in 1950, a remote-controlled mouse that could navigate a labyrinth, AI systems gradually improved at specialized tasks. During this era, training compute approximately followed Moore's Law, with a doubling time of about 20 months. Notable achievements included IBM's Deep Blue defeating world chess champion Garry Kasparov in 1997 and IBM Watson winning Jeopardy! in 2011.
  2. The Deep Learning Era (2010 to 2016): Beginning around 2010-2012, this era marked a significant acceleration in AI development. The doubling time for training compute shortened dramatically to approximately 6 months. This period saw rapid advances in areas like image and speech recognition. By 2015, AI systems began outperforming humans on specific visual recognition tasks, and by 2017, they surpassed human-level performance in speech recognition. A major milestone was reached in 2016 when DeepMind's AlphaGo[1] system defeated Lee Sedol, the world champion Go player.
  3. The Large-Scale Era (2016 onwards): This era is characterized by the emergence of large-scale models developed by major corporations. These systems use 2-3 orders of magnitude more training compute than models following the Deep Learning Era trend in the same year. Within this Large-Scale trend, compute appears to grow more slowly, with a doubling time of about 10 months. This period has seen the development of increasingly powerful language models and generative AI systems. (How differently these doubling times compound is shown in the short sketch after this list.)
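
Using only the doubling times quoted above, a quick back-of-the-envelope sketch shows how differently each era's rate compounds; the per-year and per-decade factors follow directly from growth = 2^(t / doubling time), and the era labels are shorthand:

```python
# Growth factor over t months at a given doubling time: 2 ** (t / doubling_time).
doubling_times_months = {
    "Pre-Deep Learning (~Moore's Law)": 20,
    "Deep Learning": 6,
    "Large-Scale": 10,
}

for era, d in doubling_times_months.items():
    per_year = 2 ** (12 / d)     # growth factor per 12 months
    per_decade = 2 ** (120 / d)  # growth factor per 120 months
    print(f"{era}: x{per_year:.1f} per year, x{per_decade:,.0f} per decade")
```

A 6-month doubling time multiplies compute roughly a million-fold per decade, which is why the Deep Learning Era reads as a step change rather than a continuation of Moore's Law.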

The emergence of ChatGPT in late 2022[2] marked a watershed moment in the field of artificial intelligence, both building on and sharply departing from earlier AI developments. While pre-ChatGPT systems showed impressive but narrow capabilities, ChatGPT and subsequent Large Language Models (LLMs) demonstrated a generality and versatility previously unseen. This breakthrough not only brought LLMs into the mainstream but also effectively democratized access to advanced AI capabilities. The ability of these models to engage in human-like conversation, generate creative content, and assist with complex tasks across many domains captivated the public imagination and sparked a surge of interest and investment in AI technologies.

It is important to acknowledge the landmark research that paved the way for these advances in natural language processing (NLP). The seminal paper "Attention Is All You Need"[3] by Vaswani et al. (2017) introduced the Transformer architecture, which has become the foundation for most state-of-the-art language models. This work, along with other significant contributions in NLP research, laid the groundwork for powerful language models like ChatGPT.
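
Concretely, the paper's core operation is scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch of a single unmasked, unbatched head (the toy shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    # Score every query against every key, scaled so the softmax
    # stays well-behaved as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy example: 3 tokens, d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```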

Tech giants and startups alike rushed to develop and deploy their own LLMs, while investors poured billions into AI ventures, recognizing the transformative potential of these technologies. This AI boom not only accelerated the pace of innovation but also intensified the need to better understand the inner workings of these powerful yet opaque systems. As LLMs became more integral to various aspects of society, from business operations to creative endeavors, the urgency to demystify their functioning and ensure their responsible development became increasingly apparent. This rapid progress has brought AI from the realm of specialized research into everyday life, fundamentally changing how we interact with technology and raising new questions about the future of human-AI interaction.

II.

Large Language Models (LLMs) are often compared to black boxes: their inner workings and decision-making processes are opaque, which makes it hard for researchers, developers, and users to understand, control, and steer them towards desired goals. To facilitate human creativity and collaboration with LLMs, it is essential to develop intuitive "dials and knobs" that allow users to fine-tune and direct the behavior of these models, adapting them to specific tasks, domains, and preferences. Such controls might adjust the creativity, specificity, or formality of generated text, or set constraints on the output to align it with user intentions. Together they would enhance the usability and versatility of LLMs while fostering a more collaborative, interactive relationship between humans and AI systems.
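
Some of these dials already exist at the sampling layer: temperature and nucleus (top-p) sampling both reshape the probability distribution from which a model draws its next token. The sketch below illustrates the general technique rather than any particular model's API; the function name and defaults are my own:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0):
    """Sample a token id from raw logits with temperature and top-p filtering."""
    # Temperature rescales the logits: values below 1.0 sharpen the
    # distribution (more predictable), values above 1.0 flatten it
    # (more varied, often read as "creative").
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Nucleus (top-p) filtering: keep only the smallest set of tokens
    # whose cumulative probability reaches top_p, dropping the long tail.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=kept_probs))
```

Lowering the temperature and tightening top_p makes output more conservative and repeatable; raising them trades consistency for variety, which is roughly what a "creativity" knob amounts to in practice.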

Research at both OpenAI and Anthropic aims to make LLMs less of a black box through work on AI interpretability[4], using techniques such as sparse autoencoders and feature extraction. OpenAI's work on extracting concepts[5] from GPT-4 and Anthropic's efforts to scale monosemanticity[6] in Claude 3 Sonnet represent significant steps towards understanding the internal representations of these models. Both approaches aim to decompose the complex neural activity within language models into interpretable patterns or features, potentially unlocking insights into how these models process and generate language.
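
To make the sparse-autoencoder idea concrete: the autoencoder re-expresses a model's internal activations in a much wider feature basis while penalizing how many features fire at once, so that each activation is explained by a handful of candidate-interpretable features. The sketch below is schematic; the dimensions, initialization, and L1 coefficient are illustrative, and real systems train such autoencoders by gradient descent over very large activation datasets:

```python
import numpy as np

class SparseAutoencoder:
    """Toy sparse autoencoder over model activations: reconstruct each
    activation vector from a wide, mostly-zero set of learned features."""

    def __init__(self, d_model, d_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, (d_model, d_features))
        self.W_dec = rng.normal(0.0, 0.1, (d_features, d_model))
        self.b_enc = np.zeros(d_features)
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # ReLU keeps feature activations non-negative; the L1 term in the
        # loss pushes most of them to exactly zero (sparsity).
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        # Reconstruction error plus a sparsity penalty on the features.
        return np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).sum(axis=-1).mean()
```

The features that survive this sparsity pressure are what interpretability researchers then inspect and label, looking for directions that consistently track human-recognizable concepts.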

As we continue to explore and refine these systems, we may uncover new insights not only about artificial intelligence but also about the nature of information processing and representation in complex systems. This evolution in software development aligns closely with Andrej Karpathy's concept of Software 2.0[7], where neural networks represent a fundamental shift in how we create and understand software, moving from explicitly programmed instructions to learned representations that can adapt and improve with more data and compute. By making LLMs more transparent and controllable, we can harness their power to augment human creativity and problem-solving abilities, ushering in a new era of collaborative intelligence.

Read on to part 3, "Hybrid AI".

References

[1] AlphaGo - The Movie. (2020). Google DeepMind, YouTube.

[2] Introducing ChatGPT. (2022). OpenAI.

[3] Attention Is All You Need. (2017). Vaswani, A., et al. Advances in Neural Information Processing Systems, 30.

[4] What is interpretability? (2024). Anthropic, YouTube.

[5] Extracting Concepts from GPT-4. (2024). OpenAI.

[6] Scaling Monosemanticity. (2024). Anthropic.

[7] Software 2.0. (2017). Karpathy, A. Medium.


CC BY-NC 4.0 2024 © Ankesh Bharti