AIR JOURNAL OF MATHEMATICS & COMPUTATIONAL SCIENCES

CHARACTERIZING IN-CONTEXT LEARNING: WHEN CAN TRANSFORMERS MATCH STANDARD LEARNING ALGORITHMS?

Mosab Hawarey

Director, Geospatial Research

Published: February 19, 2026
License: CC BY 4.0

Abstract

Transformers exhibit remarkable in-context learning (ICL) capabilities—the ability to learn new tasks from examples provided in the context window without weight updates. Despite extensive empirical investigation, a fundamental theoretical question remains unanswered: which function classes can be learned in-context, and which cannot? This gap in our understanding limits principled system design and creates uncertainty about when ICL will succeed or fail. We address this gap by developing a theoretical framework based on Sufficient Statistic Complexity (SSC)—the minimal information that must be extracted from context examples to enable accurate prediction. We prove that function classes with attention-computable sufficient statistics (those expressible as sums over examples) are efficiently ICL-learnable, matching the sample complexity of standard learning algorithms (Theorem 1). Conversely, we prove that function classes requiring combinatorial sufficient statistics—such as sparse parity—cannot be efficiently ICL-learned by any polynomial-size transformer (Theorem 2), establishing a fundamental computational barrier via novel connections to circuit complexity. These results yield a near-complete characterization: we prove a Master Theorem (Theorem 3) establishing necessary and sufficient conditions for ICL-learnability, and a Dichotomy Theorem (Theorem 6) showing that natural function classes are either ICL-Easy (learnable with optimal sample complexity) or ICL-Hard (requiring exponentially more resources). The boundary corresponds to whether learning is parallelizable or inherently sequential. Our framework explains empirical ICL phenomena, provides architectural guidance, and opens new research directions connecting learning theory, circuit complexity, and meta-learning.
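To make the abstract's "sums over examples" condition concrete, the sketch below is a minimal illustration of the general idea, not the paper's construction; the setup and names (w_true, S_xx, S_xy) are hypothetical. For noiseless linear regression, the least-squares predictor depends on the context only through two per-example averages, exactly the kind of statistic a uniform-attention head can aggregate in a single pass:

```python
# Minimal sketch (assumed setup, not the paper's construction): for a
# noiseless linear task y = <w, x>, the least-squares predictor is a
# function of two statistics that are plain averages of per-example terms,
# i.e. the form a single attention-style aggregation can compute.
import numpy as np

rng = np.random.default_rng(0)

d, n = 3, 32                      # input dimension, number of context examples
w_true = rng.normal(size=d)       # hidden task vector (unknown to the learner)
X = rng.normal(size=(n, d))       # context inputs x_1, ..., x_n
y = X @ w_true                    # context labels y_i = <w_true, x_i>

# Attention-computable sufficient statistics: each summand depends on a
# single example, and the statistic is their uniform average over the context.
S_xx = (X[:, :, None] * X[:, None, :]).mean(axis=0)  # (1/n) sum_i x_i x_i^T
S_xy = (X * y[:, None]).mean(axis=0)                 # (1/n) sum_i y_i x_i

# Prediction on a query uses only (S_xx, S_xy): solve S_xx w = S_xy.
w_hat = np.linalg.solve(S_xx, S_xy)
x_query = rng.normal(size=d)
print(np.allclose(w_hat @ x_query, w_true @ x_query))  # True
```

By contrast, for a sparse parity y = XOR of x_j over an unknown support set S, the natural sufficient statistic enumerates the correlations of y with exponentially many candidate subsets of coordinates; no fixed low-dimensional per-example sum determines the predictor, which is, informally, the combinatorial barrier the abstract's Theorem 2 formalizes.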

Keywords

in-context learning; transformers; sufficient statistics; circuit complexity; AC⁰; meta-learning; computational learning theory; attention mechanisms

How to Cite

APA:

Hawarey, M. (2026). Characterizing in-context learning: When can transformers match standard learning algorithms? AIR Journal of Mathematics & Computational Sciences, 2026, AIRMCS2026277. https://doi.org/10.65737/AIRMCS2026277

Indexed & Discoverable In

Crossref
Semantic Scholar
OpenAlex
Google Scholar

Also automatically indexed by CORE, Scilit, and other DOI-triggered discovery services.

Copyright & Open Access

© 2026 Mosab Hawarey. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author(s) and source are credited. Authors retain full copyright to their work.

Publication Information

Journal: AIR Journal of Mathematics & Computational Sciences
Publisher: Artificial Intelligence Review (AIR) Publishing House LLC (AIR Journals)
Submitted: February 16, 2026
Approved: February 18, 2026 (per the journal's Evaluation Report, shared with the author's permission)
Published: February 19, 2026
Submission ID: AIR-2026-000277