What is General Artificial Intelligence?
========================================
Executive summary
Artificial General Intelligence (AGI)—also called general artificial intelligence or strong AI—refers to a machine or system that can understand, learn, and apply intelligence across a wide variety of tasks and domains at a level comparable to (or exceeding) that of a competent human. Unlike narrow (or weak) AI, which is built for specific tasks (e.g., image recognition, language translation, game playing), AGI would display the flexibility, transfer, abstraction, reasoning, and learning capabilities necessary to handle novel situations it was not explicitly trained on.
This article provides a detailed, multidisciplinary survey of AGI: conceptual definitions, historical context, theoretical foundations, candidate architectures and research directions, evaluation methods, current state of practice (as of 2024), practical implications, safety and governance challenges, and likely trajectories and uncertainties.
Definitions and core concepts
- Artificial General Intelligence (AGI): A system that can perform any intellectual task that a human being can, with comparable breadth, adaptability, and learning efficiency across domains.
- Narrow AI / Specialized AI: Systems engineered for a single or small set of tasks (e.g., speech recognition, chess playing). They may exceed human performance on their tasks but lack generality.
- Generalist AI / Broad AI: Often used to denote systems that handle many tasks (multimodal, multitask) but not necessarily all tasks at human-level generality.
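- Strong AI vs. weak AI: A philosophical distinction. Strong AI is the claim that a suitably programmed machine would have genuine understanding or a mind; weak AI treats such systems as tools that simulate intelligence. In engineering usage, "strong AI" is often used loosely as a synonym for AGI, although AGI is a claim about capability rather than about inner experience.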
- Key capabilities associated with AGI: cross-domain transfer, abstraction, reasoning, planning, meta-learning, continual learning, common-sense understanding, flexible goal pursuit, robust interaction with the physical and social world, and efficient use of limited data.
Historical overview
- 1950s: Foundational ideas emerge. Turing proposes the "imitation game" (Turing Test) as a behavioral criterion for intelligence. Early symbolic AI (Newell & Simon) and the cybernetics community explore formal reasoning and problem solving.
- 1960s–1980s: Symbolic AI (knowledge representation, logic, planning) dominates. Work on cognitive architectures begins (e.g., SOAR and the ACT family, which later became ACT-R), aiming for broad cognitive coverage.
- 1980s–1990s: Emergence of connectionist approaches, backpropagation, and statistical learning. The AI winter(s) reflect unmet expectations around general-purpose intelligence.
- 2000s–2010s: Rise of machine learning, big data, and deep learning. Specialized systems achieve human-level or superhuman performance in narrow domains (vision, speech, games).
- 2018–2024: Foundation models (large pre-trained transformer-based models) show broad capabilities across many tasks via fine-tuning and prompting. Multimodal and multi-task agents (e.g., Gato, multimodal transformers) demonstrate an incremental move toward generality but still lack many core AGI capabilities.
- Present debates center on whether scaling current paradigms (bigger models, more compute/data) or combining approaches (symbolic + connectionist + embodied) is most likely to produce AGI.
Theoretical foundations
Formal characterizations
- Universal intelligence (Legg & Hutter, 2007): Intelligence is the expected performance of an agent across a wide range of environments, weighted by their simplicity (Occam's razor). This provides an abstract, formal notion but is not computable in practice.
- Solomonoff induction and AIXI: Solomonoff induction formalizes optimal prediction over all computable hypotheses, weighted by simplicity. AIXI combines it with sequential decision theory to define an agent that is optimal in theory but incomputable. Both serve as normative ideals rather than practical designs.
- Computational complexity: Intelligence is constrained by computational resources — time, memory, and data. Practical AGI must navigate trade-offs between generality and tractability.
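For reference, the universal intelligence measure described above is usually written as follows. The notation follows Legg and Hutter's formulation and is reproduced here only as a reading aid, since the measure cannot be evaluated directly.

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here \pi is the agent, E is the class of computable environments, K(\mu) is the Kolmogorov complexity of environment \mu, and V_{\mu}^{\pi} is the expected total reward that \pi obtains in \mu. The weight 2^{-K(\mu)} is the formal counterpart of Occam's razor: simpler environments count for more. Because K is incomputable, so is \Upsilon.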
Learning paradigms relevant to AGI
- Supervised learning: Learning from labeled examples—powerful but data-hungry and often narrow.
- Unsupervised / self-supervised learning: Learning internal representations from raw data signals; foundational to modern large models.
- Reinforcement learning (RL): Learning via trial-and-error with rewards; important for sequential decision-making and control.
- Meta-learning (learning-to-learn): Systems that adapt rapidly to new tasks from few examples—key for data-efficient generalization.
- Continual learning / lifelong learning: Learning across streams of tasks without catastrophic forgetting; necessary for accumulating knowledge over time.
- Causal learning and structured representation learning: Learning cause-effect relations and interpretable world models to support robust planning and counterfactual reasoning.
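To make the reinforcement learning entry above concrete, here is a minimal sketch of tabular Q-learning. The environment (a five-state chain with a reward at the right end), the function names, and all hyperparameters are assumptions chosen for illustration; none of them come from a specific system discussed in this article.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment.

    States are 0..n_states-1. Action 0 moves left (state 0 is a wall),
    action 1 moves right. Entering the rightmost state yields reward 1
    and ends the episode; every other transition yields reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Deterministic transition and reward.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Q-learning update: bootstrap from the best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table)[:-1])  # policy for the non-terminal states
```

The agent is never told that moving right is correct; the preference emerges from the reward signal propagating backwards through the value table. This also illustrates why the list above treats sample efficiency as a separate concern: even this trivial task takes many episodes of trial and error.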
Philosophical and cognitive foundations
- Symbolic vs connectionist debate: Historically, symbolic AI emphasized explicit representations and logic-based reasoning; connectionist approaches (neural nets) emphasize distributed representations and learning from data. Hybrid/neurosymbolic approaches try to combine strengths of both.
- Embodied cognition: Argues intelligence arises from interaction with the environment; embodiment (sensors, actuators) may be critical for grounding concepts and achieving robust generality.
- Cognitive architectures: Models like SOAR and ACT-R aim to capture human cognitive processes; they provide frameworks for modular cognitive components (perception, memory, reasoning, decision-making).
Key capabilities required for AGI
A working list of abilities generally considered necessary (though definitions vary):
- Broad generalization and transfer:
  - Apply knowledge from one domain to new, different domains with minimal additional training.
- Efficient learning and meta-learning:
  - Rapidly learn new tasks from few examples or weak supervision.
- Abstraction and symbolic reasoning:
  - Form and manipulate high-level abstractions, causal models, and logical structures.
- Robust planning and long-horizon reasoning:
  - Plan over long temporal horizons under uncertainty and partial observability.
- Continual learning and memory:
  - Accumulate and reuse knowledge without catastrophic forgetting; maintain episodic and semantic memory.
- Multi-modal perception and integration:
  - Combine visual, auditory, textual, proprioceptive, and other signals coherently.
- Social, language, and commonsense understanding:
  - Communicate naturally, understand social cues, and reason about human goals and norms.
- Creativity and problem invention:
  - Generate novel solutions, hypotheses, and artifacts in new domains.
- Physical interaction and manipulation (for embodied AGI):
  - Robust motor skills, perception-action loops, and physical problem-solving.
- Self-monitoring, reflection, and learning about objectives:
  - Monitor internal states, reason about own performance, and adapt goals.
Architectural approaches and research paths
- Scaling up deep learning / foundation models
  - Approach: Train ever-larger models on massive, diverse datasets using self-supervised objectives (e.g., next-token prediction or masked language modeling).
  - Rationale: Empirical success suggests that scale yields emergent capabilities: broader knowledge, in-context learning, and zero/few-shot performance.
  - Examples: GPT series, PaLM, LLaMA, and multi-modal variants.
  - Limitations: Data and compute costs, brittleness, lack of grounded causality, and challenges with planning, reasoning, and sample efficiency.
- Neurosymbolic and hybrid systems
  - Approach: Combine neural networks for perception and pattern recognition with symbolic systems for reasoning and manipulation of explicit structures.
  - Rationale: Symbolic representations support systematicity, interpretability, and logical reasoning.
  - Research: Integrating logic programming, knowledge graphs, and program synthesis with neural modules.
- Modular and cognitive-architecture approaches
  - Approach: Build architectures composed of interacting modules (perception, memory, planner, controller), possibly inspired by cognitive science.
  - Examples: SOAR, ACT-R, NARS; more modern modular deep architectures (e.g., modular RL).
  - Rationale: Modularity supports reusability, interpretability, and compositionality.
- Meta-learning and few-shot learning
  - Approach: Train systems to rapidly adapt to new tasks by learning update rules, priors, or initialization states (e.g., MAML).
  - Rationale: Enables efficient transfer and reduces data requirements for new tasks.
- Continual and lifelong learning
  - Methods: Elastic Weight Consolidation, memory replay, progressive networks, dynamically expandable architectures.
  - Goal: Maintain long-term competence across tasks and accumulate knowledge over a lifetime.
- Embodied and developmental approaches
  - Approach: Use robots or simulated agents to learn through interaction, exploration, and developmental curricula.
  - Rationale: Physical interaction grounds symbols, encourages causal model formation, and develops sensorimotor competence.
- Planning, search, and world models
  - Model-based RL, planning with learned models, Monte Carlo tree search, and program induction aim to combine learning with explicit forward models for planning.
- Causal and structured representation learning
  - Seek representations that reflect causal structure, enabling robust generalization under distribution shifts and interventions.
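Elastic Weight Consolidation, listed above under continual learning, adds a quadratic penalty that discourages changing parameters that mattered for an earlier task. The following is a minimal sketch of that penalty, assuming a diagonal Fisher information estimate is already available. The function names and the numbers in the usage example are illustrative assumptions; a real implementation would operate on framework tensors, not on Python lists.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength):
    """Quadratic EWC penalty.

    Computes (strength / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2, where
    theta_star are the parameters learned on the earlier task and F_i is the
    diagonal Fisher information estimate for parameter i.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def ewc_total_loss(new_task_loss, params, anchor_params, fisher_diag,
                   strength=1.0):
    """Loss on the new task plus the penalty for drifting from the old one."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag,
                                       strength)


if __name__ == "__main__":
    anchor = [1.0, -2.0]   # parameters after the earlier task (made up)
    fisher = [10.0, 0.1]   # first parameter mattered far more (made up)

    # Moving the important parameter is penalized much more heavily
    # than moving the unimportant one by the same amount.
    print(ewc_penalty([1.5, -2.0], anchor, fisher, strength=1.0))
    print(ewc_penalty([1.0, -1.5], anchor, fisher, strength=1.0))
```

The design choice worth noting is that the penalty is per parameter: the network stays free to repurpose weights the earlier task did not rely on, which is how the method trades off retaining old competence against learning new tasks.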
Benchmarks and evaluation
Evaluating AGI is challenging because AGI is not a single task; it must be assessed across breadth, adaptability, learning efficiency, and safety. Some evaluation approaches:
- Behavioral / task-based tests:
  - Turing Test: Behavioral indistinguishability in conversation; criticized as narrow and anthropomorphic.
  - Abstraction and Reasoning Corpus (ARC): Tests generalization and abstract reasoning from few examples.
  - Animal-AI Testbed: Evaluates adaptive reasoning for embodied agents.
  - BIG-bench and BIG-Bench Hard: Wide arrays of language tasks probing reasoning and problem-solving.
  - GLUE / SuperGLUE: Benchmarks for natural language understanding (narrow relative to AGI).
  - MT-Bench and other model-judged evaluation suites: Multi-turn evaluation scored against human preferences, often with a strong model acting as the judge.
- Formal / theoretical measures:
  - Universal intelligence (Legg & Hutter) gives a principled but impractical metric.
  - Performance-weighted evaluations across many environments with varying complexity and prior probabilities.
- Practical evaluation axes:
  - Breadth: Number of distinct task families handled.
  - Efficiency: Data and compute required for new tasks.
  - Robustness: Performance under distribution shift, adversarial inputs, or degraded sensors.
  - Transfer and adaptation: Speed and quality of transfer to new tasks.
  - Safety and alignment: Propensity to follow instructions, avoid harmful behaviors, and respect constraints.
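Some of the axes above can be turned into simple aggregate numbers. The sketch below shows one way this might look for a finite set of task results: a complexity-weighted score (a crude, computable stand-in for the idea of weighting environments by simplicity), a breadth count, and a robustness ratio. The `TaskResult` fields, the use of an integer "complexity in bits", the 0.5 threshold, and the normalization by total weight are all assumptions made for illustration, not an established benchmark protocol.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    family: str            # hypothetical task-family label
    complexity_bits: int   # assumed proxy for task description length
    score: float           # in-distribution score, normalized to [0, 1]
    shifted_score: float   # score on the same task under distribution shift


def complexity_weighted_score(results):
    """Average score with weight 2 ** -complexity_bits per task."""
    weights = [2.0 ** -r.complexity_bits for r in results]
    total = sum(w * r.score for w, r in zip(weights, results))
    return total / sum(weights)


def breadth(results, threshold=0.5):
    """Number of task families whose mean score reaches the threshold."""
    by_family = {}
    for r in results:
        by_family.setdefault(r.family, []).append(r.score)
    return sum(
        1 for scores in by_family.values()
        if sum(scores) / len(scores) >= threshold
    )


def robustness(results):
    """Mean ratio of shifted to in-distribution score (tasks with score > 0)."""
    ratios = [r.shifted_score / r.score for r in results if r.score > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    results = [
        TaskResult("arithmetic", 1, 1.0, 0.5),   # made-up numbers
        TaskResult("planning", 2, 0.0, 0.0),
    ]
    print(complexity_weighted_score(results))
    print(breadth(results))
    print(robustness(results))
```

Any single number of this kind hides detail, so in practice such aggregates would be reported alongside per-family results, not in place of them.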
Current state of the art ...