
What is artificial intelligence?

Artificial Intelligence (AI) is a multidisciplinary field that designs algorithms and systems to perform tasks requiring human-like intelligence: perception, reasoning, learning, planning, language understanding, and decision-making. It draws on computer science, mathematics, statistics, cognitive science, neuroscience and philosophy to build systems that sense, reason, learn and act in complex environments.

Definitions and types

Working definitions: practical (task-performing systems), functional (input→output mappings), normative (intelligent agents maximizing goals). By capability: Narrow (specialized, dominant today), General/AGI (human-level flexible cognition, hypothetical), Superintelligence (beyond human, theoretical). By approach: Symbolic (rule-based), Subsymbolic/Statistical (neural networks, learning from data), Hybrid (neuro-symbolic combinations).

Historical highlights

1950: Turing Test; 1956: Dartmouth Workshop names the field. 1958: Perceptron; 1960s–70s: symbolic AI; 1980s: expert systems and revival of neural nets (backpropagation). 1997: Deep Blue beats Kasparov; 2006–2012: deep learning resurgence; 2016: AlphaGo; 2018–2023: Transformer-based large language models; 2021–2024: foundation models and multimodal systems.

Theoretical foundations

Core math: linear algebra, probability and statistics, optimization, information theory, learning theory, computational complexity. Other foundations: control theory, logic/formal methods, neuroscience inspiration. Key concepts: parametrized models f(x; θ), loss minimization, generalization, Bayes’ theorem. Example update: θ ← θ − η ∇_θ L(θ).

Key paradigms and algorithms

Learning paradigms: supervised, unsupervised, self-supervised, semi-supervised, reinforcement learning, online learning, meta-learning. Symbolic methods: rule-based systems, knowledge representation, theorem proving. Probabilistic models: Bayesian networks, HMMs, CRFs. Search and evolution: genetic algorithms, evolutionary strategies, simulated annealing. Hybrid neuro-symbolic approaches combine learning with explicit reasoning.

Architectures and modern models

Feedforward (MLPs), CNNs (vision), RNNs/LSTM/GRU (sequences), Transformers (self-attention, foundation for LLMs), GNNs (graphs). Generative models: VAEs, GANs, normalizing flows, diffusion models (state-of-the-art for image generation). Multimodal models integrate text, image, audio; scaling, attention, pretraining and transfer are central innovations.

Training objectives and techniques

Common losses: MSE for regression, cross-entropy for classification; regularization (L2, dropout), early stopping. Bayesian methods (MCMC, variational inference), contrastive learning (SimCLR, CLIP), RL objectives (maximize expected return), meta- and curriculum learning. Parameter-efficient adaptation: adapters, LoRA, prompting for foundation models.

Evaluation and metrics

Classification: accuracy, precision, recall, F1, ROC-AUC. Regression: RMSE, MAE, R². Ranking: NDCG, MAP; generative: FID, Inception Score, human evals; language: BLEU/ROUGE often supplemented by human judgment or BERTScore. Robustness/safety: calibration, fairness metrics, adversarial robustness, OOD detection. Good experimental design and reproducibility are essential.

Applications and case studies

Computer vision (medical imaging, autonomous vehicles), NLP (translation, summarization, chatbots), robotics (navigation, manipulation). Healthcare (diagnostics, AlphaFold for protein structures), finance (fraud detection, risk models), recommender systems, creative tools (image/music generation), security and search. Each domain needs domain expertise for data quality, evaluation and safe deployment.

Societal, ethical and governance considerations

Risks: bias, privacy leaks, lack of transparency, adversarial failures, economic disruption, misinformation and dual-use misuse. Mitigations: auditing, model/dataset documentation, explainability techniques, differential privacy, federated learning, fairness-aware methods, regulation and standards.

Current state and trends (mid-2024)

Foundation models, multimodality, emergent abilities, and widespread industry deployment alongside growing safety and governance focus. Hardware and scaling innovations and open-source models are democratizing access, but compute and data remain concentrated. Persistent limitations: data hunger, brittle OOD generalization, limited reliable long-term reasoning, interpretability challenges.

Open problems and research directions

Alignment and provable safety, robustness under distribution shift, data-efficient and causal learning, interpretability and formal verification. Combining symbolic reasoning with learning, multi-agent coordination, energy-efficient AI, and socio-technical governance research.

Future implications

Likely continued automation, productivity gains and workforce shifts; widespread human–AI collaboration; uncertain AGI timelines and intensive governance debates. Policies will need to balance innovation, safety, equity and international coordination.

How to get started

Foundations: linear algebra, probability, calculus, optimization; programming in Python; ML frameworks (PyTorch, TensorFlow). Study path: basic ML → deep learning → advanced topics (RL, probabilistic models) → systems/MLOps → ethics and safety. Resources: online courses (e.g., Andrew Ng, CS229), books and seminal papers (Turing, backpropagation, “Deep Learning”, “Attention is All You Need”), Kaggle, Hugging Face.

Appendix: key formulas and snippets

Bayes’ theorem: p(θ|D) = p(D|θ) p(θ) / p(D). Softmax: σ(z)_i = exp(z_i) / Σ_j exp(z_j); cross-entropy: L = −Σ_i y_i log σ(z)_i. Gradient descent: θ ← θ − η ∇_θ L(θ). Q-learning (tabular): Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)].

Concluding remarks

AI is a rapidly evolving combination of deep theory and broad practical impact. It excels at pattern recognition and is transforming many sectors, but faces major technical and societal challenges: robustness, interpretability, alignment and governance. Opportunities exist for research, application development and policy work, each carrying responsibilities to deploy AI safely and equitably.

What is Artificial Intelligence?

Artificial Intelligence (AI) is a broad, multidisciplinary field that seeks to create machines and systems capable of performing tasks that would normally require human intelligence. These tasks include perception, reasoning, learning, planning, language understanding, and decision-making. AI blends concepts from computer science, mathematics, cognitive science, neuroscience, statistics, and philosophy to design algorithms and systems that sense, reason, learn, and act in complex environments.

This article is a comprehensive, in-depth survey of AI: its history, core concepts and theories, principal approaches and algorithms, applications, current state-of-the-art, limitations and risks, and future directions. It aims to provide a solid conceptual and practical grounding for readers from diverse backgrounds.

Table of contents

  • Definitions and types of AI
  • Historical milestones
  • Theoretical foundations
  • Key paradigms and algorithms
  • Architectures and models (modern deep learning focus)
  • Learning paradigms and objectives
  • Evaluation and metrics
  • Practical applications and case studies
  • Societal, ethical, and governance considerations
  • Current state and trends (as of mid-2024)
  • Open problems and research directions
  • How to get started (learning path and resources)
  • Appendix: simple code examples and formulas

Definitions and types of AI

AI lacks a single universally accepted definition because it spans objectives (what systems do), methods (how they do it), and capabilities (how well they do it). Common working definitions include:

  • Practical: AI is the design of algorithms and systems that perform tasks that normally require human intelligence—e.g., perception, language, reasoning, planning.
  • Functional: AI systems map inputs (e.g., sensor data, text) to outputs (e.g., decisions, labels, actions) using learned rules or programmed logic.
  • Normative: AI is the study of intelligent agents—entities that perceive their environment and take actions to maximize their chances of achieving goals.

Types of AI by capability:

  • Narrow (Weak) AI: Systems specialized for specific tasks (e.g., image classification, machine translation, chess playing). This is the dominant form today.
  • General (Strong) AI / Artificial General Intelligence (AGI): Hypothetical systems with flexible, human-level cognitive capabilities across a wide range of tasks.
  • Superintelligence: Systems exceeding human capabilities across virtually all domains (theoretical).

Types by approach:

  • Symbolic (rule-based): Manipulate explicit symbols and rules (logic, knowledge bases).
  • Subsymbolic (statistical/connectionist): Use statistical learning and neural networks to learn representations from data.
  • Hybrid: Combine symbolic reasoning with statistical learning.

Historical milestones

Highlights of AI’s development (selected):

  • 1950 — Alan Turing’s “Computing Machinery and Intelligence” introduces the Turing Test as an operational approach to machine intelligence.
  • 1956 — Dartmouth Workshop (John McCarthy, Marvin Minsky, Claude Shannon, others) coins “artificial intelligence” and launches the field.
  • 1958 — Frank Rosenblatt’s perceptron introduces an early neural model capable of learning weights from data.
  • 1960s–1970s — Growth of symbolic AI: logic programming, early planning systems, expert systems.
  • 1969 — Minsky & Papert’s book “Perceptrons” highlights the limitations of single-layer networks (for example, they cannot represent XOR) and contributes to a shift of research interest toward symbolic approaches.
  • 1980s — Expert systems boom; backpropagation (Rumelhart, Hinton, Williams, 1986) revives neural networks by enabling multi-layer training.
  • 1990s — Probabilistic graphical models (Bayesian networks, HMMs), kernel methods (SVMs), and robust statistical techniques become prominent.
  • 1997 — IBM Deep Blue defeats chess champion Garry Kasparov (landmark in applied search and evaluation).
  • 2006 onwards — “Deep learning” resurgence (driven by GPU compute, large datasets, and algorithmic advances).
  • 2012 — AlexNet demonstrates dramatic improvement in ImageNet image classification using convolutional neural networks (CNNs), catalyzing deep learning adoption.
  • 2016 — AlphaGo defeats Go world champion Lee Sedol using reinforcement learning and tree search.
  • 2018–2023 — Transformer models (Vaswani et al., 2017) and large-scale pretrained language models (BERT, GPT series) produce breakthroughs in many language and multimodal tasks.
  • 2021–2024 — Foundation models, multimodal AI (text+image+audio), and increasing deployment in industry and society; growing focus on AI safety and regulation.

Theoretical foundations

AI rests on several mathematical and theoretical pillars:

  • Linear algebra: vectors, matrices, eigen-decomposition—core to representing data and transformations in machine learning.
  • Probability and statistics: modeling uncertainty, Bayesian inference, likelihood, estimation, hypothesis testing.
  • Optimization: gradient descent, convex/non-convex optimization, constrained optimization.
  • Information theory: entropy, mutual information—important for learning representations and regularization.
  • Learning theory: PAC learning, VC dimension, sample complexity—gives formal guarantees on a learner’s generalization ability.
  • Computational complexity: limits what can be computed efficiently and informs algorithm design.
  • Control theory and dynamical systems: important for robotics, feedback systems, and some reinforcement learning foundations.
  • Logic and formal methods: symbolic reasoning, theorem proving, and knowledge representation.
  • Neuroscience and cognitive science: inspiration for architectures (e.g., neural networks) and cognitive models.

Key mathematical objects and concepts:

  • Model: a parametrized function f(x; θ) mapping input x to output y or a distribution p(y|x; θ).
  • Loss function L(y, f(x; θ)): quantifies error; optimization minimizes expected loss plus regularization.
  • Generalization: how well a model trained on one dataset performs on unseen data; the difference between training and test performance is the generalization gap.
  • Bayes’ theorem: p(θ|D) ∝ p(D|θ)p(θ) — central to Bayesian learning.
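
As a small illustration of Bayes’ theorem from the list above, the sketch below computes a posterior over a discrete set of candidate parameters. The coin-bias setup, the three candidate values of θ, and the function name `posterior` are assumptions made for this example; they do not come from the article.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem over a finite set of hypotheses.

    prior:      dict mapping hypothesis -> p(theta)
    likelihood: dict mapping hypothesis -> p(D | theta)
    Returns a dict mapping hypothesis -> p(theta | D).
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # p(D), the normalizing constant
    return {h: v / evidence for h, v in unnormalized.items()}


# Hypothetical example: theta is the probability that a coin lands heads.
thetas = [0.25, 0.5, 0.75]
prior = {t: 1.0 / len(thetas) for t in thetas}        # uniform prior
# Observed data D: three heads and one tail, so p(D | theta) = theta^3 * (1 - theta).
likelihood = {t: t ** 3 * (1.0 - t) for t in thetas}

post = posterior(prior, likelihood)
for t in thetas:
    print(f"p(theta={t} | D) = {post[t]:.4f}")
```

With a uniform prior the posterior is proportional to the likelihood, so the candidate that best explains the observed flips receives the most probability mass.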

Example: a gradient descent update is θ ← θ − η ∇_θ L(θ), where η is the learning rate.
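
The update rule can be tried on a toy problem. The following is a minimal sketch, assuming a one-dimensional quadratic loss L(θ) = (θ − 3)² chosen only for illustration; the function names and the default learning rate are hypothetical.

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly apply theta <- theta - lr * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


def grad_quadratic(theta):
    """Gradient of the illustrative loss L(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


theta_hat = gradient_descent(grad_quadratic, theta0=0.0)
print(theta_hat)  # moves toward the minimizer theta = 3
```

For this loss each step shrinks the distance to the minimizer by a factor of 1 − 2η, so a learning rate above 1 would make the iterates diverge rather than converge.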


Key paradigms and algorithms

AI methods can be grouped by problem setup and algorithmic approach.

Learning paradigms

  • Supervised learning: learn mapping from inputs to labels (classification/regression). Algorithms: linear/logistic regression, SVM, decision trees, random forests, gradient-boosted trees, neural networks.
  • Unsupervised learning: discover structure without labels. Algorithms: clustering (k-means, Gaussian Mixture Models), PCA, autoencoders, generative adversarial networks (GANs).
  • Self-supervised learning: create proxy tasks from unlabeled data (e.g., masked language modeling) to learn representations.
  • Semi-supervised learning: combine small labeled datasets with large unlabeled data.
  • Reinforcement learning (RL): an agent interacts with an environment to maximize cumulative reward. Algorithms: Q-learning, SARSA, DQN, policy gradient, actor-critic, PPO, A3C.
  • Online learning: learning sequentially and adaptively from streaming data.
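
To make the reinforcement learning entry concrete, here is a minimal sketch of tabular Q-learning, which applies the update Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]. The five-state chain environment, the hyperparameters, and all names below are hypothetical choices for illustration.

```python
import random

N_STATES, GOAL = 5, 4      # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)           # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1 only when the goal state is reached."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # cap on episode length
            # Epsilon-greedy behaviour policy.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(Q[s], rng)
            s2, r, done = step(s, a)
            # Bootstrapped target; no future value after a terminal transition.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


Q = train()
policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(GOAL)]
print(policy)
```

Because the target uses the maximum over next-state values rather than the action actually taken, the learned values describe the greedy policy even though the agent explores with random actions.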

Symbolic and logic-based methods

  • Rule-based systems and knowledge representation (ontologies, semantic networks).
  • Automated theorem proving and formal verification.
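
A rule-based system can be sketched in a few lines. The example below is a simplified forward-chaining loop over if-then rules, assuming facts are plain strings; the rule set and the name `forward_chain` are invented for illustration and do not represent any particular expert-system shell.

```python
def forward_chain(facts, rules):
    """Derive all facts reachable from `facts` using `rules`.

    rules is a list of (premises, conclusion) pairs; a rule fires when
    every premise is already known.
    """
    known = set(facts)
    changed = True
    while changed:  # keep sweeping until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base.
rules = [
    (("has_feathers",), "is_bird"),
    (("is_bird", "can_fly"), "can_migrate"),
]
derived = forward_chain({"has_feathers", "can_fly"}, rules)
print(sorted(derived))
```

Every derived fact can be traced back to the rules that produced it, which is the kind of explicit reasoning chain that symbolic methods offer.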

Probabilistic and statistical models

  • Bayesian networks, Markov Random Fields, HMMs, conditional random fields (CRFs).
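
As one concrete instance of these models, the sketch below computes the likelihood of an observation sequence under a hidden Markov model using the forward algorithm. The two-state model and its probability tables are made-up numbers for illustration.

```python
def forward_likelihood(obs, init, trans, emit):
    """Return p(obs) under an HMM via the forward algorithm.

    init[i]     = p(z_1 = i)
    trans[i][j] = p(z_{t+1} = j | z_t = i)
    emit[i][o]  = p(x_t = o | z_t = i)
    """
    n = len(init)
    # alpha[i] = p(x_1..x_t, z_t = i), updated one observation at a time.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols (0 and 1).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

print(forward_likelihood([0, 0, 1], init, trans, emit))
```

The recursion sums over hidden states one time step at a time, so the cost grows linearly with sequence length instead of exponentially.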

Evolutionary and search-based methods

  • Genetic algorithms, evolutionary strategies, simulated annealing—optimization via randomized search.
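
Randomized search can be illustrated with simulated annealing. This is a minimal sketch under assumed settings: the objective function, the geometric cooling schedule, and the Gaussian proposal are illustrative choices, not prescriptions from the article.

```python
import math
import random


def simulated_annealing(f, x0, steps=5000, t0=10.0, cooling=0.999,
                        step_size=1.0, seed=0):
    """Minimize f over the reals, returning the best (x, f(x)) seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, step_size)   # random local proposal
        fc = f(cand)
        # Always accept improvements; accept worse moves with
        # probability exp(-(fc - fx) / t), which shrinks as t cools.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                            # geometric cooling
    return best_x, best_f


def objective(x):
    """Illustrative objective with several local minima."""
    return x ** 2 + 10.0 * math.sin(x)


best_x, best_f = simulated_annealing(objective, x0=5.0)
print(best_x, best_f)
```

Early on, the high temperature lets the search climb out of local minima; as the temperature falls, it behaves more like greedy descent.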

Hybrid approaches

  • Neuro-symbolic methods combine neural nets with symbolic reasoning to leverage both statistical learning and explicit logic.

Architectures and models (modern deep learning focus)

Deep learning architectures dominate many current practical successes.

  • Feedforward neural networks (MLPs): dense layers, used for tabular and basic representation learning.
  • Convolutional Neural Networks (CNNs): for spatially structured data (images, video); key layers: convolution, pooling, batch normalization.
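  • Recurrent Neural Networks (RNNs), LSTM, GRU: sequence modeling (speech, time series). LSTMs and GRUs use gating to mitigate the vanishing-gradient problem of plain RNNs.
  • Transformers: the self-attention mechanism relates every position in a sequence to every other position directly, which captures long-range dependencies and parallelizes well during training; foundational for modern language models and many multimodal systems.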
  • Graph Neural Networks (GNNs): operate on graph-structured data (molecules, social networks).
  • Generative models: Variational Autoencoders (VAE), GANs, Normalizing Flows—model data distributions and generate samples.
  • Diffusion models: iterative denoising processes that have become state-of-the-art for generative image modeling (e.g., DALL·E 2, Stable Diffusion).
  • Multimodal models: integrate text, vision, audio; often based on Transformer backbones with modality-specific encoders.

Important innovations:

  • Attention and self-attention, which process all sequence positions in parallel and have scaled to larger models and datasets more effectively than recurrent architectures.
  • Transfer learning and pretraining followed by fine-tuning.
  • Large-scale unsupervised/self-supervised pretraining producing foundation models.
  • Sparse and mixture-of-experts architectures to scale capacity while controlling compute.
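
The attention mechanism mentioned above can be sketched without any framework. The example below implements scaled dot-product attention, softmax(QKᵀ / √d) V, on plain Python lists. It is a simplification: real Transformer layers first apply learned projection matrices to obtain queries, keys and values, whereas here the inputs are used directly, and the toy input `X` is made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


# Self-attention on two toy token vectors (no learned projections).
X = [[1.0, 0.0], [0.0, 1.0]]
out = attention(X, X, X)
print(out)
```

Each output row is a weighted average of all value vectors, which is how a single layer lets every position draw on information from every other position.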

Learning paradigms and objectives

Common objectives and methods used in training:

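  • Supervised loss examples:
      • Regression: mean squared error (MSE), L = (1/n) Σ_i (y_i − f(x_i; θ))^2, averaged over the n training examples.
      • Classification: cross-entropy over softmax outputs, L = −Σ_i y_i log p(y = i | x; θ), where y_i is the one-hot label for class i.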
  • Regularization: L2 (weight decay), dropout, early stopping.
  • Bayesian learning: treat parameters probabilistically; posterior inference via MCMC, variational inference.
  • Contrastive learning: maximize agreement between different views of same data (SimCLR, CLIP).
  • Reinforcement learning objective: maximize expected return E[Σ_t γ^t r_t], where γ is the discount factor; solved via value-based, policy-based, or actor-critic methods.
  • Meta-learning: learn to learn across tasks (e.g., MAML, short for model-agnostic meta-learning).
  • Curriculum learning: schedule training tasks from easy to hard to improve convergence.
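
Several of the objectives listed above are short enough to write out directly. The sketch below gives plain-Python versions of mean squared error, softmax cross-entropy for a single example, and the discounted return; the function names and the sample numbers are illustrative assumptions.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum_i (y_i - f(x_i))^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def softmax_cross_entropy(logits, target_index):
    """-log softmax(logits)[target_index], using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target_index]


def discounted_return(rewards, gamma):
    """sum_t gamma^t * r_t, accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(softmax_cross_entropy([2.0, 0.5, -1.0], target_index=0))
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

Subtracting the maximum logit before exponentiating leaves the result unchanged mathematically but avoids overflow for large logits.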

Evaluation and metrics

Evaluation depends on the task:

  • Classification: accuracy, precision, recall, F1, ROC-AUC, confusion matrix.
  • Regression: RMSE, MAE, R^2.
  • Ranking/recommendation: NDCG, MAP, precision@k.
  • Generative models: Inception Score, FID, likelihood estimates, human evaluation.
  • Language: BLEU, ROUGE, METEOR, but often replaced by human judgments or specialized metrics like BERTScore....
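
For the classification metrics above, the following minimal sketch computes precision, recall and F1 from label lists for a binary task. The function name, the convention of returning 0.0 when a denominator is zero, and the sample labels are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention assumed here: a metric with an empty denominator is 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Accuracy alone can mislead on imbalanced data, which is why precision and recall are usually reported alongside it.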
