AI Fundamentals


2025-11-11

I have been missing a good resource on all the buzzwords in the AI space, so I decided to put one together myself. Sure, there is plenty of information out there on the topic, and you can ask pretty much any Large Language Model (LLM) enabled chat interface to explain it, but I want something that highlights the relationships between the different areas and that I can maintain and update myself when new buzz enters the scene. For the record, I do not think the current state of artificial neural networks is advanced enough to deserve the name Artificial Intelligence (AI), but since the term AI is put on everything these days I will try to adapt to the terminology commonly found in the wild.

Basics

AI - Artificial Intelligence :

Commonly refers to machines capable of performing tasks that normally require human cognition. These tasks include recognizing speech and images, understanding natural language, making decisions, and solving complex problems.

ANI - Artificial Narrow Intelligence :

Narrow AI is limited to a single, narrowly defined task. Most modern AI systems fall into this category.

Weak AI :

Often used as a synonym for ANI. Here it also serves as the category for systems that can accomplish multiple narrowly defined tasks (ANI x 2).

Generative AI :

Uses generative models to produce text, images, videos, audio, software code or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts. The success of Generative AI in recent years is largely the reason for the hype that next-level AI, AGI, is within reach.

AGI - Artificial General Intelligence :

Commonly refers to AI systems with human-level cognitive abilities across a wide range of tasks, capable of autonomous learning, reasoning, and problem-solving beyond narrow, pre-defined domains. This, to me, is what defines AI and anything beneath it in capability should not be called AI. AGI has not been achieved yet.

ASI - Artificial Super Intelligence :

Hypothetical agent whose intelligence surpasses that of the brightest and most gifted human minds. Often assumed to emerge soon after AGI is achieved.

Strong AI :

AGI and ASI. Beyond Weak AI basically. According to me, this is AI.

Creating AI Models

ML - Machine Learning :

Systems learn patterns from data instead of following explicitly programmed rules.

Deep Learning :

A subset of ML using multi-layered neural networks to analyze complex data like images and speech.

Types of Neural Networks :

Feedforward Neural Networks (FNNs)
Use Case: Basic classification and regression tasks.
How it works: Data flows in one direction, from input to output, through hidden layers.
Example: Spam detection, simple pattern recognition.
Convolutional Neural Networks (CNNs)
Use Case: Image and video processing.
How it works: Uses convolutional layers to detect spatial hierarchies in images (edges, textures, objects).
Example: Facial recognition, medical image analysis, self-driving cars.
Recurrent Neural Networks (RNNs)
Use Case: Sequential data (time series, text).
How it works: Maintains a "memory" of previous inputs using loops, making it useful for sequences.
Example: Speech recognition, stock price prediction, early language models.
Long Short-Term Memory Networks (LSTMs)
Use Case: Long-term dependencies in sequential data.
How it works: A type of RNN with special units to remember or forget information over long sequences.
Example: Machine translation, text generation, time series forecasting.
Transformer Networks
Use Case: Natural language processing (NLP), language models.
How it works: Uses self-attention mechanisms to weigh the importance of each part of the input data, enabling parallel processing.
Example: Large Language Models, chatbots, translation.
Generative Adversarial Networks (GANs)
Use Case: Generating realistic data (images, audio, text).
How it works: Two networks (generator and discriminator) compete: one creates data, the other evaluates its authenticity.
Example: Deepfake creation, art generation, data augmentation.
Autoencoders
Use Case: Dimensionality reduction, anomaly detection, denoising.
How it works: Encodes input data into a smaller representation, then decodes it back.
Example: Image compression, fraud detection, feature extraction.
Graph Neural Networks (GNNs)
Use Case: Data structured as graphs (social networks, molecules).
How it works: Propagates information across nodes and edges in a graph.
Example: Recommendation systems, drug discovery, network analysis.
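
To make the self-attention idea from the Transformer entry a bit more concrete, here is a minimal sketch in plain Python. It assumes the simplest possible setup: the function names and the toy vectors are made up for illustration, and the raw token vectors are reused as queries, keys and values, whereas a real Transformer derives them from learned projection matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score this position against every position in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Three toy token vectors (assumption: no learned projections are applied).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Each output row depends only on the inputs, not on the other output rows, which is why all positions can be computed in parallel.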

Creating an LLM AI Model:

Data Collection
Source: Massive amounts of text from books, websites, articles, code repositories, and more.
Goal: Capture diverse language patterns, facts, and contexts.
Preprocessing
Tokenization: Text is split into tokens (words, subwords, or characters).
Cleaning: Remove duplicates, irrelevant, or low-quality data.
Model Architecture
Transformer: Most LLMs use a transformer architecture with self-attention mechanisms.
Layers: Stacked layers of attention and feed-forward networks.
Pretraining
Objective: Predict missing words or next words in sentences (self-supervised learning).
Process: The model learns patterns, grammar, facts, and some reasoning abilities.
Fine-Tuning
Supervised Learning: Adjust the model on specific tasks (e.g., translation, Q&A) using labeled data.
Reinforcement Learning: Use human feedback to improve response quality and safety.
Evaluation
Benchmarks: Test on standardized datasets for language understanding, reasoning, and safety.
Human Review: Assess outputs for accuracy, bias, and appropriateness.
Deployment
Inference: The trained model generates responses to user prompts.
Continuous Learning: Some models are updated with new data or user feedback.
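
The tokenization, pretraining and inference steps above can be sketched at toy scale. The example below is only an analogy under heavy assumptions: it uses a whitespace tokenizer and a bigram counter instead of subword tokens and a Transformer, and all names and the three-sentence corpus are made up. The objective is the same in spirit, though: learn from raw text which token tends to come next.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Whitespace tokenizer; real LLMs typically use subword tokens."""
    return text.lower().split()

def train_bigram(corpus):
    """Count which token follows which: a stand-in for pretraining."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, token):
    """Inference: return the most frequent follower, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = [
    "the model predicts the next word",
    "the model learns patterns",
    "the model learns grammar",
]
model = train_bigram(corpus)
print(predict_next(model, "model"))  # prints "learns"
```

No labels were needed: the text itself supplies the "correct answer" for every position, which is what self-supervised learning means here.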

PyTorch :

Open-source machine learning library originally developed by Meta AI. .pt and .pth are the conventional file extensions for PyTorch's default, pickle-based serialization. Such files typically store model state dictionaries containing the learned parameters (weights, biases), and checkpoints may also include optimizer states and training metadata.

TensorFlow :

Open-source machine learning (ML) and artificial intelligence (AI) framework developed by Google.

GGUF :

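Binary file format from the llama.cpp/GGML project that stores model weights and metadata in a single file. Popular for sharing models, often quantized, with the open-source community.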

Safetensors :

File format developed by Hugging Face. Considered more secure than PyTorch's default pickle-based files because loading it does not execute arbitrary code, and it is designed to load efficiently.
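
The security argument is easier to see with the file layout in mind. The sketch below is a simplified, hand-rolled imitation of that layout as I understand it (an 8-byte header length, a JSON header describing each tensor, then raw bytes). It is not the real safetensors library, and it only handles one-dimensional float32 tensors; the function names are made up.

```python
import json
import struct

def write_tensor_file(tensors):
    """Serialize {name: list of floats} into a safetensors-like byte string."""
    header, payload = {}, b""
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, then the JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload

def read_tensor_file(blob):
    """Parse the bytes back; nothing in the file is executed as code."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        start, end = info["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32
        tensors[name] = list(struct.unpack(f"<{count}f", data[start:end]))
    return tensors

blob = write_tensor_file({"weights": [0.5, -1.0], "bias": [0.25]})
restored = read_tensor_file(blob)
```

Reading such a file is just JSON parsing plus byte slicing, whereas unpickling a file can run code chosen by whoever created it.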

ONNX :

The Open Neural Network Exchange (ONNX) file format, .onnx, allows vendor-neutral representation of models.

Parameters :

The internal variables of an AI model that define how it processes input data.

Weights :

A specific type of parameter. The numerical values that determine the strength or importance of connections between artificial neurons in the model.

Biases :

Another type of parameter. Offsets added to a neuron's weighted sum of inputs that help the model fit the data better.
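
As a small illustration of how weights and biases work together, here is a single artificial neuron. The input values, weights, bias and the choice of ReLU as activation function are assumptions picked for the example.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU: negative sums become 0

# 1.0 * 0.5 + 2.0 * (-0.25) + 0.1 = 0.1
output = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
```

Training a model means adjusting exactly these numbers until the outputs match the training data well enough.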

Parameter Count :

The total number of parameters (weights and biases) in the model. Often used as a rough measure of a model’s size and capacity.
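
For a plain fully connected network the count is simple arithmetic: every connection between two layers has one weight, and every neuron has one bias. The layer sizes below (784 inputs, 128 hidden neurons, 10 outputs) are just an example.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection
        total += outputs           # one bias per neuron in the layer
    return total

# (784 * 128 + 128) + (128 * 10 + 10) = 100480 + 1290
print(count_parameters([784, 128, 10]))  # prints 101770
```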

AI Evals :

AI Evaluations are the systematic evaluation of artificial intelligence systems, where models are tested against benchmarks, datasets, or real-world scenarios to assess their performance, safety, and alignment with intended goals. These evaluations help developers identify strengths, weaknesses, and potential risks before deploying AI in practical applications.
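
At its simplest, an eval is a loop over a benchmark that compares model answers with expected answers. The sketch below assumes exact-match scoring and uses a hypothetical `toy_model` function in place of a real model call; real evals use far larger datasets and often more forgiving scoring.

```python
def exact_match_accuracy(model, benchmark):
    """Fraction of benchmark prompts the model answers exactly right."""
    correct = 0
    for prompt, expected in benchmark:
        answer = model(prompt)
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    return correct / len(benchmark)

def toy_model(prompt):
    """Hypothetical stand-in for a real model call."""
    canned = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "I don't know")

benchmark = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "paris"),
    ("Capital of Sweden?", "Stockholm"),
]
score = exact_match_accuracy(toy_model, benchmark)  # 2 of 3 correct
```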

Using AI

Inference :

Model execution: interpreting input, making predictions, and generating output.

Inference engine :

The software component that processes input data through a trained model to produce predictions or outputs. It’s the "brain" that executes the model’s logic.

Inference runtime :

The environment or system that executes the inference engine. The platform that runs the model, managing resources and interactions.

Hallucinations :

When a model's predictions derail and produce false or fabricated output, often stated with confidence.

Grounding :

Enriching the prompt input with additional data to increase the chances of relevant output. Reduces the risk of hallucination. Without grounding, a fact-seeking prompt would produce output based only on what the model learned during training.

RAG - Retrieval Augmented Generation :

A common method for grounding. Relevant information is retrieved at query time and added to the prompt input. Common sources are web search APIs, files, documents, and vector databases.
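
The retrieve-then-prompt flow can be sketched without any external service. This toy version assumes word-count vectors and cosine similarity as a stand-in for the embeddings a vector database would use; the documents, function names and prompt template are made up for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Very rough 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        documents, key=lambda d: cosine(query, bag_of_words(d)), reverse=True
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Ground the question by putting retrieved text in front of it."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "MCP is an open standard introduced by Anthropic.",
    "GGUF is a file format for sharing models.",
    "ONNX allows vendor-neutral representation of models.",
]
prompt = build_prompt("Who introduced MCP?", documents)
```

The resulting prompt, not the bare question, is what gets sent to the model.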

CAG - Cache-Augmented Generation :

Preloads datasets for enriching the prompt instead of retrieving them per query, which results in faster response times. Suitable for querying FAQs and similar.

KAG - Knowledge Augmented Generation :

Enriches prompts with structured knowledge graphs (KGs) for more precise reasoning. Anchors domain-specific facts in model reasoning.
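
A knowledge graph can be thought of as a set of (subject, relation, object) triples. The sketch below assumes a tiny hand-written graph and naive string matching to find relevant facts; the triple format, names and prompt template are illustrative assumptions, not a real KAG framework.

```python
# Hypothetical knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("PyTorch", "developed_by", "Meta AI"),
    ("TensorFlow", "developed_by", "Google"),
    ("Safetensors", "developed_by", "Hugging Face"),
]

def enrich(question):
    """Prepend every fact whose subject or object appears in the question."""
    lowered = question.lower()
    facts = [
        f"{subject} {relation.replace('_', ' ')} {obj}."
        for subject, relation, obj in TRIPLES
        if subject.lower() in lowered or obj.lower() in lowered
    ]
    return "Known facts:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"

kag_prompt = enrich("Who created TensorFlow?")
```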

MCP - Model Context Protocol :

Open standard introduced by Anthropic to enable seamless integration between large language models (LLMs) and external tools, systems, and data sources. It provides a universal interface for tasks like reading files, executing functions, and sharing contextual prompts, using JSON-RPC 2.0 for communication. By standardizing these connections, MCP reduces the need for custom connectors and allows AI applications to interoperate more easily across diverse platforms.
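
Since MCP messages are JSON-RPC 2.0, a request is just a small JSON object. The sketch below builds two such requests. `tools/list` and `tools/call` are method names from the MCP specification as I understand it, while the `read_file` tool, its `path` argument and the helper function are made up for this example.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id to match responses

def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as a JSON string."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# Ask the server which tools it offers, then call one of them.
list_tools = make_request("tools/list")
call_tool = make_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "notes.txt"}},  # hypothetical tool
)
print(call_tool)
```

Because every server answers the same kind of request, an application only needs one client implementation instead of one connector per tool.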

MoE - Mixture of Experts :

Arguably this belongs under Creating AI Models, but it also speeds up inference. An MoE model is split into many expert sub-networks, and a router activates only a few of them for each token. This enables the training of much larger models (with trillions of parameters) that would otherwise be infeasible, while keeping inference fast and resource-efficient. This balance is crucial for advancing the capabilities of LLMs without proportional increases in cost and energy needs.
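
The reason MoE keeps inference cheap is that only a few experts run for each input. Here is a minimal sketch of that routing idea, assuming top-2 gating with a softmax over the selected scores. The four toy experts and the router weights are made up; in a real model both are learned, and the experts are feed-forward sub-networks.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    """Route input x to the top_k experts and mix their outputs."""
    # Router: one score per expert (dot product with the input).
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, index in zip(gates, chosen):
        expert_out = experts[index](x)  # only the chosen experts are executed
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen

# Four toy "experts" (assumption: simple functions instead of sub-networks).
experts = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [-a for a in v],
    lambda v: [a * a for a in v],
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
output, chosen = moe_forward([3.0, 1.0], experts, router_weights)
```

With this input the router scores are 3, 1, -3 and 2, so experts 0 and 3 run and the other two are skipped entirely. The model holds all four experts' parameters but pays the compute cost of only two.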

Agentic AI :

Artificial intelligence systems that act autonomously, pursuing specific goals with minimal human oversight. Unlike traditional generative AI, which mainly produces content, agentic AI integrates decision-making, tool usage, and adaptive problem-solving to complete tasks end-to-end. These systems demonstrate agency, that is, the capacity to operate independently, coordinate multiple subtasks, and interact dynamically with their environment.
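
The core of an agentic system is a loop: decide on the next step, act with a tool, observe the result, and repeat until the goal is reached. The sketch below assumes a scripted policy in place of an LLM and a single made-up calculator tool, so every name in it is hypothetical.

```python
def calculator(expression):
    """Hypothetical tool: add two integers written as 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS = {"calculator": calculator}

def scripted_policy(goal, history):
    """Stand-in for an LLM deciding the next step from goal and history."""
    if not history:
        return {"action": "calculator", "input": "19 + 23"}
    last_observation = history[-1][1]
    return {"action": "finish", "input": f"The answer is {last_observation}"}

def run_agent(goal, policy, max_steps=5):
    """Loop: decide, act with a tool, observe, repeat until finished."""
    history = []
    for _ in range(max_steps):
        step = policy(goal, history)
        if step["action"] == "finish":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])
        history.append((step, observation))
    return "Stopped: step limit reached"

result = run_agent("What is 19 + 23?", scripted_policy)
print(result)  # prints "The answer is 42"
```

The step limit is the minimal form of human oversight here: it keeps a confused policy from looping forever.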