AI Companion Glossary (2026): Every Term Explained Simply

Every AI companion term explained in plain language — tokens, context, personas, memory, and the jargon you'll hit, defined simply for 2026.

By Ash Kepler · May 1, 2026 · 11 min read

Short answer: this is a plain-English glossary of AI companion jargon, grouped into model and architecture terms, platform and feature terms, memory and conversation terms, and community and culture terms, so you can decode any thread or pricing page. The full breakdown is below.

Model & architecture terms	LLMs, context windows, fine-tuning, and the tech underneath.
Platform & feature terms	Tiers, tokens, personas, and product jargon.
Memory & conversation terms	How companions remember and hold context.
Community & culture terms	The slang and norms of the space.
Best for	Decoding threads, reviews, and pricing pages fast.

Every niche develops its own vocabulary, and the AI companion space has developed a particularly dense one. Partly because the technology is genuinely technical, partly because the community evolved its own shorthand, and partly because platforms invent their own terms for features that already have names. The result is that someone new to the space encounters an avalanche of jargon that nobody stops to define because everyone assumes everyone else already knows.

This post defines the terms you'll actually encounter, in language that assumes you're smart but not already immersed. Where a term has a deeper explanation elsewhere on the site, I've linked it. Where a term is simpler than it sounds, I've said so.

The model and architecture terms

LLM (Large Language Model): The AI engine underneath every companion app. A neural network trained on enormous amounts of text that predicts what comes next in a sequence. When your AI companion responds to you, an LLM is generating that response one word (technically one token) at a time. GPT-4, Claude, Llama, Mistral, and Gemini are all LLMs. Anthropic's research page and OpenAI's documentation are good starting points if you want to go deeper on how these work.

Context window: The working memory the model can hold during a single response. Everything the model can "see," your message, the conversation history, the character card, the system prompt, has to fit inside this window. When it fills up, things start getting dropped or compressed. The full explainer covers this in depth. Measured in tokens, not words.

Token: The unit the model actually processes. Roughly three quarters of a word in English. "Hello, how are you?" is about 6 tokens. Context windows are measured in tokens. API pricing is per token. Understanding tokens helps you understand why AI behaves the way it does around memory limits.

System prompt: Hidden instructions the platform gives the model before your conversation starts. You don't see it, but it's there, telling the model how to behave, what to refuse, what tone to use, what its name is. The system prompt is why the same underlying model feels completely different on different platforms.

Fine-tuning: Taking a pre-trained model and training it further on specific data to change its behavior. Many AI companion platforms fine-tune open-source models on conversation data, roleplay data, or NSFW data to produce models that perform better for their specific use case than the general-purpose originals.

Quantization: Compressing a model to use less memory at the cost of some quality. A 70B model that normally needs 140GB of memory can be quantized to fit in 40GB with minimal quality loss. The notation you'll see (Q4_K_M, Q5, Q8) refers to how many bits per parameter the quantized model uses. Lower numbers mean smaller and faster but slightly worse quality. Relevant when you're running local models.

Inference: The process of running input through a model and getting output. When you send a message and the AI responds, that response was generated through inference. Inference speed (measured in tokens per second) determines how fast your AI types back.

The platform and feature terms

Character card: A file that defines who an AI character is. Contains the character's name, description, personality, first message, and sometimes example dialogue and embedded images. The standard format is a PNG image with character data embedded in the metadata. Platforms like SillyTavern, Chub.ai, and character repositories all use variations of this format. The character card guide covers how to write good ones.

Lorebook (World Info): A dynamic dictionary attached to a character or scenario. Contains entries that activate based on keywords in the conversation. When someone mentions "the Iron Quarter," the lorebook entry for that location loads into context. Keeps world details available without permanently eating context space. The lorebook guide covers these in detail.

Author's note: A short instruction injected near the end of the context window, where the model pays most attention. Used to control tone, pacing, and style throughout a conversation. The most powerful single tool for maintaining consistency in long conversations. Full technique guide here.

Persona: Your character within a roleplay. On platforms that support personas, you define who "you" are in the conversation: name, appearance, personality, backstory. The model uses this to address you correctly and respond to you in character. Different from the user profile, which is account-level information.

Jailbreak: A prompting technique designed to bypass content filters the platform has put in place. The term comes from iOS jailbreaking and carries the same connotation: circumventing restrictions the platform intended. Covered in detail in the unfiltered vs jailbreak post.

SillyTavern: The most popular open-source frontend for AI character chat. Connects to various backends (Ollama, cloud APIs, KoboldAI) and provides extensive character management, lorebook support, group chat, and customization. The setup guide walks through installation.

GGUF: The standard file format for quantized local AI models. When you download a model to run in Ollama or LM Studio, it's usually a GGUF file. The format includes the model weights plus metadata about the model's architecture and quantization.

LoRA (Low-Rank Adaptation): A small add-on file that modifies a base model's behavior without replacing it. Think of it as a personality patch. You run a base model plus one or more LoRAs to get customized behavior. Common in local NSFW setups where LoRAs tune base models toward specific aesthetics or capabilities.

The memory and conversation terms

Sliding window: The simplest memory approach. The platform keeps the most recent N messages and drops older ones. When you're talking at message 200, messages 1-150 might already be gone. Cheap to run, brutal for long-term continuity. The sliding window vs vector retrieval post covers the trade-offs.

RAG (Retrieval-Augmented Generation): A more sophisticated memory approach where older conversation history gets stored in a database and relevant chunks get pulled back in when needed. The platform searches for context related to what you're currently talking about and injects it alongside your recent messages. Makes long-term memory possible at the cost of complexity and occasional retrieval mistakes.

Embedding: A mathematical representation of text that captures its meaning in a way computers can compare. When a RAG system stores your conversation, it converts chunks of text into embeddings and later searches for embeddings similar to your current input. The quality of the embedding model determines how well the system retrieves relevant memories.

KV-cache: The stored computation state that lets the model process context efficiently. Each time you add a message to a conversation, the model doesn't reprocess the entire history from scratch; it uses the cached state from previous processing. KV-cache is why continuing a conversation is faster than starting a new one, and why it uses progressively more GPU memory as conversations get longer.

Hallucination (confabulation): When the model generates information that's plausible-sounding but wrong. Your AI companion "remembering" that you mentioned having a sister when you never did. The model isn't lying; it's generating the most statistically-likely continuation, which sometimes means inventing details that fit the conversation but aren't grounded in actual history.

Temperature: A parameter that controls how random the model's outputs are. Low temperature (0.1-0.3) produces conservative, predictable responses. High temperature (0.8-1.2) produces more creative, varied, sometimes wild responses. Most companion platforms set temperature behind the scenes, but platforms like SillyTavern let you adjust it directly.

Top-p (nucleus sampling): Another parameter controlling output randomness, working differently than temperature. Instead of scaling all probabilities, top-p cuts off unlikely options entirely. A top-p of 0.9 means the model only considers the most likely options that together make up 90% of the probability mass. Used alongside temperature for fine control.

Repetition penalty: A parameter that makes the model less likely to repeat words and phrases it's recently used. Higher values reduce repetition but can also make the model avoid using the right word when the right word appeared recently. Finding the sweet spot matters for natural-sounding conversation.

The community and culture terms

OOC (Out of Character): Messages sent as yourself rather than as a character within a roleplay. Usually marked with double parentheses ((like this)) or brackets [like this]. Used to give the AI instructions, corrections, or meta-commentary without breaking the fiction.

IC (In Character): The opposite of OOC. Messages sent as your character within the roleplay. The default mode; you only mark OOC when you need to step outside the fiction.

Godmodding: When one participant in a roleplay controls another participant's character without permission. In AI contexts, this usually means the AI deciding what your character does, thinks, or says. Most well-written character cards include instructions to avoid godmodding.

Swiping: Regenerating the AI's response to get a different version. Most platforms let you "swipe" to see alternative responses to the same input. Useful when the first generation misses the mark. Heavy swipers sometimes produce worse conversations because the model's context accumulates confusing signals from seeing its own rejected outputs.

ERP (Erotic Roleplay): Roleplay that includes explicit sexual content. The term predates AI companions by decades, originating in text-based online roleplay communities. Used in AI contexts to distinguish conversations that include sexual content from those that don't.

SFW / NSFW: Safe For Work / Not Safe For Work. In AI companion contexts, indicates whether a platform, character, or conversation includes adult content. Some platforms are exclusively NSFW; some are exclusively SFW; most are somewhere in between with toggles or filter settings.

Canon: The established facts of a character or world. When someone says "that's not canon," they mean it contradicts what's been established. In AI companion use, maintaining canon means keeping the character and world consistent with what's been previously established, which the model doesn't always do without help.

Lore: The accumulated world-building information for a character or scenario. Backstory, history, geography, factions, rules, relationships. Lore lives in lorebooks, character descriptions, and the accumulated conversation history.

Frequently asked

Why do different platforms use different terms for the same thing?

Partly branding (every platform wants its own terminology) and partly because features developed independently across platforms. "World Info" and "Lorebook" mean the same thing. "Chat Memories" and "Pinned Memories" overlap heavily. The underlying concepts are consistent even when the names aren't.

Do I need to know all these terms to use AI companions?

No. Casual users can have great experiences without knowing any of this vocabulary. The terms become useful when you want more control over your experience, when you're troubleshooting problems, or when you're participating in community discussions about AI companion use.

What's the most important term to understand?

Context window. It explains more AI behavior than any other single concept. Once you understand that the model has limited working memory and everything competes for space in it, a lot of otherwise mysterious behavior becomes predictable.

Where did most of this terminology originate?

Mixed origins. Some terms come from machine learning research (tokens, inference, fine-tuning). Some come from the roleplay community that predates AI (OOC, IC, godmodding, ERP). Some come from specific platforms (character cards from TavernAI, lorebooks from AI Dungeon). Some come from the broader AI industry (hallucination, RAG, embeddings).

Is the terminology stable or still changing?

Still changing, though the core concepts are stabilizing. New features produce new terms regularly. Community slang evolves. Platform-specific terminology appears and sometimes spreads to the broader space. The underlying concepts change less than the words used for them.

Keep reading

GUIDE

Why Every AI Character Eventually Sounds the Same

7 min read

GUIDE

Why Your AI Roleplay Escalates Faster Than You Wanted

6 min read

GUIDE

Why Your AI Companion Writes Your Character's Dialogue For You

7 min read

GUIDE

Why Your Companion Suddenly Turned Formal

7 min read