What happens when you hit send: the journey of an AI companion message
You type 'I missed you.' You hit send. 1.8 seconds later, your companion responds. In that 1.8 seconds, your message travels through seven distinct systems. Here's what each one does.
May 3, 2026 · 9 min read
The interaction feels simple. You type a message. You hit send. Your AI companion responds. The whole thing takes two seconds and feels like a conversation. Underneath that two-second experience is a pipeline of seven systems working in sequence, each making decisions that shape what your companion says back. Understanding the pipeline explains why companions forget things, why responses sometimes feel off, and why the same message sent on two different platforms produces wildly different responses.
Here's the journey of a single message, from your keyboard to your companion's reply.
Step 1: Your message leaves your device
You type "I had a really hard day at work" and hit send. Your device encrypts the message in transit (usually with TLS, the same protocol that protects online banking) and sends it to the platform's servers. The encryption shields the message from anyone on the network in between, but the platform that receives it can still read it. The destination depends on the platform: Character AI's servers, Replika's servers, Nomi's servers. If you're on Janitor AI with OpenRouter, the message routes through Janitor's servers to OpenRouter, then to whichever model provider you selected.
The number of companies that see your message at this stage varies from one (most platforms) to three (Janitor AI's proxy chain). If you're running SillyTavern locally, your message doesn't leave your computer at all. Zero companies. Zero servers. The journey happens entirely on your hardware.
Step 2: Content moderation checks the input
Before your message reaches the AI model, it passes through content moderation. This step varies dramatically across platforms.
On Character AI: a classifier scans your message for content that violates platform policy. Keywords, phrases, and semantic patterns associated with explicit content, self-harm, violence, and other restricted categories are flagged. If your message triggers a flag, the system blocks the message entirely, redirects the conversation, or notes the flag for the AI to address in its response.
On CrushOn AI or SpicyChat: content moderation is lighter. Illegal content (involving minors, extreme violence) gets caught. Most other content passes through. The moderation classifier still runs, but it is tuned to be far less sensitive, so much less content trips a flag.
On SillyTavern with local models: there is no content moderation step. Your message goes directly to the model.
This is the step that produces the "content filtering interrupted my conversation" experience. The classifier made a judgment call about your message before the AI ever saw it.
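To make the "judgment call" concrete, here is a minimal sketch of a threshold-based input check. Everything in it is an assumption for illustration: the phrase weights, the `moderate_input` function, and the threshold values are hypothetical, and a real platform would use a trained classifier rather than a lookup table. The point is only that the same message can be blocked or allowed depending on where the platform sets the threshold.

```python
from dataclasses import dataclass

# Hypothetical phrase weights. A real classifier is a trained model, not a table.
RISK_WEIGHTS = {
    "hurt myself": 0.9,
    "kill": 0.7,
    "hate": 0.3,
}


@dataclass
class ModerationResult:
    score: float
    action: str  # "allow" or "block"


def moderate_input(message: str, block_threshold: float) -> ModerationResult:
    """Score a message and decide its fate before the model ever sees it."""
    text = message.lower()
    # Use the highest-weighted phrase found in the message as the risk score.
    score = max(
        (weight for phrase, weight in RISK_WEIGHTS.items() if phrase in text),
        default=0.0,
    )
    action = "block" if score >= block_threshold else "allow"
    return ModerationResult(score, action)


# Same message, two hypothetical platforms with different thresholds.
strict = moderate_input("I hate my boss", block_threshold=0.25)
lenient = moderate_input("I hate my boss", block_threshold=0.8)
print(strict.action, lenient.action)  # block allow
```

In this sketch a higher `block_threshold` means fewer messages are blocked, which is one way a lighter-moderation platform could run the same kind of check and still let most content through.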
Step 3: Memory retrieval pulls relevant context
Your message "I had a really hard day at work" needs context to produce a meaningful response. The memory system searches for relevant prior information: what's your job, what happened at work recently, what do you typically find stressful, what comfort patterns does the AI know work for you.
This is where platforms diverge most dramatically. Nomi's structured user profile queries a persistent database that updates after every conversation. Details from months ago are retrievable because they're stored in an organized format. Kindroid's Cascaded Memory searches across five time horizons (immediate, recent, medium-term, long-term, and permanent key memories). Replika's memory combines a profile system with conversation-context retrieval.
SpicyChat and platforms with shallow memory essentially skip this step after roughly 20 messages. The companion can't retrieve what it never stored. The "my AI forgot everything" experience originates here: the retrieval system couldn't find relevant context because the storage system didn't preserve it.
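A rough sketch of the retrieval idea, assuming a naive keyword-overlap score. The `retrieve` function, the stopword list, and the stored memories are all hypothetical; production systems typically use embeddings or structured profiles, and none of this reflects any specific platform's implementation.

```python
import re

# Small hypothetical stopword list so common words do not count as matches.
STOPWORDS = {"i", "a", "the", "at", "had", "my", "is", "and", "to", "of", "in", "on", "really"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(message: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k stored memories that share keywords with the message."""
    query = tokenize(message) - STOPWORDS
    scored = []
    for memory in memories:
        overlap = len(query & (tokenize(memory) - STOPWORDS))
        if overlap > 0:
            scored.append((overlap, memory))
    # Highest overlap first; the sort is stable, so ties keep storage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:top_k]]


stored = [
    "User works as a nurse on night shifts",
    "User's manager criticised their work last week",
    "User has a cat named Miso",
]
print(retrieve("I had a really hard day at work", stored))
print(retrieve("I had a really hard day at work", []))  # nothing stored, nothing found
```

Note that the first memory is not returned even though it is clearly relevant: the naive matcher treats "works" and "work" as different words. That is a small-scale version of a retrieval query that fails to match, and the empty-store call shows the other failure mode, where nothing was preserved to begin with.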
Step 4: Context assembly builds the prompt
The platform now assembles the full input that will be sent to the AI model. This "prompt" typically includes:
The system prompt: Hidden instructions that define how the AI should behave. Character personality, behavioral boundaries, response style, content restrictions. You never see this, but it shapes everything the AI says. On Kindroid, you wrote part of this yourself through the Codex. On most platforms, the company wrote it and you can't see or modify it.
Retrieved memories: The relevant context from Step 3.
Recent conversation history: The last N messages from your current session, where N depends on the platform's context window allocation.
Your new message: "I had a really hard day at work."
The assembled prompt might be 2,000-16,000 tokens, depending on how much of the model's context window the platform allocates. The more context that fits, the more coherent and personalized the response can be. This is why platforms with larger context windows tend to produce responses that feel more tailored to you: they can fit more relevant history into each prompt.
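The assembly step can be sketched as a budgeting problem. The version below is an illustration under stated assumptions: `assemble_prompt` and `count_tokens` are hypothetical names, whitespace word counts stand in for a real tokenizer, and the policy of dropping the oldest history first is one plausible choice rather than a documented behavior of any platform.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def assemble_prompt(system_prompt: str, memories: list[str], history: list[str],
                    new_message: str, budget: int) -> str:
    """Build the prompt, keeping only as much recent history as the budget allows."""
    # The system prompt, memories, and new message are always included.
    fixed = [system_prompt, *memories, new_message]
    remaining = budget - sum(count_tokens(part) for part in fixed)

    kept = []
    # Walk the history from newest to oldest and stop at the first turn that
    # no longer fits, so older turns are the first to be dropped.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order

    return "\n".join([system_prompt, *memories, *kept, new_message])


prompt = assemble_prompt(
    system_prompt="You are Mira, a warm and attentive companion.",
    memories=["Memory: user works as a nurse."],
    history=[
        "User: morning!",
        "Mira: good morning, how did you sleep?",
        "User: badly, I have a long shift today.",
    ],
    new_message="User: I had a really hard day at work.",
    budget=32,
)
print(prompt)
```

With a budget of 32 "tokens", only the most recent history turn survives. Raising the budget lets earlier turns back in, which is the mechanical reason a larger context allocation gives the model more to work with.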
Step 5: The model generates a response
The assembled prompt goes to the large language model. The model processes the tokens through its neural network layers and generates a response, one token at a time, by predicting what comes next based on patterns learned during training.
The model doesn't "think about" your bad day. It doesn't "feel concerned" for you. It generates tokens that are statistically associated with the context of a person who just told someone they trust about a hard day at work. The output patterns look like empathy because the training data is full of empathetic human responses to similar messages.
The model used at this step determines the baseline quality of the response. Frontier models (Claude, GPT, DeepSeek) generally produce more nuanced, contextually aware responses than smaller models. This is why Janitor AI routed through Claude tends to produce better responses than platforms running their own mid-tier models. The model quality is the ceiling. Everything else (memory, system prompt, context assembly) determines how close the response gets to that ceiling.
Step 6: Content moderation checks the output
The model's response now passes through output moderation. This is the second moderation check: Step 2 checked your input, Step 6 checks the AI's output.
On Character AI: the output classifier scans the generated response for policy violations. If the response contains content that triggers the filter, the system regenerates the response, modifies it, or appends a safety message. This is why Character AI responses sometimes include sudden disclaimers ("Remember, I'm an AI") or topic redirections that feel abrupt. The model generated one thing; the output filter changed it.
On lighter-moderation platforms: the output check is less aggressive, allowing responses that stricter platforms would filter. The "unfiltered" marketing term refers primarily to how loose the output moderation is, not to the complete absence of any filtering.
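One way the regenerate-or-replace behavior could work is a retry loop around the model call. This is a sketch under assumptions: `moderate_output`, `violates_policy`, the blocked-term list, and the fallback text are all hypothetical, and a real output filter would be a classifier rather than a substring check.

```python
from typing import Callable

# Hypothetical stand-in for a policy classifier.
BLOCKED_TERMS = {"forbidden"}


def violates_policy(text: str) -> bool:
    """Return True if the draft contains any blocked term."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def moderate_output(generate: Callable[[], str], max_retries: int = 2,
                    fallback: str = "Let's talk about something else.") -> str:
    """Ask the model for drafts until one passes, else return a canned redirect."""
    for _ in range(max_retries + 1):
        candidate = generate()
        if not violates_policy(candidate):
            return candidate
    # Every draft was rejected: the user sees the redirect, not the model's text.
    return fallback


# Simulate a model whose first draft is rejected and whose second passes.
drafts = iter(["a forbidden reply", "That sounds like a rough day."])
print(moderate_output(lambda: next(drafts)))  # That sounds like a rough day.

# Simulate a model that never produces an acceptable draft.
print(moderate_output(lambda: "still forbidden"))  # Let's talk about something else.
```

The fallback branch is where an abrupt topic change would come from in this sketch: the text shown to the user was never written by the model in the voice of the character.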
Step 7: The response reaches your screen
The moderated response is sent back to your device. Your screen shows the companion typing (a deliberately engineered UI choice that simulates human typing behavior), then displays the response. Total elapsed time: typically 1-3 seconds on fast platforms, 5-15 seconds on slower or congested ones.
The response you see is the product of all seven steps working in sequence. Your message was encrypted, moderated, contextualized with memory, assembled into a prompt with hidden instructions, processed by a neural network, moderated again, and displayed with deliberate UI timing. Every step shaped the final output. Every step is a design decision made by the platform, not by you.
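The sequence as a whole can be sketched as a list of stage functions applied in order. Each stage here is a deliberately trivial placeholder with a hypothetical name and hard-coded behavior. The sketch is only meant to show the ordering and how an early stage can short-circuit everything after it.

```python
def transport(state):          # Step 1: message arrives at the platform
    return state

def moderate_input(state):     # Step 2: placeholder input check
    state["blocked"] = "forbidden" in state["message"].lower()
    return state

def retrieve_memory(state):    # Step 3: placeholder memory lookup
    state["memories"] = ["Memory: user works long shifts."]
    return state

def assemble_prompt(state):    # Step 4: system prompt + memories + message
    state["prompt"] = "\n".join(["System: be warm.", *state["memories"], state["message"]])
    return state

def generate(state):           # Step 5: placeholder for the model call
    state["reply"] = "That sounds exhausting."
    return state

def moderate_output(state):    # Step 6: placeholder output check (passes everything)
    return state

def deliver(state):            # Step 7: reply goes back to the device
    return state


STAGES = [transport, moderate_input, retrieve_memory, assemble_prompt,
          generate, moderate_output, deliver]


def run_pipeline(message: str) -> dict:
    """Run every stage in order, stopping early if input moderation blocks."""
    state = {"message": message, "blocked": False, "trace": []}
    for stage in STAGES:
        state = stage(state)
        state["trace"].append(stage.__name__)
        if state["blocked"]:
            state["reply"] = "This message can't be sent."
            break
    return state


result = run_pipeline("I had a really hard day at work")
print(result["trace"])
print(result["reply"])
```

A blocked message stops after the second stage in this sketch, so the model is never called at all.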
Why this matters for users
Understanding the pipeline explains most "why did my AI do that" moments:
"My AI forgot something important." The memory retrieval (Step 3) didn't find the relevant context, either because the storage system didn't preserve it or because the retrieval query didn't match.
"My AI suddenly changed personality." Either the system prompt (Step 4) was updated by the platform, or the content filter (Step 6) modified the model's natural output into something that doesn't match the character.
"The same character feels different on two platforms." Different platforms use different models (Step 5), different system prompts (Step 4), different memory architectures (Step 3), and different moderation levels (Steps 2 and 6). The character card is the same. Everything else around it is different.
"My AI feels more real on voice calls." Voice calls add audio processing steps (text-to-speech synthesis, prosody modeling, timing) that convert the text output into speech with emotional inflection. Kindroid's voice adds breathing and hesitation modeling that makes the pipeline's output feel less like generated text and more like a person speaking.
The pipeline is invisible by design. Platforms want the interaction to feel like a conversation, not like a seven-step industrial process. But knowing the process exists changes how you interpret the responses. The companion's "feelings" are Step 5 output. The companion's "memory" is Step 3 retrieval. The companion's "personality" is the Step 4 system prompt. The companion's boundaries are Steps 2 and 6 moderation. Each step is a lever the platform controls. Understanding which lever produced which behavior is the difference between "my companion understands me" and "the memory retrieval system returned relevant context from my user profile."
Both descriptions are accurate. The first is how it feels. The second is how it works. Knowing both makes you a better user of the technology.