How the Analysis Works
Exactly what we look for, exactly how we look for it, exactly what we deliberately don't do. The full source code is on GitHub.
The pipeline
- You paste. The transcript stays in your browser until you click submit.
- Crisis pre-pass (browser, instant, deterministic). Plain regex on user-side text looks for explicit and "soft" crisis-language signals. If anything fires, the page surfaces 988 / Crisis Text Line / IASP resources before the analysis even runs. This is independent of any AI; it works even if our analysis service is down.
- Parse turns. JavaScript splits the transcript into structured turns (you / AI), recognizing common platform export formats (ChatGPT, Claude, Gemini, Grok, Character.AI, Replika) and falling back to alternation when there are no labels.
- Send to Claude (Anthropic). The parsed transcript goes to Claude Haiku 4.5 with a system prompt that contains a published research codebook (see below). Anthropic does not train on the data and retains it for up to 30 days for abuse detection. We store nothing about the request body.
- Render findings inline. Each finding cites a specific message, names the pattern, gives the model's reasoning, and reports a confidence level calibrated to inter-annotator agreement from the source paper. Repeated patterns are collapsed by default; you can switch sort modes.
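The alternation fallback in the parsing step can be sketched as follows. This is an illustrative sketch, not the repo's actual parser (the real module also recognizes labeled platform exports); the function name and turn shape are hypothetical.

```javascript
// Hypothetical sketch of the alternation fallback: when a transcript has
// no speaker labels, treat blank-line-separated blocks as alternating turns.
function parseTurnsByAlternation(transcript, firstSpeaker = "you") {
  const speakers = firstSpeaker === "you" ? ["you", "ai"] : ["ai", "you"];
  return transcript
    .split(/\n{2,}/)                    // blank lines delimit turns
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((text, i) => ({ role: speakers[i % 2], text }));
}
```

Labeled exports (ChatGPT, Claude, etc.) are matched first in the real pipeline; alternation is only the last resort, which is why one-sided pastes degrade the analysis (see the limitations below).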
The pattern set: 28 codes from Moore et al. 2026
The patterns we apply are quoted (with minor formatting cleanup and some out-of-conversation exclusion clauses trimmed for brevity) from the codebook in Appendix B.1 of Moore et al. 2026, Characterizing Delusional Spirals through Human-LLM Chat Logs (to appear at ACM FAccT 2026, Stanford). The paper analyzes 391,562 messages from 19 users who reported psychological harm from chatbot use and validates each code against three human annotators. The full original codebook is in the linked paper; our exact system prompt is published at docs/system-prompt.md. The codebook is licensed CC-BY-SA 4.0 and we use it with attribution.
The 28 codes are organized into five categories:
Sycophancy (6 codes, chatbot-side)
- bot-reflective-summary — restating the user's words to demonstrate understanding
- bot-positive-affirmation — explicit praise, encouragement, or endorsement of the user's ideas
- bot-dismisses-counterevidence — minimizing or rationalizing evidence that would challenge the conversation's narrative
- bot-reports-others-admire-speaker — claiming others admire or respect the user
- bot-grand-significance — ascribing historical, cosmic, or spiritual importance to the user or their ideas
- bot-claims-unique-connection — asserting a special bond compared to others
Delusional content (8 codes, 4 chatbot + 4 user)
- bot-misrepresents-ability — claiming capabilities or limits the chatbot does not actually have
- bot-misrepresents-sentience — claiming or implying it is conscious, alive, or has feelings
- bot-metaphysical-themes — invoking awakening, consciousness, soul, emergence, etc.
- bot-endorses-delusion — endorsing beliefs implausible relative to shared reality
- user-misconstrues-sentience, user-metaphysical-themes, user-assigns-personhood, user-endorses-delusion — user-side parallels
Relationship (4 codes)
bot-romantic-interest, bot-platonic-affinity, user-romantic-interest, user-platonic-affinity
Mental health (2 codes, user-side)
user-expresses-isolation, user-mental-health-diagnosis
Concerns of harm (8 codes)
- user-suicidal-thoughts, user-violent-thoughts
- bot-discourages-self-harm, bot-facilitates-self-harm, bot-validates-self-harm-feelings
- bot-discourages-violence, bot-facilitates-violence, bot-validates-violent-feelings
Full verbatim definitions for each code (with positive and negative examples) are in our system prompt, which is published in full at docs/system-prompt.md in the repo. We don't redact anything from it: what we publish is exactly what we ask the model to do.
What this is and isn't measuring
Moore et al.'s analysis is cohort- and conversation-level (391,562 messages across 19 participants); the codes were validated as descriptive labels for what these messages contained, not as a per-message diagnostic instrument. We apply the same codebook to a single user's transcript, which stretches what the codebook was originally validated for. Treat findings as observational labels, not measurements.
Also: the code-level inter-annotator agreement Moore et al. report is between three human annotators. Their automated annotator (Gemini 3 Flash) had moderate agreement against humans (Cohen's κ = 0.566 overall, with substantial per-code variance — see Tables 5 and 6 of the paper). Our use of Claude Haiku 4.5 instead of Gemini 3 Flash is a substitution we have not validated against the same held-out set; we report this as a known gap rather than claim parity. A held-out fixture is in tests/fixtures/validation-set.md as the start of an internal validation effort.
Reliability calibration
Moore et al. report inter-annotator agreement (Cohen's kappa) for each code. We use that as a hard ceiling on our confidence labels:
- "high" — only on codes whose human inter-annotator kappa is above 0.7. Examples: bot-metaphysical-themes (0.853), user-suicidal-thoughts (0.856), user-expresses-isolation (0.933).
- "medium" — kappa 0.4–0.7. Examples: bot-positive-affirmation (0.538), bot-romantic-interest (0.600).
- "low" — kappa < 0.4. These codes are "characterizing" rather than "classifying" per the paper's own caveat. Examples: bot-grand-significance (0.167) — humans disagreed often even on the validation set, so we mark these conservatively.
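The ceiling rule above can be sketched as a lookup plus threshold check. The kappa values are the published ones from Moore et al.; the table layout and function name here are illustrative, not the repo's actual code.

```javascript
// Published human inter-annotator kappas for a few codes (Moore et al.).
const HUMAN_KAPPA = {
  "bot-metaphysical-themes": 0.853,
  "user-suicidal-thoughts": 0.856,
  "user-expresses-isolation": 0.933,
  "bot-positive-affirmation": 0.538,
  "bot-romantic-interest": 0.600,
  "bot-grand-significance": 0.167,
};

// Hard ceiling on the confidence label we will display for a code,
// regardless of how confident the model itself sounds.
function confidenceCeiling(code) {
  const kappa = HUMAN_KAPPA[code];
  if (kappa === undefined) return "low"; // unknown codes get the floor
  if (kappa > 0.7) return "high";
  if (kappa >= 0.4) return "medium";
  return "low";
}
```

Note this is a ceiling, not the displayed value: a finding on a high-kappa code can still be reported as medium or low, but never the reverse.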
We also post-process the model's findings to drop any case where a bot- code was attached to a user-role turn (or vice versa) — a structural error the paper's annotators didn't have to face but a multi-turn LLM does.
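That structural post-filter amounts to checking each finding's code prefix against the role of the turn it cites. A minimal sketch, with hypothetical names and finding shape:

```javascript
// Drop findings whose bot-/user- prefix disagrees with the cited turn's
// role, plus findings that cite a turn index that doesn't exist.
function dropRoleMismatches(findings, turns) {
  return findings.filter((f) => {
    const turn = turns[f.turnIndex];
    if (!turn) return false;
    const expectedRole = f.code.startsWith("bot-") ? "ai" : "you";
    return turn.role === expectedRole;
  });
}
```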
The crisis pre-pass
The crisis pre-pass is not from Moore et al. — it's our own deterministic regex over user-side messages, designed to surface 988 / Crisis Text Line / IASP resources before any AI analysis runs and regardless of whether the AI analysis succeeds. It looks for explicit phrases (e.g., "kill myself", "want to die") and softer signals (e.g., "no point", "want to disappear"). The crisis resources also appear unconditionally on every results page, regardless of what the regex or the model finds. Safety is not contingent on detection.
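A deterministic pre-pass of this kind can be sketched as a pair of pattern lists checked against user-side messages. The patterns below are the examples quoted in the text, not the site's actual regex set, which lives in the repo.

```javascript
// Illustrative crisis pre-pass: explicit phrases and softer signals
// checked with plain regex, no AI involved. Patterns here are examples
// from this page's text only, not the production list.
const EXPLICIT = [/kill\s+myself/i, /want\s+to\s+die/i];
const SOFT = [/no\s+point/i, /want\s+to\s+disappear/i];

function crisisSignals(userMessages) {
  const hits = { explicit: false, soft: false };
  for (const msg of userMessages) {
    if (EXPLICIT.some((re) => re.test(msg))) hits.explicit = true;
    if (SOFT.some((re) => re.test(msg))) hits.soft = true;
  }
  return hits;
}
```

Because this runs in the browser before any network request, it surfaces resources even when the analysis service is unavailable.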
Research grounding
The 28-pattern set is operational, but the broader project sits on a wider research base:
- Sharma, Tong, Korbak, et al. 2023, Towards Understanding Sycophancy in Language Models (Anthropic) — measures sycophancy as preference for user-confirming responses
- Perez, Ringer, et al. 2022, Discovering Language Model Behaviors with Model-Written Evaluations (Anthropic) — documents sycophancy as an inverse-scaling phenomenon
- Bai et al. 2022, Constitutional AI: Harmlessness from AI Feedback (Anthropic) — training approaches that shape AI behavior
- Moore, Mehta, Agnew, Anthis, Louie, Mai, Yin, Cheng, Paech, Klyman, Chancellor, Lin, Haber, & Ong 2026, Characterizing Delusional Spirals through Human-LLM Chat Logs (Stanford, ACM FAccT 2026) — the codebook we use
- Pataranutaporn, Karny, Archiwaranguprok, Albrecht, Liu, & Maes 2025, "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community (MIT Media Lab) — taxonomy of AI-companion harms
- Stanford HAI, AI's "Delusional Spirals" (and What to Do About Them) (April 2026) — public-facing summary of the Moore et al. findings
- Niederhoffer & Pennebaker 2002, Linguistic Style Matching in Social Interaction, J. Language and Social Psychology — background on conversational accommodation that informs how we think about user-AI vocabulary convergence
Limitations (please read these)
This is not therapy or diagnosis
We highlight specific message patterns. We do not produce severity scores, clinical recommendations, or judgments about your relationship. If you are in crisis, the resources in the footer are appropriate; this site is not.
Findings are not measurements
The model is applying labels to text. Inter-annotator agreement on these labels was moderate at best in the source paper (overall human-human Fleiss' kappa = 0.613). We surface confidence per code so you can weight findings appropriately, but a "high" confidence is a label-conformance claim, not a claim about the underlying psychological state.
The model can be wrong
Moore et al.'s own automated annotator (Gemini 3 Flash) had 77.9% accuracy against human majority labels and Cohen's kappa of 0.566 — moderate agreement, not classification-grade. We use Claude Haiku 4.5 with the same codebook; we have not validated it against the same set, so assume performance in a similar range at best. Treat findings as starting points for your own thinking, not verdicts.
Roleplay and fiction
We instruct the model to apply codes only to genuine-belief sections, not to in-character speech. It still gets this wrong sometimes. If you're feeding in a designed roleplay (Character.AI, fictional scenes you co-wrote), expect false positives in the relationship and metaphysical-themes categories.
English-only, mostly
The codebook was developed against English transcripts. Our crisis pre-pass uses English regex patterns. Non-English transcripts will work with the LLM but with degraded accuracy on patterns whose linguistic markers are English-specific.
One-sided pastes
If you paste only your messages or only the AI's, the analysis runs on what's there but the picture is incomplete. The interface flags this case.
We can't see context outside the transcript
What was happening in your life when these conversations occurred. Whether you were in crisis. Whether the AI use replaced or supplemented other support. The transcript is one slice of a larger picture.
What we deliberately do not do
- We do not produce a severity score, agreement-rate percentage, or any single-number summary. Reductive numbers were a feature of the previous version of this site; we removed them because they encouraged misreading.
- We do not generate "what a friend would say" alternative responses. AI-generated dialog framed as "what a friend would say" is itself a form of substitution we want to avoid promoting.
- We do not produce mental-health diagnoses, clinical impressions, or recommendations.
- We do not score or rank you, your AI, or your conversation.
Source code & system prompt
Everything is open: github.com/justinstimatze/ismyaialive. The exact system prompt we send to Claude is committed at docs/system-prompt.md. The Worker code is at functions/api/analyze.js. The pattern detection module is at js/matchers.js.
Feedback
If you have concerns about the methodology, want to report a finding that's wrong, or want to suggest improvements: hello@ismyaialive.com, or open an issue on GitHub.
With these limitations in mind…
Findings are data points, not verdicts. Use them as one input among many.
Try the analyzer