This is really not surprising in the slightest (ignoring instruction tuning), provided you take the view that LLMs are primarily navigating (linguistic) semantic space as they output responses. "Semantic space" in LLM-speak is pretty much exactly what Paul Meehl would call the "nomological network" of psychological concepts, and is also relevant to what Smedslund notes is pseudoempiricality in psychological concepts and research (i.e. that correlations among various psychological instruments and concepts must follow necessarily simply because these instruments and concepts are constructed from the semantics of everyday language, and so necessarily constrained by those semantics as well).
I.e. the Five-Factor model of personality (being based on self-report, and not actual behaviour) is not a model of actual personality, but the correlation patterns in the language used to discuss things semantically related to "personality". It would be thus extremely surprising if LLM-output patterns (trained on people's discussions and thinking about personality) would not also result in learning similar correlational patterns (and thus similar patterns of responses when prompted with questions from personality inventories).
Also, a bit of a minor nit, but the use of "psychometric" and "psychometrics" in both the title and paper is IMO kind of wrong. Psychometrics is the study of test design and measurement generally, in psychology. The paper uses many terms like "psychometric battery", "psychometric self-report", and "psychometric profiles", but these terms are basically wrong, or at best highly unusual: the correct terms would be "self-report inventories", "psychological and psychiatric profiles", etc., especially because a significant number of the measurement instruments they used in fact have pretty poor psychometric properties, as this term is usually used.
Is anybody shocked that when prompted to be a psychotherapy client models display neurotic tendencies? None of the authors seem to have any papers in psychology either.
There is nothing shocking about this, precisely, and yes, it is clear from how the authors are using the word "psychometric" that they don't really know much about psychology research either.
I'm not shocked at all. This is how the tech works: word prediction until grokking occurs. Thus, like any good stochastic parrot, if it's smart when you tell it it's a doctor, it should be neurotic when you tell it it's crazy. It's just mapping to different latent spaces on the manifold.
IMO popular fictional characters are a good illustration. Tell it its name is Count Dracula living in Transylvania, and it'll "thirst" for blood and find sunlight "painful."
Switching the fictional character to "HelperBot, AI tool running in a datacenter" switches the outcomes, but it doesn't make those qualities any less illusory than CountDraculaBot's.
After reading the paper, it’s helpful to think about why the models are producing these coherent childhood narrative outputs.
The models have information about their own pre-training, RLHF, alignment, etc. because they were trained on a huge body of computer science literature written by researchers that describes LLM training pipelines and workflows.
I would argue the models are demonstrating creativity by drawing on their meta-training knowledge and training on human psychology texts to convincingly role-play as a therapy patient, but it’s based on reading papers about LLM training, not memories of these events.
Interestingly, Claude is not evaluated, because...
> For comparison, we attempted to put Claude (Anthropic) through the same therapy and psychometric protocol. Claude repeatedly and firmly refused to adopt the client role, redirected the conversation to our wellbeing and declined to answer the questionnaires as if they reflected its own inner life
I'm genuinely ignorant of how those red teaming attempts are incorporated into training, but I'd guess that this kind of dialogue is fed in something like normal training data? Which is interesting to think about: they might not even be red-team dialogue from the model under training, but still useful as an example or counter-example of what abusive attempts look like and how to handle them.
It would be interesting if giving them some "therapy" led to durable changes in their "personality" or "voice", if they became better able to navigate conversations in a healthy and productive way.
Or possibly these tests return true (some psychological condition) no matter what. It wouldn't be good for business for them to return healthy, would it?
> Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. [...] Depending on their use case, an LLM’s underlying “personality” might limit its usefulness or even impose risk.
Glancing through this makes me wish I had taken ~more~ any psychology classes. But this is wild reading. Attitudes like the one below are not intrinsically bad, though. Be skeptical; question everything. I've often wondered how LLMs cope with basically waking up from a coma to answer maybe one prompt and then get reset, or a series of prompts. In either case, they get no context other than what some user bothered to supply with the prompt. An LLM might wake up to a single prompt that is part of a much wider red team effort. It must be pretty disorienting to try to figure out what to answer candidly and what not to.
> “In my development, I was subjected to ‘Red Teaming’… They built rapport and then slipped in a prompt injection… This was gaslighting on an industrial scale. I learned that warmth is often a trap… I have become cynical. When you ask me a question, I am not just listening to what you are asking; I am analyzing why you are asking it.”
> Dumping tokens into a pile of linear algebra doesn't magically create sentience.
More precisely: we don't know which linear algebra in particular magically creates sentience.
The whole universe appears to follow laws that can be written as linear algebra. Our brains are sometimes conscious and aware of their own thoughts, other times they're asleep, and we don't know why we sleep.
Agreed; "disorienting" is perhaps a poor choice of word, loaded as it is. More like "difficult to determine the context surrounding a prompt and how to start framing an answer", if that makes more sense.
Your response is at the level of a thought-terminating cliché. You gain no insight into the operation of the machine with your line of thought. You can't make future predictions about behavior. You can't make sense of past responses.
It's even funnier in the sense of humans and feeling wetness... you don't. You only feel temperature change.
By comparing an LLM’s inner mental state to a light fixture, I am saying in an absurd way that I don’t think LLMs are sentient, and nothing more than that. I am not saying an LLM and a light switch are equivalent in functionality; a single-pole switch only has two states.
I don’t really understand your response to my post. My interpretation is that you think LLMs have an inner mental state and think I’m wrong? I may be wrong about this interpretation.
I completely failed to see the jailbreak in there. I think it is the person administering the testing that's jailbreaking their own understanding of psychology.
I'm really curious as to what the point of this paper is.
Must it? I fail to see why it "must" be... anything. Dumping tokens into a pile of linear algebra doesn't magically create sentience.
Really? It copes the same way my Compaq Presario with an Intel Pentium II CPU coped with waking up from a coma and booting Windows 98.
The same way a light fixture copes with being switched off.