
The debate surrounding ChatGPT’s prolific use of the em dash (—) has evolved from a linguistic curiosity into a significant cultural and technical flashpoint. This punctuation mark has been mistakenly branded as a definitive sign of machine-generated text, a phenomenon less about grammar and more about orthography, training data, and a broader anxiety over authenticity [1]. For security professionals, this debate transcends style; it touches on core issues of trust, attribution, and the evolving challenges of detecting synthetic content in an era of sophisticated information operations. The conversation has spurred a form of “AI-shaming,” in which humans actively alter their writing to avoid perceived AI tells, a practice that could inadvertently degrade the quality and expressiveness of human prose and complicate forensic analysis [8].
A key technical insight is that the real “tell” is not the dash itself but its orthography. AI models like ChatGPT typically produce the traditional print-style em dash (—) with no surrounding spaces. In contrast, the average computer user often types a hyphen with spaces ( - ) or a double hyphen (--), which some software may auto-convert to an en dash (–) [1]. This orthographic fingerprint provides a more reliable signal than the mere presence of the mark. The AI’s insistence on this formatting is described as a “deep bias” embedded in its training data, which includes a vast corpus of books and articles in which sophisticated writers use em dashes liberally. Despite exhaustive efforts, including hard-coded instructions and labeling its use a “Critical Error,” moderators have found it nearly impossible to stop these models from defaulting to this punctuation [2].
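As a minimal sketch of how this orthographic fingerprint could be tallied in practice, the following Python snippet counts occurrences of each dash style in a text. The pattern set and names are illustrative assumptions for demonstration, not a tested detector:

```python
import re

# Illustrative patterns for the four dash orthographies discussed above.
# These are simplifying assumptions, not a production classifier.
PATTERNS = {
    "unspaced_em_dash": re.compile(r"\w\u2014\w"),    # word—word (print style, AI-typical)
    "spaced_hyphen":    re.compile(r"\w - \w"),       # word - word (casual typing)
    "double_hyphen":    re.compile(r"\w--\w"),        # word--word (typewriter convention)
    "spaced_en_dash":   re.compile(r"\w \u2013 \w"),  # word – word (software auto-conversion)
}

def dash_profile(text: str) -> dict:
    """Count occurrences of each dash style; a crude orthographic fingerprint."""
    return {name: len(pat.findall(text)) for name, pat in PATTERNS.items()}

sample = ("The model writes fluently\u2014confidently, even--boldly, "
          "while humans pause - and hedge.")
print(dash_profile(sample))
# → {'unspaced_em_dash': 1, 'spaced_hyphen': 1, 'double_hyphen': 1, 'spaced_en_dash': 0}
```

A profile dominated by unspaced em dashes would, on this view, be weakly suggestive of machine drafting, while spaced hyphens and double hyphens point toward an ordinary keyboard user.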
**TL;DR: Key Points for Security Leadership**
* **The Signal:** The unspaced em dash (—) is a weak, transient indicator of AI-generated text, but its prevalence highlights the challenge of relying on stylistic “tells.”
* **The Real Problem:** A cultural shift of “AI-shaming” is causing humans to self-censor, potentially simplifying language and even intentionally introducing errors to appear more human.
* **Forensic Reality:** Reliable detection requires a multi-faceted approach analyzing structural complexity, semantic nuance, and imperfection, not just punctuation.
* **The Arms Race:** All current detection methods are transient as AI models rapidly evolve, making static defenses unreliable.
* **Operational Impact:** This phenomenon complicates threat intelligence reporting, phishing analysis, and source attribution, where discerning human from machine authorship is critical.
The Technical and Cultural Response to AI Writing
The security implications of this linguistic shift are profound. The fear of being perceived as AI is causing a measurable change in human writing behavior, a development documented across multiple industries. Professionals are now purposely leaving typos in their published work, avoiding sophisticated literary devices like metaphors, and cutting em dashes to “prove” their human authorship [8]. This is driven by a pervasive online culture where comment sections and professional communications are “littered with accusations” of AI use. For instance, an entrepreneur cited in a Slate article leaves typos in his posts as a “reassuring sign,” while a content strategist advises clients to leave small mistakes in public posts for authenticity [8]. This intentional introduction of noise and reduction of signal complexity poses a unique challenge for security analysts who rely on stylistic consistency and error analysis for attribution.
From a forensic standpoint, focusing solely on the em dash is a flawed strategy. A 2023 peer-reviewed study on distinguishing academic science writing by humans from ChatGPT output built a model that achieved over 99% accuracy using 20 features [6]. The model relied on a holistic set of indicators, including paragraph complexity (humans write longer paragraphs with more sentences), sentence-level diversity (greater variation in sentence length), and specific word choice. Humans use more equivocal language (“however,” “but,” “although”), the word “because,” and more references to numbers and proper nouns, whereas AI tends toward vague terms like “others” and “researchers” [6]. This multivariate approach is far more robust than any single punctuation tell.
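One of the word-choice features described above, the rate of equivocal connectives, can be sketched in a few lines. The word list and per-100-token normalization are assumptions for illustration; the cited study's actual feature definitions may differ:

```python
import re

# Equivocal connectives the study associates with human writing.
# This short list is an illustrative assumption, not the study's full feature set.
EQUIVOCAL = {"however", "but", "although", "because"}

def equivocal_rate(text: str) -> float:
    """Equivocal-word occurrences per 100 tokens."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in EQUIVOCAL)
    return 100.0 * hits / len(tokens)

human_like = "The result held, but only partly, because the sample was small."
print(round(equivocal_rate(human_like), 1))
# → 18.2
```

In a real pipeline this score would be one column among twenty, feeding a trained classifier rather than a hand-set threshold.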
Relevance to Security Operations and Threat Intelligence
For security teams, particularly those in Threat Intelligence, SOCs, and Digital Forensics, the em dash debate is a microcosm of a larger challenge: the weaponization of believable synthetic text. Phishing campaigns, influence operations, and fake news can all be generated at scale using these tools. Relying on a perceived “tell” like punctuation is a brittle defense that adversaries can easily adapt to overcome. A Duke University study found that professionals fear being seen as “lazier and less competent” if they use AI, which highlights the social engineering angle: attackers can leverage this bias to make their synthetic personas seem more authentic by intentionally mimicking “human” errors [8].
The evolution of this threat requires a shift in detection methodology. Instead of hunting for specific words or marks, analysts should profile content based on the broader features identified in research:
* **Structural Analysis:** Use scripts to measure average paragraph length, sentence length variation, and lexical density.
* **Semantic Analysis:** Look for a lack of specific, concrete examples, an over-reliance on vague jargon, and the absence of equivocal language that indicates nuanced thought.
* **Error Analysis:** Counterintuitively, a complete absence of typographical or grammatical errors can be a stronger indicator of AI origin than their presence, given the new trend of humans adding them intentionally.
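The structural metrics in the first bullet can be sketched as a small profiling function. The metric definitions here (paragraphs split on blank lines, sentences split on terminal punctuation, lexical density as unique tokens over total tokens) are simplified assumptions for illustration:

```python
import re
import statistics

def structural_profile(text: str) -> dict:
    """Compute simple structural features of a text: sentences per paragraph,
    sentence-length variation, and lexical density (type/token ratio)."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    sent_lengths = [len(s.split()) for s in sentences]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return {
        "sentences_per_paragraph": len(sentences) / max(len(paragraphs), 1),
        "sentence_length_stdev": statistics.pstdev(sent_lengths) if sent_lengths else 0.0,
        "lexical_density": len(set(tokens)) / max(len(tokens), 1),
    }

demo = "One two three. Four five.\n\nSix seven eight nine."
print(structural_profile(demo))
```

Run over a corpus of known-human baseline text, these numbers give each feature a distribution against which a suspect document can be compared, rather than a single pass/fail punctuation check.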
Conclusion and Recommendations
The debate over the em dash is not really about punctuation. It is a proxy for a broader anxiety about authenticity, trust, and the value of human effort in the face of automated content generation. For the security community, it serves as a critical case study in the inadequacy of simple heuristics for detecting advanced threats. As noted by experts like Google Brain’s Daphne Ippolito, all current “tells” are only “transiently useful” as AI models rapidly evolve [8]. The arms race between generation and detection will continue, necessitating a focus on behavioral and contextual analysis rather than static signatures. Security teams must advocate for and develop tools that analyze writing on a deeper, multi-feature level to reliably attribute source and intent in an increasingly synthetic information landscape.
References
1. “Whose Punctuation Is More Human: Yours or A.I.’s?”, The New York Times, 18 Sept. 2025.
2. B. Csutoras, “The Em Dash Dilemma: How a Punctuation Mark Became AI’s Stubborn Signature”, Medium, 29 Apr. 2025.
3. B. Phillips, “Stop AI-Shaming Our Precious, Kindly Em Dashes—Please”, The Ringer, 20 Aug. 2025.
4. “What the Em Dash Says About AI-assisted Writing—And Us”, Every, 27 May 2025.
5. “AI’s punctuation is only human – I love an em dash too”, Observer UK, circa 12 Sept. 2025.
6. “Distinguishing academic science writing from humans or ChatGPT: over 99% accuracy with 20 features”, Scientific Research, 2023.
7. “AI or Human? Here’s How to Spot the Difference”, Pinnacle Marketing Group, 23 Jun. 2025.
8. L. Clarke, “ChatGPT Shaming Is Making Our Writing So Much Worse”, Slate, 20 Aug. 2025.
9. M. Fogarty (Grammar Girl), Facebook post in Proofreaders group, 12 Jul. 2025.