The Burstiness Engine

Live corpus dashboard

Coverage, sessions, and the full corpus.

Engineering human rhythm in synthetic text · Vittoria Lanzo and Dico Angelo

50 papers · 18 peer-reviewed · 16 bridge papers · 3 research sessions · Regenerated 2026-06-21 20:12 UTC

Coverage

Coverage by axis

| Axis | Papers | Bridge | Status | Notes |
|---|---|---|---|---|
| Q1 · Prompt-engineering ceiling | 4 | 2 | Gap | Publishable on its own |
| Q2 · Learnable rhythm embedding | 6 | 3 | Strong | Mechanism established for other style axes |
| Q3 · Burstiness across model generations | 9 | 3 | Strong | Empirically confirmed in 2024-2026 |
| Q4 · Burstiness LoRA / auxiliary steering | 19 | 8 | Strong | Direct analogue: StyleVector contrastive |
| Q5 · Variance vs inter-token timing | 9 | 2 | Partial | Variance OK, timing TTS-only |
| Q6 · Fingerprint vs idiosyncratic rhythm | 8 | 6 | Strong | Stylometry + Bakkouche perception bridge |
| Q7 · Punctuation / paragraph proxies | 1 | 0 | Partial | Targeted query needed |
| TTS · Raitio (Apple) | 9 | 4 | Confirmed | Raitio + Suni + DiTTo-TTS |
| TTS · CTRL-P | 1 | 0 | Verify | Likely = Raitio 2020 itself |
| TTS · Bakkouche perception | 1 | 1 | Located | Bakkouche 2025 |
| Bio · NN ↔ biological rhythm | 4 | 1 | Strong | Caucheteux + eLife predictive coding |
| Arch · Vaswani extensions | 1 | 1 | Implicit | Covered via Q2 / Q4 LM descendants |

Research sessions

pass-1-2-baseline

Started 2026-06-21 18:00:56 · isolation: no-personal-carryover

Initial 8-query Firecrawl sweep of academic literature mapped to Vittoria's 7 questions + TTS + Bio anchors.

Queries (8)

  • Q2,Q4 · 10 results
    controllable text generation LoRA style adapter frozen LLM stylistic steering arXiv
  • TTS-A · 10 results
    Raitio Apple controllable prosody TTS conditioning CTRL-P prosodic speech synthesis
  • Q5,Q7 · 10 results
    burstiness sentence length variance perplexity AI generated text detection GPTZero stylometry
  • Q6 · 10 results
    stylometric fingerprint authorship attribution sentence rhythm English prose Mendenhall
  • Q3 · 10 results
    AI generated text burstiness longitudinal GPT-3 GPT-4 stylistic evolution detection across model generations
  • Q1,Q7 · 10 results
    prompt engineering ceiling stylistic distributional control LLM negative results long output drift punctuation paragraph rhythm
  • TTS-C · 9 results
    Bakkouche prosody perception Interspeech 2025 TTS speech synthesis listener
  • Bio,Arch · 10 results
    neural oscillations language predictive coding speech rhythm thought chunking transformer attention prosody biology Pickering Garrod

Findings (7)

  • Q4 · novelty-bar: The 'burstiness LoRA' steering mechanism already exists for other style axes (sentiment, persona, authorship). The novelty bar is to apply it to rhythm, personal fingerprinting, and perception validation, not to invent the mechanism.
  • Q5 · competing-work: Tarım & Onan 2025 published the first systematic stylometric burstiness comparison (diffusion vs autoregressive). Cite it and position against it; do not ignore it.
  • Q5 · gap: GPTZero's burstiness operationalization (Tian 2023) is a blog post, not a peer-reviewed paper. This is the formalization gap the Burstiness Engine identifies.
  • Q3 · general: Empirically confirmed: AI text style shifts measurably across model generations, and detection rates drop with newer models.
  • TTS-C · perception: The Bakkouche 2025 perception study finds that listeners do not easily perceive humans and AI clones as the same person; prosody is the discriminator. This is a direct precedent for Vittoria's perception hypothesis.
  • Bio,Arch · methodology: Caucheteux et al. (Nature Human Behaviour 2023) show that the brain uses long-range hierarchical predictions that match LM architecture. This is the strongest single bridge between Vaswani and biology.
  • Q1 · gap: The prompt-engineering ceiling for distributional stylistic control is under-published. The negative-result paper Vittoria proposes is itself a publishable contribution.
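
To make the Q5 formalization gap concrete, here is a minimal Python sketch of one way sentence-length burstiness could be operationalized. It is not GPTZero's method and not a definition taken from any paper in the corpus. The function names, the regex sentence splitter, and the choice of statistics (coefficient of variation, plus the Goh-Barabási burstiness parameter B = (σ − μ) / (σ + μ) borrowed from the event-timing literature) are all assumptions made for illustration.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting after ., ! or ? (a crude heuristic)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness_stats(lengths: list[int]) -> dict[str, float]:
    """Summary statistics over sentence lengths (hypothetical operationalization)."""
    if not lengths:
        raise ValueError("need at least one sentence")
    mean = statistics.fmean(lengths)
    std = statistics.pstdev(lengths)  # population standard deviation
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,                  # coefficient of variation
        "B": (std - mean) / (std + mean),  # -1 = perfectly regular, toward +1 = bursty
    }


bursty = "Stop. The rain kept falling on the tin roof all night long. Why?"
flat = "One two three. Four five six. Seven eight nine."
print(sentence_lengths(bursty), burstiness_stats(sentence_lengths(bursty)))
print(sentence_lengths(flat), burstiness_stats(sentence_lengths(flat)))
```

The uniform text yields B = -1 and a coefficient of variation of 0, while the mixed short and long sentences push the coefficient of variation above 1. A real study would swap in a proper sentence tokenizer and decide whether to count words, tokens, or syllables.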

pass-3-cutting-edge

Started 2026-06-21 18:00:56 · isolation: no-personal-carryover

Last-12-months arXiv sweep for progression vs baseline. A/B against pass 1+2.

Queries (4)

  • Q2,Q4 [qdr:y] · 10 results
    controllable text generation LoRA style adapter 2025 frozen LLM activation steering
  • Q3,Q5 [qdr:y] · 4 results
    burstiness LLM text generation 2025 2026 stylometric variance peer-reviewed EMNLP ACL NAACL
  • TTS-A [qdr:y] · 10 results
    prosody control TTS 2025 NaturalSpeech VALL-E neural speech synthesis variance adaptor latent
  • Q3,Q5 [qdr:y] · 4 results
    AI generated text detection 2025 2026 burstiness perplexity DetectGPT Binoculars latest

Findings (5)

  • Q5,Q3 · competing-work: DivEye (arXiv:2509.18880, Sep 2025) captures how unpredictability fluctuates across a text via surprisal-based features. It is the most direct 2025 burstiness-adjacent detection paper and must be in Vittoria's prior-art section.
  • TTS-A,Q2 · methodology: The EMNLP 2025 main paper 'Towards Controllable Speech Synthesis in the Era of LLMs' (Lee et al.) is the unified TTS+LLM control framework. DiTTo-TTS controls speech rate via latent-length prediction, a direct precedent for in-generation rhythm conditioning.
  • Q4,Q6 · methodology: Plug-and-Play LLM Fingerprinting (arXiv:2605.18474) generates LoRA parameters as variable-length sequences. This is the precedent for generating personalized burstiness LoRAs from a user sample, not just training one.
  • Q2,Q4 · methodology: 'From Weights to Activations' (arXiv:2604.14090) positions activation steering as the 2025/26 frontier. The Burstiness Engine should explicitly choose between weight-space (LoRA) and activation-space (steering vectors) control and justify the choice.
  • Q4,TTS-A · methodology: AgentSteerTTS (arXiv:2605.17583), a multi-agent closed-loop steering system, shows the field moving toward composed-controller systems. The Burstiness Engine could be one controller in such a system.
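
As a rough illustration of what "surprisal fluctuation" features in the first finding above might look like, the following Python sketch computes a few summary statistics over per-token surprisal. It is not DivEye's feature set. The feature names are invented here, and the token probabilities are assumed to come from some scoring language model that is not shown.

```python
import math
import statistics


def surprisals(token_probs: list[float]) -> list[float]:
    """Per-token surprisal in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in token_probs]


def fluctuation_features(s: list[float]) -> dict[str, float]:
    """Hypothetical features describing how surprisal varies along a text."""
    if len(s) < 2:
        raise ValueError("need at least two tokens")
    deltas = [b - a for a, b in zip(s, s[1:])]  # token-to-token change
    return {
        "mean_surprisal": statistics.fmean(s),
        "surprisal_var": statistics.pvariance(s),
        "mean_abs_delta": statistics.fmean([abs(d) for d in deltas]),
        "delta_var": statistics.pvariance(deltas),
    }


# Probabilities are made up for illustration; a real pipeline would read
# them from a language model's output distribution.
flat = fluctuation_features(surprisals([0.5, 0.5, 0.5, 0.5]))
spiky = fluctuation_features(surprisals([0.5, 0.125, 0.5, 0.125]))
print(flat)
print(spiky)
```

A text whose tokens are all equally predictable scores zero on both variance features, whereas one that alternates between predictable and surprising tokens does not. That contrast is the kind of signal a burstiness-adjacent detector could exploit.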

pass-4-gap-closing

Started 2026-06-21 18:00:56 · isolation: no-personal-carryover

Targeted Firecrawl sweep closing Q1 prompt-ceiling, Q7 punctuation-proxy, and CTRL-P verification. Re-anchored to Vittoria's canonical scope (prompt-baseline → model-level; TTS Phase 2).

Queries (4)

  • Q1 [qdr:y] · 10 results
    prompt-only stylistic control distributional ceiling negative results LLM sentence length variance kurtosis
  • Q7 [qdr:y]
    punctuation paragraph rhythm proxy text generation LLM burstiness sentence segmentation
  • Q7 · 10 results
    punctuation as proxy for sentence rhythm stylometry text segmentation author attribution
  • TTS-B · 10 results
    Wagner Klimkov CTRL-P prosodic boundaries phrasing prominence ICASSP text-to-speech

Findings (5)

  • TTS-B · resolution: Resolved. CTRL-P is a distinct paper (arXiv:2106.08352, Ctrl-P: Temporal Control of Prosodic Variation, Interspeech 2021), not Raitio 2020. This closes the Pass-3 open action. Per Vittoria's scope it anchors Phase 2 (voice personalization), not the core paper.
  • Q1 · competing-work: Q1 is no longer empty. 'Benchmark of Stylistic Variation in LLM-Generated Texts' (2509.10179) measures variation across 16 frontier models and prompts. It does not frame this as a control ceiling against a distributional target, so the Q1 wedge survives, but the paper must now be cited and differentiated from.
  • Q1,Q6 · methodology: 'LLMs Still Struggle to Imitate Implicit Writing Styles' (2509.14543) shows that prompt-only personalized-style imitation fails. This is the strongest single motivation for the model-level (LoRA/steering) approach over prompting.
  • Q1 · methodology: Measure burstiness only among quality-passing generations (effective semantic diversity, 2504.12522) so that the Q1 experiment does not reward high-variance incoherence.
  • Q7 · gap: Q7 remains a genuine gap: no dedicated paper treats punctuation or paragraphing as a burstiness proxy. The Onegin time-series paper (2604.20221) suggests a method: model segmentation as a symbolic time series to obtain a timing-free proxy. Like Q1, this is a targeted standalone contribution.
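
The Q7 idea of treating segmentation as a symbolic time series can be sketched in a few lines of Python. This is an illustrative guess at one encoding, not the method of the Onegin paper. The mark inventory, the "¶" paragraph symbol, and the function name are assumptions.

```python
import re

# Word tokens, single punctuation marks, and blank-line paragraph breaks.
TOKEN_RE = re.compile(r"\n\s*\n|\w+(?:'\w+)*|[.,;:!?]")
MARKS = set(".,;:!?")


def punctuation_series(text: str) -> list[tuple[str, int]]:
    """Return (mark, words_since_previous_mark) for each boundary, in order.

    Paragraph breaks are emitted as the symbol "¶" (an assumed convention).
    """
    series: list[tuple[str, int]] = []
    gap = 0
    for token in TOKEN_RE.findall(text):
        if token in MARKS:
            series.append((token, gap))
            gap = 0
        elif token.strip() == "":  # a blank-line paragraph break
            series.append(("¶", gap))
            gap = 0
        else:
            gap += 1  # ordinary word
    return series


sample = "Wait. No, not yet; listen.\n\nThen it rained."
series = punctuation_series(sample)
print("".join(mark for mark, _ in series))  # the symbolic sequence
print([gap for _, gap in series])           # words between boundaries
```

The gap sequence can then be fed to the same variance-style statistics used for sentence lengths, giving a rhythm proxy that needs no timing information.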

All papers

| Title | Authors | Venue | Year | Peer? |
|---|---|---|---|---|
| AI-Generated Text Detection: A Comprehensive Review of Active Methods | · | ScienceDirect | 2026 | yes |
| Trusting AI to detect AI? | · | Computers in Human Behavior | 2026 | yes |
| Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs | · | PMC | 2025 | yes |
| Enhanced Prosody Modeling and Character Voice Controlling for Audiobooks | · | ACM | 2025 | yes |
| Prosodic cues strengthen human-AI voice boundaries | Bakkouche et al. | ScienceDirect | 2025 | yes |
| Style and Prosody control for Zero-shot Speech Synthesis | Suni, Antti et al. | SSW 2025 | 2025 | yes |
| Towards Controllable Speech Synthesis in the Era of LLMs | Lee et al. | EMNLP 2025 main | 2025 | yes |
| How is ChatGPT's Behavior Changing Over Time? | Chen, Zaharia, Zou | Harvard Data Science Review | 2024 | yes |
| Language experience shapes predictive coding of rhythmic sound sequences | · | eLife | 2024 | yes |
| Predictive Coding or Just Feature Discovery? An Alternative Account | · | PMC | 2024 | yes |
| Style Vectors for Steering Generative Large Language Models | Konen, Jentzsch, Diallo et al. | EACL 2024 Findings | 2024 | yes |
| Distinguishing ChatGPT-3.5 vs -4 vs human Japanese texts | · | PMC | 2023 | yes |
| Long-range and hierarchical language predictions in brains and algorithms | Caucheteux, Gramfort, King | Nature Human Behaviour | 2023 | yes |
| Emphasis control for parallel neural TTS / Hierarchical Prosody Modeling | Raitio et al. | arXiv / Interspeech | 2022 | yes |
| Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis | Mohan, Hu, Klimkov et al. | Interspeech 2021 | 2021 | yes |
| FUDGE: Controlled Text Generation With Future Discriminators | Yang, Klein | NAACL 2021 | 2021 | yes |
| Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity | Basu et al. | ICLR 2021 | 2021 | yes |
| Controllable neural TTS using intuitive prosodic features | Raitio, Rasipuram, Castellani | Interspeech 2020 | 2020 | yes |
| A Statistical Journey into the Poetic World of Evgenij Onegin | · | arXiv | 2026 | no |
| A Unified Study of LoRA Variants: Taxonomy, Review, Codebase | · | arXiv | 2026 | no |
| AgentSteerTTS: Multi-Agent Closed-Loop TTS Steering | · | arXiv | 2026 | no |
| CARD: Cluster-level Adaptation with Reward-guided Decoding | · | arXiv | 2026 | no |
| Continuous Control of Editing Models via Adaptive-Origin Guidance | · | arXiv | 2026 | no |
| From Weights to Activations: Is Steering the Next Frontier of LLMs? | Turner et al. | arXiv | 2026 | no |
| GLASS: GRPO-Trained LoRA for Acoustic Style Steering in Zero-Shot TTS | · | arXiv | 2026 | no |
| MAGIC-TTS: Fine-Grained Controllable Speech Synthesis | · | arXiv | 2026 | no |
| Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation | · | arXiv | 2026 | no |
| Self-Supervised Honesty Steering via Anti-Parallel Representations | · | arXiv | 2026 | no |
| Styles + Persona-plug = Customized LLMs | · | arXiv | 2026 | no |
| TADA! Tuning Audio Diffusion Models through Activation Steering | · | arXiv | 2026 | no |
| The Statistical Signature of LLMs | · | arXiv | 2026 | no |
| A Training-free Method for LLM Text Attribution | · | arXiv | 2025 | no |
| Benchmark of Stylistic Variation in LLM-Generated Texts | · | arXiv | 2025 | no |
| Beyond Checkmate: Creative Choke Points in AI Text | · | arXiv | 2025 | no |
| Can You Detect the Difference? Stylometric Comparison of Diffusion vs Autoregressive Text | Tarım & Onan | arXiv | 2025 | no |
| Detecting LLM-Generated Short Answers | · | arXiv | 2025 | no |
| DivEye: Diversity Boosts AI-Generated Text Detection | · | arXiv | 2025 | no |
| Evaluating the Diversity and Quality of LLM Generated Content | · | arXiv | 2025 | no |
| LLMBraces: Straightening Out LLM Predictions | · | arXiv | 2025 | no |
| LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday People | · | arXiv | 2025 | no |
| Low-Rank Adaptation for Foundation Models: Survey | · | arXiv | 2025 | no |
| Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation | · | arXiv | 2025 | no |
| Personalized Text Generation with Contrastive Activation Steering | Liu et al. | arXiv | 2025 | no |
| RedNote-Vibe: Temporal Dynamics of AI-Generated Text | · | arXiv | 2025 | no |
| Stylometry Recognizes Human and LLM-Generated Texts | · | arXiv | 2025 | no |
| Continuous Language Model Interpolation for Dynamic Control | · | arXiv | 2024 | no |
| Detecting AI-Generated Text: Factors Influencing Detectability | · | arXiv | 2024 | no |
| Large Language Models can be Guided to Evade AI-Generated Text Detection (SICO) | Lu et al. | TMLR 2024 | 2024 | no |
| StyleRemix: Authorship Obfuscation via Distillation | Fisher et al. | arXiv | 2024 | no |
| Toward a realistic model of speech processing in the brain with SSL | · | arXiv | 2022 | no |