The research process · Vittoria Lanzo and Dico Angelo

From a detector heuristic to a controllable target.

The whole project runs on one spine: prompt-level control is the baseline, so we show its ceiling, and model-level steering is the answer.

Everything below (the four research passes, the seven questions, and the corpus) exists to ground that line in the literature before a single experimental claim is made. This is the honest, source-backed account of the work, not a results brochure. The framing, the formal definition, the metric ablation, and the prompt-control negative result are settled. A working controller with quantitative wins is the open frontier, with GRPO runs in progress.

The thesis and the gap

Two camps, and an empty bridge between them.

Human writing has rhythm: sentence length and complexity vary in bursts. Language-model text is rhythmically flat, a kind of averaged prosody, and that flatness is the single most reliable tell that text was machine-generated. The thesis is that burstiness is a controllable generation property, not just a measurable one. It can be specified as a distributional target and steered during generation, then validated by showing that readers perceive the change.
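To make "sentence length varies in bursts" concrete, here is a minimal sketch of one possible dispersion measure: the coefficient of variation of sentence length. This is an illustrative assumption, not the project's formal definition, and the names `sentence_lengths` and `burstiness_cv` are hypothetical. The sentence splitter is deliberately naive.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Split on runs of . ! ? and return the word count of each non-empty sentence."""
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.strip()]


def burstiness_cv(text: str) -> float:
    """Coefficient of variation of sentence length (population std / mean).

    Assumption: this is only one simple proxy for rhythm.
    0.0 means every sentence has the same number of words.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # dispersion is undefined for fewer than two sentences
    return pstdev(lengths) / mean(lengths)


flat = "The cat sat down. The dog ran off. The bird flew away."
bursty = (
    "Stop. The storm that had been gathering over the hills "
    "all afternoon finally broke. Rain fell."
)

print(burstiness_cv(flat))    # equal-length sentences give 0.0
print(burstiness_cv(bursty))  # mixed short and long sentences give a larger value
```

A single scalar like this discards the order of the sentences, which is one reason a distributional target over the whole output is a stronger specification than a variance number alone.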

The literature splits cleanly into two camps, and nobody has crossed the gap between them.

Measurement and detection

A recent line of work (Tarim and Onan 2025, DivEye, stylometric detectors, GPTZero) quantifies burstiness after the fact to catch machine text. Several of these detectors are strong and current, but none of them controls the property.

Steering machinery

Activation and LoRA methods (Style Vectors 2024, StyleVector 2025, Turner 2026, generated-LoRA fingerprinting) steer frozen models toward style targets: sentiment, persona, authorship. None of them target rhythm.

The bridge

A rhythm-specific distributional target, steered in-generation, validated by human perception. And the canonical operational definition of burstiness in detection practice is still a blog post, so the field runs on an un-formalized metric we replace.

The wedge in one line

They detect; we steer. Everything in the measurement camp acts post-hoc on finished text. The Burstiness Engine controls the property during generation, against an explicit distributional target.

Figure: The novelty wedge. A measurement camp and a steering camp with an empty bridge between them, which this work fills.

The composite novelty. A rhythm-specific distributional target, steered in-generation, validated by human perception. No single mechanism is new; the **combination** is the contribution.

Four research passes

The corpus was built in deliberate, A/B-tested sweeps.

Each pass used real Firecrawl academic searches against arXiv, ScienceDirect, PMC, ACL, and Interspeech indexes, mapped to the seven research questions. Later passes were time-filtered to the last twelve months to surface cutting-edge competition, then A/B-tested against the earlier baseline.

Pass 1 and 2: the baseline literature map

Eight academic searches mapped the seven questions plus the TTS and biology anchors. The steering machinery (Style Vectors, StyleVector) turned out to already exist for other style axes, which set the novelty bar at "apply it to a rhythm target," not "invent the mechanism." It also confirmed that the GPTZero burstiness definition is a non-peer-reviewed note, and that Q1, the prompt-control ceiling, had no dedicated paper.

Pass 3: the cutting-edge progression

Time-filtered to the last year, this pass built an A/B timeline from Vaswani 2017 through 2026 activation steering and generated LoRAs. It confirmed the novelty is the composite, not any ingredient, since each mechanism has a 2024 to 2026 precedent. It also pinned the two live competing papers, Tarim and Onan 2025 and DivEye, both of which measure burstiness rather than control it.

Pass 4: closing Q1, Q7, and CTRL-P

Three targeted searches resolved the open actions. CTRL-P was confirmed as a separate Interspeech 2021 paper, a Phase-2 anchor. Q1 stayed open but became contested by a stylistic-variation benchmark, so it sharpened to "control against a distributional target over output length." Q7, punctuation and segmentation as a timing-free proxy, remains a genuine gap with a symbolic-time-series method available.

A fifth pass is a design pass rather than a literature pass: prototype an activation steering vector on one open model and measure whether it moves the metric monotonically before any training investment. That work lives in the experiment harness, not the corpus.
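The monotonicity check in that design pass could be sketched as a rank correlation between steering strength and the measured metric. This is an assumption about how one might test it, not the harness's actual code: Spearman correlation is a choice made for this sketch, the function names are hypothetical, and the readings are made-up placeholders rather than results.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5


# Placeholder numbers only: steering strengths and hypothetical metric readings.
strengths = [0.0, 0.5, 1.0, 1.5, 2.0]
readings = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spearman(strengths, readings))  # 1.0 when the metric rises at every step
```

A value near 1.0 would suggest the vector moves the metric in one direction as strength increases; values well below that would argue against investing in training on top of it.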

The seven research questions

Vittoria's seven questions, mapped to the literature.

#	Question	Status
Q1	The prompt-engineering ceiling for rhythmic control	Gap, standalone-publishable
Q2	Burstiness as a learnable rhythm embedding, no base retrain	Mechanism exists
Q3	Is AI text getting more bursty across model generations?	Empirically yes
Q4	Small auxiliary rhythm model, a burstiness LoRA	Controller in progress
Q5	Sentence variance vs inter-token timing	Variance formalized
Q6	Burstiness fingerprint vs idiosyncratic style	Stylometry strong
Q7	Punctuation and paragraph structure as proxies	Genuine gap

Two of these resolved into clean standalone sub-contributions after Pass 4: Q1 (prompt-only control ceiling against a distributional rhythm target over length, quality-gated) and Q7 (punctuation and segmentation as a timing-free proxy via a symbolic-time-series method). Q5 timing and the TTS prosody lineage are deliberately Phase 2, not part of the core paper.
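For Q7, a first step toward treating punctuation as a timing-free signal is to turn text into a sequence of events. The sketch below is only an illustration under that assumption: the source does not specify the symbolic-time-series method, and `punctuation_gaps` is a hypothetical name. It records each punctuation mark together with the number of words since the previous mark.

```python
import re

MARKS = ".,;:!?"


def punctuation_gaps(text: str) -> list[tuple[str, int]]:
    """Return (mark, words_since_previous_mark) pairs in reading order."""
    events = []
    words = 0
    # Tokens are either words (with an optional apostrophe part) or single marks.
    for token in re.findall(r"\w+(?:'\w+)?|[.,;:!?]", text):
        if token in MARKS:
            events.append((token, words))
            words = 0
        else:
            words += 1
    return events


print(punctuation_gaps("Wait, no. It was late; we left."))
# [(',', 1), ('.', 1), (';', 3), ('.', 2)]
```

The resulting sequence of symbols and gaps is the kind of input a symbolic-time-series analysis could consume, without any inter-token timing data.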

How the corpus is built

One source of truth, regenerated, human-curated.

The corpus is not a scrape. It is a hand-curated literature index built around a single rule: the seed file is the source of truth, the database is generated from it, and the public dashboard is regenerated from the database.

The corpus pipeline: seed.py → burstiness.db (SQLite) → regen-html → the dashboard.

The database is generated and never hand-edited. To add a paper, query, finding, or learning, you edit the seed file with a verified entry and rebuild. Curation is human-in-the-loop on purpose: arXiv IDs are verified before ingest, because a typed-not-searched ID once pulled a jet-physics paper in place of the intended neuroscience reference. The current corpus is 43 papers across the four research passes.
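The "generated, never hand-edited" rule can be sketched as a rebuild that drops and recreates the table from seed entries on every run. This is a minimal sketch under stated assumptions: the real seed.py schema is not shown here, so the `papers` table, its columns, the `rebuild` function, and the placeholder entries are all hypothetical.

```python
import sqlite3

# Hypothetical seed entries; the IDs and titles are placeholders, not real papers.
SEED_PAPERS = [
    {"arxiv_id": "0000.00001", "title": "Placeholder paper A", "research_pass": 1},
    {"arxiv_id": "0000.00002", "title": "Placeholder paper B", "research_pass": 3},
]


def rebuild(db_path: str, papers: list[dict]) -> int:
    """Recreate the papers table from the seed and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an insert fails
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                "arxiv_id TEXT PRIMARY KEY, "
                "title TEXT NOT NULL, "
                "research_pass INTEGER NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO papers VALUES (:arxiv_id, :title, :research_pass)",
                papers,
            )
        return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
    finally:
        conn.close()


# An in-memory database keeps the example free of side effects.
print(rebuild(":memory:", SEED_PAPERS))  # 2
```

Because the table is rebuilt from scratch, any manual change to the database disappears on the next run, which is what keeps the seed file authoritative. The primary key on the ID also makes a duplicated seed entry fail loudly instead of being stored twice.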

See it live

The same source of truth drives the live dashboard, where every paper, query, and per-pass finding is browsable. The research loop is human-in-the-loop precisely at curation, so what you see there is verified, not auto-ingested.

Compounding learnings

What each session taught the next.

These are the cross-session insights that shaped the method. They are mirrored as rows in the database and carried forward into every new pass.

The novelty is the composite

Every individual mechanism, steering, LoRA, contrastive activation, perception study, has a 2024 to 2026 precedent. The contribution is composing established machinery onto a new target with perception validation, not inventing the steering.

Two competitors are live

Tarim and Onan 2025 and DivEye both ship into the same room. We differentiate on the in-generation control angle, not the measurement angle. They detect; we steer.

Q1 is genuinely under-published

No peer-reviewed work measures the prompt-only stylistic ceiling against distributional targets. Q1 is either a standalone negative-result paper or the motivation section of the main one.

Run real searches; the delegation stub lies

A delegated research chain once reported "completed 100 percent" in one second while retrieving zero papers. Firecrawl academic search is the actual fetcher; the index layer comes after the corpus exists.

TTS perception methodology transfers to text

Listeners discriminate human from AI voice clones by prosodic variance, and reduced variation lowers naturalness. That perception design adapts directly to high-variance versus flat text, human versus steered.

Verify before you ingest

Search-then-add, never type-then-add. Silent ingestion of a wrong arXiv ID poisons the corpus invisibly, so every ID is checked against its redirect title before it enters the seed.
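The title check can be sketched as a fuzzy comparison between the title the curator expects and the title the ID actually resolves to. This is an assumption about how such a guard might look, not the project's code: `title_matches` and the 0.9 threshold are hypothetical, and fetching the resolved title is left to the caller.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def title_matches(expected: str, resolved: str, threshold: float = 0.9) -> bool:
    """Return True when the resolved title is close enough to the expected one.

    The threshold is an illustrative choice, not a tuned value.
    """
    ratio = SequenceMatcher(None, normalize(expected), normalize(resolved)).ratio()
    return ratio >= threshold


# Same paper, different casing and trailing punctuation: accepted.
print(title_matches("Attention Is All You Need", "Attention is all you need."))   # True
# A mistyped ID that resolves to an unrelated paper: rejected.
print(title_matches("Attention Is All You Need", "Jet substructure at the LHC"))  # False
```

An entry would enter the seed only when this returns True, so a wrong ID fails before ingest instead of sitting in the corpus unnoticed.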

Current status

The framing, definition, ablation, and negative result are done. A working controller with quantitative wins is the open frontier, with GRPO runs in progress. Read the paper, explore the live corpus dashboard, or see the results and figures.