The Burstiness Engine

In-generation control · EMNLP / ACL / arXiv

Engineering human rhythm in synthetic text.

Burstiness is the variation in sentence length and complexity that makes writing feel human. We make it a reproducible target and steer it during generation.

Vittoria Lanzo and Dico Angelo

METAVENTIONS AI · Research preprint in progress

43 papers · 4 research passes · 75 tests passing · sentence-length dial: Spearman ρ = 1.0, Cohen's d = 0.93

Human rhythm: high var(L)
Machine text: flat

Same content, two rhythms. We make the human distribution a target and steer toward it during generation.


Abstract

A property to be specified and steered, not a heuristic to be measured.

Low burstiness is the most reliable signal that text was produced by a language model, yet burstiness is defined only by a detector heuristic and controlled only by post-hoc rewriting. We treat it instead as a property that can be specified as a distributional target and steered during generation.

We frame the work as modeling human rhythmic patterns, not as evading detection, and we report detector behavior across the full control range so the contribution serves measurement and model analysis as well as generation. The novelty is the composition: applying established steering machinery to a rhythm-specific target, with a reusable definition and perceptual validation.


The wedge

The bridge between two camps is empty.

Demand for human rhythm in AI text is everywhere: detectors, humanizer tools, prompt tricks. The literature splits cleanly, and nobody has crossed the gap.

A

Measurement and detection

A strong, recent line of work quantifies burstiness after the fact to catch machine text. None of it controls the property.

B

Steering machinery

Activation and LoRA methods steer frozen models toward style targets: sentiment, persona, authorship. None of them target rhythm.

Our contribution

A rhythm-specific distributional target, steered in-generation, validated by human perception. The composite is the novelty, not any single mechanism.


A reproducible definition

Burstiness as a vector, not a single scalar.

The canonical operational definition of burstiness in detection practice is a blog post. The field runs on an un-formalized metric. We replace it with a decomposable, reproducible target.

The burstiness vector

B = [ var(L), kurt(L), mean_surprisal, fluc_dabs(S), punct_entropy ]

The components are the variance and kurtosis of sentence length; mean surprisal and surprisal fluctuation, both under a fixed reference model; and punctuation entropy. Our ablation shows that the field's implicit metric, the standard deviation of surprisal, is the weakest discriminator. Local jumpiness and mean surprisal separate human from machine text far better.
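To make the vector concrete, here is a minimal sketch of how its five components could be computed. It is not the project's harness. The sentence splitter, the punctuation set, the use of population (non-excess) kurtosis, and the reading of fluc_dabs(S) as the mean absolute difference between consecutive surprisals are all assumptions made for illustration. The per-token surprisals are taken as an input, since they would come from whatever fixed reference model is chosen.

```python
import math
import re
from collections import Counter

# Assumed punctuation inventory; the actual set used in the paper is not stated.
PUNCT = ".,;:!?-()\"'"

def sentence_lengths(text):
    # Naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]

def variance(xs):
    # Population variance.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def kurtosis(xs):
    # Population, non-excess kurtosis: fourth central moment over squared variance.
    v = variance(xs)
    if v == 0:
        return 0.0
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 4 for x in xs) / len(xs)) / v ** 2

def punct_entropy(text):
    # Shannon entropy (bits) of the distribution over punctuation marks.
    counts = Counter(c for c in text if c in PUNCT)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mean_abs_diff(surprisals):
    # Assumed reading of fluc_dabs(S): average jump between neighbouring surprisals.
    if len(surprisals) < 2:
        return 0.0
    jumps = [abs(b - a) for a, b in zip(surprisals, surprisals[1:])]
    return sum(jumps) / len(jumps)

def burstiness_vector(text, surprisals):
    # `surprisals` are per-token values from a fixed reference model (supplied by the caller).
    lengths = sentence_lengths(text)
    if not lengths or not surprisals:
        raise ValueError("need at least one sentence and one surprisal value")
    return [
        variance(lengths),
        kurtosis(lengths),
        sum(surprisals) / len(surprisals),
        mean_abs_diff(surprisals),
        punct_entropy(text),
    ]
```

Keeping each component as its own function mirrors the decomposable framing: any single entry can be ablated or swapped without touching the others.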


Four contributions

From a heuristic to a controller.

01

A reproducible definition

Burstiness as a decomposable distributional target, replacing the single-scalar detector heuristic.

02

The prompt-control ceiling

A clean measurement of how prompt-only control degrades against the target over output length. A standalone negative result.

03

An in-generation controller

Biasing only the sentence-boundary decision yields a monotonic, coherence-preserving length dial. On distilgpt2: Spearman ρ = 1.0, Cohen's d = 0.93.

04

A perception protocol

A pre-registered human study, adapted from a method validated in speech synthesis, linking controlled rhythm to perceived humanness.
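The boundary-biasing idea in contribution 03 can be illustrated with a toy sketch. This is not the controller evaluated on distilgpt2. It assumes a hypothetical four-token vocabulary, a fixed logit vector at every step, and a single token id standing in for the sentence boundary. Under those assumptions, sentence length is geometric, so its expected value is simply one over the boundary probability.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def bias_boundary(logits, boundary_ids, bias):
    # Add `bias` to sentence-boundary token logits only; all other logits are untouched.
    return [x + bias if i in boundary_ids else x for i, x in enumerate(logits)]

def boundary_prob(logits, boundary_ids, bias):
    # Total probability of ending the sentence at this step after biasing.
    probs = softmax(bias_boundary(logits, boundary_ids, bias))
    return sum(probs[i] for i in boundary_ids)

def expected_sentence_length(logits, boundary_ids, bias):
    # Toy assumption: the same logits at every step, so length is geometric with mean 1/p.
    return 1.0 / boundary_prob(logits, boundary_ids, bias)

# Hypothetical vocabulary of four tokens; id 3 plays the role of the sentence boundary.
toy_logits = [2.0, 1.0, 0.5, 0.0]
toy_boundary = {3}
dial = [-2.0, -1.0, 0.0, 1.0, 2.0]
toy_lengths = [expected_sentence_length(toy_logits, toy_boundary, b) for b in dial]
```

In this toy setting a larger bias always shortens the expected sentence, so the dial is monotonic by construction. Because only the boundary logit moves, the relative probabilities among content tokens stay exactly as the model assigned them, which is one way to read "coherence-preserving".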


The lab

Where the systems converge.

The Burstiness Engine is built inside METAVENTIONS AI as a self-rebuilding research system: a curated literature corpus, an experiment harness, and a paper that regenerates from a single source of truth. Two people, one rhythm.

Dico Angelo

METAVENTIONS AI

Framing, systems, and the in-generation controller. Builds the harness that turns the research spine into a reproducible paper.

Vittoria Lanzo

Co-author

Owns the research spine and the perception protocol: prompt-level control is the baseline, model-level steering is the answer.


The research questions

Seven questions, mapped to the literature.

# | Question | Status
Q1 | The prompt-engineering ceiling for rhythmic control | Gap, publishable
Q2 | Burstiness as a learnable rhythm embedding, no base retrain | Mechanism exists
Q3 | Is AI text getting more bursty across generations? | Empirically yes
Q4 | Small auxiliary rhythm model, a burstiness LoRA | Controller in progress
Q5 | Sentence variance vs inter-token timing | Variance formalized
Q6 | Burstiness fingerprint vs idiosyncratic style | Stylometry strong
Q7 | Punctuation and paragraph structure as proxies | Genuine gap

A deeper walk through the four research passes and the methodology lives on the research page.


Current status

The framing, definition, ablation, and negative result are done. A working controller with quantitative wins is the open frontier, with GRPO runs in progress. Explore the live corpus dashboard or the results and figures.