Synthetic Writing Samples with Controlled Quality Levels (N = 1000) — example_writing

A synthetic dataset of 1,000 short writing samples generated by a large language model for use in pairwise comparison and ranking experiments.

Usage

data("example_writing_samples1000")

Format

A tibble with 1,000 rows and 7 variables:

ID: Character. Unique sample identifier (S0001–S1000).
text: Character. The writing sample (approximately 120–180 words).
quality_level: Integer. Intended quality level used during generation (1–20).
theta_true: Numeric. Centered latent-quality proxy derived from quality_level (for example, quality level 1 corresponds to -9.5 in the output shown under Examples).
prompt_id: Character. Identifier for the generation prompt template.
model: Character. Language model used to generate the samples.
created_at: POSIXct. Timestamp (UTC) when the samples were generated.

Source

Generated via live OpenAI API calls using a controlled, bucketed quality prompt. See data-raw/generate_example_writing_samples1000.R for details.

Details

Samples were generated at 20 discrete quality levels (1 = lowest, 20 = highest), with multiple samples per level. The levels are intended to represent overlapping ranges of overall writing quality rather than a strict total ordering, which allows for realistic noise and near-ties in pairwise judgments.
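To illustrate why neighbouring levels tend to produce near-ties, here is a minimal sketch that simulates a pairwise judgment with a Bradley-Terry-style logistic model. The model, the helper p_win(), and its scale argument are assumptions made for illustration; this page does not specify how judgments are produced. The theta values assume that the proxy is quality_level minus 10.5, which matches the value printed for level 1 under Examples.

```r
# Hypothetical helper: probability that sample i is judged better than
# sample j under a logistic (Bradley-Terry-style) model. `scale` is an
# illustrative noise parameter, not part of the dataset.
p_win <- function(theta_i, theta_j, scale = 5) {
  plogis((theta_i - theta_j) / scale)
}

# Adjacent levels (assumed theta for levels 2 and 1): close to a coin flip.
p_adjacent <- p_win(-8.5, -9.5)

# Extreme levels (assumed theta for levels 20 and 1): nearly deterministic.
p_far <- p_win(9.5, -9.5)

# Draw one simulated judgment for the adjacent pair (1 = first sample wins).
set.seed(7)
outcome <- rbinom(1, size = 1, prob = p_adjacent)
```

A larger scale makes all comparisons noisier, and a smaller one makes the level ordering closer to strict.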

All samples respond to the same writing prompt to avoid topic effects. The dataset is primarily intended for benchmarking ranking models and for comparing random versus adaptive pair selection strategies under limited judgment budgets.
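As a rough sketch of the random baseline mentioned above, the following base R code draws a fixed number of random pairs from the documented ID range. The budget value and the ID1/ID2 column names are hypothetical, and the IDs are rebuilt here rather than read from the dataset so that the snippet runs on its own.

```r
set.seed(42)

# IDs in the documented format S0001 to S1000 (rebuilt for illustration).
ids <- sprintf("S%04d", 1:1000)

# Hypothetical judgment budget.
budget <- 500

# Each draw samples two distinct IDs; the same pair may recur across draws.
pairs <- t(replicate(budget, sample(ids, 2)))
colnames(pairs) <- c("ID1", "ID2")
pairs <- as.data.frame(pairs, stringsAsFactors = FALSE)

head(pairs)
```

An adaptive strategy would replace the uniform sample() call with a rule that uses the current score estimates, for instance by preferring pairs whose outcome is most uncertain.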

The column theta_true provides a centered numeric proxy for the latent quality dimension derived from quality_level. This proxy is intended for evaluation purposes (e.g., rank recovery or correlation) and does not imply a perfectly ordered ground truth at the individual-sample level.
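The evaluation use described above might look like the following sketch, which measures rank recovery with a Spearman correlation. It uses a small stand-in data frame, not the real dataset. The theta_hat column is a hypothetical set of model estimates simulated as the proxy plus noise, and the centering rule (quality_level minus the mean of 1 to 20) is an assumption that matches the value printed for level 1 under Examples.

```r
set.seed(1)

# Stand-in data with the documented column names (not the real dataset).
toy <- data.frame(
  ID = sprintf("S%04d", 1:200),
  quality_level = rep(1:20, each = 10)
)

# Assumed centering: level 1 maps to -9.5, level 20 to 9.5.
toy$theta_true <- toy$quality_level - mean(1:20)

# Hypothetical estimates from a ranking model: true proxy plus noise.
toy$theta_hat <- toy$theta_true + rnorm(nrow(toy), sd = 3)

# Rank recovery: Spearman correlation between estimates and the proxy.
rho <- cor(toy$theta_hat, toy$theta_true, method = "spearman")
rho
```

Because theta_true takes only 20 distinct values, many samples are tied on the proxy, so even a very good model would not be expected to reach a correlation of exactly 1.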

Examples

data(example_writing_samples1000)
head(example_writing_samples1000)
#> # A tibble: 6 × 7
#>   ID    text        quality_level theta_true prompt_id model created_at         
#>   <chr> <chr>               <int>      <dbl> <chr>     <chr> <dttm>             
#> 1 S0001 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 2 S0002 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 3 S0003 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 4 S0004 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 5 S0005 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 6 S0006 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42