
A synthetic dataset of 1,000 short writing samples generated by a large language model for use in pairwise comparison and ranking experiments.

Usage

data("example_writing_samples1000")

Format

A tibble with 1,000 rows and 7 variables:

ID

Character. Unique sample identifier (S0001–S1000).

text

Character. The writing sample (approximately 120–180 words).

quality_level

Integer. Intended quality level used during generation (1–20).

theta_true

Numeric. Centered latent-quality proxy derived from quality_level (level 1 maps to -9.5).

prompt_id

Character. Identifier for the generation prompt template.

model

Character. Language model used to generate the samples.

created_at

POSIXct. Timestamp (UTC) when the samples were generated.

Source

Generated via live OpenAI API calls using a controlled, bucketed quality prompt. See data-raw/generate_example_writing_samples1000.R for details.

Details

Samples were generated at 20 discrete quality levels (1 = lowest, 20 = highest), with multiple responses per level. The levels are intended to represent overlapping ranges of overall writing quality rather than a strict total ordering, which allows for realistic noise and near-ties in pairwise judgments.
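As a rough illustration of how overlapping levels lead to noisy outcomes and near-ties, the sketch below simulates a single pairwise judgment under a Bradley-Terry (logistic) model on the theta_true column. The model, its unit scale, and the helpers `p_win()` and `simulate_judgment()` are assumptions for illustration only; the dataset itself contains no pairwise judgments. The sketch also assumes theta_true equals quality_level minus 10.5, which is consistent with the value -9.5 shown for level 1 in the example output.

```r
# Sketch only: a Bradley-Terry (logistic) judgment model on theta_true.
# ASSUMPTIONS: theta_true = quality_level - 10.5, unit scale; p_win() and
# simulate_judgment() are hypothetical helpers, not part of the package.

p_win <- function(theta_i, theta_j) {
  # Probability that sample i is preferred over sample j.
  plogis(theta_i - theta_j)
}

simulate_judgment <- function(theta_i, theta_j) {
  # Draws one noisy judgment: 1 if sample i is preferred, 0 otherwise.
  rbinom(1, size = 1, prob = p_win(theta_i, theta_j))
}

p_win(-9.5, -9.5)   # same level: a coin flip (0.5)
p_win(0.5, -0.5)    # adjacent levels: about 0.73
p_win(9.5, -9.5)    # level 20 vs level 1: very close to 1

set.seed(1)
simulate_judgment(0.5, -0.5)   # 0 or 1
```

Under this assumed model, samples at the same or adjacent levels are frequently judged "out of order", which is the kind of near-tie behavior the overlapping levels are meant to produce.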

All samples respond to the same writing prompt to avoid topic effects. The dataset is primarily intended for benchmarking ranking models and for comparing random versus adaptive pair selection strategies under limited judgment budgets.
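A random-selection baseline under a fixed judgment budget can be sketched in base R as follows. `sample_random_pairs()` is a hypothetical helper, not a function provided by this package; it draws `budget` distinct unordered pairs uniformly at random from a vector of sample IDs.

```r
# Sketch only: uniform random pair selection under a judgment budget.
# sample_random_pairs() is a hypothetical helper, not part of the package.
sample_random_pairs <- function(ids, budget) {
  n <- length(ids)
  # Upper-triangle indices (row < col) enumerate every unordered pair once.
  idx <- which(upper.tri(matrix(FALSE, n, n)), arr.ind = TRUE)
  stopifnot(budget <= nrow(idx))
  # Draw `budget` distinct pairs without replacement.
  pick <- idx[sample.int(nrow(idx), budget), , drop = FALSE]
  data.frame(ID1 = ids[pick[, "row"]], ID2 = ids[pick[, "col"]],
             stringsAsFactors = FALSE)
}

# IDs in the same format as the ID column (S0001 to S1000).
ids <- sprintf("S%04d", 1:1000)
set.seed(123)
pairs <- sample_random_pairs(ids, budget = 500)
head(pairs)
```

Enumerating the upper triangle needs memory proportional to n^2 (about one million cells for 1,000 samples), which is fine at this size. An adaptive strategy would replace the uniform draw with a rule that favors more informative pairs, for example pairs whose current score estimates are close.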

The column theta_true provides a centered numeric proxy for the latent quality dimension. Because it is derived from quality_level, all samples at the same level share the same value. The proxy is intended for evaluation (e.g., rank recovery or correlation with estimated scores) and does not imply a perfectly ordered ground truth at the level of individual samples.
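A minimal sketch of a rank-recovery check follows. It assumes theta_true is quality_level centered at 10.5 (the midpoint of 1 to 20, matching the -9.5 shown for level 1) and assumes a balanced design of 50 samples per level; both are illustrative assumptions. The estimated scores `theta_hat` are hypothetical stand-ins for the output of a ranking model.

```r
# Sketch only. ASSUMPTIONS: theta_true = quality_level - 10.5 and
# 50 samples per level; theta_hat is a hypothetical model output.
quality_level <- rep(1:20, each = 50)
theta_true <- quality_level - mean(1:20)   # mean(1:20) is 10.5
range(theta_true)                          # -9.5 to 9.5

# Stand-in for estimated scores: the proxy plus random noise.
set.seed(42)
theta_hat <- theta_true + rnorm(length(theta_true), sd = 2)

# Rank recovery: Spearman correlation between estimates and the proxy.
# Ties within a level are expected, since theta_true is constant per level.
cor(theta_hat, theta_true, method = "spearman")
```

Because theta_true is constant within each level, a rank correlation like this measures how well a model recovers the ordering of levels rather than an exact ordering of individual samples.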

Examples

data(example_writing_samples1000)
head(example_writing_samples1000)
#> # A tibble: 6 × 7
#>   ID    text        quality_level theta_true prompt_id model created_at         
#>   <chr> <chr>               <int>      <dbl> <chr>     <chr> <dttm>             
#> 1 S0001 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 2 S0002 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 3 S0003 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 4 S0004 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 5 S0005 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42
#> 6 S0006 A challeng…             1       -9.5 bucketed… gpt-… 2026-02-02 22:52:42