Usage

run_openai_batch_pipeline(
  pairs,
  model,
  trait_name,
  trait_description,
  prompt_template = set_prompt_template(),
  include_thoughts = FALSE,
  include_raw = FALSE,
  endpoint = NULL,
  batch_input_path = tempfile("openai_batch_input_", fileext = ".jsonl"),
  batch_output_path = tempfile("openai_batch_output_", fileext = ".jsonl"),
  poll = TRUE,
  interval_seconds = 5,
  timeout_seconds = 600,
  max_attempts = Inf,
  metadata = NULL,
  api_key = NULL,
  ...
)

Arguments

pairs

Tibble of pairs with at least the columns ID1, text1, ID2, and text2. Typically produced by make_pairs(), sample_pairs(), and randomize_pair_order().

model

OpenAI model name (e.g. "gpt-4.1", "gpt-5.1").

trait_name

Trait name to pass to build_openai_batch_requests().

trait_description

Trait description to pass to build_openai_batch_requests().

prompt_template

Prompt template string, typically from set_prompt_template().

include_thoughts

Logical; if TRUE and using endpoint = "responses", requests reasoning-style summaries to populate the thoughts column in the parsed output. When endpoint is not supplied, include_thoughts = TRUE causes the "responses" endpoint to be selected automatically (see the last example under Examples).

include_raw

Logical; if TRUE, attaches the raw model response as a list-column raw_response in the parsed results.

endpoint

One of "chat.completions" or "responses". If NULL (or omitted), it is chosen automatically as described under Details.

batch_input_path

Path to write the batch input .jsonl file. Defaults to a temporary file.

batch_output_path

Path to write the batch output .jsonl file if poll = TRUE. Defaults to a temporary file.

poll

Logical; if TRUE, the function polls the batch until it reaches a terminal status using openai_poll_batch_until_complete(), then downloads and parses the output. If FALSE, it stops after creating the batch and returns without polling or parsing; see the poll = FALSE example under Examples.

interval_seconds

Polling interval in seconds (used when poll = TRUE).

timeout_seconds

Maximum total time in seconds for polling before giving up (used when poll = TRUE).

max_attempts

Maximum number of polling attempts (primarily useful for testing).

metadata

Optional named list of metadata key–value pairs to pass to openai_create_batch().

api_key

Optional OpenAI API key. Defaults to Sys.getenv("OPENAI_API_KEY").

...

Additional arguments passed through to build_openai_batch_requests(), e.g. temperature, top_p, logprobs, reasoning.

Value

A list with elements:

  • batch_input_path – path to the input .jsonl file.

  • batch_output_path – path to the output .jsonl file (or NULL if poll = FALSE).

  • file – File object returned by openai_upload_batch_file().

  • batch – Batch object; if poll = TRUE, this is the final batch after polling, otherwise the initial batch returned by openai_create_batch().

  • results – Parsed tibble from parse_openai_batch_output() if poll = TRUE, otherwise NULL.
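
After a polled run, these pieces can be inspected directly. A sketch, with out as returned by the polled call under Examples:

out$batch_input_path   # where the request .jsonl was written
out$batch_output_path  # where the raw output .jsonl was downloaded
out$batch$status       # final batch status, e.g. "completed"
out$results            # parsed tibble of comparison results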

Details

run_openai_batch_pipeline() is a convenience wrapper around the smaller functions build_openai_batch_requests(), openai_upload_batch_file(), openai_create_batch(), openai_poll_batch_until_complete(), and parse_openai_batch_output(), and is intended for end-to-end batch runs on a set of pairwise comparisons. For more control (or for testing), you can call these components directly.

When endpoint is not specified, it is chosen automatically:

  • if include_thoughts = TRUE, the "responses" endpoint is used and, for "gpt-5.1", a default reasoning effort of "low" is applied (unless overridden via reasoning).

  • otherwise, "chat.completions" is used.
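
Conceptually, the default follows a rule like the one below. This is a minimal sketch for illustration; choose_endpoint() is hypothetical and not part of the package:

# Hypothetical helper mirroring the documented default, not the package's code.
# (For "gpt-5.1" with include_thoughts, the pipeline additionally applies a
# default reasoning effort of "low"; that step is omitted here.)
choose_endpoint <- function(endpoint = NULL, include_thoughts = FALSE) {
  # An explicit endpoint always wins; otherwise include_thoughts decides.
  if (!is.null(endpoint)) return(endpoint)
  if (include_thoughts) "responses" else "chat.completions"
}

choose_endpoint()                         # "chat.completions"
choose_endpoint(include_thoughts = TRUE)  # "responses"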

Examples

# The OpenAI batch pipeline requires:
# - Internet access
# - A valid OpenAI API key in OPENAI_API_KEY (or supplied via `api_key`)
# - Billable API usage
#
if (FALSE) { # \dontrun{
data("example_writing_samples", package = "pairwiseLLM")

pairs <- example_writing_samples |>
  make_pairs() |>
  sample_pairs(n_pairs = 2, seed = 123) |>
  randomize_pair_order(seed = 456)

td <- trait_description("overall_quality")
tmpl <- set_prompt_template()

# Run a small batch using chat.completions
out <- run_openai_batch_pipeline(
  pairs             = pairs,
  model             = "gpt-4.1",
  trait_name        = td$name,
  trait_description = td$description,
  prompt_template   = tmpl,
  endpoint          = "chat.completions",
  poll              = TRUE,
  interval_seconds  = 5,
  timeout_seconds   = 600
)

print(out$batch$status)
print(utils::head(out$results))
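
# To submit now and collect results later, set poll = FALSE. The call
# returns right after creating the batch, so `results` is NULL and no
# output file is downloaded. (Resuming later goes through
# openai_poll_batch_until_complete() and parse_openai_batch_output();
# see their help pages for the exact signatures.)
submitted <- run_openai_batch_pipeline(
  pairs             = pairs,
  model             = "gpt-4.1",
  trait_name        = td$name,
  trait_description = td$description,
  prompt_template   = tmpl,
  endpoint          = "chat.completions",
  poll              = FALSE
)
print(submitted$batch$status)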
} # }
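
A sketch of a run that requests reasoning summaries (same data set-up and requirements as above, so also not run by default):

if (FALSE) { # \dontrun{
# include_thoughts = TRUE selects the "responses" endpoint automatically
# and, for "gpt-5.1", applies a default reasoning effort of "low"
# (override via `reasoning` if needed). The parsed results gain a
# `thoughts` column holding the reasoning-style summaries.
out_thoughts <- run_openai_batch_pipeline(
  pairs             = pairs,
  model             = "gpt-5.1",
  trait_name        = td$name,
  trait_description = td$description,
  prompt_template   = tmpl,
  include_thoughts  = TRUE
)
print(utils::head(out_thoughts$results$thoughts))
} # }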