Run a full OpenAI batch pipeline for pairwise comparisons
Source: R/openai_batch_api.R
run_openai_batch_pipeline.Rd

This helper wires together the existing pieces: build_openai_batch_requests(), openai_upload_batch_file(), openai_create_batch(), openai_poll_batch_until_complete(), and parse_openai_batch_output().
Usage
run_openai_batch_pipeline(
pairs,
model,
trait_name,
trait_description,
prompt_template = set_prompt_template(),
include_thoughts = FALSE,
include_raw = FALSE,
endpoint = NULL,
batch_input_path = tempfile("openai_batch_input_", fileext = ".jsonl"),
batch_output_path = tempfile("openai_batch_output_", fileext = ".jsonl"),
poll = TRUE,
interval_seconds = 5,
timeout_seconds = 600,
max_attempts = Inf,
metadata = NULL,
api_key = NULL,
...
)

Arguments
- pairs
Tibble of pairs with at least ID1, text1, ID2, text2. Typically produced by make_pairs(), sample_pairs(), and randomize_pair_order().
- model
OpenAI model name (e.g. "gpt-4.1", "gpt-5.1").
- trait_name
Trait name to pass to build_openai_batch_requests().
- trait_description
Trait description to pass to build_openai_batch_requests().
- prompt_template
Prompt template string, typically from set_prompt_template().
- include_thoughts
Logical; if TRUE and using endpoint = "responses", requests reasoning-style summaries to populate the thoughts column in the parsed output. When endpoint is not supplied, include_thoughts = TRUE causes the "responses" endpoint to be selected automatically.
- include_raw
Logical; if TRUE, attaches the raw model response as a list-column raw_response in the parsed results.
- endpoint
One of "chat.completions" or "responses". If NULL (or omitted), it is chosen automatically as described in Details.
- batch_input_path
Path to write the batch input .jsonl file. Defaults to a temporary file.
- batch_output_path
Path to write the batch output .jsonl file if poll = TRUE. Defaults to a temporary file.
- poll
Logical; if TRUE, the function polls the batch until it reaches a terminal status using openai_poll_batch_until_complete(), then downloads and parses the output. If FALSE, it stops after creating the batch and returns without polling or parsing.
- interval_seconds
Polling interval in seconds (used when poll = TRUE).
- timeout_seconds
Maximum total time in seconds for polling before giving up (used when poll = TRUE).
- max_attempts
Maximum number of polling attempts (primarily useful for testing).
- metadata
Optional named list of metadata key–value pairs to pass to openai_create_batch().
- api_key
Optional OpenAI API key. Defaults to Sys.getenv("OPENAI_API_KEY").
- ...
Additional arguments passed through to build_openai_batch_requests(), e.g. temperature, top_p, logprobs, reasoning.
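As a sketch of the poll = FALSE workflow described above (not run: it requires a valid OPENAI_API_KEY and billable usage; the exact arguments of openai_poll_batch_until_complete() are assumptions based on this page):

```r
# Create the batch but return immediately: no polling, no parsing.
out <- run_openai_batch_pipeline(
  pairs = pairs,
  model = "gpt-4.1",
  trait_name = td$name,
  trait_description = td$description,
  poll = FALSE
)

# Here out$results and out$batch_output_path are NULL, and out$batch
# holds the initial batch object from openai_create_batch().
# Poll later yourself (argument names assumed for illustration):
final_batch <- openai_poll_batch_until_complete(
  out$batch,
  interval_seconds = 5,
  timeout_seconds = 600
)
```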
Value
A list with elements:
- batch_input_path – path to the input .jsonl file.
- batch_output_path – path to the output .jsonl file (or NULL if poll = FALSE).
- file – File object returned by openai_upload_batch_file().
- batch – Batch object; if poll = TRUE, this is the final batch after polling, otherwise the initial batch returned by openai_create_batch().
- results – Parsed tibble from parse_openai_batch_output() if poll = TRUE, otherwise NULL.
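For example, with poll = TRUE the returned list can be inspected as follows (element names taken from the description above; not run without API access):

```r
# `out` is the list returned by run_openai_batch_pipeline(..., poll = TRUE)
out$batch$status        # final batch status after polling, e.g. "completed"
out$batch_output_path   # path to the downloaded .jsonl output file
head(out$results)       # parsed tibble from parse_openai_batch_output()
```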
Details
It is a convenience wrapper around these smaller functions and is intended for end-to-end batch runs on a set of pairwise comparisons. For more control (or testing), you can call the components directly.
When endpoint is not specified, it is chosen automatically:
- if include_thoughts = TRUE, the "responses" endpoint is used and, for "gpt-5.1", a default reasoning effort of "low" is applied (unless overridden via reasoning).
- otherwise, "chat.completions" is used.
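For instance, the selection rule above means these two otherwise-identical calls target different endpoints (sketch, not run; requires a valid API key and billable usage):

```r
# endpoint omitted + include_thoughts = TRUE -> "responses" endpoint
# (for "gpt-5.1", reasoning effort "low" unless overridden via `reasoning`)
out_responses <- run_openai_batch_pipeline(
  pairs = pairs, model = "gpt-5.1",
  trait_name = td$name, trait_description = td$description,
  include_thoughts = TRUE
)

# endpoint omitted + include_thoughts = FALSE (default) -> "chat.completions"
out_chat <- run_openai_batch_pipeline(
  pairs = pairs, model = "gpt-4.1",
  trait_name = td$name, trait_description = td$description
)
```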
Examples
# The OpenAI batch pipeline requires:
# - Internet access
# - A valid OpenAI API key in OPENAI_API_KEY (or supplied via `api_key`)
# - Billable API usage
#
if (FALSE) { # \dontrun{
data("example_writing_samples", package = "pairwiseLLM")
pairs <- example_writing_samples |>
make_pairs() |>
sample_pairs(n_pairs = 2, seed = 123) |>
randomize_pair_order(seed = 456)
td <- trait_description("overall_quality")
tmpl <- set_prompt_template()
# Run a small batch using chat.completions
out <- run_openai_batch_pipeline(
pairs = pairs,
model = "gpt-4.1",
trait_name = td$name,
trait_description = td$description,
prompt_template = tmpl,
endpoint = "chat.completions",
poll = TRUE,
interval_seconds = 5,
timeout_seconds = 600
)
print(out$batch$status)
print(utils::head(out$results))
} # }