NEWS
LLMR 0.8.4
Bug fixes
llm_batch_submit(): a named character vector like
c(system = "...", user = "...") is now correctly treated as a single
multi-role request rather than being split into separate batch jobs.
Unnamed character vectors still expand one element per request as before.
llm_batch_fetch() and call_llm_stream(): total token count is now NA
when both sent and received counts are unknown, consistent with the
call_llm() path. Previously sum(..., na.rm = TRUE) returned a false 0.
chat_session(): now rejects embedding configs inferred from model names
(e.g. "gemini-embedding-001") in addition to those with embedding = TRUE
set explicitly. Uses .is_embedding_config() consistently with all other
generative-only entry points.
inst/examples/demo.qmd dependencies (stringi, kableExtra) added to
Suggests so users rendering the installed example find them declared.
- Stray closing code fence removed from
vignettes/quickstart.Rmd.
LLMR 0.8.3
New features
- Provider batch APIs.
llm_batch_submit(), llm_batch_status(),
llm_batch_fetch(), and llm_batch_cancel() drive the asynchronous batch
endpoints of OpenAI, Groq, Anthropic, and Gemini, which price requests at
roughly half the live rate. Jobs are plain R objects without secrets (keys
stay environment references), so a job can be saved with state_path=,
the session closed, and results fetched later or from another machine.
- Audit log for reproducible research.
llm_log_enable(path) appends one
JSON record per API call (timestamp, provider, model, the served
model_version, full request parameters and messages, reply text, token
usage including cached tokens, request id, status, timing) to a JSONL file;
llm_log_disable() and llm_log_status() manage it. Failed calls are
logged too. include_messages = FALSE keeps a metadata-only trail for
confidential prompts.
- A draft methods paragraph.
llm_methods_text() turns a result frame
into the transparency paragraph journals increasingly require: models,
providers, call counts, recorded inference settings, token totals, and
failure counts, stated only as far as the data supports.
- Replication and reliability.
llm_replicate() runs every row .times
times through the parallel engine; llm_agreement() reports per-unit
majority labels and overall reliability (mean pairwise agreement and
Krippendorff's alpha for nominal data, missing-safe).
- Native tool calling with an execution loop.
llm_tool() wraps an R
function with a JSON-Schema argument spec; call_llm_tools() sends the
definitions, executes the calls the model makes, feeds results back, and
returns the final response with a tool_history attribute.
tool_calls() extracts requested calls from any response. Supported on
OpenAI-compatible providers and Anthropic.
- Streaming.
call_llm_stream() delivers the reply chunk by chunk
(callback) for OpenAI-compatible providers, Anthropic, and Gemini, and
returns a normal llmr_response at the end, with usage when the provider
reports it in the stream.
- Token log-probabilities. Pass
logprobs = TRUE (and top_logprobs = k)
in llm_config(); llm_logprobs() returns them tidily (OpenAI-compatible
providers and Gemini) for confidence scoring and calibration.
- Reproducibility knobs. A canonical
seed parameter is forwarded where
supported; every response now records model_version (the identifier the
server reports having served) and, when returned separately, the model's
reasoning text in a thinking field.
- Prompt caching.
cache = TRUE marks the system prompt and tools as
cacheable for Anthropic; cached prompt tokens are now extracted from all
providers that report them (tokens(x)$cached, cached_tokens /
*_cached diagnostic columns) and counted by llm_usage().
- Cost estimates on your own prices.
llm_usage(price_table = ...)
accepts a user-supplied table (per-million-token prices, optional cached
rate) and adds a cost_estimate; LLMR still ships no price list, on
purpose.
- NA policy for templates.
llm_fn()/llm_mutate() gain
.na_action = c("send", "skip", "error"); llm_preview() flags rows whose
templates reference NA or render empty prompts.
- Quality-of-life.
chat_session() gains quiet= and multimodal sends
(named-vector shortcut); a one-line summary message after runs with
failures or truncations points to llm_failures(); print(llm_config) is
masked and informative; options(llmr.quiet = TRUE) silences advisory
notes; every request carries a total timeout (timeout= per config or
options(llmr.timeout=), default 600 s).
Provider-path overhaul
- The nine OpenAI-compatible providers (groq, together, deepseek, xiaomi,
alibaba, zhipu, moonshot, xai, ollama) now share one request builder with
the full feature set: multimodal file parts (previously serialized raw,
with the local path leaked to the provider), the complete canonical
parameter set, structured output, tools, verbatim extras passthrough, and
the modifiability hooks. Parameters a provider rejects are dropped with a
console note instead of silently.
- The
req_builder, response_modifier, and request_modifier hooks now
apply on every provider path (previously only OpenAI) and are documented
in llm_config().
- Gemini:
thinking_budget / include_thoughts are finally sent
(generationConfig.thinkingConfig); presence/frequency penalties, seed,
and logprobs are translated rather than dropped; enable_structured_output()
sends the schema by default via responseJsonSchema (set
gemini_enable_response_schema = FALSE for old models); embeddings use
batchEmbedContents (one HTTP call per up to 100 texts instead of one per
text) and go through the standard error handling; Gemini thought parts are
kept out of the answer text and surfaced as thinking.
- Anthropic:
top_k is supported again (it was wrongly dropped as
"unsupported"); no_change is honored; PDFs are sent as document blocks
(previously mislabeled as images); an invalid thinking_budget >= max_tokens combination warns before the call; unknown typed content
blocks pass through, enabling the tool loop; the false claim that LLMR
"sets the beta header automatically" is gone from the docs
(anthropic_beta = ... sends one).
- OpenAI:
max_completion_tokens is honored directly (reasoning models no
longer pay a guaranteed failed round trip); the Responses-API autodetect
recognizes gpt-5-pro and the deep-research models and no longer matches
nonexistent ones; the Responses path drops the parameters that API rejects
(penalties, logprobs, seed) with a note, maps reasoning_effort to its
nested shape, and no longer duplicates system text when all messages are
system.
- Embedding routing now follows the documented inference (
embedding = NULL
- "embedding" in the model name) for every provider, and embedding
endpoints no longer receive chat-only parameters from a reused config.
get_batched_embeddings() gains retry controls and its verbose default
is documented correctly.
Reliability fixes
- The retry helper no longer sleeps after the final failed attempt
(previously up to ~104 minutes wasted on the default schedule); waits
honor
Retry-After when a 429 provides one, and add jitter so parallel
workers do not retry in lockstep.
- Retry classification is exact for typed errors: rate limits and server
errors (now including 403->auth, 408->retryable) retry; parameter,
authentication, and quota errors fail fast. The over-broad
"exceeded"
message pattern (which retried context_length_exceeded for the full
schedule) is gone.
call_llm_robust() defaults are humane (wait_seconds = 2,
backoff_factor = 3) and consistent with call_llm_par();
start_jitter in the parallel engine defaults to 0 (it silently added up
to 5 s per row).
- Failed rows in
call_llm_par() results now carry the provider's raw error
body in raw_response_json.
Bug fixes
expand_llm_config(): sweeping provider now updates the S3 class, so
the swept config dispatches to the right API (previously every swept
provider was called through the original provider's path).
call_llm_par_structured() / call_llm_par_tags(): prompts containing
literal braces (typical for structured-output instructions) no longer
abort the run; strings glue cannot parse pass through verbatim.
- Key handling: an empty-string
api_key (what Sys.getenv() returns
for an unset variable) falls back to the provider's default variables with
a warning instead of sending an empty Bearer header; NA keys are
rejected; a vector api_key works as ordered fallbacks; legacy configs
holding a literal key string resolve correctly and error messages never
echo a key.
- Chat sessions: the NA-token fix now also covers
send_structured() and
send_tags() (one usage-less response could previously poison the running
totals); multi-part sends are no longer truncated to their first element.
- Column safety:
llm_mutate() replaces an existing output column with a
notice (mutate semantics) instead of letting bind_cols() mangle both
names; hoisted structured/tag fields never silently overwrite existing
columns (suffix + warning); the parallel engine's collision warning fires
regardless of verbose; summary.llmr_experiment() follows
collision-renamed columns.
- Row batching: user tags shaped like
row_2 are rejected in batched tag
mode (they would scramble the demultiplexer); assistant turns error
instead of being silently dropped; duration is attributed once per batch
so llm_usage() stops overcounting wall time; fully-failed batch calls
attribute their token spend to the first failed row; the protocol
instructions state the actual item count.
- Structured output: JSON recovery is string-aware (braces inside string
values no longer corrupt extraction);
disable_structured_output()
removes a custom-named schema tool; llm_mutate_structured() validates
locally like llm_fn_structured() (new .validate_local).
enable_structured_output() now knows which providers reject a server-side
json_schema (DeepSeek, Alibaba, Zhipu, Moonshot, Xiaomi) and requests
JSON-object mode with local validation there, instead of a guaranteed
HTTP 400. In strict mode the schema sent to the provider is hardened the
way the OpenAI protocol formally requires (additionalProperties: false
and all properties required, filled in only where unspecified), so plain
schemas work on OpenAI and Groq without boilerplate.
llm_par_resume() works on call_llm_sweep() / call_llm_broadcast()
/ call_llm_compare() results (they now keep the config list-column) and
refreshes structured_ok / structured_data for re-run rows.
- Responses: OpenAI-style refusals surface their text and map to
finish_reason = "filter" (as do Gemini safety verdicts such as
RECITATION); llm_judge() gains .output= and refuses to clobber a
.target column.
- Assorted:
call_llm() troubleshooting output prints real newlines; dead
code removed (parse_embeddings() no-op branch, unreachable per-provider
default models); embedding example code no longer uses the discouraged
Sys.getenv() key pattern.
Documentation
llm_config() now documents the full canonical parameter set (including
seed, logprobs, thinking_budget, timeout, cache) and the three
request hooks. The Anthropic thinking example is valid
(max_tokens > thinking_budget). build_factorial_experiments() documents
that system prompts are crossed, not recycled. Vignettes updated
accordingly, plus a new article on reproducibility and cost.
- Live tests and examples run on inexpensive open-weight models (Groq
openai/gpt-oss-20b, DeepSeek, Moonshot, Qwen, Gemini Flash-Lite).
LLMR 0.8.0
New features
- Preview a call before spending anything.
llm_preview() renders exactly
what llm_fn() / llm_mutate() would send (using the same internal renderer,
so it can never drift from the real path) without making any API call or
reading/encoding files. It returns a row-level tibble with the rendered
messages, roles, character counts, file presence/existence, the batch plan
(batch_id / batch_size / batch_row), and an issues list-column that
flags problems up front: missing files, "file" content combined with
.batch_size > 1, an embedding config with row batching, .return = "object"
with batching, or a schema supplied without .structured. llm_render_messages()
exposes just the rendered message objects.
- Summarize a finished run.
llm_usage() reports outcome counts and token
totals (sent / received / total / reasoning) plus truncation and filter
counts, reading the diagnostic columns that call_llm_par() and llm_mutate()
already produce. It works on both result shapes and sums tokens with
na.rm = TRUE, which is correct under row batching. It reports tokens, not
money: no dollar figures and no built-in price table (which would go stale).
- Find and re-run failures.
llm_failures() lists exactly which rows failed
or were truncated/filtered, with status_code, error_code, bad_param,
and error_message. For a call_llm_par() result, pass the original frame to
the existing llm_par_resume() to re-run only those rows.
Internal
- The per-row message rendering used by
llm_fn(), llm_mutate(), and
llm_preview() is now a single shared internal helper, locked by golden tests
so its output stays byte-identical to previous releases.
Bug fixes
.fields = FALSE now correctly skips field extraction in structured/JSON
mode (keeping only the structured_data list-column), matching tag mode.
Previously the logical FALSE was treated as a one-element field name.
- Missing token usage is reported as
NA, not 0. When a provider returns
no usage metadata, chat sessions and responses no longer record the call as
having used zero tokens; running chat totals add NA as 0 so one unknown
response cannot poison the cumulative count.
llm_usage() / llm_failures() / llm_par_resume() handle the case where
call_llm_par() collision-renamed its output columns (because the input frame
already had a column named success, etc.): the summaries follow the renamed
columns, and llm_par_resume() raises a clear, actionable error.
- Bare environment-variable API keys.
api_key = "OPENAI_API_KEY" is now
always treated as an environment-variable reference, even when that variable is
not yet set (it then fails with a clear "missing env var" message at call time
instead of silently sending the literal name as the key), matching the
documented behavior.
llm_api_key_env(required = FALSE) is now honored: a missing variable
yields an empty key instead of an authentication error.
llm_parse_structured_col() now returns a tibble on every path, including
when the structured column is absent.
llm_usage() gains an n_unknown_tokens count so an all-NA token column
(a provider that reports no usage) is no longer indistinguishable from a true
zero. The token sums still use na.rm = TRUE, which is correct for batching.
Documentation
llm_api_key_env() is now exported and documented. The help also notes that
the simplest approach is to set the standard <PROVIDER>_API_KEY variable and
pass no key at all.
- New "LLMR in 5 minutes" quickstart vignette: install, set a key, a first
call_llm(), a generative llm_fn() over a vector, a data-frame llm_mutate(),
and tagged + batched extraction, all on the open-weight gpt-oss-20b so the
examples are runnable for everyone. The structured-output articles now lead
with a concrete example before the provider-by-provider details.
- The troubleshooting help no longer claims the API key is printed (it is
masked). The embeddings vignette can now be enabled with
LLMR_RUN_VIGNETTES=true (its run flag was previously hard-coded off), and its
stale prebuilt HTML was removed.
- Fixed a vignette that referenced a non-existent function and assorted stale
version labels; removed non-ASCII look-alike punctuation from R sources.
LLMR 0.7.2
Bug fixes
- Retry/error classification. API error messages containing curly braces
(common when providers echo JSON fragments) no longer break error
construction: the typed condition class,
status_code, and provider message
are preserved, so retryable rate-limit/server errors are retried correctly
instead of being misclassified. The retry helper also re-raises the original
typed error after exhausting attempts, and its wait-time message no longer
errors on fractional backoff values.
llm_par_resume() now re-runs only the failed/NA rows instead of every
row (it previously collapsed the per-row success vector with isTRUE()).
- Parallel tag/structured helpers. Unnamed (bare-prompt) messages passed to
call_llm_par_tags() / call_llm_par_structured() no longer have their
template text used as the message role.
- JSON recovery. A top-level JSON array embedded in prose is now extracted in
full (a regex character-class bug previously truncated it to the first object).
- Embeddings.
get_batched_embeddings() no longer locks in a wrong vector
dimension when the first batch returns empty, preserving the one-row-per-input
contract.
- Gemini token counts. Usage is preserved when
candidatesTokenCount is
absent (e.g. reasoning models with no visible output).
- Printing.
print() on an llmr_response with a non-standard finish reason
no longer drops the status line.
LLMR 0.7.1
New features
- Row batching for generative calls.
llm_fn() and llm_mutate() (and the
_tags / _structured variants) gain .batch_size, .batch_payload, and
.batch_recovery. With .batch_size > 1, several rows are packed into one
request wrapped in numbered <row_1>...</row_1> tags and de-multiplexed back
into rows, with fault-tolerant recovery (split-and-retry down to single rows)
for dropped, reordered, or truncated rows. Composes with .tags (one level of
required nesting) and with structured/JSON output ({"results":[{"row":i, ...}]}). The default .batch_size = 1 reproduces the previous one-call-per-row
behaviour exactly. New exported helper llm_parse_batch_tags().
LLMR 0.7.0
New features
- Soft structured output via XML-like tags.
llm_mutate() gains .tags,
backed by llm_mutate_tags(), llm_fn_tags(), llm_parse_tags(),
llm_parse_tags_col(), and call_llm_par_tags().
- Four new providers: Xiaomi MiMo, Alibaba (Qwen), Zhipu (GLM), and Moonshot
(Kimi). All use OpenAI-compatible structured output.
- Gemini Vertex AI supported via
llm_config("gemini", ..., vertex = TRUE).
- Multi-variable API key fallback. Providers can declare multiple
environment variable names (e.g.,
XIAOMI_KEY or XIAOMI_API_KEY); the first
one found is used.
Bug fixes
- Fixed API key resolution for providers with multiple fallback env vars.
- Removed dead
requireNamespace("LLMR") guard inside llm_mutate().
LLMR 0.6.3 (2025-10-11)
- Added Ollama provider (local generative and embedding models).
- Stable column names (
v1...vN) in get_batched_embeddings().
LLMR 0.6.2
llm_mutate() shorthand: llm_mutate(answer = "{question}", .config = cfg).
.structured = TRUE flag in llm_mutate() for inline JSON parsing.
setup_llm_parallel() accepts a positional numeric workers argument.
LLMR 0.6.1
- Fixed a bug that affected Anthropic calls.
LLMR 0.6.0 (2025-08-26)
call_llm() now returns an llmr_response object. Use as.character(x) for
plain text. Legacy json= arguments are removed.
- Secure API key handling: literal keys are moved to temporary env vars.
- Structured JSON output and schema validation.
- Multi-column injection in
llm_mutate().