NEWS

LLMR 0.8.4

Bug fixes

llm_batch_submit(): a named character vector like c(system = "...", user = "...") is now correctly treated as a single multi-role request rather than being split into separate batch requests. Unnamed character vectors still expand to one request per element, as before.
llm_batch_fetch() and call_llm_stream(): total token count is now NA when both sent and received counts are unknown, consistent with the call_llm() path. Previously sum(..., na.rm = TRUE) returned a false 0.
chat_session(): now rejects embedding configs inferred from model names (e.g. "gemini-embedding-001") in addition to those with embedding = TRUE set explicitly. Uses .is_embedding_config() consistently with all other generative-only entry points.
inst/examples/demo.qmd dependencies (stringi, kableExtra) added to Suggests so users rendering the installed example find them declared.
Stray closing code fence removed from vignettes/quickstart.Rmd.
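The token-total and request-shape rules above can be illustrated with a minimal base-R sketch. The helpers total_tokens_sketch() and as_batch_requests_sketch() are hypothetical and are not LLMR's internal code. The sketch treats any vector that carries names as one request, which may not match LLMR for partially named vectors.

```r
# Hypothetical helper (not LLMR's internal code): the total is NA only when
# both counts are unknown; otherwise the known counts are summed.
total_tokens_sketch <- function(sent, received) {
  if (is.na(sent) && is.na(received)) return(NA_real_)
  sum(sent, received, na.rm = TRUE)
}

# Hypothetical helper: a named vector such as c(system = ..., user = ...) stays
# one multi-role request; an unnamed vector becomes one request per element.
as_batch_requests_sketch <- function(x) {
  if (!is.null(names(x))) list(x) else as.list(x)
}

total_tokens_sketch(NA, NA)   # NA; sum(NA, NA, na.rm = TRUE) would give 0
total_tokens_sketch(120, NA)  # 120
length(as_batch_requests_sketch(c(system = "Be terse.", user = "Hi")))  # 1
length(as_batch_requests_sketch(c("Hi", "Hello")))                     # 2
```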

LLMR 0.8.3

New features

Provider batch APIs. llm_batch_submit(), llm_batch_status(), llm_batch_fetch(), and llm_batch_cancel() drive the asynchronous batch endpoints of OpenAI, Groq, Anthropic, and Gemini, which price requests at roughly half the live rate. Jobs are plain R objects without secrets (keys stay environment references), so a job can be saved with state_path=, the session closed, and results fetched later or from another machine.
Audit log for reproducible research. llm_log_enable(path) appends one JSON record per API call (timestamp, provider, model, the served model_version, full request parameters and messages, reply text, token usage including cached tokens, request id, status, timing) to a JSONL file; llm_log_disable() and llm_log_status() manage it. Failed calls are logged too. include_messages = FALSE keeps a metadata-only trail for confidential prompts.
A draft methods paragraph. llm_methods_text() turns a result frame into the transparency paragraph journals increasingly require: models, providers, call counts, recorded inference settings, token totals, and failure counts, stated only as far as the data supports.
Replication and reliability. llm_replicate() runs every row .times times through the parallel engine; llm_agreement() reports per-unit majority labels and overall reliability (mean pairwise agreement and Krippendorff's alpha for nominal data, missing-safe).
Native tool calling with an execution loop. llm_tool() wraps an R function with a JSON-Schema argument spec; call_llm_tools() sends the definitions, executes the calls the model makes, feeds results back, and returns the final response with a tool_history attribute. tool_calls() extracts requested calls from any response. Supported on OpenAI-compatible providers and Anthropic.
Streaming. call_llm_stream() delivers the reply chunk by chunk to a callback for OpenAI-compatible providers, Anthropic, and Gemini, and returns a normal llmr_response at the end, including usage when the provider reports it in the stream.
Token log-probabilities. Pass logprobs = TRUE (and top_logprobs = k) in llm_config(); llm_logprobs() returns them tidily (OpenAI-compatible providers and Gemini) for confidence scoring and calibration.
Reproducibility knobs. A canonical seed parameter is forwarded where supported; every response now records model_version (the identifier the server reports having served) and, when returned separately, the model's reasoning text in a thinking field.
Prompt caching. cache = TRUE marks the system prompt and tools as cacheable for Anthropic; cached prompt tokens are now extracted from all providers that report them (tokens(x)$cached, cached_tokens / *_cached diagnostic columns) and counted by llm_usage().
Cost estimates on your own prices. llm_usage(price_table = ...) accepts a user-supplied table (per-million-token prices, optional cached rate) and adds a cost_estimate; LLMR still ships no price list, on purpose.
NA policy for templates. llm_fn()/llm_mutate() gain .na_action = c("send", "skip", "error"); llm_preview() flags rows whose templates reference NA or render empty prompts.
Quality-of-life. chat_session() gains quiet= and multimodal sends (named-vector shortcut); after runs with failures or truncations, a one-line summary message points to llm_failures(); print() on an llm_config masks the API key and is more informative; options(llmr.quiet = TRUE) silences advisory notes; every request carries a total timeout (timeout= per config or options(llmr.timeout=), default 600 s).
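The two reliability figures that llm_agreement() reports can be sketched in base R for a units-by-replicates label matrix, with NA marking missing labels. The alpha computation follows the textbook coincidence-matrix definition of Krippendorff's alpha for nominal data, and units with fewer than two labels are dropped. The function names are hypothetical, and LLMR's own implementation and input shape may differ.

```r
# Mean pairwise agreement: share of agreeing label pairs within each unit,
# averaged over units that have at least two non-missing labels.
pairwise_agreement_sketch <- function(m) {
  m <- as.matrix(m)
  per_unit <- apply(m, 1, function(x) {
    x <- x[!is.na(x)]
    if (length(x) < 2) return(NA_real_)
    p <- utils::combn(length(x), 2)   # all unordered pairs of positions
    mean(x[p[1, ]] == x[p[2, ]])
  })
  mean(per_unit, na.rm = TRUE)
}

# Krippendorff's alpha (nominal), built from the coincidence matrix o[c, k].
kripp_alpha_nominal_sketch <- function(m) {
  m <- as.matrix(m)
  vals <- sort(unique(as.character(m[!is.na(m)])))
  o <- matrix(0, length(vals), length(vals), dimnames = list(vals, vals))
  for (i in seq_len(nrow(m))) {
    x <- as.character(m[i, !is.na(m[i, ])])
    mu <- length(x)
    if (mu < 2) next                       # unpairable units carry no information
    for (a in seq_len(mu)) {
      for (b in seq_len(mu)) {
        if (a != b) o[x[a], x[b]] <- o[x[a], x[b]] + 1 / (mu - 1)
      }
    }
  }
  n_c <- rowSums(o)
  n <- sum(n_c)
  observed <- sum(o) - sum(diag(o))          # off-diagonal (disagreeing) mass
  expected <- (n^2 - sum(n_c^2)) / (n - 1)   # off-diagonal mass expected by chance
  1 - observed / expected                    # NaN if every label is identical
}

labels <- rbind(c("pos", "pos", "pos"),
                c("neg", "pos", "neg"),
                c("neg", "neg", NA))
pairwise_agreement_sketch(labels)   # 0.7777778
kripp_alpha_nominal_sketch(labels)  # 0.5625
```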

Provider-path overhaul

The nine OpenAI-compatible providers (groq, together, deepseek, xiaomi, alibaba, zhipu, moonshot, xai, ollama) now share one request builder with the full feature set: multimodal file parts (previously serialized raw, with the local path leaked to the provider), the complete canonical parameter set, structured output, tools, verbatim extras passthrough, and the modifiability hooks. Parameters a provider rejects are dropped with a console note instead of silently.
The req_builder, response_modifier, and request_modifier hooks now apply on every provider path (previously only OpenAI) and are documented in llm_config().
Gemini: thinking_budget / include_thoughts are finally sent (generationConfig.thinkingConfig); presence/frequency penalties, seed, and logprobs are translated rather than dropped; enable_structured_output() sends the schema by default via responseJsonSchema (set gemini_enable_response_schema = FALSE for old models); embeddings use batchEmbedContents (one HTTP call per up to 100 texts instead of one per text) and go through the standard error handling; Gemini thought parts are kept out of the answer text and surfaced as thinking.
Anthropic: top_k is supported again (it was wrongly dropped as "unsupported"); no_change is honored; PDFs are sent as document blocks (previously mislabeled as images); an invalid thinking_budget >= max_tokens combination warns before the call; unknown typed content blocks pass through, enabling the tool loop; the false claim that LLMR "sets the beta header automatically" is gone from the docs (anthropic_beta = ... sends one).
OpenAI: max_completion_tokens is honored directly (reasoning models no longer pay a guaranteed failed round trip); the Responses-API autodetect recognizes gpt-5-pro and the deep-research models and no longer matches nonexistent ones; the Responses path drops the parameters that API rejects (penalties, logprobs, seed) with a note, maps reasoning_effort to its nested shape, and no longer duplicates system text when all messages are system.
Embedding routing now follows the documented inference for every provider (with embedding = NULL, a model whose name contains "embedding" is routed to the embedding endpoint), and embedding endpoints no longer receive chat-only parameters from a reused config. get_batched_embeddings() gains retry controls, and its verbose default is now documented correctly.
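Sending up to 100 texts per batchEmbedContents request is, at its core, order-preserving chunking of the input. The following is a minimal base-R sketch with a hypothetical helper name. The limit of 100 comes from the Gemini entry above, and the HTTP call itself is not shown.

```r
# Hypothetical helper: split inputs into consecutive chunks of at most `size`
# elements, preserving order, so each chunk can become one HTTP call.
chunk_inputs_sketch <- function(texts, size = 100) {
  if (length(texts) == 0) return(list())
  split(texts, ceiling(seq_along(texts) / size))
}

chunks <- chunk_inputs_sketch(sprintf("text %d", 1:250))
lengths(chunks)   # 100 100 50 (named by chunk number)
```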

Reliability fixes

The retry helper no longer sleeps after the final failed attempt (previously up to ~104 minutes wasted on the default schedule); waits honor Retry-After when a 429 provides one, and add jitter so parallel workers do not retry in lockstep.
Retry classification is exact for typed errors: rate-limit and server errors retry (408 is now treated as retryable), while parameter, authentication (403 now included), and quota errors fail fast. The over-broad "exceeded" message pattern, which retried context_length_exceeded for the full schedule, is gone.
call_llm_robust() defaults are humane (wait_seconds = 2, backoff_factor = 3) and consistent with call_llm_par(); start_jitter in the parallel engine defaults to 0 (it silently added up to 5 s per row).
Failed rows in call_llm_par() results now carry the provider's raw error body in raw_response_json.
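The retry behavior described above can be sketched in base R. The sketch covers no sleep after the last attempt, Retry-After taking precedence, jitter, and fail-fast on non-retryable errors. All helper names are hypothetical. The 2 s initial wait and factor of 3 mirror the call_llm_robust() defaults listed above. The 25% multiplicative jitter and the use of the HTTP status alone are simplifying assumptions. LLMR classifies typed errors, and some providers report quota exhaustion as a 429, which this sketch would wrongly retry.

```r
# Hypothetical: wait before retry number k (k = 1 is the first retry).
retry_wait_sketch <- function(k, wait_seconds = 2, backoff_factor = 3,
                              retry_after = NA_real_, jitter = 0.25) {
  wait <- if (!is.na(retry_after)) retry_after else wait_seconds * backoff_factor^(k - 1)
  wait * (1 + stats::runif(1, 0, jitter))  # random stretch so workers desynchronize
}

# Hypothetical: 408, 429 and 5xx retry; everything else fails fast.
is_retryable_sketch <- function(status) {
  !is.na(status) && (status %in% c(408, 429) || (status >= 500 && status <= 599))
}

with_retries_sketch <- function(f, max_tries = 3, ...) {
  for (attempt in seq_len(max_tries)) {
    res <- tryCatch(f(), error = function(e) e)
    if (!inherits(res, "error")) return(res)
    status <- if (is.null(res$status)) NA_real_ else res$status
    retry_after <- if (is.null(res$retry_after)) NA_real_ else res$retry_after
    if (attempt == max_tries || !is_retryable_sketch(status)) {
      stop(res)  # re-raise the original typed error; no sleep after the last try
    }
    Sys.sleep(retry_wait_sketch(attempt, retry_after = retry_after, ...))
  }
}
```

With the defaults, the waits before the first and second retries are about 2 s and 6 s (plus jitter). A 429 that carries a Retry-After value uses that value instead.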

Bug fixes

expand_llm_config(): sweeping provider now updates the S3 class, so the swept config dispatches to the right API (previously every swept provider was called through the original provider's path).
call_llm_par_structured() / call_llm_par_tags(): prompts containing literal braces (typical for structured-output instructions) no longer abort the run; strings that glue cannot parse pass through verbatim.
Key handling: an empty-string api_key (what Sys.getenv() returns for an unset variable) falls back to the provider's default variables with a warning instead of sending an empty Bearer header; NA keys are rejected; a vector api_key works as ordered fallbacks; legacy configs holding a literal key string resolve correctly and error messages never echo a key.
Chat sessions: the NA-token fix now also covers send_structured() and send_tags() (one usage-less response could previously poison the running totals); multi-part sends are no longer truncated to their first element.
Column safety: llm_mutate() replaces an existing output column with a notice (mutate semantics) instead of letting bind_cols() mangle both names; hoisted structured/tag fields never silently overwrite existing columns (suffix + warning); the parallel engine's collision warning fires regardless of verbose; summary.llmr_experiment() follows collision-renamed columns.
Row batching: user tags shaped like row_2 are rejected in batched tag mode (they would scramble the demultiplexer); assistant turns error instead of being silently dropped; duration is attributed once per batch so llm_usage() stops overcounting wall time; fully-failed batch calls attribute their token spend to the first failed row; the protocol instructions state the actual item count.
Structured output: JSON recovery is string-aware (braces inside string values no longer corrupt extraction); disable_structured_output() removes a custom-named schema tool; llm_mutate_structured() validates locally like llm_fn_structured() (new .validate_local). enable_structured_output() now knows which providers reject a server-side json_schema (DeepSeek, Alibaba, Zhipu, Moonshot, Xiaomi) and requests JSON-object mode with local validation there, instead of a guaranteed HTTP 400. In strict mode the schema sent to the provider is hardened the way the OpenAI protocol formally requires (additionalProperties: false and all properties required, filled in only where unspecified), so plain schemas work on OpenAI and Groq without boilerplate.
llm_par_resume() works on call_llm_sweep() / call_llm_broadcast() / call_llm_compare() results (they now keep the config list-column) and refreshes structured_ok / structured_data for re-run rows.
Responses: OpenAI-style refusals surface their text and map to finish_reason = "filter" (as do Gemini safety verdicts such as RECITATION); llm_judge() gains .output= and refuses to clobber a .target column.
Assorted: call_llm() troubleshooting output prints real newlines; dead code removed (parse_embeddings() no-op branch, unreachable per-provider default models); embedding example code no longer uses the discouraged Sys.getenv() key pattern.
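String-aware JSON recovery means tracking whether the scanner is inside a JSON string literal, so that a brace inside a value such as "x}y" does not end the object. Below is a minimal base-R sketch with a hypothetical helper name; it makes no claim to match LLMR's internal extractor. It returns the first balanced {...} or [...] span, so a top-level array is kept whole. It does not check that opening and closing bracket types match.

```r
# Hypothetical helper: first balanced {...} or [...] span in `txt`, ignoring
# brackets inside JSON string literals (with backslash escapes honored).
extract_json_sketch <- function(txt) {
  chars <- strsplit(txt, "", fixed = TRUE)[[1]]
  start <- NA_integer_
  depth <- 0L
  in_str <- FALSE
  escaped <- FALSE
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_str) {
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_str <- FALSE
      }
      next
    }
    if (is.na(start)) {               # still in the surrounding prose
      if (ch %in% c("{", "[")) {
        start <- i
        depth <- 1L
      }
      next
    }
    if (ch == "\"") {
      in_str <- TRUE
    } else if (ch %in% c("{", "[")) {
      depth <- depth + 1L
    } else if (ch %in% c("}", "]")) {
      depth <- depth - 1L
      if (depth == 0L) return(paste(chars[start:i], collapse = ""))
    }
  }
  NA_character_                       # no complete JSON value found
}

extract_json_sketch('Sure: {"a": "x}y", "b": [1, 2]} done')
```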

Documentation

llm_config() now documents the full canonical parameter set (including seed, logprobs, thinking_budget, timeout, cache) and the three request hooks. The Anthropic thinking example is valid (max_tokens > thinking_budget). build_factorial_experiments() documents that system prompts are crossed, not recycled. Vignettes updated accordingly, plus a new article on reproducibility and cost.
Live tests and examples run on inexpensive open-weight models (Groq openai/gpt-oss-20b, DeepSeek, Moonshot, Qwen, Gemini Flash-Lite).

LLMR 0.8.0

New features

Preview a call before spending anything. llm_preview() renders exactly what llm_fn() / llm_mutate() would send (using the same internal renderer, so it can never drift from the real path) without making any API call or reading/encoding files. It returns a row-level tibble with the rendered messages, roles, character counts, file presence/existence, the batch plan (batch_id / batch_size / batch_row), and an issues list-column that flags problems up front: missing files, "file" content combined with .batch_size > 1, an embedding config with row batching, .return = "object" with batching, or a schema supplied without .structured. llm_render_messages() exposes just the rendered message objects.
Summarize a finished run. llm_usage() reports outcome counts and token totals (sent / received / total / reasoning) plus truncation and filter counts, reading the diagnostic columns that call_llm_par() and llm_mutate() already produce. It works on both result shapes and sums tokens with na.rm = TRUE, which is correct under row batching. It reports tokens, not money: no dollar figures and no built-in price table (which would go stale).
Find and re-run failures. llm_failures() lists exactly which rows failed or were truncated/filtered, with status_code, error_code, bad_param, and error_message. For a call_llm_par() result, pass the original frame to the existing llm_par_resume() to re-run only those rows.

Internal

The per-row message rendering used by llm_fn(), llm_mutate(), and llm_preview() is now a single shared internal helper, locked by golden tests so its output stays byte-identical to previous releases.

Bug fixes

.fields = FALSE now correctly skips field extraction in structured/JSON mode (keeping only the structured_data list-column), matching tag mode. Previously the logical FALSE was treated as a one-element field name.
Missing token usage is reported as NA, not 0. When a provider returns no usage metadata, chat sessions and responses no longer record the call as having used zero tokens; running chat totals add NA as 0 so one unknown response cannot poison the cumulative count.
llm_usage() / llm_failures() / llm_par_resume() handle the case where call_llm_par() collision-renamed its output columns (because the input frame already had a column named success, etc.): the summaries follow the renamed columns, and llm_par_resume() raises a clear, actionable error.
Bare environment-variable API keys. api_key = "OPENAI_API_KEY" is now always treated as an environment-variable reference, even when that variable is not yet set (it then fails with a clear "missing env var" message at call time instead of silently sending the literal name as the key), matching the documented behavior.
llm_api_key_env(required = FALSE) is now honored: a missing variable yields an empty key instead of an authentication error.
llm_parse_structured_col() now returns a tibble on every path, including when the structured column is absent.
llm_usage() gains an n_unknown_tokens count so an all-NA token column (a provider that reports no usage) is no longer indistinguishable from a true zero. The token sums still use na.rm = TRUE, which is correct for batching.

Documentation

llm_api_key_env() is now exported and documented. The help also notes that the simplest approach is to set the standard <PROVIDER>_API_KEY variable and pass no key at all.
New "LLMR in 5 minutes" quickstart vignette: install, set a key, a first call_llm(), a generative llm_fn() over a vector, a data-frame llm_mutate(), and tagged + batched extraction, all on the open-weight gpt-oss-20b so the examples are runnable for everyone. The structured-output articles now lead with a concrete example before the provider-by-provider details.
The troubleshooting help no longer claims the API key is printed (it is masked). The embeddings vignette can now be enabled with LLMR_RUN_VIGNETTES=true (its run flag was previously hard-coded off), and its stale prebuilt HTML was removed.
Fixed a vignette that referenced a non-existent function and assorted stale version labels; removed non-ASCII look-alike punctuation from R sources.

LLMR 0.7.2

Bug fixes

Retry/error classification. API error messages containing curly braces (common when providers echo JSON fragments) no longer break error construction: the typed condition class, status_code, and provider message are preserved, so retryable rate-limit/server errors are retried correctly instead of being misclassified. The retry helper also re-raises the original typed error after exhausting attempts, and its wait-time message no longer errors on fractional backoff values.
llm_par_resume() now re-runs only the failed/NA rows instead of every row (it previously collapsed the per-row success vector with isTRUE()).
Parallel tag/structured helpers. Unnamed (bare-prompt) messages passed to call_llm_par_tags() / call_llm_par_structured() no longer have their template text used as the message role.
JSON recovery. A top-level JSON array embedded in prose is now extracted in full (a regex character-class bug previously truncated it to the first object).
Embeddings. get_batched_embeddings() no longer locks in a wrong vector dimension when the first batch returns empty, preserving the one-row-per-input contract.
Gemini token counts. Usage is preserved when candidatesTokenCount is absent (e.g. reasoning models with no visible output).
Printing. print() on an llmr_response with a non-standard finish reason no longer drops the status line.

LLMR 0.7.1

New features

Row batching for generative calls. llm_fn() and llm_mutate() (and the _tags / _structured variants) gain .batch_size, .batch_payload, and .batch_recovery. With .batch_size > 1, several rows are packed into one request wrapped in numbered <row_1>...</row_1> tags and demultiplexed back into rows, with fault-tolerant recovery (split-and-retry down to single rows) for dropped, reordered, or truncated rows. Composes with .tags (one level of required nesting) and with structured/JSON output ({"results":[{"row":i, ...}]}). The default .batch_size = 1 reproduces the previous one-call-per-row behavior exactly. New exported helper llm_parse_batch_tags().
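The pack-and-demultiplex step can be sketched in base R. The helper names are hypothetical, and the exact prompt wording LLMR sends is not reproduced. The split-and-retry recovery is left out: a row whose tags are missing simply comes back as NA, which is the signal such a recovery step would act on. Because each row is looked up by its number, reordered rows still land in the right place.

```r
# Hypothetical: wrap each row prompt in numbered tags for one request body.
pack_rows_sketch <- function(prompts) {
  i <- seq_along(prompts)
  paste0("<row_", i, ">", prompts, "</row_", i, ">", collapse = "\n")
}

# Hypothetical: pull each row's answer back out by number; NA if it is missing.
unpack_rows_sketch <- function(reply, n) {
  vapply(seq_len(n), function(i) {
    pattern <- sprintf("(?s)<row_%d>(.*?)</row_%d>", i, i)  # (?s): answers may span lines
    m <- regmatches(reply, regexec(pattern, reply, perl = TRUE))[[1]]
    if (length(m) < 2) NA_character_ else trimws(m[2])
  }, character(1))
}

reply <- "<row_2>blue</row_2>\n<row_1> red </row_1>"  # reordered, row 3 dropped
unpack_rows_sketch(reply, 3)  # "red" "blue" NA
```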

LLMR 0.7.0

New features

Soft structured output via XML-like tags. llm_mutate() gains .tags, backed by llm_mutate_tags(), llm_fn_tags(), llm_parse_tags(), llm_parse_tags_col(), and call_llm_par_tags().
Four new providers: Xiaomi MiMo, Alibaba (Qwen), Zhipu (GLM), and Moonshot (Kimi). All use OpenAI-compatible structured output.
Gemini Vertex AI supported via llm_config("gemini", ..., vertex = TRUE).
Multi-variable API key fallback. Providers can declare multiple environment variable names (e.g., XIAOMI_KEY or XIAOMI_API_KEY); the first one found is used.
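The first-found rule can be sketched in base R. The helper name is hypothetical. The sketch treats an empty string as unset, because Sys.getenv() returns "" for a missing variable.

```r
# Hypothetical helper: value of the first environment variable in `vars` that
# is set to a non-empty string; the error names the variables, never a key.
first_env_key_sketch <- function(vars) {
  for (v in vars) {
    val <- Sys.getenv(v, unset = "")
    if (nzchar(val)) return(val)
  }
  stop("none of these environment variables is set: ",
       paste(vars, collapse = ", "), call. = FALSE)
}

# e.g. first_env_key_sketch(c("XIAOMI_KEY", "XIAOMI_API_KEY"))
```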

Bug fixes

Fixed API key resolution for providers with multiple fallback env vars.
Removed dead requireNamespace("LLMR") guard inside llm_mutate().

LLMR 0.6.3 (2025-10-11)

Added Ollama provider (local generative and embedding models).
Stable column names (v1...vN) in get_batched_embeddings().

LLMR 0.6.2

llm_mutate() shorthand: llm_mutate(answer = "{question}", .config = cfg).
.structured = TRUE flag in llm_mutate() for inline JSON parsing.
setup_llm_parallel() accepts a positional numeric workers argument.

LLMR 0.6.1

Fixed a bug that affected Anthropic calls.

LLMR 0.6.0 (2025-08-26)

call_llm() now returns an llmr_response object. Use as.character(x) for plain text. Legacy json= arguments are removed.
Secure API key handling: literal keys are moved to temporary env vars.
Structured JSON output and schema validation.
Multi-column injection in llm_mutate().