Changes in version 0.8.3                        

New features

  - Provider batch APIs. llm_batch_submit(), llm_batch_status(),
    llm_batch_fetch(), and llm_batch_cancel() drive the asynchronous
    batch endpoints of OpenAI, Groq, Anthropic, and Gemini, which price
    requests at roughly half the live rate. Jobs are plain R objects
    without secrets (keys stay environment references), so a job can be
    saved with state_path=, the session closed, and results fetched
    later or from another machine.
  - Audit log for reproducible research. Each record carries a
    schema_version field (currently "1.0") so downstream tools can parse
    the log against a stable contract. llm_log_enable(path) appends one
    JSON record per API call (timestamp, provider, model, the served
    model_version, full request parameters and messages, reply text,
    token usage including cached tokens, request id, status, timing) to
    a JSONL file; llm_log_disable() and llm_log_status() manage it.
    Failed calls are logged too. include_messages = FALSE keeps a
    metadata-only trail for confidential prompts.
  - A draft methods paragraph. llm_methods_text() turns a result frame
    into the transparency paragraph journals increasingly require:
    models, providers, call counts, recorded inference settings, token
    totals, and failure counts, stated only as far as the data supports.
  - Replication and reliability. llm_replicate() runs every row .times
    times through the parallel engine; llm_agreement() reports per-unit
    majority labels and overall reliability (mean pairwise agreement and
    Krippendorff's alpha for nominal data, missing-safe).
  - Native tool calling with an execution loop. llm_tool() wraps an R
    function with a JSON-Schema argument spec; call_llm_tools() sends
    the definitions, executes the calls the model makes, feeds results
    back, and returns the final response with a tool_history attribute.
    tool_calls() extracts requested calls from any response. Supported
    on OpenAI-compatible providers and Anthropic.
  - Streaming. call_llm_stream() delivers the reply chunk by chunk
    (callback) for OpenAI-compatible providers, Anthropic, and Gemini,
    and returns a normal llmr_response at the end, with usage when the
    provider reports it in the stream.
  - Token log-probabilities. Pass logprobs = TRUE (and top_logprobs = k)
    in llm_config(); llm_logprobs() returns them tidily
    (OpenAI-compatible providers and Gemini) for confidence scoring and
    calibration.
  - Reproducibility knobs. A canonical seed parameter is forwarded where
    supported; every response now records model_version (the identifier
    the server reports having served) and, when returned separately, the
    model's reasoning text in a thinking field.
  - Prompt caching. cache = TRUE marks the system prompt and tools as
    cacheable for Anthropic; cached prompt tokens are now extracted from
    all providers that report them (tokens(x)$cached, cached_tokens /
    *_cached diagnostic columns) and counted by llm_usage().
  - Cost estimates on your own prices. llm_usage(price_table = ...)
    accepts a user-supplied table (per-million-token prices, optional
    cached rate) and adds a cost_estimate; LLMR still ships no price
    list, on purpose.
  - NA policy for templates. llm_fn()/llm_mutate() gain .na_action =
    c("send", "skip", "error"); llm_preview() flags rows whose templates
    reference NA or render empty prompts.
  - Quality-of-life. chat_session() gains quiet= and multimodal sends
    (named-vector shortcut); a one-line summary message after runs with
    failures or truncations points to llm_failures(); print(llm_config)
    is masked and informative; options(llmr.quiet = TRUE) silences
    advisory notes; every request carries a total timeout (timeout= per
    config or options(llmr.timeout=), default 600 s).

Provider-path overhaul

  - The nine OpenAI-compatible providers (groq, together, deepseek,
    xiaomi, alibaba, zhipu, moonshot, xai, ollama) now share one request
    builder with the full feature set: multimodal file parts (previously
    serialized raw, with the local path leaked to the provider), the
    complete canonical parameter set, structured output, tools, verbatim
    extras passthrough, and the modifiability hooks. Parameters a
    provider rejects are dropped with a console note instead of
    silently.
  - The req_builder, response_modifier, and request_modifier hooks now
    apply on every provider path (previously only OpenAI) and are
    documented in llm_config().
  - Gemini: thinking_budget / include_thoughts are finally sent
    (generationConfig.thinkingConfig); presence/frequency penalties,
    seed, and logprobs are translated rather than dropped;
    enable_structured_output() sends the schema by default via
    responseJsonSchema (set gemini_enable_response_schema = FALSE for
    old models); embeddings use batchEmbedContents (one HTTP call per up
    to 100 texts instead of one per text) and go through the standard
    error handling; Gemini thought parts are kept out of the answer text
    and surfaced as thinking.
  - Anthropic: top_k is supported again (it was wrongly dropped as
    "unsupported"); no_change is honored; PDFs are sent as document
    blocks (previously mislabeled as images); an invalid thinking_budget
    >= max_tokens combination warns before the call; unknown typed
    content blocks pass through, enabling the tool loop; the false claim
    that LLMR "sets the beta header automatically" is gone from the docs
    (anthropic_beta = ... sends one).
  - OpenAI: max_completion_tokens is honored directly (reasoning models
    no longer pay a guaranteed failed round trip); the Responses-API
    autodetect recognizes gpt-5-pro and the deep-research models and no
    longer matches nonexistent ones; the Responses path drops the
    parameters that API rejects (penalties, logprobs, seed) with a note,
    maps reasoning_effort to its nested shape, and no longer duplicates
    system text when all messages are system.
  - Embedding routing now follows the documented inference (embedding =
    NULL
      - "embedding" in the model name) for every provider, and embedding
        endpoints no longer receive chat-only parameters from a reused
        config. get_batched_embeddings() gains retry controls and its
        verbose default is documented correctly.

Reliability fixes

  - The retry helper no longer sleeps after the final failed attempt
    (previously up to ~104 minutes wasted on the default schedule);
    waits honor Retry-After when a 429 provides one, and add jitter so
    parallel workers do not retry in lockstep.
  - Retry classification is exact for typed errors: rate limits and
    server errors (now including 403->auth, 408->retryable) retry;
    parameter, authentication, and quota errors fail fast. The
    over-broad "exceeded" message pattern (which retried
    context_length_exceeded for the full schedule) is gone.
  - call_llm_robust() defaults are humane (wait_seconds = 2,
    backoff_factor = 3) and consistent with call_llm_par(); start_jitter
    in the parallel engine defaults to 0 (it silently added up to 5 s
    per row).
  - Failed rows in call_llm_par() results now carry the provider's raw
    error body in raw_response_json.

Bug fixes

  - Gemini multi-turn roles: assistant turns are now sent as Gemini's
    "model" role. Previously every turn was sent as "user", so the model
    saw its own prior replies as user messages, corrupting chat sessions
    and any multi-turn agent memory on Gemini.
  - extract_text() (Anthropic): responses with several text blocks
    (typical around tool use) now concatenate all of them; previously
    only the last block was returned and earlier content was silently
    dropped.
  - llm_fn_structured() / llm_mutate_structured(): with .schema = NULL
    these now request JSON-object mode as documented; previously the
    config was sent unchanged and the model could return arbitrary
    prose.
  - call_llm_tools(): the return value now carries attr(x, "tool_loop")
    with model_calls, aggregate sent/rec token totals across every
    internal round, and tool_calls -- tokens(x) alone covers only the
    final call, which undercounted multi-round loops. A new
    max_tool_calls argument caps tool executions across the loop,
    raising a typed llmr_tool_limit condition instead of continuing to
    spend.
  - call_llm.openai(): top_k and repetition_penalty are now dropped
    (with the usual one-time note) instead of being forwarded to an
    endpoint that rejects them; the streaming and batch paths already
    did this.
  - Anthropic stop_sequence now maps to finish reason "stop" rather than
    "other".
  - Templated .messages with partially named vectors (e.g. c(system =
    ..., "{x}")) now default unnamed elements to the user role, as
    documented, instead of erroring.
  - call_llm_broadcast() with zero messages returns the full diagnostic
    column schema (finish reasons, token columns, response), matching
    non-empty results.
  - llm_batch_submit(): a named character vector like c(system = "...",
    user = "...") is treated as a single multi-role request rather than
    being split into separate batch requests; unnamed character vectors
    still expand one element per request.
  - llm_batch_fetch() / call_llm_stream(): total token count is NA when
    both sent and received counts are unknown, consistent with the
    call_llm() path, instead of a false 0.
  - chat_session(): rejects embedding configs inferred from model names
    (e.g. "gemini-embedding-001") in addition to those with embedding =
    TRUE set explicitly.
  - expand_llm_config(): sweeping provider now updates the S3 class, so
    the swept config dispatches to the right API (previously every swept
    provider was called through the original provider's path).
  - call_llm_par_structured() / call_llm_par_tags(): prompts containing
    literal braces (typical for structured-output instructions) no
    longer abort the run; strings glue cannot parse pass through
    verbatim.
  - Key handling: an empty-string api_key (what Sys.getenv() returns for
    an unset variable) falls back to the provider's default variables
    with a warning instead of sending an empty Bearer header; NA keys
    are rejected; a vector api_key works as ordered fallbacks; legacy
    configs holding a literal key string resolve correctly and error
    messages never echo a key.
  - Chat sessions: the NA-token fix now also covers send_structured()
    and send_tags() (one usage-less response could previously poison the
    running totals); multi-part sends are no longer truncated to their
    first element.
  - Column safety: llm_mutate() replaces an existing output column with
    a notice (mutate semantics) instead of letting bind_cols() mangle
    both names; hoisted structured/tag fields never silently overwrite
    existing columns (suffix + warning); the parallel engine's collision
    warning fires regardless of verbose; summary.llmr_experiment()
    follows collision-renamed columns.
  - Row batching: user tags shaped like row_2 are rejected in batched
    tag mode (they would scramble the demultiplexer); assistant turns
    error instead of being silently dropped; duration is attributed once
    per batch so llm_usage() stops overcounting wall time; fully-failed
    batch calls attribute their token spend to the first failed row; the
    protocol instructions state the actual item count.
  - Structured output: JSON recovery is string-aware (braces inside
    string values no longer corrupt extraction);
    disable_structured_output() removes a custom-named schema tool;
    llm_mutate_structured() validates locally like llm_fn_structured()
    (new .validate_local). enable_structured_output() now knows which
    providers reject a server-side json_schema (DeepSeek, Alibaba,
    Zhipu, Moonshot, Xiaomi) and requests JSON-object mode with local
    validation there, instead of a guaranteed HTTP 400. In strict mode
    the schema sent to the provider is hardened the way the OpenAI
    protocol formally requires (additionalProperties: false and all
    properties required, filled in only where unspecified), so plain
    schemas work on OpenAI and Groq without boilerplate.
  - llm_par_resume() works on call_llm_sweep() / call_llm_broadcast() /
    call_llm_compare() results (they now keep the config list-column)
    and refreshes structured_ok / structured_data for re-run rows.
  - Responses: OpenAI-style refusals surface their text and map to
    finish_reason = "filter" (as do Gemini safety verdicts such as
    RECITATION); llm_judge() gains .output= and refuses to clobber a
    .target column.
  - Assorted: call_llm() troubleshooting output prints real newlines;
    dead code removed (parse_embeddings() no-op branch, unreachable
    per-provider default models); embedding example code no longer uses
    the discouraged Sys.getenv() key pattern.

Documentation

  - New vignette "Interactive calls: tools, streaming, and logprobs":
    the tool loop end to end (definitions, history, aggregate spend,
    max_tool_calls), streaming with custom callbacks, and
    log-probabilities as graded measurements, with honest notes on
    provider support.
  - llm_config() now documents the full canonical parameter set
    (including seed, logprobs, thinking_budget, timeout, cache) and the
    three request hooks. The Anthropic thinking example is valid
    (max_tokens > thinking_budget). build_factorial_experiments()
    documents that system prompts are crossed, not recycled. Vignettes
    updated accordingly, plus a new article on reproducibility and cost.
  - Live tests and examples run on inexpensive open-weight models (Groq
    openai/gpt-oss-20b, DeepSeek, Moonshot, Qwen, Gemini Flash-Lite).

                        Changes in version 0.8.0                        

New features

  - Preview a call before spending anything. llm_preview() renders
    exactly what llm_fn() / llm_mutate() would send (using the same
    internal renderer, so it can never drift from the real path) without
    making any API call or reading/encoding files. It returns a
    row-level tibble with the rendered messages, roles, character
    counts, file presence/existence, the batch plan (batch_id /
    batch_size / batch_row), and an issues list-column that flags
    problems up front: missing files, "file" content combined with
    .batch_size > 1, an embedding config with row batching, .return =
    "object" with batching, or a schema supplied without .structured.
    llm_render_messages() exposes just the rendered message objects.
  - Summarize a finished run. llm_usage() reports outcome counts and
    token totals (sent / received / total / reasoning) plus truncation
    and filter counts, reading the diagnostic columns that
    call_llm_par() and llm_mutate() already produce. It works on both
    result shapes and sums tokens with na.rm = TRUE, which is correct
    under row batching. It reports tokens, not money: no dollar figures
    and no built-in price table (which would go stale).
  - Find and re-run failures. llm_failures() lists exactly which rows
    failed or were truncated/filtered, with status_code, error_code,
    bad_param, and error_message. For a call_llm_par() result, pass the
    original frame to the existing llm_par_resume() to re-run only those
    rows.

Internal

  - The per-row message rendering used by llm_fn(), llm_mutate(), and
    llm_preview() is now a single shared internal helper, locked by
    golden tests so its output stays byte-identical to previous
    releases.

Bug fixes

  - .fields = FALSE now correctly skips field extraction in
    structured/JSON mode (keeping only the structured_data list-column),
    matching tag mode. Previously the logical FALSE was treated as a
    one-element field name.
  - Missing token usage is reported as NA, not 0. When a provider
    returns no usage metadata, chat sessions and responses no longer
    record the call as having used zero tokens; running chat totals add
    NA as 0 so one unknown response cannot poison the cumulative count.
  - llm_usage() / llm_failures() / llm_par_resume() handle the case
    where call_llm_par() collision-renamed its output columns (because
    the input frame already had a column named success, etc.): the
    summaries follow the renamed columns, and llm_par_resume() raises a
    clear, actionable error.
  - Bare environment-variable API keys. api_key = "OPENAI_API_KEY" is
    now always treated as an environment-variable reference, even when
    that variable is not yet set (it then fails with a clear "missing
    env var" message at call time instead of silently sending the
    literal name as the key), matching the documented behavior.
  - llm_api_key_env(required = FALSE) is now honored: a missing variable
    yields an empty key instead of an authentication error.
  - llm_parse_structured_col() now returns a tibble on every path,
    including when the structured column is absent.
  - llm_usage() gains an n_unknown_tokens count so an all-NA token
    column (a provider that reports no usage) is no longer
    indistinguishable from a true zero. The token sums still use na.rm =
    TRUE, which is correct for batching.

Documentation

  - llm_api_key_env() is now exported and documented. The help also
    notes that the simplest approach is to set the standard
    <PROVIDER>_API_KEY variable and pass no key at all.
  - New "LLMR in 5 minutes" quickstart vignette: install, set a key, a
    first call_llm(), a generative llm_fn() over a vector, a data-frame
    llm_mutate(), and tagged + batched extraction, all on the
    open-weight gpt-oss-20b so the examples are runnable for everyone.
    The structured-output articles now lead with a concrete example
    before the provider-by-provider details.
  - The troubleshooting help no longer claims the API key is printed (it
    is masked). The embeddings vignette can now be enabled with
    LLMR_RUN_VIGNETTES=true (its run flag was previously hard-coded
    off), and its stale prebuilt HTML was removed.
  - Fixed a vignette that referenced a non-existent function and
    assorted stale version labels; removed non-ASCII look-alike
    punctuation from R sources.

                        Changes in version 0.7.2                        

Bug fixes

  - Retry/error classification. API error messages containing curly
    braces (common when providers echo JSON fragments) no longer break
    error construction: the typed condition class, status_code, and
    provider message are preserved, so retryable rate-limit/server
    errors are retried correctly instead of being misclassified. The
    retry helper also re-raises the original typed error after
    exhausting attempts, and its wait-time message no longer errors on
    fractional backoff values.
  - llm_par_resume() now re-runs only the failed/NA rows instead of
    every row (it previously collapsed the per-row success vector with
    isTRUE()).
  - Parallel tag/structured helpers. Unnamed (bare-prompt) messages
    passed to call_llm_par_tags() / call_llm_par_structured() no longer
    have their template text used as the message role.
  - JSON recovery. A top-level JSON array embedded in prose is now
    extracted in full (a regex character-class bug previously truncated
    it to the first object).
  - Embeddings. get_batched_embeddings() no longer locks in a wrong
    vector dimension when the first batch returns empty, preserving the
    one-row-per-input contract.
  - Gemini token counts. Usage is preserved when candidatesTokenCount is
    absent (e.g. reasoning models with no visible output).
  - Printing. print() on an llmr_response with a non-standard finish
    reason no longer drops the status line.

                        Changes in version 0.7.1                        

New features

  - Row batching for generative calls. llm_fn() and llm_mutate() (and
    the _tags / _structured variants) gain .batch_size, .batch_payload,
    and .batch_recovery. With .batch_size > 1, several rows are packed
    into one request wrapped in numbered <row_1>...</row_1> tags and
    de-multiplexed back into rows, with fault-tolerant recovery
    (split-and-retry down to single rows) for dropped, reordered, or
    truncated rows. Composes with .tags (one level of required nesting)
    and with structured/JSON output ({"results":[{"row":i, ...}]}). The
    default .batch_size = 1 reproduces the previous one-call-per-row
    behaviour exactly. New exported helper llm_parse_batch_tags().

                        Changes in version 0.7.0                        

New features

  - Soft structured output via XML-like tags. llm_mutate() gains .tags,
    backed by llm_mutate_tags(), llm_fn_tags(), llm_parse_tags(),
    llm_parse_tags_col(), and call_llm_par_tags().
  - Four new providers: Xiaomi MiMo, Alibaba (Qwen), Zhipu (GLM), and
    Moonshot (Kimi). All use OpenAI-compatible structured output.
  - Gemini Vertex AI supported via llm_config("gemini", ..., vertex =
    TRUE).
  - Multi-variable API key fallback. Providers can declare multiple
    environment variable names (e.g., XIAOMI_KEY or XIAOMI_API_KEY); the
    first one found is used.

Bug fixes

  - Fixed API key resolution for providers with multiple fallback env
    vars.
  - Removed dead requireNamespace("LLMR") guard inside llm_mutate().

                 Changes in version 0.6.3 (2025-10-11)                  

  - Added Ollama provider (local generative and embedding models).
  - Stable column names (v1...vN) in get_batched_embeddings().

                        Changes in version 0.6.2                        

  - llm_mutate() shorthand: llm_mutate(answer = "{question}", .config =
    cfg).
  - .structured = TRUE flag in llm_mutate() for inline JSON parsing.
  - setup_llm_parallel() accepts a positional numeric workers argument.

                        Changes in version 0.6.1                        

  - Fixed a bug that affected Anthropic calls.

                 Changes in version 0.6.0 (2025-08-26)                  

  - call_llm() now returns an llmr_response object. Use as.character(x)
    for plain text. Legacy json= arguments are removed.
  - Secure API key handling: literal keys are moved to temporary env
    vars.
  - Structured JSON output and schema validation.
  - Multi-column injection in llm_mutate().