knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  eval = identical(tolower(Sys.getenv("LLMR_RUN_VIGNETTES", "false")), "true")
)

LLMR gives R one interface to many language-model providers. You pick
the provider and model once with llm_config(); every other
function behaves the same regardless of which model is behind it.
You can hand llm_config() your key directly as a string,
but the safer habit is to keep it out of your code and let LLMR read it
from an environment variable. For each provider LLMR knows a default
variable to look in: it tries <PROVIDER>_API_KEY
first, then <PROVIDER>_KEY (upper-cased), so Groq
reads GROQ_API_KEY, OpenAI reads
OPENAI_API_KEY, and so on. If you set that variable, you
never pass a key in code at all.
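The lookup order described above can be sketched in base R. This is only an illustration of the rule, not LLMR's internal code: `find_key_var()` and the provider name `demoprovider` are hypothetical, and the value is a dummy so no real key is involved.

```r
# Hypothetical helper: mimics the documented lookup order.
# It is NOT how LLMR is implemented internally.
find_key_var <- function(provider) {
  prefix <- toupper(provider)
  candidates <- paste0(prefix, c("_API_KEY", "_KEY"))  # tried in this order
  for (var in candidates) {
    if (nzchar(Sys.getenv(var))) return(var)
  }
  NA_character_  # nothing set
}

# Demo with a made-up provider and a dummy value.
Sys.setenv(DEMOPROVIDER_KEY = "dummy-value")
found <- find_key_var("demoprovider")  # _API_KEY is unset, so the fallback is used
found
Sys.unsetenv("DEMOPROVIDER_KEY")
```

The function returns the name of the variable rather than its value, so the key itself is never printed.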
Put the key in your ~/.Renviron file, one variable per line:
GROQ_API_KEY=...
The easiest way to open that file is:
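One way to do this with base R alone (an assumption; other tools offer similar helpers) is `file.edit()`. The sketch below only opens an editor in an interactive session, so it is safe to source in a script.

```r
# Resolve the per-user .Renviron path and open it for editing.
renviron_path <- path.expand("~/.Renviron")
if (interactive()) file.edit(renviron_path)
```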
Save it and restart R. You can check that R sees the key without printing it:
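A check of this kind can be written with base R. `nzchar()` reports only whether the variable is non-empty, so the key itself is never echoed; `GROQ_API_KEY` is used here because it is the variable from the example above.

```r
# TRUE if the variable is set to a non-empty value, FALSE otherwise.
key_visible <- nzchar(Sys.getenv("GROQ_API_KEY"))
key_visible
```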
If this is FALSE, R cannot see the key yet: check the
spelling and that you restarted the session. A missing key shows up as
an authentication error on your first call, not before.
llm_config() selects the model; call_llm()
sends one message and returns a response object that prints the text
plus a short status line. We use Groq’s open-weight
gpt-oss-20b here because it is cheap and available to
everyone.

library(LLMR)
cfg <- llm_config("groq", "openai/gpt-oss-20b", temperature = 0.2)
r <- call_llm(cfg, c(system = "Be concise.", user = "Capital of Mongolia?"))
r                # prints the text and a [model | finish | tokens | t] line
as.character(r)  # just the text
tokens(r)        # token counts as a list

A message is a named character vector; the names are roles
(system, user, assistant). A bare
string is treated as a single user turn.
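The message convention can be illustrated without calling any model. The helper `as_messages()` below is hypothetical and written only to show the rule; LLMR's own handling may differ in detail.

```r
# Hypothetical helper illustrating the convention; not part of LLMR.
as_messages <- function(x) {
  stopifnot(is.character(x), length(x) >= 1)
  if (is.null(names(x))) {
    # A bare string becomes a single user turn.
    stopifnot(length(x) == 1)
    names(x) <- "user"
  }
  allowed <- c("system", "user", "assistant")
  stopifnot(all(names(x) %in% allowed))
  x
}

bare <- as_messages("Capital of Mongolia?")
chat <- as_messages(c(
  system    = "Be concise.",
  user      = "Capital of Mongolia?",
  assistant = "Ulaanbaatar.",
  user      = "And of Bolivia?"
))
names(bare)
names(chat)
```

Role names may repeat, which is how a multi-turn conversation is expressed in a single vector.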
llm_mutate() adds model-generated columns to a tibble.
The shorthand puts the new column name and a glue prompt
template in one argument; {column} is filled from each
row.

library(tibble)
reviews <- tibble(text = c("The food was cold.",
                           "Absolutely loved it!",
                           "It was fine, nothing special."))
reviews |>
  llm_mutate(
    sentiment = "Reply with one word (positive/negative/neutral): {text}",
    .config = cfg
  )

Alongside the sentiment column you also get diagnostic
columns (sentiment_ok, sentiment_finish,
sentiment_sent, sentiment_rec, …) so you can
see what succeeded and how many tokens each row used.
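To see what the per-row template expansion amounts to, here is a base R stand-in. It substitutes a single `{text}` placeholder with `sub()` rather than using glue, and it uses a plain data frame so it runs without any packages or API key; it approximates the templating step only, not what `llm_mutate()` does afterwards.

```r
reviews <- data.frame(text = c("The food was cold.",
                               "Absolutely loved it!",
                               "It was fine, nothing special."))
template <- "Reply with one word (positive/negative/neutral): {text}"

# Base R stand-in for glue-style substitution of one {text} placeholder.
prompts <- vapply(
  reviews$text,
  function(value) sub("{text}", value, template, fixed = TRUE),
  character(1),
  USE.NAMES = FALSE
)
prompts
```

Each row yields its own prompt, which is why a typo in a column name inside the braces is worth catching before any request is sent.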
llm_fn() is the lighter-weight sibling of
llm_mutate(): give it a vector and a glue
prompt where {x} is each element, and it returns a
character vector.

countries <- c("Mongolia", "Bolivia", "Chad")
llm_fn(countries,
       prompt = "Capital city of {x}. Reply with only the city name.",
       .config = cfg)

Switching to a different provider or model is a one-line change to
llm_config(); nothing else in your code changes.
When you want several fields per row, ask the model to wrap each in a
named tag and pass .tags; LLMR parses them into columns.
Add .batch_size to pack multiple rows into one request
(sent as numbered <row_i> blocks and split back
apart), which cuts the number of calls and the repeated instruction
overhead.

films <- tibble(title = c("Blade Runner", "Amelie", "Parasite", "Spirited Away"))
films |>
  llm_mutate(
    info = "For the film {title}, give its director and release year.",
    .config = cfg,
    .tags = c("director", "year"),
    .batch_size = 2
  )

The four films were resolved in two calls (info_bn = 2).
The info_batch, info_bn, and
info_bi columns record which call each row landed in and
its position within it; the rows always come back in their original
order. Prefer modest batch sizes and temperature = 0:
batching only pays off when the model reliably follows the wrapping
protocol.
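The bookkeeping behind batching and tag parsing can be sketched in base R. Everything here is an assumption made for illustration: `wrap_batch()` and `extract_tag()` are hypothetical helpers, the exact text LLMR sends may differ, and the reply is a hand-written mock rather than real model output.

```r
titles <- c("Blade Runner", "Amelie", "Parasite", "Spirited Away")
batch_size <- 2

# Assign each row to a batch and a position within that batch.
batch_id  <- ceiling(seq_along(titles) / batch_size)
batch_pos <- ave(seq_along(titles), batch_id, FUN = seq_along)

# Wrap each row in a numbered block (assumed format, for illustration only).
wrap_batch <- function(rows) {
  i <- seq_along(rows)
  paste0("<row_", i, ">", rows, "</row_", i, ">", collapse = "\n")
}
requests <- tapply(titles, batch_id, wrap_batch)

# Pull one tagged field out of a reply string (NA if the tag is missing).
extract_tag <- function(reply, tag) {
  pattern <- sprintf("<%s>(.*?)</%s>", tag, tag)
  hit <- regmatches(reply, regexec(pattern, reply, perl = TRUE))[[1]]
  if (length(hit) == 2) trimws(hit[2]) else NA_character_
}

# Hand-written mock reply, not real model output.
mock_reply <- "<director>Bong Joon-ho</director><year>2019</year>"
extract_tag(mock_reply, "director")
extract_tag(mock_reply, "year")
```

Returning `NA` for a missing tag mirrors the point made above: if the model drops the wrapping, the field cannot be recovered, so a reliable model and a modest batch size matter.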
Embeddings turn text into numeric vectors you can compare. They use a
different kind of model, so you make a config with
embedding = TRUE; here we use Voyage, which specializes in
embeddings (set VOYAGE_API_KEY).
get_batched_embeddings() takes a character vector and
returns a matrix with one row per text.

emb_cfg <- llm_config("voyage", "voyage-3.5-lite", embedding = TRUE)
texts <- c("I love this restaurant.",
           "The food was delicious.",
           "My car broke down today.")
m <- get_batched_embeddings(texts, emb_cfg)
dim(m)  # 3 texts x embedding dimension

Closeness in this space tracks meaning. Cosine similarity is high for the two sentences about food and low for the unrelated one:
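The comparison can be sketched in base R. The helper `cosine_similarity()` is a hypothetical name, and a toy matrix with made-up values stands in for `m` so the snippet runs without an API key; with real embeddings you would pass `m` instead.

```r
# Pairwise cosine similarity between the rows of a numeric matrix.
cosine_similarity <- function(m) {
  unit <- m / sqrt(rowSums(m^2))  # scale each row to length 1
  unit %*% t(unit)
}

# Toy 3 x 2 matrix standing in for real embeddings (values are made up).
toy <- rbind(restaurant = c(0.9, 0.1),
             food       = c(0.8, 0.2),
             car        = c(0.1, 0.9))
sims <- cosine_similarity(toy)
round(sims, 2)
```

The diagonal is 1 by construction; the off-diagonal entries are the values to compare.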
llm_preview() shows exactly what would be sent, with no
API call, so you can catch a templating or role mistake before paying
for it:
After a run, llm_usage() summarizes token totals and
outcomes, and llm_failures() lists any rows that failed or
were truncated:
llm_fn(), .tags, JSON schemas, and row batching.
call_llm_par().