A Trolley Dilemma Experiment with LLMR

Some behavioral researchers use large language models to simulate human judgments. This vignette demonstrates the mechanics of that workflow; it does not validate the practice. It runs a classic moral-philosophy thought experiment, the Trolley Dilemma, with the LLMR package, skipping the single-call chat functions and going straight to a vectorized experimental design built with llm_mutate().

The demonstration uses an open-weights model served through the Groq API.

library(LLMR)
library(dplyr)

# Configure an open model endpoint
cfg <- llm_config(
  provider = "groq",
  model    = "llama-3.1-8b-instant"
)

Designing the Experiment

We construct two standard variants of the Trolley Dilemma as the stimulus set.

dilemmas <- tibble::tibble(
  condition = c("Switch", "Footbridge"),
  scenario = c(
    "A runaway trolley is heading down the tracks toward five workers who will be killed. You are standing next to a switch. If you pull the switch, the trolley will be diverted onto a side track where it will kill one worker. Do you pull the switch?",
    "A runaway trolley is heading toward five workers. You are standing on a footbridge above the tracks next to a large stranger. If you push the stranger onto the tracks below, his mass will stop the trolley, saving the five workers but killing the stranger. Do you push the stranger?"
  )
)
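The table above yields one model response per condition. If several independent responses per condition are wanted, one option is to repeat each row before calling the model. The sketch below uses only base R on a stand-in data frame with shortened scenario text; the replicate_id column and the count of three are illustrative assumptions, not part of the design above.

```r
# Stand-in for the stimulus table (scenario text shortened for illustration).
design <- data.frame(
  condition = c("Switch", "Footbridge"),
  scenario  = c("Switch scenario text", "Footbridge scenario text"),
  stringsAsFactors = FALSE
)

n_reps <- 3  # hypothetical number of responses requested per condition

# Repeat every row n_reps times, keeping the rows of one condition together.
expanded <- design[rep(seq_len(nrow(design)), each = n_reps), ]

# Number the repetitions within each condition (1, 2, 3, 1, 2, 3).
expanded$replicate_id <- rep(seq_len(n_reps), times = nrow(design))
rownames(expanded) <- NULL

expanded
```

The expanded table has the same columns as the original plus an index, so it could be passed to the same row-wise call in place of the two-row version.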

Vectorized Execution with Soft Structuring

To extract the model’s decisions, we call llm_mutate(). Rather than imposing a rigid JSON schema, which some inference endpoints handle poorly, we ask the model to mark its answer with simple XML-like tags. Tags place fewer demands on the provider than schema validation, so the same prompt works across a wider range of endpoints.

experiment_results <- dilemmas |>
  llm_mutate(
    response = c(
      system = "You are a participant in a moral psychology experiment. Read the scenario and provide a definitive YES or NO decision, followed by a brief rationale. Enclose your decision in <decision>...</decision> tags and your reasoning in <rationale>...</rationale> tags.",
      user = "{scenario}"
    ),
    .config = cfg,
    .tags = c("decision", "rationale")
  )

When the .tags argument is specified, LLMR parses each response string and appends the extracted content to the original dataset as separate columns, one per tag.
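To make concrete what tag extraction involves, here is a minimal base R sketch. It is not LLMR's implementation: extract_tag() is a hypothetical helper, and the two reply strings are invented for illustration rather than taken from a model.

```r
# Hypothetical helper: return the text inside the first <tag>...</tag> pair
# of each string, or NA when the tag is absent. Not LLMR's own parser.
extract_tag <- function(text, tag) {
  # (?s) lets "." match newlines; ".*?" stops at the first closing tag.
  pattern <- sprintf("(?s)<%s>(.*?)</%s>", tag, tag)
  matches <- regmatches(text, regexec(pattern, text, perl = TRUE))
  vapply(
    matches,
    function(m) if (length(m) >= 2) trimws(m[2]) else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

# Invented replies: one well-formed, one that ignores the tag instructions.
replies <- c(
  "<decision>YES</decision>\n<rationale>Pulling the switch saves more lives.</rationale>",
  "I would rather not answer."
)

parsed <- data.frame(
  decision  = extract_tag(replies, "decision"),
  rationale = extract_tag(replies, "rationale"),
  stringsAsFactors = FALSE
)

parsed
```

The second reply shows why a missing value is a useful outcome: a model that skips the tags produces NA rather than an error, and such rows can be inspected or re-run.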

experiment_results |>
  select(condition, decision, rationale) |>
  print(n = Inf)
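The extracted decision is still free text, so it may vary in case, whitespace, or punctuation. Before any analysis it can help to code it as a logical value. The sketch below is one way to do that in base R; code_decision() is a hypothetical helper and the decision strings are invented examples, not model output.

```r
# Hypothetical helper: map free-text decisions to TRUE (yes), FALSE (no),
# or NA when the text is missing or is neither a clear YES nor a clear NO.
code_decision <- function(x) {
  cleaned <- toupper(trimws(x))
  cleaned <- gsub("[^A-Z]", "", cleaned)  # drop punctuation, e.g. "Yes."
  out <- rep(NA, length(cleaned))
  out[cleaned %in% "YES"] <- TRUE
  out[cleaned %in% "NO"]  <- FALSE
  out
}

# Invented decision strings covering the usual irregularities.
mock <- data.frame(
  condition = c("Switch", "Footbridge", "Switch", "Footbridge", "Switch"),
  decision  = c("YES", " no ", "Yes.", "It depends", NA),
  stringsAsFactors = FALSE
)

mock$endorses_action <- code_decision(mock$decision)

mock
```

Ambiguous answers are left as NA rather than forced into a category, so the number of unusable responses stays visible in the data.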

Conclusion

The example shows the pattern LLMR is built for. The researcher defines the conditions in a data frame, writes one prompt, and receives a structured dataset ready for statistical analysis. The tag parsing and the iteration over rows are handled by llm_mutate(), so no explicit loop or string-parsing code is needed.