LLMOS Documentation

LLMOS is a research framework for training web agents. It has three subsystems: an LLM-based UI simulator, a browser benchmark (WebAgentBench), and a training pipeline for finetuning language models on agent trajectories.

Scoring

How evaluation works — base score, penalties, trajectory modifier, and the full formula.

Cognitive Primitives

The 12-primitive taxonomy for diagnosing where and why web agents fail.

Architecture

Sandwich architecture, unified agent format, state visibility, and the episode loop.

Benchmark

Task structure, the Gmail environment, fixtures, version history, and how to run it.

Training Pipeline

Supervised finetuning (SFT) and Direct Preference Optimization (DPO) of Qwen models on simulator and browser trajectories.