LLMOS Documentation
LLMOS is a research framework for training web agents. It has three subsystems: an LLM-based UI simulator, a browser benchmark (WebAgentBench), and a training pipeline for finetuning language models on agent trajectories.
Scoring
How evaluation works: base score, penalties, trajectory modifier, and the full formula.
Cognitive Primitives
The 12-primitive taxonomy for diagnosing where and why web agents fail.
Architecture
Sandwich architecture, unified agent format, state visibility, and the episode loop.
Benchmark
Task structure, the Gmail environment, fixtures, version history, and how to run it.
Training Pipeline
SFT and DPO finetuning on Qwen models using simulator and browser trajectories.