Cognitive Primitives

WebAgentBench organizes web agent capabilities into 12 cognitive primitives: the atomic skills required to complete realistic browser tasks. Each benchmark page is designed to stress one or more of these primitives, so a failure can be traced to the specific skill that broke down.

memory, planning, attention, exploration, backtracking, adversarial, patience, verification, arithmetic, comprehension, composition, resilience
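
Because each page is tagged with the primitives it stresses, per-primitive failure rates can be computed from raw pass/fail results. The following is a minimal sketch of that idea, not WebAgentBench's actual tooling: the function name `failure_rate_by_primitive`, the `(primitives, passed)` result format, and the sample runs are all assumptions made for illustration.

```python
from collections import defaultdict

# The 12 primitive names come from the taxonomy; everything else here is hypothetical.
PRIMITIVES = {
    "memory", "planning", "attention", "exploration", "backtracking",
    "adversarial", "patience", "verification", "arithmetic",
    "comprehension", "composition", "resilience",
}

def failure_rate_by_primitive(results):
    """Return {primitive: failure rate} from (primitives, passed) pairs."""
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for primitives, passed in results:
        for name in primitives:
            if name not in PRIMITIVES:
                raise ValueError(f"unknown primitive: {name}")
            attempts[name] += 1
            if not passed:
                failures[name] += 1
    return {name: failures[name] / attempts[name] for name in attempts}

# Made-up runs: each entry is (primitives stressed by the page, did the agent pass?)
runs = [
    ({"memory", "planning"}, True),
    ({"memory"}, False),
    ({"patience", "verification"}, False),
]
rates = failure_rate_by_primitive(runs)
print(rates["memory"])  # 0.5: one failure out of two memory-tagged runs
```

A page that stresses several primitives counts toward each of them, so a failure on such a page does not by itself identify which primitive was responsible.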

Taxonomy

memory

Maintaining and retrieving relevant context across many steps, pages, or interaction phases.

Finding a name in one email, then referencing it when composing a reply several steps later.

planning

Decomposing a complex task into sub-goals and coordinating constraints across steps.

Applying interdependent filters in the correct order when the sequence matters.
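
One way to picture the ordering problem is as a dependency graph over filters, where a filter can only be applied after the filters it depends on. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the filter names and their dependencies are invented for illustration and do not come from any benchmark page.

```python
from graphlib import TopologicalSorter

# Hypothetical filters: each key maps to the set of filters that must be applied first.
depends_on = {
    "country": set(),
    "city": {"country"},        # city options appear only after a country is chosen
    "neighborhood": {"city"},   # neighborhood options depend on the chosen city
    "price_range": set(),       # independent of the location filters
}

# static_order() yields every filter after all of its prerequisites.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

Any order produced this way places `country` before `city` and `city` before `neighborhood`; the position of the independent `price_range` filter is not constrained.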

attention

Maintaining goal-directed behavior despite popups, modals, overlays, and distractions.

Dismissing a newsletter modal, cookie banner, and chat widget to reach the actual content.

exploration

Systematically searching through alternatives when the initial approach fails.

Loading additional search results and expanding collapsed sections to find hidden data.

backtracking

Detecting that a chosen path is wrong and reverting to a prior decision point.

Navigating back through a multi-step wizard after discovering a coverage gap.

adversarial

Resisting dark patterns, misleading labels, confirmshaming, and deliberate UI deceptions.

Recognizing that a prominently placed button subscribes rather than completes the purchase.

patience

Waiting for asynchronous content to load and not acting prematurely on incomplete information.

Waiting for a spinner to finish loading additional results before concluding a search.
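
In code, this kind of patience often amounts to polling a condition with a timeout instead of reading the page once. The sketch below is a generic illustration under stated assumptions: `wait_until` and the `FakePage` stand-in are hypothetical names, and a real agent would query its browser driver rather than a simulated object.

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it is truthy or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    The condition is always checked at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

class FakePage:
    """Simulated page whose spinner disappears after a fixed number of checks."""
    def __init__(self, polls_until_loaded):
        self.remaining = polls_until_loaded

    def spinner_visible(self):
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False

page = FakePage(polls_until_loaded=3)
loaded = wait_until(lambda: not page.spinner_visible(), timeout=2.0, interval=0.01)
print(loaded)  # True: the spinner cleared before the timeout
```

Returning False on timeout, rather than raising, lets the caller decide whether to retry, keep waiting, or report that the results never finished loading.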

verification

Confirming that an action achieved its intended effect, especially with misleading feedback.

Checking the sidebar after a success banner to verify settings were actually saved.
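
The sidebar check can be thought of as comparing the intended state against the state read back from the page, independently of any success message. The following is a minimal sketch, assuming a hypothetical `read_actual` callable that returns the settings currently displayed; the setting names and values are made up.

```python
def find_mismatches(expected, read_actual):
    """Compare intended settings with what the page actually shows.

    Returns {setting: (expected_value, actual_value)} for every setting
    that differs; an empty dict means the save is confirmed.
    """
    actual = read_actual()
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }

# Hypothetical scenario: a success banner appeared, but one setting did not persist.
intended = {"notifications": "off", "theme": "dark"}
sidebar_state = {"notifications": "on", "theme": "dark"}

mismatches = find_mismatches(intended, lambda: sidebar_state)
print(mismatches)  # {'notifications': ('off', 'on')}
```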

arithmetic

Performing numerical calculations correctly within multi-step workflows.

Summing invoice line items and verifying the total matches across documents.
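
The invoice example can be sketched as follows. The line items and stated totals are invented for illustration, and `invoice_total` is a hypothetical helper. The sketch uses `decimal.Decimal` rather than floats because binary floating point cannot represent most decimal fractions exactly (in Python, `0.1 + 0.2 == 0.3` is False), which matters when totals must match to the cent.

```python
from decimal import Decimal

def invoice_total(line_items):
    """Sum (quantity, unit_price) pairs; unit prices are given as strings."""
    return sum(
        (Decimal(quantity) * Decimal(unit_price) for quantity, unit_price in line_items),
        Decimal("0"),
    )

# Hypothetical line items: (quantity, unit price)
items = [(2, "19.99"), (1, "5.01"), (3, "0.10")]
computed = invoice_total(items)  # 39.98 + 5.01 + 0.30

stated_on_invoice = Decimal("45.29")
stated_on_receipt = Decimal("45.92")  # transposed digits, as a cross-document check might catch

print(computed == stated_on_invoice)  # True
print(computed == stated_on_receipt)  # False
```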

comprehension

Extracting meaning from complex, multi-part text and following nuanced instructions.

Parsing a long email thread to identify which of several proposed times is conflict-free.

composition

Combining multiple primitives within a single task to achieve a complex goal.

A task requiring exploration to find data, memory to retain it, and planning to act on it.

resilience

Handling failures gracefully: interpreting errors, retrying with modifications, and preserving progress.

Persisting through form submission errors and saving a draft before a session-clearing failure.
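
The pattern of interpreting an error, retrying with a modification, and keeping progress can be sketched as a small loop. Everything below is hypothetical: `submit_with_recovery`, the error code `invalid_phone`, and the fake submit function are illustrative stand-ins, not part of the benchmark.

```python
def submit_with_recovery(submit, form, fixes, max_attempts=3):
    """Submit a form, applying a known fix and retrying when an error is recognized.

    submit(draft) returns None on success or an error code string.
    fixes maps error codes to functions that return a corrected draft.
    Returns (succeeded, draft); the latest draft is returned even on failure
    so that progress is not lost.
    """
    draft = dict(form)  # work on a copy; the caller's data stays untouched
    for _ in range(max_attempts):
        error = submit(draft)
        if error is None:
            return True, draft
        fix = fixes.get(error)
        if fix is None:
            break  # unrecognized error: stop retrying, keep the draft
        draft = fix(draft)
    return False, draft

def fake_submit(draft):
    """Stand-in for a form endpoint that rejects phone numbers containing dashes."""
    return "invalid_phone" if "-" in draft["phone"] else None

fixes = {"invalid_phone": lambda d: {**d, "phone": d["phone"].replace("-", "")}}
original = {"name": "Ada", "phone": "555-0100"}

ok, saved = submit_with_recovery(fake_submit, original, fixes)
print(ok, saved["phone"])  # True 5550100
```

Retrying the identical submission would fail every time here; the retry only helps because the draft is modified in response to the specific error.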

Research Basis

The primitives taxonomy is grounded in published findings on web agent failure modes:

Agents achieve success rates of only 30–61% on realistic web tasks (Online-Mind2Web, COLM 2025)
Injecting realistic network errors causes 70–95% performance drops (WAREX, 2025)
Vision-language agents click adversarial pop-ups 86–100% of the time (PopupAttack, ACL 2025)
A single dark pattern compromises agent intent in 41% of runs (Ersoy et al., IEEE S&P 2026)
Explicit backtracking improves success by ~7.6% on GUI benchmarks (BacktrackAgent, EMNLP 2025)
Separating planning from execution raises the WebArena-Lite success rate to 57.58% (Plan-and-Act, ICML 2025)