VLM agent
Grounding
Set-of-Mark prompting
HTML + Visuals
oracle grounding
image annotation
Benchmarks
online vs offline eval
Open-ended computer use agents
predicting the future
trial and error?
Judging and iterating
Screen understanding
ScreenVLM
Trained on ScreenParse
ScreenVLM vs OmniParser
Web vs domain generalist
ICL vs SFT
RL? and imitation learning
VL
Safety and Guardrailing