Tools and research from the Unsupervised team.
Structured workflows for long-running AI agents. Describe a process in plain English, and DeepWork turns it into a reusable skill with quality gates. Works as a plugin for agent harnesses like Claude Code, Codex, and Gemini CLI.
Learn more
A public benchmark for evaluating how well AI tools handle real data analysis tasks. Tests dozens of prompts across 9 categories, with hallucination-focused scoring: tools that fabricate answers lose all points. Includes a leaderboard with scores and video recordings of every test.
Visit dabench.com