We are a lab working on elicitation: structuring both a model's tools and its processes to get the best outputs from it, and building technology to measure the quality of those outputs.
Current models have capabilities far beyond what they show, particularly on tasks that are hard to verify. We believe eliciting models properly is critical to navigating the intelligence explosion safely.
We measure model capabilities, find failure modes, and elicit those capabilities more effectively on the hardest and fuzziest tasks. If you want to get the best agent performance on your tasks, reach out.