Microsoft has unveiled a major upgrade to the Researcher agent in Microsoft 365 Copilot, bringing a multi-model approach directly into real-world work scenarios. Instead of relying on a single model, multiple models now work together, each taking on clearly defined roles within the workflow.
Key highlights:
- Critique: A new review layer in which Anthropic’s Claude reviews responses generated by OpenAI’s GPT models, improving output quality before delivery.
- Model Council: Enables direct side-by-side comparison of responses from different models.
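The two patterns above can be sketched in a few lines. This is a hypothetical illustration only: the function names, the stubbed model calls, and the orchestration logic are assumptions, not Microsoft's actual implementation, which has not been published.

```python
# Hedged sketch of two multi-model patterns: a generate-critique-revise
# pipeline and a "model council" that gathers answers side by side.
# All model calls below are hypothetical stubs, not real APIs.
from typing import Callable, Dict


def generator_model(prompt: str) -> str:
    # Stand-in for the drafting model (e.g. a GPT-class model).
    return f"[draft answer to: {prompt}]"


def critic_model(draft: str) -> str:
    # Stand-in for the reviewing model (e.g. a Claude-class model).
    return f"[critique of: {draft}]"


def reviser_model(draft: str, critique: str) -> str:
    # Stand-in for a revision pass that applies the critique.
    return f"[revision of {draft} using {critique}]"


def critique_pipeline(prompt: str) -> str:
    """One model drafts, a second critiques, then the draft is revised."""
    draft = generator_model(prompt)
    critique = critic_model(draft)
    return reviser_model(draft, critique)


def model_council(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Collect responses from several models for side-by-side comparison."""
    return {name: model(prompt) for name, model in models.items()}
```

In a real deployment, the stubs would be replaced by calls to the respective model backends, and the critique step could loop until the reviewer raises no further objections.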
Microsoft is also making Copilot Cowork available through the Frontier Program, enabling long-running, multi-step work processes within Microsoft 365 Copilot.
These developments are an early but important example of the shift toward multi-model systems in enterprise AI. The results are measurable: Researcher with Critique leads all evaluated deep research systems on the industry-standard DRACO benchmark, significantly outperforming single-model approaches.

DRACO benchmark (Deep Research Accuracy, Completeness and Objectivity): an evaluation of 100 complex research tasks across 10 domains. All results are taken from the original study [Zhong et al., arXiv:2602.11685 (February 2026)], with the exception of Researcher with Critique, which improves the aggregated overall score by +7.0 points (SEM ±1.90) and surpasses Perplexity Deep Research (Claude Opus 4.6), the top-ranked system in the study, by +13.88%.
Read the full original English article here: Introducing multi-model intelligence in Researcher