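In the first half of 2025, AI Agents developed at breakneck speed, igniting an “everything can be an Agent” craze. The craze showed up first at the technical foundation: a fierce “arms race” among models ...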
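SaaS-Bench, a new study. In judging whether an Agent is reliable, only one core metric matters: did it actually finish the job? The industry's usual approach is roughly this: give the Agent ...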
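People who frequently switch between Agents such as CC, Codex, and OpenClaw will notice that the same model can perform completely differently inside different systems. Recently, a research team from CMU, Yale University, Virginia Tech, Amazon, and other institutions systematically surveyed more than 170 open-source projects, summarized the engineering experience of OpenAI, Anthropic, LangChain, and a large number of open-source Agent projects, and stated clearly that the “harness system (Harness)” wrapped around the model is what determines whether an Agent is stable, ...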
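The core value of ToolCUA is that it identifies a key turning point in CUA training: once an Agent moves from GUI-only into a hybrid action space, the capability bottleneck shifts from “can it understand the interface” to “can it orchestrate multiple action paths.” The answer to this question looks like it should be yes ...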
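SaaS-Bench tested Agents on 106 tasks across 23 open-source SaaS systems, and the result was a wipeout across the board, exposing four fatal flaws in real environments and showing that Agents are still far from truly doing work in place of people. Imagine a real workday: a project manager needs to update project status, finance staff need to sort out customer bills, and a medical administrator needs to verify appointments and insurance information. These are not advanced ...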
What is a computer use agent? One of the big downsides of AI chatbots was that they were originally limited to their conversational interface, but that's now changing. With Claude computer use and ...
A new framework from researchers at The University of Hong Kong (HKU) and collaborating institutions provides an open source foundation for creating robust AI agents that can operate computers. The ...
The demos look remarkable. An AI agent opens a browser, navigates a website, fills out a form, and books a flight, all without a human touching the keyboard. Over the past several months, a wave of ...
Perplexity, the AI-powered search company valued at $20 billion, on Wednesday launched what it calls the most ambitious product in its three-year history: a multi-model agent orchestration platform ...
Hackers are using AI agents to outsmart old logins. It’s time to ditch passwords and move to phishing-proof credentials like passkeys. For years, organizations have relied on passwords and ...
Anthropic is pushing Claude beyond chat into “agent” work for non-coders. Cowork repackages the computer-using capabilities behind Claude Code into a simpler macOS experience where users can assign ...