We use a language model (LM) to aggregate the outputs of 2+ vision-language models (VLMs). Our model assemble approach is named Cola (COordinative LAnguage model or visual reasoning). Cola is most ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果一些您可能无法访问的结果已被隐去。
显示无法访问的结果