OmniVoice uses a Qwen3 backbone (28 layers, hidden_size=1024, ~500M params) as its LLM component. The baseline TRT-LLM engine runs in FP16 on NVIDIA L4. Quantization provides two benefits: Reduced ...
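As a rough illustration of why precision matters for a model of this size, the sketch below estimates the weight-only footprint of the LLM component at a few bit widths. It assumes exactly 500M parameters (the text only says "~500M"), and the INT8 and INT4 rows are illustrative choices, not formats this document confirms. It ignores quantization scales and zero-points, the KV cache, activations, and any non-LLM parts of the model, so real engine sizes will differ.

```python
# Back-of-the-envelope weight memory for a ~500M-parameter backbone.
# ASSUMPTION: 500M is a rounded figure taken from the text ("~500M params").
PARAMS = 500_000_000

# Bytes per stored weight. FP16 is the stated baseline; the INT8 and INT4
# entries are illustrative assumptions, not formats named by the source.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gib(num_params: int, bytes_per_weight: float) -> float:
    """Return the weight-only footprint in GiB, ignoring scales and metadata."""
    return num_params * bytes_per_weight / 2**30


footprints = {
    name: weight_footprint_gib(PARAMS, width)
    for name, width in BYTES_PER_WEIGHT.items()
}

for name, gib in footprints.items():
    print(f"{name}: {gib:.2f} GiB")
```

Under these assumptions the footprint scales linearly with bits per weight, so halving the bit width halves the bytes that must be stored and read for the weights.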
Custom Node Testing
I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)

Your question
ValueError: Failed to load model from file: ...