Local, token-free audio and video transcription with speaker diarization and screenshot curation. Runs entirely on your machine using WhisperX large-v3 and a token-free pyannote clone. No HuggingFace account ...
#include "sherpa-onnx/csrc/parse-options.h" #include "sherpa-onnx/csrc/wave-reader.h" wget https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker ...
Last time, I wrote about a locally running "away-from-desk summary" tool. It transcribes the audio from a study session and summarizes only the parts where I was away from my seat. At the end of the ...
To isolate children’s speech, a preprocessing pipeline combining automatic speaker diarization and manual verification was applied. Initially, speaker segmentation and timestamp-based utterance ...
In our previous article (Gemma 4 12B In-Depth: A New Model Bringing Full-Scale Multimodality to Laptops via Encoder-Free Architecture), we focused on the architecture and specifications. As a sequel, ...
Gemma 4 12B is Google DeepMind's first medium-sized open model with native audio in 2026, and it runs entirely on a 16 GB laptop. You don't need cloud credits or a data center. The model processes ...