A fast, memory-safe library for text extraction from Office documents. Rust core with first-class bindings for Python, Go, C#/.NET, Node.js (native and WASM), and a stable C FFI. Handles DOCX, XLSX, ...
sudo apt-get install wget gawk gcc g++ make cmake automake curl unzip zip bzip2 tar gzip pigz parallel build-essential libncurses5-dev libc6-dev zlib1g zlib1g-dev libtbb-dev libtbb2 python python-dev ...