
## Local Transcribe with Whisper

> **🍎 Apple Silicon GPU/NPU acceleration:** This version now supports native Apple GPU/NPU acceleration via [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). On Apple Silicon Macs, transcription runs on the Apple GPU and Neural Engine — no CPU fallback needed.

Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2) on Windows/Linux and [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.

## New in version 3.0!
1. **Apple Silicon GPU/NPU support** — native MLX backend for Apple Silicon Macs, using Apple GPU + Neural Engine.
2. **SRT subtitle export** — valid SubRip files alongside the existing TXT output, ready for HandBrake or any video player.
3. **VAD filter** — removes silence, reduces hallucination, improves accuracy.
4. **Word-level timestamps** — per-word SRT timing for precise subtitle burning.
5. **Translation mode** — transcribe any language and translate to English in one step.
6. **Stop button** — immediately cancel any transcription, including model downloads.
7. **Language dropdown** — 99 languages with proper ISO codes, no more guessing formats.
8. **Model descriptions** — speed, size, quality stars, and use case shown for every model.

## New in version 2.0!
1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage, simpler installation.
2. **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab).
3. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
4. **No admin rights required** — a plain `pip install` covers everything.
5. **No PyTorch dependency** — dramatically smaller install footprint.
6. **Integrated console** — all output is shown inside the application window.
7. **`tiny` model added** — smallest and fastest option.


## Features
* Select the folder containing the audio or video files you want to transcribe. Tested with m4a files.
* Choose the language of the files you are transcribing from a dropdown of 99 supported languages, or let the application automatically detect the language.
* Select the Whisper model to use for the transcription. Available models are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large-v2`, and `large-v3`. Models ending in `.en` are English-only and are the better choice for English audio, especially at the `base` and `small` sizes.
* **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
* **VAD filter** — removes silence from audio before transcription, reducing hallucination and improving accuracy.
* **Word-level timestamps** — generates per-word timing in the SRT output for precise subtitle synchronization.
* **Translation mode** — transcribes audio in any language and translates the result to English.
* **SRT export** — valid SubRip subtitle files saved alongside TXT, ready for HandBrake or any video player.
* Monitor the progress of the transcription with the progress bar and the built-in console.
* Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
* View the transcribed text in a message box once the transcription is completed.
* **Stop button** — immediately cancel transcription, including model downloads.
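
The SRT export listed above follows the SubRip layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the subtitle text, and a blank line between entries. A minimal sketch of that formatting is shown below. It assumes each segment is a `(start, end, text)` tuple with times in seconds; the names `srt_timestamp` and `segments_to_srt` are illustrative and not taken from the application's source.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SubRip HH:MM:SS,mmm timestamp."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between entries.
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 2.5, "Hello there."), (2.5, 5.0, "Welcome back.")]))
```

With word-level timestamps enabled, the same function could be fed one tuple per word instead of one per segment.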

## Installation
### Get the files
Download the zip archive and extract it to your preferred working folder.
![](images/Picture1.png)
Alternatively, clone the repository with:
```
git clone https://gitea.kobim.cloud/kobim/whisper-local-transcribe.git
```
### Prerequisites
Install **Python 3.10 or later**. Some IT policies allow installing from the Microsoft Store or the Mac equivalent, but an install from [python.org](https://www.python.org/downloads/) is preferred. During installation on Windows, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.

### Run on Windows
Double-click `run_Windows.bat` — it will auto-install everything on first run.

### Run on Mac / Linux
Run `./run_Mac.sh` — it will auto-install everything on first run. See [Mac instructions](Mac_instructions.md) for details.

> **Note:** The first run with a given model will download it (~75 MB for base, ~500 MB for medium). After that, everything works offline.

### Manual installation (if the launchers don't work)
If `run_Windows.bat` or `run_Mac.sh` fails (for example, because Python isn't on PATH or because of a permissions issue), open a terminal in the project folder and create a virtual environment manually:
```
python -m venv .venv
```
Activate the virtual environment:
- **Windows:** `.venv\Scripts\activate`
- **Mac / Linux:** `source .venv/bin/activate`

Then install and run:
```
python install.py
python app.py
```
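
If the manual steps misbehave, it can help to confirm that the interpreter meets the Python 3.10 requirement and that the virtual environment is actually active. The check below uses only the standard library; the function name `check_environment` is illustrative and not part of the project.

```python
import sys


def check_environment(min_version=(3, 10)):
    """Return (version_ok, in_venv) for the running interpreter."""
    version_ok = sys.version_info[:2] >= min_version
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


version_ok, in_venv = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if version_ok else 'too old, 3.10 or later is required'}")
print("Virtual environment active" if in_venv else "No virtual environment active")
```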

## GPU Support
### Apple Silicon
On Macs with Apple Silicon, the app automatically uses the **MLX backend**, which runs inference on the Apple GPU and Neural Engine. No additional setup is needed: just install and run. MLX models are downloaded from Hugging Face on first use.
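
How the app decides that it is running on Apple Silicon is not documented here. One common way to make that check in Python, shown only as an assumption about the approach, is to inspect the operating system and CPU architecture. The helper name `is_apple_silicon` is hypothetical.

```python
import platform


def is_apple_silicon(system=None, machine=None) -> bool:
    """Return True for macOS running natively on an arm64 (Apple Silicon) CPU.

    The optional arguments exist only so the check can be exercised with
    made-up values; by default the real platform is inspected.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


print("Apple Silicon detected" if is_apple_silicon() else "Not Apple Silicon")
```

A Python interpreter running under Rosetta typically reports `x86_64`, so this check would return `False` there even on Apple Silicon hardware.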

### NVIDIA GPUs
This program **supports NVIDIA GPUs**, which can significantly reduce transcription time. faster-whisper is built on CTranslate2, which needs the NVIDIA CUDA libraries (cuBLAS and cuDNN) for GPU acceleration.

#### Automatic Detection
The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
```
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```

**Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program automatically detects and uses your GPU if one is available; otherwise it falls back to the CPU.

#### Verifying GPU Support
After installation, you can verify that your GPU is available by running:
```python
import ctranslate2
print(ctranslate2.get_supported_compute_types("cuda"))
```
If this prints a set that includes `"float16"`, CTranslate2 can see your GPU and GPU acceleration should be available.
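
The same CTranslate2 calls can drive the choice between GPU and CPU. The sketch below mirrors the "CUDA · GPU" and "CPU · int8" backends named in this README, but it is an assumption about how such a fallback might look, not the application's actual code; `pick_device` is a hypothetical helper.

```python
def pick_device():
    """Return a (device, compute_type) pair suitable for faster-whisper.

    Falls back to CPU with int8 when CTranslate2 is missing or reports
    no usable CUDA device.
    """
    try:
        import ctranslate2
    except ImportError:
        return "cpu", "int8"

    try:
        if ctranslate2.get_cuda_device_count() > 0:
            supported = ctranslate2.get_supported_compute_types("cuda")
            if "float16" in supported:
                return "cuda", "float16"
            return "cuda", "float32"
    except Exception:
        # Driver or library problems: treat the GPU as unavailable.
        pass
    return "cpu", "int8"


device, compute_type = pick_device()
print(f"Using {device} with {compute_type}")
```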

## Usage
1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates. The backend indicator at the bottom shows which inference engine is active (MLX · Apple GPU/NPU, CUDA · GPU, or CPU · int8).
2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This opens a dialog where you can navigate to the desired folder. Note that you select a whole folder, not individual files.
3. Select the language from the dropdown — 99 languages are available, or leave it on "Auto-detect". For English-only models (.en) the language is locked to English; for KB Swedish models it's locked to Swedish.
4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label. A description below shows speed, size, quality stars, and recommended use case for each model.
5. Toggle advanced options if needed: **VAD filter**, **Word-level timestamps**, or **Translate to English**.
6. Click the "Transcribe" button to start the transcription. Use the "Stop" button to cancel at any time.
7. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
8. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
9. Transcriptions are saved as both `.txt` (human-readable) and `.srt` (SubRip subtitles) in the `transcriptions/` folder within the selected directory.
10. You can run the application again or quit at any time by clicking the "Quit" button.
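
Steps 2 and 9 imply a simple folder layout: each media file in the selected folder gets a `.txt` and an `.srt` file inside a `transcriptions/` subfolder. A sketch of that mapping follows. The extension list, the reuse of the source file's base name, and the function name `plan_outputs` are assumptions for illustration; the app's real file filter and naming may differ.

```python
from pathlib import Path

# Assumed set of extensions, for illustration only.
MEDIA_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".mp4", ".mkv"}


def plan_outputs(folder):
    """Map each media file in `folder` to its (txt_path, srt_path) pair."""
    folder = Path(folder)
    out_dir = folder / "transcriptions"
    plan = {}
    for path in sorted(folder.iterdir()):
        # Subfolders and non-media files are skipped.
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            plan[path] = (out_dir / f"{path.stem}.txt", out_dir / f"{path.stem}.srt")
    return plan


for source, (txt_path, srt_path) in plan_outputs(".").items():
    print(f"{source.name} -> {txt_path}, {srt_path}")
```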

## Jupyter Notebook
Don't want fancy EXEs or GUIs? Call the transcription function directly. See [example](example.ipynb) for an implementation in a Jupyter notebook.
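
For readers who want to skip the GUI entirely, the sketch below calls faster-whisper directly rather than the project's own function, whose name and signature are not shown in this README. The options `vad_filter`, `word_timestamps`, and `task` are parameters of faster-whisper's `transcribe` method. The file name `interview.m4a`, the helper names, and the model-name matching in `resolve_language` are assumptions that mirror the behaviour described under Usage.

```python
def resolve_language(model_name, selected=None):
    """Apply the language locks described under Usage.

    `.en` models are English-only. Matching "kb-whisper" in the name to
    spot the Swedish KB models is a hypothetical convention.
    """
    if model_name.endswith(".en"):
        return "en"
    if "kb-whisper" in model_name.lower():
        return "sv"
    return selected  # None lets faster-whisper auto-detect the language


def transcribe_file(path, model_name="base", language=None, translate=False):
    """Transcribe one file on the CPU and return (start, end, text) tuples."""
    from faster_whisper import WhisperModel  # needs `pip install faster-whisper`

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, info = model.transcribe(
        path,
        language=resolve_language(model_name, language),
        task="translate" if translate else "transcribe",
        vad_filter=True,       # skip silent stretches before decoding
        word_timestamps=True,  # each segment then carries per-word timings
    )
    print(f"Detected language: {info.language}")
    # `segments` is a generator; transcription runs as it is consumed.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


# Example call (downloads the model on first use):
# for start, end, text in transcribe_file("interview.m4a"):
#     print(f"[{start:.2f} -> {end:.2f}] {text}")
```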

[![DOI](https://zenodo.org/badge/617404576.svg)](https://zenodo.org/badge/latestdoi/617404576)