feat: update README to reflect Apple Silicon GPU support and new features in version 3.0

2026-04-11 14:16:07 +02:00
parent 0293a13177
commit e2e19940dd
## Local Transcribe with Whisper
> **🍎 Apple Silicon GPU/NPU acceleration:** This version now supports native Apple GPU/NPU acceleration via [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). On Apple Silicon Macs, transcription runs on the Apple GPU and Neural Engine — no CPU fallback needed.
Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2) on Windows/Linux and [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
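The split between backends can be sketched as a simple platform check (a minimal illustration — the function name and exact detection logic are assumptions, not the app's actual code):

```python
import platform

def pick_backend(system: str, machine: str, cuda_available: bool = False) -> str:
    """Return which inference backend to use: "mlx", "cuda", or "cpu".

    Simplified sketch -- the app's real detection logic may differ.
    """
    if system == "Darwin" and machine == "arm64":
        return "mlx"   # Apple GPU / Neural Engine via MLX Whisper
    if cuda_available:
        return "cuda"  # faster-whisper (CTranslate2) on an NVIDIA GPU
    return "cpu"       # faster-whisper int8 on CPU

# On the current machine:
backend = pick_backend(platform.system(), platform.machine())
```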
## New in version 3.0!
1. **Apple Silicon GPU/NPU support** — native MLX backend for Apple Silicon Macs, using Apple GPU + Neural Engine.
2. **SRT subtitle export** — valid SubRip files alongside the existing TXT output, ready for HandBrake or any video player.
3. **VAD filter** — removes silence, reduces hallucination, improves accuracy.
4. **Word-level timestamps** — per-word SRT timing for precise subtitle burning.
5. **Translation mode** — transcribe any language and translate to English in one step.
6. **Stop button** — immediately cancel any transcription, including model downloads.
7. **Language dropdown** — 99 languages with proper ISO codes, no more guessing formats.
8. **Model descriptions** — speed, size, quality stars, and use case shown for every model.
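The SRT export boils down to formatting each segment's start and end times in SubRip's `HH:MM:SS,mmm` notation and numbering the cues; a minimal sketch (helper names are illustrative, not the app's actual code):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as SubRip's HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One numbered SubRip cue: index, time range, text, trailing blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

Concatenating the cues in order yields a file any video player or HandBrake can read.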
## New in version 2.0!
1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage, simpler installation.
## Features
* Select the folder containing the audio or video files you want to transcribe. Tested with m4a files.
* Choose the language of the files you are transcribing from a dropdown of 99 supported languages, or let the application automatically detect the language.
* Select the Whisper model to use for the transcription. Available models include "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v2", and "large-v3". Models with .en ending are better if you're transcribing English, especially the base and small models.
* **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
* Enable the verbose mode to receive detailed information during the transcription process.
* **VAD filter** — removes silence from audio before transcription, reducing hallucination and improving accuracy.
* **Word-level timestamps** — generates per-word timing in the SRT output for precise subtitle synchronization.
* **Translation mode** — transcribes audio in any language and translates the result to English.
* **SRT export** — valid SubRip subtitle files saved alongside TXT, ready for HandBrake or any video player.
* Monitor the progress of the transcription with the progress bar and terminal.
* Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
* View the transcribed text in a message box once the transcription is completed.
* **Stop button** — immediately cancel transcription, including model downloads.
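To illustrate the idea behind the VAD filter — keep only stretches of audio with enough energy to contain speech — here is a toy energy-threshold version. This is purely conceptual: the app relies on faster-whisper's built-in VAD, which uses a trained model rather than a fixed threshold.

```python
def drop_silence(samples: list[float], frame: int = 4, threshold: float = 0.01) -> list[float]:
    """Keep only frames whose mean absolute amplitude exceeds the threshold.

    Toy illustration of voice-activity detection; real VADs use trained
    models, not a fixed energy cutoff.
    """
    kept: list[float] = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        if sum(abs(x) for x in chunk) / len(chunk) > threshold:
            kept.extend(chunk)
    return kept
```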
## Installation
### Get the files
Download the zip folder and extract it to your preferred working folder.
![](images/Picture1.png)
Or clone the repository with:
```
git clone https://gitea.kobim.cloud/kobim/whisper-local-transcribe.git
```
### Prerequisites
Install **Python 3.10 or later**. Some IT policies allow installing from the Microsoft Store or the Mac equivalent, but I recommend installing from [python.org](https://www.python.org/downloads/). During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
```
python app.py
```
## GPU Support
### Apple Silicon
On Macs with Apple Silicon, the app automatically uses the **MLX backend**, which runs inference on the Apple GPU and Neural Engine. No additional setup is needed — just install and run. MLX models are downloaded from HuggingFace on first use.
### NVIDIA GPUs
This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription times. faster-whisper uses CTranslate2, which requires NVIDIA CUDA libraries for GPU acceleration.
#### Automatic Detection
The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
```
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```
**Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program will automatically detect and use your GPU if available, otherwise it falls back to CPU.
#### Verifying GPU Support
After installation, you can verify that your GPU is available by running:
```python
import ctranslate2
print(ctranslate2.get_supported_compute_types("cuda"))
```
If this returns a list containing `"float16"`, GPU acceleration is working.
## Usage
1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates. The backend indicator at the bottom shows which inference engine is active (MLX · Apple GPU/NPU, CUDA · GPU, or CPU · int8).
2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
3. Select the language from the dropdown — 99 languages are available, or leave it on "Auto-detect". For English-only models (.en) the language is locked to English; for KB Swedish models it's locked to Swedish.
4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label. A description below shows speed, size, quality stars, and recommended use case for each model.
5. Toggle advanced options if needed: **VAD filter**, **Word-level timestamps**, or **Translate to English**.
6. Click the "Transcribe" button to start the transcription. Use the "Stop" button to cancel at any time.
7. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
8. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
9. Transcriptions are saved as both `.txt` (human-readable) and `.srt` (SubRip subtitles) in the `transcriptions/` folder within the selected directory.
10. You can run the application again or quit at any time by clicking the "Quit" button.
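The output layout from step 9 can be sketched as a small path helper (function and folder names are taken from the description above; the helper itself is illustrative, not the app's code):

```python
from pathlib import Path

def output_paths(media_file: str) -> tuple[Path, Path]:
    """Return the .txt and .srt targets inside the source folder's
    transcriptions/ subdirectory."""
    src = Path(media_file)
    out_dir = src.parent / "transcriptions"
    return out_dir / f"{src.stem}.txt", out_dir / f"{src.stem}.srt"
```

So a file `/data/interview.m4a` produces `/data/transcriptions/interview.txt` and `/data/transcriptions/interview.srt`.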
## Jupyter Notebook
Don't want fancy EXEs or GUIs? Use the function as is. See [example](example.ipynb) for an implementation on Jupyter Notebook.