Compare commits

42 Commits

Author SHA1 Message Date
kobim e2e19940dd feat: update README to reflect Apple Silicon GPU support and new features in version 3.0 2026-04-11 14:16:07 +02:00
kobim 0293a13177 feat: add advanced transcription options for VAD, word-level timestamps, and translation 2026-04-11 14:06:04 +02:00
kobim 8d5c8d6097 feat: implement multiprocessing for transcription with immediate cancellation 2026-04-05 22:11:13 +02:00
kobim e29572420e feat: enhance transcription capabilities with MLX support and backend detection 2026-04-04 00:32:36 +02:00
Kristofer Söderström f7d621e510 Add timestamps toggle and update transcription format to include/exclude timestamps 2026-03-20 20:19:46 +01:00
Kristofer Rolf Söderström 2a1df6aeba Update Python installation instructions in README
Clarified installation instructions for Python 3.10 or later, specifying preferred installation method.
2026-03-03 08:35:03 +01:00
soderstromkr 58255c3d10 fix: Linux/Ubuntu support — icon fallback, HiDPI scaling, CUDA lib paths, per-file timing
- app.py: graceful icon loading (no crash on Linux Tk without .ico support)
- app.py: auto-detect display scaling for 4K/HiDPI screens
- _LocalTranscribe.py: register NVIDIA pip-package .so paths on Linux (LD_LIBRARY_PATH)
  so faster-whisper finds libcublas/libcudnn at runtime
- _LocalTranscribe.py: auto-fallback to CPU if CUDA runtime libs missing
- _LocalTranscribe.py: filter input to supported media extensions only
- _LocalTranscribe.py: show real decode errors instead of generic skip message
- _LocalTranscribe.py: per-file timer showing wall-clock vs audio duration
2026-03-02 21:49:32 +01:00
Kristofer Söderström ea43074852 Update README.md: Add manual installation instructions for troubleshooting launcher issues 2026-03-02 17:17:35 +01:00
Kristofer Söderström 7b81778d9e Update README.md: Simplify installation instructions and clarify auto-installation process 2026-03-02 17:16:09 +01:00
Kristofer Söderström e65462f57b Update README.md: Add link to classic release in Mac user note 2026-03-02 17:13:05 +01:00
Kristofer Söderström 09e3e43c51 Update README.md: Reorder features for clarity and emphasize integrated console 2026-03-02 17:11:13 +01:00
Kristofer Söderström d4c26f6c37 Update README.md: Rearrange new features for clarity and highlight Swedish-optimised models 2026-03-02 17:10:15 +01:00
Kristofer Söderström acb6947f87 Update README.md: Revise installation instructions and clarify platform-specific run commands 2026-03-02 17:04:59 +01:00
Kristofer Söderström f8cf42733d Revamp: embedded console, faster-whisper, simplified install 2026-03-02 17:02:16 +01:00
Kristofer Rolf Söderström 7d3fe1ba26 Merge pull request #11 from soderstromkr/copilot/update-whisper-device-parameter
Pass explicit device parameter to whisper.load_model() for MPS acceleration
2026-01-22 14:03:13 +01:00
copilot-swe-agent[bot] da42a6e4cc Add .gitignore and remove __pycache__ files
Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>
2026-01-22 13:00:38 +00:00
copilot-swe-agent[bot] 0dab0d9bea Add explicit device parameter to whisper.load_model()
Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>
2026-01-22 13:00:21 +00:00
copilot-swe-agent[bot] 953c71ab28 Initial plan 2026-01-22 12:57:09 +00:00
Kristofer Rolf Söderström 5522bdd575 Merge pull request #6
Merged pull request #6
2026-01-22 13:53:23 +01:00
Kristofer Rolf Söderström 861c470330 Merge pull request #10 from soderstromkr/copilot/add-readme-gpu-support
Add GPU support documentation to README
2026-01-22 13:44:11 +01:00
copilot-swe-agent[bot] 6de6d4b2ff Add GPU support section to README with CUDA PyTorch installation instructions
Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>
2026-01-22 12:42:09 +00:00
copilot-swe-agent[bot] 01552cc7cb Initial plan 2026-01-22 12:40:19 +00:00
Yaroslav P 049a168c81 amd graphic card support 2025-03-05 16:23:10 +02:00
Kristofer Rolf Söderström 56a925463f Update README.md 2024-05-17 08:51:16 +02:00
Kristofer Rolf Söderström fe60b04020 Update README.md 2024-05-17 08:49:28 +02:00
Kristofer Rolf Söderström ff06a257f2 Update README.md 2024-05-17 08:47:57 +02:00
Kristofer Rolf Söderström 5e31129ea2 Create requirements.txt 2024-05-17 08:44:39 +02:00
Kristofer Rolf Söderström 3f0bca02b7 Update README.md 2024-05-17 08:44:09 +02:00
Kristofer Rolf Söderström 488e78a5ae Update README.md 2024-05-17 08:42:42 +02:00
Kristofer Rolf Söderström 829a054300 Update README.md 2024-05-17 08:40:42 +02:00
Kristofer Rolf Söderström 462aae12ca Update README.md 2024-05-17 08:09:30 +02:00
Kristofer Rolf Söderström fec9190ba1 Update README.md 2024-05-17 08:08:51 +02:00
Kristofer Rolf Söderström 0dde25204d Update README.md
removed other installation options from readme
2024-05-17 08:07:00 +02:00
Kristofer Söderström b611aa6b8c removed messagebox 2023-11-06 10:13:04 +01:00
Kristofer Söderström 7d50d5f4cf QOL improvements 2023-11-06 09:57:44 +01:00
Kristofer Söderström 7799d03960 bug fixes 2023-11-06 09:31:53 +01:00
Kristofer Rolf Söderström f88186dacc Update app.py 2023-10-19 09:26:43 +02:00
Kristofer Rolf Söderström 3f5c1491ac Delete build.zip 2023-10-19 09:20:55 +02:00
Kristofer Rolf Söderström c83e15bdba Update README.md 2023-10-19 09:20:29 +02:00
Kristofer Rolf Söderström ff16ad30e1 Merge pull request #2 from ValentinFunk/patch-1
Fix mac instructions link
2023-10-19 09:09:01 +02:00
Valentin 622165b3e6 Update Mac_instructions.md 2023-09-08 10:11:02 +02:00
Valentin 0e9cbdca58 Fix mac instructions link 2023-09-08 10:09:15 +02:00
13 changed files with 1122 additions and 191 deletions
+26
@@ -0,0 +1,26 @@
# Python cache
__pycache__/
*.py[cod]
*$py.class
# Virtual environments
venv/
env/
ENV/
.venv/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Build artifacts
dist/
build/
*.egg-info/
+30 -8
@@ -1,9 +1,31 @@
### How to run on Mac
Unfortunately, I have not found a permanent solution for this; not being a Mac user has limited the ways I can test it.
#### Recommended steps
1. Open a terminal and navigate to the root folder (the downloaded folder).
1. You can also right-click (or equivalent) on the root folder to open a Terminal within the folder.
2. Run the following command:
```
python main.py
```
### How to run on Mac / Linux
#### Quick start
1. Open Terminal and navigate to the project folder (or right-click the folder and select "Open in Terminal").
2. Make the script executable (only needed once):
```
chmod +x run_Mac.sh
```
3. Run it:
```
./run_Mac.sh
```
This will automatically:
- Create a virtual environment (`.venv`)
- Install all dependencies (no admin rights needed)
- Launch the app
#### Manual steps (alternative)
If you prefer to do it manually:
```
python3 -m venv .venv
.venv/bin/python install.py
.venv/bin/python app.py
```
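For reference, the manual steps above amount to three commands. A minimal sketch, assuming a POSIX layout (`.venv/bin/python`); the helper below is hypothetical and only mirrors the shell steps:

```python
import os

def manual_install_commands(project_dir="."):
    """Return the manual-install command lines, as argument lists."""
    venv_python = os.path.join(project_dir, ".venv", "bin", "python")
    return [
        ["python3", "-m", "venv", os.path.join(project_dir, ".venv")],
        [venv_python, "install.py"],  # install dependencies into .venv
        [venv_python, "app.py"],      # launch the app
    ]
```

Each list could be passed to `subprocess.check_call` to replicate what the launcher script does.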
#### Notes
- **Python 3.10+** is required. macOS users can install it from [python.org](https://www.python.org/downloads/) or via `brew install python`.
- **No FFmpeg install needed** — audio decoding is bundled.
- **GPU acceleration** is not available on macOS (Apple Silicon MPS is not supported by CTranslate2). CPU with int8 quantization is still fast.
- On Apple Silicon (M1/M2/M3/M4), the `small` or `base` models run well. `medium` works but is slower.
+88 -50
@@ -1,75 +1,113 @@
## Local Transcribe with Whisper
Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
## Local Transcribe with Whisper
> **🍎 Apple Silicon GPU/NPU acceleration:** This version now supports native Apple GPU/NPU acceleration via [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). On Apple Silicon Macs, transcription runs on the Apple GPU and Neural Engine — no CPU fallback needed.
Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2) on Windows/Linux and [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
## New in version 3.0!
1. **Apple Silicon GPU/NPU support** — native MLX backend for Apple Silicon Macs, using Apple GPU + Neural Engine.
2. **SRT subtitle export** — valid SubRip files alongside the existing TXT output, ready for HandBrake or any video player.
3. **VAD filter** — removes silence, reduces hallucination, improves accuracy.
4. **Word-level timestamps** — per-word SRT timing for precise subtitle burning.
5. **Translation mode** — transcribe any language and translate to English in one step.
6. **Stop button** — immediately cancel any transcription, including model downloads.
7. **Language dropdown** — 99 languages with proper ISO codes, no more guessing formats.
8. **Model descriptions** — speed, size, quality stars, and use case shown for every model.
## New in version 2.0!
1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage, simpler installation.
2. **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab)
3. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
4. **No admin rights required** — a plain `pip install` covers everything.
5. **No PyTorch dependency** — dramatically smaller install footprint.
6. **Integrated console** — all info in the same application.
7. **`tiny` model added** — smallest and fastest option.
## New in version 1.2!
1. Simpler usage:
1. File type: You no longer need to specify file type. The program will only transcribe eligible files.
2. Language: Added option to specify language, which might help in some cases. Clear the default text to run automatic language recognition.
3. Model selection: Now a dropdown option that includes most models for typical use.
2. New and improved GUI.
![python GUI.py](images/gui-windows.png)
3. Executable: On Windows and don't want to install python? Try the Exe file! See below for instructions (Experimental)
## Features
* Select the folder containing the audio or video files you want to transcribe. Tested with m4a video.
* Choose the language of the files you are transcribing. You can either select a specific language or let the application automatically detect the language.
* Select the Whisper model to use for the transcription. Available models include "base.en", "base", "small.en", "small", "medium.en", "medium", and "large". Models with .en ending are better if you're transcribing English, especially the base and small models.
* Enable the verbose mode to receive detailed information during the transcription process.
* Monitor the progress of the transcription with the progress bar and terminal.
* Select the folder containing the audio or video files you want to transcribe. Tested with m4a video.
* Choose the language of the files you are transcribing from a dropdown of 99 supported languages, or let the application automatically detect the language.
* Select the Whisper model to use for the transcription. Available models include "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v2", and "large-v3". Models with .en ending are better if you're transcribing English, especially the base and small models.
* **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
* **VAD filter** — removes silence from audio before transcription, reducing hallucination and improving accuracy.
* **Word-level timestamps** — generates per-word timing in the SRT output for precise subtitle synchronization.
* **Translation mode** — transcribes audio in any language and translates the result to English.
* **SRT export** — valid SubRip subtitle files saved alongside TXT, ready for HandBrake or any video player.
* Monitor the progress of the transcription with the progress bar and terminal.
* Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
* View the transcribed text in a message box once the transcription is completed.
* **Stop button** — immediately cancel transcription, including model downloads.
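The SRT export mentioned above follows the SubRip format: a numeric index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text. A minimal sketch of that formatting — the helper names are illustrative, not the app's actual functions:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one SubRip cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

For example, `srt_timestamp(3661.25)` yields `01:01:01,250`; cues are joined with blank lines to form a valid `.srt` file.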
## Installation
### Get the files
Download the zip folder and extract it to your preferred working folder.
![](images/Picture1.png)
Download the zip folder and extract it to your preferred working folder.
![](images/Picture1.png)
Or by cloning the repository with:
```
git clone https://github.com/soderstromkr/transcribe.git
git clone https://gitea.kobim.cloud/kobim/whisper-local-transcribe.git
```
### Executable Version **(Experimental. Windows only)**
The executable version of Local Transcribe with Whisper is a standalone program and should work out of the box. This experimental version is available if you have Windows and do not have (or don't want to install) Python and the additional dependencies. However, it requires more disk space (around 1 GB), has no GPU acceleration, and has only been lightly tested for bugs. Let me know if you run into any issues!
1. Download the project folder. As the image above shows.
2. Find and unzip build.zip (get a coffee or a tea, this might take a while depending on your computer)
3. Run the executable (app.exe) file.
### Python Version **(any platform including Mac users)**
This is recommended if you don't have Windows, if you have Windows and already use Python, or if you want GPU acceleration (PyTorch and CUDA) for faster transcriptions. I would generally recommend this method anyway, but I understand that not everyone wants to go through the installation process for Python, Anaconda, and the other required packages.
1. This script was made and tested in an Anaconda environment with Python 3.10. I recommend this method if you're not familiar with Python.
See [here](https://docs.anaconda.com/anaconda/install/index.html) for instructions. You might need administrator rights.
2. Whisper requires some additional libraries. The [setup](https://github.com/openai/whisper#setup) page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files."
Users might not need to specifically install Transformers. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables. From the Anaconda prompt, type or copy the following:
### Prerequisites
Install **Python 3.10 or later**. Some IT policies allow installing from the Microsoft Store or the Mac equivalent, but I recommend installing from [python.org](https://www.python.org/downloads/). During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
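The version prerequisite can be checked programmatically. A minimal sketch; `check_python` is a hypothetical helper, not part of the app:

```python
import sys

def check_python(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version
```

Running `check_python()` on an old interpreter returns `False`, so a launcher could abort early with a clear message.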
### Run on Windows
Double-click `run_Windows.bat` — it will auto-install everything on first run.
### Run on Mac / Linux
Run `./run_Mac.sh` — it will auto-install everything on first run. See [Mac instructions](Mac_instructions.md) for details.
> **Note:** The first run with a given model will download it (~75 MB for base, ~500 MB for medium). After that, everything works offline.
### Manual installation (if the launchers don't work)
If `run_Windows.bat` or `run_Mac.sh` fails (e.g. Python isn't on PATH, or permissions issues), open a terminal in the project folder and run these steps manually:
```
python -m venv .venv
```
Activate the virtual environment:
- **Windows:** `.venv\Scripts\activate`
- **Mac / Linux:** `source .venv/bin/activate`
Then install and run:
```
python install.py
python app.py
```
## GPU Support
### Apple Silicon
On Macs with Apple Silicon, the app automatically uses the **MLX backend**, which runs inference on the Apple GPU and Neural Engine. No additional setup is needed — just install and run. MLX models are downloaded from HuggingFace on first use.
### NVIDIA GPUs
This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription. faster-whisper uses CTranslate2, which requires NVIDIA CUDA libraries for GPU acceleration.
#### Automatic Detection
The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
```
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```
**Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program will automatically detect and use your GPU if available; otherwise it falls back to CPU.
#### Verifying GPU Support
After installation, you can verify that your GPU is available by running:
```python
import ctranslate2
print(ctranslate2.get_supported_compute_types("cuda"))
```
If this returns a list containing `"float16"`, GPU acceleration is working.
```
conda install -c conda-forge ffmpeg-python
```
3. The main functionality comes from openai-whisper. See their [page](https://github.com/openai/whisper) for details. As of 2023-03-22 you can install it via:
```
pip install -U openai-whisper
```
4. The app is built on Tkinter and TTKthemes. Make sure they are installed in your Python build; you can install them via pip:
```
pip install tkinter
```
and
```
pip install customtkinter
```
5. Run the app:
1. For **Windows**: In the same folder as the *app.py* file, run the app from the terminal with ```python app.py```, or use the batch file run_Windows.bat, which assumes you have conda installed and are in the base environment (this is for simplicity, but users are usually advised to create an environment; see [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands) for more info). Just make sure you have the correct environment (right-click the file and press Edit to make any changes). If you want to download a model first and then go offline for transcription, I recommend running the model with the default sample folder, which will download the model locally.
2. For **Mac**: I haven't figured out a better way to do this; see [the instructions here](Mac_instructions.txt)
## Usage
1. When launched, the app will also open a terminal that shows some additional information.
1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates. The backend indicator at the bottom shows which inference engine is active (MLX · Apple GPU/NPU, CUDA · GPU, or CPU · int8).
2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
3. Enter the desired language for the transcription in the "Language" field. You can either select a language or leave it blank to enable automatic language detection.
4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label.
5. Enable the verbose mode by checking the "Verbose" checkbox if you want to receive detailed information during the transcription process.
6. Click the "Transcribe" button to start the transcription. The button will be disabled during the process to prevent multiple transcriptions at once.
7. Monitor the progress of the transcription with the progress bar.
8. Once the transcription is completed, a message box will appear displaying the transcribed text. Click "OK" to close the message box.
9. You can run the application again or quit the application at any time by clicking the "Quit" button.
3. Select the language from the dropdown — 99 languages are available, or leave it on "Auto-detect". For English-only models (.en) the language is locked to English; for KB Swedish models it's locked to Swedish.
4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label. A description below shows speed, size, quality stars, and recommended use case for each model.
5. Toggle advanced options if needed: **VAD filter**, **Word-level timestamps**, or **Translate to English**.
6. Click the "Transcribe" button to start the transcription. Use the "Stop" button to cancel at any time.
7. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
8. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
9. Transcriptions are saved as both `.txt` (human-readable) and `.srt` (SubRip subtitles) in the `transcriptions/` folder within the selected directory.
10. You can run the application again or quit at any time by clicking the "Quit" button.
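The backend indicator described in step 1 implies a detection order of MLX, then CUDA, then CPU. A hedged sketch of such a check — a `detect_backend` function exists in the app's source, but this body is an illustration under assumed module names (`mlx_whisper`), not the actual implementation:

```python
import importlib.util
import platform

def detect_backend() -> str:
    """Pick a backend label: MLX on Apple Silicon, else CUDA, else CPU int8."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        # Assumes the MLX Whisper package installs as module "mlx_whisper".
        if importlib.util.find_spec("mlx_whisper") is not None:
            return "MLX · Apple GPU/NPU"
    try:
        import ctranslate2
        if "float16" in ctranslate2.get_supported_compute_types("cuda"):
            return "CUDA · GPU"
    except Exception:
        pass  # ctranslate2 missing or no usable CUDA runtime
    return "CPU · int8"
```

On a machine without Apple Silicon or CUDA libraries, this falls through to the CPU label, matching the app's described fallback behaviour.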
## Jupyter Notebook
Don't want fancy EXEs or GUIs? Use the function as is. See [example](example.ipynb) for an implementation on Jupyter Notebook.
[^1]: Advanced users can use ```pip install ffmpeg-python``` but be ready to deal with some [PATH issues](https://stackoverflow.com/questions/65836756/python-ffmpeg-wont-accept-path-why), which I encountered in Windows 11.
[![DOI](https://zenodo.org/badge/617404576.svg)](https://zenodo.org/badge/latestdoi/617404576)
+338 -65
@@ -1,23 +1,170 @@
import os
import sys
import tkinter as tk
from tkinter import ttk
from tkinter import filedialog
from tkinter import messagebox
from src._LocalTranscribe import transcribe, get_path
from src._LocalTranscribe import transcribe, get_path, detect_backend, _transcribe_worker_process
import multiprocessing as mp
import customtkinter
import threading
from colorama import Back, Fore
import colorama
colorama.init(autoreset=True)
# ── Helper: redirect stdout/stderr into a CTkTextbox ──────────────────────
import re
_ANSI_RE = re.compile(r'\x1b\[[0-9;]*m') # strip colour codes
class _ConsoleRedirector:
"""Redirects output exclusively to the in-app console panel."""
def __init__(self, text_widget):
self.widget = text_widget
def write(self, text):
clean = _ANSI_RE.sub('', text) # strip ANSI colours
if clean.strip() == '':
return
# Schedule UI update on the main thread
try:
self.widget.after(0, self._append, clean)
except Exception:
pass
def _append(self, text):
self.widget.configure(state='normal')
self.widget.insert('end', text + ('\n' if not text.endswith('\n') else ''))
self.widget.see('end')
self.widget.configure(state='disabled')
def flush(self):
pass
# HuggingFace model IDs for non-standard models
HF_MODEL_MAP = {
'KB Swedish (tiny)': 'KBLab/kb-whisper-tiny',
'KB Swedish (base)': 'KBLab/kb-whisper-base',
'KB Swedish (small)': 'KBLab/kb-whisper-small',
'KB Swedish (medium)': 'KBLab/kb-whisper-medium',
'KB Swedish (large)': 'KBLab/kb-whisper-large',
}
# Per-model info shown in the UI description label
# (speed, size, quality stars, suggested use)
MODEL_INFO = {
'tiny': ('Very fast', '~75 MB', '★★☆☆☆', 'Quick drafts & testing'),
'tiny.en': ('Very fast', '~75 MB', '★★☆☆☆', 'Quick drafts & testing (English only)'),
'base': ('Fast', '~145 MB', '★★★☆☆', 'Notes & short podcasts'),
'base.en': ('Fast', '~145 MB', '★★★☆☆', 'Notes & short podcasts (English only)'),
'small': ('Balanced', '~485 MB', '★★★★☆', 'Everyday use'),
'small.en': ('Balanced', '~485 MB', '★★★★☆', 'Everyday use (English only)'),
'medium': ('Accurate', '~1.5 GB', '★★★★☆', 'Professional content'),
'medium.en': ('Accurate', '~1.5 GB', '★★★★☆', 'Professional content (English only)'),
'large-v2': ('Slow', '~3 GB', '★★★★★', 'Maximum accuracy'),
'large-v3': ('Slow', '~3 GB', '★★★★★', 'Maximum accuracy (recommended)'),
'KB Swedish (tiny)': ('Very fast', '~75 MB', '★★★☆☆', 'Swedish — optimised by KBLab'),
'KB Swedish (base)': ('Fast', '~145 MB', '★★★☆☆', 'Swedish — optimised by KBLab'),
'KB Swedish (small)': ('Balanced', '~485 MB', '★★★★☆', 'Swedish — optimised by KBLab'),
'KB Swedish (medium)': ('Accurate', '~1.5 GB', '★★★★☆', 'Swedish — optimised by KBLab'),
'KB Swedish (large)': ('Slow', '~3 GB', '★★★★★', 'Swedish — KBLab, best accuracy'),
}
customtkinter.set_appearance_mode("System")
customtkinter.set_default_color_theme("blue") # Themes: blue (default), dark-blue, green
firstclick = True
# All languages supported by Whisper (display label → ISO code; None = auto-detect)
WHISPER_LANGUAGES = {
'Auto-detect': None,
'Afrikaans (af)': 'af', 'Albanian (sq)': 'sq',
'Amharic (am)': 'am', 'Arabic (ar)': 'ar',
'Armenian (hy)': 'hy', 'Assamese (as)': 'as',
'Azerbaijani (az)': 'az', 'Bashkir (ba)': 'ba',
'Basque (eu)': 'eu', 'Belarusian (be)': 'be',
'Bengali (bn)': 'bn', 'Bosnian (bs)': 'bs',
'Breton (br)': 'br', 'Bulgarian (bg)': 'bg',
'Catalan (ca)': 'ca', 'Chinese (zh)': 'zh',
'Croatian (hr)': 'hr', 'Czech (cs)': 'cs',
'Danish (da)': 'da', 'Dutch (nl)': 'nl',
'English (en)': 'en', 'Estonian (et)': 'et',
'Faroese (fo)': 'fo', 'Finnish (fi)': 'fi',
'French (fr)': 'fr', 'Galician (gl)': 'gl',
'Georgian (ka)': 'ka', 'German (de)': 'de',
'Greek (el)': 'el', 'Gujarati (gu)': 'gu',
'Haitian Creole (ht)': 'ht', 'Hausa (ha)': 'ha',
'Hawaiian (haw)': 'haw', 'Hebrew (he)': 'he',
'Hindi (hi)': 'hi', 'Hungarian (hu)': 'hu',
'Icelandic (is)': 'is', 'Indonesian (id)': 'id',
'Italian (it)': 'it', 'Japanese (ja)': 'ja',
'Javanese (jw)': 'jw', 'Kannada (kn)': 'kn',
'Kazakh (kk)': 'kk', 'Khmer (km)': 'km',
'Korean (ko)': 'ko', 'Lao (lo)': 'lo',
'Latin (la)': 'la', 'Latvian (lv)': 'lv',
'Lingala (ln)': 'ln', 'Lithuanian (lt)': 'lt',
'Luxembourgish (lb)': 'lb', 'Macedonian (mk)': 'mk',
'Malagasy (mg)': 'mg', 'Malay (ms)': 'ms',
'Malayalam (ml)': 'ml', 'Maltese (mt)': 'mt',
'Maori (mi)': 'mi', 'Marathi (mr)': 'mr',
'Mongolian (mn)': 'mn', 'Myanmar (my)': 'my',
'Nepali (ne)': 'ne', 'Norwegian (no)': 'no',
'Occitan (oc)': 'oc', 'Pashto (ps)': 'ps',
'Persian (fa)': 'fa', 'Polish (pl)': 'pl',
'Portuguese (pt)': 'pt', 'Punjabi (pa)': 'pa',
'Romanian (ro)': 'ro', 'Russian (ru)': 'ru',
'Sanskrit (sa)': 'sa', 'Serbian (sr)': 'sr',
'Shona (sn)': 'sn', 'Sindhi (sd)': 'sd',
'Sinhala (si)': 'si', 'Slovak (sk)': 'sk',
'Slovenian (sl)': 'sl', 'Somali (so)': 'so',
'Spanish (es)': 'es', 'Sundanese (su)': 'su',
'Swahili (sw)': 'sw', 'Swedish (sv)': 'sv',
'Tagalog (tl)': 'tl', 'Tajik (tg)': 'tg',
'Tamil (ta)': 'ta', 'Tatar (tt)': 'tt',
'Telugu (te)': 'te', 'Thai (th)': 'th',
'Tibetan (bo)': 'bo', 'Turkish (tr)': 'tr',
'Turkmen (tk)': 'tk', 'Ukrainian (uk)': 'uk',
'Urdu (ur)': 'ur', 'Uzbek (uz)': 'uz',
'Vietnamese (vi)': 'vi', 'Welsh (cy)': 'cy',
'Yiddish (yi)': 'yi', 'Yoruba (yo)': 'yo',
}
def _language_options_for_model(model_name):
"""Return (values, default, state) for the language combobox given a model name."""
if model_name.endswith('.en'):
return ['English (en)'], 'English (en)', 'disabled'
if model_name.startswith('KB Swedish'):
return ['Swedish (sv)'], 'Swedish (sv)', 'disabled'
return list(WHISPER_LANGUAGES.keys()), 'Auto-detect', 'readonly'
def _set_app_icon(root):
"""Set app icon when supported, without crashing on unsupported platforms."""
base_dir = os.path.dirname(os.path.abspath(__file__))
icon_path = os.path.join(base_dir, "images", "icon.ico")
if not os.path.exists(icon_path):
return
try:
root.iconbitmap(icon_path)
except tk.TclError:
# Some Linux Tk builds don't accept .ico for iconbitmap.
pass
def _apply_display_scaling(root):
"""Auto-scale UI for high-resolution displays (e.g., 4K)."""
try:
screen_w = root.winfo_screenwidth()
screen_h = root.winfo_screenheight()
scale = min(screen_w / 1920.0, screen_h / 1080.0)
scale = max(1.0, min(scale, 2.0))
customtkinter.set_widget_scaling(scale)
customtkinter.set_window_scaling(scale)
except Exception:
pass
class App:
def __init__(self, master):
print(Back.CYAN + "Welcome to Local Transcribe with Whisper!\U0001f600\nCheck back here to see some output from your transcriptions.\nDon't worry, they will also be saved on the computer!\U0001f64f")
self.master = master
# Change font
font = ('Roboto', 13, 'bold') # Change the font and size here
@@ -27,107 +174,233 @@ class App:
path_frame.pack(fill=tk.BOTH, padx=10, pady=10)
customtkinter.CTkLabel(path_frame, text="Folder:", font=font).pack(side=tk.LEFT, padx=5)
self.path_entry = customtkinter.CTkEntry(path_frame, width=50, font=font_b)
self.path_entry.insert(0, os.path.join(os.getcwd(), 'sample_audio'))
self.path_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
customtkinter.CTkButton(path_frame, text="Browse", command=self.browse, font=font).pack(side=tk.LEFT, padx=5)
# Language frame
#thanks to pommicket from Stackoverflow for this fix
def on_entry_click(event):
"""function that gets called whenever entry is clicked"""
global firstclick
if firstclick: # if this is the first time they clicked it
firstclick = False
self.language_entry.delete(0, "end") # delete all the text in the entry
# Language frame
language_frame = customtkinter.CTkFrame(master)
language_frame.pack(fill=tk.BOTH, padx=10, pady=10)
customtkinter.CTkLabel(language_frame, text="Language:", font=font).pack(side=tk.LEFT, padx=5)
self.language_entry = customtkinter.CTkEntry(language_frame, width=50, font=('Roboto', 12, 'italic'))
self.language_entry.insert(0, 'Select language or clear to detect automatically')
self.language_entry.bind('<FocusIn>', on_entry_click)
self.language_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
_lang_values, _lang_default, _lang_state = _language_options_for_model('medium')
self.language_combobox = customtkinter.CTkComboBox(
language_frame, width=50, state=_lang_state,
values=_lang_values, font=font_b)
self.language_combobox.set(_lang_default)
self.language_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
# Model frame
models = ['base.en', 'base', 'small.en',
'small', 'medium.en', 'medium', 'large']
models = ['tiny', 'tiny.en', 'base', 'base.en',
'small', 'small.en', 'medium', 'medium.en',
'large-v2', 'large-v3',
'───────────────',
'KB Swedish (tiny)', 'KB Swedish (base)',
'KB Swedish (small)', 'KB Swedish (medium)',
'KB Swedish (large)']
model_frame = customtkinter.CTkFrame(master)
model_frame.pack(fill=tk.BOTH, padx=10, pady=10)
customtkinter.CTkLabel(model_frame, text="Model:", font=font).pack(side=tk.LEFT, padx=5)
# ComboBox frame
self.model_combobox = customtkinter.CTkComboBox(
model_frame, width=50, state="readonly",
values=models, font=font_b)
self.model_combobox.set(models[1]) # Set the default value
values=models, font=font_b,
command=self._on_model_change)
self.model_combobox.set('medium') # Set the default value
self.model_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
# Verbose frame
verbose_frame = customtkinter.CTkFrame(master)
verbose_frame.pack(fill=tk.BOTH, padx=10, pady=10)
self.verbose_var = tk.BooleanVar()
customtkinter.CTkCheckBox(verbose_frame, text="Output transcription to terminal", variable=self.verbose_var, font=font).pack(side=tk.LEFT, padx=5)
# Model description label
self.model_desc_label = customtkinter.CTkLabel(
master, text=self._model_desc_text('medium'),
font=('Roboto', 11), text_color=('#555555', '#aaaaaa'),
anchor='w')
self.model_desc_label.pack(fill=tk.X, padx=14, pady=(0, 4))
# Timestamps toggle
ts_frame = customtkinter.CTkFrame(master)
ts_frame.pack(fill=tk.BOTH, padx=10, pady=10)
self.timestamps_var = tk.BooleanVar(value=True)
self.timestamps_switch = customtkinter.CTkSwitch(
ts_frame, text="Include timestamps in transcription",
variable=self.timestamps_var, font=font_b)
self.timestamps_switch.pack(side=tk.LEFT, padx=5)
# Advanced options frame
adv_frame = customtkinter.CTkFrame(master)
adv_frame.pack(fill=tk.BOTH, padx=10, pady=10)
self.vad_var = tk.BooleanVar(value=False)
customtkinter.CTkSwitch(
adv_frame, text="VAD filter (remove silence)",
variable=self.vad_var, font=font_b).pack(side=tk.LEFT, padx=5)
self.word_ts_var = tk.BooleanVar(value=False)
customtkinter.CTkSwitch(
adv_frame, text="Word-level timestamps",
variable=self.word_ts_var, font=font_b).pack(side=tk.LEFT, padx=5)
self.translate_var = tk.BooleanVar(value=False)
customtkinter.CTkSwitch(
adv_frame, text="Translate to English",
variable=self.translate_var, font=font_b).pack(side=tk.LEFT, padx=5)
# Progress Bar
self.progress_bar = ttk.Progressbar(master, length=200, mode='indeterminate')
# Worker process handle (replaces thread+stop_event for true immediate cancellation)
self._proc = None
self._parent_conn = None
self._child_conn = None
# Button actions frame
button_frame = customtkinter.CTkFrame(master)
button_frame.pack(fill=tk.BOTH, padx=10, pady=10)
self.transcribe_button = customtkinter.CTkButton(button_frame, text="Transcribe", command=self.start_transcription, font=font)
self.transcribe_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
self.stop_button = customtkinter.CTkButton(
button_frame, text="Stop", command=self._stop_transcription, font=font,
fg_color="#c0392b", hover_color="#922b21", state=tk.DISABLED)
self.stop_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
customtkinter.CTkButton(button_frame, text="Quit", command=master.quit, font=font).pack(side=tk.RIGHT, padx=5, pady=10, fill=tk.X, expand=True)
# ── Embedded console / log panel ──────────────────────────────────
log_label = customtkinter.CTkLabel(master, text="Console output", font=font, anchor='w')
log_label.pack(fill=tk.X, padx=12, pady=(8, 0))
self.log_box = customtkinter.CTkTextbox(master, height=220, font=('Consolas', 14),
wrap='word', state='disabled',
fg_color='#1e1e1e', text_color='#e0e0e0')
self.log_box.pack(fill=tk.BOTH, expand=True, padx=10, pady=(2, 10))
# Redirect stdout & stderr into the log panel (no backend console)
sys.stdout = _ConsoleRedirector(self.log_box)
sys.stderr = _ConsoleRedirector(self.log_box)
# Backend indicator
_bi = detect_backend()
backend_label = customtkinter.CTkLabel(
master,
text=f"Backend: {_bi['label']}",
font=('Roboto', 11),
text_color=("#555555", "#aaaaaa"),
anchor='e',
)
backend_label.pack(fill=tk.X, padx=12, pady=(0, 2))
# Welcome message (shown after redirect so it appears in the panel)
print("Welcome to Local Transcribe with Whisper! \U0001f600")
print("Transcriptions will be saved automatically.")
print("─" * 46)
# Helper functions
def _stop_transcription(self):
self.stop_button.configure(state=tk.DISABLED)
if self._proc and self._proc.is_alive():
self._proc.terminate()
try:
self._proc.join(timeout=3)
except Exception:
pass
if self._proc.is_alive():
self._proc.kill()
try:
self._proc.join(timeout=1)
except Exception:
pass
# Close pipe ends — no semaphores, so no leak
for conn in (self._parent_conn, self._child_conn):
try:
if conn:
conn.close()
except Exception:
pass
self._parent_conn = self._child_conn = None
print("⛔ Transcription stopped by user.")
def _model_desc_text(self, model_name):
info = MODEL_INFO.get(model_name)
if not info:
return ''
speed, size, stars, use = info
return f'{stars} {speed} · {size} · {use}'
def _on_model_change(self, selected):
self.model_desc_label.configure(text=self._model_desc_text(selected))
values, default, state = _language_options_for_model(selected)
self.language_combobox.configure(values=values, state=state)
self.language_combobox.set(default)
# Browsing
def browse(self):
initial_dir = os.getcwd()
folder_path = filedialog.askdirectory(initialdir=initial_dir)
self.path_entry.delete(0, tk.END)
self.path_entry.insert(0, folder_path)
# Start transcription
def start_transcription(self):
# Disable transcribe button
model_display = self.model_combobox.get()
if model_display.startswith('─'):
messagebox.showinfo("Invalid selection", "Please select a model, not the separator line.")
return
self.transcribe_button.configure(state=tk.DISABLED)
# Start a new thread for the transcription process
threading.Thread(target=self.transcribe_thread).start()
# Threading
def transcribe_thread(self):
self.stop_button.configure(state=tk.NORMAL)
path = self.path_entry.get()
model_display = self.model_combobox.get()
model = HF_MODEL_MAP.get(model_display, model_display)
lang_label = self.language_combobox.get()
language = WHISPER_LANGUAGES.get(lang_label, lang_label) if lang_label else None
timestamps = self.timestamps_var.get()
vad_filter = self.vad_var.get()
word_timestamps = self.word_ts_var.get()
translate = self.translate_var.get()
self.progress_bar.pack(fill=tk.X, padx=5, pady=5)
self.progress_bar.start()
# Setting path and files
glob_file = get_path(path)
info_path = 'I will transcribe all eligible audio/video files in the path: {}\n\nContinue?'.format(path)
answer = messagebox.askyesno("Confirmation", info_path)
if not answer:
self.progress_bar.stop()
self.progress_bar.pack_forget()
self.transcribe_button.configure(state=tk.NORMAL)
return
# Start transcription
self._parent_conn, self._child_conn = mp.Pipe(duplex=False)
self._proc = mp.Process(
target=_transcribe_worker_process,
args=(self._child_conn, path, glob_file, model, language, True, timestamps),
kwargs={"vad_filter": vad_filter, "word_timestamps": word_timestamps, "translate": translate},
daemon=True,
)
self._proc.start()
self._child_conn.close() # parent doesn't write; close its write-end
self._child_conn = None
self.master.after(100, self._poll_worker)
def _poll_worker(self):
done = False
result = None
try:
while self._parent_conn and self._parent_conn.poll():
msg = self._parent_conn.recv()
if isinstance(msg, tuple) and msg[0] == '__done__':
done = True
result = msg[1]
else:
sys.stdout.write(msg)
sys.stdout.flush()
except EOFError:
# Child closed the pipe (normal completion or kill)
done = True
except Exception:
pass
# Hide progress bar
if done or (self._proc and not self._proc.is_alive()):
if self._parent_conn:
try:
self._parent_conn.close()
except Exception:
pass
self._parent_conn = None
self._on_transcription_done(result)
else:
self.master.after(100, self._poll_worker)
def _on_transcription_done(self, output_text):
self.progress_bar.stop()
self.progress_bar.pack_forget()
# Enable transcribe button
self.stop_button.configure(state=tk.DISABLED)
self.transcribe_button.configure(state=tk.NORMAL)
# Show the result dialog
if output_text:
title = "Finished!" if not output_text.startswith('⚠') else "Error"
messagebox.showinfo(title, output_text)
if __name__ == "__main__":
# Setting custom themes
root = customtkinter.CTk()
_apply_display_scaling(root)
root.title("Local Transcribe with Whisper")
# Geometry — taller to accommodate the embedded console panel
width, height = 550, 560
root.geometry('{}x{}'.format(width, height))
root.minsize(450, 480)
# Icon (best-effort; ignored on platforms/builds without .ico support)
_set_app_icon(root)
# Run
app = App(root)
root.mainloop()
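The worker-process pattern used above (a one-way `Pipe` streaming console text, a `('__done__', result)` sentinel, and `terminate()` for immediate cancellation) can be reproduced as a standalone sketch. Names here (`worker`, `run`) are illustrative, not from the app:

```python
# Minimal sketch of the cancel-capable worker pattern: the child streams text
# through a one-way Pipe and signals completion with a ('__done__', result)
# sentinel; the parent could call proc.terminate() at any point.
import multiprocessing as mp

def worker(conn):
    for i in range(3):
        conn.send(f"chunk {i}\n")      # streamed "console" output
    conn.send(('__done__', 'ok'))      # completion sentinel with result
    conn.close()

def run():
    parent_conn, child_conn = mp.Pipe(duplex=False)  # parent reads, child writes
    proc = mp.Process(target=worker, args=(child_conn,), daemon=True)
    proc.start()
    child_conn.close()  # parent does not write; close its copy of the send end
    result = None
    while True:
        try:
            msg = parent_conn.recv()
        except EOFError:               # child closed the pipe
            break
        if isinstance(msg, tuple) and msg[0] == '__done__':
            result = msg[1]
            break
    proc.join(timeout=3)
    parent_conn.close()
    return result

if __name__ == '__main__':
    print(run())
```

Closing the parent's copy of the send end is what makes `EOFError` fire when the child dies, which is how the GUI's `_poll_worker` detects a killed process without a sentinel.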
-3
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b361c4993eceb2006f225ffdd2f7b63265586e3dded351972dfcd5e5d75559c7
size 249467977
+2 -2
@@ -1,7 +1,7 @@
from cx_Freeze import setup, Executable
build_exe_options = {
"packages": ['whisper','tkinter','customtkinter']
"packages": ['faster_whisper','tkinter','customtkinter']
}
executables = (
[
@@ -13,7 +13,7 @@ executables = (
)
setup(
name="Local Transcribe with Whisper",
version="1.2",
version="2.0",
author="Kristofer Rolf Söderström",
options={"build_exe":build_exe_options},
executables=executables
+128
@@ -0,0 +1,128 @@
"""
Installer script for Local Transcribe with Whisper.
Detects NVIDIA GPU and offers to install GPU acceleration support.
Usage:
python install.py
"""
import os
import subprocess
import sys
import shutil
import site
def detect_nvidia_gpu():
"""Check if an NVIDIA GPU is present."""
candidates = [
shutil.which("nvidia-smi"),
r"C:\Windows\System32\nvidia-smi.exe",
r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe",
]
for path in candidates:
if not path or not os.path.isfile(path):
continue
try:
r = subprocess.run(
[path, "--query-gpu=name", "--format=csv,noheader"],
capture_output=True, text=True, timeout=10,
)
if r.returncode == 0 and r.stdout.strip():
return True, r.stdout.strip().split("\n")[0]
except Exception:
continue
return False, None
def pip_install(*packages):
cmd = [sys.executable, "-m", "pip", "install"] + list(packages)
print(f"\n> {' '.join(cmd)}\n")
subprocess.check_call(cmd)
def get_site_packages():
for p in site.getsitepackages():
if p.endswith("site-packages"):
return p
return site.getsitepackages()[0]
def create_nvidia_pth():
"""Create a .pth startup hook that registers NVIDIA DLL directories."""
sp = get_site_packages()
pth_path = os.path.join(sp, "nvidia_cuda_path.pth")
# This one-liner runs at Python startup, before any user code.
pth_content = (
"import os, glob as g; "
"any(os.add_dll_directory(d) or os.environ.__setitem__('PATH', d + os.pathsep + os.environ.get('PATH','')) "
"for d in g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'bin')) "
"+ g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'lib')) "
"if os.path.isdir(d)) if os.name == 'nt' else None\n"
)
with open(pth_path, "w") as f:
f.write(pth_content)
print(f" Created CUDA startup hook: {pth_path}")
def verify_cuda():
"""Verify CUDA works in a fresh subprocess."""
try:
r = subprocess.run(
[sys.executable, "-c",
"import ctranslate2; "
"print('float16' in ctranslate2.get_supported_compute_types('cuda'))"],
capture_output=True, text=True, timeout=30,
)
return r.stdout.strip() == "True"
except Exception:
return False
def main():
print("=" * 55)
print(" Local Transcribe with Whisper — Installer")
print("=" * 55)
# Step 1: Base packages
print("\n[1/2] Installing base requirements...")
pip_install("-r", "requirements.txt")
print("\n Base requirements installed!")
# Step 2: GPU
print("\n[2/2] Checking for NVIDIA GPU...")
has_gpu, gpu_name = detect_nvidia_gpu()
if has_gpu:
print(f"\n NVIDIA GPU detected: {gpu_name}")
print(" GPU acceleration can make transcription 2-5x faster.")
print(" This will install ~300 MB of additional CUDA libraries.\n")
while True:
answer = input(" Install GPU support? [Y/n]: ").strip().lower()
if answer in ("", "y", "yes"):
print("\n Installing CUDA libraries...")
pip_install("nvidia-cublas-cu12", "nvidia-cudnn-cu12")
create_nvidia_pth()
print("\n Verifying CUDA...")
if verify_cuda():
print(" GPU support verified and working!")
else:
print(" WARNING: CUDA installed but not detected.")
print(" Update your NVIDIA drivers and try again.")
break
elif answer in ("n", "no"):
print("\n Skipping GPU. Re-run install.py to add it later.")
break
else:
print(" Please enter Y or N.")
else:
print("\n No NVIDIA GPU detected — using CPU mode.")
print("\n" + "=" * 55)
print(" Done! Run the app with: python app.py")
print("=" * 55)
if __name__ == "__main__":
main()
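The `.pth` hook written by `create_nvidia_pth()` works because Python's `site` machinery executes any `.pth` line that begins with `import` at interpreter startup. The mechanism can be demonstrated in isolation with a temporary directory (names `demo_pth_hook`, `PTH_HOOK_RAN` are illustrative):

```python
# Sketch of the .pth startup-hook mechanism install.py relies on: site
# processing executes .pth lines that start with "import", so a one-liner
# can patch the DLL search path before any user code runs.
import os
import site
import tempfile

def demo_pth_hook():
    d = tempfile.mkdtemp()
    hook = os.path.join(d, "demo_hook.pth")
    with open(hook, "w") as f:
        # Lines beginning with "import" are exec'd, not treated as paths.
        f.write("import os; os.environ['PTH_HOOK_RAN'] = '1'\n")
    site.addsitedir(d)  # same processing Python applies to site-packages
    return os.environ.get("PTH_HOOK_RAN")

print(demo_pth_hook())
```

At real startup the hook fires before `faster_whisper`/`ctranslate2` are imported, which is why `os.add_dll_directory` calls land early enough to matter on Windows.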
+3
@@ -0,0 +1,3 @@
faster-whisper
mlx-whisper
customtkinter
+29
@@ -0,0 +1,29 @@
#!/bin/bash
# ============================================================
# Local Transcribe with Whisper — macOS / Linux launcher
# ============================================================
# Double-click this file or run: ./run_Mac.sh
# On first run it creates a venv and installs dependencies.
# ============================================================
set -e
cd "$(dirname "$0")"
# Create .venv if it doesn't exist
if [ ! -f ".venv/bin/python" ]; then
echo "Creating virtual environment..."
python3 -m venv .venv
fi
PYTHON=".venv/bin/python"
# Install dependencies on first run
if ! "$PYTHON" -c "import faster_whisper" 2>/dev/null; then
echo "First run detected — running installer..."
"$PYTHON" install.py
echo
fi
echo "Starting Local Transcribe..."
"$PYTHON" app.py
+22 -4
@@ -1,5 +1,23 @@
@echo off
REM Create .venv on first run if it doesn't exist
if not exist ".venv\Scripts\python.exe" (
echo Creating virtual environment...
python -m venv .venv
if errorlevel 1 (
echo ERROR: Failed to create virtual environment. Is Python installed and on PATH?
pause
exit /b 1
)
)
set PYTHON=.venv\Scripts\python.exe
REM Check if dependencies are installed
%PYTHON% -c "import faster_whisper" 2>nul
if errorlevel 1 (
echo First run detected - running installer...
%PYTHON% install.py
echo.
)
echo Starting Local Transcribe...
%PYTHON% app.py
@@ -1,4 +1,4 @@
Armstrong_Small_Step
[0:00:00 --> 0:00:07]: And they're still brought to land now.
[0:00:07 --> 0:00:18]: It's one small step for man.
[0:00:18 --> 0:00:23]: One by a fleet for man time.
────────────────────────────────────────
That's one small step for man, one giant leap for mankind.
@@ -1,4 +1,4 @@
Axel_Pettersson_röstinspelning
[0:00:00 --> 0:00:06]: Hej, jag heter Raxel Patterson, jag får att se över UR 1976.
[0:00:06 --> 0:00:12.540000]: Jag har varit Wikipedia-périonsen 2018 och jag har översat röst-intro-
[0:00:12.540000 --> 0:00:15.540000]:-projektet till svenska.
────────────────────────────────────────
Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
+450 -53
@@ -1,27 +1,217 @@
import os
import sys
import platform
import datetime
import time
import site
from glob import glob
# ---------------------------------------------------------------------------
# CUDA setup — must happen before importing faster_whisper / ctranslate2
# ---------------------------------------------------------------------------
def _setup_cuda_libs():
"""Register NVIDIA pip-package lib dirs so ctranslate2 finds CUDA at runtime.
pip-installed nvidia-cublas-cu12 / nvidia-cudnn-cu12 place their shared
libraries inside the site-packages tree. Neither Windows nor Linux
automatically search those directories, so we must register them
explicitly:
- Windows: os.add_dll_directory() + PATH
- Linux: LD_LIBRARY_PATH (read by the dynamic linker)
"""
try:
sp_dirs = site.getsitepackages()
except AttributeError:
# virtualenv without site-packages helper
sp_dirs = [os.path.join(sys.prefix, "lib",
"python" + ".".join(map(str, sys.version_info[:2])),
"site-packages")]
for sp in sp_dirs:
nvidia_root = os.path.join(sp, "nvidia")
if not os.path.isdir(nvidia_root):
continue
for pkg in os.listdir(nvidia_root):
for sub in ("bin", "lib"):
d = os.path.join(nvidia_root, pkg, sub)
if not os.path.isdir(d):
continue
if sys.platform == "win32":
os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
try:
os.add_dll_directory(d)
except (OSError, AttributeError):
pass
else:
# Linux / macOS — prepend to LD_LIBRARY_PATH
ld = os.environ.get("LD_LIBRARY_PATH", "")
if d not in ld:
os.environ["LD_LIBRARY_PATH"] = d + (":" + ld if ld else "")
# Also load via ctypes so already-started process sees it
import ctypes
try:
for so in sorted(os.listdir(d)):
if so.endswith(".so") or ".so." in so:
ctypes.cdll.LoadLibrary(os.path.join(d, so))
except OSError:
pass
_setup_cuda_libs()
from faster_whisper import WhisperModel
SUPPORTED_EXTENSIONS = {
".wav", ".mp3", ".m4a", ".flac", ".ogg", ".wma", ".aac",
".mp4", ".mkv", ".mov", ".webm", ".avi", ".mpeg", ".mpg",
}
# ---------------------------------------------------------------------------
# MLX model map (Apple Silicon only)
# ---------------------------------------------------------------------------
_MLX_MODEL_MAP = {
"tiny": "mlx-community/whisper-tiny-mlx",
"base": "mlx-community/whisper-base-mlx",
"small": "mlx-community/whisper-small-mlx",
"medium": "mlx-community/whisper-medium-mlx",
"large-v2": "mlx-community/whisper-large-v2-mlx",
"large-v3": "mlx-community/whisper-large-v3-mlx",
}
def detect_backend():
"""Return the best available inference backend.
Returns a dict with keys:
backend : "mlx" | "cuda" | "cpu"
device : device string for WhisperModel (cuda / cpu)
compute_type : compute type string for WhisperModel
label : human-readable label for UI display
"""
# Apple Silicon → try MLX (GPU + Neural Engine via Apple MLX)
if sys.platform == "darwin" and platform.machine() == "arm64":
try:
import mlx_whisper # noqa: F401
return {
"backend": "mlx",
"device": "cpu",
"compute_type": "int8",
"label": "MLX · Apple GPU/NPU",
}
except ImportError:
pass
# NVIDIA CUDA
try:
import ctranslate2
cuda_types = ctranslate2.get_supported_compute_types("cuda")
if "float16" in cuda_types:
return {
"backend": "cuda",
"device": "cuda",
"compute_type": "float16",
"label": "CUDA · GPU",
}
except Exception:
pass
return {
"backend": "cpu",
"device": "cpu",
"compute_type": "int8",
"label": "CPU · int8",
}
def _decode_audio_pyav(file_path):
"""Decode any audio/video file to a float32 mono 16 kHz numpy array.
Uses PyAV (bundled FFmpeg) — no external ffmpeg binary required.
Returns (audio_array, duration_seconds).
"""
import av
import numpy as np
with av.open(file_path) as container:
duration = float(container.duration) / 1_000_000 # microseconds → seconds
stream = container.streams.audio[0]
resampler = av.AudioResampler(format="fltp", layout="mono", rate=16000)
chunks = []
for frame in container.decode(stream):
for out in resampler.resample(frame):
if out:
chunks.append(out.to_ndarray()[0])
# Flush resampler
for out in resampler.resample(None):
if out:
chunks.append(out.to_ndarray()[0])
if not chunks:
return np.zeros(0, dtype=np.float32), duration
return np.concatenate(chunks, axis=0), duration
def _transcribe_mlx_file(file, mlx_model_id, language, timestamps, verbose, vad_filter=False, word_timestamps=False, translate=False):
"""Transcribe a single file with mlx-whisper (Apple GPU/NPU).
Decodes audio via PyAV (no system ffmpeg needed), then runs MLX inference.
Returns (segments_as_dicts, audio_duration_seconds).
Segments have dict keys: 'start', 'end', 'text'.
"""
import mlx_whisper
audio_array, duration = _decode_audio_pyav(file)
decode_opts = {}
if language:
decode_opts["language"] = language
if translate:
decode_opts["task"] = "translate"
if word_timestamps:
decode_opts["word_timestamps"] = True
result = mlx_whisper.transcribe(
audio_array,
path_or_hf_repo=mlx_model_id,
verbose=(True if verbose else None),
**decode_opts,
)
segments = result["segments"]
audio_duration = segments[-1]["end"] if segments else duration
return segments, audio_duration
def _srt_timestamp(seconds):
"""Convert seconds (float) to SRT timestamp format HH:MM:SS,mmm."""
ms = round(seconds * 1000)
h, ms = divmod(ms, 3_600_000)
m, ms = divmod(ms, 60_000)
s, ms = divmod(ms, 1000)
return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
# Get the path
def get_path(path):
all_items = glob(path + '/*')
media_files = []
for item in all_items:
if not os.path.isfile(item):
continue
_, ext = os.path.splitext(item)
if ext.lower() in SUPPORTED_EXTENSIONS:
media_files.append(item)
return sorted(media_files)
# Main function
def transcribe(path, glob_file, model=None, language=None, verbose=False, timestamps=True, stop_event=None, vad_filter=False, word_timestamps=False, translate=False):
"""
Transcribes audio files in a specified folder using faster-whisper (CTranslate2).
Args:
path (str): Path to the folder containing the audio files.
glob_file (list): List of audio file paths to transcribe.
model (str, optional): Name of the Whisper model size to use for transcription.
Defaults to None, which uses the default model.
language (str, optional): Language code for transcription. Defaults to None,
which enables automatic language detection.
@@ -38,53 +228,260 @@ def transcribe(path, glob_file, model=None, language=None, verbose=False):
- The function downloads the specified model if not available locally.
- The transcribed text files will be saved in a "transcriptions" folder
within the specified path.
- Uses CTranslate2 for up to 4x faster inference compared to openai-whisper.
- FFmpeg is bundled via the PyAV dependency — no separate installation needed.
"""
SEP = "─" * 46
# ── Step 1: Detect hardware ──────────────────────────────────────
backend_info = detect_backend()
backend = backend_info["backend"]
device = backend_info["device"]
compute_type = backend_info["compute_type"]
print(f"⚙ Backend: {backend_info['label']}")
# ── Step 1b: MLX path (Apple GPU/NPU) ───────────────────────────
if backend == "mlx":
mlx_model_id = _MLX_MODEL_MAP.get(model)
if mlx_model_id is None:
print(f"⚠ Model '{model}' is not available in MLX format.")
print(" Falling back to faster-whisper on CPU (int8).")
backend = "cpu"
device, compute_type = "cpu", "int8"
else:
# ── Step 2 (MLX): load + transcribe ─────────────────────
print(f"⏳ Loading MLX model '{model}' — downloading if needed...")
print("✅ Model ready!")
print(SEP)
total_files = len(glob_file)
print(f"📂 Found {total_files} supported media file(s) in folder")
print(SEP)
if total_files == 0:
output_text = '⚠ No supported media files found — try another folder.'
print(output_text)
print(SEP)
return output_text
files_transcripted = []
file_num = 0
for file in glob_file:
if stop_event and stop_event.is_set():
print("⛔ Transcription stopped by user.")
break
title = os.path.basename(file).split('.')[0]
file_num += 1
print(f"\n{'─' * 46}")
print(f"📄 File {file_num}/{total_files}: {title}")
try:
t_start = time.time()
segments, audio_duration = _transcribe_mlx_file(
file, mlx_model_id, language, timestamps, verbose,
vad_filter=vad_filter, word_timestamps=word_timestamps,
translate=translate
)
os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
segment_list = []
txt_path = "{}/transcriptions/{}.txt".format(path, title)
srt_path = "{}/transcriptions/{}.srt".format(path, title)
with open(txt_path, 'w', encoding='utf-8') as f, \
open(srt_path, 'w', encoding='utf-8') as srt_f:
f.write(title)
f.write('\n' + '─' * 40 + '\n')
for idx, seg in enumerate(segments, start=1):
if stop_event and stop_event.is_set():
break
text = seg["text"].strip()
if timestamps:
start_ts = str(datetime.timedelta(seconds=seg["start"]))
end_ts = str(datetime.timedelta(seconds=seg["end"]))
f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
else:
f.write('\n{}'.format(text))
srt_f.write(f'{idx}\n{_srt_timestamp(seg["start"])} --> {_srt_timestamp(seg["end"])}\n{text}\n\n')
f.flush()
srt_f.flush()
if verbose:
print(" [%.2fs → %.2fs] %s" % (seg["start"], seg["end"], seg["text"]))
else:
print(" Transcribed up to %.0fs..." % seg["end"], end='\r')
segment_list.append(seg)
elapsed = time.time() - t_start
elapsed_min = elapsed / 60.0
audio_min = audio_duration / 60.0
ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
print(f"✅ Done — saved to transcriptions/{title}.txt")
print(f"⏱ Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min ({ratio:.1f}x realtime)")
files_transcripted.append(segment_list)
except Exception as exc:
print(f"⚠ Could not decode '{os.path.basename(file)}', skipping.")
print(f" Reason: {exc}")
print(f"\n{SEP}")
if files_transcripted:
output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n Saved in: {path}/transcriptions"
else:
output_text = '⚠ No files eligible for transcription — try another folder.'
print(output_text)
print(SEP)
return output_text
# ── Step 2: Load model (faster-whisper / CTranslate2) ───────────
print(f"⏳ Loading model '{model}' — downloading if needed...")
try:
whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
except Exception as exc:
err = str(exc).lower()
cuda_runtime_missing = (
device == "cuda"
and (
"libcublas" in err
or "libcudnn" in err
or "cuda" in err
or "cannot be loaded" in err
or "not found" in err
)
)
if not cuda_runtime_missing:
raise
print("⚠ CUDA runtime not available; falling back to CPU (int8).")
print(f" Reason: {exc}")
device, compute_type = "cpu", "int8"
whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
print("✅ Model ready!")
print(SEP)
# ── Step 3: Transcribe files ─────────────────────────────────────
total_files = len(glob_file)
print(f"📂 Found {total_files} supported media file(s) in folder")
print(SEP)
if total_files == 0:
output_text = '⚠ No supported media files found — try another folder.'
print(output_text)
print(SEP)
return output_text
files_transcripted = []
file_num = 0
for file in glob_file:
if stop_event and stop_event.is_set():
print("⛔ Transcription stopped by user.")
break
title = os.path.basename(file).split('.')[0]
file_num += 1
print(f"\n{'─' * 46}")
print(f"📄 File {file_num}/{total_files}: {title}")
try:
t_start = time.time()
segments, info = whisper_model.transcribe(
file,
language=language,
beam_size=5,
task="translate" if translate else "transcribe",
vad_filter=vad_filter,
word_timestamps=word_timestamps,
)
audio_duration = info.duration # seconds
# Make folder if missing
os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
# Stream segments as they are decoded
segment_list = []
txt_path = "{}/transcriptions/{}.txt".format(path, title)
srt_path = "{}/transcriptions/{}.srt".format(path, title)
with open(txt_path, 'w', encoding='utf-8') as f, \
open(srt_path, 'w', encoding='utf-8') as srt_f:
f.write(title)
f.write('\n' + '─' * 40 + '\n')
for idx, seg in enumerate(segments, start=1):
if stop_event and stop_event.is_set():
break
text = seg.text.strip()
if timestamps:
start_ts = str(datetime.timedelta(seconds=seg.start))
end_ts = str(datetime.timedelta(seconds=seg.end))
f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
else:
f.write('\n{}'.format(text))
# Use word-level timestamps for SRT if available
if word_timestamps and hasattr(seg, 'words') and seg.words:
for w_idx, word in enumerate(seg.words, start=1):
w_text = word.word.strip()
if not w_text:
continue
w_start = _srt_timestamp(word.start)
w_end = _srt_timestamp(word.end)
srt_f.write(f'{idx}.{w_idx}\n{w_start} --> {w_end}\n{w_text}\n\n')
else:
srt_f.write(f'{idx}\n{_srt_timestamp(seg.start)} --> {_srt_timestamp(seg.end)}\n{text}\n\n')
f.flush()
srt_f.flush()
if verbose:
print(" [%.2fs → %.2fs] %s" % (seg.start, seg.end, seg.text))
else:
print(" Transcribed up to %.0fs..." % seg.end, end='\r')
segment_list.append(seg)
elapsed = time.time() - t_start
elapsed_min = elapsed / 60.0
audio_min = audio_duration / 60.0
ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
print(f"✅ Done — saved to transcriptions/{title}.txt")
print(f"⏱ Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min ({ratio:.1f}x realtime)")
files_transcripted.append(segment_list)
except Exception as exc:
print(f"⚠ Could not decode '{os.path.basename(file)}', skipping.")
print(f" Reason: {exc}")
# ── Summary ──────────────────────────────────────────────────────
print(f"\n{SEP}")
if len(files_transcripted) > 0:
output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n Saved in: {path}/transcriptions"
else:
output_text = '⚠ No files eligible for transcription — try another folder.'
print(output_text)
print(SEP)
return output_text
def _transcribe_worker_process(conn, path, glob_file, model, language, verbose, timestamps, vad_filter=False, word_timestamps=False, translate=False):
"""Child-process entry point for the UI's multiprocessing backend.
Redirects stdout/stderr → pipe connection so the main process can display
output in the console panel. The main process sends SIGTERM/SIGKILL to
stop this process immediately, including any in-progress download or inference.
"""
import sys
class _PipeWriter:
def __init__(self, c):
self.c = c
def write(self, text):
if text:
try:
self.c.send(text)
except Exception:
pass
def flush(self):
pass
writer = _PipeWriter(conn)
sys.stdout = writer
sys.stderr = writer
result = '⚠ No output produced.'
try:
result = transcribe(path, glob_file, model, language, verbose, timestamps,
vad_filter=vad_filter, word_timestamps=word_timestamps,
translate=translate)
except Exception as exc:
result = f'⚠ Unexpected error: {exc}'
finally:
try:
conn.send(('__done__', result))
except Exception:
pass
conn.close()
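The divmod chain in `_srt_timestamp` can be checked in isolation: total milliseconds are peeled off into hours, minutes, and seconds, which maps directly onto the SRT `HH:MM:SS,mmm` format.

```python
# Standalone check of the millisecond divmod chain used by _srt_timestamp.
def srt_timestamp(seconds):
    ms = round(seconds * 1000)          # total milliseconds, rounded once
    h, ms = divmod(ms, 3_600_000)       # hours
    m, ms = divmod(ms, 60_000)          # minutes
    s, ms = divmod(ms, 1000)            # seconds; remainder is milliseconds
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3661.5))  # → 01:01:01,500
```

Rounding once at the start avoids the off-by-one-millisecond drift that appears when each component is rounded separately.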