Compare commits

82 Commits

Author SHA1 Message Date
kobim e2e19940dd feat: update README to reflect Apple Silicon GPU support and new features in version 3.0 2026-04-11 14:16:07 +02:00
kobim 0293a13177 feat: add advanced transcription options for VAD, word-level timestamps, and translation 2026-04-11 14:06:04 +02:00
kobim 8d5c8d6097 feat: implement multiprocessing for transcription with immediate cancellation 2026-04-05 22:11:13 +02:00
kobim e29572420e feat: enhance transcription capabilities with MLX support and backend detection 2026-04-04 00:32:36 +02:00
Kristofer Söderström f7d621e510 Add timestamps toggle and update transcription format to include/exclude timestamps 2026-03-20 20:19:46 +01:00
Kristofer Rolf Söderström 2a1df6aeba Update Python installation instructions in README
Clarified installation instructions for Python 3.10 or later, specifying preferred installation method.
2026-03-03 08:35:03 +01:00
soderstromkr 58255c3d10 fix: Linux/Ubuntu support — icon fallback, HiDPI scaling, CUDA lib paths, per-file timing
- app.py: graceful icon loading (no crash on Linux Tk without .ico support)
- app.py: auto-detect display scaling for 4K/HiDPI screens
- _LocalTranscribe.py: register NVIDIA pip-package .so paths on Linux (LD_LIBRARY_PATH)
  so faster-whisper finds libcublas/libcudnn at runtime
- _LocalTranscribe.py: auto-fallback to CPU if CUDA runtime libs missing
- _LocalTranscribe.py: filter input to supported media extensions only
- _LocalTranscribe.py: show real decode errors instead of generic skip message
- _LocalTranscribe.py: per-file timer showing wall-clock vs audio duration
2026-03-02 21:49:32 +01:00
Kristofer Söderström ea43074852 Update README.md: Add manual installation instructions for troubleshooting launcher issues 2026-03-02 17:17:35 +01:00
Kristofer Söderström 7b81778d9e Update README.md: Simplify installation instructions and clarify auto-installation process 2026-03-02 17:16:09 +01:00
Kristofer Söderström e65462f57b Update README.md: Add link to classic release in Mac user note 2026-03-02 17:13:05 +01:00
Kristofer Söderström 09e3e43c51 Update README.md: Reorder features for clarity and emphasize integrated console 2026-03-02 17:11:13 +01:00
Kristofer Söderström d4c26f6c37 Update README.md: Rearrange new features for clarity and highlight Swedish-optimised models 2026-03-02 17:10:15 +01:00
Kristofer Söderström acb6947f87 Update README.md: Revise installation instructions and clarify platform-specific run commands 2026-03-02 17:04:59 +01:00
Kristofer Söderström f8cf42733d Revamp: embedded console, faster-whisper, simplified install 2026-03-02 17:02:16 +01:00
Kristofer Rolf Söderström 7d3fe1ba26 Merge pull request #11 from soderstromkr/copilot/update-whisper-device-parameter
Pass explicit device parameter to whisper.load_model() for MPS acceleration
2026-01-22 14:03:13 +01:00
copilot-swe-agent[bot] da42a6e4cc Add .gitignore and remove __pycache__ files
Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>
2026-01-22 13:00:38 +00:00
copilot-swe-agent[bot] 0dab0d9bea Add explicit device parameter to whisper.load_model()
Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>
2026-01-22 13:00:21 +00:00
copilot-swe-agent[bot] 953c71ab28 Initial plan 2026-01-22 12:57:09 +00:00
Kristofer Rolf Söderström 5522bdd575 Merge pull request #6
Merged pull request #6
2026-01-22 13:53:23 +01:00
Kristofer Rolf Söderström 861c470330 Merge pull request #10 from soderstromkr/copilot/add-readme-gpu-support
Add GPU support documentation to README
2026-01-22 13:44:11 +01:00
copilot-swe-agent[bot] 6de6d4b2ff Add GPU support section to README with CUDA PyTorch installation instructions
Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>
2026-01-22 12:42:09 +00:00
copilot-swe-agent[bot] 01552cc7cb Initial plan 2026-01-22 12:40:19 +00:00
Yaroslav P 049a168c81 amd graphic card support 2025-03-05 16:23:10 +02:00
Kristofer Rolf Söderström 56a925463f Update README.md 2024-05-17 08:51:16 +02:00
Kristofer Rolf Söderström fe60b04020 Update README.md 2024-05-17 08:49:28 +02:00
Kristofer Rolf Söderström ff06a257f2 Update README.md 2024-05-17 08:47:57 +02:00
Kristofer Rolf Söderström 5e31129ea2 Create requirements.txt 2024-05-17 08:44:39 +02:00
Kristofer Rolf Söderström 3f0bca02b7 Update README.md 2024-05-17 08:44:09 +02:00
Kristofer Rolf Söderström 488e78a5ae Update README.md 2024-05-17 08:42:42 +02:00
Kristofer Rolf Söderström 829a054300 Update README.md 2024-05-17 08:40:42 +02:00
Kristofer Rolf Söderström 462aae12ca Update README.md 2024-05-17 08:09:30 +02:00
Kristofer Rolf Söderström fec9190ba1 Update README.md 2024-05-17 08:08:51 +02:00
Kristofer Rolf Söderström 0dde25204d Update README.md
removed other installation options from readme
2024-05-17 08:07:00 +02:00
Kristofer Söderström b611aa6b8c removed messagebox 2023-11-06 10:13:04 +01:00
Kristofer Söderström 7d50d5f4cf QOL improvements 2023-11-06 09:57:44 +01:00
Kristofer Söderström 7799d03960 bug fixes 2023-11-06 09:31:53 +01:00
Kristofer Rolf Söderström f88186dacc Update app.py 2023-10-19 09:26:43 +02:00
Kristofer Rolf Söderström 3f5c1491ac Delete build.zip 2023-10-19 09:20:55 +02:00
Kristofer Rolf Söderström c83e15bdba Update README.md 2023-10-19 09:20:29 +02:00
Kristofer Rolf Söderström ff16ad30e1 Merge pull request #2 from ValentinFunk/patch-1
Fix mac instructions link
2023-10-19 09:09:01 +02:00
Valentin 622165b3e6 Update Mac_instructions.md 2023-09-08 10:11:02 +02:00
Valentin 0e9cbdca58 Fix mac instructions link 2023-09-08 10:09:15 +02:00
Kristofer Söderström 87cb509b14 added windows exe in as zip 2023-06-30 17:26:24 +02:00
Kristofer Söderström ba935cafb7 formatting 2023-06-30 16:32:37 +02:00
Kristofer Söderström 6497508b7a fix formatting 2023-06-30 16:23:07 +02:00
Kristofer Söderström d96333a5a7 Complete rework for GUI, experimental EXE file and other minor changes, see readme for more info 2023-06-30 16:11:59 +02:00
Kristofer Rolf Söderström b765ff6bc6 Update README.md 2023-06-28 14:11:51 +02:00
Kristofer Rolf Söderström 867b082589 Add files via upload 2023-04-26 09:17:33 +02:00
Kristofer Rolf Söderström b4017c6fee Update README.md 2023-04-26 09:17:09 +02:00
Kristofer Rolf Söderström 1ea5187e78 Merge pull request #1 from bjornekstrom/main
README.md formatting suggestions
2023-04-24 09:25:07 +02:00
Björn Ekström 0051ceb873 Update README.md 2023-04-21 15:11:03 +02:00
Björn Ekström 76be00552f Updated README and Mac screenshot 2023-04-21 15:09:46 +02:00
Björn Ekström a5dd5d4a03 Update README.md
Further formatting.
2023-04-21 14:23:14 +02:00
Björn Ekström 43bcffaf4c Update README.md
Some formatting suggestions.
2023-04-21 14:22:34 +02:00
Kristofer Rolf Söderström 4e1c709f43 Update transcribe.py
better time keeping
2023-04-20 20:13:54 +02:00
Kristofer Rolf Söderström dfe967bd58 Update run_Windows.bat 2023-04-20 19:35:51 +02:00
Kristofer Rolf Söderström 586289efe5 Update Mac_instructions.txt 2023-04-19 16:51:36 +02:00
Kristofer Rolf Söderström c5a5597eee Update README.md 2023-04-19 16:46:49 +02:00
Kristofer Rolf Söderström ce8c365fc4 Update and rename Mac_2_instructions.txt to Mac_instructions.txt 2023-04-17 20:28:52 +02:00
Kristofer Rolf Söderström e2afd34170 Delete run_Mac_2.command 2023-04-17 20:25:18 +02:00
Kristofer Rolf Söderström 6fa49e41d9 Delete run_Mac_1.sh 2023-04-17 20:24:50 +02:00
Kristofer Söderström 1da9adbf5e updated version number 2023-04-14 10:32:38 +02:00
Kristofer Söderström 2769ddf68b dedicated windows and mac scripts, fixed verbose checkbox 2023-04-14 10:31:26 +02:00
Kristofer Rolf Söderström 1128e44486 Update README.md 2023-04-14 09:09:52 +02:00
Kristofer Rolf Söderström eec20b48c4 Update README.md 2023-04-14 08:30:29 +02:00
Kristofer Rolf Söderström b569d41aa9 Update README.md 2023-04-14 08:28:24 +02:00
Kristofer Rolf Söderström 99a6625e0e Update README.md 2023-03-31 11:12:06 +02:00
Kristofer Rolf Söderström b09114625a Update README.md 2023-03-27 21:29:51 +02:00
Kristofer Rolf Söderström 785f2b8215 Update README.md 2023-03-27 21:28:34 +02:00
Kristofer Rolf Söderström 412ab97157 Update README.md 2023-03-27 21:26:34 +02:00
Kristofer Rolf Söderström a14196b055 Update README.md 2023-03-27 21:25:41 +02:00
Kristofer Rolf Söderström c319316a4d Add files via upload 2023-03-27 21:25:11 +02:00
Kristofer Rolf Söderström 26c6f84e72 Update README.md 2023-03-27 21:24:12 +02:00
Kristofer Rolf Söderström 8f76466f57 typos 2023-03-27 21:18:04 +02:00
Kristofer Rolf Söderström 1f684a848a Update README.md 2023-03-27 10:08:19 +02:00
Kristofer Rolf Söderström bf75df30a4 Update README.md 2023-03-27 10:05:58 +02:00
Kristofer Söderström f5a8b19b65 fixed bug 2023-03-27 09:57:28 +02:00
Kristofer Söderström 7bbfef44cb added GUI and batch file to run GUI 2023-03-27 09:25:56 +02:00
Kristofer Söderström acadd17007 some corrections 2023-03-23 15:14:03 +01:00
Kristofer Söderström 918c2e489e added example 2023-03-23 15:06:20 +01:00
Kristofer Rolf Söderström 4710c61e22 Update README.md 2023-03-23 14:58:23 +01:00
Kristofer Söderström eea7441e43 fixed some spacing 2023-03-22 22:02:41 +01:00
20 changed files with 1332 additions and 140 deletions
+1
@@ -0,0 +1 @@
*.zip filter=lfs diff=lfs merge=lfs -text
+26
@@ -0,0 +1,26 @@
# Python cache
__pycache__/
*.py[cod]
*$py.class
# Virtual environments
venv/
env/
ENV/
.venv/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Build artifacts
dist/
build/
*.egg-info/
+3 -3
@@ -4,8 +4,8 @@ authors:
- family-names: "Söderström"
given-names: "Kristofer Rolf"
orcid: "https://orcid.org/0000-0002-5322-3350"
title: "transcribe"
version: 1.0
doi: 10.5281/zenodo.7760511
title: "Local Transcribe"
version: 1.2
doi: 10.5281/zenodo.7760510
date-released: 2023-03-22
url: "https://github.com/soderstromkr/transcribe"
+31
@@ -0,0 +1,31 @@
### How to run on Mac / Linux
#### Quick start
1. Open Terminal and navigate to the project folder (or right-click the folder and select "Open in Terminal").
2. Make the script executable (only needed once):
```
chmod +x run_Mac.sh
```
3. Run it:
```
./run_Mac.sh
```
This will automatically:
- Create a virtual environment (`.venv`)
- Install all dependencies (no admin rights needed)
- Launch the app
#### Manual steps (alternative)
If you prefer to do it manually:
```
python3 -m venv .venv
.venv/bin/python install.py
.venv/bin/python app.py
```
#### Notes
- **Python 3.10+** is required. macOS users can install it from [python.org](https://www.python.org/downloads/) or via `brew install python`.
- **No FFmpeg install needed** — audio decoding is bundled.
- **GPU acceleration** is not available on macOS (Apple Silicon MPS is not supported by CTranslate2). CPU with int8 quantization is still fast.
- On Apple Silicon (M1/M2/M3/M4), the `small` or `base` models run well. `medium` works but is slower.
+108 -24
@@ -1,29 +1,113 @@
## transcribe
Simple script that uses OpenAI's Whisper to transcribe audio files from your local folders.
## Local Transcribe with Whisper
### Instructions
#### Requirements
1. This script was made and tested in an Anaconda environment with python 3.10. I recommend this method if you're not familiar with python.
See [here](https://docs.anaconda.com/anaconda/install/index.html) for instructions. You might need administrator rights.
2. Whisper requires some additional libraries. The [setup](https://github.com/openai/whisper#setup) page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files."
Users might not need to specifically install Transformers. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables. From the anaconda prompt, type or copy the following:
```
conda install -c conda-forge ffmpeg-python
```
3. The main functionality comes from openai-whisper. See their [page](https://github.com/openai/whisper) for details. As of 2023-03-22 you can install via:
```
pip install -U openai-whisper
```
#### Using the script
This is a simple script with no installation. You can either clone the repository with
```
git clone https://github.com/soderstromkr/transcribe.git
```
and use the example.ipynb template to use the script **OR (for beginners)** download the ```transcribe.py``` file into your work folder. Then you can either import it to another script or notebook for use. I recommend jupyter notebook for new users, see the example below. (Remember to have transcribe.py and example.ipynb in the same working folder).
> **🍎 Apple Silicon GPU/NPU acceleration:** This version now supports native Apple GPU/NPU acceleration via [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). On Apple Silicon Macs, transcription runs on the Apple GPU and Neural Engine — no CPU fallback needed.
### Example
See the [example](example.ipynb) implementation on jupyter notebook.
Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2) on Windows/Linux and [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
[^1]: Advanced users can use ```pip install ffmpeg-python``` but be ready to deal with some [PATH issues](https://stackoverflow.com/questions/65836756/python-ffmpeg-wont-accept-path-why), which I encountered in Windows 11.
## New in version 3.0!
1. **Apple Silicon GPU/NPU support** — native MLX backend for Apple Silicon Macs, using Apple GPU + Neural Engine.
2. **SRT subtitle export** — valid SubRip files alongside the existing TXT output, ready for HandBrake or any video player.
3. **VAD filter** — removes silence, reduces hallucination, improves accuracy.
4. **Word-level timestamps** — per-word SRT timing for precise subtitle burning.
5. **Translation mode** — transcribe any language and translate to English in one step.
6. **Stop button** — immediately cancel any transcription, including model downloads.
7. **Language dropdown** — 99 languages with proper ISO codes, no more guessing formats.
8. **Model descriptions** — speed, size, quality stars, and use case shown for every model.
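
The SRT export mentioned above relies on SubRip's `HH:MM:SS,mmm` timestamp format. The following is a minimal sketch of that formatting, not the app's actual code; the helper names `srt_timestamp` and `srt_cue` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a SubRip timestamp (HH:MM:SS,mmm)."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"


# Cues are separated by a blank line in the final file.
print("\n".join([srt_cue(1, 0.0, 2.5, "Hello"), srt_cue(2, 2.5, 4.0, "world")]))
```

With word-level timestamps, the same cue builder could be called once per word instead of once per segment.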
## New in version 2.0!
1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage, simpler installation.
2. **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab)
3. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
4. **No admin rights required** — a plain `pip install` covers everything.
5. **No PyTorch dependency** — dramatically smaller install footprint.
6. **Integrated console** - all info in the same application.
7. **`tiny` model added** — smallest and fastest option.
## Features
* Select the folder containing the audio or video files you want to transcribe. Tested with m4a video.
* Choose the language of the files you are transcribing from a dropdown of 99 supported languages, or let the application automatically detect the language.
* Select the Whisper model to use for the transcription. Available models include "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v2", and "large-v3". Models with .en ending are better if you're transcribing English, especially the base and small models.
* **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
* **VAD filter** — removes silence from audio before transcription, reducing hallucination and improving accuracy.
* **Word-level timestamps** — generates per-word timing in the SRT output for precise subtitle synchronization.
* **Translation mode** — transcribes audio in any language and translates the result to English.
* **SRT export** — valid SubRip subtitle files saved alongside TXT, ready for HandBrake or any video player.
* Monitor the progress of the transcription with the progress bar and terminal.
* Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
* View the transcribed text in a message box once the transcription is completed.
* **Stop button** — immediately cancel transcription, including model downloads.
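
As a rough illustration of how the VAD, word-timestamp, and translation toggles could map onto the faster-whisper backend: `WhisperModel.transcribe` accepts `language`, `vad_filter`, `word_timestamps`, and `task` keyword arguments. The helper below is a hypothetical sketch (the name `build_transcribe_options` is an assumption, and the MLX backend may take different options); it only builds the keyword dictionary and does not load a model.

```python
def build_transcribe_options(language=None, vad=False,
                             word_timestamps=False, translate=False):
    """Map UI toggles to keyword arguments for faster-whisper's transcribe().

    language=None leaves language detection to the model.
    """
    return {
        "language": language,
        "vad_filter": vad,
        "word_timestamps": word_timestamps,
        # Whisper supports two tasks: "transcribe" and "translate" (to English).
        "task": "translate" if translate else "transcribe",
    }


options = build_transcribe_options(language="sv", vad=True, translate=True)
print(options)
# Assumed usage with an already loaded model:
# segments, info = model.transcribe("audio.m4a", **options)
```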
## Installation
### Get the files
Download the zip folder and extract it to your preferred working folder.
![](images/Picture1.png)
Or by cloning the repository with:
```
git clone https://gitea.kobim.cloud/kobim/whisper-local-transcribe.git
```
### Prerequisites
Install **Python 3.10 or later**. Some IT policies allow installing it from the Microsoft Store or the Mac equivalent, but I recommend installing from [python.org](https://www.python.org/downloads/). During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
### Run on Windows
Double-click `run_Windows.bat` — it will auto-install everything on first run.
### Run on Mac / Linux
Run `./run_Mac.sh` — it will auto-install everything on first run. See [Mac instructions](Mac_instructions.md) for details.
> **Note:** The first run with a given model will download it (~75 MB for base, ~500 MB for medium). After that, everything works offline.
### Manual installation (if the launchers don't work)
If `run_Windows.bat` or `run_Mac.sh` fails (e.g. Python isn't on PATH, or permissions issues), open a terminal in the project folder and run these steps manually:
```
python -m venv .venv
```
Activate the virtual environment:
- **Windows:** `.venv\Scripts\activate`
- **Mac / Linux:** `source .venv/bin/activate`
Then install and run:
```
python install.py
python app.py
```
## GPU Support
### Apple Silicon
On Macs with Apple Silicon, the app automatically uses the **MLX backend**, which runs inference on the Apple GPU and Neural Engine. No additional setup is needed — just install and run. MLX models are downloaded from HuggingFace on first use.
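
How the app decides to use MLX is not shown in this part of the README. One plausible check, sketched here with a hypothetical helper `is_apple_silicon`, is to look at the platform name and CPU architecture from the standard library:

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running on macOS with an arm64 CPU.

    Both arguments default to the values reported by the current interpreter;
    they are parameters only so the check can be exercised on any machine.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


print("MLX candidate" if is_apple_silicon() else "Use faster-whisper (CUDA or CPU)")
```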
### NVIDIA GPUs
This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription times. faster-whisper uses CTranslate2, which requires NVIDIA CUDA libraries for GPU acceleration.
#### Automatic Detection
The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
```
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```
**Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program will automatically detect and use your GPU if available, otherwise it falls back to CPU.
#### Verifying GPU Support
After installation, you can verify that your GPU is available by running:
```python
import ctranslate2
print(ctranslate2.get_supported_compute_types("cuda"))
```
If the printed output includes `"float16"`, GPU acceleration is working.
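
The same check can drive an automatic fallback to CPU. This is a sketch under assumptions, not the app's own detection logic: the function name `pick_device` is hypothetical, and any failure (CTranslate2 missing, no CUDA runtime) is treated as "use CPU with int8".

```python
def pick_device():
    """Return (device, compute_type), preferring CUDA float16 when available."""
    try:
        import ctranslate2
        compute_types = ctranslate2.get_supported_compute_types("cuda")
    except Exception:
        # Library not installed or CUDA not usable: fall back to CPU.
        return "cpu", "int8"
    if "float16" in compute_types:
        return "cuda", "float16"
    return "cpu", "int8"


device, compute_type = pick_device()
print(f"Using {device} ({compute_type})")
```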
## Usage
1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates. The backend indicator at the bottom shows which inference engine is active (MLX · Apple GPU/NPU, CUDA · GPU, or CPU · int8).
2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
3. Select the language from the dropdown — 99 languages are available, or leave it on "Auto-detect". For English-only models (.en) the language is locked to English; for KB Swedish models it's locked to Swedish.
4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label. A description below shows speed, size, quality stars, and recommended use case for each model.
5. Toggle advanced options if needed: **VAD filter**, **Word-level timestamps**, or **Translate to English**.
6. Click the "Transcribe" button to start the transcription. Use the "Stop" button to cancel at any time.
7. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
8. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
9. Transcriptions are saved as both `.txt` (human-readable) and `.srt` (SubRip subtitles) in the `transcriptions/` folder within the selected directory.
10. You can run the application again or quit at any time by clicking the "Quit" button.
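
The Stop button in step 6 depends on running the transcription in a separate process that can be terminated at any moment. The sketch below shows the general "terminate, wait briefly, then kill" escalation. It uses `subprocess` rather than the app's `multiprocessing` worker so that it stays self-contained, and `stop_process` is a hypothetical helper name.

```python
import subprocess
import sys


def stop_process(proc, grace=3.0):
    """Ask a child process to exit, then force-kill it if it does not."""
    if proc.poll() is not None:
        return proc.returncode  # already finished
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Stand-in for a long-running transcription worker.
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(worker)
print("worker stopped, exit code:", exit_code)
```

Because the work happens in another process, stopping it does not depend on the worker checking a flag between segments.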
## Jupyter Notebook
Don't want fancy EXEs or GUIs? Use the function as is. See [example](example.ipynb) for an implementation on Jupyter Notebook.
[![DOI](https://zenodo.org/badge/617404576.svg)](https://zenodo.org/badge/latestdoi/617404576)
+406
@@ -0,0 +1,406 @@
import os
import sys
import tkinter as tk
from tkinter import ttk
from tkinter import filedialog
from tkinter import messagebox
from src._LocalTranscribe import transcribe, get_path, detect_backend, _transcribe_worker_process
import multiprocessing as mp
import customtkinter
import threading
# ── Helper: redirect stdout/stderr into a CTkTextbox ──────────────────────
import re
_ANSI_RE = re.compile(r'\x1b\[[0-9;]*m') # strip colour codes
class _ConsoleRedirector:
    """Redirects output exclusively to the in-app console panel."""

    def __init__(self, text_widget):
        self.widget = text_widget

    def write(self, text):
        clean = _ANSI_RE.sub('', text) # strip ANSI colours
        if clean.strip() == '':
            return
        # Schedule UI update on the main thread
        try:
            self.widget.after(0, self._append, clean)
        except Exception:
            pass

    def _append(self, text):
        self.widget.configure(state='normal')
        self.widget.insert('end', text + ('\n' if not text.endswith('\n') else ''))
        self.widget.see('end')
        self.widget.configure(state='disabled')

    def flush(self):
        pass
# HuggingFace model IDs for non-standard models
HF_MODEL_MAP = {
'KB Swedish (tiny)': 'KBLab/kb-whisper-tiny',
'KB Swedish (base)': 'KBLab/kb-whisper-base',
'KB Swedish (small)': 'KBLab/kb-whisper-small',
'KB Swedish (medium)': 'KBLab/kb-whisper-medium',
'KB Swedish (large)': 'KBLab/kb-whisper-large',
}
# Per-model info shown in the UI description label
# (speed, size, quality stars, suggested use)
MODEL_INFO = {
'tiny': ('Very fast', '~75 MB', '★★☆☆☆', 'Quick drafts & testing'),
'tiny.en': ('Very fast', '~75 MB', '★★☆☆☆', 'Quick drafts & testing (English only)'),
'base': ('Fast', '~145 MB', '★★★☆☆', 'Notes & short podcasts'),
'base.en': ('Fast', '~145 MB', '★★★☆☆', 'Notes & short podcasts (English only)'),
'small': ('Balanced', '~485 MB', '★★★★☆', 'Everyday use'),
'small.en': ('Balanced', '~485 MB', '★★★★☆', 'Everyday use (English only)'),
'medium': ('Accurate', '~1.5 GB', '★★★★☆', 'Professional content'),
'medium.en': ('Accurate', '~1.5 GB', '★★★★☆', 'Professional content (English only)'),
'large-v2': ('Slow', '~3 GB', '★★★★★', 'Maximum accuracy'),
'large-v3': ('Slow', '~3 GB', '★★★★★', 'Maximum accuracy (recommended)'),
'KB Swedish (tiny)': ('Very fast', '~75 MB', '★★★☆☆', 'Swedish — optimised by KBLab'),
'KB Swedish (base)': ('Fast', '~145 MB', '★★★☆☆', 'Swedish — optimised by KBLab'),
'KB Swedish (small)': ('Balanced', '~485 MB', '★★★★☆', 'Swedish — optimised by KBLab'),
'KB Swedish (medium)': ('Accurate', '~1.5 GB', '★★★★☆', 'Swedish — optimised by KBLab'),
'KB Swedish (large)': ('Slow', '~3 GB', '★★★★★', 'Swedish — KBLab, best accuracy'),
}
customtkinter.set_appearance_mode("System")
customtkinter.set_default_color_theme("blue") # Themes: blue (default), dark-blue, green
# All languages supported by Whisper (display label → ISO code; None = auto-detect)
WHISPER_LANGUAGES = {
'Auto-detect': None,
'Afrikaans (af)': 'af', 'Albanian (sq)': 'sq',
'Amharic (am)': 'am', 'Arabic (ar)': 'ar',
'Armenian (hy)': 'hy', 'Assamese (as)': 'as',
'Azerbaijani (az)': 'az', 'Bashkir (ba)': 'ba',
'Basque (eu)': 'eu', 'Belarusian (be)': 'be',
'Bengali (bn)': 'bn', 'Bosnian (bs)': 'bs',
'Breton (br)': 'br', 'Bulgarian (bg)': 'bg',
'Catalan (ca)': 'ca', 'Chinese (zh)': 'zh',
'Croatian (hr)': 'hr', 'Czech (cs)': 'cs',
'Danish (da)': 'da', 'Dutch (nl)': 'nl',
'English (en)': 'en', 'Estonian (et)': 'et',
'Faroese (fo)': 'fo', 'Finnish (fi)': 'fi',
'French (fr)': 'fr', 'Galician (gl)': 'gl',
'Georgian (ka)': 'ka', 'German (de)': 'de',
'Greek (el)': 'el', 'Gujarati (gu)': 'gu',
'Haitian Creole (ht)': 'ht', 'Hausa (ha)': 'ha',
'Hawaiian (haw)': 'haw', 'Hebrew (he)': 'he',
'Hindi (hi)': 'hi', 'Hungarian (hu)': 'hu',
'Icelandic (is)': 'is', 'Indonesian (id)': 'id',
'Italian (it)': 'it', 'Japanese (ja)': 'ja',
'Javanese (jw)': 'jw', 'Kannada (kn)': 'kn',
'Kazakh (kk)': 'kk', 'Khmer (km)': 'km',
'Korean (ko)': 'ko', 'Lao (lo)': 'lo',
'Latin (la)': 'la', 'Latvian (lv)': 'lv',
'Lingala (ln)': 'ln', 'Lithuanian (lt)': 'lt',
'Luxembourgish (lb)': 'lb', 'Macedonian (mk)': 'mk',
'Malagasy (mg)': 'mg', 'Malay (ms)': 'ms',
'Malayalam (ml)': 'ml', 'Maltese (mt)': 'mt',
'Maori (mi)': 'mi', 'Marathi (mr)': 'mr',
'Mongolian (mn)': 'mn', 'Myanmar (my)': 'my',
'Nepali (ne)': 'ne', 'Norwegian (no)': 'no',
'Occitan (oc)': 'oc', 'Pashto (ps)': 'ps',
'Persian (fa)': 'fa', 'Polish (pl)': 'pl',
'Portuguese (pt)': 'pt', 'Punjabi (pa)': 'pa',
'Romanian (ro)': 'ro', 'Russian (ru)': 'ru',
'Sanskrit (sa)': 'sa', 'Serbian (sr)': 'sr',
'Shona (sn)': 'sn', 'Sindhi (sd)': 'sd',
'Sinhala (si)': 'si', 'Slovak (sk)': 'sk',
'Slovenian (sl)': 'sl', 'Somali (so)': 'so',
'Spanish (es)': 'es', 'Sundanese (su)': 'su',
'Swahili (sw)': 'sw', 'Swedish (sv)': 'sv',
'Tagalog (tl)': 'tl', 'Tajik (tg)': 'tg',
'Tamil (ta)': 'ta', 'Tatar (tt)': 'tt',
'Telugu (te)': 'te', 'Thai (th)': 'th',
'Tibetan (bo)': 'bo', 'Turkish (tr)': 'tr',
'Turkmen (tk)': 'tk', 'Ukrainian (uk)': 'uk',
'Urdu (ur)': 'ur', 'Uzbek (uz)': 'uz',
'Vietnamese (vi)': 'vi', 'Welsh (cy)': 'cy',
'Yiddish (yi)': 'yi', 'Yoruba (yo)': 'yo',
}
def _language_options_for_model(model_name):
    """Return (values, default, state) for the language combobox given a model name."""
    if model_name.endswith('.en'):
        return ['English (en)'], 'English (en)', 'disabled'
    if model_name.startswith('KB Swedish'):
        return ['Swedish (sv)'], 'Swedish (sv)', 'disabled'
    return list(WHISPER_LANGUAGES.keys()), 'Auto-detect', 'readonly'


def _set_app_icon(root):
    """Set app icon when supported, without crashing on unsupported platforms."""
    base_dir = os.path.dirname(os.path.abspath(__file__))
    icon_path = os.path.join(base_dir, "images", "icon.ico")
    if not os.path.exists(icon_path):
        return
    try:
        root.iconbitmap(icon_path)
    except tk.TclError:
        # Some Linux Tk builds don't accept .ico for iconbitmap.
        pass


def _apply_display_scaling(root):
    """Auto-scale UI for high-resolution displays (e.g., 4K)."""
    try:
        screen_w = root.winfo_screenwidth()
        screen_h = root.winfo_screenheight()
        scale = min(screen_w / 1920.0, screen_h / 1080.0)
        scale = max(1.0, min(scale, 2.0))
        customtkinter.set_widget_scaling(scale)
        customtkinter.set_window_scaling(scale)
    except Exception:
        pass
class App:
    def __init__(self, master):
        self.master = master
        # Change font
        font = ('Roboto', 13, 'bold') # Change the font and size here
        font_b = ('Roboto', 12) # Change the font and size here

        # Folder Path
        path_frame = customtkinter.CTkFrame(master)
        path_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(path_frame, text="Folder:", font=font).pack(side=tk.LEFT, padx=5)
        self.path_entry = customtkinter.CTkEntry(path_frame, width=50, font=font_b)
        self.path_entry.insert(0, os.path.join(os.getcwd(), 'sample_audio'))
        self.path_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
        customtkinter.CTkButton(path_frame, text="Browse", command=self.browse, font=font).pack(side=tk.LEFT, padx=5)

        # Language frame
        language_frame = customtkinter.CTkFrame(master)
        language_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(language_frame, text="Language:", font=font).pack(side=tk.LEFT, padx=5)
        _lang_values, _lang_default, _lang_state = _language_options_for_model('medium')
        self.language_combobox = customtkinter.CTkComboBox(
            language_frame, width=50, state=_lang_state,
            values=_lang_values, font=font_b)
        self.language_combobox.set(_lang_default)
        self.language_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)

        # Model frame
        models = ['tiny', 'tiny.en', 'base', 'base.en',
                  'small', 'small.en', 'medium', 'medium.en',
                  'large-v2', 'large-v3',
                  '───────────────',
                  'KB Swedish (tiny)', 'KB Swedish (base)',
                  'KB Swedish (small)', 'KB Swedish (medium)',
                  'KB Swedish (large)']
        model_frame = customtkinter.CTkFrame(master)
        model_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(model_frame, text="Model:", font=font).pack(side=tk.LEFT, padx=5)
        # ComboBox frame
        self.model_combobox = customtkinter.CTkComboBox(
            model_frame, width=50, state="readonly",
            values=models, font=font_b,
            command=self._on_model_change)
        self.model_combobox.set('medium') # Set the default value
        self.model_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)

        # Model description label
        self.model_desc_label = customtkinter.CTkLabel(
            master, text=self._model_desc_text('medium'),
            font=('Roboto', 11), text_color=('#555555', '#aaaaaa'),
            anchor='w')
        self.model_desc_label.pack(fill=tk.X, padx=14, pady=(0, 4))

        # Timestamps toggle
        ts_frame = customtkinter.CTkFrame(master)
        ts_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.timestamps_var = tk.BooleanVar(value=True)
        self.timestamps_switch = customtkinter.CTkSwitch(
            ts_frame, text="Include timestamps in transcription",
            variable=self.timestamps_var, font=font_b)
        self.timestamps_switch.pack(side=tk.LEFT, padx=5)

        # Advanced options frame
        adv_frame = customtkinter.CTkFrame(master)
        adv_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.vad_var = tk.BooleanVar(value=False)
        customtkinter.CTkSwitch(
            adv_frame, text="VAD filter (remove silence)",
            variable=self.vad_var, font=font_b).pack(side=tk.LEFT, padx=5)
        self.word_ts_var = tk.BooleanVar(value=False)
        customtkinter.CTkSwitch(
            adv_frame, text="Word-level timestamps",
            variable=self.word_ts_var, font=font_b).pack(side=tk.LEFT, padx=5)
        self.translate_var = tk.BooleanVar(value=False)
        customtkinter.CTkSwitch(
            adv_frame, text="Translate to English",
            variable=self.translate_var, font=font_b).pack(side=tk.LEFT, padx=5)

        # Progress Bar
        self.progress_bar = ttk.Progressbar(master, length=200, mode='indeterminate')

        # Worker process handle (replaces thread+stop_event for true immediate cancellation)
        self._proc = None
        self._parent_conn = None
        self._child_conn = None

        # Button actions frame
        button_frame = customtkinter.CTkFrame(master)
        button_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.transcribe_button = customtkinter.CTkButton(button_frame, text="Transcribe", command=self.start_transcription, font=font)
        self.transcribe_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
        self.stop_button = customtkinter.CTkButton(
            button_frame, text="Stop", command=self._stop_transcription, font=font,
            fg_color="#c0392b", hover_color="#922b21", state=tk.DISABLED)
        self.stop_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
        customtkinter.CTkButton(button_frame, text="Quit", command=master.quit, font=font).pack(side=tk.RIGHT, padx=5, pady=10, fill=tk.X, expand=True)

        # ── Embedded console / log panel ──────────────────────────────────
        log_label = customtkinter.CTkLabel(master, text="Console output", font=font, anchor='w')
        log_label.pack(fill=tk.X, padx=12, pady=(8, 0))
        self.log_box = customtkinter.CTkTextbox(master, height=220, font=('Consolas', 14),
                                                wrap='word', state='disabled',
                                                fg_color='#1e1e1e', text_color='#e0e0e0')
        self.log_box.pack(fill=tk.BOTH, expand=True, padx=10, pady=(2, 10))
        # Redirect stdout & stderr into the log panel (no backend console)
        sys.stdout = _ConsoleRedirector(self.log_box)
        sys.stderr = _ConsoleRedirector(self.log_box)

        # Backend indicator
        _bi = detect_backend()
        backend_label = customtkinter.CTkLabel(
            master,
            text=f"Backend: {_bi['label']}",
            font=('Roboto', 11),
            text_color=("#555555", "#aaaaaa"),
            anchor='e',
        )
        backend_label.pack(fill=tk.X, padx=12, pady=(0, 2))

        # Welcome message (shown after redirect so it appears in the panel)
        print("Welcome to Local Transcribe with Whisper! \U0001f600")
        print("Transcriptions will be saved automatically.")
print("─" * 46)
# Helper functions
def _stop_transcription(self):
self.stop_button.configure(state=tk.DISABLED)
if self._proc and self._proc.is_alive():
self._proc.terminate()
try:
self._proc.join(timeout=3)
except Exception:
pass
if self._proc.is_alive():
self._proc.kill()
try:
self._proc.join(timeout=1)
except Exception:
pass
# Close pipe ends — no semaphores, so no leak
for conn in (self._parent_conn, self._child_conn):
try:
if conn:
conn.close()
except Exception:
pass
self._parent_conn = self._child_conn = None
print("⛔ Transcription stopped by user.")
def _model_desc_text(self, model_name):
info = MODEL_INFO.get(model_name)
if not info:
return ''
speed, size, stars, use = info
return f'{stars} {speed} · {size} · {use}'
def _on_model_change(self, selected):
self.model_desc_label.configure(text=self._model_desc_text(selected))
values, default, state = _language_options_for_model(selected)
self.language_combobox.configure(values=values, state=state)
self.language_combobox.set(default)
# Browsing
def browse(self):
initial_dir = os.getcwd()
folder_path = filedialog.askdirectory(initialdir=initial_dir)
self.path_entry.delete(0, tk.END)
self.path_entry.insert(0, folder_path)
# Start transcription
def start_transcription(self):
model_display = self.model_combobox.get()
if model_display.startswith('─'):
messagebox.showinfo("Invalid selection", "Please select a model, not the separator line.")
return
self.transcribe_button.configure(state=tk.DISABLED)
self.stop_button.configure(state=tk.NORMAL)
path = self.path_entry.get()
model = HF_MODEL_MAP.get(model_display, model_display)
lang_label = self.language_combobox.get()
language = WHISPER_LANGUAGES.get(lang_label, lang_label) if lang_label else None
timestamps = self.timestamps_var.get()
vad_filter = self.vad_var.get()
word_timestamps = self.word_ts_var.get()
translate = self.translate_var.get()
glob_file = get_path(path)
self.progress_bar.pack(fill=tk.X, padx=5, pady=5)
self.progress_bar.start()
self._parent_conn, self._child_conn = mp.Pipe(duplex=False)
self._proc = mp.Process(
target=_transcribe_worker_process,
args=(self._child_conn, path, glob_file, model, language, True, timestamps),
kwargs={"vad_filter": vad_filter, "word_timestamps": word_timestamps, "translate": translate},
daemon=True,
)
self._proc.start()
self._child_conn.close() # parent doesn't write; close its write-end
self._child_conn = None
self.master.after(100, self._poll_worker)
def _poll_worker(self):
done = False
result = None
try:
while self._parent_conn and self._parent_conn.poll():
msg = self._parent_conn.recv()
if isinstance(msg, tuple) and msg[0] == '__done__':
done = True
result = msg[1]
else:
sys.stdout.write(msg)
sys.stdout.flush()
except EOFError:
# Child closed the pipe (normal completion or kill)
done = True
except Exception:
pass
if done or (self._proc and not self._proc.is_alive()):
if self._parent_conn:
try:
self._parent_conn.close()
except Exception:
pass
self._parent_conn = None
self._on_transcription_done(result)
else:
self.master.after(100, self._poll_worker)
def _on_transcription_done(self, output_text):
self.progress_bar.stop()
self.progress_bar.pack_forget()
self.stop_button.configure(state=tk.DISABLED)
self.transcribe_button.configure(state=tk.NORMAL)
if output_text:
title = "Finished!" if not output_text.startswith('⚠') else "Error"
messagebox.showinfo(title, output_text)
if __name__ == "__main__":
# Setting custom themes
root = customtkinter.CTk()
_apply_display_scaling(root)
root.title("Local Transcribe with Whisper")
# Geometry — taller to accommodate the embedded console panel
width, height = 550, 560
root.geometry('{}x{}'.format(width, height))
root.minsize(450, 480)
# Icon (best-effort; ignored on platforms/builds without .ico support)
_set_app_icon(root)
# Run
app = App(root)
root.mainloop()
+20
@@ -0,0 +1,20 @@
from cx_Freeze import setup, Executable
build_exe_options = {
"packages": ['faster_whisper','tkinter','customtkinter']
}
executables = (
[
Executable(
"app.py",
icon='images/icon.ico',
)
]
)
setup(
name="Local Transcribe with Whisper",
version="2.0",
author="Kristofer Rolf Söderström",
options={"build_exe":build_exe_options},
executables=executables
)
+61 -58
@@ -1,123 +1,125 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "a2cd4050",
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from transcribe import transcribe"
"# Local Transcribe with Whisper\n",
"## Example"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "24e1d24e",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function transcribe in module transcribe:\n",
"Help on function transcribe in module src._LocalTranscribe:\n",
"\n",
"transcribe(path, file_type, model=None, language=None, verbose=True)\n",
" Implementation of OpenAI's whisper model. Downloads model, transcribes audio files in a folder and returns the text files with transcriptions\n",
"transcribe(path, glob_file, model=None, language=None, verbose=False)\n",
" Transcribes audio files in a specified folder using OpenAI's Whisper model.\n",
" \n",
" Args:\n",
" path (str): Path to the folder containing the audio files.\n",
" glob_file (list): List of audio file paths to transcribe.\n",
" model (str, optional): Name of the Whisper model to use for transcription.\n",
" Defaults to None, which uses the default model.\n",
" language (str, optional): Language code for transcription. Defaults to None,\n",
" which enables automatic language detection.\n",
" verbose (bool, optional): If True, enables verbose mode with detailed information\n",
" during the transcription process. Defaults to False.\n",
" \n",
" Returns:\n",
" str: A message indicating the result of the transcription process.\n",
" \n",
" Raises:\n",
" RuntimeError: If an invalid file is encountered, it will be skipped.\n",
" \n",
" Notes:\n",
" - The function downloads the specified model if not available locally.\n",
" - The transcribed text files will be saved in a \"transcriptions\" folder\n",
" within the specified path.\n",
"\n"
]
}
],
"source": [
"# Import the modules and get the docstring\n",
"from src._LocalTranscribe import transcribe, get_path\n",
"help(transcribe)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e52477fb",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"path='sample_audio/'#folder path\n",
"file_type='ogg' #check your file for file type, will only transcribe files with the file type, 'ogg', 'WAV'\n",
"model='medium' #'small', 'medium', 'large' (tradeoff between speed and accuracy)\n",
"language= None #tries to auto-detect, other options include 'English', 'Spanish', etc...\n",
"verbose = True # prints output while transcribing, False to deactivate"
"# Set the variables\n",
"path='sample_audio/'# Folder path\n",
"model='small' # Model size\n",
"language= None # Preset language, None for automatic detection\n",
"verbose = True # Output transcription in realtime\n",
"\n",
"# Get glob file, additional step for app version.\n",
"\n",
"glob_file = get_path(path)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d66866af",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using medium model, you can change this by specifying model=\"medium\" for example\n",
"Only looking for file type ogg, you can change this by specifying file_type=\"mp3\"\n",
"Expecting None language, you can change this by specifying language=\"English\". None will try to auto-detect\n",
"Verbosity is True. If TRUE it will print out the text as it is transcribed, you can turn this off by setting verbose=False\n",
"\n",
"There are 2 ogg files in path: sample_audio/\n",
"\n",
"\n",
"Loading model...\n",
"Transcribing file number number 1: Armstrong_Small_Step\n",
"Model and file loaded...\n",
"Starting transcription...\n",
"\n",
"Trying to transcribe file named: Armstrong_Small_Step🕐\n",
"Detecting language using up to the first 30 seconds. Use `--language` to specify the language\n",
"Detected language: English\n",
"[00:00.000 --> 00:24.000] That's one small step for man, one giant leap for mankind.\n",
"\n",
"Finished file number 1.\n",
"\n",
"\n",
"\n",
"Transcribing file number number 2: Axel_Pettersson_röstinspelning\n",
"Model and file loaded...\n",
"Starting transcription...\n",
"[00:00.000 --> 00:07.000] I'm going to step off the limb now.\n",
"[00:07.000 --> 00:18.000] That's one small step for man.\n",
"[00:18.000 --> 00:24.000] One giant leap for mankind.\n",
"\n",
"Trying to transcribe file named: Axel_Pettersson_röstinspelning🕐\n",
"Detecting language using up to the first 30 seconds. Use `--language` to specify the language\n",
"Detected language: Swedish\n",
"[00:00.000 --> 00:16.000] Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.\n",
"[00:00.000 --> 00:06.140] Hej, jag heter Axel Pettersson. Jag följer bror 1976.\n",
"[00:06.400 --> 00:15.100] Jag har varit vikerpedjan sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.\n",
"\n",
"Finished file number 2.\n",
"Trying to transcribe file named: readme🕐\n",
"Not a valid file, skipping.\n",
"\n",
"\n",
"\n"
"Trying to transcribe file named: transcriptions🕐\n",
"Not a valid file, skipping.\n"
]
},
{
"data": {
"text/plain": [
"'Finished transcription, files can be found in sample_audio/transcriptions'"
"'Finished transcription, 2 files can be found in sample_audio//transcriptions'"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transcribe(path, file_type, model, language, verbose)"
"# Run the script\n",
"transcribe(path, glob_file, model, language, verbose)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bc67265",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "venv",
"language": "python",
"name": "python3"
},
@@ -132,8 +134,9 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 5
"nbformat_minor": 2
}
+128
@@ -0,0 +1,128 @@
"""
Installer script for Local Transcribe with Whisper.
Detects NVIDIA GPU and offers to install GPU acceleration support.
Usage:
python install.py
"""
import os
import subprocess
import sys
import shutil
import site
def detect_nvidia_gpu():
"""Check if an NVIDIA GPU is present."""
candidates = [
shutil.which("nvidia-smi"),
r"C:\Windows\System32\nvidia-smi.exe",
r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe",
]
for path in candidates:
if not path or not os.path.isfile(path):
continue
try:
r = subprocess.run(
[path, "--query-gpu=name", "--format=csv,noheader"],
capture_output=True, text=True, timeout=10,
)
if r.returncode == 0 and r.stdout.strip():
return True, r.stdout.strip().split("\n")[0]
except Exception:
continue
return False, None
def pip_install(*packages):
cmd = [sys.executable, "-m", "pip", "install"] + list(packages)
print(f"\n> {' '.join(cmd)}\n")
subprocess.check_call(cmd)
def get_site_packages():
for p in site.getsitepackages():
if p.endswith("site-packages"):
return p
return site.getsitepackages()[0]
def create_nvidia_pth():
"""Create a .pth startup hook that registers NVIDIA DLL directories."""
sp = get_site_packages()
pth_path = os.path.join(sp, "nvidia_cuda_path.pth")
# This one-liner runs at Python startup, before any user code.
pth_content = (
"import os, glob as g; "
"any(os.add_dll_directory(d) or os.environ.__setitem__('PATH', d + os.pathsep + os.environ.get('PATH','')) "
"for d in g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'bin')) "
"+ g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'lib')) "
"if os.path.isdir(d)) if os.name == 'nt' else None\n"
)
with open(pth_path, "w") as f:
f.write(pth_content)
print(f" Created CUDA startup hook: {pth_path}")
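The startup hook above relies on a documented behavior of Python's `site` module: a line in a `.pth` file that starts with `import` is executed rather than treated as a path. A minimal sketch of that mechanism follows, using a temporary directory instead of the real site-packages. The marker variable `DEMO_PTH_HOOK_RAN` and the file name `demo_hook.pth` are made up for this illustration and are not part of the installer.

```python
import os
import site
import tempfile

# Hypothetical marker variable, used only to show that the hook ran.
MARKER = "DEMO_PTH_HOOK_RAN"

with tempfile.TemporaryDirectory() as tmp:
    pth_path = os.path.join(tmp, "demo_hook.pth")
    with open(pth_path, "w") as f:
        # A .pth line that starts with "import" is executed, not added to sys.path.
        f.write(f"import os; os.environ.__setitem__('{MARKER}', '1')\n")
    # addsitedir() processes every .pth file in the directory, the same way
    # the interpreter handles site-packages at startup.
    site.addsitedir(tmp)

hook_ran = os.environ.get(MARKER) == "1"
print(hook_ran)
```

Because the whole hook must fit on one line, the installer packs its logic into a single `any(...)` expression; the sketch keeps the line short to make the mechanism visible.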
def verify_cuda():
"""Verify CUDA works in a fresh subprocess."""
try:
r = subprocess.run(
[sys.executable, "-c",
"import ctranslate2; "
"print('float16' in ctranslate2.get_supported_compute_types('cuda'))"],
capture_output=True, text=True, timeout=30,
)
return r.stdout.strip() == "True"
except Exception:
return False
def main():
print("=" * 55)
print(" Local Transcribe with Whisper — Installer")
print("=" * 55)
# Step 1: Base packages
print("\n[1/2] Installing base requirements...")
pip_install("-r", "requirements.txt")
print("\n Base requirements installed!")
# Step 2: GPU
print("\n[2/2] Checking for NVIDIA GPU...")
has_gpu, gpu_name = detect_nvidia_gpu()
if has_gpu:
print(f"\n NVIDIA GPU detected: {gpu_name}")
print(" GPU acceleration can make transcription 2-5x faster.")
print(" This will install ~300 MB of additional CUDA libraries.\n")
while True:
answer = input(" Install GPU support? [Y/n]: ").strip().lower()
if answer in ("", "y", "yes"):
print("\n Installing CUDA libraries...")
pip_install("nvidia-cublas-cu12", "nvidia-cudnn-cu12")
create_nvidia_pth()
print("\n Verifying CUDA...")
if verify_cuda():
print(" GPU support verified and working!")
else:
print(" WARNING: CUDA installed but not detected.")
print(" Update your NVIDIA drivers and try again.")
break
elif answer in ("n", "no"):
print("\n Skipping GPU. Re-run install.py to add it later.")
break
else:
print(" Please enter Y or N.")
else:
print("\n No NVIDIA GPU detected — using CPU mode.")
print("\n" + "=" * 55)
print(" Done! Run the app with: python app.py")
print("=" * 55)
if __name__ == "__main__":
main()
+3
@@ -0,0 +1,3 @@
faster-whisper
mlx-whisper
customtkinter
+29
@@ -0,0 +1,29 @@
#!/bin/bash
# ============================================================
# Local Transcribe with Whisper — macOS / Linux launcher
# ============================================================
# Double-click this file or run: ./run_Mac.sh
# On first run it creates a venv and installs dependencies.
# ============================================================
set -e
cd "$(dirname "$0")"
# Create .venv if it doesn't exist
if [ ! -f ".venv/bin/python" ]; then
echo "Creating virtual environment..."
python3 -m venv .venv
fi
PYTHON=".venv/bin/python"
# Install dependencies on first run
if ! "$PYTHON" -c "import faster_whisper" 2>/dev/null; then
echo "First run detected — running installer..."
"$PYTHON" install.py
echo
fi
echo "Starting Local Transcribe..."
"$PYTHON" app.py
+23
@@ -0,0 +1,23 @@
@echo off
REM Create .venv on first run if it doesn't exist
if not exist ".venv\Scripts\python.exe" (
echo Creating virtual environment...
python -m venv .venv
if errorlevel 1 (
echo ERROR: Failed to create virtual environment. Is Python installed and on PATH?
pause
exit /b 1
)
)
set PYTHON=.venv\Scripts\python.exe
REM Check if dependencies are installed
%PYTHON% -c "import faster_whisper" 2>nul
if errorlevel 1 (
echo First run detected - running installer...
%PYTHON% install.py
echo.
)
echo Starting Local Transcribe...
%PYTHON% app.py
@@ -1,3 +1,4 @@
Armstrong_Small_Step
In seconds:
[0.00 --> 24.00]: That's one small step for man, one giant leap for mankind.
────────────────────────────────────────
That's one small step for man, one giant leap for mankind.
@@ -1,3 +1,4 @@
Axel_Pettersson_röstinspelning
In seconds:
[0.00 --> 16.00]: Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
────────────────────────────────────────
Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
+487
@@ -0,0 +1,487 @@
import os
import sys
import platform
import datetime
import time
import site
from glob import glob
# ---------------------------------------------------------------------------
# CUDA setup — must happen before importing faster_whisper / ctranslate2
# ---------------------------------------------------------------------------
def _setup_cuda_libs():
"""Register NVIDIA pip-package lib dirs so ctranslate2 finds CUDA at runtime.
pip-installed nvidia-cublas-cu12 / nvidia-cudnn-cu12 place their shared
libraries inside the site-packages tree. Neither Windows nor Linux
automatically searches those directories, so we must register them
explicitly:
- Windows: os.add_dll_directory() + PATH
- Linux: LD_LIBRARY_PATH (read by the dynamic linker)
"""
try:
sp_dirs = site.getsitepackages()
except AttributeError:
# virtualenv without site-packages helper
sp_dirs = [os.path.join(sys.prefix, "lib",
"python" + ".".join(map(str, sys.version_info[:2])),
"site-packages")]
for sp in sp_dirs:
nvidia_root = os.path.join(sp, "nvidia")
if not os.path.isdir(nvidia_root):
continue
for pkg in os.listdir(nvidia_root):
for sub in ("bin", "lib"):
d = os.path.join(nvidia_root, pkg, sub)
if not os.path.isdir(d):
continue
if sys.platform == "win32":
os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
try:
os.add_dll_directory(d)
except (OSError, AttributeError):
pass
else:
# Linux / macOS — prepend to LD_LIBRARY_PATH
ld = os.environ.get("LD_LIBRARY_PATH", "")
if d not in ld:
os.environ["LD_LIBRARY_PATH"] = d + (":" + ld if ld else "")
# Also load via ctypes so the already-running process sees the libraries
import ctypes
try:
for so in sorted(os.listdir(d)):
if so.endswith(".so") or ".so." in so:
ctypes.cdll.LoadLibrary(os.path.join(d, so))
except OSError:
pass
_setup_cuda_libs()
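The directory scan inside `_setup_cuda_libs` can be tried without any NVIDIA packages installed. The sketch below isolates only the discovery step and runs it against a fake site-packages tree; `find_nvidia_lib_dirs` is a hypothetical helper written for this illustration and does not exist in the module.

```python
import os
import tempfile

def find_nvidia_lib_dirs(site_packages):
    """Return existing nvidia/<pkg>/{bin,lib} directories under site_packages."""
    nvidia_root = os.path.join(site_packages, "nvidia")
    if not os.path.isdir(nvidia_root):
        return []
    found = []
    for pkg in sorted(os.listdir(nvidia_root)):
        for sub in ("bin", "lib"):
            d = os.path.join(nvidia_root, pkg, sub)
            if os.path.isdir(d):
                found.append(d)
    return found

with tempfile.TemporaryDirectory() as sp:
    # Fake layout resembling what the pip-installed CUDA wheels create.
    os.makedirs(os.path.join(sp, "nvidia", "cublas", "lib"))
    os.makedirs(os.path.join(sp, "nvidia", "cudnn", "bin"))
    os.makedirs(os.path.join(sp, "nvidia", "cudnn", "include"))  # ignored
    dirs = [os.path.relpath(d, sp) for d in find_nvidia_lib_dirs(sp)]

print(dirs)
```

Only the `bin` and `lib` subdirectories are returned; registering them with the loader (PATH, `os.add_dll_directory`, or `LD_LIBRARY_PATH`) is the platform-specific part handled by the function above.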
from faster_whisper import WhisperModel
SUPPORTED_EXTENSIONS = {
".wav", ".mp3", ".m4a", ".flac", ".ogg", ".wma", ".aac",
".mp4", ".mkv", ".mov", ".webm", ".avi", ".mpeg", ".mpg",
}
# ---------------------------------------------------------------------------
# MLX model map (Apple Silicon only)
# ---------------------------------------------------------------------------
_MLX_MODEL_MAP = {
"tiny": "mlx-community/whisper-tiny-mlx",
"base": "mlx-community/whisper-base-mlx",
"small": "mlx-community/whisper-small-mlx",
"medium": "mlx-community/whisper-medium-mlx",
"large-v2": "mlx-community/whisper-large-v2-mlx",
"large-v3": "mlx-community/whisper-large-v3-mlx",
}
def detect_backend():
"""Return the best available inference backend.
Returns a dict with keys:
backend : "mlx" | "cuda" | "cpu"
device : device string for WhisperModel (cuda / cpu)
compute_type : compute type string for WhisperModel
label : human-readable label for UI display
"""
# Apple Silicon → try MLX (GPU + Neural Engine via Apple MLX)
if sys.platform == "darwin" and platform.machine() == "arm64":
try:
import mlx_whisper # noqa: F401
return {
"backend": "mlx",
"device": "cpu",
"compute_type": "int8",
"label": "MLX · Apple GPU/NPU",
}
except ImportError:
pass
# NVIDIA CUDA
try:
import ctranslate2
cuda_types = ctranslate2.get_supported_compute_types("cuda")
if "float16" in cuda_types:
return {
"backend": "cuda",
"device": "cuda",
"compute_type": "float16",
"label": "CUDA · GPU",
}
except Exception:
pass
return {
"backend": "cpu",
"device": "cpu",
"compute_type": "int8",
"label": "CPU · int8",
}
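The priority order in `detect_backend` (MLX first, then CUDA, then CPU) can be expressed as a pure function, which makes it easy to check without the real hardware. This is a sketch under stated assumptions: `choose_backend` and its three parameters are hypothetical stand-ins for the platform check, the `mlx_whisper` import, and the list returned by `ctranslate2.get_supported_compute_types("cuda")`.

```python
def choose_backend(is_apple_silicon, has_mlx, cuda_compute_types):
    """Mirror the MLX > CUDA > CPU priority with injected probe results."""
    if is_apple_silicon and has_mlx:
        return {"backend": "mlx", "device": "cpu", "compute_type": "int8"}
    if "float16" in cuda_compute_types:
        return {"backend": "cuda", "device": "cuda", "compute_type": "float16"}
    return {"backend": "cpu", "device": "cpu", "compute_type": "int8"}

# Apple Silicon with mlx-whisper installed wins even if CUDA types were reported.
mac = choose_backend(True, True, {"float16"})
# Apple Silicon without mlx-whisper falls through to the remaining checks.
mac_no_mlx = choose_backend(True, False, set())
# A machine whose CUDA build supports float16 selects the GPU.
gpu = choose_backend(False, False, {"float16", "int8"})
print(mac["backend"], mac_no_mlx["backend"], gpu["backend"])
```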
def _decode_audio_pyav(file_path):
"""Decode any audio/video file to a float32 mono 16 kHz numpy array.
Uses PyAV (bundled FFmpeg) — no external ffmpeg binary required.
Returns (audio_array, duration_seconds).
"""
import av
import numpy as np
with av.open(file_path) as container:
duration = float(container.duration or 0) / 1_000_000 # microseconds → seconds; container.duration may be None
stream = container.streams.audio[0]
resampler = av.AudioResampler(format="fltp", layout="mono", rate=16000)
chunks = []
for frame in container.decode(stream):
for out in resampler.resample(frame):
if out:
chunks.append(out.to_ndarray()[0])
# Flush resampler
for out in resampler.resample(None):
if out:
chunks.append(out.to_ndarray()[0])
if not chunks:
return np.zeros(0, dtype=np.float32), duration
return np.concatenate(chunks, axis=0), duration
def _transcribe_mlx_file(file, mlx_model_id, language, timestamps, verbose, vad_filter=False, word_timestamps=False, translate=False):
"""Transcribe a single file with mlx-whisper (Apple GPU/NPU).
Decodes audio via PyAV (no system ffmpeg needed), then runs MLX inference.
Returns (segments_as_dicts, audio_duration_seconds).
Segments have dict keys: 'start', 'end', 'text'.
"""
import mlx_whisper
audio_array, duration = _decode_audio_pyav(file)
decode_opts = {}
if language:
decode_opts["language"] = language
if translate:
decode_opts["task"] = "translate"
if word_timestamps:
decode_opts["word_timestamps"] = True
result = mlx_whisper.transcribe(
audio_array,
path_or_hf_repo=mlx_model_id,
verbose=(True if verbose else None),
**decode_opts,
)
segments = result["segments"]
audio_duration = segments[-1]["end"] if segments else duration
return segments, audio_duration
def _srt_timestamp(seconds):
"""Convert seconds (float) to SRT timestamp format HH:MM:SS,mmm."""
ms = round(seconds * 1000)
h, ms = divmod(ms, 3_600_000)
m, ms = divmod(ms, 60_000)
s, ms = divmod(ms, 1000)
return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
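A few worked values for `_srt_timestamp` show why the function rounds to whole milliseconds before splitting: the carry from milliseconds into seconds and minutes then comes out right. The function body below is copied from the source; the sample inputs are chosen for illustration.

```python
def _srt_timestamp(seconds):
    """Convert seconds (float) to SRT timestamp format HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

start = _srt_timestamp(0.0)        # "00:00:00,000"
mixed = _srt_timestamp(3661.5)     # 1 h, 1 min, 1.5 s -> "01:01:01,500"
carried = _srt_timestamp(59.9996)  # rounds up to 60000 ms -> "00:01:00,000"
print(start, mixed, carried)
```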
# Get the path
def get_path(path):
all_items = glob(path + '/*')
media_files = []
for item in all_items:
if not os.path.isfile(item):
continue
_, ext = os.path.splitext(item)
if ext.lower() in SUPPORTED_EXTENSIONS:
media_files.append(item)
return sorted(media_files)
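The filtering in `get_path` is case-insensitive on the extension and skips directories, so an existing `transcriptions` folder or a stray text file is never handed to the transcriber. The sketch below repeats the function against a temporary folder; the extension set is shortened for the demo and the file names are invented.

```python
import os
import tempfile
from glob import glob

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg"}  # shortened subset for the demo

def get_path(path):
    media_files = []
    for item in glob(path + "/*"):
        if not os.path.isfile(item):
            continue  # skips sub-folders such as "transcriptions"
        _, ext = os.path.splitext(item)
        if ext.lower() in SUPPORTED_EXTENSIONS:
            media_files.append(item)
    return sorted(media_files)

with tempfile.TemporaryDirectory() as folder:
    for name in ("b_talk.wav", "a_speech.MP3", "readme.txt"):
        open(os.path.join(folder, name), "w").close()
    os.makedirs(os.path.join(folder, "transcriptions"))
    names = [os.path.basename(p) for p in get_path(folder)]

print(names)
```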
# Main function
def transcribe(path, glob_file, model=None, language=None, verbose=False, timestamps=True, stop_event=None, vad_filter=False, word_timestamps=False, translate=False):
"""
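Transcribes media files in a specified folder using faster-whisper (CTranslate2),
or mlx-whisper when the MLX backend is detected on Apple Silicon.
Args:
path (str): Path to the folder containing the audio files.
glob_file (list): List of audio file paths to transcribe.
model (str, optional): Name of the Whisper model size to use for transcription.
Defaults to None, which uses the default model.
language (str, optional): Language code for transcription. Defaults to None,
which enables automatic language detection.
verbose (bool, optional): If True, prints each segment as it is transcribed.
Defaults to False.
timestamps (bool, optional): If True, prefixes each line of the .txt output with
[start --> end] timestamps. Defaults to True.
stop_event (optional): Object with an is_set() method, such as a threading.Event.
When set, transcription stops before the next file or segment. Defaults to None.
vad_filter (bool, optional): If True, enables the faster-whisper VAD filter.
Accepted but not forwarded to mlx-whisper on the MLX path. Defaults to False.
word_timestamps (bool, optional): If True, requests word-level timestamps, which
are written to the .srt output on the faster-whisper path. Defaults to False.
translate (bool, optional): If True, runs the "translate" task instead of
"transcribe". Defaults to False.
Returns:
str: A message indicating the result of the transcription process.
Notes:
- Files that cannot be decoded are reported and skipped; no exception is raised for them.
- The function downloads the specified model if not available locally.
- The .txt and .srt outputs are saved in a "transcriptions" folder
within the specified path.
- Uses CTranslate2 for up to 4x faster inference compared to openai-whisper.
- FFmpeg is bundled via the PyAV dependency; no separate installation is needed.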
"""
SEP = "─" * 46
# ── Step 1: Detect hardware ──────────────────────────────────────
backend_info = detect_backend()
backend = backend_info["backend"]
device = backend_info["device"]
compute_type = backend_info["compute_type"]
print(f"⚙ Backend: {backend_info['label']}")
# ── Step 1b: MLX path (Apple GPU/NPU) ───────────────────────────
if backend == "mlx":
mlx_model_id = _MLX_MODEL_MAP.get(model)
if mlx_model_id is None:
print(f"⚠ Model '{model}' is not available in MLX format.")
print(" Falling back to faster-whisper on CPU (int8).")
backend = "cpu"
device, compute_type = "cpu", "int8"
else:
# ── Step 2 (MLX): load + transcribe ─────────────────────
print(f"⏳ Loading MLX model '{model}' — downloading if needed...")
print("✅ Model ready!")
print(SEP)
total_files = len(glob_file)
print(f"📂 Found {total_files} supported media file(s) in folder")
print(SEP)
if total_files == 0:
output_text = '⚠ No supported media files found — try another folder.'
print(output_text)
print(SEP)
return output_text
files_transcripted = []
file_num = 0
for file in glob_file:
if stop_event and stop_event.is_set():
print("⛔ Transcription stopped by user.")
break
title = os.path.splitext(os.path.basename(file))[0] # keep dots inside the file name
file_num += 1
print(f"\n{'─' * 46}")
print(f"📄 File {file_num}/{total_files}: {title}")
try:
t_start = time.time()
segments, audio_duration = _transcribe_mlx_file(
file, mlx_model_id, language, timestamps, verbose,
vad_filter=vad_filter, word_timestamps=word_timestamps,
translate=translate
)
os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
segment_list = []
txt_path = "{}/transcriptions/{}.txt".format(path, title)
srt_path = "{}/transcriptions/{}.srt".format(path, title)
with open(txt_path, 'w', encoding='utf-8') as f, \
open(srt_path, 'w', encoding='utf-8') as srt_f:
f.write(title)
f.write('\n' + '─' * 40 + '\n')
for idx, seg in enumerate(segments, start=1):
if stop_event and stop_event.is_set():
break
text = seg["text"].strip()
if timestamps:
start_ts = str(datetime.timedelta(seconds=seg["start"]))
end_ts = str(datetime.timedelta(seconds=seg["end"]))
f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
else:
f.write('\n{}'.format(text))
srt_f.write(f'{idx}\n{_srt_timestamp(seg["start"])} --> {_srt_timestamp(seg["end"])}\n{text}\n\n')
f.flush()
srt_f.flush()
if verbose:
print(" [%.2fs → %.2fs] %s" % (seg["start"], seg["end"], seg["text"]))
else:
print(" Transcribed up to %.0fs..." % seg["end"], end='\r')
segment_list.append(seg)
elapsed = time.time() - t_start
elapsed_min = elapsed / 60.0
audio_min = audio_duration / 60.0
ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
print(f"✅ Done — saved to transcriptions/{title}.txt")
print(f"⏱ Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min ({ratio:.1f}x realtime)")
files_transcripted.append(segment_list)
except Exception as exc:
print(f"⚠ Could not decode '{os.path.basename(file)}', skipping.")
print(f" Reason: {exc}")
print(f"\n{SEP}")
if files_transcripted:
output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n Saved in: {path}/transcriptions"
else:
output_text = '⚠ No files eligible for transcription — try another folder.'
print(output_text)
print(SEP)
return output_text
# ── Step 2: Load model (faster-whisper / CTranslate2) ───────────
print(f"⏳ Loading model '{model}' — downloading if needed...")
try:
whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
except Exception as exc:
err = str(exc).lower()
cuda_runtime_missing = (
device == "cuda"
and (
"libcublas" in err
or "libcudnn" in err
or "cuda" in err
or "cannot be loaded" in err
or "not found" in err
)
)
if not cuda_runtime_missing:
raise
print("⚠ CUDA runtime not available; falling back to CPU (int8).")
print(f" Reason: {exc}")
device, compute_type = "cpu", "int8"
whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
print("✅ Model ready!")
print(SEP)
# ── Step 3: Transcribe files ─────────────────────────────────────
total_files = len(glob_file)
print(f"📂 Found {total_files} supported media file(s) in folder")
print(SEP)
if total_files == 0:
output_text = '⚠ No supported media files found — try another folder.'
print(output_text)
print(SEP)
return output_text
files_transcripted = []
file_num = 0
for file in glob_file:
if stop_event and stop_event.is_set():
print("⛔ Transcription stopped by user.")
break
title = os.path.splitext(os.path.basename(file))[0] # keep dots inside the file name
file_num += 1
print(f"\n{'─' * 46}")
print(f"📄 File {file_num}/{total_files}: {title}")
try:
t_start = time.time()
segments, info = whisper_model.transcribe(
file,
language=language,
beam_size=5,
task="translate" if translate else "transcribe",
vad_filter=vad_filter,
word_timestamps=word_timestamps,
)
audio_duration = info.duration # seconds
# Make folder if missing
os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
# Stream segments as they are decoded
segment_list = []
txt_path = "{}/transcriptions/{}.txt".format(path, title)
srt_path = "{}/transcriptions/{}.srt".format(path, title)
with open(txt_path, 'w', encoding='utf-8') as f, \
open(srt_path, 'w', encoding='utf-8') as srt_f:
f.write(title)
f.write('\n' + '─' * 40 + '\n')
for idx, seg in enumerate(segments, start=1):
if stop_event and stop_event.is_set():
break
text = seg.text.strip()
if timestamps:
start_ts = str(datetime.timedelta(seconds=seg.start))
end_ts = str(datetime.timedelta(seconds=seg.end))
f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
else:
f.write('\n{}'.format(text))
# Use word-level timestamps for SRT if available
if word_timestamps and hasattr(seg, 'words') and seg.words:
for w_idx, word in enumerate(seg.words, start=1):
w_text = word.word.strip()
if not w_text:
continue
w_start = _srt_timestamp(word.start)
w_end = _srt_timestamp(word.end)
srt_f.write(f'{idx}.{w_idx}\n{w_start} --> {w_end}\n{w_text}\n\n')
else:
srt_f.write(f'{idx}\n{_srt_timestamp(seg.start)} --> {_srt_timestamp(seg.end)}\n{text}\n\n')
f.flush()
srt_f.flush()
if verbose:
print(" [%.2fs → %.2fs] %s" % (seg.start, seg.end, seg.text))
else:
print(" Transcribed up to %.0fs..." % seg.end, end='\r')
segment_list.append(seg)
elapsed = time.time() - t_start
elapsed_min = elapsed / 60.0
audio_min = audio_duration / 60.0
ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
print(f"✅ Done — saved to transcriptions/{title}.txt")
print(f"⏱ Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min ({ratio:.1f}x realtime)")
files_transcripted.append(segment_list)
except Exception as exc:
print(f"⚠ Could not decode '{os.path.basename(file)}', skipping.")
print(f" Reason: {exc}")
# ── Summary ──────────────────────────────────────────────────────
print(f"\n{SEP}")
if len(files_transcripted) > 0:
output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n Saved in: {path}/transcriptions"
else:
output_text = '⚠ No files eligible for transcription — try another folder.'
print(output_text)
print(SEP)
return output_text
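The CUDA-to-CPU fallback used when loading the model in `transcribe` can be exercised without a GPU by injecting a loader. The sketch below keeps the same substring heuristic as the source. `is_cuda_runtime_error`, `load_with_fallback`, and `fake_load` are hypothetical names introduced for this illustration, and `fake_load` only imitates a missing-library failure; its error text is invented.

```python
def is_cuda_runtime_error(device, exc):
    """Same substring heuristic as the source: only CUDA loads may fall back."""
    if device != "cuda":
        return False
    err = str(exc).lower()
    markers = ("libcublas", "libcudnn", "cuda", "cannot be loaded", "not found")
    return any(marker in err for marker in markers)

def load_with_fallback(load, device, compute_type):
    """Try the requested device; retry on CPU/int8 if the CUDA runtime is missing."""
    try:
        return load(device, compute_type), device
    except Exception as exc:
        if not is_cuda_runtime_error(device, exc):
            raise  # unrelated failures are not masked
        return load("cpu", "int8"), "cpu"

def fake_load(device, compute_type):
    # Stand-in for the real model constructor; fails like a missing CUDA library.
    if device == "cuda":
        raise RuntimeError("Library libcublas.so.12 is not found or cannot be loaded")
    return f"model on {device} ({compute_type})"

model, used_device = load_with_fallback(fake_load, "cuda", "float16")
print(model, used_device)
```

One consequence of matching on broad substrings such as "not found" is that a mistyped model name on a CUDA machine would also trigger the CPU retry, where it would then fail again with the original cause.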
def _transcribe_worker_process(conn, path, glob_file, model, language, verbose, timestamps, vad_filter=False, word_timestamps=False, translate=False):
"""Child-process entry point for the UI's multiprocessing backend.
Redirects stdout/stderr → pipe connection so the main process can display
output in the console panel. The main process sends SIGTERM/SIGKILL to
stop this process immediately, including any in-progress download or inference.
"""
import sys
class _PipeWriter:
def __init__(self, c):
self.c = c
def write(self, text):
if text:
try:
self.c.send(text)
except Exception:
pass
def flush(self):
pass
writer = _PipeWriter(conn)
sys.stdout = writer
sys.stderr = writer
result = '⚠ No output produced.'
try:
result = transcribe(path, glob_file, model, language, verbose, timestamps,
vad_filter=vad_filter, word_timestamps=word_timestamps,
translate=translate)
except Exception as exc:
result = f'⚠ Unexpected error: {exc}'
finally:
try:
conn.send(('__done__', result))
except Exception:
pass
conn.close()
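The worker and the UI share a small message protocol over the pipe: plain strings are console output, and a `('__done__', result)` tuple marks completion. The sketch below shows only that protocol, with both pipe ends held in one process so it runs without spawning a child. `drain` is a hypothetical helper that loosely mirrors `_poll_worker`, and the sample messages are invented.

```python
import multiprocessing as mp

def drain(conn):
    """Read what is currently available; return (log_text, done, result)."""
    log, done, result = [], False, None
    try:
        while not done and conn.poll():
            msg = conn.recv()
            if isinstance(msg, tuple) and msg[0] == "__done__":
                done, result = True, msg[1]  # sentinel: worker finished
            else:
                log.append(msg)              # ordinary stdout/stderr text
    except EOFError:
        done = True                          # writer closed without a sentinel
    return "".join(log), done, result

# duplex=False returns (receive_end, send_end), as in start_transcription.
recv_end, send_end = mp.Pipe(duplex=False)
send_end.send("Loading model...\n")
send_end.send("File 1/1\n")
send_end.send(("__done__", "Finished!"))
send_end.close()

log_text, done, result = drain(recv_end)
recv_end.close()
print(repr(log_text), done, result)
```

In the app the sending end lives in the child process, so terminating the child closes it and the parent sees `EOFError` with no result, which is why the UI treats end-of-file as completion.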
-51
@@ -1,51 +0,0 @@
import whisper
import glob, os
def transcribe(path, file_type, model=None, language=None, verbose=True):
'''Implementation of OpenAI's whisper model. Downloads model, transcribes audio files in a folder and returns the text files with transcriptions'''
try:
os.mkdir('{}transcriptions'.format(path))
except FileExistsError:
pass
glob_file = glob.glob(path+'/*{}'.format(file_type))
path = path
print('Using {} model, you can change this by specifying model="medium" for example'.format(model))
print('Only looking for file type {}, you can change this by specifying file_type="mp3"'.format(file_type))
print('Expecting {} language, you can change this by specifying language="English". None will try to auto-detect'.format(language))
print('Verbosity is {}. If TRUE it will print out the text as it is transcribed, you can turn this off by setting verbose=False'.format(verbose))
print('\nThere are {} {} files in path: {}\n\n'.format(len(glob_file), file_type, path))
print('Loading model...')
model = whisper.load_model(model)
for idx,file in enumerate(glob_file):
title = os.path.basename(file).split('.')[0]
print('Transcribing file number number {}: {}'.format(idx+1,title))
print('Model and file loaded...\nStarting transcription...\n')
result = model.transcribe(
file,
language=language,
verbose=True
)
start=[]
end=[]
text=[]
for i in range(len(result['segments'])):
start.append(result['segments'][i]['start'])
end.append(result['segments'][i]['end'])
text.append(result['segments'][i]['text'])
with open("{}transcriptions/{}.txt".format(path,title), 'w', encoding='utf-8') as file:
file.write(title)
file.write('\nIn seconds:')
for i in range(len(result['segments'])):
file.writelines('\n[{:.2f} --> {:.2f}]:{}'.format(start[i], end[i], text[i]))
print('\nFinished file number {}.\n\n\n'.format(idx+1))
return 'Finished transcription, files can be found in {}transcriptions'.format(path)