Compare commits
32 Commits
| SHA1 |
|---|
| f8cf42733d |
| 7d3fe1ba26 |
| da42a6e4cc |
| 0dab0d9bea |
| 953c71ab28 |
| 5522bdd575 |
| 861c470330 |
| 6de6d4b2ff |
| 01552cc7cb |
| 049a168c81 |
| 56a925463f |
| fe60b04020 |
| ff06a257f2 |
| 5e31129ea2 |
| 3f0bca02b7 |
| 488e78a5ae |
| 829a054300 |
| 462aae12ca |
| fec9190ba1 |
| 0dde25204d |
| b611aa6b8c |
| 7d50d5f4cf |
| 7799d03960 |
| f88186dacc |
| 3f5c1491ac |
| c83e15bdba |
| ff16ad30e1 |
| 622165b3e6 |
| 0e9cbdca58 |
| 87cb509b14 |
| ba935cafb7 |
| 6497508b7a |
@@ -0,0 +1 @@
+*.zip filter=lfs diff=lfs merge=lfs -text
+26
@@ -0,0 +1,26 @@
+# Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Virtual environments
+venv/
+env/
+ENV/
+.venv/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Build artifacts
+dist/
+build/
+*.egg-info/
+30
-8
@@ -1,9 +1,31 @@
-### How to run on Mac
-Unfortunately, I have not found a permament solution for this, not being a Mac user has limited the ways I can test this.
-#### Recommended steps
-1. Open a terminal and navigate to the root folder (the downloaded the folder).
-1. You can also right-click (or equivalent) on the root folder to open a Terminal within the folder.
-2. Run the following command:
+### How to run on Mac / Linux
+
+#### Quick start
+1. Open Terminal and navigate to the project folder (or right-click the folder and select "Open in Terminal").
+2. Make the script executable (only needed once):
 ```
-python main.py
-```
+chmod +x run_Mac.sh
+```
+3. Run it:
+```
+./run_Mac.sh
+```
+
+This will automatically:
+- Create a virtual environment (`.venv`)
+- Install all dependencies (no admin rights needed)
+- Launch the app
+
+#### Manual steps (alternative)
+If you prefer to do it manually:
+```
+python3 -m venv .venv
+.venv/bin/python install.py
+.venv/bin/python app.py
+```
+
+#### Notes
+- **Python 3.10+** is required. macOS users can install it from [python.org](https://www.python.org/downloads/) or via `brew install python`.
+- **No FFmpeg install needed** — audio decoding is bundled.
+- **GPU acceleration** is not available on macOS (Apple Silicon MPS is not supported by CTranslate2). CPU with int8 quantization is still fast.
+- On Apple Silicon (M1/M2/M3/M4), the `small` or `base` models run well. `medium` works but is slower.
@@ -1,19 +1,24 @@
 ## Local Transcribe with Whisper
-Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
-
-## New in version 1.2!
-1. Simpler usage:
-   1. File type: You no longer need to specify file type. The program will only transcribe elligible files.
-   2. Language: Added option to specify language, which might help in some cases. Clear the default text to run automatic language recognition.
-   3. Model selection: Now a dropdown option that includes most models for typical use.
-2. New and improved GUI.
-![image](images/image-1.png)
-3. Executable: On Windows and don't want to install python? Try the Exe file! See below for instructions (Experimental)
+> **⚠ Note for Mac users (Apple Silicon):** This version uses `faster-whisper` (CTranslate2), which does **not** support Apple M-chip GPU acceleration. Transcription will run on CPU, which is slower than OpenAI's Whisper with Metal/CoreML support. The trade-off is a much simpler installation — no conda, no PyTorch, no admin rights. If you'd prefer M-chip GPU acceleration and don't mind a more involved setup, switch to the **classic** release:
+> ```
+> git checkout classic
+> ```
+
+Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2). This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
+
+## New in version 2.0!
+1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage.
+2. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
+3. **No admin rights required** — a plain `pip install` covers everything.
+4. **No PyTorch dependency** — dramatically smaller install footprint.
+5. **`tiny` model added** — smallest and fastest option for quick drafts.
 
 ## Features
 * Select the folder containing the audio or video files you want to transcribe. Tested with m4a video.
 * Choose the language of the files you are transcribing. You can either select a specific language or let the application automatically detect the language.
-* Select the Whisper model to use for the transcription. Available models include "base.en", "base", "small.en", "small", "medium.en", "medium", and "large". Models with .en ending are better if you're transcribing English, especially the base and small models.
+* Select the Whisper model to use for the transcription. Available models include "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v2", and "large-v3". Models with .en ending are better if you're transcribing English, especially the base and small models.
+* **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
 * Enable the verbose mode to receive detailed information during the transcription process.
 * Monitor the progress of the transcription with the progress bar and terminal.
 * Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
@@ -21,56 +26,65 @@ Local Transcribe with Whisper is a user-friendly desktop application that allows
 
 ## Installation
 ### Get the files
-Download the zip folder and extract it to your preferred working folder.
-![image](images/image.png)
+Download the zip folder and extract it to your preferred working folder.
+![image](images/image-2.png)
 Or by cloning the repository with:
 ```
 git clone https://github.com/soderstromkr/transcribe.git
 ```
 ### Executable Version **(Experimental. Windows only)**
 The executable version of Local Transcribe with Whisper is a standalone program and should work out of the box. This experimental version is available if you have Windows and do not have (or don't want to install) Python and additional dependencies. However, it requires more disk space (around 1 GB), has no GPU acceleration, and has only been lightly tested for bugs, etc. Let me know if you run into any issues!
 1. Download the project folder, as the image above shows.
 2. Navigate to build.
 3. Unzip the folder (get a coffee or a tea, this might take a while depending on your computer).
 4. Run the executable (app.exe) file.
 ### Python Version **(any platform including Mac users)**
-This is recommended if you don't have Windows. Have Windows and use python, or want to use GPU acceleration (Pytorch and Cuda) for faster transcriptions. I would generally recommend this method anyway, but I can understand not everyone wants to go through the installation process for Python, Anaconda and the other required packages.
-1. This script was made and tested in an Anaconda environment with Python 3.10. I recommend this method if you're not familiar with Python.
-   See [here](https://docs.anaconda.com/anaconda/install/index.html) for instructions. You might need administrator rights.
-2. Whisper requires some additional libraries. The [setup](https://github.com/openai/whisper#setup) page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files."
-   Users might not need to specifically install Transfomers. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables. From the anaconda prompt, type or copy the following:
-```
-conda install -c conda-forge ffmpeg-python
-```
-3. The main functionality comes from openai-whisper. See their [page](https://github.com/openai/whisper) for details. As of 2023-03-22 you can install via:
-```
-pip install -U openai-whisper
-```
-4. To run the app built on TKinter and TTKthemes. If using these options, make sure they are installed in your Python build. You can install them via pip.
-```
-pip install tkinter
-```
-and
-```
-pip install customtkinter
-```
-5. Run the app:
-   1. For **Windows**: In the same folder as the *app.py* file, run the app from terminal by running ```python app.py``` or with the batch file called run_Windows.bat (for Windows users), which assumes you have conda installed and in the base environment (This is for simplicity, but users are usually adviced to create an environment, see [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands) for more info) just make sure you have the correct environment (right click on the file and press edit to make any changes). If you want to download a model first, and then go offline for transcription, I recommend running the model with the default sample folder, which will download the model locally.
-   2. For **Mac**: Haven't figured out a better way to do this, see [the instructions here](Mac_instructions.txt)
+1. Install Python 3.10 or later. You can download it from [python.org](https://www.python.org/downloads/). During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
+2. Run the installer. Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux) in the project folder and run:
+   ```
+   python install.py
+   ```
+   This will:
+   - Install all required packages (including bundled FFmpeg — no separate install needed)
+   - **Auto-detect your NVIDIA GPU** and ask if you want GPU acceleration
+   - No conda, no admin rights required
+
+   Alternatively, you can install manually with `pip install -r requirements.txt`.
+3. Run the app:
+   1. For **Windows**: double-click `run_Windows.bat` (it will auto-install on first run) or run:
+   ```
+   python app.py
+   ```
+   2. For **Mac / Linux**: run `./run_Mac.sh` (auto-installs on first run). See [Mac instructions](Mac_instructions.md) for details.
+
+**Note:** The first run with a given model will download it (~75 MB for base, ~500 MB for medium). After that, everything works offline.
+
+## GPU Support
+This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription times. faster-whisper uses CTranslate2, which requires NVIDIA CUDA libraries for GPU acceleration.
+
+### Automatic Detection
+The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
+```
+pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
+```
+**Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program will automatically detect and use your GPU if available, otherwise it falls back to CPU.
+
+### Verifying GPU Support
+After installation, you can verify that your GPU is available by running:
+```python
+import ctranslate2
+print(ctranslate2.get_supported_compute_types("cuda"))
+```
+If this returns a list containing `"float16"`, GPU acceleration is working.
 
 ## Usage
-1. When launched, the app will also open a terminal that shows some additional information.
+1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates.
 2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
 3. Enter the desired language for the transcription in the "Language" field. You can either select a language or leave it blank to enable automatic language detection.
 4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label.
-5. Enable the verbose mode by checking the "Verbose" checkbox if you want to receive detailed information during the transcription process.
-6. Click the "Transcribe" button to start the transcription. The button will be disabled during the process to prevent multiple transcriptions at once.
-7. Monitor the progress of the transcription with the progress bar.
-8. Once the transcription is completed, a message box will appear displaying the transcribed text. Click "OK" to close the message box.
-9. You can run the application again or quit the application at any time by clicking the "Quit" button.
+5. Click the "Transcribe" button to start the transcription. The button will be disabled during the process to prevent multiple transcriptions at once.
+6. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
+7. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
+8. You can run the application again or quit at any time by clicking the "Quit" button.
 
 ## Jupyter Notebook
 Don't want fancy EXEs or GUIs? Use the function as is. See [example](example.ipynb) for an implementation on Jupyter Notebook.
 
 [^1]: Advanced users can use ```pip install ffmpeg-python``` but be ready to deal with some [PATH issues](https://stackoverflow.com/questions/65836756/python-ffmpeg-wont-accept-path-why), which I encountered in Windows 11.
 
 [![DOI](https://zenodo.org/badge/617404576.svg)](https://zenodo.org/badge/latestdoi/617404576)
@@ -1,3 +1,5 @@
 import os
+import sys
 import tkinter as tk
 from tkinter import ttk
+from tkinter import filedialog
@@ -5,9 +7,44 @@ from tkinter import messagebox
 from src._LocalTranscribe import transcribe, get_path
 import customtkinter
 import threading
 from colorama import Back, Fore
 import colorama
 colorama.init(autoreset=True)
 
 
+# ── Helper: redirect stdout/stderr into a CTkTextbox ──────────────────────
+import re
+_ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')  # strip colour codes
+
+
+class _ConsoleRedirector:
+    """Redirects output exclusively to the in-app console panel."""
+    def __init__(self, text_widget):
+        self.widget = text_widget
+
+    def write(self, text):
+        clean = _ANSI_RE.sub('', text)  # strip ANSI colours
+        if clean.strip() == '':
+            return
+        # Schedule UI update on the main thread
+        try:
+            self.widget.after(0, self._append, clean)
+        except Exception:
+            pass
+
+    def _append(self, text):
+        self.widget.configure(state='normal')
+        self.widget.insert('end', text + ('\n' if not text.endswith('\n') else ''))
+        self.widget.see('end')
+        self.widget.configure(state='disabled')
+
+    def flush(self):
+        pass
+
+
+# HuggingFace model IDs for non-standard models
+HF_MODEL_MAP = {
+    'KB Swedish (tiny)': 'KBLab/kb-whisper-tiny',
+    'KB Swedish (base)': 'KBLab/kb-whisper-base',
+    'KB Swedish (small)': 'KBLab/kb-whisper-small',
+    'KB Swedish (medium)': 'KBLab/kb-whisper-medium',
+    'KB Swedish (large)': 'KBLab/kb-whisper-large',
+}
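The regex-based colour stripping in `_ConsoleRedirector.write` can be exercised on its own. A minimal standalone sketch using the same pattern as the diff above (names here are illustrative, not part of the app):

```python
import re

# Same pattern as _ANSI_RE in the diff: matches SGR colour codes like "\x1b[46m"
ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')

def strip_ansi(text: str) -> str:
    """Remove colour escape sequences so log text renders cleanly in a textbox."""
    return ANSI_RE.sub('', text)

print(strip_ansi('\x1b[46mWelcome!\x1b[0m'))  # → Welcome!
```

This is why the new code can keep colorama-styled prints elsewhere: the redirector cleans them before they reach the Tk textbox.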
@@ -17,7 +54,6 @@ firstclick = True
 
 class App:
     def __init__(self, master):
-        print(Back.CYAN + "Welcome to Local Transcribe with Whisper!\U0001f600\nCheck back here to see some output from your transcriptions.\nDon't worry, they will also be saved on the computer!\U0001f64f")
         self.master = master
         # Change font
         font = ('Roboto', 13, 'bold')  # Change the font and size here
@@ -27,6 +63,7 @@ class App:
         path_frame.pack(fill=tk.BOTH, padx=10, pady=10)
         customtkinter.CTkLabel(path_frame, text="Folder:", font=font).pack(side=tk.LEFT, padx=5)
         self.path_entry = customtkinter.CTkEntry(path_frame, width=50, font=font_b)
+        self.path_entry.insert(0, os.path.join(os.getcwd(), 'sample_audio'))
         self.path_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
         customtkinter.CTkButton(path_frame, text="Browse", command=self.browse, font=font).pack(side=tk.LEFT, padx=5)
         # Language frame
@@ -41,12 +78,18 @@ class App:
         language_frame.pack(fill=tk.BOTH, padx=10, pady=10)
         customtkinter.CTkLabel(language_frame, text="Language:", font=font).pack(side=tk.LEFT, padx=5)
         self.language_entry = customtkinter.CTkEntry(language_frame, width=50, font=('Roboto', 12, 'italic'))
-        self.language_entry.insert(0, 'Select language or clear to detect automatically')
+        self.default_language_text = "Enter language (or ignore to auto-detect)"
+        self.language_entry.insert(0, self.default_language_text)
         self.language_entry.bind('<FocusIn>', on_entry_click)
         self.language_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
         # Model frame
-        models = ['base.en', 'base', 'small.en',
-                  'small', 'medium.en', 'medium', 'large']
+        models = ['tiny', 'tiny.en', 'base', 'base.en',
+                  'small', 'small.en', 'medium', 'medium.en',
+                  'large-v2', 'large-v3',
+                  '───────────────',
+                  'KB Swedish (tiny)', 'KB Swedish (base)',
+                  'KB Swedish (small)', 'KB Swedish (medium)',
+                  'KB Swedish (large)']
         model_frame = customtkinter.CTkFrame(master)
         model_frame.pack(fill=tk.BOTH, padx=10, pady=10)
         customtkinter.CTkLabel(model_frame, text="Model:", font=font).pack(side=tk.LEFT, padx=5)
@@ -54,13 +97,8 @@ class App:
         self.model_combobox = customtkinter.CTkComboBox(
             model_frame, width=50, state="readonly",
             values=models, font=font_b)
-        self.model_combobox.set(models[1])  # Set the default value
+        self.model_combobox.set('medium')  # Set the default value
         self.model_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
-        # Verbose frame
-        verbose_frame = customtkinter.CTkFrame(master)
-        verbose_frame.pack(fill=tk.BOTH, padx=10, pady=10)
-        self.verbose_var = tk.BooleanVar()
-        customtkinter.CTkCheckBox(verbose_frame, text="Output transcription to terminal", variable=self.verbose_var, font=font).pack(side=tk.LEFT, padx=5)
         # Progress Bar
         self.progress_bar = ttk.Progressbar(master, length=200, mode='indeterminate')
         # Button actions frame
@@ -69,10 +107,28 @@ class App:
         self.transcribe_button = customtkinter.CTkButton(button_frame, text="Transcribe", command=self.start_transcription, font=font)
         self.transcribe_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
         customtkinter.CTkButton(button_frame, text="Quit", command=master.quit, font=font).pack(side=tk.RIGHT, padx=5, pady=10, fill=tk.X, expand=True)
+
+        # ── Embedded console / log panel ──────────────────────────────────
+        log_label = customtkinter.CTkLabel(master, text="Console output", font=font, anchor='w')
+        log_label.pack(fill=tk.X, padx=12, pady=(8, 0))
+        self.log_box = customtkinter.CTkTextbox(master, height=220, font=('Consolas', 14),
+                                                wrap='word', state='disabled',
+                                                fg_color='#1e1e1e', text_color='#e0e0e0')
+        self.log_box.pack(fill=tk.BOTH, expand=True, padx=10, pady=(2, 10))
+
+        # Redirect stdout & stderr into the log panel (no backend console)
+        sys.stdout = _ConsoleRedirector(self.log_box)
+        sys.stderr = _ConsoleRedirector(self.log_box)
+
+        # Welcome message (shown after redirect so it appears in the panel)
+        print("Welcome to Local Transcribe with Whisper! \U0001f600")
+        print("Transcriptions will be saved automatically.")
+        print("─" * 46)
     # Helper functions
     # Browsing
     def browse(self):
-        folder_path = filedialog.askdirectory()
+        initial_dir = os.getcwd()
+        folder_path = filedialog.askdirectory(initialdir=initial_dir)
         self.path_entry.delete(0, tk.END)
         self.path_entry.insert(0, folder_path)
     # Start transcription
@@ -84,30 +140,36 @@ class App:
     # Threading
     def transcribe_thread(self):
         path = self.path_entry.get()
-        model = self.model_combobox.get()
-        language = self.language_entry.get() or None
-        verbose = self.verbose_var.get()
+        model_display = self.model_combobox.get()
+        # Ignore the visual separator
+        if model_display.startswith('─'):
+            messagebox.showinfo("Invalid selection", "Please select a model, not the separator line.")
+            self.transcribe_button.configure(state=tk.NORMAL)
+            return
+        model = HF_MODEL_MAP.get(model_display, model_display)
+        language = self.language_entry.get()
+        # Auto-set Swedish for KB models
+        is_kb_model = model_display.startswith('KB Swedish')
+        # Check if the language field has the default text or is empty
+        if is_kb_model:
+            language = 'sv'
+        elif language == self.default_language_text or not language.strip():
+            language = None  # This is the same as passing nothing
+        verbose = True  # always show transcription progress in the console panel
         # Show progress bar
         self.progress_bar.pack(fill=tk.X, padx=5, pady=5)
         self.progress_bar.start()
         # Setting path and files
         glob_file = get_path(path)
         info_path = 'I will transcribe all eligible audio/video files in the path: {}\n\nContinue?'.format(path)
         answer = messagebox.askyesno("Confirmation", info_path)
         if not answer:
             self.progress_bar.stop()
             self.progress_bar.pack_forget()
             self.transcribe_button.configure(state=tk.NORMAL)
             return
         #messagebox.showinfo("Message", "Starting transcription!")
         # Start transcription
         error_language = 'https://github.com/openai/whisper#available-models-and-languages'
         try:
             output_text = transcribe(path, glob_file, model, language, verbose)
         except UnboundLocalError:
             messagebox.showinfo("Files not found error!", 'Nothing found, choose another folder.')
             pass
         except ValueError:
             messagebox.showinfo("Language error!", 'See {} for supported languages'.format(error_language))
             messagebox.showinfo("Language error!", 'Invalid language name, you might have to clear the default text to continue!')
         # Hide progress bar
         self.progress_bar.stop()
         self.progress_bar.pack_forget()
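The model-name and language resolution added in `transcribe_thread` above (display name → HuggingFace ID, auto-Swedish for KB models, default text → auto-detect) can be sketched as a pure function. `HF_MODEL_MAP` and the placeholder text mirror the diff; `resolve` itself is an illustrative name, not app code:

```python
HF_MODEL_MAP = {
    'KB Swedish (tiny)': 'KBLab/kb-whisper-tiny',
    'KB Swedish (base)': 'KBLab/kb-whisper-base',
    'KB Swedish (small)': 'KBLab/kb-whisper-small',
    'KB Swedish (medium)': 'KBLab/kb-whisper-medium',
    'KB Swedish (large)': 'KBLab/kb-whisper-large',
}
DEFAULT_LANGUAGE_TEXT = "Enter language (or ignore to auto-detect)"

def resolve(model_display: str, language: str):
    """Map the combobox display name to a model ID and an effective language."""
    model = HF_MODEL_MAP.get(model_display, model_display)  # plain names pass through
    if model_display.startswith('KB Swedish'):
        language = 'sv'  # KB models are Swedish-optimised
    elif language == DEFAULT_LANGUAGE_TEXT or not language.strip():
        language = None  # let Whisper auto-detect
    return model, language

print(resolve('KB Swedish (base)', DEFAULT_LANGUAGE_TEXT))  # → ('KBLab/kb-whisper-base', 'sv')
print(resolve('medium', ''))                                # → ('medium', None)
```

Keeping this logic separate from the Tk handler makes the fallback behaviour easy to check: unknown display names fall through unchanged, so standard sizes like `medium` reach faster-whisper as-is.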
@@ -123,9 +185,10 @@ if __name__ == "__main__":
     # Setting custom themes
     root = customtkinter.CTk()
     root.title("Local Transcribe with Whisper")
-    # Geometry
-    width,height = 450,275
-    root.geometry('{}x{}'.format(width,height))
+    # Geometry — taller to accommodate the embedded console panel
+    width, height = 550, 560
+    root.geometry('{}x{}'.format(width, height))
+    root.minsize(450, 480)
     # Icon
     root.iconbitmap('images/icon.ico')
     # Run
+2
-2
@@ -1,7 +1,7 @@
 from cx_Freeze import setup, Executable
 
 build_exe_options = {
-    "packages": ['whisper','tkinter','customtkinter']
+    "packages": ['faster_whisper','tkinter','customtkinter']
 }
 executables = (
     [
@@ -13,7 +13,7 @@ executables = (
 )
 setup(
     name="Local Transcribe with Whisper",
-    version="1.2",
+    version="2.0",
     author="Kristofer Rolf Söderström",
     options={"build_exe": build_exe_options},
     executables=executables
+128
@@ -0,0 +1,128 @@
+"""
+Installer script for Local Transcribe with Whisper.
+Detects NVIDIA GPU and offers to install GPU acceleration support.
+
+Usage:
+    python install.py
+"""
+
+import os
+import subprocess
+import sys
+import shutil
+import site
+
+
+def detect_nvidia_gpu():
+    """Check if an NVIDIA GPU is present."""
+    candidates = [
+        shutil.which("nvidia-smi"),
+        r"C:\Windows\System32\nvidia-smi.exe",
+        r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe",
+    ]
+    for path in candidates:
+        if not path or not os.path.isfile(path):
+            continue
+        try:
+            r = subprocess.run(
+                [path, "--query-gpu=name", "--format=csv,noheader"],
+                capture_output=True, text=True, timeout=10,
+            )
+            if r.returncode == 0 and r.stdout.strip():
+                return True, r.stdout.strip().split("\n")[0]
+        except Exception:
+            continue
+    return False, None
+
+
+def pip_install(*packages):
+    cmd = [sys.executable, "-m", "pip", "install"] + list(packages)
+    print(f"\n> {' '.join(cmd)}\n")
+    subprocess.check_call(cmd)
+
+
+def get_site_packages():
+    for p in site.getsitepackages():
+        if p.endswith("site-packages"):
+            return p
+    return site.getsitepackages()[0]
+
+
+def create_nvidia_pth():
+    """Create a .pth startup hook that registers NVIDIA DLL directories."""
+    sp = get_site_packages()
+    pth_path = os.path.join(sp, "nvidia_cuda_path.pth")
+    # This one-liner runs at Python startup, before any user code.
+    pth_content = (
+        "import os, glob as g; "
+        "any(os.add_dll_directory(d) or os.environ.__setitem__('PATH', d + os.pathsep + os.environ.get('PATH','')) "
+        "for d in g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'bin')) "
+        "+ g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'lib')) "
+        "if os.path.isdir(d)) if os.name == 'nt' else None\n"
+    )
+    with open(pth_path, "w") as f:
+        f.write(pth_content)
+    print(f" Created CUDA startup hook: {pth_path}")
+
+
+def verify_cuda():
+    """Verify CUDA works in a fresh subprocess."""
+    try:
+        r = subprocess.run(
+            [sys.executable, "-c",
+             "import ctranslate2; "
+             "print('float16' in ctranslate2.get_supported_compute_types('cuda'))"],
+            capture_output=True, text=True, timeout=30,
+        )
+        return r.stdout.strip() == "True"
+    except Exception:
+        return False
+
+
+def main():
+    print("=" * 55)
+    print(" Local Transcribe with Whisper — Installer")
+    print("=" * 55)
+
+    # Step 1: Base packages
+    print("\n[1/2] Installing base requirements...")
+    pip_install("-r", "requirements.txt")
+    print("\n Base requirements installed!")
+
+    # Step 2: GPU
+    print("\n[2/2] Checking for NVIDIA GPU...")
+    has_gpu, gpu_name = detect_nvidia_gpu()
+
+    if has_gpu:
+        print(f"\n NVIDIA GPU detected: {gpu_name}")
+        print(" GPU acceleration can make transcription 2-5x faster.")
+        print(" This will install ~300 MB of additional CUDA libraries.\n")
+
+        while True:
+            answer = input(" Install GPU support? [Y/n]: ").strip().lower()
+            if answer in ("", "y", "yes"):
+                print("\n Installing CUDA libraries...")
+                pip_install("nvidia-cublas-cu12", "nvidia-cudnn-cu12")
+                create_nvidia_pth()
+                print("\n Verifying CUDA...")
+                if verify_cuda():
+                    print(" GPU support verified and working!")
+                else:
+                    print(" WARNING: CUDA installed but not detected.")
+                    print(" Update your NVIDIA drivers and try again.")
+                break
+            elif answer in ("n", "no"):
+                print("\n Skipping GPU. Re-run install.py to add it later.")
+                break
+            else:
+                print(" Please enter Y or N.")
+    else:
+        print("\n No NVIDIA GPU detected — using CPU mode.")
+
+    print("\n" + "=" * 55)
+    print(" Done! Run the app with: python app.py")
+    print("=" * 55)
+
+
+if __name__ == "__main__":
+    main()
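The `.pth` one-liner written by `create_nvidia_pth` above is dense by necessity (a `.pth` file executes a single line at interpreter startup). Unrolled into ordinary Python, it does roughly the following; this is a readable sketch under the same assumptions about the pip `nvidia/*` package layout, with `register_nvidia_dirs` as an illustrative name:

```python
import glob
import os

def register_nvidia_dirs(site_packages: str) -> list:
    """Prepend nvidia/*/bin and nvidia/*/lib dirs to PATH and the
    Windows DLL search path. No-op on non-Windows platforms."""
    registered = []
    if os.name != 'nt':
        return registered  # the hook only matters on Windows
    dirs = (glob.glob(os.path.join(site_packages, 'nvidia', '*', 'bin'))
            + glob.glob(os.path.join(site_packages, 'nvidia', '*', 'lib')))
    for d in dirs:
        if os.path.isdir(d):
            os.add_dll_directory(d)  # Python 3.8+ DLL search path
            os.environ['PATH'] = d + os.pathsep + os.environ.get('PATH', '')
            registered.append(d)
    return registered
```

Shipping this as a `.pth` hook rather than calling it from the app means the CUDA DLL directories are registered before `ctranslate2` is imported by any entry point, including `python app.py` run directly.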
@@ -0,0 +1,2 @@
+faster-whisper
+customtkinter
+29
@@ -0,0 +1,29 @@
+#!/bin/bash
+# ============================================================
+# Local Transcribe with Whisper — macOS / Linux launcher
+# ============================================================
+# Double-click this file or run: ./run_Mac.sh
+# On first run it creates a venv and installs dependencies.
+# ============================================================
+
+set -e
+
+cd "$(dirname "$0")"
+
+# Create .venv if it doesn't exist
+if [ ! -f ".venv/bin/python" ]; then
+    echo "Creating virtual environment..."
+    python3 -m venv .venv
+fi
+
+PYTHON=".venv/bin/python"
+
+# Install dependencies on first run
+if ! "$PYTHON" -c "import faster_whisper" 2>/dev/null; then
+    echo "First run detected — running installer..."
+    "$PYTHON" install.py
+    echo
+fi
+
+echo "Starting Local Transcribe..."
+"$PYTHON" app.py
+22
-4
@@ -1,5 +1,23 @@
 @echo off
-echo Starting...
-call conda activate base
-REM OPTION 2 : (KEEP TEXT WITHIN QUOTES AND CHANGE USERNAME) "C:/Users/user/Anaconda3/condabin/activate.bat"
-call python app.py
+REM Create .venv on first run if it doesn't exist
+if not exist ".venv\Scripts\python.exe" (
+    echo Creating virtual environment...
+    python -m venv .venv
+    if errorlevel 1 (
+        echo ERROR: Failed to create virtual environment. Is Python installed and on PATH?
+        pause
+        exit /b 1
+    )
+)
+
+set PYTHON=.venv\Scripts\python.exe
+
+REM Check if dependencies are installed
+%PYTHON% -c "import faster_whisper" 2>nul
+if errorlevel 1 (
+    echo First run detected - running installer...
+    %PYTHON% install.py
+    echo.
+)
+echo Starting Local Transcribe...
+%PYTHON% app.py
@@ -1,4 +1,2 @@
 Armstrong_Small_Step
-[0:00:00 --> 0:00:07]: And they're still brought to land now.
-[0:00:07 --> 0:00:18]: It's one small step for man.
-[0:00:18 --> 0:00:23]: One by a fleet for man time.
+[0:00:00 --> 0:00:07]: That's one small step for man, one giant leap for mankind.
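The `[H:MM:SS --> H:MM:SS]:` stamps in the sample transcripts above (including the fractional form `0:00:12.540000` in the Swedish sample) match Python's default `timedelta` string form. A small illustrative helper, not the app's actual formatting code:

```python
import datetime

def stamp(seconds: float) -> str:
    # str(timedelta) yields H:MM:SS, with microseconds appended only when nonzero
    return str(datetime.timedelta(seconds=seconds))

line = f"[{stamp(0)} --> {stamp(7)}]: That's one small step for man, one giant leap for mankind."
print(line)  # → [0:00:00 --> 0:00:07]: That's one small step for man, one giant leap for mankind.
```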
@@ -1,4 +1,2 @@
Axel_Pettersson_röstinspelning
[0:00:00 --> 0:00:06]: Hej, jag heter Raxel Patterson, jag får att se över UR 1976.
[0:00:06 --> 0:00:12.540000]: Jag har varit Wikipedia-périonsen 2018 och jag har översat röst-intro-
[0:00:12.540000 --> 0:00:15.540000]:-projektet till svenska.
[0:00:00 --> 0:00:15]: Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
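The bracketed timestamps in these sample transcripts are produced with `str(datetime.timedelta(seconds=...))`, which is why whole-second boundaries print compactly while fractional ones carry trailing microseconds (`0:00:12.540000`). A minimal sketch, with `fmt` as a hypothetical helper name:

```python
import datetime

def fmt(seconds):
    # Render a segment boundary the way the transcriber does:
    # timedelta's string form is H:MM:SS, plus .ffffff when the
    # value has a fractional-second component.
    return str(datetime.timedelta(seconds=seconds))

print(fmt(7))      # 0:00:07
print(fmt(12.54))  # 0:00:12.540000
```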
+109
-50
@@ -1,11 +1,56 @@
import os
import sys
import datetime
import site
from glob import glob
import whisper
from torch import cuda, Generator
import colorama
from colorama import Back, Fore
colorama.init(autoreset=True)

# ---------------------------------------------------------------------------
# CUDA setup — must happen before importing faster_whisper / ctranslate2
# ---------------------------------------------------------------------------
def _setup_cuda_dlls():
    """Add NVIDIA pip-package DLL dirs to the DLL search path (Windows only).

    pip-installed nvidia-cublas-cu12 / nvidia-cudnn-cu12 place their .dll
    files inside the site-packages tree. Python 3.8+ on Windows does NOT
    search PATH for DLLs loaded via ctypes/LoadLibrary, so we must
    explicitly register every nvidia/*/bin and nvidia/*/lib directory using
    os.add_dll_directory *and* prepend them to PATH (some native extensions
    still rely on PATH).
    """
    if sys.platform != "win32":
        return
    try:
        for sp in site.getsitepackages():
            nvidia_root = os.path.join(sp, "nvidia")
            if not os.path.isdir(nvidia_root):
                continue
            for pkg in os.listdir(nvidia_root):
                for sub in ("bin", "lib"):
                    d = os.path.join(nvidia_root, pkg, sub)
                    if os.path.isdir(d):
                        os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
                        try:
                            os.add_dll_directory(d)
                        except (OSError, AttributeError):
                            pass
    except Exception:
        pass

_setup_cuda_dlls()

from faster_whisper import WhisperModel


def _detect_device():
    """Return (device, compute_type) for the best available backend."""
    try:
        import ctranslate2
        cuda_types = ctranslate2.get_supported_compute_types("cuda")
        if "float16" in cuda_types:
            return "cuda", "float16"
    except Exception:
        pass
    return "cpu", "int8"


# Get the path
@@ -16,12 +61,12 @@ def get_path(path):
# Main function
def transcribe(path, glob_file, model=None, language=None, verbose=False):
    """
    Transcribes audio files in a specified folder using OpenAI's Whisper model.
    Transcribes audio files in a specified folder using faster-whisper (CTranslate2).

    Args:
        path (str): Path to the folder containing the audio files.
        glob_file (list): List of audio file paths to transcribe.
        model (str, optional): Name of the Whisper model to use for transcription.
        model (str, optional): Name of the Whisper model size to use for transcription.
            Defaults to None, which uses the default model.
        language (str, optional): Language code for transcription. Defaults to None,
            which enables automatic language detection.
@@ -38,53 +83,67 @@ def transcribe(path, glob_file, model=None, language=None, verbose=False):
        - The function downloads the specified model if not available locally.
        - The transcribed text files will be saved in a "transcriptions" folder
          within the specified path.
        - Uses CTranslate2 for up to 4x faster inference compared to openai-whisper.
        - FFmpeg is bundled via the PyAV dependency — no separate installation needed.

    """
    # Check for GPU acceleration
    if cuda.is_available():
        Generator('cuda').manual_seed(42)
    else:
        Generator().manual_seed(42)
    # Load model
    model = whisper.load_model(model)
    # Start main loop
    files_transcripted = []
    """
    SEP = "─" * 46

    # ── Step 1: Detect hardware ──────────────────────────────────────
    device, compute_type = _detect_device()
    print(f"⚙ Device: {device} | Compute: {compute_type}")

    # ── Step 2: Load model ───────────────────────────────────────────
    print(f"⏳ Loading model '{model}' — downloading if needed...")
    whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
    print("✅ Model ready!")
    print(SEP)

    # ── Step 3: Transcribe files ─────────────────────────────────────
    total_files = len(glob_file)
    print(f"📂 Found {total_files} item(s) in folder")
    print(SEP)

    files_transcripted = []
    file_num = 0
    for file in glob_file:
        title = os.path.basename(file).split('.')[0]
        print(Back.CYAN + '\nTrying to transcribe file named: {}\U0001f550'.format(title))
        file_num += 1
        print(f"\n{'─' * 46}")
        print(f"📄 File {file_num}/{total_files}: {title}")
        try:
            result = model.transcribe(
                file,
                language=language,
                verbose=verbose
            )
            files_transcripted.append(result)
            # Make folder if missing
            try:
                os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
            except FileExistsError:
                pass
            # Create segments for text files
            start = []
            end = []
            text = []
            for segment in result['segments']:
                start.append(str(datetime.timedelta(seconds=segment['start'])))
                end.append(str(datetime.timedelta(seconds=segment['end'])))
                text.append(segment['text'])
            # Save files to transcriptions folder
            with open("{}/transcriptions/{}.txt".format(path, title), 'w', encoding='utf-8') as file:
                file.write(title)
                for i in range(len(result['segments'])):
                    file.write('\n[{} --> {}]:{}'.format(start[i], end[i], text[i]))
        # Skip invalid files
        except RuntimeError:
            print(Fore.RED + 'Not a valid file, skipping.')
            pass
    # Check if any files were processed.
            segments, info = whisper_model.transcribe(
                file,
                language=language,
                beam_size=5
            )
            # Make folder if missing
            os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
            # Stream segments as they are decoded
            segment_list = []
            with open("{}/transcriptions/{}.txt".format(path, title), 'w', encoding='utf-8') as f:
                f.write(title)
                for seg in segments:
                    start_ts = str(datetime.timedelta(seconds=seg.start))
                    end_ts = str(datetime.timedelta(seconds=seg.end))
                    f.write('\n[{} --> {}]:{}'.format(start_ts, end_ts, seg.text))
                    f.flush()
                    if verbose:
                        print(" [%.2fs → %.2fs] %s" % (seg.start, seg.end, seg.text))
                    else:
                        print(" Transcribed up to %.0fs..." % seg.end, end='\r')
                    segment_list.append(seg)
            print(f"✅ Done — saved to transcriptions/{title}.txt")
            files_transcripted.append(segment_list)
        except Exception:
            print('⚠ Not a valid audio/video file, skipping.')

    # ── Summary ──────────────────────────────────────────────────────
    print(f"\n{SEP}")
    if len(files_transcripted) > 0:
        output_text = 'Finished transcription, {} files can be found in {}/transcriptions'.format(len(files_transcripted), path)
        output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n   Saved in: {path}/transcriptions"
    else:
        output_text = 'No files elligible for transcription, try adding audio or video files to this folder or choose another folder!'
        # Return output text
        output_text = '⚠ No files eligible for transcription — try another folder.'
    print(output_text)
    print(SEP)
    return output_text
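The `_detect_device` helper added in this commit degrades gracefully: if ctranslate2 is missing, or its CUDA backend does not support float16, the code falls back to CPU with int8 quantization. The same pattern can be sketched standalone, with `pick_backend` and the probe functions being hypothetical names for illustration:

```python
def pick_backend(probe):
    """Return (device, compute_type), falling back to CPU int8 when
    the GPU probe raises or reports no float16 support."""
    try:
        if "float16" in probe():
            return "cuda", "float16"
    except Exception:
        pass
    return "cpu", "int8"

# A probe that fails (no GPU stack installed) selects the CPU path:
def no_gpu():
    raise RuntimeError("CUDA driver not found")

# A probe that reports float16 support selects the GPU path:
def gpu_fp16():
    return ["float16", "int8_float16"]
```

Catching a broad `Exception` here is deliberate: an absent package, a missing driver, and an unsupported compute type should all land on the same safe CPU default.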