Merge pull request #11 from soderstromkr/copilot/update-whisper-device-parameter

Pass explicit device parameter to whisper.load_model() for MPS acceleration
Add .gitignore and remove __pycache__ files
2026-01-22 14:03:13 +01:00 · 2026-01-22 13:00:38 +00:00 · 2026-01-22 13:00:21 +00:00 · 2026-01-22 12:57:09 +00:00 · 2026-01-22 13:53:23 +01:00 · 2026-01-22 13:44:11 +01:00
7 changed files with 86 additions and 27 deletions
@@ -0,0 +1,25 @@
 # Python cache
 __pycache__/
 *.py[cod]
 *$py.class
 # Virtual environments
 venv/
 env/
 ENV/
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 # OS
 .DS_Store
 Thumbs.db
 # Build artifacts
 dist/
 build/
 *.egg-info/
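The `*.py[cod]` entry above is a glob character class. It matches compiled files ending in `.pyc`, `.pyo`, or `.pyd`. Python's `fnmatch` module uses similar bracket syntax, so the sketch below illustrates filename matching only. Git's own rules add directory-only patterns with a trailing `/`, anchoring with a leading `/`, and negation with `!`, and `fnmatch` models none of these. `roughly_ignored` is a hypothetical helper and is not part of the repository.

```python
from fnmatch import fnmatchcase

# Filename-only patterns copied from the .gitignore above. Directory patterns
# such as __pycache__/ are left out because fnmatch does not understand
# gitignore's trailing-slash semantics.
PATTERNS = ["*.py[cod]", "*$py.class", "*.swp", "*.swo", "*~", ".DS_Store", "Thumbs.db"]


def roughly_ignored(filename: str) -> bool:
    """Return True if a bare filename matches any of the patterns (approximation)."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


for name in ["app.cpython-310.pyc", "transcription.py", "notes.txt~", "app.py"]:
    print(name, roughly_ignored(name))
```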
@@ -27,29 +27,52 @@ Or by cloning the repository with:
 git clone https://github.com/soderstromkr/transcribe.git
 ```
 ### Python Version **(any platform including Mac users)**
-This is recommended if you don't have Windows. Have Windows and use python, or want to use GPU acceleration (Pytorch and Cuda) for faster transcriptions. I would generally recommend this method anyway, but I can understand not everyone wants to go through the installation process for Python, Anaconda and the other required packages. 
+1. This script was made and tested in an Anaconda environment with Python 3.10. I recommend Miniconda for a smaller installation, especially if you're not familiar with Python.
-1. This script was made and tested in an Anaconda environment with Python 3.10. I recommend this method if you're not familiar with Python.
+See [here](https://docs.anaconda.com/free/miniconda/miniconda-install/) for instructions. You will **need administrator rights**. 
-See [here](https://docs.anaconda.com/anaconda/install/index.html) for instructions. You might need administrator rights. 
+2. Whisper also requires some additional libraries. The [setup](https://github.com/openai/whisper#setup) page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files."
-2. Whisper requires some additional libraries. The [setup](https://github.com/openai/whisper#setup) page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files."
+Users might not need to install Transformers explicitly. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables.
-Users might not need to specifically install Transfomers. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables. From the anaconda prompt, type or copy the following:
+
 From the Anaconda Prompt (which should now be installed in your system, find it with the search function), type or copy the following:
 ```
 conda install -c conda-forge ffmpeg-python
 ```
-3. The main functionality comes from openai-whisper. See their [page](https://github.com/openai/whisper) for details. As of 2023-03-22 you can install via:
+You can also choose not to use Anaconda (or Miniconda) and use plain Python instead. In that case, you need to [download and install FFmpeg](https://ffmpeg.org/download.html) (and potentially add it to your PATH). See here for [WikiHow instructions](https://www.wikihow.com/Install-FFmpeg-on-Windows).
 3. The main functionality comes from openai-whisper. See their [page](https://github.com/openai/whisper) for details. It also uses some additional packages (colorama, and customtkinter), install them with the following command:
 ```
-pip install -U openai-whisper
+pip install -r requirements.txt
 ```
-4. To run the app built on TKinter and TTKthemes. If using these options, make sure they are installed in your Python build. You can install them and colorama via pip.
+4. Run the app: 
    1. For **Windows**: In the same folder as the *app.py* file, run the app from Anaconda prompt by running
 ```python app.py```
 or with the batch file called run_Windows.bat (for Windows users), which assumes you have conda installed and in the base environment (this is for simplicity, but users are usually advised to create an environment, see [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands) for more info); just make sure you have the correct environment (right click on the file and press edit to make any changes). 
    2. For **Mac**: Haven't figured out a better way to do this, see [the instructions here](Mac_instructions.md)
    **Note** If you want to download a model first, and then go offline for transcription, I recommend running the model with the default sample folder, which will download the model locally. 
 ## GPU Support
 This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription times. To use GPU acceleration, you need to have the correct version of PyTorch installed with CUDA support.
 ### Installing PyTorch with CUDA Support
 If you have an NVIDIA GPU and want to take advantage of GPU acceleration, you can install a CUDA-enabled version of PyTorch using:
 ```
-pip install colorama
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
 ```
-and
+
 **Note:** The command above installs PyTorch with CUDA 12.1 support. Make sure your NVIDIA GPU drivers are compatible with CUDA 12.1. You can check your CUDA version by running `nvidia-smi` in your terminal.
 If you need a different CUDA version, visit the [PyTorch installation page](https://pytorch.org/get-started/locally/) to generate the appropriate installation command for your system.
 ### Verifying GPU Support
 After installation, you can verify that PyTorch can detect your GPU by running:
 ```python
 import torch
 print(torch.cuda.is_available())  # Should print True if GPU is available
 print(torch.cuda.get_device_name(0))  # Should print your GPU name
 ```
-pip install customtkinter 
+
-```
+If no supported GPU is detected, the program will automatically fall back to CPU processing, which is slower.
-5. Run the app: 
+
   1. For **Windows**: In the same folder as the *app.py* file, run the app from terminal by running ```python app.py``` or with the batch file called run_Windows.bat (for Windows users), which assumes you have conda installed and in the base environment (this is for simplicity, but users are usually advised to create an environment, see [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands) for more info); just make sure you have the correct environment (right click on the file and press edit to make any changes). If you want to download a model first, and then go offline for transcription, I recommend running the model with the default sample folder, which will download the model locally. 
    2. For **Mac**: Haven't figured out a better way to do this, see [the instructions here](Mac_instructions.md)
 ## Usage
 1. When launched, the app will also open a terminal that shows some additional information.
 2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
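Because the app works on whole folders, it transcribes whatever files the chosen folder contains. The repository's own `get_path` helper is not shown in this diff. The function below is therefore a hypothetical stand-in: `list_media_files` and its optional `extensions` filter are assumptions, not the app's API. It lists only the files directly inside the folder. This also prevents an existing `transcriptions` subfolder from being treated as input.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def list_media_files(folder: str, extensions: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper: return every regular file directly inside `folder`.

    If `extensions` is given (e.g. {".mp3", ".wav"}), only files with those
    suffixes are kept, compared case-insensitively. Subfolders are not searched.
    """
    wanted = {ext.lower() for ext in extensions} if extensions else None
    files = []
    for entry in sorted(Path(folder).iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories such as an earlier output folder
        if wanted is not None and entry.suffix.lower() not in wanted:
            continue
        files.append(str(entry))
    return files
```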
@@ -42,7 +42,8 @@ class App:
        language_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(language_frame, text="Language:", font=font).pack(side=tk.LEFT, padx=5)
        self.language_entry = customtkinter.CTkEntry(language_frame, width=50, font=('Roboto', 12, 'italic'))
-        self.language_entry.insert(0, 'Select language or clear to detect automatically')
+        self.default_language_text = "Enter language (or ignore to auto-detect)"
        self.language_entry.insert(0, self.default_language_text)
        self.language_entry.bind('<FocusIn>', on_entry_click)
        self.language_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
        # Model frame
@@ -87,14 +88,17 @@ class App:
    def transcribe_thread(self):
        path = self.path_entry.get()
        model = self.model_combobox.get()
-        language = self.language_entry.get() or None
+        language = self.language_entry.get()
        # Check if the language field has the default text or is empty
        if language == self.default_language_text or not language.strip():
            language = None  # This is the same as passing nothing
        verbose = self.verbose_var.get()
        # Show progress bar
        self.progress_bar.pack(fill=tk.X, padx=5, pady=5)
        self.progress_bar.start()
        # Setting path and files
        glob_file = get_path(path)
-        messagebox.showinfo("Message", "Starting transcription!")
+        #messagebox.showinfo("Message", "Starting transcription!")
        # Start transcription
        try:
            output_text = transcribe(path, glob_file, model, language, verbose)
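The placeholder check in `transcribe_thread` can be isolated as a small pure function, so you can test it without a Tk window. This also shows why the old `get() or None` was not enough. Once the placeholder text is inserted into the entry, `get()` returns that non-empty string, so `or None` never applies and the placeholder would be passed on as a language. This is a sketch, not the app's code. `normalize_language` is a hypothetical name, and unlike the diff it also trims surrounding whitespace from a real value.

```python
from typing import Optional

# Placeholder copied from the diff above; in the app it is stored on the App instance.
DEFAULT_LANGUAGE_TEXT = "Enter language (or ignore to auto-detect)"


def normalize_language(raw: str, placeholder: str = DEFAULT_LANGUAGE_TEXT) -> Optional[str]:
    """Map the entry's contents to a Whisper language value.

    Returns None (let Whisper auto-detect) when the field still shows the
    placeholder or contains only whitespace; otherwise returns the trimmed text.
    """
    if raw == placeholder or not raw.strip():
        return None
    return raw.strip()
```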
@@ -0,0 +1,3 @@
 openai-whisper
 customtkinter
 colorama
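To confirm that `pip install -r requirements.txt` worked, you can check that each package is importable. The sketch below uses `importlib.util.find_spec`, which locates a top-level module without importing it. Only one mapping is non-obvious: the pip name `openai-whisper` corresponds to the import name `whisper`. The script is hypothetical and not part of the repository.

```python
import importlib.util
from typing import Dict, List

# pip name -> import name for the packages listed in requirements.txt.
REQUIRED_MODULES: Dict[str, str] = {
    "openai-whisper": "whisper",
    "customtkinter": "customtkinter",
    "colorama": "colorama",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try `pip install -r requirements.txt`:", ", ".join(missing))
    else:
        print("All packages from requirements.txt are importable.")
```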
@@ -1,4 +1,2 @@
 Armstrong_Small_Step
-[0:00:00 --> 0:00:07]: And they're still brought to land now.
+[0:00:00 --> 0:00:29.360000]: alumnfeldaguyrjarna om det nya skirprå kızım om det där föddarna hatt splittar, do nackrott,
 [0:00:07 --> 0:00:18]: It's one small step for man.
 [0:00:18 --> 0:00:23]: One by a fleet for man time.
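The timestamps in this sample file (`0:00:07` for whole seconds, `0:00:29.360000` otherwise) match the string form of Python's `datetime.timedelta`, and the transcription module imports `datetime`. The sketch below reproduces that format under that assumption. It also assumes segments shaped like Whisper's `result["segments"]` entries, that is, dicts with `start`, `end`, and `text`. `format_segment` and `format_segments` are hypothetical names, not the repository's functions.

```python
import datetime
from typing import Iterable, List, Mapping


def format_segment(start: float, end: float, text: str) -> str:
    """Render one segment as "[start --> end]: text" using timedelta strings."""
    start_td = datetime.timedelta(seconds=start)
    end_td = datetime.timedelta(seconds=end)
    # str(timedelta) omits the fractional part when it is zero, so whole
    # seconds print as 0:00:07 and fractional ones as 0:00:29.360000.
    return f"[{start_td} --> {end_td}]: {text.strip()}"


def format_segments(segments: Iterable[Mapping]) -> List[str]:
    """Format Whisper-style segment dicts (keys: start, end, text)."""
    return [format_segment(s["start"], s["end"], s["text"]) for s in segments]


print(format_segment(0.0, 29.36, " Example text"))
```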
@@ -2,7 +2,7 @@ import os
 import datetime
 from glob import glob
 import whisper
-from torch import cuda, Generator
+from torch import backends, cuda, Generator
 import colorama
 from colorama import Back,Fore
 colorama.init(autoreset=True)
@@ -39,14 +39,20 @@ def transcribe(path, glob_file, model=None, language=None, verbose=False):
        - The transcribed text files will be saved in a "transcriptions" folder
          within the specified path.
-    """    
+    """
-    # Check for GPU acceleration
+    # Check for GPU acceleration and set device
-    if cuda.is_available():
+    if backends.mps.is_available():
        device = 'mps'
        Generator('mps').manual_seed(42)
    elif cuda.is_available():
        device = 'cuda'
        Generator('cuda').manual_seed(42)
    else:
        device = 'cpu'
        Generator().manual_seed(42)
-    # Load model
+
-    model = whisper.load_model(model)
+    # Load model on the correct device
    model = whisper.load_model(model, device=device)
    # Start main loop
    files_transcripted=[]   
    for file in glob_file:
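The device choice in this hunk can be pulled out into a pure function, so you can check the preference order (MPS, then CUDA, then CPU) without a GPU. This is a minimal sketch, not the repository's code. `choose_device` and `load_whisper_model` are hypothetical names, and the sketch assumes a PyTorch version that provides `torch.backends.mps` (1.12 or later). Like the diff, it passes `device=` to `whisper.load_model`. One deliberate difference: it seeds with `torch.manual_seed(42)`. A `torch.Generator` that is created and immediately discarded, as in the hunk, does not change the default generators that other operations use.

```python
def choose_device(mps_available: bool, cuda_available: bool) -> str:
    """Same preference order as the diff: Apple MPS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def load_whisper_model(name: str = "base"):
    """Hypothetical wrapper: detect the device, then load the model onto it.

    torch and whisper are imported lazily so that choose_device() can be used
    and tested on machines where they are not installed.
    """
    import torch
    import whisper

    device = choose_device(torch.backends.mps.is_available(), torch.cuda.is_available())
    # Seed torch's default generators for reproducibility.
    torch.manual_seed(42)
    model = whisper.load_model(name, device=device)
    return model, device
```

On a machine with neither MPS nor CUDA, `load_whisper_model("base")` would load the model on the CPU and return `"cpu"` as the device.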
Author	SHA1	Message	Date
Kristofer Rolf Söderström	7d3fe1ba26	Merge pull request #11 from soderstromkr/copilot/update-whisper-device-parameter Pass explicit device parameter to whisper.load_model() for MPS acceleration	2026-01-22 14:03:13 +01:00
copilot-swe-agent[bot]	da42a6e4cc	Add .gitignore and remove __pycache__ files Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>	2026-01-22 13:00:38 +00:00
copilot-swe-agent[bot]	0dab0d9bea	Add explicit device parameter to whisper.load_model() Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>	2026-01-22 13:00:21 +00:00
copilot-swe-agent[bot]	953c71ab28	Initial plan	2026-01-22 12:57:09 +00:00
Kristofer Rolf Söderström	5522bdd575	Merge pull request #6 Merged pull request #6	2026-01-22 13:53:23 +01:00
Kristofer Rolf Söderström	861c470330	Merge pull request #10 from soderstromkr/copilot/add-readme-gpu-support Add GPU support documentation to README	2026-01-22 13:44:11 +01:00
copilot-swe-agent[bot]	6de6d4b2ff	Add GPU support section to README with CUDA PyTorch installation instructions Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>	2026-01-22 12:42:09 +00:00
copilot-swe-agent[bot]	01552cc7cb	Initial plan	2026-01-22 12:40:19 +00:00
Yaroslav P	049a168c81	amd graphic card support	2025-03-05 16:23:10 +02:00
Kristofer Rolf Söderström	56a925463f	Update README.md	2024-05-17 08:51:16 +02:00
Kristofer Rolf Söderström	fe60b04020	Update README.md	2024-05-17 08:49:28 +02:00
Kristofer Rolf Söderström	ff06a257f2	Update README.md	2024-05-17 08:47:57 +02:00
Kristofer Rolf Söderström	5e31129ea2	Create requirements.txt	2024-05-17 08:44:39 +02:00
Kristofer Rolf Söderström	3f0bca02b7	Update README.md	2024-05-17 08:44:09 +02:00
Kristofer Rolf Söderström	488e78a5ae	Update README.md	2024-05-17 08:42:42 +02:00
Kristofer Rolf Söderström	829a054300	Update README.md	2024-05-17 08:40:42 +02:00
Kristofer Rolf Söderström	462aae12ca	Update README.md	2024-05-17 08:09:30 +02:00
Kristofer Rolf Söderström	fec9190ba1	Update README.md	2024-05-17 08:08:51 +02:00
Kristofer Rolf Söderström	0dde25204d	Update README.md removed other installation options from readme	2024-05-17 08:07:00 +02:00
Kristofer Söderström	b611aa6b8c	removed messagebox	2023-11-06 10:13:04 +01:00
Kristofer Söderström	7d50d5f4cf	QOL improvements	2023-11-06 09:57:44 +01:00