feat: update README to reflect Apple Silicon GPU support and new features in version 3.0

feat: add advanced transcription options for VAD, word-level timestamps, and translation
feat: implement multiprocessing for transcription with immediate cancellation
2026-04-11 14:16:07 +02:00 · 2026-04-11 14:06:04 +02:00 · 2026-04-05 22:11:13 +02:00 · 2026-04-04 00:32:36 +02:00 · 2026-03-20 20:19:46 +01:00 · 2026-03-03 08:35:03 +01:00
12 changed files with 1121 additions and 181 deletions
@@ -0,0 +1,26 @@
+# Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Virtual environments
+venv/
+env/
+ENV/
+.venv/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Build artifacts
+dist/
+build/
+*.egg-info/
@@ -1,9 +1,31 @@
-### How to run on Mac
-Unfortunately, I have not found a permament solution for this, not being a Mac user has limited the ways I can test this.  
-#### Recommended steps
-1. Open a terminal and navigate to the root folder (the downloaded the folder). 
-    1. You can also right-click (or equivalent) on the root folder to open a Terminal within the folder.
-2. Run the following command:
+### How to run on Mac / Linux
+
+#### Quick start
+1. Open Terminal and navigate to the project folder (or right-click the folder and select "Open in Terminal").
+2. Make the script executable (only needed once):
 ```
-python app.py
+chmod +x run_Mac.sh
 ```
+3. Run it:
+```
+./run_Mac.sh
+```
+
+This will automatically:
+- Create a virtual environment (`.venv`)
+- Install all dependencies (no admin rights needed)
+- Launch the app
+
+#### Manual steps (alternative)
+If you prefer to do it manually:
+```
+python3 -m venv .venv
+.venv/bin/python install.py
+.venv/bin/python app.py
+```
+
+#### Notes
+- **Python 3.10+** is required. macOS users can install it from [python.org](https://www.python.org/downloads/) or via `brew install python`.
+- **No FFmpeg install needed** — audio decoding is bundled.
+- **GPU acceleration** is not available on macOS (Apple Silicon MPS is not supported by CTranslate2). CPU with int8 quantization is still fast.
+- On Apple Silicon (M1/M2/M3/M4), the `small` or `base` models run well. `medium` works but is slower.
@@ -1,22 +1,42 @@
 ## Local Transcribe with Whisper
-Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.

-## New in version 1.2!
-1. Simpler usage:
-    1. File type: You no longer need to specify file type. The program will only transcribe elligible files.
-    2. Language: Added option to specify language, which might help in some cases. Clear the default text to run automatic language recognition.
-    3. Model selection: Now a dropdown option that includes most models for typical use. 
-2. New and improved GUI.  
-![python GUI.py](images/gui-windows.png)
+> **🍎 Apple Silicon GPU/NPU acceleration:** This version now supports native Apple GPU/NPU acceleration via [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). On Apple Silicon Macs, transcription runs on the Apple GPU and Neural Engine — no CPU fallback needed.
+
+Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2) on Windows/Linux and [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
+
+## New in version 3.0!
+1. **Apple Silicon GPU/NPU support** — native MLX backend for Apple Silicon Macs, using Apple GPU + Neural Engine.
+2. **SRT subtitle export** — valid SubRip files alongside the existing TXT output, ready for HandBrake or any video player.
+3. **VAD filter** — removes silence, reduces hallucination, improves accuracy.
+4. **Word-level timestamps** — per-word SRT timing for precise subtitle burning.
+5. **Translation mode** — transcribe any language and translate to English in one step.
+6. **Stop button** — immediately cancel any transcription, including model downloads.
+7. **Language dropdown** — 99 languages with proper ISO codes, no more guessing formats.
+8. **Model descriptions** — speed, size, quality stars, and use case shown for every model.
+
+## New in version 2.0!
+1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage, simpler installation.
+2. **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab)
+3. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
+4. **No admin rights required** — a plain `pip install` covers everything.
+5. **No PyTorch dependency** — dramatically smaller install footprint.
+6. **Integrated console** - all info in the same application.
+7. **`tiny` model added** — smallest and fastest option.
+

 ## Features
 * Select the folder containing the audio or video files you want to transcribe. Tested with m4a video.
-* Choose the language of the files you are transcribing. You can either select a specific language or let the application automatically detect the language.
-* Select the Whisper model to use for the transcription. Available models include "base.en", "base", "small.en", "small", "medium.en", "medium", and "large". Models with .en ending are better if you're transcribing English, especially the base and small models.
-* Enable the verbose mode to receive detailed information during the transcription process.
+* Choose the language of the files you are transcribing from a dropdown of 99 supported languages, or let the application automatically detect the language.
+* Select the Whisper model to use for the transcription. Available models include "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v2", and "large-v3". Models with .en ending are better if you're transcribing English, especially the base and small models.
+* **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
+* **VAD filter** — removes silence from audio before transcription, reducing hallucination and improving accuracy.
+* **Word-level timestamps** — generates per-word timing in the SRT output for precise subtitle synchronization.
+* **Translation mode** — transcribes audio in any language and translates the result to English.
+* **SRT export** — valid SubRip subtitle files saved alongside TXT, ready for HandBrake or any video player.
 * Monitor the progress of the transcription with the progress bar and terminal.
 * Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
 * View the transcribed text in a message box once the transcription is completed.
+* **Stop button** — immediately cancel transcription, including model downloads.

 ## Installation
 ### Get the files
@@ -24,46 +44,70 @@ Download the zip folder and extract it to your preferred working folder.
 ![](images/Picture1.png)
 Or by cloning the repository with:
 ```
-git clone https://github.com/soderstromkr/transcribe.git
+git clone https://gitea.kobim.cloud/kobim/whisper-local-transcribe.git
 ```
-### Python Version **(any platform including Mac users)**
-This is recommended if you don't have Windows. Have Windows and use python, or want to use GPU acceleration (Pytorch and Cuda) for faster transcriptions. I would generally recommend this method anyway, but I can understand not everyone wants to go through the installation process for Python, Anaconda and the other required packages. 
-1. This script was made and tested in an Anaconda environment with Python 3.10. I recommend this method if you're not familiar with Python.
-See [here](https://docs.anaconda.com/anaconda/install/index.html) for instructions. You might need administrator rights. 
-2. Whisper requires some additional libraries. The [setup](https://github.com/openai/whisper#setup) page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files."
-Users might not need to specifically install Transfomers. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables. From the anaconda prompt, type or copy the following:
+### Prerequisites
+Install **Python 3.10 or later**. Some IT policies allow installing from the Microsoft Store or Mac equivalent. However, I would prefer an install from [python.org](https://www.python.org/downloads/). During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
+
+### Run on Windows
+Double-click `run_Windows.bat` — it will auto-install everything on first run.
+
+### Run on Mac / Linux
+Run `./run_Mac.sh` — it will auto-install everything on first run. See [Mac instructions](Mac_instructions.md) for details.
+
+> **Note:** The first run with a given model will download it (~75 MB for base, ~500 MB for medium). After that, everything works offline.
+
+### Manual installation (if the launchers don't work)
+If `run_Windows.bat` or `run_Mac.sh` fails (e.g. Python isn't on PATH, or permissions issues), open a terminal in the project folder and run these steps manually:
 ```
-conda install -c conda-forge ffmpeg-python
+python -m venv .venv
 ```
-3. The main functionality comes from openai-whisper. See their [page](https://github.com/openai/whisper) for details. As of 2023-03-22 you can install via:
+Activate the virtual environment:
+- **Windows:** `.venv\Scripts\activate`
+- **Mac / Linux:** `source .venv/bin/activate`
+
+Then install and run:
 ```
-pip install -U openai-whisper
+python install.py
+python app.py
 ```
-4. To run the app built on TKinter and TTKthemes. If using these options, make sure they are installed in your Python build. You can install them and colorama via pip.
+
+## GPU Support
+### Apple Silicon
+On Macs with Apple Silicon, the app automatically uses the **MLX backend**, which runs inference on the Apple GPU and Neural Engine. No additional setup is needed — just install and run. MLX models are downloaded from HuggingFace on first use.
+
+### NVIDIA GPUs
+This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription times. faster-whisper uses CTranslate2, which requires NVIDIA CUDA libraries for GPU acceleration.
+
+#### Automatic Detection
+The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
 ```
-pip install colorama
+pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
 ```
-and
+
+**Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program will automatically detect and use your GPU if available, otherwise it falls back to CPU.
+
+#### Verifying GPU Support
+After installation, you can verify that your GPU is available by running:
+```python
+import ctranslate2
+print(ctranslate2.get_supported_compute_types("cuda"))
 ```
-pip install customtkinter 
-```
-5. Run the app: 
-    1. For **Windows**: In the same folder as the *app.py* file, run the app from terminal by running ```python app.py``` or with the batch file called run_Windows.bat (for Windows users), which assumes you have conda installed and in the base environment (This is for simplicity, but users are usually adviced to create an environment, see [here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands) for more info) just make sure you have the correct environment (right click on the file and press edit to make any changes). If you want to download a model first, and then go offline for transcription, I recommend running the model with the default sample folder, which will download the model locally. 
-    2. For **Mac**: Haven't figured out a better way to do this, see [the instructions here](Mac_instructions.md)
+If this returns a list containing `"float16"`, GPU acceleration is working.
+
 ## Usage
-1. When launched, the app will also open a terminal that shows some additional information.
+1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates. The backend indicator at the bottom shows which inference engine is active (MLX · Apple GPU/NPU, CUDA · GPU, or CPU · int8).
 2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
-3. Enter the desired language for the transcription in the "Language" field. You can either select a language or leave it blank to enable automatic language detection.
-4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label.
-5. Enable the verbose mode by checking the "Verbose" checkbox if you want to receive detailed information during the transcription process.
-6. Click the "Transcribe" button to start the transcription. The button will be disabled during the process to prevent multiple transcriptions at once.
-7. Monitor the progress of the transcription with the progress bar.
-8. Once the transcription is completed, a message box will appear displaying the transcribed text. Click "OK" to close the message box.
-9. You can run the application again or quit the application at any time by clicking the "Quit" button.
+3. Select the language from the dropdown — 99 languages are available, or leave it on "Auto-detect". For English-only models (.en) the language is locked to English; for KB Swedish models it's locked to Swedish.
+4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label. A description below shows speed, size, quality stars, and recommended use case for each model.
+5. Toggle advanced options if needed: **VAD filter**, **Word-level timestamps**, or **Translate to English**.
+6. Click the "Transcribe" button to start the transcription. Use the "Stop" button to cancel at any time.
+7. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
+8. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
+9. Transcriptions are saved as both `.txt` (human-readable) and `.srt` (SubRip subtitles) in the `transcriptions/` folder within the selected directory.
+10. You can run the application again or quit at any time by clicking the "Quit" button.

 ## Jupyter Notebook
 Don't want fancy EXEs or GUIs? Use the function as is. See [example](example.ipynb) for an implementation on Jupyter Notebook.

-[^1]: Advanced users can use ```pip install ffmpeg-python``` but be ready to deal with some [PATH issues](https://stackoverflow.com/questions/65836756/python-ffmpeg-wont-accept-path-why), which I encountered in Windows 11.
-
 [![DOI](https://zenodo.org/badge/617404576.svg)](https://zenodo.org/badge/latestdoi/617404576)
@@ -1,23 +1,170 @@
+import os
+import sys
 import tkinter as tk
 from tkinter import ttk
 from tkinter import filedialog
 from tkinter import messagebox
-from src._LocalTranscribe import transcribe, get_path
+from src._LocalTranscribe import transcribe, get_path, detect_backend, _transcribe_worker_process
+import multiprocessing as mp
 import customtkinter
 import threading
-from colorama import Back, Fore
-import colorama
-colorama.init(autoreset=True)
+
+
+# ── Helper: redirect stdout/stderr into a CTkTextbox ──────────────────────
+import re
+_ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')  # strip colour codes
+
+class _ConsoleRedirector:
+    """Redirects output exclusively to the in-app console panel."""
+    def __init__(self, text_widget):
+        self.widget = text_widget
+
+    def write(self, text):
+        clean = _ANSI_RE.sub('', text)        # strip ANSI colours
+        if clean.strip() == '':
+            return
+        # Schedule UI update on the main thread
+        try:
+            self.widget.after(0, self._append, clean)
+        except Exception:
+            pass
+
+    def _append(self, text):
+        self.widget.configure(state='normal')
+        self.widget.insert('end', text + ('\n' if not text.endswith('\n') else ''))
+        self.widget.see('end')
+        self.widget.configure(state='disabled')
+
+    def flush(self):
+        pass
+
+# HuggingFace model IDs for non-standard models
+HF_MODEL_MAP = {
+    'KB Swedish (tiny)':   'KBLab/kb-whisper-tiny',
+    'KB Swedish (base)':   'KBLab/kb-whisper-base',
+    'KB Swedish (small)':  'KBLab/kb-whisper-small',
+    'KB Swedish (medium)': 'KBLab/kb-whisper-medium',
+    'KB Swedish (large)':  'KBLab/kb-whisper-large',
+}
+
+# Per-model info shown in the UI description label
+# (speed, size, quality stars, suggested use)
+MODEL_INFO = {
+    'tiny':                 ('Very fast',   '~75 MB',   '★★☆☆☆', 'Quick drafts & testing'),
+    'tiny.en':              ('Very fast',   '~75 MB',   '★★☆☆☆', 'Quick drafts & testing (English only)'),
+    'base':                 ('Fast',        '~145 MB',  '★★★☆☆', 'Notes & short podcasts'),
+    'base.en':              ('Fast',        '~145 MB',  '★★★☆☆', 'Notes & short podcasts (English only)'),
+    'small':                ('Balanced',    '~485 MB',  '★★★★☆', 'Everyday use'),
+    'small.en':             ('Balanced',    '~485 MB',  '★★★★☆', 'Everyday use (English only)'),
+    'medium':               ('Accurate',    '~1.5 GB',  '★★★★☆', 'Professional content'),
+    'medium.en':            ('Accurate',    '~1.5 GB',  '★★★★☆', 'Professional content (English only)'),
+    'large-v2':             ('Slow',        '~3 GB',    '★★★★★', 'Maximum accuracy'),
+    'large-v3':             ('Slow',        '~3 GB',    '★★★★★', 'Maximum accuracy (recommended)'),
+    'KB Swedish (tiny)':    ('Very fast',   '~75 MB',   '★★★☆☆', 'Swedish — optimised by KBLab'),
+    'KB Swedish (base)':    ('Fast',        '~145 MB',  '★★★☆☆', 'Swedish — optimised by KBLab'),
+    'KB Swedish (small)':   ('Balanced',    '~485 MB',  '★★★★☆', 'Swedish — optimised by KBLab'),
+    'KB Swedish (medium)':  ('Accurate',    '~1.5 GB',  '★★★★☆', 'Swedish — optimised by KBLab'),
+    'KB Swedish (large)':   ('Slow',        '~3 GB',    '★★★★★', 'Swedish — KBLab, best accuracy'),
+}



 customtkinter.set_appearance_mode("System")
 customtkinter.set_default_color_theme("blue")  # Themes: blue (default), dark-blue, green
-firstclick = True
+
+# All languages supported by Whisper  (display label → ISO code; None = auto-detect)
+WHISPER_LANGUAGES = {
+    'Auto-detect':          None,
+    'Afrikaans (af)':       'af',   'Albanian (sq)':        'sq',
+    'Amharic (am)':         'am',   'Arabic (ar)':          'ar',
+    'Armenian (hy)':        'hy',   'Assamese (as)':        'as',
+    'Azerbaijani (az)':     'az',   'Bashkir (ba)':         'ba',
+    'Basque (eu)':          'eu',   'Belarusian (be)':      'be',
+    'Bengali (bn)':         'bn',   'Bosnian (bs)':         'bs',
+    'Breton (br)':          'br',   'Bulgarian (bg)':       'bg',
+    'Catalan (ca)':         'ca',   'Chinese (zh)':         'zh',
+    'Croatian (hr)':        'hr',   'Czech (cs)':           'cs',
+    'Danish (da)':          'da',   'Dutch (nl)':           'nl',
+    'English (en)':         'en',   'Estonian (et)':        'et',
+    'Faroese (fo)':         'fo',   'Finnish (fi)':         'fi',
+    'French (fr)':          'fr',   'Galician (gl)':        'gl',
+    'Georgian (ka)':        'ka',   'German (de)':          'de',
+    'Greek (el)':           'el',   'Gujarati (gu)':        'gu',
+    'Haitian Creole (ht)':  'ht',   'Hausa (ha)':           'ha',
+    'Hawaiian (haw)':       'haw',  'Hebrew (he)':          'he',
+    'Hindi (hi)':           'hi',   'Hungarian (hu)':       'hu',
+    'Icelandic (is)':       'is',   'Indonesian (id)':      'id',
+    'Italian (it)':         'it',   'Japanese (ja)':        'ja',
+    'Javanese (jw)':        'jw',   'Kannada (kn)':         'kn',
+    'Kazakh (kk)':          'kk',   'Khmer (km)':           'km',
+    'Korean (ko)':          'ko',   'Lao (lo)':             'lo',
+    'Latin (la)':           'la',   'Latvian (lv)':         'lv',
+    'Lingala (ln)':         'ln',   'Lithuanian (lt)':      'lt',
+    'Luxembourgish (lb)':   'lb',   'Macedonian (mk)':      'mk',
+    'Malagasy (mg)':        'mg',   'Malay (ms)':           'ms',
+    'Malayalam (ml)':       'ml',   'Maltese (mt)':         'mt',
+    'Maori (mi)':           'mi',   'Marathi (mr)':         'mr',
+    'Mongolian (mn)':       'mn',   'Myanmar (my)':         'my',
+    'Nepali (ne)':          'ne',   'Norwegian (no)':       'no',
+    'Occitan (oc)':         'oc',   'Pashto (ps)':          'ps',
+    'Persian (fa)':         'fa',   'Polish (pl)':          'pl',
+    'Portuguese (pt)':      'pt',   'Punjabi (pa)':         'pa',
+    'Romanian (ro)':        'ro',   'Russian (ru)':         'ru',
+    'Sanskrit (sa)':        'sa',   'Serbian (sr)':         'sr',
+    'Shona (sn)':           'sn',   'Sindhi (sd)':          'sd',
+    'Sinhala (si)':         'si',   'Slovak (sk)':          'sk',
+    'Slovenian (sl)':       'sl',   'Somali (so)':          'so',
+    'Spanish (es)':         'es',   'Sundanese (su)':       'su',
+    'Swahili (sw)':         'sw',   'Swedish (sv)':         'sv',
+    'Tagalog (tl)':         'tl',   'Tajik (tg)':           'tg',
+    'Tamil (ta)':           'ta',   'Tatar (tt)':           'tt',
+    'Telugu (te)':          'te',   'Thai (th)':            'th',
+    'Tibetan (bo)':         'bo',   'Turkish (tr)':         'tr',
+    'Turkmen (tk)':         'tk',   'Ukrainian (uk)':       'uk',
+    'Urdu (ur)':            'ur',   'Uzbek (uz)':           'uz',
+    'Vietnamese (vi)':      'vi',   'Welsh (cy)':           'cy',
+    'Yiddish (yi)':         'yi',   'Yoruba (yo)':          'yo',
+}
+
+
+def _language_options_for_model(model_name):
+    """Return (values, default, state) for the language combobox given a model name."""
+    if model_name.endswith('.en'):
+        return ['English (en)'], 'English (en)', 'disabled'
+    if model_name.startswith('KB Swedish'):
+        return ['Swedish (sv)'], 'Swedish (sv)', 'disabled'
+    return list(WHISPER_LANGUAGES.keys()), 'Auto-detect', 'readonly'
+
+
+def _set_app_icon(root):
+    """Set app icon when supported, without crashing on unsupported platforms."""
+    base_dir = os.path.dirname(os.path.abspath(__file__))
+    icon_path = os.path.join(base_dir, "images", "icon.ico")
+
+    if not os.path.exists(icon_path):
+        return
+
+    try:
+        root.iconbitmap(icon_path)
+    except tk.TclError:
+        # Some Linux Tk builds don't accept .ico for iconbitmap.
+        pass
+
+
+def _apply_display_scaling(root):
+    """Auto-scale UI for high-resolution displays (e.g., 4K)."""
+    try:
+        screen_w = root.winfo_screenwidth()
+        screen_h = root.winfo_screenheight()
+        scale = min(screen_w / 1920.0, screen_h / 1080.0)
+        scale = max(1.0, min(scale, 2.0))
+        customtkinter.set_widget_scaling(scale)
+        customtkinter.set_window_scaling(scale)
+    except Exception:
+        pass

 class App:
    def __init__(self, master):
-        print(Back.CYAN + "Welcome to Local Transcribe with Whisper!\U0001f600\nCheck back here to see some output from your transcriptions.\nDon't worry, they will also be saved on the computer!\U0001f64f")
        self.master = master
        # Change font
        font = ('Roboto', 13, 'bold')  # Change the font and size here
@@ -27,107 +174,233 @@ class App:
        path_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(path_frame, text="Folder:", font=font).pack(side=tk.LEFT, padx=5)
        self.path_entry = customtkinter.CTkEntry(path_frame, width=50, font=font_b)
+        self.path_entry.insert(0, os.path.join(os.getcwd(), 'sample_audio'))
        self.path_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
        customtkinter.CTkButton(path_frame, text="Browse", command=self.browse, font=font).pack(side=tk.LEFT, padx=5)
        # Language frame
-        #thanks to pommicket from Stackoverflow for this fix
-        def on_entry_click(event):
-            """function that gets called whenever entry is clicked"""        
-            global firstclick
-            if firstclick: # if this is the first time they clicked it
-                firstclick = False
-                self.language_entry.delete(0, "end") # delete all the text in the entry
        language_frame = customtkinter.CTkFrame(master)
        language_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(language_frame, text="Language:", font=font).pack(side=tk.LEFT, padx=5)
-        self.language_entry = customtkinter.CTkEntry(language_frame, width=50, font=('Roboto', 12, 'italic'))
-        self.language_entry.insert(0, 'Select language or clear to detect automatically')
-        self.language_entry.bind('<FocusIn>', on_entry_click)
-        self.language_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
+        _lang_values, _lang_default, _lang_state = _language_options_for_model('medium')
+        self.language_combobox = customtkinter.CTkComboBox(
+            language_frame, width=50, state=_lang_state,
+            values=_lang_values, font=font_b)
+        self.language_combobox.set(_lang_default)
+        self.language_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
        # Model frame
-        models = ['base.en', 'base', 'small.en',
-                  'small', 'medium.en', 'medium', 'large']
+        models = ['tiny', 'tiny.en', 'base', 'base.en',
+                  'small', 'small.en', 'medium', 'medium.en',
+                  'large-v2', 'large-v3',
+                  '───────────────',
+                  'KB Swedish (tiny)', 'KB Swedish (base)',
+                  'KB Swedish (small)', 'KB Swedish (medium)',
+                  'KB Swedish (large)']
        model_frame = customtkinter.CTkFrame(master)
        model_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(model_frame, text="Model:", font=font).pack(side=tk.LEFT, padx=5)
        # ComboBox frame
        self.model_combobox = customtkinter.CTkComboBox(
            model_frame, width=50, state="readonly",
-            values=models, font=font_b)
-        self.model_combobox.set(models[1])  # Set the default value
+            values=models, font=font_b,
+            command=self._on_model_change)
+        self.model_combobox.set('medium')  # Set the default value
        self.model_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
-        # Verbose frame
-        verbose_frame = customtkinter.CTkFrame(master)
-        verbose_frame.pack(fill=tk.BOTH, padx=10, pady=10)
-        self.verbose_var = tk.BooleanVar()
-        customtkinter.CTkCheckBox(verbose_frame, text="Output transcription to terminal", variable=self.verbose_var, font=font).pack(side=tk.LEFT, padx=5)
+        # Model description label
+        self.model_desc_label = customtkinter.CTkLabel(
+            master, text=self._model_desc_text('medium'),
+            font=('Roboto', 11), text_color=('#555555', '#aaaaaa'),
+            anchor='w')
+        self.model_desc_label.pack(fill=tk.X, padx=14, pady=(0, 4))
+        # Timestamps toggle
+        ts_frame = customtkinter.CTkFrame(master)
+        ts_frame.pack(fill=tk.BOTH, padx=10, pady=10)
+        self.timestamps_var = tk.BooleanVar(value=True)
+        self.timestamps_switch = customtkinter.CTkSwitch(
+            ts_frame, text="Include timestamps in transcription",
+            variable=self.timestamps_var, font=font_b)
+        self.timestamps_switch.pack(side=tk.LEFT, padx=5)
+        # Advanced options frame
+        adv_frame = customtkinter.CTkFrame(master)
+        adv_frame.pack(fill=tk.BOTH, padx=10, pady=10)
+        self.vad_var = tk.BooleanVar(value=False)
+        customtkinter.CTkSwitch(
+            adv_frame, text="VAD filter (remove silence)",
+            variable=self.vad_var, font=font_b).pack(side=tk.LEFT, padx=5)
+        self.word_ts_var = tk.BooleanVar(value=False)
+        customtkinter.CTkSwitch(
+            adv_frame, text="Word-level timestamps",
+            variable=self.word_ts_var, font=font_b).pack(side=tk.LEFT, padx=5)
+        self.translate_var = tk.BooleanVar(value=False)
+        customtkinter.CTkSwitch(
+            adv_frame, text="Translate to English",
+            variable=self.translate_var, font=font_b).pack(side=tk.LEFT, padx=5)
        # Progress Bar
        self.progress_bar = ttk.Progressbar(master, length=200, mode='indeterminate')
+        # Worker process handle (replaces thread+stop_event for true immediate cancellation)
+        self._proc = None
+        self._parent_conn = None
+        self._child_conn = None
        # Button actions frame
        button_frame = customtkinter.CTkFrame(master)
        button_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.transcribe_button = customtkinter.CTkButton(button_frame, text="Transcribe", command=self.start_transcription, font=font)
        self.transcribe_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
+        self.stop_button = customtkinter.CTkButton(
+            button_frame, text="Stop", command=self._stop_transcription, font=font,
+            fg_color="#c0392b", hover_color="#922b21", state=tk.DISABLED)
+        self.stop_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
        customtkinter.CTkButton(button_frame, text="Quit", command=master.quit, font=font).pack(side=tk.RIGHT, padx=5, pady=10, fill=tk.X, expand=True)
+
+        # ── Embedded console / log panel ──────────────────────────────────
+        log_label = customtkinter.CTkLabel(master, text="Console output", font=font, anchor='w')
+        log_label.pack(fill=tk.X, padx=12, pady=(8, 0))
+        self.log_box = customtkinter.CTkTextbox(master, height=220, font=('Consolas', 14),
+                                                 wrap='word', state='disabled',
+                                                 fg_color='#1e1e1e', text_color='#e0e0e0')
+        self.log_box.pack(fill=tk.BOTH, expand=True, padx=10, pady=(2, 10))
+
+        # Redirect stdout & stderr into the log panel (no backend console)
+        sys.stdout = _ConsoleRedirector(self.log_box)
+        sys.stderr = _ConsoleRedirector(self.log_box)
+
+        # Backend indicator
+        _bi = detect_backend()
+        backend_label = customtkinter.CTkLabel(
+            master,
+            text=f"Backend: {_bi['label']}",
+            font=('Roboto', 11),
+            text_color=("#555555", "#aaaaaa"),
+            anchor='e',
+        )
+        backend_label.pack(fill=tk.X, padx=12, pady=(0, 2))
+
+        # Welcome message (shown after redirect so it appears in the panel)
+        print("Welcome to Local Transcribe with Whisper! \U0001f600")
+        print("Transcriptions will be saved automatically.")
+        print("─" * 46)
    # Helper functions
+    def _stop_transcription(self):
+        self.stop_button.configure(state=tk.DISABLED)
+        if self._proc and self._proc.is_alive():
+            self._proc.terminate()
+            try:
+                self._proc.join(timeout=3)
+            except Exception:
+                pass
+            if self._proc.is_alive():
+                self._proc.kill()
+                try:
+                    self._proc.join(timeout=1)
+                except Exception:
+                    pass
+        # Close pipe ends — no semaphores, so no leak
+        for conn in (self._parent_conn, self._child_conn):
+            try:
+                if conn:
+                    conn.close()
+            except Exception:
+                pass
+        self._parent_conn = self._child_conn = None
+        print("⛔  Transcription stopped by user.")
+
+    def _model_desc_text(self, model_name):
+        info = MODEL_INFO.get(model_name)
+        if not info:
+            return ''
+        speed, size, stars, use = info
+        return f'{stars}  {speed}  ·  {size}  ·  {use}'
+
+    def _on_model_change(self, selected):
+        self.model_desc_label.configure(text=self._model_desc_text(selected))
+        values, default, state = _language_options_for_model(selected)
+        self.language_combobox.configure(values=values, state=state)
+        self.language_combobox.set(default)
+
    # Browsing
    def browse(self):
-        folder_path = filedialog.askdirectory()
+        initial_dir = os.getcwd()
+        folder_path = filedialog.askdirectory(initialdir=initial_dir)
        self.path_entry.delete(0, tk.END)
        self.path_entry.insert(0, folder_path)
    # Start transcription
    def start_transcription(self):
-        # Disable transcribe button
+        model_display = self.model_combobox.get()
+        if model_display.startswith('─'):
+            messagebox.showinfo("Invalid selection", "Please select a model, not the separator line.")
+            return
        self.transcribe_button.configure(state=tk.DISABLED)
-        # Start a new thread for the transcription process
-        threading.Thread(target=self.transcribe_thread).start()
-    # Threading
-    def transcribe_thread(self):
+        self.stop_button.configure(state=tk.NORMAL)
        path = self.path_entry.get()
-        model = self.model_combobox.get()
-        language = self.language_entry.get() or None
-        verbose = self.verbose_var.get()
-        # Show progress bar
+        model = HF_MODEL_MAP.get(model_display, model_display)
+        lang_label = self.language_combobox.get()
+        language = WHISPER_LANGUAGES.get(lang_label, lang_label) if lang_label else None
+        timestamps = self.timestamps_var.get()
+        vad_filter = self.vad_var.get()
+        word_timestamps = self.word_ts_var.get()
+        translate = self.translate_var.get()
+        glob_file = get_path(path)
        self.progress_bar.pack(fill=tk.X, padx=5, pady=5)
        self.progress_bar.start()
-        # Setting path and files
-        glob_file = get_path(path)
-        info_path = 'Continue?'
-        answer = messagebox.askyesno("Confirmation", info_path)
-        if not answer:
-            self.progress_bar.stop()
-            self.progress_bar.pack_forget()
-            self.transcribe_button.configure(state=tk.NORMAL)
-            return
-        # Start transcription
-        error_language = 'https://github.com/openai/whisper#available-models-and-languages'
+        self._parent_conn, self._child_conn = mp.Pipe(duplex=False)
+        self._proc = mp.Process(
+            target=_transcribe_worker_process,
+            args=(self._child_conn, path, glob_file, model, language, True, timestamps),
+            kwargs={"vad_filter": vad_filter, "word_timestamps": word_timestamps, "translate": translate},
+            daemon=True,
+        )
+        self._proc.start()
+        self._child_conn.close()  # parent doesn't write; close its write-end
+        self._child_conn = None
+        self.master.after(100, self._poll_worker)
+
+    def _poll_worker(self):
+        done = False
+        result = None
        try:
-            output_text = transcribe(path, glob_file, model, language, verbose)
-        except UnboundLocalError:
-            messagebox.showinfo("Files not found error!", 'Nothing found, choose another folder.')
+            while self._parent_conn and self._parent_conn.poll():
+                msg = self._parent_conn.recv()
+                if isinstance(msg, tuple) and msg[0] == '__done__':
+                    done = True
+                    result = msg[1]
+                else:
+                    sys.stdout.write(msg)
+                    sys.stdout.flush()
+        except EOFError:
+            # Child closed the pipe (normal completion or kill)
+            done = True
+        except Exception:
            pass
-        except ValueError:
-            messagebox.showinfo("Invalid language name, you might have to clear the default text to continue!")
-        # Hide progress bar
+        if done or (self._proc and not self._proc.is_alive()):
+            if self._parent_conn:
+                try:
+                    self._parent_conn.close()
+                except Exception:
+                    pass
+                self._parent_conn = None
+            self._on_transcription_done(result)
+        else:
+            self.master.after(100, self._poll_worker)
+
+    def _on_transcription_done(self, output_text):
        self.progress_bar.stop()
        self.progress_bar.pack_forget()
-        # Enable transcribe button
+        self.stop_button.configure(state=tk.DISABLED)
        self.transcribe_button.configure(state=tk.NORMAL)
-        # Recover output text
-        try:
-            messagebox.showinfo("Finished!", output_text)
-        except UnboundLocalError:
-            pass
+        if output_text:
+            title = "Finished!" if not output_text.startswith('⚠') else "Error"
+            messagebox.showinfo(title, output_text)

 if __name__ == "__main__":
    # Setting custom themes
    root = customtkinter.CTk()
+    _apply_display_scaling(root)
    root.title("Local Transcribe with Whisper")
-    # Geometry
-    width,height = 450,275
-    root.geometry('{}x{}'.format(width,height))
-    # Icon 
-    root.iconbitmap('images/icon.ico')
+    # Geometry — taller to accommodate the embedded console panel
+    width, height = 550, 560
+    root.geometry('{}x{}'.format(width, height))
+    root.minsize(450, 480)
+    # Icon (best-effort; ignored on platforms/builds without .ico support)
+    _set_app_icon(root)
    # Run
    app = App(root)
    root.mainloop()
@@ -1,7 +1,7 @@
 from cx_Freeze import setup, Executable

 build_exe_options = {
-    "packages": ['whisper','tkinter','customtkinter']
+    "packages": ['faster_whisper','tkinter','customtkinter']
    }
 executables = (
    [
@@ -13,7 +13,7 @@ executables = (
 )
 setup(
    name="Local Transcribe with Whisper",
-    version="1.2",
+    version="2.0",
    author="Kristofer Rolf Söderström",
    options={"build_exe":build_exe_options},
    executables=executables
@@ -0,0 +1,128 @@
+"""
+Installer script for Local Transcribe with Whisper.
+Detects NVIDIA GPU and offers to install GPU acceleration support.
+
+Usage:
+    python install.py
+"""
+
+import os
+import subprocess
+import sys
+import shutil
+import site
+
+
+def detect_nvidia_gpu():
+    """Check if an NVIDIA GPU is present."""
+    candidates = [
+        shutil.which("nvidia-smi"),
+        r"C:\Windows\System32\nvidia-smi.exe",
+        r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe",
+    ]
+    for path in candidates:
+        if not path or not os.path.isfile(path):
+            continue
+        try:
+            r = subprocess.run(
+                [path, "--query-gpu=name", "--format=csv,noheader"],
+                capture_output=True, text=True, timeout=10,
+            )
+            if r.returncode == 0 and r.stdout.strip():
+                return True, r.stdout.strip().split("\n")[0]
+        except Exception:
+            continue
+    return False, None
+
+
+def pip_install(*packages):
+    cmd = [sys.executable, "-m", "pip", "install"] + list(packages)
+    print(f"\n> {' '.join(cmd)}\n")
+    subprocess.check_call(cmd)
+
+
+def get_site_packages():
+    for p in site.getsitepackages():
+        if p.endswith("site-packages"):
+            return p
+    return site.getsitepackages()[0]
+
+
+def create_nvidia_pth():
+    """Create a .pth startup hook that registers NVIDIA DLL directories."""
+    sp = get_site_packages()
+    pth_path = os.path.join(sp, "nvidia_cuda_path.pth")
+    # This one-liner runs at Python startup, before any user code.
+    pth_content = (
+        "import os, glob as g; "
+        "any(os.add_dll_directory(d) or os.environ.__setitem__('PATH', d + os.pathsep + os.environ.get('PATH','')) "
+        "for d in g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'bin')) "
+        "+ g.glob(os.path.join(r'" + sp.replace("'", "\\'") + "', 'nvidia', '*', 'lib')) "
+        "if os.path.isdir(d)) if os.name == 'nt' else None\n"
+    )
+    with open(pth_path, "w") as f:
+        f.write(pth_content)
+    print(f"  Created CUDA startup hook: {pth_path}")
+
+
+def verify_cuda():
+    """Verify CUDA works in a fresh subprocess."""
+    try:
+        r = subprocess.run(
+            [sys.executable, "-c",
+             "import ctranslate2; "
+             "print('float16' in ctranslate2.get_supported_compute_types('cuda'))"],
+            capture_output=True, text=True, timeout=30,
+        )
+        return r.stdout.strip() == "True"
+    except Exception:
+        return False
+
+
+def main():
+    print("=" * 55)
+    print("  Local Transcribe with Whisper — Installer")
+    print("=" * 55)
+
+    # Step 1: Base packages
+    print("\n[1/2] Installing base requirements...")
+    pip_install("-r", "requirements.txt")
+    print("\n  Base requirements installed!")
+
+    # Step 2: GPU
+    print("\n[2/2] Checking for NVIDIA GPU...")
+    has_gpu, gpu_name = detect_nvidia_gpu()
+
+    if has_gpu:
+        print(f"\n  NVIDIA GPU detected: {gpu_name}")
+        print("  GPU acceleration can make transcription 2-5x faster.")
+        print("  This will install ~300 MB of additional CUDA libraries.\n")
+
+        while True:
+            answer = input("  Install GPU support? [Y/n]: ").strip().lower()
+            if answer in ("", "y", "yes"):
+                print("\n  Installing CUDA libraries...")
+                pip_install("nvidia-cublas-cu12", "nvidia-cudnn-cu12")
+                create_nvidia_pth()
+                print("\n  Verifying CUDA...")
+                if verify_cuda():
+                    print("  GPU support verified and working!")
+                else:
+                    print("  WARNING: CUDA installed but not detected.")
+                    print("  Update your NVIDIA drivers and try again.")
+                break
+            elif answer in ("n", "no"):
+                print("\n  Skipping GPU. Re-run install.py to add it later.")
+                break
+            else:
+                print("  Please enter Y or N.")
+    else:
+        print("\n  No NVIDIA GPU detected — using CPU mode.")
+
+    print("\n" + "=" * 55)
+    print("  Done! Run the app with:  python app.py")
+    print("=" * 55)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,3 @@
+faster-whisper
+mlx-whisper
+customtkinter
@@ -0,0 +1,29 @@
+#!/bin/bash
+# ============================================================
+#  Local Transcribe with Whisper — macOS / Linux launcher
+# ============================================================
+# Double-click this file or run:  ./run_Mac.sh
+# On first run it creates a venv and installs dependencies.
+# ============================================================
+
+set -e
+
+cd "$(dirname "$0")"
+
+# Create .venv if it doesn't exist
+if [ ! -f ".venv/bin/python" ]; then
+    echo "Creating virtual environment..."
+    python3 -m venv .venv
+fi
+
+PYTHON=".venv/bin/python"
+
+# Install dependencies on first run
+if ! "$PYTHON" -c "import faster_whisper" 2>/dev/null; then
+    echo "First run detected — running installer..."
+    "$PYTHON" install.py
+    echo
+fi
+
+echo "Starting Local Transcribe..."
+"$PYTHON" app.py
@@ -1,5 +1,23 @@
@echo off
-echo Starting...
-call conda activate base
-REM OPTION 2 : (KEEP TEXT WITHIN QUOTES AND CHANGE USERNAME) "C:/Users/user/Anaconda3/condabin/activate.bat"
-call python app.py
+REM Create .venv on first run if it doesn't exist
+if not exist ".venv\Scripts\python.exe" (
+    echo Creating virtual environment...
+    python -m venv .venv
+    if errorlevel 1 (
+        echo ERROR: Failed to create virtual environment. Is Python installed and on PATH?
+        pause
+        exit /b 1
+    )
+)
+
+set PYTHON=.venv\Scripts\python.exe
+
+REM Check if dependencies are installed
+%PYTHON% -c "import faster_whisper" 2>nul
+if errorlevel 1 (
+    echo First run detected - running installer...
+    %PYTHON% install.py
+    echo.
+)
+echo Starting Local Transcribe...
+%PYTHON% app.py
@@ -1,4 +1,4 @@
 Armstrong_Small_Step
-[0:00:00 --> 0:00:07]: And they're still brought to land now.
-[0:00:07 --> 0:00:18]: It's one small step for man.
-[0:00:18 --> 0:00:23]: One by a fleet for man time.
+────────────────────────────────────────
+
+That's one small step for man, one giant leap for mankind.
@@ -1,4 +1,4 @@
 Axel_Pettersson_röstinspelning
-[0:00:00 --> 0:00:06]: Hej, jag heter Raxel Patterson, jag får att se över UR 1976.
-[0:00:06 --> 0:00:12.540000]: Jag har varit Wikipedia-périonsen 2018 och jag har översat röst-intro-
-[0:00:12.540000 --> 0:00:15.540000]:-projektet till svenska.
+────────────────────────────────────────
+
+Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
@@ -1,27 +1,217 @@
 import os
+import sys
+import platform
 import datetime
+import time
+import site
 from glob import glob
-import whisper
-from torch import cuda, Generator
-import colorama
-from colorama import Back,Fore
-colorama.init(autoreset=True)
+
+# ---------------------------------------------------------------------------
+# CUDA setup — must happen before importing faster_whisper / ctranslate2
+# ---------------------------------------------------------------------------
+def _setup_cuda_libs():
+    """Register NVIDIA pip-package lib dirs so ctranslate2 finds CUDA at runtime.
+
+    pip-installed nvidia-cublas-cu12 / nvidia-cudnn-cu12 place their shared
+    libraries inside the site-packages tree.  Neither Windows nor Linux
+    automatically search those directories, so we must register them
+    explicitly:
+      - Windows: os.add_dll_directory() + PATH
+      - Linux:   LD_LIBRARY_PATH  (read by the dynamic linker)
+    """
+    try:
+        sp_dirs = site.getsitepackages()
+    except AttributeError:
+        # virtualenv without site-packages helper
+        sp_dirs = [os.path.join(sys.prefix, "lib",
+                                "python" + ".".join(map(str, sys.version_info[:2])),
+                                "site-packages")]
+
+    for sp in sp_dirs:
+        nvidia_root = os.path.join(sp, "nvidia")
+        if not os.path.isdir(nvidia_root):
+            continue
+        for pkg in os.listdir(nvidia_root):
+            for sub in ("bin", "lib"):
+                d = os.path.join(nvidia_root, pkg, sub)
+                if not os.path.isdir(d):
+                    continue
+                if sys.platform == "win32":
+                    os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
+                    try:
+                        os.add_dll_directory(d)
+                    except (OSError, AttributeError):
+                        pass
+                else:
+                    # Linux / macOS — prepend to LD_LIBRARY_PATH
+                    ld = os.environ.get("LD_LIBRARY_PATH", "")
+                    if d not in ld:
+                        os.environ["LD_LIBRARY_PATH"] = d + (":" + ld if ld else "")
+                        # Also load via ctypes so already-started process sees it
+                        import ctypes
+                        try:
+                            for so in sorted(os.listdir(d)):
+                                if so.endswith(".so") or ".so." in so:
+                                    ctypes.cdll.LoadLibrary(os.path.join(d, so))
+                        except OSError:
+                            pass
+
+_setup_cuda_libs()
+
+from faster_whisper import WhisperModel
+
+
+SUPPORTED_EXTENSIONS = {
+    ".wav", ".mp3", ".m4a", ".flac", ".ogg", ".wma", ".aac",
+    ".mp4", ".mkv", ".mov", ".webm", ".avi", ".mpeg", ".mpg",
+}
+
+
+# ---------------------------------------------------------------------------
+# MLX model map  (Apple Silicon only)
+# ---------------------------------------------------------------------------
+_MLX_MODEL_MAP = {
+    "tiny":     "mlx-community/whisper-tiny-mlx",
+    "base":     "mlx-community/whisper-base-mlx",
+    "small":    "mlx-community/whisper-small-mlx",
+    "medium":   "mlx-community/whisper-medium-mlx",
+    "large-v2": "mlx-community/whisper-large-v2-mlx",
+    "large-v3": "mlx-community/whisper-large-v3-mlx",
+}
+
+
+def detect_backend():
+    """Return the best available inference backend.
+
+    Returns a dict with keys:
+        backend      : "mlx" | "cuda" | "cpu"
+        device       : device string for WhisperModel (cuda / cpu)
+        compute_type : compute type string for WhisperModel
+        label        : human-readable label for UI display
+    """
+    # Apple Silicon → try MLX (GPU + Neural Engine via Apple MLX)
+    if sys.platform == "darwin" and platform.machine() == "arm64":
+        try:
+            import mlx_whisper  # noqa: F401
+            return {
+                "backend": "mlx",
+                "device": "cpu",
+                "compute_type": "int8",
+                "label": "MLX · Apple GPU/NPU",
+            }
+        except ImportError:
+            pass
+
+    # NVIDIA CUDA
+    try:
+        import ctranslate2
+        cuda_types = ctranslate2.get_supported_compute_types("cuda")
+        if "float16" in cuda_types:
+            return {
+                "backend": "cuda",
+                "device": "cuda",
+                "compute_type": "float16",
+                "label": "CUDA · GPU",
+            }
+    except Exception:
+        pass
+
+    return {
+        "backend": "cpu",
+        "device": "cpu",
+        "compute_type": "int8",
+        "label": "CPU · int8",
+    }
+
+
+def _decode_audio_pyav(file_path):
+    """Decode any audio/video file to a float32 mono 16 kHz numpy array.
+
+    Uses PyAV (bundled FFmpeg) — no external ffmpeg binary required.
+    Returns (audio_array, duration_seconds).
+    """
+    import av
+    import numpy as np
+
+    with av.open(file_path) as container:
+        duration = float(container.duration) / 1_000_000  # microseconds → seconds
+        stream = container.streams.audio[0]
+        resampler = av.AudioResampler(format="fltp", layout="mono", rate=16000)
+        chunks = []
+        for frame in container.decode(stream):
+            for out in resampler.resample(frame):
+                if out:
+                    chunks.append(out.to_ndarray()[0])
+        # Flush resampler
+        for out in resampler.resample(None):
+            if out:
+                chunks.append(out.to_ndarray()[0])
+
+    if not chunks:
+        return np.zeros(0, dtype=np.float32), duration
+    return np.concatenate(chunks, axis=0), duration
+
+
+def _transcribe_mlx_file(file, mlx_model_id, language, timestamps, verbose, vad_filter=False, word_timestamps=False, translate=False):
+    """Transcribe a single file with mlx-whisper (Apple GPU/NPU).
+
+    Decodes audio via PyAV (no system ffmpeg needed), then runs MLX inference.
+    Returns (segments_as_dicts, audio_duration_seconds).
+    Segments have dict keys: 'start', 'end', 'text'.
+    """
+    import mlx_whisper
+
+    audio_array, duration = _decode_audio_pyav(file)
+
+    decode_opts = {}
+    if language:
+        decode_opts["language"] = language
+    if translate:
+        decode_opts["task"] = "translate"
+    if word_timestamps:
+        decode_opts["word_timestamps"] = True
+
+    result = mlx_whisper.transcribe(
+        audio_array,
+        path_or_hf_repo=mlx_model_id,
+        verbose=(True if verbose else None),
+        **decode_opts,
+    )
+    segments = result["segments"]
+    audio_duration = segments[-1]["end"] if segments else duration
+    return segments, audio_duration
+
+
+def _srt_timestamp(seconds):
+    """Convert seconds (float) to SRT timestamp format HH:MM:SS,mmm."""
+    ms = round(seconds * 1000)
+    h, ms = divmod(ms, 3_600_000)
+    m, ms = divmod(ms, 60_000)
+    s, ms = divmod(ms, 1000)
+    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


 # Get the path
 def get_path(path):
-    glob_file = glob(path + '/*')
-    return glob_file
+    all_items = glob(path + '/*')
+    media_files = []
+    for item in all_items:
+        if not os.path.isfile(item):
+            continue
+        _, ext = os.path.splitext(item)
+        if ext.lower() in SUPPORTED_EXTENSIONS:
+            media_files.append(item)
+    return sorted(media_files)

 # Main function
-def transcribe(path, glob_file, model=None, language=None, verbose=False):
+def transcribe(path, glob_file, model=None, language=None, verbose=False, timestamps=True, stop_event=None, vad_filter=False, word_timestamps=False, translate=False):
    """
-    Transcribes audio files in a specified folder using OpenAI's Whisper model.
+    Transcribes audio files in a specified folder using faster-whisper (CTranslate2).

    Args:
        path (str): Path to the folder containing the audio files.
        glob_file (list): List of audio file paths to transcribe.
-        model (str, optional): Name of the Whisper model to use for transcription.
+        model (str, optional): Name of the Whisper model size to use for transcription.
            Defaults to None, which uses the default model.
        language (str, optional): Language code for transcription. Defaults to None,
            which enables automatic language detection.
@@ -38,53 +228,260 @@ def transcribe(path, glob_file, model=None, language=None, verbose=False):
        - The function downloads the specified model if not available locally.
        - The transcribed text files will be saved in a "transcriptions" folder
          within the specified path.
+        - Uses CTranslate2 for up to 4x faster inference compared to openai-whisper.
+        - FFmpeg is bundled via the PyAV dependency — no separate installation needed.

    """
-    # Check for GPU acceleration
-    if cuda.is_available():
-        Generator('cuda').manual_seed(42)
-    else:
-        Generator().manual_seed(42)
-    # Load model
-    model = whisper.load_model(model)
-    # Start main loop
-    files_transcripted=[]   
+    SEP = "─" * 46
+
+    # ── Step 1: Detect hardware ──────────────────────────────────────
+    backend_info = detect_backend()
+    backend   = backend_info["backend"]
+    device    = backend_info["device"]
+    compute_type = backend_info["compute_type"]
+    print(f"⚙  Backend: {backend_info['label']}")
+
+    # ── Step 1b: MLX path (Apple GPU/NPU) ───────────────────────────
+    if backend == "mlx":
+        mlx_model_id = _MLX_MODEL_MAP.get(model)
+        if mlx_model_id is None:
+            print(f"⚠  Model '{model}' is not available in MLX format.")
+            print("   Falling back to faster-whisper on CPU (int8).")
+            backend = "cpu"
+            device, compute_type = "cpu", "int8"
+        else:
+            # ── Step 2 (MLX): load + transcribe ─────────────────────
+            print(f"⏳ Loading MLX model '{model}' — downloading if needed...")
+            print("✅ Model ready!")
+            print(SEP)
+
+            total_files = len(glob_file)
+            print(f"📂 Found {total_files} supported media file(s) in folder")
+            print(SEP)
+
+            if total_files == 0:
+                output_text = '⚠  No supported media files found — try another folder.'
+                print(output_text)
+                print(SEP)
+                return output_text
+
+            files_transcripted = []
+            file_num = 0
+            for file in glob_file:
+                if stop_event and stop_event.is_set():
+                    print("⛔  Transcription stopped by user.")
+                    break
+                title = os.path.basename(file).split('.')[0]
+                file_num += 1
+                print(f"\n{'─' * 46}")
+                print(f"📄 File {file_num}/{total_files}: {title}")
+                try:
+                    t_start = time.time()
+                    segments, audio_duration = _transcribe_mlx_file(
+                        file, mlx_model_id, language, timestamps, verbose,
+                        vad_filter=vad_filter, word_timestamps=word_timestamps,
+                        translate=translate
+                    )
+                    os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
+                    segment_list = []
+                    txt_path = "{}/transcriptions/{}.txt".format(path, title)
+                    srt_path = "{}/transcriptions/{}.srt".format(path, title)
+                    with open(txt_path, 'w', encoding='utf-8') as f, \
+                         open(srt_path, 'w', encoding='utf-8') as srt_f:
+                        f.write(title)
+                        f.write('\n' + '─' * 40 + '\n')
+                        for idx, seg in enumerate(segments, start=1):
+                            if stop_event and stop_event.is_set():
+                                break
+                            text = seg["text"].strip()
+                            if timestamps:
+                                start_ts = str(datetime.timedelta(seconds=seg["start"]))
+                                end_ts   = str(datetime.timedelta(seconds=seg["end"]))
+                                f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
+                            else:
+                                f.write('\n{}'.format(text))
+                            srt_f.write(f'{idx}\n{_srt_timestamp(seg["start"])} --> {_srt_timestamp(seg["end"])}\n{text}\n\n')
+                            f.flush()
+                            srt_f.flush()
+                            if verbose:
+                                print("   [%.2fs → %.2fs] %s" % (seg["start"], seg["end"], seg["text"]))
+                            else:
+                                print("   Transcribed up to %.0fs..." % seg["end"], end='\r')
+                            segment_list.append(seg)
+                    elapsed = time.time() - t_start
+                    elapsed_min = elapsed / 60.0
+                    audio_min   = audio_duration / 60.0
+                    ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
+                    print(f"✅ Done — saved to transcriptions/{title}.txt")
+                    print(f"⏱  Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min  ({ratio:.1f}x realtime)")
+                    files_transcripted.append(segment_list)
+                except Exception as exc:
+                    print(f"⚠  Could not decode '{os.path.basename(file)}', skipping.")
+                    print(f"   Reason: {exc}")
+
+            print(f"\n{SEP}")
+            if files_transcripted:
+                output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n   Saved in: {path}/transcriptions"
+            else:
+                output_text = '⚠  No files eligible for transcription — try another folder.'
+            print(output_text)
+            print(SEP)
+            return output_text
+
+    # ── Step 2: Load model (faster-whisper / CTranslate2) ───────────
+    print(f"⏳ Loading model '{model}' — downloading if needed...")
+    try:
+        whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
+    except Exception as exc:
+        err = str(exc).lower()
+        cuda_runtime_missing = (
+            device == "cuda"
+            and (
+                "libcublas" in err
+                or "libcudnn" in err
+                or "cuda" in err
+                or "cannot be loaded" in err
+                or "not found" in err
+            )
+        )
+        if not cuda_runtime_missing:
+            raise
+        print("⚠  CUDA runtime not available; falling back to CPU (int8).")
+        print(f"   Reason: {exc}")
+        device, compute_type = "cpu", "int8"
+        whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
+    print("✅ Model ready!")
+    print(SEP)
+
+    # ── Step 3: Transcribe files ─────────────────────────────────────
+    total_files = len(glob_file)
+    print(f"📂 Found {total_files} supported media file(s) in folder")
+    print(SEP)
+
+    if total_files == 0:
+        output_text = '⚠  No supported media files found — try another folder.'
+        print(output_text)
+        print(SEP)
+        return output_text
+
+    files_transcripted = []
+    file_num = 0
    for file in glob_file:
+        if stop_event and stop_event.is_set():
+            print("⛔  Transcription stopped by user.")
+            break
        title = os.path.basename(file).split('.')[0]
-        print(Back.CYAN + '\nTrying to transcribe file named: {}\U0001f550'.format(title))
+        file_num += 1
+        print(f"\n{'─' * 46}")
+        print(f"📄 File {file_num}/{total_files}: {title}")
        try:
-            result = model.transcribe(
+            t_start = time.time()
+            segments, info = whisper_model.transcribe(
                file,
                language=language,
-                verbose=verbose
-                )
-            files_transcripted.append(result)
+                beam_size=5,
+                task="translate" if translate else "transcribe",
+                vad_filter=vad_filter,
+                word_timestamps=word_timestamps,
+            )
+            audio_duration = info.duration  # seconds
            # Make folder if missing
-            try:
-                os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
-            except FileExistsError:
-                pass
-            # Create segments for text files
-            start = []
-            end = []
-            text = []
-            for segment in result['segments']:
-                start.append(str(datetime.timedelta(seconds=segment['start'])))
-                end.append(str(datetime.timedelta(seconds=segment['end'])))
-                text.append(segment['text'])
-            # Save files to transcriptions folder
-            with open("{}/transcriptions/{}.txt".format(path, title), 'w', encoding='utf-8') as file:
-                file.write(title)
-                for i in range(len(result['segments'])):
-                    file.write('\n[{} --> {}]:{}'.format(start[i], end[i], text[i]))
-        # Skip invalid files
-        except RuntimeError:
-            print(Fore.RED + 'Not a valid file, skipping.')
-            pass
-     # Check if any files were processed.
+            os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
+            # Stream segments as they are decoded
+            segment_list = []
+            txt_path = "{}/transcriptions/{}.txt".format(path, title)
+            srt_path = "{}/transcriptions/{}.srt".format(path, title)
+            with open(txt_path, 'w', encoding='utf-8') as f, \
+                 open(srt_path, 'w', encoding='utf-8') as srt_f:
+                f.write(title)
+                f.write('\n' + '─' * 40 + '\n')
+                for idx, seg in enumerate(segments, start=1):
+                    if stop_event and stop_event.is_set():
+                        break
+                    text = seg.text.strip()
+                    if timestamps:
+                        start_ts = str(datetime.timedelta(seconds=seg.start))
+                        end_ts = str(datetime.timedelta(seconds=seg.end))
+                        f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
+                    else:
+                        f.write('\n{}'.format(text))
+                    # Use word-level timestamps for SRT if available
+                    if word_timestamps and hasattr(seg, 'words') and seg.words:
+                        for w_idx, word in enumerate(seg.words, start=1):
+                            w_text = word.word.strip()
+                            if not w_text:
+                                continue
+                            w_start = _srt_timestamp(word.start)
+                            w_end = _srt_timestamp(word.end)
+                            srt_f.write(f'{idx}.{w_idx}\n{w_start} --> {w_end}\n{w_text}\n\n')
+                    else:
+                        srt_f.write(f'{idx}\n{_srt_timestamp(seg.start)} --> {_srt_timestamp(seg.end)}\n{text}\n\n')
+                    f.flush()
+                    srt_f.flush()
+                    if verbose:
+                        print("   [%.2fs → %.2fs] %s" % (seg.start, seg.end, seg.text))
+                    else:
+                        print("   Transcribed up to %.0fs..." % seg.end, end='\r')
+                    segment_list.append(seg)
+            elapsed = time.time() - t_start
+            elapsed_min = elapsed / 60.0
+            audio_min = audio_duration / 60.0
+            ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
+            print(f"✅ Done — saved to transcriptions/{title}.txt")
+            print(f"⏱  Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min  ({ratio:.1f}x realtime)")
+            files_transcripted.append(segment_list)
+        except Exception as exc:
+            print(f"⚠  Could not decode '{os.path.basename(file)}', skipping.")
+            print(f"   Reason: {exc}")
+
+    # ── Summary ──────────────────────────────────────────────────────
+    print(f"\n{SEP}")
    if len(files_transcripted) > 0:
-        output_text = 'Finished transcription, {} files can be found in {}/transcriptions'.format(len(files_transcripted), path)
+        output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n   Saved in: {path}/transcriptions"
    else:
-        output_text = 'No files elligible for transcription, try adding audio or video files to this folder or choose another folder!'
-    # Return output text
+        output_text = '⚠  No files eligible for transcription — try another folder.'
+    print(output_text)
+    print(SEP)
    return output_text
+
+
+def _transcribe_worker_process(conn, path, glob_file, model, language, verbose, timestamps, vad_filter=False, word_timestamps=False, translate=False):
+    """Child-process entry point for the UI's multiprocessing backend.
+
+    Redirects stdout/stderr → pipe connection so the main process can display
+    output in the console panel.  The main process sends SIGTERM/SIGKILL to
+    stop this process immediately, including any in-progress download or inference.
+    """
+    import sys
+
+    class _PipeWriter:
+        def __init__(self, c):
+            self.c = c
+
+        def write(self, text):
+            if text:
+                try:
+                    self.c.send(text)
+                except Exception:
+                    pass
+
+        def flush(self):
+            pass
+
+    writer = _PipeWriter(conn)
+    sys.stdout = writer
+    sys.stderr = writer
+
+    result = '⚠  No output produced.'
+    try:
+        result = transcribe(path, glob_file, model, language, verbose, timestamps,
+                           vad_filter=vad_filter, word_timestamps=word_timestamps,
+                           translate=translate)
+    except Exception as exc:
+        result = f'⚠  Unexpected error: {exc}'
+    finally:
+        try:
+            conn.send(('__done__', result))
+        except Exception:
+            pass
+        conn.close()
Author	SHA1	Message	Date
kobim	e2e19940dd	feat: update README to reflect Apple Silicon GPU support and new features in version 3.0	2026-04-11 14:16:07 +02:00
kobim	0293a13177	feat: add advanced transcription options for VAD, word-level timestamps, and translation	2026-04-11 14:06:04 +02:00
kobim	8d5c8d6097	feat: implement multiprocessing for transcription with immediate cancellation	2026-04-05 22:11:13 +02:00
kobim	e29572420e	feat: enhance transcription capabilities with MLX support and backend detection	2026-04-04 00:32:36 +02:00
Kristofer Söderström	f7d621e510	Add timestamps toggle and update transcription format to include/exclude timestamps	2026-03-20 20:19:46 +01:00
Kristofer Rolf Söderström	2a1df6aeba	Update Python installation instructions in README Clarified installation instructions for Python 3.10 or later, specifying preferred installation method.	2026-03-03 08:35:03 +01:00
soderstromkr	58255c3d10	fix: Linux/Ubuntu support — icon fallback, HiDPI scaling, CUDA lib paths, per-file timing - app.py: graceful icon loading (no crash on Linux Tk without .ico support) - app.py: auto-detect display scaling for 4K/HiDPI screens - _LocalTranscribe.py: register NVIDIA pip-package .so paths on Linux (LD_LIBRARY_PATH) so faster-whisper finds libcublas/libcudnn at runtime - _LocalTranscribe.py: auto-fallback to CPU if CUDA runtime libs missing - _LocalTranscribe.py: filter input to supported media extensions only - _LocalTranscribe.py: show real decode errors instead of generic skip message - _LocalTranscribe.py: per-file timer showing wall-clock vs audio duration	2026-03-02 21:49:32 +01:00
Kristofer Söderström	ea43074852	Update README.md: Add manual installation instructions for troubleshooting launcher issues	2026-03-02 17:17:35 +01:00
Kristofer Söderström	7b81778d9e	Update README.md: Simplify installation instructions and clarify auto-installation process	2026-03-02 17:16:09 +01:00
Kristofer Söderström	e65462f57b	Update README.md: Add link to classic release in Mac user note	2026-03-02 17:13:05 +01:00
Kristofer Söderström	09e3e43c51	Update README.md: Reorder features for clarity and emphasize integrated console	2026-03-02 17:11:13 +01:00
Kristofer Söderström	d4c26f6c37	Update README.md: Rearrange new features for clarity and highlight Swedish-optimised models	2026-03-02 17:10:15 +01:00
Kristofer Söderström	acb6947f87	Update README.md: Revise installation instructions and clarify platform-specific run commands	2026-03-02 17:04:59 +01:00
Kristofer Söderström	f8cf42733d	Revamp: embedded console, faster-whisper, simplified install	2026-03-02 17:02:16 +01:00
Kristofer Rolf Söderström	7d3fe1ba26	Merge pull request #11 from soderstromkr/copilot/update-whisper-device-parameter Pass explicit device parameter to whisper.load_model() for MPS acceleration	2026-01-22 14:03:13 +01:00
copilot-swe-agent[bot]	da42a6e4cc	Add .gitignore and remove __pycache__ files Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>	2026-01-22 13:00:38 +00:00
copilot-swe-agent[bot]	0dab0d9bea	Add explicit device parameter to whisper.load_model() Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>	2026-01-22 13:00:21 +00:00
copilot-swe-agent[bot]	953c71ab28	Initial plan	2026-01-22 12:57:09 +00:00
Kristofer Rolf Söderström	5522bdd575	Merge pull request #6 Merged pull request #6	2026-01-22 13:53:23 +01:00
Kristofer Rolf Söderström	861c470330	Merge pull request #10 from soderstromkr/copilot/add-readme-gpu-support Add GPU support documentation to README	2026-01-22 13:44:11 +01:00
copilot-swe-agent[bot]	6de6d4b2ff	Add GPU support section to README with CUDA PyTorch installation instructions Co-authored-by: soderstromkr <23003509+soderstromkr@users.noreply.github.com>	2026-01-22 12:42:09 +00:00
copilot-swe-agent[bot]	01552cc7cb	Initial plan	2026-01-22 12:40:19 +00:00
Yaroslav P	049a168c81	amd graphic card support	2025-03-05 16:23:10 +02:00
Kristofer Rolf Söderström	56a925463f	Update README.md	2024-05-17 08:51:16 +02:00
Kristofer Rolf Söderström	fe60b04020	Update README.md	2024-05-17 08:49:28 +02:00
Kristofer Rolf Söderström	ff06a257f2	Update README.md	2024-05-17 08:47:57 +02:00
Kristofer Rolf Söderström	5e31129ea2	Create requirements.txt	2024-05-17 08:44:39 +02:00
Kristofer Rolf Söderström	3f0bca02b7	Update README.md	2024-05-17 08:44:09 +02:00
Kristofer Rolf Söderström	488e78a5ae	Update README.md	2024-05-17 08:42:42 +02:00
Kristofer Rolf Söderström	829a054300	Update README.md	2024-05-17 08:40:42 +02:00
Kristofer Rolf Söderström	462aae12ca	Update README.md	2024-05-17 08:09:30 +02:00
Kristofer Rolf Söderström	fec9190ba1	Update README.md	2024-05-17 08:08:51 +02:00
Kristofer Rolf Söderström	0dde25204d	Update README.md removed other installation options from readme	2024-05-17 08:07:00 +02:00
Kristofer Söderström	b611aa6b8c	removed messagebox	2023-11-06 10:13:04 +01:00
Kristofer Söderström	7d50d5f4cf	QOL improvements	2023-11-06 09:57:44 +01:00
Kristofer Söderström	7799d03960	bug fixes	2023-11-06 09:31:53 +01:00