feat: update README to reflect Apple Silicon GPU support and new features in version 3.0

feat: add advanced transcription options for VAD, word-level timestamps, and translation
feat: implement multiprocessing for transcription with immediate cancellation
2026-04-11 14:16:07 +02:00 · 2026-04-11 14:06:04 +02:00 · 2026-04-05 22:11:13 +02:00 · 2026-04-04 00:32:36 +02:00 · 2026-03-20 20:19:46 +01:00 · 2026-03-03 08:35:03 +01:00
6 changed files with 721 additions and 145 deletions
@@ -1,64 +1,85 @@
-## Local Transcribe with Whisper 
+## Local Transcribe with Whisper
-> **⚠ Note for Mac users (Apple Silicon):** This version uses `faster-whisper` (CTranslate2), which does **not** support Apple M-chip GPU acceleration. Transcription will run on CPU, which is slower than OpenAI's Whisper with Metal/CoreML support. The trade-off is a much simpler installation — no conda, no PyTorch, no admin rights. If you'd prefer M-chip GPU acceleration and don't mind a more involved setup, switch to the **classic** release:
+> **🍎 Apple Silicon GPU/NPU acceleration:** This version now supports native Apple GPU/NPU acceleration via [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper). On Apple Silicon Macs, transcription runs on the Apple GPU and Neural Engine — no CPU fallback needed.
 > ```
 > git checkout classic
 > ```
-Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2). This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
+Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system, powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2) on Windows/Linux and [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
 ## New in version 3.0!
 1. **Apple Silicon GPU/NPU support** — native MLX backend for Apple Silicon Macs, using Apple GPU + Neural Engine.
 2. **SRT subtitle export** — valid SubRip files alongside the existing TXT output, ready for HandBrake or any video player.
 3. **VAD filter** — removes silence, reduces hallucination, improves accuracy.
 4. **Word-level timestamps** — per-word SRT timing for precise subtitle burning.
 5. **Translation mode** — transcribe any language and translate to English in one step.
 6. **Stop button** — immediately cancel any transcription, including model downloads.
 7. **Language dropdown** — 99 languages with proper ISO codes, no more guessing formats.
 8. **Model descriptions** — speed, size, quality stars, and use case shown for every model.
 ## New in version 2.0!
-1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage.
+1. **Switched to faster-whisper** — up to 4× faster transcription with lower memory usage and a simpler installation.
-2. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
+2. **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab).
-3. **No admin rights required** — a plain `pip install` covers everything.
+3. **No separate FFmpeg installation needed** — audio decoding is handled by the bundled PyAV library.
-4. **No PyTorch dependency** — dramatically smaller install footprint.
+4. **No admin rights required** — a plain `pip install` covers everything.
-5. **`tiny` model added** — smallest and fastest option for quick drafts.
+5. **No PyTorch dependency** — dramatically smaller install footprint.
 6. **Integrated console** - all info in the same application.
 7. **`tiny` model added** — smallest and fastest option.
 ## Features
-* Select the folder containing the audio or video files you want to transcribe. Tested with m4a video. 
+* Select the folder containing the audio or video files you want to transcribe. Tested with m4a video.
-* Choose the language of the files you are transcribing. You can either select a specific language or let the application automatically detect the language.
+* Choose the language of the files you are transcribing from a dropdown of 99 supported languages, or let the application automatically detect the language.
 * Select the Whisper model to use for the transcription. Available models include "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v2", and "large-v3". Models ending in `.en` are English-only and work better when you're transcribing English, especially at the base and small sizes.
 * **Swedish-optimised models** — [KB-Whisper](https://huggingface.co/collections/KBLab/kb-whisper) from the National Library of Sweden (KBLab) is available in all sizes (tiny → large). These models reduce Word Error Rate by up to 47 % compared to OpenAI Whisper on Swedish speech. The language is set to Swedish automatically when a KB model is selected.
-* Enable the verbose mode to receive detailed information during the transcription process.
+* **VAD filter** — removes silence from audio before transcription, reducing hallucination and improving accuracy.
-* Monitor the progress of the transcription with the progress bar and terminal. 
+* **Word-level timestamps** — generates per-word timing in the SRT output for precise subtitle synchronization.
 * **Translation mode** — transcribes audio in any language and translates the result to English.
 * **SRT export** — valid SubRip subtitle files saved alongside TXT, ready for HandBrake or any video player.
 * Monitor the progress of the transcription with the progress bar and terminal.
 * Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
 * View the transcribed text in a message box once the transcription is completed.
 * **Stop button** — immediately cancel transcription, including model downloads.
 ## Installation
 ### Get the files
-Download the zip folder and extract it to your preferred working folder.  
+Download the zip folder and extract it to your preferred working folder.
-![](images/Picture1.png)  
+![](images/Picture1.png)
 Or by cloning the repository with:
 ```
-git clone https://github.com/soderstromkr/transcribe.git
+git clone https://gitea.kobim.cloud/kobim/whisper-local-transcribe.git
 ```
-### Python Version **(any platform including Mac users)**
+### Prerequisites
-1. Install Python 3.10 or later. You can download it from [python.org](https://www.python.org/downloads/). During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
+Install **Python 3.10 or later**. Some IT policies allow installing from the Microsoft Store (or the Mac equivalent), but an install from [python.org](https://www.python.org/downloads/) is preferred. During installation, **check "Add Python to PATH"**. No administrator rights are needed if you install for your user only.
-2. Run the installer. Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux) in the project folder and run:
+### Run on Windows
 Double-click `run_Windows.bat` — it will auto-install everything on first run.
 ### Run on Mac / Linux
 Run `./run_Mac.sh` — it will auto-install everything on first run. See [Mac instructions](Mac_instructions.md) for details.
 > **Note:** The first run with a given model will download it (~145 MB for base, ~1.5 GB for medium). After that, everything works offline.
 ### Manual installation (if the launchers don't work)
 If `run_Windows.bat` or `run_Mac.sh` fails (e.g. Python isn't on PATH, or permissions issues), open a terminal in the project folder and run these steps manually:
 ```
 python -m venv .venv
 ```
 Activate the virtual environment:
 - **Windows:** `.venv\Scripts\activate`
 - **Mac / Linux:** `source .venv/bin/activate`
 Then install and run:
 ```
 python install.py
 ```
 This will:
 - Install all required packages (including bundled FFmpeg — no separate install needed)
 - **Auto-detect your NVIDIA GPU** and ask if you want GPU acceleration
 - No conda, no admin rights required
 Alternatively, you can install manually with `pip install -r requirements.txt`.
 3. Run the app: 
    1. For **Windows**: double-click `run_Windows.bat` (it will auto-install on first run) or run:
 ```
 python app.py
 ```
    2. For **Mac / Linux**: run `./run_Mac.sh` (auto-installs on first run). See [Mac instructions](Mac_instructions.md) for details.
    **Note** The first run with a given model will download it (~75 MB for base, ~500 MB for medium). After that, everything works offline.
 ## GPU Support
 ### Apple Silicon
 On Macs with Apple Silicon, the app automatically uses the **MLX backend**, which runs inference on the Apple GPU and Neural Engine. No additional setup is needed — just install and run. MLX models are downloaded from HuggingFace on first use.
 ### NVIDIA GPUs
 This program **does support running on NVIDIA GPUs**, which can significantly speed up transcription times. faster-whisper uses CTranslate2, which requires NVIDIA CUDA libraries for GPU acceleration.
-### Automatic Detection
+#### Automatic Detection
 The `install.py` script **automatically detects NVIDIA GPUs** and will ask if you want to install GPU support. If you skipped it during installation, you can add it anytime:
 ```
 pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
@@ -66,7 +87,7 @@ pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
 **Note:** Make sure your NVIDIA GPU drivers are up to date. You can check by running `nvidia-smi` in your terminal. The program will automatically detect and use your GPU if available, otherwise it falls back to CPU.
-### Verifying GPU Support
+#### Verifying GPU Support
 After installation, you can verify that your GPU is available by running:
 ```python
 import ctranslate2
@@ -75,14 +96,16 @@ print(ctranslate2.get_supported_compute_types("cuda"))
 If this returns a list containing `"float16"`, GPU acceleration is working.
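The automatic GPU detection and CPU fallback described above could look roughly like the sketch below. It is an illustration only and not the app's actual `detect_backend`: `pick_device` is a hypothetical helper, and the `float32` fallback on CUDA is an assumption. If `ctranslate2` is not installed, or the CUDA query fails, the sketch simply reports CPU with `int8`.

```python
def pick_device():
    """Return (device, compute_type) for CTranslate2-based inference.

    Hypothetical helper for illustration; the real app may decide differently.
    """
    try:
        import ctranslate2

        # Number of CUDA devices visible to CTranslate2 (0 when there is none).
        if ctranslate2.get_cuda_device_count() > 0:
            supported = ctranslate2.get_supported_compute_types("cuda")
            # Prefer float16 when the GPU supports it (assumed fallback: float32).
            compute_type = "float16" if "float16" in supported else "float32"
            return "cuda", compute_type
    except Exception:
        # ctranslate2 missing or CUDA libraries unusable: fall through to CPU.
        pass
    return "cpu", "int8"


device, compute_type = pick_device()
print(f"Backend: {device} ({compute_type})")
```

Catching a broad `Exception` keeps the check best-effort, so a broken CUDA install degrades to CPU instead of crashing at startup.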
 ## Usage
-1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates.
+1. Launch the app — the built-in console panel at the bottom shows a welcome message and all progress updates. The backend indicator at the bottom shows which inference engine is active (MLX · Apple GPU/NPU, CUDA · GPU, or CPU · int8).
 2. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This will open a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
-3. Enter the desired language for the transcription in the "Language" field. You can either select a language or leave it blank to enable automatic language detection.
+3. Select the language from the dropdown — 99 languages are available, or leave it on "Auto-detect". For English-only models (.en) the language is locked to English; for KB Swedish models it's locked to Swedish.
-4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label.
+4. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label. A description below shows speed, size, quality stars, and recommended use case for each model.
-5. Click the "Transcribe" button to start the transcription. The button will be disabled during the process to prevent multiple transcriptions at once.
+5. Toggle advanced options if needed: **VAD filter**, **Word-level timestamps**, or **Translate to English**.
-6. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
+6. Click the "Transcribe" button to start the transcription. Use the "Stop" button to cancel at any time.
-7. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
+7. Monitor progress in the embedded console panel — it shows model loading, per-file progress, and segment timestamps in real time.
-8. You can run the application again or quit at any time by clicking the "Quit" button.
+8. Once the transcription is completed, a message box will appear displaying the result. Click "OK" to close it.
 9. Transcriptions are saved as both `.txt` (human-readable) and `.srt` (SubRip subtitles) in the `transcriptions/` folder within the selected directory.
 10. You can run the application again or quit at any time by clicking the "Quit" button.
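For readers unfamiliar with the `.srt` output mentioned in step 9, here is a minimal sketch of how SubRip cues are laid out. The app's own SRT writer is not shown in this diff, so `srt_timestamp`, `to_srt`, and the `(start, end, text)` segment shape are assumptions made for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each cue is: a 1-based index, a timing line, the text, then a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)


print(to_srt([(0.0, 7.0, "That's one small step for man, one giant leap for mankind.")]))
```

With word-level timestamps enabled, the same layout would presumably apply with one cue per word (or small group of words) instead of one per segment.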
 ## Jupyter Notebook
 Don't want fancy EXEs or GUIs? Use the function as is. See [example](example.ipynb) for an implementation in a Jupyter Notebook.
@@ -4,7 +4,8 @@ import tkinter as tk
 from tkinter import ttk
 from tkinter import filedialog
 from tkinter import messagebox
-from src._LocalTranscribe import transcribe, get_path
+from src._LocalTranscribe import transcribe, get_path, detect_backend, _transcribe_worker_process
 import multiprocessing as mp
 import customtkinter
 import threading
@@ -46,11 +47,121 @@ HF_MODEL_MAP = {
    'KB Swedish (large)':  'KBLab/kb-whisper-large',
 }
 # Per-model info shown in the UI description label
 # (speed, size, quality stars, suggested use)
 MODEL_INFO = {
    'tiny':                 ('Very fast',   '~75 MB',   '★★☆☆☆', 'Quick drafts & testing'),
    'tiny.en':              ('Very fast',   '~75 MB',   '★★☆☆☆', 'Quick drafts & testing (English only)'),
    'base':                 ('Fast',        '~145 MB',  '★★★☆☆', 'Notes & short podcasts'),
    'base.en':              ('Fast',        '~145 MB',  '★★★☆☆', 'Notes & short podcasts (English only)'),
    'small':                ('Balanced',    '~485 MB',  '★★★★☆', 'Everyday use'),
    'small.en':             ('Balanced',    '~485 MB',  '★★★★☆', 'Everyday use (English only)'),
    'medium':               ('Accurate',    '~1.5 GB',  '★★★★☆', 'Professional content'),
    'medium.en':            ('Accurate',    '~1.5 GB',  '★★★★☆', 'Professional content (English only)'),
    'large-v2':             ('Slow',        '~3 GB',    '★★★★★', 'Maximum accuracy'),
    'large-v3':             ('Slow',        '~3 GB',    '★★★★★', 'Maximum accuracy (recommended)'),
    'KB Swedish (tiny)':    ('Very fast',   '~75 MB',   '★★★☆☆', 'Swedish — optimised by KBLab'),
    'KB Swedish (base)':    ('Fast',        '~145 MB',  '★★★☆☆', 'Swedish — optimised by KBLab'),
    'KB Swedish (small)':   ('Balanced',    '~485 MB',  '★★★★☆', 'Swedish — optimised by KBLab'),
    'KB Swedish (medium)':  ('Accurate',    '~1.5 GB',  '★★★★☆', 'Swedish — optimised by KBLab'),
    'KB Swedish (large)':   ('Slow',        '~3 GB',    '★★★★★', 'Swedish — KBLab, best accuracy'),
 }
 customtkinter.set_appearance_mode("System")
 customtkinter.set_default_color_theme("blue")  # Themes: blue (default), dark-blue, green
-firstclick = True
+
 # All languages supported by Whisper  (display label → ISO code; None = auto-detect)
 WHISPER_LANGUAGES = {
    'Auto-detect':          None,
    'Afrikaans (af)':       'af',   'Albanian (sq)':        'sq',
    'Amharic (am)':         'am',   'Arabic (ar)':          'ar',
    'Armenian (hy)':        'hy',   'Assamese (as)':        'as',
    'Azerbaijani (az)':     'az',   'Bashkir (ba)':         'ba',
    'Basque (eu)':          'eu',   'Belarusian (be)':      'be',
    'Bengali (bn)':         'bn',   'Bosnian (bs)':         'bs',
    'Breton (br)':          'br',   'Bulgarian (bg)':       'bg',
    'Catalan (ca)':         'ca',   'Chinese (zh)':         'zh',
    'Croatian (hr)':        'hr',   'Czech (cs)':           'cs',
    'Danish (da)':          'da',   'Dutch (nl)':           'nl',
    'English (en)':         'en',   'Estonian (et)':        'et',
    'Faroese (fo)':         'fo',   'Finnish (fi)':         'fi',
    'French (fr)':          'fr',   'Galician (gl)':        'gl',
    'Georgian (ka)':        'ka',   'German (de)':          'de',
    'Greek (el)':           'el',   'Gujarati (gu)':        'gu',
    'Haitian Creole (ht)':  'ht',   'Hausa (ha)':           'ha',
    'Hawaiian (haw)':       'haw',  'Hebrew (he)':          'he',
    'Hindi (hi)':           'hi',   'Hungarian (hu)':       'hu',
    'Icelandic (is)':       'is',   'Indonesian (id)':      'id',
    'Italian (it)':         'it',   'Japanese (ja)':        'ja',
    'Javanese (jw)':        'jw',   'Kannada (kn)':         'kn',
    'Kazakh (kk)':          'kk',   'Khmer (km)':           'km',
    'Korean (ko)':          'ko',   'Lao (lo)':             'lo',
    'Latin (la)':           'la',   'Latvian (lv)':         'lv',
    'Lingala (ln)':         'ln',   'Lithuanian (lt)':      'lt',
    'Luxembourgish (lb)':   'lb',   'Macedonian (mk)':      'mk',
    'Malagasy (mg)':        'mg',   'Malay (ms)':           'ms',
    'Malayalam (ml)':       'ml',   'Maltese (mt)':         'mt',
    'Maori (mi)':           'mi',   'Marathi (mr)':         'mr',
    'Mongolian (mn)':       'mn',   'Myanmar (my)':         'my',
    'Nepali (ne)':          'ne',   'Norwegian (no)':       'no',
    'Occitan (oc)':         'oc',   'Pashto (ps)':          'ps',
    'Persian (fa)':         'fa',   'Polish (pl)':          'pl',
    'Portuguese (pt)':      'pt',   'Punjabi (pa)':         'pa',
    'Romanian (ro)':        'ro',   'Russian (ru)':         'ru',
    'Sanskrit (sa)':        'sa',   'Serbian (sr)':         'sr',
    'Shona (sn)':           'sn',   'Sindhi (sd)':          'sd',
    'Sinhala (si)':         'si',   'Slovak (sk)':          'sk',
    'Slovenian (sl)':       'sl',   'Somali (so)':          'so',
    'Spanish (es)':         'es',   'Sundanese (su)':       'su',
    'Swahili (sw)':         'sw',   'Swedish (sv)':         'sv',
    'Tagalog (tl)':         'tl',   'Tajik (tg)':           'tg',
    'Tamil (ta)':           'ta',   'Tatar (tt)':           'tt',
    'Telugu (te)':          'te',   'Thai (th)':            'th',
    'Tibetan (bo)':         'bo',   'Turkish (tr)':         'tr',
    'Turkmen (tk)':         'tk',   'Ukrainian (uk)':       'uk',
    'Urdu (ur)':            'ur',   'Uzbek (uz)':           'uz',
    'Vietnamese (vi)':      'vi',   'Welsh (cy)':           'cy',
    'Yiddish (yi)':         'yi',   'Yoruba (yo)':          'yo',
 }
 def _language_options_for_model(model_name):
    """Return (values, default, state) for the language combobox given a model name."""
    if model_name.endswith('.en'):
        return ['English (en)'], 'English (en)', 'disabled'
    if model_name.startswith('KB Swedish'):
        return ['Swedish (sv)'], 'Swedish (sv)', 'disabled'
    return list(WHISPER_LANGUAGES.keys()), 'Auto-detect', 'readonly'
 def _set_app_icon(root):
    """Set app icon when supported, without crashing on unsupported platforms."""
    base_dir = os.path.dirname(os.path.abspath(__file__))
    icon_path = os.path.join(base_dir, "images", "icon.ico")
    if not os.path.exists(icon_path):
        return
    try:
        root.iconbitmap(icon_path)
    except tk.TclError:
        # Some Linux Tk builds don't accept .ico for iconbitmap.
        pass
 def _apply_display_scaling(root):
    """Auto-scale UI for high-resolution displays (e.g., 4K)."""
    try:
        screen_w = root.winfo_screenwidth()
        screen_h = root.winfo_screenheight()
        scale = min(screen_w / 1920.0, screen_h / 1080.0)
        scale = max(1.0, min(scale, 2.0))
        customtkinter.set_widget_scaling(scale)
        customtkinter.set_window_scaling(scale)
    except Exception:
        pass
 class App:
    def __init__(self, master):
@@ -66,22 +177,16 @@ class App:
        self.path_entry.insert(0, os.path.join(os.getcwd(), 'sample_audio'))
        self.path_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
        customtkinter.CTkButton(path_frame, text="Browse", command=self.browse, font=font).pack(side=tk.LEFT, padx=5)
-        # Language frame        
+        # Language frame
        #thanks to pommicket from Stackoverflow for this fix
        def on_entry_click(event):
            """function that gets called whenever entry is clicked"""        
            global firstclick
            if firstclick: # if this is the first time they clicked it
                firstclick = False
                self.language_entry.delete(0, "end") # delete all the text in the entry
        language_frame = customtkinter.CTkFrame(master)
        language_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        customtkinter.CTkLabel(language_frame, text="Language:", font=font).pack(side=tk.LEFT, padx=5)
-        self.language_entry = customtkinter.CTkEntry(language_frame, width=50, font=('Roboto', 12, 'italic'))
+        _lang_values, _lang_default, _lang_state = _language_options_for_model('medium')
-        self.default_language_text = "Enter language (or ignore to auto-detect)"
+        self.language_combobox = customtkinter.CTkComboBox(
-        self.language_entry.insert(0, self.default_language_text)
+            language_frame, width=50, state=_lang_state,
-        self.language_entry.bind('<FocusIn>', on_entry_click)
+            values=_lang_values, font=font_b)
-        self.language_entry.pack(side=tk.LEFT, fill=tk.X, expand=True)
+        self.language_combobox.set(_lang_default)
        self.language_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
        # Model frame
        models = ['tiny', 'tiny.en', 'base', 'base.en',
                  'small', 'small.en', 'medium', 'medium.en',
@@ -96,16 +201,54 @@ class App:
        # ComboBox frame
        self.model_combobox = customtkinter.CTkComboBox(
            model_frame, width=50, state="readonly",
-            values=models, font=font_b)
+            values=models, font=font_b,
            command=self._on_model_change)
        self.model_combobox.set('medium')  # Set the default value
        self.model_combobox.pack(side=tk.LEFT, fill=tk.X, expand=True)
        # Model description label
        self.model_desc_label = customtkinter.CTkLabel(
            master, text=self._model_desc_text('medium'),
            font=('Roboto', 11), text_color=('#555555', '#aaaaaa'),
            anchor='w')
        self.model_desc_label.pack(fill=tk.X, padx=14, pady=(0, 4))
        # Timestamps toggle
        ts_frame = customtkinter.CTkFrame(master)
        ts_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.timestamps_var = tk.BooleanVar(value=True)
        self.timestamps_switch = customtkinter.CTkSwitch(
            ts_frame, text="Include timestamps in transcription",
            variable=self.timestamps_var, font=font_b)
        self.timestamps_switch.pack(side=tk.LEFT, padx=5)
        # Advanced options frame
        adv_frame = customtkinter.CTkFrame(master)
        adv_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.vad_var = tk.BooleanVar(value=False)
        customtkinter.CTkSwitch(
            adv_frame, text="VAD filter (remove silence)",
            variable=self.vad_var, font=font_b).pack(side=tk.LEFT, padx=5)
        self.word_ts_var = tk.BooleanVar(value=False)
        customtkinter.CTkSwitch(
            adv_frame, text="Word-level timestamps",
            variable=self.word_ts_var, font=font_b).pack(side=tk.LEFT, padx=5)
        self.translate_var = tk.BooleanVar(value=False)
        customtkinter.CTkSwitch(
            adv_frame, text="Translate to English",
            variable=self.translate_var, font=font_b).pack(side=tk.LEFT, padx=5)
        # Progress Bar
        self.progress_bar = ttk.Progressbar(master, length=200, mode='indeterminate')
        # Worker process handle (replaces thread+stop_event for true immediate cancellation)
        self._proc = None
        self._parent_conn = None
        self._child_conn = None
        # Button actions frame
        button_frame = customtkinter.CTkFrame(master)
        button_frame.pack(fill=tk.BOTH, padx=10, pady=10)
        self.transcribe_button = customtkinter.CTkButton(button_frame, text="Transcribe", command=self.start_transcription, font=font)
        self.transcribe_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
        self.stop_button = customtkinter.CTkButton(
            button_frame, text="Stop", command=self._stop_transcription, font=font,
            fg_color="#c0392b", hover_color="#922b21", state=tk.DISABLED)
        self.stop_button.pack(side=tk.LEFT, padx=5, pady=10, fill=tk.X, expand=True)
        customtkinter.CTkButton(button_frame, text="Quit", command=master.quit, font=font).pack(side=tk.RIGHT, padx=5, pady=10, fill=tk.X, expand=True)
        # ── Embedded console / log panel ──────────────────────────────────
@@ -120,11 +263,59 @@ class App:
        sys.stdout = _ConsoleRedirector(self.log_box)
        sys.stderr = _ConsoleRedirector(self.log_box)
        # Backend indicator
        _bi = detect_backend()
        backend_label = customtkinter.CTkLabel(
            master,
            text=f"Backend: {_bi['label']}",
            font=('Roboto', 11),
            text_color=("#555555", "#aaaaaa"),
            anchor='e',
        )
        backend_label.pack(fill=tk.X, padx=12, pady=(0, 2))
        # Welcome message (shown after redirect so it appears in the panel)
        print("Welcome to Local Transcribe with Whisper! \U0001f600")
        print("Transcriptions will be saved automatically.")
        print("─" * 46)
    # Helper functions
    def _stop_transcription(self):
        self.stop_button.configure(state=tk.DISABLED)
        if self._proc and self._proc.is_alive():
            self._proc.terminate()
            try:
                self._proc.join(timeout=3)
            except Exception:
                pass
            if self._proc.is_alive():
                self._proc.kill()
                try:
                    self._proc.join(timeout=1)
                except Exception:
                    pass
        # Close pipe ends — no semaphores, so no leak
        for conn in (self._parent_conn, self._child_conn):
            try:
                if conn:
                    conn.close()
            except Exception:
                pass
        self._parent_conn = self._child_conn = None
        print("⛔  Transcription stopped by user.")
    def _model_desc_text(self, model_name):
        info = MODEL_INFO.get(model_name)
        if not info:
            return ''
        speed, size, stars, use = info
        return f'{stars}  {speed}  ·  {size}  ·  {use}'
    def _on_model_change(self, selected):
        self.model_desc_label.configure(text=self._model_desc_text(selected))
        values, default, state = _language_options_for_model(selected)
        self.language_combobox.configure(values=values, state=state)
        self.language_combobox.set(default)
    # Browsing
    def browse(self):
        initial_dir = os.getcwd()
@@ -133,64 +324,83 @@ class App:
        self.path_entry.insert(0, folder_path)
    # Start transcription
    def start_transcription(self):
        # Disable transcribe button
        self.transcribe_button.configure(state=tk.DISABLED)
        # Start a new thread for the transcription process
        threading.Thread(target=self.transcribe_thread).start()
    # Threading
    def transcribe_thread(self):
        path = self.path_entry.get()
        model_display = self.model_combobox.get()
        # Ignore the visual separator
        if model_display.startswith('─'):
            messagebox.showinfo("Invalid selection", "Please select a model, not the separator line.")
            self.transcribe_button.configure(state=tk.NORMAL)
            return
        self.transcribe_button.configure(state=tk.DISABLED)
        self.stop_button.configure(state=tk.NORMAL)
        path = self.path_entry.get()
        model = HF_MODEL_MAP.get(model_display, model_display)
-        language = self.language_entry.get()
+        lang_label = self.language_combobox.get()
-        # Auto-set Swedish for KB models
+        language = WHISPER_LANGUAGES.get(lang_label, lang_label) if lang_label else None
-        is_kb_model = model_display.startswith('KB Swedish')
+        timestamps = self.timestamps_var.get()
-        # Check if the language field has the default text or is empty
+        vad_filter = self.vad_var.get()
-        if is_kb_model:
+        word_timestamps = self.word_ts_var.get()
-            language = 'sv'
+        translate = self.translate_var.get()
-        elif language == self.default_language_text or not language.strip():
+        glob_file = get_path(path)
            language = None  # This is the same as passing nothing
        verbose = True   # always show transcription progress in the console panel
        # Show progress bar
        self.progress_bar.pack(fill=tk.X, padx=5, pady=5)
        self.progress_bar.start()
-        # Setting path and files
+        self._parent_conn, self._child_conn = mp.Pipe(duplex=False)
-        glob_file = get_path(path)
+        self._proc = mp.Process(
-        #messagebox.showinfo("Message", "Starting transcription!")
+            target=_transcribe_worker_process,
-        # Start transcription
+            args=(self._child_conn, path, glob_file, model, language, True, timestamps),
            kwargs={"vad_filter": vad_filter, "word_timestamps": word_timestamps, "translate": translate},
            daemon=True,
        )
        self._proc.start()
        self._child_conn.close()  # parent doesn't write; close its write-end
        self._child_conn = None
        self.master.after(100, self._poll_worker)
    def _poll_worker(self):
        done = False
        result = None
        try:
-            output_text = transcribe(path, glob_file, model, language, verbose)
+            while self._parent_conn and self._parent_conn.poll():
-        except UnboundLocalError:
+                msg = self._parent_conn.recv()
-            messagebox.showinfo("Files not found error!", 'Nothing found, choose another folder.')
+                if isinstance(msg, tuple) and msg[0] == '__done__':
                    done = True
                    result = msg[1]
                else:
                    sys.stdout.write(msg)
                    sys.stdout.flush()
        except EOFError:
            # Child closed the pipe (normal completion or kill)
            done = True
        except Exception:
            pass
-        except ValueError:
+        if done or (self._proc and not self._proc.is_alive()):
-            messagebox.showinfo("Invalid language name, you might have to clear the default text to continue!")
+            if self._parent_conn:
-        # Hide progress bar
+                try:
                    self._parent_conn.close()
                except Exception:
                    pass
                self._parent_conn = None
            self._on_transcription_done(result)
        else:
            self.master.after(100, self._poll_worker)
    def _on_transcription_done(self, output_text):
        self.progress_bar.stop()
        self.progress_bar.pack_forget()
-        # Enable transcribe button
+        self.stop_button.configure(state=tk.DISABLED)
        self.transcribe_button.configure(state=tk.NORMAL)
-        # Recover output text
+        if output_text:
-        try:
+            title = "Finished!" if not output_text.startswith('⚠') else "Error"
-            messagebox.showinfo("Finished!", output_text)
+            messagebox.showinfo(title, output_text)
        except UnboundLocalError:
            pass
 if __name__ == "__main__":
    # Setting custom themes
    root = customtkinter.CTk()
    _apply_display_scaling(root)
    root.title("Local Transcribe with Whisper")
    # Geometry — taller to accommodate the embedded console panel
    width, height = 550, 560
    root.geometry('{}x{}'.format(width, height))
    root.minsize(450, 480)
-    # Icon 
+    # Icon (best-effort; ignored on platforms/builds without .ico support)
-    root.iconbitmap('images/icon.ico')
+    _set_app_icon(root)
    # Run
    app = App(root)
    root.mainloop()
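The `_poll_worker` hunk above drains the pipe with `poll()` and `recv()` so the Tk event loop never blocks. A minimal, GUI-free sketch of that pattern follows. The `drain` helper and the `DONE` constant are illustrative names, not part of the repository, and both pipe ends live in one process here only to keep the example self-contained.

```python
from multiprocessing import Pipe

DONE = "__done__"  # sentinel tag, mirroring the tuple used in the diff


def drain(conn):
    """Read everything currently available on `conn` without blocking.

    Returns (texts, done, result): the plain text messages seen, whether the
    sender has finished, and the payload of the sentinel tuple (if any).
    """
    texts, done, result = [], False, None
    try:
        # poll() returns immediately; it is True when a message is ready
        # or when the other end has been closed.
        while conn.poll():
            msg = conn.recv()
            if isinstance(msg, tuple) and msg and msg[0] == DONE:
                done, result = True, msg[1]
            else:
                texts.append(msg)
    except EOFError:
        # The other end closed the pipe and nothing is left to read.
        done = True
    return texts, done, result


parent_conn, child_conn = Pipe()
child_conn.send("loading model\n")      # ordinary console output
child_conn.send((DONE, "finished"))     # final result wrapped in the sentinel
child_conn.close()

texts, done, result = drain(parent_conn)
```

In a GUI, a function like `drain` would be called from a timer callback (the diff uses `self.master.after(100, ...)`) and rescheduled until `done` is true.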
@@ -1,2 +1,3 @@
 faster-whisper
 mlx-whisper
 customtkinter
@@ -1,2 +1,4 @@
 Armstrong_Small_Step
-[0:00:00 --> 0:00:07]: That's one small step for man, one giant leap for mankind.
+────────────────────────────────────────
 That's one small step for man, one giant leap for mankind.
@@ -1,2 +1,4 @@
 Axel_Pettersson_röstinspelning
-[0:00:00 --> 0:00:15]: Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
+────────────────────────────────────────
 Hej, jag heter Axel Pettersson, jag föddes i Örebro 1976. Jag har varit Wikipedia sen 2008 och jag har översatt röstintroduktionsprojektet till svenska.
@@ -1,65 +1,210 @@
 import os
 import sys
 import platform
 import datetime
 import time
 import site
 from glob import glob
 # ---------------------------------------------------------------------------
 # CUDA setup — must happen before importing faster_whisper / ctranslate2
 # ---------------------------------------------------------------------------
-def _setup_cuda_dlls():
+def _setup_cuda_libs():
-    """Add NVIDIA pip-package DLL dirs to the DLL search path (Windows only).
+    """Register NVIDIA pip-package lib dirs so ctranslate2 finds CUDA at runtime.
-    pip-installed nvidia-cublas-cu12 / nvidia-cudnn-cu12 place their .dll
+    pip-installed nvidia-cublas-cu12 / nvidia-cudnn-cu12 place their shared
-    files inside the site-packages tree.  Python 3.8+ on Windows does NOT
+    libraries inside the site-packages tree.  Neither Windows nor Linux
-    search PATH for DLLs loaded via ctypes/LoadLibrary, so we must
+    automatically search those directories, so we must register them
-    explicitly register every nvidia/*/bin and nvidia/*/lib directory using
+    explicitly:
-    os.add_dll_directory *and* prepend them to PATH (some native extensions
+      - Windows: os.add_dll_directory() + PATH
-    still rely on PATH).
+      - Linux:   LD_LIBRARY_PATH  (read by the dynamic linker)
    """
-    if sys.platform != "win32":
-        return
    try:
-        for sp in site.getsitepackages():
+        sp_dirs = site.getsitepackages()
-            nvidia_root = os.path.join(sp, "nvidia")
+    except AttributeError:
-            if not os.path.isdir(nvidia_root):
+        # virtualenv without site-packages helper
-                continue
+        sp_dirs = [os.path.join(sys.prefix, "lib",
-            for pkg in os.listdir(nvidia_root):
+                                "python" + ".".join(map(str, sys.version_info[:2])),
-                for sub in ("bin", "lib"):
+                                "site-packages")]
-                    d = os.path.join(nvidia_root, pkg, sub)
-                    if os.path.isdir(d):
-                        os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
-                        try:
-                            os.add_dll_directory(d)
-                        except (OSError, AttributeError):
-                            pass
-    except Exception:
-        pass
-_setup_cuda_dlls()
+    for sp in sp_dirs:
        nvidia_root = os.path.join(sp, "nvidia")
        if not os.path.isdir(nvidia_root):
            continue
        for pkg in os.listdir(nvidia_root):
            for sub in ("bin", "lib"):
                d = os.path.join(nvidia_root, pkg, sub)
                if not os.path.isdir(d):
                    continue
                if sys.platform == "win32":
                    os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
                    try:
                        os.add_dll_directory(d)
                    except (OSError, AttributeError):
                        pass
                else:
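                    # Non-Windows (in practice Linux): prepend to LD_LIBRARY_PATH
                    # so that child processes inherit the directory
                    ld = os.environ.get("LD_LIBRARY_PATH", "")
                    if d not in ld.split(os.pathsep):
                        os.environ["LD_LIBRARY_PATH"] = d + (os.pathsep + ld if ld else "")
                        # The dynamic linker read LD_LIBRARY_PATH at startup, so
                        # also preload the libraries via ctypes for this process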
                        import ctypes
                        try:
                            for so in sorted(os.listdir(d)):
                                if so.endswith(".so") or ".so." in so:
                                    ctypes.cdll.LoadLibrary(os.path.join(d, so))
                        except OSError:
                            pass
 _setup_cuda_libs()
 from faster_whisper import WhisperModel
-def _detect_device():
+SUPPORTED_EXTENSIONS = {
-    """Return (device, compute_type) for the best available backend."""
+    ".wav", ".mp3", ".m4a", ".flac", ".ogg", ".wma", ".aac",
    ".mp4", ".mkv", ".mov", ".webm", ".avi", ".mpeg", ".mpg",
 }
 # ---------------------------------------------------------------------------
 # MLX model map  (Apple Silicon only)
 # ---------------------------------------------------------------------------
 _MLX_MODEL_MAP = {
    "tiny":     "mlx-community/whisper-tiny-mlx",
    "base":     "mlx-community/whisper-base-mlx",
    "small":    "mlx-community/whisper-small-mlx",
    "medium":   "mlx-community/whisper-medium-mlx",
    "large-v2": "mlx-community/whisper-large-v2-mlx",
    "large-v3": "mlx-community/whisper-large-v3-mlx",
 }
 def detect_backend():
    """Return the best available inference backend.
    Returns a dict with keys:
        backend      : "mlx" | "cuda" | "cpu"
        device       : device string for WhisperModel (cuda / cpu)
        compute_type : compute type string for WhisperModel
        label        : human-readable label for UI display
    """
    # Apple Silicon → try MLX (GPU + Neural Engine via Apple MLX)
    if sys.platform == "darwin" and platform.machine() == "arm64":
        try:
            import mlx_whisper  # noqa: F401
            return {
                "backend": "mlx",
                "device": "cpu",
                "compute_type": "int8",
                "label": "MLX · Apple GPU/NPU",
            }
        except ImportError:
            pass
    # NVIDIA CUDA
    try:
        import ctranslate2
        cuda_types = ctranslate2.get_supported_compute_types("cuda")
        if "float16" in cuda_types:
-            return "cuda", "float16"
+            return {
                "backend": "cuda",
                "device": "cuda",
                "compute_type": "float16",
                "label": "CUDA · GPU",
            }
    except Exception:
        pass
-    return "cpu", "int8"
+
    return {
        "backend": "cpu",
        "device": "cpu",
        "compute_type": "int8",
        "label": "CPU · int8",
    }
 def _decode_audio_pyav(file_path):
    """Decode any audio/video file to a float32 mono 16 kHz numpy array.
    Uses PyAV (bundled FFmpeg) — no external ffmpeg binary required.
    Returns (audio_array, duration_seconds).
    """
    import av
    import numpy as np
    with av.open(file_path) as container:
        duration = (container.duration or 0) / 1_000_000  # microseconds → seconds; duration may be None
        stream = container.streams.audio[0]
        resampler = av.AudioResampler(format="fltp", layout="mono", rate=16000)
        chunks = []
        for frame in container.decode(stream):
            for out in resampler.resample(frame):
                if out:
                    chunks.append(out.to_ndarray()[0])
        # Flush resampler
        for out in resampler.resample(None):
            if out:
                chunks.append(out.to_ndarray()[0])
    if not chunks:
        return np.zeros(0, dtype=np.float32), duration
    return np.concatenate(chunks, axis=0), duration
 def _transcribe_mlx_file(file, mlx_model_id, language, timestamps, verbose, vad_filter=False, word_timestamps=False, translate=False):
    """Transcribe a single file with mlx-whisper (Apple GPU/NPU).
    Decodes audio via PyAV (no system ffmpeg needed), then runs MLX inference.
    Returns (segments_as_dicts, audio_duration_seconds).
    Segments have dict keys: 'start', 'end', 'text'.
    """
    import mlx_whisper
    audio_array, duration = _decode_audio_pyav(file)
    decode_opts = {}
    if language:
        decode_opts["language"] = language
    if translate:
        decode_opts["task"] = "translate"
    if word_timestamps:
        decode_opts["word_timestamps"] = True
    result = mlx_whisper.transcribe(
        audio_array,
        path_or_hf_repo=mlx_model_id,
        verbose=(True if verbose else None),
        **decode_opts,
    )
    segments = result["segments"]
    audio_duration = segments[-1]["end"] if segments else duration
    return segments, audio_duration
 def _srt_timestamp(seconds):
    """Convert seconds (float) to SRT timestamp format HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
 # Get the path
 def get_path(path):
-    glob_file = glob(path + '/*')
+    all_items = glob(path + '/*')
-    return glob_file
+    media_files = []
    for item in all_items:
        if not os.path.isfile(item):
            continue
        _, ext = os.path.splitext(item)
        if ext.lower() in SUPPORTED_EXTENSIONS:
            media_files.append(item)
    return sorted(media_files)
 # Main function
-def transcribe(path, glob_file, model=None, language=None, verbose=False):
+def transcribe(path, glob_file, model=None, language=None, verbose=False, timestamps=True, stop_event=None, vad_filter=False, word_timestamps=False, translate=False):
    """
    Transcribes audio files in a specified folder using faster-whisper (CTranslate2).
@@ -90,53 +235,204 @@ def transcribe(path, glob_file, model=None, language=None, verbose=False):
    SEP = "─" * 46
    # ── Step 1: Detect hardware ──────────────────────────────────────
-    device, compute_type = _detect_device()
+    backend_info = detect_backend()
-    print(f"⚙  Device: {device}  |  Compute: {compute_type}")
+    backend   = backend_info["backend"]
    device    = backend_info["device"]
    compute_type = backend_info["compute_type"]
    print(f"⚙  Backend: {backend_info['label']}")
-    # ── Step 2: Load model ───────────────────────────────────────────
+    # ── Step 1b: MLX path (Apple GPU/NPU) ───────────────────────────
    if backend == "mlx":
        mlx_model_id = _MLX_MODEL_MAP.get(model)
        if mlx_model_id is None:
            print(f"⚠  Model '{model}' is not available in MLX format.")
            print("   Falling back to faster-whisper on CPU (int8).")
            backend = "cpu"
            device, compute_type = "cpu", "int8"
        else:
            # ── Step 2 (MLX): load + transcribe ─────────────────────
            print(f"⏳ Loading MLX model '{model}' — downloading if needed...")
            print("✅ Model ready!")
            print(SEP)
            total_files = len(glob_file)
            print(f"📂 Found {total_files} supported media file(s) in folder")
            print(SEP)
            if total_files == 0:
                output_text = '⚠  No supported media files found — try another folder.'
                print(output_text)
                print(SEP)
                return output_text
            files_transcripted = []
            file_num = 0
            for file in glob_file:
                if stop_event and stop_event.is_set():
                    print("⛔  Transcription stopped by user.")
                    break
                # splitext keeps dots inside the name (e.g. "talk.part1.mp3" -> "talk.part1")
                title = os.path.splitext(os.path.basename(file))[0]
                file_num += 1
                print(f"\n{'─' * 46}")
                print(f"📄 File {file_num}/{total_files}: {title}")
                try:
                    t_start = time.time()
                    segments, audio_duration = _transcribe_mlx_file(
                        file, mlx_model_id, language, timestamps, verbose,
                        vad_filter=vad_filter, word_timestamps=word_timestamps,
                        translate=translate
                    )
                    os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
                    segment_list = []
                    txt_path = "{}/transcriptions/{}.txt".format(path, title)
                    srt_path = "{}/transcriptions/{}.srt".format(path, title)
                    with open(txt_path, 'w', encoding='utf-8') as f, \
                         open(srt_path, 'w', encoding='utf-8') as srt_f:
                        f.write(title)
                        f.write('\n' + '─' * 40 + '\n')
                        for idx, seg in enumerate(segments, start=1):
                            if stop_event and stop_event.is_set():
                                break
                            text = seg["text"].strip()
                            if timestamps:
                                start_ts = str(datetime.timedelta(seconds=seg["start"]))
                                end_ts   = str(datetime.timedelta(seconds=seg["end"]))
                                f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
                            else:
                                f.write('\n{}'.format(text))
                            srt_f.write(f'{idx}\n{_srt_timestamp(seg["start"])} --> {_srt_timestamp(seg["end"])}\n{text}\n\n')
                            f.flush()
                            srt_f.flush()
                            if verbose:
                                print("   [%.2fs → %.2fs] %s" % (seg["start"], seg["end"], seg["text"]))
                            else:
                                print("   Transcribed up to %.0fs..." % seg["end"], end='\r')
                            segment_list.append(seg)
                    elapsed = time.time() - t_start
                    elapsed_min = elapsed / 60.0
                    audio_min   = audio_duration / 60.0
                    ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
                    print(f"✅ Done — saved to transcriptions/{title}.txt")
                    print(f"⏱  Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min  ({ratio:.1f}x realtime)")
                    files_transcripted.append(segment_list)
                except Exception as exc:
                    print(f"⚠  Could not decode '{os.path.basename(file)}', skipping.")
                    print(f"   Reason: {exc}")
            print(f"\n{SEP}")
            if files_transcripted:
                output_text = f"✅ Finished! {len(files_transcripted)} file(s) transcribed.\n   Saved in: {path}/transcriptions"
            else:
                output_text = '⚠  No files eligible for transcription — try another folder.'
            print(output_text)
            print(SEP)
            return output_text
    # ── Step 2: Load model (faster-whisper / CTranslate2) ───────────
    print(f"⏳ Loading model '{model}' — downloading if needed...")
-    whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
+    try:
        whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
    except Exception as exc:
        err = str(exc).lower()
        cuda_runtime_missing = (
            device == "cuda"
            and (
                "libcublas" in err
                or "libcudnn" in err
                or "cuda" in err
                or "cannot be loaded" in err
                or "not found" in err
            )
        )
        if not cuda_runtime_missing:
            raise
        print("⚠  CUDA runtime not available; falling back to CPU (int8).")
        print(f"   Reason: {exc}")
        device, compute_type = "cpu", "int8"
        whisper_model = WhisperModel(model, device=device, compute_type=compute_type)
    print("✅ Model ready!")
    print(SEP)
    # ── Step 3: Transcribe files ─────────────────────────────────────
    total_files = len(glob_file)
-    print(f"📂 Found {total_files} item(s) in folder")
+    print(f"📂 Found {total_files} supported media file(s) in folder")
    print(SEP)
    if total_files == 0:
        output_text = '⚠  No supported media files found — try another folder.'
        print(output_text)
        print(SEP)
        return output_text
    files_transcripted = []
    file_num = 0
    for file in glob_file:
        if stop_event and stop_event.is_set():
            print("⛔  Transcription stopped by user.")
            break
        # splitext keeps dots inside the name (e.g. "talk.part1.mp3" -> "talk.part1")
        title = os.path.splitext(os.path.basename(file))[0]
        file_num += 1
        print(f"\n{'─' * 46}")
        print(f"📄 File {file_num}/{total_files}: {title}")
        try:
            t_start = time.time()
            segments, info = whisper_model.transcribe(
                file,
                language=language,
-                beam_size=5
+                beam_size=5,
                task="translate" if translate else "transcribe",
                vad_filter=vad_filter,
                word_timestamps=word_timestamps,
            )
            audio_duration = info.duration  # seconds
            # Make folder if missing
            os.makedirs('{}/transcriptions'.format(path), exist_ok=True)
            # Stream segments as they are decoded
            segment_list = []
-            with open("{}/transcriptions/{}.txt".format(path, title), 'w', encoding='utf-8') as f:
+            txt_path = "{}/transcriptions/{}.txt".format(path, title)
            srt_path = "{}/transcriptions/{}.srt".format(path, title)
            with open(txt_path, 'w', encoding='utf-8') as f, \
                 open(srt_path, 'w', encoding='utf-8') as srt_f:
                f.write(title)
-                for seg in segments:
+                f.write('\n' + '─' * 40 + '\n')
-                    start_ts = str(datetime.timedelta(seconds=seg.start))
+                for idx, seg in enumerate(segments, start=1):
-                    end_ts = str(datetime.timedelta(seconds=seg.end))
+                    if stop_event and stop_event.is_set():
-                    f.write('\n[{} --> {}]:{}'.format(start_ts, end_ts, seg.text))
+                        break
                    text = seg.text.strip()
                    if timestamps:
                        start_ts = str(datetime.timedelta(seconds=seg.start))
                        end_ts = str(datetime.timedelta(seconds=seg.end))
                        f.write('\n[{} --> {}] {}'.format(start_ts, end_ts, text))
                    else:
                        f.write('\n{}'.format(text))
                    # Use word-level timestamps for SRT if available
                    if word_timestamps and hasattr(seg, 'words') and seg.words:
                        for w_idx, word in enumerate(seg.words, start=1):
                            w_text = word.word.strip()
                            if not w_text:
                                continue
                            w_start = _srt_timestamp(word.start)
                            w_end = _srt_timestamp(word.end)
                            srt_f.write(f'{idx}.{w_idx}\n{w_start} --> {w_end}\n{w_text}\n\n')
                    else:
                        srt_f.write(f'{idx}\n{_srt_timestamp(seg.start)} --> {_srt_timestamp(seg.end)}\n{text}\n\n')
                    f.flush()
                    srt_f.flush()
                    if verbose:
                        print("   [%.2fs → %.2fs] %s" % (seg.start, seg.end, seg.text))
                    else:
                        print("   Transcribed up to %.0fs..." % seg.end, end='\r')
                    segment_list.append(seg)
            elapsed = time.time() - t_start
            elapsed_min = elapsed / 60.0
            audio_min = audio_duration / 60.0
            ratio = audio_duration / elapsed if elapsed > 0 else float('inf')
            print(f"✅ Done — saved to transcriptions/{title}.txt")
            print(f"⏱  Transcribed {audio_min:.1f} min of audio in {elapsed_min:.1f} min  ({ratio:.1f}x realtime)")
            files_transcripted.append(segment_list)
-        except Exception:
+        except Exception as exc:
-            print('⚠  Not a valid audio/video file, skipping.')
+            print(f"⚠  Could not decode '{os.path.basename(file)}', skipping.")
            print(f"   Reason: {exc}")
    # ── Summary ──────────────────────────────────────────────────────
    print(f"\n{SEP}")
@@ -147,3 +443,45 @@ def transcribe(path, glob_file, model=None, language=None, verbose=False):
    print(output_text)
    print(SEP)
    return output_text
 def _transcribe_worker_process(conn, path, glob_file, model, language, verbose, timestamps, vad_filter=False, word_timestamps=False, translate=False):
    """Child-process entry point for the UI's multiprocessing backend.
    Redirects stdout/stderr → pipe connection so the main process can display
    output in the console panel.  The main process sends SIGTERM/SIGKILL to
    stop this process immediately, including any in-progress download or inference.
    """
    import sys
    class _PipeWriter:
        def __init__(self, c):
            self.c = c
        def write(self, text):
            if text:
                try:
                    self.c.send(text)
                except Exception:
                    pass
        def flush(self):
            pass
    writer = _PipeWriter(conn)
    sys.stdout = writer
    sys.stderr = writer
    result = '⚠  No output produced.'
    try:
        result = transcribe(path, glob_file, model, language, verbose, timestamps,
                           vad_filter=vad_filter, word_timestamps=word_timestamps,
                           translate=translate)
    except Exception as exc:
        result = f'⚠  Unexpected error: {exc}'
    finally:
        try:
            conn.send(('__done__', result))
        except Exception:
            pass
        conn.close()
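The `_PipeWriter` class above works because `print` only needs an object with `write` and `flush`. Below is a small sketch of the same idea, assuming a one-way `multiprocessing.Pipe` and using `contextlib.redirect_stdout` rather than assigning `sys.stdout` directly. The `PipeWriter` and `read_all` names are hypothetical, and everything runs in a single process purely for illustration.

```python
import contextlib
from multiprocessing import Pipe


class PipeWriter:
    """File-like object that forwards every write() to a pipe connection."""

    def __init__(self, conn):
        self.conn = conn

    def write(self, text):
        if text:
            self.conn.send(text)
        return len(text)

    def flush(self):
        # Nothing is buffered locally, so there is nothing to flush.
        pass


def read_all(conn):
    """Collect messages until the sending end is closed."""
    chunks = []
    while True:
        try:
            chunks.append(conn.recv())
        except EOFError:
            return chunks


reader, sender = Pipe(duplex=False)  # reader only receives, sender only sends
with contextlib.redirect_stdout(PipeWriter(sender)):
    print("loading model")
    print("done")
sender.close()

captured = "".join(read_all(reader))
```

Unlike the worker in the diff, this sketch lets send errors propagate. Swallowing them, as `_PipeWriter` does, is a reasonable choice when the parent may close its end while the child is still printing.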
Author	SHA1	Message	Date
kobim	e2e19940dd	feat: update README to reflect Apple Silicon GPU support and new features in version 3.0	2026-04-11 14:16:07 +02:00
kobim	0293a13177	feat: add advanced transcription options for VAD, word-level timestamps, and translation	2026-04-11 14:06:04 +02:00
kobim	8d5c8d6097	feat: implement multiprocessing for transcription with immediate cancellation	2026-04-05 22:11:13 +02:00
kobim	e29572420e	feat: enhance transcription capabilities with MLX support and backend detection	2026-04-04 00:32:36 +02:00
Kristofer Söderström	f7d621e510	Add timestamps toggle and update transcription format to include/exclude timestamps	2026-03-20 20:19:46 +01:00
Kristofer Rolf Söderström	2a1df6aeba	Update Python installation instructions in README Clarified installation instructions for Python 3.10 or later, specifying preferred installation method.	2026-03-03 08:35:03 +01:00
soderstromkr	58255c3d10	fix: Linux/Ubuntu support — icon fallback, HiDPI scaling, CUDA lib paths, per-file timing - app.py: graceful icon loading (no crash on Linux Tk without .ico support) - app.py: auto-detect display scaling for 4K/HiDPI screens - _LocalTranscribe.py: register NVIDIA pip-package .so paths on Linux (LD_LIBRARY_PATH) so faster-whisper finds libcublas/libcudnn at runtime - _LocalTranscribe.py: auto-fallback to CPU if CUDA runtime libs missing - _LocalTranscribe.py: filter input to supported media extensions only - _LocalTranscribe.py: show real decode errors instead of generic skip message - _LocalTranscribe.py: per-file timer showing wall-clock vs audio duration	2026-03-02 21:49:32 +01:00
Kristofer Söderström	ea43074852	Update README.md: Add manual installation instructions for troubleshooting launcher issues	2026-03-02 17:17:35 +01:00
Kristofer Söderström	7b81778d9e	Update README.md: Simplify installation instructions and clarify auto-installation process	2026-03-02 17:16:09 +01:00
Kristofer Söderström	e65462f57b	Update README.md: Add link to classic release in Mac user note	2026-03-02 17:13:05 +01:00
Kristofer Söderström	09e3e43c51	Update README.md: Reorder features for clarity and emphasize integrated console	2026-03-02 17:11:13 +01:00
Kristofer Söderström	d4c26f6c37	Update README.md: Rearrange new features for clarity and highlight Swedish-optimised models	2026-03-02 17:10:15 +01:00
Kristofer Söderström	acb6947f87	Update README.md: Revise installation instructions and clarify platform-specific run commands	2026-03-02 17:04:59 +01:00