Skip to content

joshjama/audiobook-creator

Repository files navigation

Audiobook Creator - Readme

What is Audiobook Creator?

  • This project creates audiobooks from plain text files.

  • It uses the Python TTS and xTTS models to synthesize natural human-sounding audio.

  • Powered by TTS and XTTS

Requirements :

Main-Requirements :
  • python3.10-pip

  • python3-virtualenv

  • Additional :

    • A working cuda setup together with a Nvidea-GPU of at least 8 GB of vram.

Translation-Requirements :
  • For the translation-feature you will need a working ollama-server up and running.

    • You can adjust the url of your ollama-server inside the CreateAudiobook.py.

    • $ OLLAMA_URL = 'http://localhost:11434' # For your leptops GPU.

  • A working cuda setup together with a Nvidea-GPU of at least 8 GB of vram.

  • Please note that translation requires at least 16 GB of vram in order to run both, the tts and the large language model for translation (gemma2) at thesame time. If you do not fit this requirement please turn off the gpu-usage on the web-client in order to ensure the creation will succeed.

Installation :

Installation :
  • Just run the following to install the Python modules

  • $ ./install_requirements.sh

  • Please note that you have to run the create_audiobook.sh ones after installation in order to be able to axcept the license.

Additional ollama installation :
  • For translation and additional instructions to do suffisticated things with your texts, a large language model is required.

  • I am useing ollama with google - gemma2 to perform this additional work.

  • Get ollama with the following command :

  • $ curl -fsSL https://ollama.com/install.sh | sh 2>&1

  • then pull the gemma2 modell from ollama-library :

  • $ ollama pull gemma2

Usage :

Usage :

Usage :
  • Just run the following to create an audiobook containing several audio - wav files.

  • $ ./create_audiobook.sh <PATH_TO_YOUR_TEXT> <AUDIOBOOK_NAME>

  • It will create a folder with your AUDIO_BOOK_NAME, create audio files from your text, and place them into this folder.

  • Please note that you will need a CUDA-capable GPU in order to run the voices as they work with deep neural networks in order to provide high-quality TTS.

Supported Languages :

Currently we support :
  • German

  • English

  • We will add more languages in later releases.

  • We need to download a spacy model for each language in order to split the book to sentences correctly.

Create Speaker - Samples :

  • You can now select the speaker for your Audio-Book within a dropdown on the top of the Web_Server.

  • You can create a sample wav-file for each supported speaker as follows.

  • $ source audiobook_venv/bin/activate && \

  • $ cd create_voice_samples && \

  • $ python ./CreateVoiceSamples.py

  • They will be created into subdirectories of the name of the speaker they applay to.

  • You can just copy this wav-files into a custom folder under the Web_Server - Directory to be able to brows them from the frontend.

Webserver :

Attension ! Do only use it inside secure local or vpn - networks ! It is not ready to be used as service online !

  • The web-server can be used to work with audiobook-creation from another device like a smartphone.

  • It should only be used within secure networks as it dows not implement any security or privacy functionality.

  • Please note that currently the complete folder structure is shown to the user. This will e fixed in later versions. For the moment you should only use it within your vpn or local network!!!

Usage :

  • $ source audiobook_venv/bin/activate && cd Web_Server && python ./AudiobookCreatorWebServer.py

  • It will runn under localhost:5000

  • You can insert your text into the textfeald or upload a .txt file.

  • The created audiobooks will show up as folders in the directory Tree.

  • You can play them by klicking on the dirctory and after then just on play.

Translation :

translation :
  • You can translate texts from english to german and wise versa.

  • You can translate any other language to English or German.

  • Please keep in mind that non - supported languages may leed to unusual sentense-splitting.

  • Just use the toggle-button on the web-frontend.

  • Please keep attension of the outputs as they are currently produced via gemma2.

    • Large Language Models can make mistakes.

    • Always check importent information.

    • Just try it out, translations may be incorrect or missmatching parts of your text. I am working on better results.

  • Plese note that you will need at least 16 GB of vram in order to perform translation and tts by gpu-support at the same time. If you don’t fit this requirement just turn off gpu-usage at the web-frontend to ensure that the audiobook-creation will be performed correctly.

Alternative Instructions :
  • You can give alternative instructions for the gpt-model to process your text.

  • Just type your instruction into the textfeeld <Alternative Instructions> and press submit.

    • Please note! For Alternative instructions do need the same resources and dependencies the translationfeature needs.

  • You will have ollama up and running with gemma2.

  • From summarization to logical analyses is everything possible.

    • Please note that some instructions will be to complex for gemma2. Considder to use larger models for more complicated tasks, like logical analyses or strong reasoning.

  • All gpt-s are used to make mistakes from time to time. Please dubblecheck importent information!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published