Metadata-Version: 2.1
Name: aeneas
Version: 1.7.3.0
Summary: aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Home-page: https://github.com/readbeyond/aeneas
Author: Alberto Pettarin
Author-email: alberto@albertopettarin.it
License: GNU Affero General Public License v3 (AGPL v3)
Keywords: AUD,AWS Polly TTS API,CSV,DTW,EAF,ELAN,EPUB 3 Media Overlay,EPUB 3,EPUB,Festival,JSON,MFCC,Mel-frequency cepstral coefficients,Nuance TTS API,ReadBeyond Sync,ReadBeyond,SBV,SMIL,SRT,SSV,SUB,TGT,TSV,TTML,TTS,TextGrid,VTT,XML,aeneas,audio/text alignment,dynamic time warping,eSpeak,eSpeak-ng,espeak,espeak-ng,festival,ffmpeg,ffprobe,forced alignment,media overlay,rb_smil_emulator,speech to text,subtitles,sync,synchronization,text to speech,text2wave,tts
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Education
Classifier: Topic :: Multimedia
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Printing
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
License-File: LICENSE
Requires-Dist: BeautifulSoup4 >=4.5.1
Requires-Dist: lxml >=3.6.0
Requires-Dist: numpy >=1.9
Provides-Extra: boto3
Requires-Dist: boto3 >=1.4.2 ; extra == 'boto3'
Provides-Extra: full
Requires-Dist: boto3 >=1.4.2 ; extra == 'full'
Requires-Dist: Pillow >=3.1.1 ; extra == 'full'
Requires-Dist: requests >=2.9.1 ; extra == 'full'
Requires-Dist: tgt >=1.4.2 ; extra == 'full'
Requires-Dist: youtube-dl >=2016.9.27 ; extra == 'full'
Provides-Extra: nopillow
Requires-Dist: boto3 >=1.4.2 ; extra == 'nopillow'
Requires-Dist: requests >=2.9.1 ; extra == 'nopillow'
Requires-Dist: tgt >=1.4.2 ; extra == 'nopillow'
Requires-Dist: youtube-dl >=2016.9.27 ; extra == 'nopillow'
Provides-Extra: pillow
Requires-Dist: Pillow >=3.1.1 ; extra == 'pillow'
Provides-Extra: requests
Requires-Dist: requests >=2.9.1 ; extra == 'requests'
Provides-Extra: tgt
Requires-Dist: tgt >=1.4.2 ; extra == 'tgt'

aeneas
======

**aeneas** is a Python/C library and a set of tools to automagically
synchronize audio and text (aka forced alignment).

-  Version: 1.7.3
-  Date: 2017-03-15
-  Developed by: `ReadBeyond <http://www.readbeyond.it/>`__
-  Lead Developer: `Alberto Pettarin <http://www.albertopettarin.it/>`__
-  License: the GNU Affero General Public License Version 3 (AGPL v3)
-  Contact: aeneas@readbeyond.it
-  Quick Links: `Home <http://www.readbeyond.it/aeneas/>`__ -
   `GitHub <https://github.com/readbeyond/aeneas/>`__ -
   `PyPI <https://pypi.python.org/pypi/aeneas/>`__ -
   `Docs <http://www.readbeyond.it/aeneas/docs/>`__ -
   `Tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
   - `Benchmark <https://readbeyond.github.io/aeneas-benchmark/>`__ -
   `Mailing
   List <https://groups.google.com/d/forum/aeneas-forced-alignment>`__ -
   `Web App <http://aeneasweb.org>`__

Goal
----

**aeneas** automatically generates a **synchronization map** between a
list of text fragments and an audio file containing the narration of the
text. In computer science this task is known as (automatically computing
a) **forced alignment**.

For example, given `this text
file <https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml>`__
and `this audio
file <https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3>`__,
**aeneas** determines, for each fragment, the corresponding time
interval in the audio file:

::

    1                                                     => [00:00:00.000, 00:00:02.640]
    From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
    That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
    But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
    His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
    But thou contracted to thine own bright eyes,         => [00:00:15.280, 00:00:18.800]
    Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
    Making a famine where abundance lies,                 => [00:00:22.760, 00:00:25.680]
    Thy self thy foe, to thy sweet self too cruel:        => [00:00:25.680, 00:00:31.240]
    Thou that art now the world's fresh ornament,         => [00:00:31.240, 00:00:34.400]
    And only herald to the gaudy spring,                  => [00:00:34.400, 00:00:36.920]
    Within thine own bud buriest thy content,             => [00:00:36.920, 00:00:40.640]
    And tender churl mak'st waste in niggarding:          => [00:00:40.640, 00:00:43.640]
    Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
    To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]

.. figure:: wiki/align.png
   :alt: Waveform with aligned labels, detail

   Waveform with aligned labels, detail

This synchronization map can be output to file in several formats,
depending on its application:

-  research: Audacity (AUD), ELAN (EAF), TextGrid;
-  digital publishing: SMIL for EPUB 3;
-  closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT
   (VTT);
-  Web: JSON;
-  further processing: CSV, SSV, TSV, TXT, XML.

System Requirements, Supported Platforms and Installation
---------------------------------------------------------

System Requirements
~~~~~~~~~~~~~~~~~~~

1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
2. `Python <https://python.org/>`__ 2.7 (Linux, OS X, Windows) or 3.5 or
   later (Linux, OS X)
3. `FFmpeg <https://www.ffmpeg.org/>`__
4. `eSpeak <http://espeak.sourceforge.net/>`__
5. Python packages ``BeautifulSoup4``, ``lxml``, and ``numpy``
6. Python headers to compile the Python C/C++ extensions (optional but
   strongly recommended)
7. A shell supporting UTF-8 (optional but strongly recommended)

Supported Platforms
~~~~~~~~~~~~~~~~~~~

**aeneas** has been developed and tested on **Debian 64bit**, with
**Python 2.7** and **Python 3.5**, which are the **only supported
platforms** at the moment. Nevertheless, **aeneas** has been confirmed
to work on other Linux distributions, Mac OS X, and Windows. See the
`PLATFORMS
file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__
for details.

If installing **aeneas** natively on your OS proves difficult, you are
strongly encouraged to use
`aeneas-vagrant <https://github.com/readbeyond/aeneas-vagrant>`__, which
provides **aeneas** inside a virtualized Debian image running under
`VirtualBox <https://www.virtualbox.org/>`__ and
`Vagrant <http://www.vagrantup.com/>`__, which can be installed on any
modern OS (Linux, Mac OS X, Windows).

Installation
~~~~~~~~~~~~

All-in-one installers are available for Mac OS X and Windows, and a Bash
script for deb-based Linux distributions (Debian, Ubuntu) is provided in
this repository. It is also possible to download a VirtualBox+Vagrant
virtual machine. Please see the `INSTALL
file <https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md>`__
for detailed, step-by-step installation procedures for different
operating systems.

The generic OS-independent procedure is simple:

1. **Install** `Python <https://python.org/>`__ (2.7.x preferred),
   `FFmpeg <https://www.ffmpeg.org/>`__, and
   `eSpeak <http://espeak.sourceforge.net/>`__

2. Make sure the following **executables** can be called from your
   **shell**: ``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and
   ``python``

3. First install ``numpy`` with ``pip`` and then ``aeneas`` (this order
   is important):

   .. code:: bash

       pip install numpy
       pip install aeneas

4. To **check** whether you installed **aeneas** correctly, run:

``bash     python -m aeneas.diagnostics``

Usage
-----

1. Run without arguments to get the **usage message**:

   .. code:: bash

       python -m aeneas.tools.execute_task
       python -m aeneas.tools.execute_job

   You can also get a list of **live examples** that you can immediately
   run on your machine thanks to the included files:

   .. code:: bash

       python -m aeneas.tools.execute_task --examples
       python -m aeneas.tools.execute_task --examples-all

2. To **compute a synchronization map** ``map.json`` for a pair
   (``audio.mp3``, ``text.txt`` in
   `plain <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
   text format), you can run:

   .. code:: bash

       python -m aeneas.tools.execute_task \
           audio.mp3 \
           text.txt \
           "task_language=eng|os_task_file_format=json|is_text_type=plain" \
           map.json

(The command has been split into lines with ``\`` for visual clarity; in
production you can have the entire command on a single line and/or you
can use shell variables.)

To **compute a synchronization map** ``map.smil`` for a pair
(``audio.mp3``,
`page.xhtml <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__
containing fragments marked by ``id`` attributes like ``f001``), you can
run:

::

    ```bash
    python -m aeneas.tools.execute_task \
        audio.mp3 \
        page.xhtml \
        "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
        map.smil
    ```

As you can see, the third argument (the *configuration string*)
specifies the parameters controlling the I/O formats and the processing
options for the task. Consult the
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.

3. If you have several tasks to process, you can create a **job
   container** to batch process them:

   .. code:: bash

       python -m aeneas.tools.execute_job job.zip output_directory

File ``job.zip`` should contain a ``config.txt`` or ``config.xml``
configuration file, providing **aeneas** with all the information needed
to parse the input assets and format the output sync map files. Consult
the `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for
details.

The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ contains a
highly suggested
`tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
which explains how to use the built-in command line tools.

Documentation and Support
-------------------------

-  Documentation: http://www.readbeyond.it/aeneas/docs/
-  Command line tools tutorial:
   http://www.readbeyond.it/aeneas/docs/clitutorial.html
-  Library tutorial:
   http://www.readbeyond.it/aeneas/docs/libtutorial.html
-  Old, verbose tutorial: `A Practical Introduction To The aeneas
   Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__
-  Mailing list:
   https://groups.google.com/d/forum/aeneas-forced-alignment
-  Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
-  High level description of how aeneas works:
   `HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
-  Development history:
   `HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
-  Testing:
   `TESTING <https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md>`__
-  Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/

Supported Features
------------------

-  Input text files in ``parsed``, ``plain``, ``subtitles``, or
   ``unparsed`` (XML) format
-  Multilevel input text files in ``mplain`` and ``munparsed`` (XML)
   format
-  Text extraction from XML (e.g., XHTML) files using ``id`` and
   ``class`` attributes
-  Arbitrary text fragment granularity (single word, subphrase, phrase,
   paragraph, etc.)
-  Input audio file formats: all those readable by ``ffmpeg``
-  Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
   TEXTGRID, TSV, TTML, TXT, VTT, XML
-  Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN,
   DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA,
   JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA,
   SWE, TUR, UKR
-  MFCC and DTW computed via Python C extensions to reduce the
   processing time
-  Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak
   (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
-  Default TTS (eSpeak) called via a Python C extension for fast audio
   synthesis
-  Possibility of running a custom, user-provided TTS engine Python
   wrapper (e.g., included example for speect)
-  Batch processing of multiple audio/text pairs
-  Download audio from a YouTube video
-  In multilevel mode, recursive alignment from paragraph to sentence to
   word level
-  In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and
   TTS engine can be specified for each level independently
-  Robust against misspelled/mispronounced words, local rearrangements
   of words, background noise/sporadic spikes
-  Adjustable splitting times, including a max character/second
   constraint for CC applications
-  Automated detection of audio head/tail
-  Output an HTML file for fine tuning the sync map manually
   (``finetuneas`` project)
-  Execution parameters tunable at runtime
-  Code suitable for Web app deployment (e.g., on-demand cloud computing
   instances)
-  Extensive test suite including 1,200+ unit/integration/performance
   tests, that run and must pass before each release

Limitations and Missing Features
--------------------------------

-  Audio should match the text: large portions of spurious text or audio
   might produce a wrong sync map
-  Audio is assumed to be spoken: not suitable for song captioning, YMMV
   for CC applications
-  No protection against memory swapping: be sure your amount of RAM is
   adequate for the maximum duration of a single audio file (e.g., 4 GB
   RAM => max 2h audio; 16 GB RAM => max 10h audio)
-  `Open issues <https://github.com/readbeyond/aeneas/issues>`__

A Note on Word-Level Alignment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A significant number of users runs **aeneas** to align audio and text at
word-level (i.e., each fragment is a word). Although **aeneas** was not
designed with word-level alignment in mind and the results might be
inferior to `ASR-based forced
aligners <https://github.com/pettarin/forced-alignment-tools>`__ for
languages with good ASR models, **aeneas** offers some options to
improve the quality of the alignment at word-level:

-  multilevel text (since v1.5.1),
-  MFCC nonspeech masking (since v1.7.0, disabled by default),
-  use better TTS engines, like Festival or AWS/Nuance TTS API (since
   v1.5.0).

If you use the ``aeneas.tools.execute_task`` command line tool, you can
add ``--presets-word`` switch to enable MFCC nonspeech masking, for
example:

.. code:: bash

    $ python -m aeneas.tools.execute_task --example-words --presets-word
    $ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word

If you use **aeneas** as a library, just set the appropriate
``RuntimeConfiguration`` parameters. Please see the `command line
tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__ for
details.

License
-------

**aeneas** is released under the terms of the GNU Affero General Public
License Version 3. See the `LICENSE
file <https://github.com/readbeyond/aeneas/blob/master/LICENSE>`__ for
details.

Licenses for third party code and files included in **aeneas** can be
found in the
`licenses <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__
directory.

No copy rights were harmed in the making of this project.

Supporting and Contributing
---------------------------

Sponsors
~~~~~~~~

-  **July 2015**: `Michele
   Gianella <https://plus.google.com/+michelegianella/about>`__
   generously supported the development of the boundary adjustment code
   (v1.0.4)

-  **August 2015**: `Michele
   Gianella <https://plus.google.com/+michelegianella/about>`__
   partially sponsored the port of the MFCC/DTW code to C (v1.1.0)

-  **September 2015**: friends in West Africa partially sponsored the
   development of the head/tail detection code (v1.2.0)

-  **October 2015**: an anonymous donation sponsored the development of
   the "YouTube downloader" option (v1.3.0)

-  **April 2016**: the Fruch Foundation kindly sponsored the development
   and documentation of v1.5.0

-  **December 2016**: the `Centro Internazionale Del Libro Parlato
   "Adriano Sernagiotto" <http://www.libroparlato.org/>`__ (Feltre,
   Italy) partially sponsored the development of the v1.7 series

Supporting
~~~~~~~~~~

Would you like supporting the development of **aeneas**?

I accept sponsorships to

-  fix bugs,
-  add new features,
-  improve the quality and the performance of the code,
-  port the code to other languages/platforms, and
-  improve the documentation.

Feel free to `get in touch <mailto:aeneas@readbeyond.it>`__.

Contributing
~~~~~~~~~~~~

If you think you found a bug or you have a feature request, please use
the `GitHub issue
tracker <https://github.com/readbeyond/aeneas/issues>`__ to submit it.

If you want to ask a question about using **aeneas**, your best option
consists in sending an email to the `mailing
list <https://groups.google.com/d/forum/aeneas-forced-alignment>`__.

Finally, code contributions are welcome! Please refer to the `Code
Contribution
Guide <https://github.com/readbeyond/aeneas/blob/master/wiki/CONTRIBUTING.md>`__
for details about the branch policies and the code style to follow.

Acknowledgments
---------------

Many thanks to **Nicola Montecchio**, who suggested using MFCCs and DTW,
and co-developed the first experimental code for aligning audio and
text.

**Paolo Bertasi**, who developed the APIs and Web application for
ReadBeyond Sync, helped shaping the structure of this package for its
asynchronous usage.

**Chris Hubbard** prepared the files for packaging aeneas as a
Debian/Ubuntu ``.deb``.

**Daniel Bair** prepared the ``brew`` formula for installing **aeneas**
and its dependencies on Mac OS X.

**Daniel Bair**, **Chris Hubbard**, and **Richard Margetts** packaged
the installers for Mac OS X and Windows.

**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
tuning sync maps in the browser.

**Willem van der Walt** contributed the code snippet to output a sync
map in TextGrid format.

**Chris Vaughn** contributed the MacOS TTS wrapper.

All the mighty `GitHub
contributors <https://github.com/readbeyond/aeneas/graphs/contributors>`__,
and the members of the `Google
Group <https://groups.google.com/d/forum/aeneas-forced-alignment>`__.
