How Technology Turns Written Words Into Audio

Transforming words that are written, typed or printed on paper into audible speech involves two distinct processes. First, Optical Character Recognition (OCR) software converts an image of the text into an editable digital text or document file; then Text-to-Speech (TTS) software converts that document file into an audio speech file. Both technologies are readily available as standalone software, and OCR software is also commonly bundled with scanning hardware.


Before OCR software can transform a digital image of printed or handwritten text into an editable text file, there must first be a digital image to convert. This image, often saved as a PDF file, is most commonly created with a document scanner. The most common type is the “flatbed scanner,” so called because the page lies face down on a flat sheet of glass while a light source and image sensor move beneath the stationary page, capturing an image of the entire page in a single pass. However, a usable image can also be made with an ordinary consumer-grade digital camera that has a sensor of 3 megapixels or more.
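As a rough check on the 3-megapixel guideline, the effective resolution of a photographed page can be estimated by dividing the sensor's pixel dimensions by the page's physical size. The sketch below is illustrative only and assumes a 2048 x 1536 sensor (about 3.1 megapixels) and a US Letter page (8.5 x 11 inches) that exactly fills the frame; real photos lose some pixels to margins and perspective.

```python
def effective_dpi(sensor_w_px, sensor_h_px, page_w_in, page_h_in):
    """Estimate dots per inch when a page is photographed to fill the frame.

    Assumes the page's long side is aligned with the sensor's long side; the
    dimension that yields fewer pixels per inch limits the usable resolution.
    """
    long_px, short_px = max(sensor_w_px, sensor_h_px), min(sensor_w_px, sensor_h_px)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    return min(long_px / long_in, short_px / short_in)

# Hypothetical 3.1 MP camera (2048 x 1536) photographing a US Letter page.
dpi = effective_dpi(2048, 1536, 8.5, 11.0)
print(f"{2048 * 1536 / 1e6:.1f} MP -> about {dpi:.0f} dpi")  # 3.1 MP -> about 181 dpi
```

The result, roughly 180 dpi, is below the 300 dpi commonly used when scanning documents, so a camera image is usable but gives the OCR software less detail to work with.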

OCR Software

OCR software allows a computer to recognize the individual characters in an image or photograph of text and convert them into digital text; in essence, the computer “reads” the page. The recognized text can then be saved in standard document formats, such as the RTF, DOC and DOCX files used by Microsoft Word or the PDF files opened in Adobe Reader.
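To make the idea of a computer “reading” characters concrete, the sketch below shows template matching, one of the oldest OCR techniques: each character image is compared with stored reference bitmaps, and the closest match wins. The tiny 3 x 5 glyphs and all function names are hypothetical, and the sketch assumes the page has already been segmented into individual character images; commercial OCR products use far more robust methods than this.

```python
# Hypothetical 3x5 bitmaps for a tiny alphabet ('#' = ink, '.' = paper).
TEMPLATES = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def mismatches(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def recognize_glyph(glyph):
    """Return the template character with the fewest mismatched pixels."""
    return min(TEMPLATES, key=lambda ch: mismatches(TEMPLATES[ch], glyph))

def recognize_line(glyphs):
    """'Read' a sequence of already-segmented glyph bitmaps into a string."""
    return "".join(recognize_glyph(g) for g in glyphs)

# A slightly noisy 'H' (one pixel missing) followed by a clean 'I'.
noisy_h = ["#.#", "#.#", "###", "#..", "#.#"]
print(recognize_line([noisy_h, TEMPLATES["I"]]))  # HI
```

Because the match is based on the smallest pixel difference rather than an exact match, a small amount of scanning noise still yields the right character.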

TTS Software

TTS software uses algorithms and mathematical models to synthesize intelligible speech from any suitable text or document file, whether or not that file was created by an OCR system. The resulting audio can be saved in any of the formats supported by popular audio devices, such as MP3.
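A TTS engine typically begins by normalizing the text, rewriting items such as digits and abbreviations as the words a speaker would actually say, before any sound is generated. The sketch below shows only that first stage, under simplifying assumptions: the abbreviation table is hypothetical and tiny, and only whole numbers from 0 to 99 are spelled out. Generating the audio itself requires an acoustic model or recorded speech units and is beyond a short example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table; real engines use larger, context-aware rules.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Mrs.": "Missus"}

def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"

def normalize(text):
    """Rewrite known abbreviations and 1- or 2-digit numbers as speakable words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Only standalone numbers of one or two digits are replaced; longer ones are left as-is.
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)

print(normalize("Dr. Watson is 35 years old."))  # Doctor Watson is thirty-five years old.
```

The normalized sentence is what a later stage of the engine would turn into sound, which is why a TTS tool can read “35” aloud correctly.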

Sources of OCR and TTS Software

There are a number of independent sources for OCR programs, but OCR software is also commonly packaged with scanners. For example, many Hewlett-Packard scanners ship with Readiris Pro OCR software, which integrates with the scanner-control application. Similarly, although TTS programs are available from independent sources, TTS is also built into recent versions of Microsoft's and Apple's desktop operating systems, as well as into the Windows, Android and iOS mobile operating systems. In addition, cloud-based services such as Google Docs include text-to-speech utilities.

Application Example

As a practical example, suppose you are planning a long road trip and want to listen to your favorite novel on your MP3 player, but the novel is not available as a commercial audiobook. To convert it, first scan its pages with a flatbed scanner. Next, use the scanner's bundled OCR software to convert the scanned images into an editable text file, such as a DOC or DOCX file. Finally, convert that file into a synthesized audio file, such as an MP3, using a free online service such as Zamzar.
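Raw OCR output usually needs light cleanup before it goes to a TTS tool: words split by end-of-line hyphens and hard line breaks inside paragraphs can make synthesized speech stumble. The sketch below is a minimal, hypothetical helper that assumes the OCR text has been exported as plain text rather than DOC/DOCX, and that it is convenient to convert a long book in smaller pieces (the 80-character limit is only for demonstration). Its hyphen rule will also wrongly merge genuine compound words, such as “well-known,” that happen to break at a line end.

```python
import re

def clean_ocr_text(raw):
    """Tidy OCR output so a TTS engine reads it smoothly.

    Rejoins words hyphenated at a line end (e.g. "remem-" + "bered"), then
    replaces single line breaks with spaces while keeping blank-line
    paragraph breaks.
    """
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text.strip()

def split_into_chunks(text, max_chars):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Hypothetical OCR output for one page of the novel.
raw_page = ("The road stretched on for miles. We drove through the remem-\n"
            "bered valley\nas the sun set.\n\nNight fell quickly.")
for i, chunk in enumerate(split_into_chunks(clean_ocr_text(raw_page), 80), 1):
    print(f"part {i:03d}: {chunk}")  # each part could become one audio file
```

Numbering the parts with zero-padded indices keeps the resulting audio files in reading order when an MP3 player sorts them by name.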

Author: vijayanand