OCR to Text to Speech

For the sixth video in the Design Yourself series the group worked with artist Erica Scourti. For the activity the participants used optical character recognition software (OCR) to generate poetry from their own handwriting and writing (leaflets, signage) found throughout the Barbican building.

The next stage in the workshop was going to be to take this extracted text and run it through a text to speech synthesizer, but unfortunately there wasn’t time to get to this stage.

One of the things I liked about the software they used was that it showed you the image of the text that it recognised and extracted, producing a kind of cut-and-paste poetry.

To make the sixth video I wanted to somehow utilize this OCR and text-to-speech process and make a video collage of words and synthesized speech. The challenge was finding a way to do this using only open source software. Finding open source OCR software that works on Linux is not a problem. After a while I discovered that Tesseract is the gold standard for OCR software and that most other software acts as a frontend or interface to it. Here are a few examples:

However, they all output only the text, and not the image of the extracted text. I’m aware that my use case is quite specific so I don’t blame the developers for this.

Eventually I took to Twitter and Mastodon with my questions. _vade pointed to a bug report on Tesseract which showed that getting the coordinates of recognised words is possible in Tesseract. If I knew the coordinates of words then perhaps I could use them to extract the image of each word. However, doing it this way required using its C interface, and learning C wasn’t feasible at the time.

After some further digging around Tesseract I found a bug report that makes reference to hOCR files:

hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML.

This file looked like it contained the coordinate data I needed, and obtaining such a file from Tesseract was as simple as running one command. The next task was finding a tool (or tools) to interpret hOCR files. Here’s a selection, which should really be added to the previous list to form a mega-list:

hocr-tools proved to be the most feature complete and stable. It runs on the command line, which opens it up for easy automation and combining with other programs. After reading the documentation I found a process for extracting the images of words and even making videos from each word/sentence with synthesized speech. Here’s how I did it:

Generate hOCR file

First I needed to generate an hOCR file using Tesseract. For the example I used the first page from the first chapter of No Logo by Naomi Klein.

tesseract book.jpg book hocr

This produced a file called book.hocr. If you look at the source code of the file you can see it contains the bounding box coordinates of each word and line.
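
To see what that coordinate data looks like, here is a minimal sketch that prints the bounding box and text of each word. It is not part of the original workflow. The file sample.hocr and the function name hocr_words are made up for illustration, and the sample markup only imitates the general shape of Tesseract's word spans; real output may quote attributes differently or wrap words in extra tags, in which case the pattern would need adjusting.

```shell
# Hypothetical sample imitating the shape of hOCR word markup (not real Tesseract output).
cat > sample.hocr <<'EOF'
<span class='ocr_line' id='line_1_1' title="bbox 36 92 580 122; baseline 0 -5">
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>No</span>
<span class='ocrx_word' id='word_1_2' title='bbox 109 92 215 116; x_wconf 93'>Logo</span>
</span>
EOF

# Print "x0 y0 x1 y1 word" for every ocrx_word span in the given file.
# Assumes one word span per line and single-quoted attributes, as in the sample.
hocr_words() {
  sed -n "s/.*class='ocrx_word'[^>]*title='bbox \([0-9]*\) \([0-9]*\) \([0-9]*\) \([0-9]*\)[^>]*>\([^<]*\)<.*/\1 \2 \3 \4 \5/p" "$1"
}

hocr_words sample.hocr
```

The four numbers are the left, top, right and bottom edges of the word in pixels, which is the information needed to cut the word out of the original image.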

Extract images

Using the hOCR file I can extract images of the lines:

hocr-extract-images -P 10 book.hocr

This generates a png of each line and a corresponding text file containing that line’s text.
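
For cropping individual words rather than whole lines, the bounding box has to be turned into a crop region. The sketch below is only an illustration of that arithmetic, not a description of how hocr-extract-images works internally. The function name bbox_to_geometry is hypothetical, and the WIDTHxHEIGHT+X+Y output format assumes the crop would be done with ImageMagick, which is not used in the original process.

```shell
# Convert a bounding box (x0 y0 x1 y1) plus optional padding into a
# crop geometry of the form WIDTHxHEIGHT+X+Y.
bbox_to_geometry() {
  x0=$1 y0=$2 x1=$3 y1=$4 pad=${5:-0}
  # Grow the box by the padding, but never start left of or above the image.
  left=$(( x0 - pad ))
  if [ "$left" -lt 0 ]; then left=0; fi
  top=$(( y0 - pad ))
  if [ "$top" -lt 0 ]; then top=0; fi
  # The right and bottom edges are not clipped because the image size is unknown here.
  width=$(( x1 + pad - left ))
  height=$(( y1 + pad - top ))
  echo "${width}x${height}+${left}+${top}"
}

bbox_to_geometry 36 92 96 116 10   # prints 80x44+26+82

# Possible use, assuming ImageMagick is installed (not run here):
#   convert book.jpg -crop "$(bbox_to_geometry 36 92 96 116 10)" +repage word.png
```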

Text to speech

Using eSpeak I can generate a wav file of a synthesized voice reading each line.

for file in *.txt ; do espeak -z -f "$file" -w "${file%.*}.wav" ; done

Make video clips

Finally, I needed to combine the png image of the text with the wav file of the synthesized speech into a video:

for i in $(seq -f "%03g" 1 83) ; do ffmpeg -loop 1 -i "line-$i.png" -i "line-$i.wav" -c:v libx264 -tune stillimage -vf scale="width=ceil(iw/2)*2:height=ceil(ih/2)*2" -pix_fmt yuv420p -shortest -fflags +shortest "$i.mp4" -y ; done

I used -fflags +shortest because without it each video ended up with a couple of extra seconds of silence.
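
The loop above hard-codes the number of lines (83 for this page). As a hedged alternative, the sketch below loops over whichever line-NNN.png files exist and skips any without a matching wav file. The function name make_clips and the DRY_RUN switch are made up for illustration; with DRY_RUN set, the ffmpeg commands are only printed so they can be checked before anything is encoded.

```shell
# Build one clip per line-NNN.png that has a matching line-NNN.wav.
# DRY_RUN is a hypothetical switch: when set, commands are printed, not run.
make_clips() {
  for png in line-*.png; do
    [ -e "$png" ] || continue          # the glob matched nothing
    base=${png%.png}                   # e.g. line-001
    wav=$base.wav
    if [ ! -f "$wav" ]; then
      echo "skipping $png: no $wav" >&2
      continue
    fi
    # Same ffmpeg options as the one-liner above; output is named 001.mp4 etc.
    ${DRY_RUN:+echo} ffmpeg -y -loop 1 -i "$png" -i "$wav" \
      -c:v libx264 -tune stillimage \
      -vf "scale=width=ceil(iw/2)*2:height=ceil(ih/2)*2" \
      -pix_fmt yuv420p -shortest -fflags +shortest \
      "${base#line-}.mp4"
  done
}

# Preview the commands without running ffmpeg.
DRY_RUN=1 make_clips
```

Running make_clips without DRY_RUN set would perform the encoding, assuming ffmpeg is installed.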

By the end of this I had a folder full of lots of files.

Voila!

The last part of this was to manually arrange the video clips into a video collage. I made this example video to demonstrate to the group what could be done.

Getting to this point took some time but with what I’ve learnt I can replicate this process quickly and simply. In the end the group decided not to use OCR to generate text and instead wrote something themselves. They did still use text-to-speech software and even filmed themselves miming to it. Here’s the finished video:

This was the last video I made for the Design Yourself project. I’ve written about techniques used to make older videos in past blog posts. Go read the Barbican website for more information on the project.

Copy Paste photos

Photos of the Copy Paste exhibition currently taking place at Piksel in Bergen, Norway, and online in the Piksel Cyber Salon

Copy Paste

The full list of exhibiting artists includes Carol Breen, Constant, LoVid, Lorna Mills, Matthew Plummer-Fernandez + Julien Deswaef, Duncan Poulton, Eric Schrijver, Peter Sunde.

Photos taken by Maite Cajaraville. More photos can be seen here.

If you have the opportunity to see the exhibition in person please do! It’s open until 21st June.

Copy Paste opening

Copy Paste opened at Piksel in Bergen on the evening of Friday 22nd May. Sadly neither I nor any of the exhibiting artists could be there, but fortunately Piksel live streamed the whole thing.

After all of the uncertainty about whether Copy Paste would go ahead I’m really happy that the situation in Bergen has been good enough for the exhibition to welcome visitors. It was sad not to be in Bergen myself to see everything IRL but I’m thankful to Maite and Gisle, Directors of Piksel, for handling all of the logistics and installation of the works.

Copy Paste exists in two spaces. In the physical studio space visitors can find works by Carol Breen, Constant, Lorna Mills, Duncan Poulton, Eric Schrijver, and Peter Sunde.

Here’s a pixelated look at some of the artworks as captured by me in the UK from the livestream:

Copy Paste also exists as a virtual online exhibition in the Piksel Cyber Salon.

The space is built using Mozilla Hubs and works in your web browser (or VR headset if you have one). In the Cyber Salon you can find works by LoVid, Matthew Plummer-Fernandez + Julien Deswaef, Carol Breen, and Duncan Poulton.

Many thanks to Malitzin Cortes for designing this space. You can visit it at any time and all of the live streamed events will also be streamed there.

Events

Speaking of events, check out these upcoming events happening as part of Copy Paste!

Curator’s Tour

24th, 31st May, 7th, 14th, 21st June 13:00 – 14:00 CEST
Each Sunday at 13:00 – 14:00 CEST I’ll be giving a tour of the exhibition (remotely, obvs), talking a bit about each artwork and how they contribute to the exhibition and explore ideas around copying.

Live Coding Algorave Performance with Alex McLean and Antonio Roberts

29th May 23:00 – 00:00 CEST
On 29th May 23:00 – 00:00 CEST Alex McLean and I will be doing a live coding performance. Alex will be doing his usual patterns of sample based music and visually I’ll be mixing things up a bit.

Authors of the Future

6th June 18:00 – 20:00 CEST
An online presentation from Constant of Authors of the Future, with a focus on the Cinemas Sauvage license. This license shows the pitfalls and fun (im)possibility of coming to an agreement with a bunch of anarchist people who do not want to agree on a rule.

Internet Archaeology for Beginners

7th June 16:00 – 18:00 CEST
Join artist Duncan Poulton on 7th June 16:00 – 18:00 CEST for a virtual workshop which offers an introduction to techniques for mining and misusing the web for creative reuse. Attendees will visit the depths of the internet that search engines don’t want you to find, and learn to make their own digital collages from the materials they gather.

To book onto Duncan’s workshop and find out more about the other events send an e-mail to piksel20(at)piksel(dot)no

Hope y’all enjoy the exhibition!

Copy Paste – 22nd May – 21st June 2020

I’m happy to announce that I am curating the exhibition Copy Paste, taking place at Piksel in Norway from 22nd May – 21st June.

If you are an artist, then you have no doubt copied the work of others. This copying can range from using pages from a magazine in collage, to adopting the style of an artist, to simply being inspired by the work of an artist. This natural process of copying, taught to us at every stage of our artistic development, is burdened by a very complex and messy set of laws and social conventions which define and limit how we can use copying within our practices. These don’t take into account exceptions or nuances. They come from a historical world where artworks were scarce physical objects, and they don’t translate well into a world where culture is abundant and can be accessed and copied at will.

Being an artist who copies is to be an artist working in this murky grey area of right and wrong.

For those artists, I have curated Copy Paste. This exhibition features the work of nine artists and art collectives who all incorporate copying as a core aspect of their work. Taking the form of a physical exhibition at Piksel in Norway, an online exhibition, and an event series, the exhibition aims to show that copying is natural.

Exhibiting artists: Carol Breen, Constant, LoVid, Lorna Mills, Matthew Plummer-Fernandez + Julien Deswaef, Duncan Poulton, Eric Schrijver, Peter Sunde.

There will be a few events happening throughout the exhibition, taking place online and IRL at Piksel’s studio in Norway. More details of that will follow here and on Piksel’s website. Thanks so much to Piksel for the invitation to curate this exhibition 🙂

Gifhouseparty – 16th May 2020

On 16th May 22:00 BST I’ll be doing a gif-infused audiovisual performance for the Well Now WTF? exhibition.

In the 30 minute performance I’ll be live coding music and sounds for all of the gifs currently stuck at home! If you want to be at the party send me a gif of yourself dancing against a green screen or solid colour by 12th May and tune in to the performance over on Twitch 🙂