Since I started self-publishing novels with Amazon Kindle Direct Publishing, some readers have asked if I'd ever create audiobooks. I always laughed, because audiobooks typically cost anywhere from $6,000 to $10,000 to produce, which was way beyond my budget. Of course, I really did want to create one.
I even toyed with the idea of recording it myself in a padded closet at home, but I didn't want to deal with listening to my own voice or investing in a proper microphone. Plus, my house is always loud with kids, cats, and slamming doors and cabinets. I'd have to record at 3 AM every night to get any silence. (Of course, there is AI denoising software, but…)
Years ago, we routinely borrowed Harry Potter audiobooks on CD from the Abington Township Library. My son loved listening to them on long trips. They were fantastic, with distinct voices for the different characters in the dialogue. It really brought the novels to life.
Fiverr and ElevenLabs To The Rescue?
As I explored Fiverr.com, I realized some freelancers could produce an audiobook of the novel for about $5,000, but that was still too expensive. Also, putting together a character list explaining how to pronounce the many names and settings seemed challenging to do over a Zoom meeting or a long chain of emails.
ElevenLabs.io, which I'm really fond of, was a potential solution. But because of the word count, I would need the $330/month plan. It also required respelling uncertain words phonetically. For example, a pizzeria called Rosario's would need to be retyped as Rose-air-ee-ohs so that the AI tool would pronounce it correctly.
A Free Solution?
For ebook distribution, I use Draft2Digital, which, I recently discovered, is an approved Apple Books partner that allows authors to generate audiobooks using Apple Books AI digital narration for FREE! All you have to do is pick the ideal voice for your novel and let the AI tool spend a few weeks with your ebook. It's currently restricted to the romance, fiction, mystery and thriller, and science fiction and fantasy categories.
When I was notified that my novel Be Home By Dinner had been published on Apple Books, I smiled, because I knew there was no way in hell the AI tool could have nailed the dozens of character names, locations, 1980s pop culture references, and more. I wondered how it could make the leap of faith needed to determine pronunciations. Why was I even participating in this? Was I just helping the Apple Books AI tool get smarter at my own expense?
On first listen, I enjoyed the narrator voice I had chosen. He suited suspense, which is the novel's genre. But… "audiobooks without the overhead" definitely had its fair share of issues.
What Fell Apart With The AI-Produced Audiobook
Character Names
I figured this would happen. But some character names, like Kova (the antagonist), took on different pronunciations in different parts of the book. Sometimes it was Koo-Vah. Other times it was Kah-Vah. And sometimes it was the correct Koe-Vah. The name kept morphing, as if the AI narrator couldn't decide what to call this character, which I found odd since it's a simple four-letter name.
Author Name
Yes, even my name was butchered. Instead of pronouncing Franke with a silent "E", the narrator voiced the "E", as in Frankie Goes To Hollywood. I felt like I was back in high school during roll call with a new teacher.
What I Miss About A Human Narrator
Mouth Noises
Yes, it sounds weird and gross, but I missed the sounds of the human element. Fake breath noises are not part of the AI equation yet, let alone lip smacking or air slipping through teeth. The AI voice is a bit dry and sterile, with a clockwork tempo. At times you want to rattle the robot and have it take a shot of whiskey to loosen up and expand its range.
Ambient Sounds
The AI voice is precise, with perfect audio levels. But I miss the sounds of the room, like pages being turned or a glass of water being set down on a wooden table. The imperfections of recordings are often the most endearing. During the recording of "Roxanne", for example, Sting accidentally sat on the piano. The clang of the keys was captured, and The Police kept it in the song. I remember listening to many recordings of Beat Generation authors performing readings and hearing the cigarette exhalations, ice cubes tinkling in glasses, cars whizzing by, or the uproarious laughter of someone nearby. It felt more vulnerable and electric.
Lively Dialogue
When I read a novel, I create a dialogue voice for each character. I imagine most people do. It just happens naturally and helps break up the reading. With AI narration, the voice adjusts a bit during a conversation between two people, but it sounds like a screenplay read by someone only vaguely interested in auditioning for a part in the film adaptation. The emphasis is not as strong, especially in highly emotional scenes of distress, even with multiple exclamation marks or ALL CAPS.
Correctly Pronounced Words
With heteronyms, the AI tool seemed to rely on a coin toss. For example, the word "tearing" was supposed to be pronounced as in "eyes tearing up", but it was pronounced as in "tearing up a piece of paper". The AI tool picked up the correct context sometimes, but not always.
Onomatopoeia can be a bit of a train wreck. For example, the "psss psss psss" sounds used to call a cat resulted in the narrator spelling out every instance. I laughed hard at that one. "Shhhh" was recognized, though.
Review Process?
The simplest solution would be an audio file that authors could annotate, with a section for spelling out the correct pronunciations of misspoken words. The file could then be regenerated, repeating the process automatically until the author presses a green "Approved!" button. Maybe in the future?
Despite the issues, I'm excited to have an audiobook ready for Be Home By Dinner. Check it out on Apple Books.
Looking to publish your own audiobook? Here’s how you can get started with your own Apple Books audiobook.