Carl Franke

🎹 Music + 📖 Fiction + 📣 Marketing

Using Apple Books AI Based Digital Narration Tool For Audiobooks

While self-publishing novels with Amazon Kindle Direct Publishing, I was sometimes asked if I’d ever get audiobooks created. I always laughed, as it was way beyond my budget: they typically cost anywhere from $6,000 to $10,000 to produce. Of course, I really did want to create an audiobook.

I even dabbled with the idea of recording it myself in a padded closet at home, but didn’t want to deal with listening to my voice or investing in the proper microphone. Plus, my house is always loud with kids and cats, and slamming doors and cabinets. I’d have to record at 3AM nightly to get it recorded in silence. (Of course, there is AI denoising software, but…)

Years ago, we routinely rented Harry Potter audiobooks on CD from the Abington Township Library. My son loved to listen to them on long travels. They were fantastic, as different talents represented unique characters for the dialogue parts. It really brought the novels to life.

Fiverr and ElevenLabs To The Rescue?

Be Home By Dinner Audiobook

As I explored Fiverr.com, I realized some freelancers could produce the novel for about $5,000, but it was still too expensive. Also, the work of creating a character list of how to pronounce the many names and settings seemed challenging to do over a Zoom meeting or many emails.

ElevenLabs.io, which I’m really fond of, was a potential solution. But it would require that I get the $330 / month plan due to the word count. It also required spelling out uncertain words phonetically. For example, a pizzeria called Rosario’s would need to be re-typed as Rose-air-ee-ohs so that the AI tool could understand the correct way to pronounce the word.
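As a rough sketch of that respelling step, a small script could swap in the phonetic spellings before the manuscript text is pasted into a text-to-speech tool. This is only an illustration under assumptions: the `RESPELLINGS` table and the `apply_respellings` function are made-up names, not part of ElevenLabs or any other product, and the spellings themselves would need to be tuned by ear.

```python
import re

# Hypothetical respelling table: written form -> phonetic spelling.
# The entries are illustrative; a real list would cover every tricky name.
RESPELLINGS = {
    "Rosario's": "Rose-air-ee-ohs",
    "Kova": "Koe-Vah",
}


def apply_respellings(text, respellings=RESPELLINGS):
    """Return text with each listed word replaced by its phonetic spelling."""
    # Treat curly apostrophes like straight ones so "Rosario’s" also matches.
    text = text.replace("\u2019", "'")
    # Try longer entries first so a short name never clips a longer one.
    keys = sorted(respellings, key=len, reverse=True)
    # Only match whole words, so "Kova" does not rewrite part of "Kovacs".
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: respellings[m.group(1)], text)


if __name__ == "__main__":
    print(apply_respellings("We met at Rosario’s after Kova left."))
```

Keeping the respellings in one table means the original manuscript stays untouched, and the same name is spelled the same way every time it is sent for narration.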

A Free Solution?

For ebook syndication, I use Draft2Digital, which I recently discovered is an approved partner of Apple Books, and allows authors to generate audiobooks using Apple Books AI digital narration for FREE! All that you have to do is pick out the ideal voice to represent your novel and let the AI tool spend a few weeks with your ebook. It’s currently restricted to categories of romance, fiction, mystery and thriller, or science fiction and fantasy.

When I received notification that my novel Be Home By Dinner was published on Apple Books, I smiled because I knew that there was no way in hell that the AI tool could possibly have nailed the dozens of character names, locations, 1980s pop culture references and more. I wondered how it made a leap of faith in determining pronunciations. Why was I even participating in this? Was I just helping the Apple Books AI tool get smarter at my own expense?

On initial listen, I enjoyed the character voice that I had chosen. He was suitable for suspense, which is the genre of the novel. But… the “audiobooks without the overhead” definitely had its fair share of issues.

What Fell Apart With The AI Produced Audiobook

Character Names

I figured this would happen. But some character names, like Kova (the antagonist) took on a different pronunciation at different parts of the book. Sometimes it was Koo-Vah. Other times it was Kah-Vah. And sometimes it was the correct Koe-Vah. The name kept morphing, as if the AI narrator couldn’t agree on what to call this character, which I found odd since it’s a simple 4-letter name.

Author Name

Yes, even my name was butchered. Instead of stating Franke with a silent “E”, they included a hard “E”, like Frankie Goes To Hollywood. I felt like I was back in high school during roll call with a new teacher.

What I Miss About A Human Narrator

Mouth Noises

Yes, it sounds weird and gross, but I missed sounds of the human element. Fake breath noises are not part of the AI equation yet, let alone lip smacking or air sneaking through teeth. The AI voice is a bit dry and sterile, with a clockwork tempo. At times you want to rattle the robot and have it take a shot of whiskey to loosen up and expand its range.

Ambient Sounds

The AI voice is precise with perfect audio levels. But I miss the sounds of the room, like pages being turned or a glass of water being put down on a wooden table. The impurities of recordings are often the most endearing. The singer, Sting, accidentally sat on the piano during the recording of the song “Roxanne”, for example. The clang of piano keys was recorded, and The Police kept that in the song. I remember listening to a lot of the Beat Generation authors perform readings and hearing the cigarette exhalations and ice cubes tinkling in glasses, cars whizzing by, or uproarious laughter of someone nearby. It was more vulnerable and electric.

Lively Dialogue

When I read a novel, I create a dialogue voice for each character. I imagine most people do. It just happens naturally to help break up the reading. With AI narration, the voice adjusts a bit with a conversation between two people, but it sounds like a screenplay read by someone vaguely interested in auditioning for a part in a film adaptation. The emphasis is not as strong, especially for highly emotional scenes of distress, even with multiple exclamation marks or ALL CAPS.

Correctly Pronounced Words

For heteronyms, the AI tool seemed to work based on a coin toss. For example, the word “tearing” was supposed to be pronounced like “eyes tearing up”, but it was pronounced like “tearing up a piece of paper”. The correct context was picked up by the AI tool sometimes, but not always.

Onomatopoeia can be a bit of a train wreck. For example, the “psss psss psss” cat call sounds resulted in the narration spelling out each instance of these phrases, letter by letter. I laughed hard at that one. “Shhhh” was recognized, though.

Review Process?

An audio file that could be annotated would be the simplest solution, with a section that allowed authors to spell out the pronunciations of misspoken words. The file could be updated and the process automated until a green “Approved!” button is pressed. Maybe in the future?

Regardless of issues, I’m excited for the opportunity to have an audiobook at the ready for Be Home By Dinner. Check it out on Apple Books.

Looking to publish your own audiobook? Here’s how you can get started with your own Apple Books audiobook.

Creating Voice Narration With ElevenLabs Synthesis

What is ElevenLabs?

ElevenLabs specializes in creating natural-sounding speech synthesis and text-to-speech software, using AI and deep learning. 

The simple-to-use web software allows you to enter your script and select from over two dozen voice personalities. Each voice has separate settings that can adjust Stability (more variable to more stable), Similarity (low to high) and Style Exaggeration (none to exaggerated). Nudging each of these settings a bit can produce very different outcomes. Sometimes words are over-enunciated or extended in length, and sometimes a nervous chuckle gets thrown in. But the results are highly realistic and ideal for short passages such as a voicemail greeting or video narration.

ElevenLabs

Over the holidays, there was a spot-on “Santa” character that was as boisterous and jolly as you can get.

Voice Options

The Voice Library provides user-adjusted voices that you can add to your default voices. Descriptions really help with finding the most suitable voice. For example: “Middle Aged Man With British Accent”. Tags feature attributes such as wise, calm, sassy, formal, intense, modulated and pleasant, making it easier to find the ideal voice.

There’s also a multilingual speech model that’s able to generate life-like speech in 29 languages.

How Much Does ElevenLabs Cost?

Five different plans are available, starting at $0 and maxing out at $330 per month, with the higher tiers offering more features such as better quality, voice cloning, analytics and support.

What Does It Sound Like?

Here’s an example of the “back cover” description for BLIZZARD 96 using the character “Charlotte”:

Utilizing Ideogram AI To Create Custom Stock Photography

Sometimes you need a custom stock photo image. The stock image websites, like Unsplash, Pixabay and Pexels are excellent free sources. But when you have something super specific in mind, Ideogram.ai is one of many tools that could do the job for you.

Here are some leasing consultants giving tours to prospects, created with Ideogram:

What Is Ideogram?

Unlike many AI content creation sites, Ideogram.ai has a minimalist UI, with just a small question mark on the bottom right that links out to a few pages. Like another famous social media site that ends in “gram”, Ideogram has a social media style account system with a very similar feel to that original “gram”. You can choose whether you want your creations to be private or public, gain followers, “like” other posts and remix other accounts’ creations.

With your Google or Apple account, you can create an Ideogram account and then “Describe what you want to here”. From there, you can fine tune your aspect ratio and add stylistic effects such as:

  • Photo
  • Illustration
  • 3D Render
  • Typography
  • Cinematic
  • Poster
  • Dark Fantasy
  • Graffiti
  • Architecture
  • Conceptual Art
  • Ukiyo-e (I had to look this one up: It’s a Japanese art genre that includes woodblock prints and paintings from the 17th to 19th centuries.)
  • and more!

Creating A Fictional Apartment Video With Runway

A fictional apartment community teaser made with Runway. The fire pits are extreme, and dog heads morph into cats, but it’s an aesthetic that keeps getting refined. Just watch out for those expanding donuts.

Yes, that’s supposed to be a dog on that guy’s lap. It’s getting better, but pets seem to be on the eerie side so far.

What Is Runway?

“Runway was founded by artists on a mission to bring the unlimited creative potential of AI to everyone, everywhere with anything to say. Beyond our innovative technology and creative tools, we also strive to create platforms and initiatives that will empower and celebrate the next generation of storytellers.”

from RunwayML.com

Check out a free trial version of Runway and you can create these types of content based on descriptive queries and more:

  • Video to Video
  • Text / Image to Video
  • Generative Audio
  • Background Removals
  • Text to Image
  • Image to Image
  • Infinite Image

There are many other video effects tools as well.

Baron Ryan – TikTok Creator

Baron Ryan is an entertaining TikTok creator who is breaking boundaries on what you can achieve with one person and an iPhone. His sketches seem inspired by the Wes Anderson cinematic aesthetic, and he packs social commentary with humor that will make you ponder.

This video of him having a contemplative chat with his reflection in a train window is a great example:

I sometimes close my eyes and imagine a hotel and behind every door are all the people I could have been and lives I could have lived, infinite beginnings.

I’m already nostalgic for people I’ll never be.

But in the end, you gotta have mercy on yourself, even if you don’t do everything you’ve set out to do in life.

At first you want your dreams to be real. But you realize that some dreams are nicer as dreams. They can be there and you open those doors and live those lives anytime you like on the other side.

Besides, a memory and a fantasy both live in the head.

Baron Ryan
@americanbaron

All the people we could have been. Life could have turned out any which way and every which way comes with its own cocktail of inconveniences. The people we could have been look at us and imagine they could have been us. #loop #life #nostalgiacore

The Creative Act: A Way Of Being

If you’re in a rut with any creative endeavor, or need a new vantage point, this book is an excellent compass. Packed with micro-chapters on every possible step on the journey to creating a song, painting, novel, design, web site, or anything else, this book will help you realize that those eureka moments in the shower, long gazes at inanimate objects, and a stumped lack of progress are all just part of the process. Great read, void of fluff, straight to the point: The Creative Act: A Way Of Being by Rick Rubin

The Black Sheep & The Black Eel

The night before was a merry stew. We left Ocean City and cruised to Sea Isle City to roam around some shops, play miniature golf, and then drink it up early at some place called the Dead Dog Saloon. We were there pounding pints quite early in the evening, eating greasy appetizers. Allyson was pregnant, so she was our driver.

The Dead Dog was a step above a dive bar, low key. But after a few beers, I was told by the manager that I had to either wear a collared shirt or vacate the premises. My Jameson Irish Whiskey graphic t-shirt that I remember vividly getting on my 30th birthday was suddenly equivalent to a swastika at 8:00 pm, and it needed to be covered.

Of course, they sold official Dead Dog Saloon polo shirts there, so I bought a white one and wore it sloppily over my t-shirt. I flipped up the collar, buttoned all the buttons, and mocked the notion that a collared shirt was necessary, as if we were in a private country club. I walked around the bar, chatting with others who had also been notified, and enjoyed my drunken glory.

I awoke early the next morning with a slight headache, but I needed to get up, as I was going on my first deep sea fishing trip with Harry, my father-in-law, and my bro-in-law Ray.

Now, if you angle it right, everyone can be deemed the black sheep of their family, but for me, I felt I always ran a bit blacker.

With my family growing up, my Mom, Dad, and sister were all nurses. We’d sit in the dining room in my late teens and eat saucy lasagna while they talked about blood and bodily fluids, which always led to a shush from me, or I’d just stammer off with my plate to the den.

My sister was a socialite in high school, always throwing parties and going out. I just stayed in my bedroom and reorganized my baseball cards, waiting for the promise of college freedom, watching Friday sitcoms that nobody watched.

During the holidays, my sister, Mom, and Aunt would dance to pop stars, like Bon Jovi, joyously after a glass of wine, as I sat in the corner wishing I could blare the Pixies. They would call me “Jesus” as my long stoner hair, scruffy beard, and effortless flannel wardrobe clashed with the whole look of the family. I didn’t really care, though. I just felt like an oddball, although mighty comfortable in being just that.

Now, I was married and had joined a whole new family. The in-law dudes (father and three bros) were all heavily into hunting, fishing, home repair, and sports, particularly NHL and NFL. All of those items resulted in a big fat zero of interest for me, so I was quickly lost in their conversations.

Growing up, none of my friends or family hunted, so it was very foreign to me. I’ve never even held a gun, except for the fake one that I often whip out and shoot my cat with. Once you’re in your mid-30s, you know what you want to pursue in life, and you easily check out and dive into what you dig the most. For me, I couldn’t care less if I never ate another piece of meat for the rest of my life. As for NASCAR and televised sports, it really didn’t matter to me if they all vanished and were replaced by non-stop Cosby Show reruns.

It’s not that I was a black sheep with my newly expanded family; I was a black sheep with the typical Philadelphian male, I suppose. My interests didn’t lie in building additions to a home or car repair. My focus was on HTML5, CSS3, jQuery and building the best web sites possible for modern browsers, as my livelihood depended upon it. My career as a web professional was taking over my life. It was the only way to thrive in that profession. Pixels and code were my building blocks. Coffee and beer were my engine. Writing and music were my release.

So, now here I was, about to embark on an early AM fishing trip with some seasoned deep sea fishery folk. I was always easily carsick as a kid, from the days of my parents driving me around town. I originally thought that my parents were just bad drivers, but they weren’t.

I always preferred to drive. I insisted on driving for the fishing trip and took us to a Wawa for some grub, although I was the only one that seemed to be craving anything. I bought a coffee and a bag of Fritos Corn Chips.

While waiting to board the charter boat, I crunched down the chips and pounded the coffee and I felt, well, shitty, but at least more awake. Soon we were on the boat. I felt glad that I had finally joined Harry on one of these journeys, as he was always asking me to come along. Maybe it was the long lost missing link of my life that I needed, I thought.

The charter boat filled up with about forty people and we all hung along the railings as the engine chugged us out deeper into the ocean. The deep sea fishing poles seemed simple enough, as you just dropped your baited hook into the water.

Eventually the boat stopped, as if we had reached a precise destination. With the engine off, the boat instantly started rocking heavily with the wind and water, pushing the horizon up and down and jostling instant nausea into my system, as if something was jarred in my brain and I could no longer focus.

I darted to the men’s room, the one tiny men’s room on the boat, and vomited heavily the full yellow corn chip mush into the toilet — well, as much as I could into the toilet. My extended arms held onto the walls for support, as I would have fallen over otherwise. I tried to clean up the mess the best I could and then proceeded back out.

Harry had a baited pole ready for me and immediately knew I had yacked, pointing out that I was pale and unstable looking.

“Yeah, I got it out of me,” I said, grabbing hold of the pole. I was proud that I had made it to the toilet on time, getting it out of my system and ready to catch some bluefish, tuna, weakfish, flounder — anything. Maybe we could grill it up later, I thought.

We all hovered over the railing looking down at the water. But then it hit me again. The nausea was instant and relentless. I threw up into the water, leaving a trace of vomit alongside the boat, holding tight to my pole. Holy fuck is this embarrassing, I thought. Chunks of puke lined my sweatshirt as I couldn’t help but act like a 17-year-old girl that did shots of whiskey for the first time and was ruining the party for all.

Suddenly, I felt tension on my line and knew that I was either catching a fish or a heavy piece of debris. Harry noticed I was fading out and helped me reel in the sucker as I could barely hold onto the rod.

Out of the water wriggled a testy slimy black eel, about four feet long. One of the crewmen came over and told me to just pull it in and I dropped it on the deck. The damn thing writhed around relentlessly. It was like a massive piece of black licorice that had come alive, trying to slap us all in the face. The crewman held it down with gloved hands and then pounded several times on the eel’s head with a mallet. Blood spurted around the deck and it eventually relaxed. The crewman tossed the eel back into the sea and then cleaned up the mess with a mop and bucket.

Now the nausea that had overcome me was leagues above any flu or hangover barf scene that I had ever experienced. With the flu, you may vomit for twenty minutes, but then you fall back asleep for hours. This was a non-stop assault that I couldn’t escape. In fact, the vomiting part was actually the better part, allowing me to attain temporary relief. The waiting in between gags was the hell.

I wandered around the boat trying to find a sweet spot of relief, but such a location didn’t exist. I tried to smile at the people happily fishing, acting like I was ambling towards a destination. I went into the dining area where I heard they were selling Dramamine. I bought a couple pills and swallowed them down. Some old fella chuckled and told me the pills needed to be taken hours before getting on the boat. “Those will just make you sleepy at this point.”

I sat in a booth for four by myself, gripping the table, and tried to focus on the horizon. It didn’t work at all. Also seated were a couple other seasick guys. I saw one dude vomit and I immediately gagged and tossed up some more onto the floor. Liquid chunky orange goo cascaded back and forth…and back and forth…sliding back and forth on the floor. One kid was about eye level with a trash can and stuck his whole head inside of it to yack.

Just a little over three more hours of this, I thought, holding onto the railing toward the end of the boat. Nobody was around there. Harry came over eating some scrambled eggs from the kitchen as the wind blew towards him. I warned him that I was about to hurl and that the wind might blow it towards his face. He got out of the way, letting me know that he was being easy on me. He described how he originally wanted to shove bait into his mouth and talk to me.

I already felt like a douche because my chest and back were sunburned from trying a new “spray” sunscreen. It was like cooking spray, but didn’t work on my pasty skin at all, leaving a large red spot on my chest and stomach that resembled Pangaea.

Finally we were headed back to the dock. I was happy to hear the engine roaring and see us zip evenly across the ocean. At the dock, I stepped onto the deck and slipped and fell. I laughed at myself and got back up. What the fuck did I care, really? Nobody had caught one single fish. It was just the black eel and me, bloodied and butchered.

Back at the beach house, my wife and I decided to go out to eat at some Italian place down the street. We sat at a table outside and dipped bread in olive oil and watched a fender bender occur, right in front of us. Everyone was fine. A cop showed up. Traffic built up. Waitresses brought out entrees. Fresh water with lemon. Peppered cheese. Prodding jokes. Focused eyes, a goofball back on firm land where he belonged.

Perplexity.ai vs. Google SGE

Perplexity.ai, touted as “where knowledge begins”, is a research engine that utilizes AI and natural language predictive text to answer questions. Similar to Google SGE, it cites sources (with direct links) and has a “follow up” chat section. Unlike Google though, there is a paid Pro version that allows you to tap into a “smarter AI and more Pro Search” and generate images for $20 / month. (I haven’t tried that.)

What’s welcome about Perplexity.ai is the clean interface and lack of ads. With your “ask anything” prompt, you can then choose a “Focus” setting of:

  • All (Search across the entire internet)
  • Academic (Search published academic papers)
  • Writing
  • Wolfram | Alpha (Computational knowledge engine)
  • YouTube
  • Reddit (Search for discussions and opinions)

Another cool tool is the Library, which stores all of your sources and allows you to tap back into them for the answers. You can also group your searches into Collections and then create Secret or Shareable links to them.

If you’re looking for a unique break from Google and want to give your searches a stronger collective purpose, the Perplexity.ai free version is worth a spin.

Fender Rhodes on Perplexity.ai

Using DALL-E To Create Graphics You’d Never Visualize In Photoshop

The theme of this year’s Pennsylvania Apartment Association APARTogether conference was AI. With the event taking place at Valley Forge Casino Resort, I figured it would be a great opportunity to use OpenAI’s DALL-E imagery and show George Washington leading a state-wide community of apartments, infused with a gambling theme!

Included with ChatGPT 4.0, DALL-E is one of many “text-to-image” AI tools that bring imaginative visions to life. Before these types of tools, if you asked me to create an illustrative graphic with a Founding Father standing amid a bustling array of apartment towers, I’d likely juggle several stock art photos, vectorize them and try to apply cohesive color, levels and saturation to each layer.

Speaking of gambling, text-to-image tools feel like a slot machine at times, with the results of your query (like a bet) sometimes leading to fantastic rewards. Even if you don’t like the results, they often provide inspiration to create something even better.

Pennsylvania Apartment Association at Valley Forge Casino
Pennsylvania Apartment Association at Valley Forge Casino, near King of Prussia
Pennsylvania Apartment Association at Valley Forge Casino with Philadelphia in the background

Making Art In the Age Of Content

If you consider yourself an artist in any capacity, this video is for you: “Making Art In The Age Of Content”

Some highlights for me:

➡️ “The most modern form of art: Create something to serve the algorithms in an attempt to make it go viral.”

➡️ “There is no way to have a daily process that’s repeatable with guaranteed results, but these platforms encourage that and they force you to have higher output and open up the chances that something is going to take off.”

➡️ “The democratization of self expression is what turned everything into a competition.”

➡️ “No matter how much you put your heart and soul into something, it’s just as disposable as everything else on any given platform.”

➡️ “The algorithms exist, even though it’s not their intention, to continue to perpetuate a divisiveness between real artists and content creators.”

© 2024 Carl Franke
