
Hi, I’m Jörg Stroisch. I’ve been working as a journalist for over 20 years and also as an agile coach, an agile journalist, so to speak. I also work on innovation topics, especially design thinking.
I took an event organised by the European Parliament, which I also wanted to cover in my podcast German Vote, as an opportunity to use AI for as many steps of development and production as possible. Later, I refined my toolbox even further.
Here are my tips and experiences!
Please feel free to connect with me on LinkedIn or Instagram.
By the way, this podcast episode is spoken by my voice clone. That is quite deliberate, so that you can judge for yourself whether you find the quality convincing.
In fact, I have tried out various tools and learned a few lessons the hard way. It was important to me not to end up with subscriptions, but in the end I didn’t manage that. Still, some of the tools I use are pay-per-use.
Transcription with Alrite and Clean Voice: I use these two tools a lot anyway. They help me transcribe and clean up the audio. However, the transcription in Clean Voice wasn’t particularly good, whereas the one in Alrite was, once again, almost perfect. Both tools are pay-per-use.
ChatGPT: For the EU discussion, I gave ChatGPT details of the panellists and the transcript with the speakers assigned, and asked it to create a radio package for me, that is, a produced report combining narration and quotes. Overall, this is a workable way to create a coherent script. And I have to say, the result was perfect. As it turned out later, it was too perfect! Unfortunately, ChatGPT had also tidied up the verbatim speech: the quotes were no longer accurate at all, but were themselves summaries and inventions. I’ll have to rework the prompt. I used the free version.
Descript: ChatGPT recommended Descript to me as a transcription tool, but also as a tool for easily isolating and cleaning up the participants’ quotes and then exporting them as audio. This worked really well, although I would advise against using its sound clean-up. Removing the “ehs” didn’t work well either, partly because Descript only supports English for this. Descript is only available as a subscription.
Clean Voice: I used Clean Voice to cut out the “ehs”. That worked well here, although it leaves some unsightly gaps. Overall, however, it works very well. Clean Voice is pay-per-use.
Fish Audio: I have tested a huge number of voice cloning tools. To put it mildly, most of them are real rubbish. Descript’s voice cloning isn’t good either, and it only works in English. Fish Audio is inexpensive and does a very good job of matching the pitch of my voice. The rest is debatable. In any case, it isn’t entirely effortless, although there is now a kind of studio mode that makes everything much more convenient. You’re listening to the voice clone right now, and you’re welcome to rate it. To create it, I trained the tool on roughly three minutes of spoken text, once in German and once in English, because that actually makes quite a big difference to the result. I used recordings that I made a few weeks ago in the studio at the European Parliament, so they were of almost perfect technical quality. I have already tested various providers. ElevenLabs could be a good alternative, but I haven’t tested it yet.
Voiceovers: Voiceovers, for instance translating interview passages into other languages and having a narrator read them, are also very important to me. AI works very well for this, and Descript does it really, really well, in both German and English. The drawback: Descript’s time budget is far too tight for my use cases. That’s why I use Speechgen for English texts. This service is also pay-per-use. However, it only works well for English-language texts and isn’t quite as good as Descript. ElevenLabs would also be a very good alternative.
Audio editing: For the EU piece, I loaded the various audio fragments into an editing programme and did a lot of the editing there. But I now always do the rough cut in Descript, because there I can shorten and rearrange large parts directly in the text. Unfortunately, this doesn’t work perfectly yet; you can tell that Descript has been optimised for the American market. But it already works very well. I then do the fine cut afterwards in timeline mode. So far, I have produced one podcast episode entirely in Descript, including its recording function, which also worked very well. When I record audio outside of Descript, Descript applies a quality correction that cannot be switched off. Why do I mention this? In some cases this correction doesn’t work at all, and the audio is often very poor afterwards. Clean Voice doesn’t help here either. As a test, I plan to try the latest features in DaVinci Resolve.
Social Media Marketing: Descript’s suggestions for reels work really well. Incidentally, that is the reason I chose Descript; Riverside produced much worse results for me. You can also save templates, which makes production really fast. Out of five suggestions, I can always use between two and four, which is enough for me.
Summary: Unfortunately, there is no magic wand that you simply wave to make everything perfect. But without the extensive use of artificial intelligence, I wouldn’t be able to produce my podcasts at this frequency and quality.
I’m still testing things out here. I recently recorded my voice with a Sennheiser lavalier (clip-on) microphone. Unfortunately, the background noise was very loud. I immediately filtered it out on the computer using a program from Nvidia. The result was mediocre at best. Descript then made the audio much worse, so I ended up editing everything in my editing programme.
I want to try out DaVinci Resolve more for this soon. But with my little voice booth, the recording is already better at the source, which is of course always the better strategy.
I now have the $39 monthly, cancellable subscription to Descript, and I subscribe to Fish Audio month by month as needed. I definitely use Alrite and Speechgen a lot. I only use Clean Voice when there are a lot of “ehs”. Otherwise, and in addition, I use Descript’s timeline mode. Descript is a revelation anyway. The text editing mode is just great, and the post-processing in timeline mode saves me an incredible amount of time. What I currently find disappointing is that the transcription isn’t very good: it is cleaned up, which is of course useless if you want to edit via the text. It is still optimised for the American market.
For this podcast, I used Alrite and Fish Audio and then my editing programme. Of course, I didn’t speak this special episode myself either; my clone did. However, I wrote the entire text myself.