Join Jason Howell as he chats with the founders of Augie and Revoldiv. These two AI tools are harnessing the power of AI to transform the video production and transcription process for creators and professionals.
Please support AI Inside by joining us on our Patreon!
Interview with Jeremy Toeman, founder of AugX Labs and creator of Augie
- Jeremy Toeman's background at CNET and the shift to video content
- The inspiration behind creating Augie and AugX Labs
- How Augie works and its core functionality
- Live demo of Augie's video creation capabilities
- Challenges faced in developing AI tools like Augie
- The future of AI-generated content and its implications
- CODE: yellowgold for a free month of the premium tier!
Interview with Surafel Defar, founder of Revoldiv
- Surafel's journey from software engineering to entrepreneurship
- The origins and purpose of Revoldiv
- Revoldiv's transcription capabilities and user interface
- Unexpected use cases of Revoldiv
- Balancing coding and business responsibilities as a founder
- The importance of user feedback and adaptation in the AI field
Hosted on Acast. See acast.com/privacy for more information.
This is AI Inside, Episode 13, recorded Wednesday, April 17, 2024. AI Tools for Creators. This episode of AI Inside is made possible by our wonderful patrons. You're awesome at Patreon.com/aiinsideshow. If you like what you hear, head on over and support us directly. And thank you for making independent podcasting possible. Hello, everybody, and welcome to AI Inside.
I am one of your hosts, Jason Howell, taking a look each week at the AI that's inside everything, it seems, at least this year, it certainly seems like AI is everywhere. Do not have Jeff Jarvis this week. He is on assignment.
I'm sure we'll hear all about that next week when he returns. But I'm really excited because this, you know, we're still early in the podcast in the creation of AI Inside. We're still kind of playing around with ideas. So this week, playing around with the idea of bringing on a couple of founders of AI companies that are doing really interesting things in the creator economy, creator tools, leveraging generative AI to do really neat things.
So that's what we're going to do today. Real quick, though, thank you to everyone who has given this show, this podcast, a review on Apple Podcasts, really, wherever you're subscribing to AI Inside. We appreciate you taking the extra step and leaving a review. It really does help us out a lot. And those of you who support us directly via Patreon, Patreon.com/aiinsideshow, Bob Cobourn is one of our mighty patrons who supported us from day one. Bob, I salute you. Thank you for helping us do this show and keep it running.
We couldn't do it without you all. Patreon.com/aiinsideshow. All right, with all that wonderful stuff out of the way, let's get to our first of two interviews. And we are going to take a look at generative AI applied to video, primarily. Our guest has a product that's currently in beta that I played around with a little bit, and actually our paths kind of crossed in interesting ways through our history in technology. So I want to welcome Jeremy Toeman, founder of AugX Labs, makers of Augie. How you doing, Jeremy?
Good. Good to see you, Jason. Yeah, it's great to have you here. Great to get you on for the first version of AI Inside where we're doing this kind of focused-topic interview thing. You know, I've had some experience with your tool. I've also had some experience with the tool that we'll talk about a little bit later in the show. But this is pretty near and dear to my heart, because I live and breathe video production, and there are certain aspects of video production that I love and certain aspects that can be a little bit of a pain in the butt. And it seems like the tool that you have founded and created, Augie, really addresses some of the things that can be kind of challenging around video. So let's start with a little bit of background first. Like I alluded to, our paths have had some crossover in our history: CNET. Tell me about how your employment at CNET, which I worked at many years ago, led to where you are right now with Augie.
Sure. I mean, there is a running gag in San Francisco that at some point everybody who worked in the tech industry will somehow find their way through CNET. So true. It came true for us, I guess. I was brought in there.
Oh, my gosh, let's agree that the pandemic means timing doesn't exist. So somewhere in the 2014, '13-ish range, I joined as head of product at CBS Interactive, running CNET and Roadshow and also collaborating with some of the other brand partners, and I was there for about four years. Basically, we did a full site overhaul, built a lot of new technologies, moved the company to Agile. It was sort of the big shift to video, actually, sort of the first big shift to video for online publishers, which had mixed results for everybody. After four years, I ended up moving from the Bay Area to the New York suburbs and basically wanted to work with a local company because I wasn't really enjoying remote management. I started a job at WarnerMedia, and then four months later the pandemic hit, and I basically spent most of the past four years in the bedroom. So from there, after a couple of years at Warner, all working remote, I had stumbled onto this sort of nugget of an idea, and it goes to what you were just saying about those kind of pain-in-the-butt parts of video creation.
I wasn't even able to do the pain-in-the-butt part of the work. And my co-founder was an AI video expert, and the little chat we had was: what if we could help people make video and do all the really confusing, hard-labor part of the work by creating the first, what we call a rough cut, a first draft of the video, for them, and let them work from there? And so, yes, the arc from CNET to here is pretty straightforward. Yeah.
Although it's a very, very different environment. When you were thinking about this product, which we will get into exactly what it does in a second, but when you were thinking about it initially, depending on when you started thinking about this as a problem to solve, was it obvious from the very get-go that AI, and generative AI the way we're using it a lot more now than we ever have, was the obvious choice for something like this? Or were you considering this prior to generative AI being a glimmer in your eye?
Prior to. So when we were starting AugX Labs, the only AI we were using at the time was computer vision tech, which, again, my co-founder was an expert with, and some NLP, natural language processing. And that was all we really needed. Then the AI, and we're still not sure what to call it, the GPT revolution, the LLM revolution, whatever's going on today, when that all started going down, we sort of tip the hat. Obviously, there's the Google piece from a few years ago, but Stable Diffusion last August, August of '22, and then ChatGPT a few months later.
That sort of hit everybody. For us, it didn't really impact our mission. It did make some technologies easier.
But mostly what it did is it let us do different things that we weren't able to do before. Right. So we now have AI script writing in our product. I would never have guessed we could have done AI script writing back in January of 2022, when we first started the company. We have generative video capabilities.
Again, new technology. We have AI voices powered by ElevenLabs. So we have all sorts of new things that have shown up because of this generative AI revolution. We may or may not have done them one day. I mean, I would never have realized you could clone your voice and do all those cool things that you can do today.
So yes, it's impacted us greatly, but not actually from that core-mission perspective. Yeah, yeah, you knew the problem that you were looking to solve. And it just so happens, at least right now, that the obvious go-to is, you know, training an AI system around some of these challenges and applying it, and coming up with a pretty, pretty awesome product in the process.
So tell me a little bit about what Augie is specifically: what its purpose is, what it's designed to do, and who it's particularly made for right now. Absolutely. So there's lots of different ways to think about video.
One framework that we use is this. I'm going to divide all of video into exactly two categories for a moment. Category one will be the video that is the destination.
Entertainment, infotainment, educational, documentary, movies, watching people play on Twitch, where you come to be entertained, educated, whatever that purpose is. Second type is a promotional video. And that video is designed for everything else. That video could be a homepage or landing page video, could be a movie preview, could be a preview for this podcast episode. The whole point of that video is to get the user to do a thing like subscribe, smash that like button, all those kinds of things.
We operate in that world. While we do have people using it for storytelling and poetry and podcasting, all sorts of cool stuff, fundamentally, the reason we're here is to help everybody who needs to market something do so with video. Our research has found that only one in three people in business can promote their own product, goods, service, whatever it might be, using video as a platform.
So our vision is to really change that dynamic. Yeah, love it. OK, cool. So I've just pulled up the site for anyone watching the video version, which you can get at YellowgoldStudios.com, on the YouTube channel. Pulled up the basic dashboard, if you will.
For firing this off. And I think what I like about this is you don't necessarily have to have a video to feed into it, though it works like that too.
But as anyone who's watching can see, that is one way to do it. You can also feed it a script, which I had a lot of fun with, because I went into Perplexity and I said, write me a 30-second video script that focuses on what a solar eclipse is from a scientific level, why it's special, all this kind of stuff. And Perplexity, which is tapping into Claude, gave me a little script, which I ended up putting into the script section of Augie, and ended up with a video to match it. So the whole process, by the way, is incredibly useful and very easy to do. Yeah, how has this really transformed? Because, actually, taking a step back: when did you first launch your beta? And then how has it transformed in recent months? It seems like there's been a lot of development, a lot of iteration.
Absolutely. So we launched our private beta last January, January '23, and we then opened to a public beta in May of '23. And in the next couple of months, we're going to be launching our enterprise version, which will be helping marketers, big companies, mid-sized, etc.
This version, the Augie you see now, we still consider our beta, because we really have some core things we're finishing up, but that'll all be changing over the next couple of months. And one thing to explain to people who can't see it: what Augie does is it listens to your words. This is the fundamental thing we do. Whether you are doing a video on the eclipse like Jason is right now, or an ad for a podcast, or an ad for a commercial good, or a homepage, any video, we listen to the words spoken.
Those could be an AI voice or a human voice; we don't care. We generate the transcript. If you're watching, you can see the transcript of Jason's story here. We then look at the content in each section of the transcript. So imagine we divvied up all of the words spoken into, effectively, a storyboard, which is literally what we call it.
We then, as our last step, use AI to try to match the words spoken in the storyboard with the visuals shown on screen. So Jason, I'm going to ask you to go to the very top of your story here. If you don't mind, just scroll all the way up to that. Yep. And let's go to that first sentence here. Why don't you hit play for just a half second at that very beginning spot. A solar eclipse is a cosmic event that happens when the moon passes directly between the earth and the sun. And that's a generated voice.
The moon blocks out the sun's bright light. So let's pause it here for a second. There we go. Now, what we've seen are four clips that I'll be honest, don't fully match the intent of this video. So let's go back to that very first one. Let's click on that first one if you can.
You can also, by the way, click on the storyboard, so you can click on any section. So we're seeing a, I don't know, it's a woman doing some kind of light dance. I'm not sure.
Yeah, with the neon light. Yeah, it's such a weird video, everybody, that I can't give you a proper audio cue for it. But here's what we're going to do. We're going to fix Jason's video altogether in the next, like, 30 seconds. So Jason, click Select Media for me.
It's in the upper left, right there, left from where you are. I see. I see.
Sorry. I was looking at the tool set. Now you see it says cosmic event. I want you to change it: type in there where it says cosmic event. Just change it to the word eclipse.
Eclipse. OK, go ahead. So what's happening now is Jason's seeing different results. We have about 110 million clips from Pexels, Unsplash, Getty and others. And so Jason can now pick one that might be more appropriate for the video than the dancing lady. So if you go up to your filters there, there's a little filter you can select. Yeah, you can do things like, this is where you can upgrade to the premium Getty catalog. You could also decide you're making a fun video and you want to use animated GIFs and memes instead of stock content. And then the last thing you can do is also specify you want portrait only or landscape only, things like that.
Yeah. And the point of doing this, and what Jason's also doing, is now he's tweaking his content and switching it from landscape to portrait. The fun thing I have watching you do this is the moment we unlock for our users how to do certain things. So right now he's changed "blocks out light" to the word "moon" and is picking from some fun animated videos.
It becomes so easy to edit your video. And now we're going to do one more thing together, Jason. Go click on transcript on the far left there, if you will. Here, we'll go ahead and switch that to a space shot. OK, now sorry, I'm getting pulled in. That's OK. The transcript is what I want to work on. You see where it says "the moon"? Click on the words "the moon," right next to "for a brief moment." And now at the edge of the word there's like a purple line. I want you to drag that all the way over "for a brief moment." Yep. There we go. I know, I need to continue.
And so what you can't see if you're just listening, and why you definitely gotta watch Jason's podcast, and obviously I'm a pro-video guy, is that the entire segment that was showing some random thing is now gone from the video.
So what we're trying to introduce here is a concept where you're basically just pointing and clicking your way through video creation. You don't like the match? Swap it out. In fact, if you were making a promo video for your podcast, for example, you could go over there to my library and upload your own assets. Yeah.
So if you have stuff, you could upload it here. The last fun thing I want to show you is: click one more time over there on select media, and see that good night moon thing up at the top there? This is the latest thing we just added to Augie. We're going to test this in real time. I want you to try dragging it on top of your video this time.
Drag it to the right. OK. So there you go.
Moved around. And now you can do fun stuff with it. You can decide when it comes in. If you click on objects, you'll see it's in the objects list now on the far left. And that's where you can move it around, change the timing, give it a background color. We're not going to do the rest of this video. But the point is that in just a few minutes, literally under 10 minutes on average, you should be able to make a 30- to 60-second video.
That's, again, thought leadership, product demo, advertising, promotional content. You know, the other thing you can do is upload your own video. So if it's just you talking and you want to do a John Oliver picture-in-picture thing, that's where that placement tab on the far left would also come in.
A couple of last things to show for anyone who is watching. Click on captions. So you've already turned on captions, but you can move them around; maybe you want them at the top of your video instead of the bottom. There's a lot of different closed-caption options we have built in.
We're trying to take advantage of making things have a very modern, kind of TikTok-like effect. But if you want just basic options, you can do that too. One more thing: click on that music tab with the little red notification there. We have over 6,000 commercially licensed tracks you can choose from. This is one of the other key things about Augie that I probably should have mentioned before now: every piece of content that we surface for the user, through the stock or the premium images, has commercial rights included. So you can use Augie to make ads, and all of that footage that we surface for you, you can go ahead and use. Now, again, we don't assume you're going to use only stock for your video. You're probably uploading some of your own content as well.
It's kind of what we hope you do. But it makes it so that assuming you own your own content and all the stuff we provide, you can literally be up and running on Insta, TikTok, LinkedIn, any kind of social ad platform and promoting content in minutes. And that's a lot of what we're here for. It's not just ads.
I talk about it a lot. But it's transforming that concept of "a video takes days to weeks to produce" into "get something going for your brand in minutes." Oh, man, I know that.
I know that pain very well, working in video: taking all the pieces, collecting them, organizing them, placing them on the timeline. And what we're seeing, to a certain degree, is a move towards tools that enable this to be a heck of a lot easier, and the web interface here makes it really dead simple. Have you, in the development of this... I mean, right now I keep referring to it as the year of AI, not because AI is new, but because this is the year where I feel like there's so much energy and attention focused on: OK, well, we've had a taste of what it can do that's really wowed people.
How can we take it to the next level? In doing that, have you encountered any challenges working with this tool that you didn't anticipate in the last year, as these things have really skyrocketed in capability? What has been your perspective from the creation standpoint of a tool like this? First of all, it's a fascinating time to be playing with technology. And I think, you know, if you're listening to this and you haven't even tried Midjourney or ChatGPT, or hopefully Augie...
If you haven't tried any of the basic tools, you've really got to give them a shot. You've got to get into this world and understand what's coming, because none of this is a fad. While many of the startups will come and go, this is all very real. And I think what I find so interesting, in both a good and sometimes a problematic way, is that so much is being innovated on and invented and experimented with, sort of overlapping, in real time, all together. I think it's causing a lot of consternation in the world out there.
A lot of FUD. We're often trying to explain how what we do is different from, say, generative video. Right. We use generative video, but we ourselves are not generative. Right. We are not creating.
Yeah, you're not creating the content from a prompt, but you are matching preexisting content to a transcript or a prompt or something like that. Actually, we do have the ability to generate using Stable Diffusion. You can generate an animation inside of Augie.
So if you go to the generate tab and you want to type in... so here, let's... it takes about two minutes to render. But where are you in the video? What's being said at this time? Let's see here. "The temperature drops." How about that? So why don't you type, like, "cold person watching an eclipse"?
Cold person watching an eclipse. And we'll fire that off and see... oh, you have to select a style. Oh, yeah, you have to hit the style. You see where it's at. OK. Yeah. So let's see here.
Cinematic, sure. Wine. Oh, no, sorry. That was the wrong button. Dang it. That's OK.
I didn't scroll up. It's going to build me something. Yeah. While it's generating, click that see more button. It does take about two minutes.
So if you're watching with bated breath, you're going to have to give us a second. We have about 137 unique styles here, all of which are designed to let anyone tell their story the way they want. Some of them are trained after classic advertisements. Some of them are styled after video game concepts. Some of them are styled after cinematic ideas. The papercraft ones, I think, are gorgeous, by the way. Those are some of my favorites. Oh, yeah, that's beautiful.
That is really cool. So what are you tapping into? Are you using open-source AI models to do this stuff? Or is it proprietary?
How does that work? OK, so this in particular uses, excuse me, the SVD model from Stable Diffusion, with SDXL for the overall rendering. We as a company have integrated with DALL-E 3. We've integrated with Midjourney. We've integrated with OpenAI slash ChatGPT. We're on ElevenLabs. We are going to be integrating with...
Sorry for the Apple thingies. We are going to be integrating with some of the music generators. We are going to be incorporating more of the text-to-video, prompt-to-video, image-to-video models.
We were playing with some of those recently. You know, we're just waiting for some of the platforms, like Sora, to become commercially available so that we can use them in the product. Our vision, though, is that the user can pick and choose whatever they might want, right? The best video for your company, brand, etc. will be a lot of your content, maybe some amount of stock content, and maybe some amount of generated content.
And I think that future, which is right on our doorstep, is what excites me so much about what we're up to. Yeah, as I was playing around with Augie, one thing that really came to mind for me is that there's this trend, especially on YouTube, of faceless YouTube channels, channels with content that does not necessarily require a person in front of a camera, and in this current moment in AI it's almost completely generated by AI, top to bottom, right? We use the AI to write the script. We use the AI to generate the video to match the script. We use the AI to mimic the voice, or to create the voice that is speaking, and everything. It's truly, like, humanless.
And when I'm looking at our comments on our live stream: Tay, thank you. Tay says, one day we'll be able to watch completely made-up TV channels. I think to a certain degree we're kind of there. If we're not there, we're certainly knocking on the door. I would say the furthest we are away is about to knock on the door.
The furthest we are is, like, you know... and given what we're seeing some people do with Augie, like, we are seeing people create... I don't know if you're following the trend of faceless or headless YouTube videos. That's what I'm talking about. Yeah.
Yeah. So there's a ton of those. And what we've learned is, they make users money. They are traditionally actually harder to create than they should be, because they all require, like, text on screen. That's one click with our product. It wasn't really a market we were going after, but we're starting to see a lot of that market come and use it, because it's such a nice fit.
There's a stat that I just read: 3.7 million videos are uploaded to YouTube every day. That doesn't include TikTok. That doesn't include the rest of the world's social video-sharing platforms. So that's kind of the smallest number we could argue. It would be pretty safe to argue we're somewhere in the eight to 10 to 12 million videos uploaded every day. And that's not just my daughter, by the way. So maybe minus two or three.
But yeah, you'll never watch or listen to this, so it's fine. Now, most of that is personal branding. And most of that is, you know, phone-recorded. Where I was sort of taking this, and I fully agree, is that I believe the opportunity for other kinds of content is fascinating.
Like, I'm not saying this to toot our own horn. I was at the Allentown Film Festival this past weekend; we got invited to participate. They did something called the Augie AI Challenge. And what was so interesting is what they did: they found, it's either 12 or 15, people in Allentown, the mayor, a congresswoman, the head of interior, but also people who just worked in the town, some thought leaders and some involved folks, and they had them all tell their sort of stories about Allentown. They then took those stories, uploaded them to Augie, and made them into sort of split-screen effects. And what they did is they combined, again, their own assets, so they had each person upload a photo, and they also used stock when they were talking.
The mayor talks about walking his dogs on the beach with his wife. And there's some... you know, no one made that video. Right. No one was there to record a video of the mayor, his wife and the dogs on the beach. So they used a stock piece of a random guy, random woman, random dog, random beach.
And then at some point there was something else mentioned about growing up as a boy on the outskirts of Allentown. Well, they generated that clip. And so I thought it was really interesting to have this hybrid video that starts with a human, edited by a human, but content kind of suggested and organized through AI.
And I really, really liked that form of storytelling because it's it's doing a thing we couldn't do before. It's not taking away a job. It's not replacing a thing.
It's additive. It's a new thing. So, yeah.
No, no, it's a soapbox that we, and "we" being myself and Jeff Jarvis, talk about a lot: that these tools don't necessarily have to replace someone. They can be a supercharged tool set for someone to benefit from. And you're right, these are doing things that we haven't seen before. Just to close the loop on the generative clip: now, I should have picked a longer portion, because it's a very short clip.
It's 1.3 seconds, but hey, that didn't exist before. And there was a man... oh, that's the one.
That was the cold person watching an eclipse. I mean, yes. And here's the fun part. Go back to that one. So now, Jason, let's pretend, let's say you were really making this video for real, right?
And you really cared about it. Like, yeah, that's not it, actually; now that I've looked at some of these other styles, I want to go different. So I want you to go hit see more. Go back and pick a different style entirely. Let's see here. OK, so let's go.
Let's just go back, because it's right there. Comic book. Perfect. And now scroll back up and hit generate again. OK. And so this is the point: this is the world we live in now. Jason can keep doing this until he finds one he likes.
Yeah. And granted, by the way, I'll be transparent: in the future there'll be costs for all the generative abilities, because it does take tremendous server costs. Although, again, I would surmise that about a year from today,
not everything, but a lot of generation will be done locally at much, much lower cost. But still, that ability to tweak that story to your perfect level... and now start thinking about the full ecosystem. So imagine the stock stuff is there to suggest to you: hey, maybe put in a person shivering, right? And you can go look through the stock and be like, no, I want someone that looks like me shivering, right? And you can upload a photo and then animate that. Right.
So all of those kinds of things: there's no version of that in our prior history. That would be, I don't know, like three weeks of re-rendering by Pixar to do a minute or something like that. Yeah, it's crazy how democratized this has become in such a short period of time, and the influence it's going to have is going to be pretty remarkable. It's funny you say that. One of the other things we're noticing is how, and I'm not exonerating anything, I often bring up, you know, separate the art from the artist, there was a Louis C.K. bit back in a different time where he was talking to Conan, and he said, you know, everything's amazing and nobody's happy. You remember that? Yeah. I remember this whole bit on how fast and slow things move.
And he's talking about, like, people waiting for stuff from the internet. It's like, it's going to space and back! And you're like, slow down, give it a second.
And I think that's what we're doing here, right? And I told you, I'm like, hey buddy, that takes two minutes to generate; you know, we're going to have to buy time, or, like, no one's going to want to watch that. In two minutes, you got this clip that never existed before. Yunus, do you have questions about this?
I was just curious... [portion of the conversation inaudible]. So this is definitely a fun product. I do have to let you go, but where should people go? Like, you've got a beta going on right now; can anyone get in on that? What do people need to know as far as checking this out for themselves?
I'll try doing the domain thing without messing it up. You go over to meetaugie.com. So M-E-E-T-A-U-G-I-E dot com.
So meetaugie.com. And can I give your readers a little promo code? Not readers, sorry, listeners. Yeah, absolutely.
All right, well, if you enter yellowgold when you sign up for Augie, that'll get you a free month of the premium status. So that's a Jason and Jeff special. By the way, please tell Jeff I said hi; I've not seen him in probably 20 years. Oh, dang, I'm sorry he missed you. Oh, that's all right, we got plenty of time.
Yeah. Thank you so much. And by the way, anyone listening who does take the time to try the product: we're on Discord, we're on LinkedIn, wherever you're listening, we want your feedback, right? Our product keeps getting better thanks to our users. And I'm so appreciative of everybody who's taken a chance on our little startup to hopefully find some way to get video really working for them. So thank you all. Big-time respect from me to anyone who can create something like this: have an idea and then turn it into a full-blown product and be where you are with the development of it.
I just find it incredibly impressive. Jeremy Toeman, of course, founder of AugX Labs, the makers of Augie, at meetaugie.com. Jeremy, thank you for taking time with me today. This was a lot of fun. Thanks, great to see you again. And let's make sure our paths cross one more time before the next round back at CNET.
We'll end up back at CNET, you know. That's like good. Oh, no. Is that what's next? It's a giant circle of... That's how it works.
It's been the giant red ball of life. Yes. All right.
Thank you, Jeremy. We'll talk to you soon. Okay. All right, bye-bye. Okay, that was a lot of fun.
Oh, and I should also say just real quick, I'm gonna add it to the stage here real quick. We do have the comic book version of this. There you go.
It's the comic rendered version for the video listeners and watchers out there. All right, we do have more coming up. You definitely don't wanna go anywhere. Hang tight for one second. All right, it's time for round two of our interviews with creators doing cool things in the world of artificial intelligence. And I'm super excited to welcome to the show, Surafel Defar, who is the founder of Revoldiv, which is a service that I... I'm trying to remember how I came across this. Probably if my memory serves me correctly, I came across this on Reddit sometime last year and I was like, whoa, a website that I could use to do free transcriptions of some of the podcast work that I'm doing.
It was a little bit early on in my experimentation with LLMs and trying to get a sense of, like, how can I use this in my daily life? And Surafel, who is here with me, created a tool to do just that. It's nice to have you here, Surafel. Yeah, thanks for having me.
Yeah, this is so cool. So you've worked as a software engineer for a decade leading up to founding Revoldiv, maybe even longer, but that's my estimation anyways. Tell me a little bit about, you know, going from creating products for other people to creating your own product: that drive to entrepreneurship, and really tapping into this modern moment with AI tools like Revoldiv. Yeah, so I started programming in high school, and I've always been fascinated with having your own business. Even when I was a kid, I had some side projects for fun and some side projects that turned into a business. And computers, especially once the internet came out, were a massive opportunity to impact a lot of people's lives and do some cool stuff. So yeah, that's why I decided to focus on software engineering for my career. So where along that pathway did you start to play around with some of the tools that, I imagine, really feed the power of Revoldiv? How did that come to life? How did you do that?
Yeah, so I was actually working on a different project. I listened to a lot of podcasts, and I wanted to build a search engine that enables you to search what was said in any podcast. For example, let's say you search for any podcasts that talked about self-driving cars. This would not only search the titles and descriptions of podcasts, but it would give you the exact place where the podcasters talked about that specific topic, and you're actually able to play the exact moment when that was said. And this could be from any podcast in the world; think of it as Google, but for video or audio files. So in order to do this, you have to, of course, first convert the audio to text and build all the necessary tools to create a search engine, like indexing and so on. And at the same time, this was a couple of years back, transcription AI models were maturing and becoming good enough.
Back then, they were above 90% accurate for audio recorded in a studio setting. But this turned out to be a bigger undertaking at the time, and the tech and the tooling to do this in a satisfactory way were not there yet. But in the process, I built a good transcription user interface, I believe, and I decided to focus on that first. The idea was to make it as simple as using Microsoft Word to edit your audio transcription. Text editors are very, very intuitive and everybody is familiar with them, so it kind of made sense to design the user interface around that modality.
It's really a simple concept. You can click on the individual words in the transcribed text and seek to the point in the audio where that word was uttered. This, along with confidence indicators for each transcribed word, allows people to quickly skim through the text and fix transcription errors. A task that would have taken a professional transcriber a couple of hours would now take you, you know, a couple of minutes to complete. So this truly unlocks many more applications that were not feasible before, because you can do it in a couple of minutes. It's a very big force multiplier.
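The word-level editing Surafel describes comes down to a simple data structure: each transcribed word carries timestamps and a confidence score, which is what makes click-to-seek and low-confidence highlighting possible. A minimal sketch, with hypothetical field names (Revoldiv's actual data model isn't public):

```python
# Sketch of word-level transcript data: each word carries a timestamp
# and a confidence score, enabling click-to-seek and highlighting of
# words a human should review. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float       # seconds into the audio
    end: float
    confidence: float  # 0.0 - 1.0, from the ASR model

def seek_time(words, index):
    """Audio position to jump to when word `index` is clicked."""
    return words[index].start

def low_confidence(words, threshold=0.85):
    """Words worth highlighting for a human to review and correct."""
    return [w for w in words if w.confidence < threshold]

words = [
    Word("welcome", 0.0, 0.4, 0.98),
    Word("to", 0.4, 0.5, 0.99),
    Word("Revoldiv", 0.5, 1.1, 0.62),  # unusual proper noun: low confidence
]
print(seek_time(words, 2))                       # 0.5
print([w.text for w in low_confidence(words)])   # ['Revoldiv']
```

Clicking a word just looks up its start time, and the review pass only touches the handful of low-confidence words rather than the whole transcript, which is where the "couple of minutes" speedup comes from.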
And yeah, so that's how Revoldiv was born. That's really cool. I love this tool. I mean, I realize in my own use of it, I'm probably only personally scratching the surface, because really, for me, it's part of my podcast workflow at this point. You know, at some point I end up with an MP3 of the podcast that I'm looking to publish. I upload that to Revoldiv from this screen, you know, upload audio or video. And in a couple of minutes... this is actually last night's episode of Android Faithful, which, yes, I did transcribe in order to do some of my behind-the-scenes work on Android Faithful.
But I end up with the transcript of the episode. Now, one of the challenges of course, and I think it's probably near impossible to totally solve for this, but I'm curious to hear your take, is that on Android Faithful, we do have names that are more challenging than Jason Howell, which, you know, an AI is gonna get right 99% of the time. I find that it gets my name right. But Huyen Tue Dao, Mishaal Rahman, these are things, you know, that you can see right here, Saul Ramon. And I know part of the interface when you're uploading is to be able to put in extra kind of prompt addendums. And so sometimes I'll put those names in there. Sometimes it gets it right.
Sometimes it doesn't. That's gotta be a really big challenge when you're working with transcription and voice recognition and all this stuff the way you are. Yeah, yeah, definitely. It is. So the models will become even better. We have, of course, an enterprise plan where the accuracy is way better. And then we also give you confidence levels, accuracy indicators, so that you can quickly correct those.
So those are some of the things that you can do. And it automatically identifies the speakers for each segment of speech. I don't know why it did not show up there. It wasn't activated. Sometimes when I'm using the tool to export out to like a text file or whatever, I just turn that off. Cause I just want the content.
I don't necessarily need the speakers; other times I do. So yeah, yeah, yeah. So I mean, these AI models will always have certain uncertainties, you know, and what we can do to mitigate that is give you a good user interface so you can find the errors. For example, let's say the accuracy level is not that good for a certain word; you can quickly highlight it and then just jump to that part of the audio to correct it. And it can logically group paragraphs for you, and you can edit the transcripts and create audiograms. Audiograms are short snippets of video with subtitles for sharing on social media. It also has some good annotation and commenting systems. For example, if you send this out to your editors, you can highlight a specific part of the audio or video, and then they'll be able to quickly correct the errors that they encounter. So I think equipping the user with a good user interface so that they can quickly correct these errors is probably how you would mitigate these types of errors.
Yeah, yeah. So you've created this really cool tool that, you know, I have to imagine a lot of people are using. Free is a very good marketing approach for most anything. And I think it's really cool that it's not just free up to like 15 minutes or whatever. You give people two hours of transcribed audio, per session, or is it just per upload, really?
For I mean, not forever, but yes, two hours. Yeah. So.
Yeah, like. So unlike other services, our AI engine is very efficient and it's optimized end to end for transcription service. And we use a lot of tricks to achieve that. We also try to minimize storage requirements. If you haven't noticed, we don't even force you to log in to use our application. And if you're not logged in, we only keep the data as long as you are actively working on it.
Let's say after a week, if you're not working on it, it gets deleted. And this has advantages both for the user and us. For the user, their data is anonymous and will be deleted after a couple of days. For us, it keeps our storage costs very low.
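The retention rule Surafel describes (anonymous uploads expire after a week of inactivity; logged-in users' data is kept) can be sketched as a simple check. This is a hypothetical illustration, not Revoldiv's actual implementation:

```python
# Hypothetical sketch of the retention policy described above:
# anonymous uploads are deleted once they haven't been touched
# for a week; logged-in users' data is retained.
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)

def should_delete(last_accessed: datetime, logged_in: bool,
                  now: datetime) -> bool:
    """Return True if an upload should be purged from storage."""
    if logged_in:
        return False  # logged-in users keep their data
    return now - last_accessed > RETENTION

now = datetime(2024, 4, 17)
print(should_delete(datetime(2024, 4, 1), logged_in=False, now=now))   # True
print(should_delete(datetime(2024, 4, 15), logged_in=False, now=now))  # False
print(should_delete(datetime(2024, 4, 1), logged_in=True, now=now))    # False
```

A periodic cleanup job running a rule like this is what gives both sides of the trade-off he mentions: anonymity for the user, low storage costs for the service.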
And if you want to keep your data and, of course, use some other features, you have the option to log in and then be able to see all the data you have uploaded and transcribed before. And we use these and other cost minimization techniques to keep our costs low. But of course, if we want to add other advanced features like generative AI, we need to start charging money for it. And of course, we actually charge for our most accurate AI model; the free version is a couple of notches down from our best model.
And the smaller model is actually very fast to process things. And I would say that, just generally, these AI models on average are better than a human transcriber. By that, I mean, if you are an average human transcriber, right, you need to be well versed in different fields to transcribe the audio accurately. It could be that you have never heard some obscure financial or medical term, or that the audio is garbled. Right.
But for any AI, this is not that much of a problem, as it is trained on almost all human activities and it can predict the next likely word, even if the audio is not that clear. And some of our customers have strict data requirements, for example. Yes. They don't want data to leave a certain country, so we deploy our service in their own country. We charge for these types of activities. We also do custom integrations for small to large customers. We've had some pilot programs with a Fortune 500 company. But we are planning to introduce a modestly priced plan so we can sustain this and add more features that would currently be very expensive to deploy in just the free version, like generative AI, automatic summarization, chapter creation, even automatic rewriting of the transcript. For example, let's say you change something in the transcript, and it goes and changes the actual audio.
These are some of the capabilities that are available, but they're very expensive to deploy. I hadn't thought... I hadn't considered that until this moment. So, you know, the tool that you have: you've got the transcript, you've got the audio that was actually recorded by someone. So when I upload my voice recording to Revoldiv, it's using my true recording that I recorded last night on the podcast to create this transcript. But you've also got all of this audio information to train a voice replica of yourself, so that if you misspoke somewhere, you just type it in and it uses the training of your voice. That's amazing.
Yeah, that is awesome. I hadn't considered that. Yes, we actually did. We implemented that, and then we tried it a little bit, but it was too expensive to just deploy for free. And we didn't have a payment system for regular customers yet. So, yeah, we decided to hold off on that.
But yeah, definitely that is coming. There are a lot of cool things that you can think of. You can change your voice; let's say you can remove filler words. There is software that removes filler words, but if you've seen it on social media, there are cuts, as in they're very abrupt. But there are models now where you can smoothly transition into, you know... Yeah, we're living in a very interesting time.
But yeah. Well, you know, it's always... and it comes up often on this show: that line between reality, however you want to define reality, and not reality, when we're smoothing the lines like that. Like I'm reviewing a phone, the S24 Ultra, and it has this super slow motion mode where, even if you recorded at a low frame rate, it fills in the other frames generatively. And so I guess the question becomes, because it's easy to hear what you just said, and I'm sure there are some people for whom it would be easy to hear that and be like, well, that's really bending the rules of reality too far. That isn't right. I don't like that.
But it's always like, where exactly do you draw the line? Because filling in the gaps of the frames that weren't there is doing the same thing, but at a different scale. And is that OK?
Is this OK? And why is there a difference? You know, it's just an interesting time we live in around reality and AI. It is.
That is true. I remember Samsung, they had a problem with when you take a picture of the moon. Yes, the moon. You know, they were generating like a fake moon.
That actually looks very appealing, but it's not the picture that you took, you know. So yeah, there's a fine line. For example, removing filler words, it's like you're trying to convey a message, right? So doing that, versus, for example, adding a person that was not there or adding an object that was not there, I think that pushes it a little bit. At least for my comfort, you know. I wouldn't go calling that real. Yeah, yeah, and then pretending as if it were real. Yeah, I think at the end of the day, yeah. Yeah. And you can also inform the user, like what YouTube does now, where if you used some type of AI to generate the audio, then you have to actually inform the user that it's generated by an AI.
So those kinds of things, you know, informing the user, or maybe having some sensible laws to do this, could mitigate these kinds of things. Yeah. Yeah. How is Revoldiv? And am I saying that right?
That's how I've always said it. Revoldiv. Where did the name come from, by the way? I'm super curious about that. Yeah, it's an acronym of an inside joke that we have. I don't know if I can say it live on air.
OK, fair enough. But I wondered, I was like, God, is this from something? Or, yeah, what does it sound like? When you say it, what does it sound like? I don't know, a revolving door or something like that. OK, OK. Yeah, something like that. Yeah. I don't know. I've asked myself that question many times when I go there.
Yeah, maybe I'm missing something. Anyways, the question that I was going to ask is in creating tools like this, there's the use cases that you anticipate. And then there are ways in which the tools are used that you didn't see coming because people are infinitely creative. And when they look at something like this and go, wow, that's the solution to this particular thing. Are you aware of any ways that like you didn't anticipate in the beginning and then you've heard from users like, oh, I'm doing this with it. And you're like, holy cow, that's actually really cool. Didn't think about that, but that's really neat. Yeah, yeah.
Yeah, for sure. So we have a lot of different users. Some are content creators who use it to transcribe different media files and create audiograms. We also have lawyers, detectives, professors and students who are using it to transcribe lectures.
And hand that out to their students. If you have seen Revoldiv's comment section, it allows you to highlight a specific part of the speech from the transcribed text and link it and share it with anyone. So a lot of professors use this alongside creating chapters for their lectures. For example, let's say you're listening to a lecture, and if people commented on a specific part, it's very easy for you to go and check the comment section.
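The shareable highlights Surafel mentions amount to deep links with timestamps. A sketch assuming a made-up URL-fragment scheme (not Revoldiv's actual link format):

```python
# Hypothetical sketch: encode a highlighted span of the transcript as
# a shareable link carrying start/end times in the URL fragment, so
# the recipient's player can jump straight to that moment.
def share_link(base_url: str, start: float, end: float) -> str:
    """Build a deep link to a highlighted span of audio, in seconds."""
    return f"{base_url}#t={start:.1f}-{end:.1f}"

link = share_link("https://example.com/transcript/abc123", 62.0, 75.5)
print(link)  # https://example.com/transcript/abc123#t=62.0-75.5
```

This mirrors the W3C Media Fragments convention (`#t=start,end`); the receiving page parses the fragment and seeks the player, which is all a lecture chapter or comment anchor needs.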
Yeah, so those are the regular users. But the one that surprised me the most was a choir director who said that they use Revoldiv during competitions, where they perform once and get audio feedback from judges, which they then need to take and quickly use to rearrange their performance. They're using it to quickly transcribe that feedback and share the notes among the group.
So that was actually one surprise. There are some other use cases that surprised me, too. Usually, law enforcement uses it for transcription.
Lawyers. Yeah. But yeah, interesting. Journalists, are you hearing from journalists? I mean, you know, definitely in that category of meeting very stringent data requirements. Yes, there was one actually who contacted me. They were a nonprofit, too, and they were in a third-world country. They didn't have, of course... the other tools are very expensive, so they were using our system.
And then they were asking me about data privacy issues, and to assure them, you know, we use the best security practices in the industry. But yeah, there are people using it from all over the world. So when you price your SaaS, it becomes problematic for smaller businesses in third-world countries, not the US, you know. And even for a lot of people from European countries, right?
Like, let's say if you price it at $50 a month, that might be very expensive for a lot of European customers, you know. So yeah, a lot of people are using us from all around the world, which is good to hear, even nonprofits and the like. Yeah, that's amazing. Well, I really love what you're doing. And again, as we kind of walk through this, I just realize how basic my usage is compared to what you can do, and the integration between, you know, editing the audio and everything. I'm really using it for kind of straight transcript capabilities, and sometimes generating closed captions for some of my videos. But yeah, hopefully people who are listening (some people watch here on YouTube, of course, but most people are listening) get a better sense of the fact that this isn't just a transcription tool. It really is kind of a media creation tool kit, which, I guess as I say that out loud, is kind of a theme of today's episode of AI Inside.
But it's really cool. As a developer, do you find yourself spending more time running the business of Revoldiv versus actually doing the coding and stuff? Where does your heart lie in this process? Do you still get to do a lot of the coding with this? And I'm sure you learned a lot about AI along the way. Yeah, yeah, I do most of the coding. And yeah, you have to kind of balance it out. You have to divide your time; sometimes there will be weeks where you have to focus on programming.
And then the other week will be focused on business activities. It is very difficult, you know. It is. I mean, the main challenge in this field is that things move very, very fast, and you have to adapt just as fast. Larger companies are not as slow to adapt to newer systems as they were in the old times.
They've learned their lesson, you know. And if you have seen, for example, Adobe, they rolled out generative AI features within a couple of months after the initial research papers actually got released. And that's a very fast pace for a bigger company. So as a small company, being able to execute faster and ship faster are the main challenges. And while doing that, you have to watch out for burnout, because you have to take care of the business, the programming, security, privacy, deployment. There are so many things that you have to do. Those are the challenges.
There are just 24 hours in a day, unfortunately. And, you know, don't I know it. Oh, my goodness. Yeah, time is a valuable resource, a very valuable resource. You have to be able to manage your time. And yeah, those are the biggest challenges.
Yeah, don't run yourself into the ground. Well, I think you're doing a great job, from my perspective anyways. As a user of Revoldiv, I'm a huge fan. I've definitely spoken about it a lot on the different shows that I've done. And I'm thrilled that I got the opportunity to bring you on. And I love that I'm in a position to be able to talk to the people building the things that I love and use.
So yeah, really cool stuff that you're doing. Thanks. Thanks. Appreciate it.
Like, absolutely, absolutely. Surafel Defar is the founder and creator of Revoldiv. I'll spell that out for people in the audio who might not know how to spell it: R-E-V-O-L-D-I-V dot com.
Everybody should check it out. Surafel, thank you for hanging out with me today. I really appreciate it. And best of luck with the continued development of Revoldiv. Yeah, thank you, Jason. I appreciate it. Yeah, absolutely. We'll see you soon. All right.
And with that, we have reached the end of this episode of AI Inside. I really want to give my thanks again to Jeremy Toeman from AugX Labs and, of course, Surafel Defar from Revoldiv for showing their products, but also kind of talking about what it takes to create these things. I'm always fascinated with the development side, the entrepreneurial side, that drive to make something out of nothing.
And I just have an insane amount of respect for people who are able to do that successfully. So go check these things out. Let us know. You can email contact@AIinside.show if you want to send us your feedback and let us know what you think about this format and these tools. We love to hear from you. AI Inside, the show, the podcast: we publish every Wednesday, and we record it live on Wednesday mornings at 11 AM Pacific.
If you go to the Yellowgold Studios YouTube channel, you can find the live stream there each and every week. Sometimes that shifts around, but most of the time it's pretty locked into that time. But the majority of you get it in podcast form, and it publishes later in the day on Wednesdays. The video version usually publishes immediately to the Yellowgold Studios channel on YouTube.
But then I, you know, tighten things up and run the audio through transcription in Revoldiv and everything, and give you a little bit more as a payoff for your patience a little bit later in the day. AIinside.show is the site to subscribe to the podcast. Patreon is how you go one step further and support Jeff and me directly in the work that we are doing here. And we've got a lot of ideas coming up to bring on some really excellent guests and continue playing around with the format. And we can really only do that with your support. So thank you to the patrons who are already there. We'd love to have more of you. And I'm looking to increase the amount of extras to entice you to sign up.
So tell me how you feel about that with your dollar. If you don't mind, we really appreciate it. Patreon.com/AIinsideshow. I want to thank you again for watching, for listening, for supporting us. And we will see you next week on AI Inside. Bye everybody.