Commit 6700fe7

Fixed images
1 parent 71b2117 commit 6700fe7

1 file changed

+15
-15
lines changed

docs/developer-case-studies/transcriber-ai-metadata.md

Lines changed: 15 additions & 15 deletions
@@ -14,7 +14,7 @@ We have been talking, often even a bit inappropriately, about artificial intelli
 
 I'll make it clear: I can't draw anything that isn't a scribble, so platforms that allow you to create artwork by simply typing in text have been something very appealing to me from the beginning.
 
-![](/static/transcriber-ai-metadata-/Firefly.png)
+![](/static/transcriber-ai-metadata-Firefly.png)
 
 Then Goliath came along, ChatGPT, and it was a game changer for everybody, going on to create new needs and, literally, revolutionizing more than one industry.
 
@@ -54,7 +54,7 @@ For the past year in English, and for the past three years in Italian, I have be
 
 For the last two years I've been making use of this internal tool I wrote myself, called **SciattaGPT** (the literal translation would be “*dull*, *sloppy*, *scrappy GPT*”), which I use to create the episode summary and title suggestion, always making use of ChatGPT, first with the 3.5 model, then with GPT-4 and now with GPT-4o-mini.
 
-![](/static/transcriber-ai-metadata-/SciattaGPT.png)
+![](/static/transcriber-ai-metadata-SciattaGPT.png)
 
 In the case of this SciattaGPT, all prompts are predefined, rather statically.
 
@@ -68,13 +68,13 @@ At some point, though, all these ingredients, in my head, came together, and I s
 
 I started out developing a very simple application that would act as a front end to a relatively complex underlying system, which I called **NQR** (which stands for **Natural Query Responses**), the meaning of the acronym of which I found later because I liked the way the three letters sounded.
 
-![](/static/transcriber-ai-metadata-/NQR.png)
+![](/static/transcriber-ai-metadata-NQR.png)
 
 NQR is, in its conception, and also a bit in its implementation, relatively simple: a system for managing prompts that generate content from other content, in this case, given a rather long text, which could very well be the transcript of a video, I prepared several prompts that generate a summary of it, an ideal title, a list of bullet points, ... in short things like that.
 
 And to make the application of these prompts usable and fast, I have developed a grouping system that allows you to organize different prompts within sets, there is a set for **YouTube**, a set for **social media**, a set for **meta data**, ... In this way a user can apply and execute different prompts just by selecting the single set.
 
-![](/static/transcriber-ai-metadata-/NQR2.png)
+![](/static/transcriber-ai-metadata-NQR2.png)
 
 Perhaps this thing I wrote may sound a bit ... “*pompous*,” or “ *self-praising*,” however, I tried very hard to think from the end user's point of view: the organization of prompts into sets allows you to generate an immense amount of content by simply doing two clicks by first selecting the set and then running the analysis.
 
@@ -114,7 +114,7 @@ It can be said that the quality of the response is comparable to what would be o
 
 I released Transcriber, perhaps my most successful application, a little over a year ago.
 
-![](/static/transcriber-ai-metadata-/PakSideSite_Transcriber_00000.jpg)
+![](/static/transcriber-ai-metadata-PakSideSite_Transcriber_00000.jpg)
 
 I've talked about it **[here](/developer-case-studies/transcriber/)** but it's okay to repeat a little, right?
 
@@ -148,7 +148,7 @@ Ever since I started developing my applications, their main purpose was to autom
 
 When ChatGPT came along, as I'm sure you all did, I was dazzled by the potential of the tool. And we were still talking about GPT-3 a couple of years ago. I had seen with my very own eyes, finally, *a machine pass the Turing test* brilliantly.
 
-![](/static/transcriber-ai-metadata-/TuringTest.jpg)
+![](/static/transcriber-ai-metadata-TuringTest.jpg)
 
 But then, as with all things, I delved deeper, had my own experience, and realized which things LLMs do excellently and which, still, struggle to solve even with sufficiency.
 
@@ -172,11 +172,11 @@ Always starting from the content I create, particularly the wine podcast, I deve
 
 Initially I started with URLs: I wanted to know what links were being quoted in the broadcast, so I created this prompt:
 
-![](/static/transcriber-ai-metadata-/LinksPrompt.png)
+![](/static/transcriber-ai-metadata-LinksPrompt.png)
 
 A relatively simple thing that, however, when I ran it, made me discover that there were many more references in an episode than I remembered:
 
-![](/static/transcriber-ai-metadata-/LinksResult.png)
+![](/static/transcriber-ai-metadata-LinksResult.png)
 
 For me it was really a revelation: the clever “*stupidity*” of LLMs had made me discover something I had forgotten.
 
@@ -188,11 +188,11 @@ Simply put, they make us discover or, better yet, rediscover something about the
 
 I went ahead and developed other prompts, such as this one that identifies brands:
 
-![](/static/transcriber-ai-metadata-/BrandsResult.png)
+![](/static/transcriber-ai-metadata-BrandsResult.png)
 
 Or this one that tries to figure out who the participants are if they are mentioned:
 
-![](/static/transcriber-ai-metadata-/PeopleResult.png)
+![](/static/transcriber-ai-metadata-PeopleResult.png)
 
 I realize that I have only begun to scratch the surface of what can be done. In the coming weeks, either at the request of app users or out of personal push, I will be developing more such prompts.
 
@@ -204,19 +204,19 @@ For my experiments, for my podcast and YouTube show, since I am a subscriber, I
 
 So I generated a prompt that generates a prompt... Basically, instead of me writing what I needed, as I always did, I asked the artificial intelligence, again via one of NQR's prompts, to write the prompt for generating the image, to be passed then, by copying and pasting it, into Firefly.
 
-![](/static/transcriber-ai-metadata-/ImagePrompt.png)
+![](/static/transcriber-ai-metadata-ImagePrompt.png)
 
 It is interesting this first level of recursiveness: one prompt generating another prompt...
 
 But then, since OpenAI has the API to directly generate images with the DALL-E model, I thought it would be nice to bypass this whole round. At a not insignificant cost-we're speaking of a few cents, not a few thousandths-I decided to go ahead with direct generation.
 
 That said, as of now images can be generated directly from within Transcriber!
 
-![](/static/transcriber-ai-metadata-/ImageGeneration.png)
+![](/static/transcriber-ai-metadata-ImageGeneration.png)
 
 You can choose the model, DALL-E 2 or DALL-E 3 (DALL-E 2 is absolutely unqualifiable in quality, I think they only keep it on because there are some applications that use it). For DALL-E 3 you can choose to generate a square or 16:9 image, either horizontally or vertically.
 
-![](/static/transcriber-ai-metadata-/ImageGeneratorSettings.png)
+![](/static/transcriber-ai-metadata-ImageGeneratorSettings.png)
 
 You can also choose to generate a standard image or one with a “vivid” pattern, which creates more aesthetically pleasing results that look more like stock photos instead of regular photos.
 
@@ -226,7 +226,7 @@ Generating a 16:9 image comes in at a cost of $0.12.
 
 ### Money, Money, Money!
 
-![](/static/transcriber-ai-metadata-/Costs.jpg)
+![](/static/transcriber-ai-metadata-Costs.jpg)
 
 But how much does this stuff cost?
 
@@ -276,7 +276,7 @@ If you have any questions, please leave a comment on this article!
 
 ### About Alex
 
-![](/static/transcriber-ai-metadata-/alexraccuglia.jpg)
+![](/static/transcriber-ai-metadata-alexraccuglia.jpg)
 
 Alex Raccuglia, 50, from Milan, Italy, studied computer engineering but, fortunately for him, ended up as a director of TV commercials and promotional videos, accumulating a fair amount of experience in the field of visual effects.
 

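Every change in this commit has the same shape: a stray `/` after the `transcriber-ai-metadata-` prefix is dropped from an image path. As a rough illustration only, the same substitution could be scripted as below. This is a sketch in Python, not how the commit was actually produced; the function name `fix_image_paths` and the regular expression are assumptions inferred from the changed lines above.

```python
import re

# Hypothetical pattern, inferred from the changed lines in this diff:
# the image prefix followed by an unwanted slash.
BROKEN_PREFIX = re.compile(r"(/static/transcriber-ai-metadata-)/")


def fix_image_paths(markdown: str) -> str:
    """Drop the stray slash that follows the 'transcriber-ai-metadata-' prefix."""
    return BROKEN_PREFIX.sub(r"\1", markdown)


before = "![](/static/transcriber-ai-metadata-/Firefly.png)"
print(fix_image_paths(before))
# ![](/static/transcriber-ai-metadata-Firefly.png)
```

Paths that are already correct, and unrelated links such as `/developer-case-studies/transcriber/`, do not match the pattern and are left untouched, so running the function twice gives the same result as running it once.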