Commit

Fixed images
latenitefilms committed Oct 11, 2024
1 parent 71b2117 commit 6700fe7
Showing 1 changed file with 15 additions and 15 deletions.
30 changes: 15 additions & 15 deletions docs/developer-case-studies/transcriber-ai-metadata.md
@@ -14,7 +14,7 @@ We have been talking, often even a bit inappropriately, about artificial intelli

Let me be clear: I can't draw anything beyond a scribble, so platforms that let you create artwork simply by typing text have appealed to me from the beginning.

-![](/static/transcriber-ai-metadata-/Firefly.png)
+![](/static/transcriber-ai-metadata-Firefly.png)

Then the Goliath, ChatGPT, came along, and it was a game changer for everybody: it created new needs and revolutionized more than one industry.

@@ -54,7 +54,7 @@ For the past year in English, and for the past three years in Italian, I have be

For the last two years I've been using an internal tool I wrote myself, called **SciattaGPT** (the literal translation would be “*dull*, *sloppy*, *scrappy GPT*”), to create the episode summary and title suggestions. It has always relied on ChatGPT: first the 3.5 model, then GPT-4, and now GPT-4o-mini.

-![](/static/transcriber-ai-metadata-/SciattaGPT.png)
+![](/static/transcriber-ai-metadata-SciattaGPT.png)

In SciattaGPT, all prompts are predefined and essentially static.

@@ -68,13 +68,13 @@ At some point, though, all these ingredients, in my head, came together, and I s

I started out developing a very simple application that would act as a front end to a relatively complex underlying system, which I called **NQR**. The acronym stands for **Natural Query Responses**, a meaning I came up with only later: I picked the three letters because I liked the way they sounded.

-![](/static/transcriber-ai-metadata-/NQR.png)
+![](/static/transcriber-ai-metadata-NQR.png)

NQR is, in its conception and also somewhat in its implementation, relatively simple: a system for managing prompts that generate content from other content. Given a rather long text, which could very well be the transcript of a video, I prepared several prompts that generate a summary of it, an ideal title, a list of bullet points, and so on.

To make applying these prompts quick and practical, I developed a grouping system that organizes prompts into sets: there is a set for **YouTube**, a set for **social media**, a set for **metadata**, and so on. This way a user can run several prompts just by selecting a single set.
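
As a rough illustration of the grouping idea, here is a minimal Python sketch of prompts organized into sets. It is not NQR's actual code: the `Prompt`, `PromptSet`, and `run_set` names, the `{text}` placeholder, and the `complete` callable (a stand-in for whatever sends a prompt to the language model) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Prompt:
    name: str      # label shown to the user, e.g. "Summary"
    template: str  # instruction text; "{text}" marks where the transcript goes


@dataclass
class PromptSet:
    name: str  # e.g. "YouTube" or "Social media"
    prompts: List[Prompt] = field(default_factory=list)


def run_set(prompt_set: PromptSet, text: str,
            complete: Callable[[str], str]) -> Dict[str, str]:
    """Apply every prompt in the set to the same text.

    `complete` is a hypothetical placeholder for the function that sends
    a prompt to the language model and returns its reply.
    """
    return {
        p.name: complete(p.template.replace("{text}", text))
        for p in prompt_set.prompts
    }


# Hypothetical set with two prompts, in the spirit of the "YouTube" set.
youtube = PromptSet("YouTube", [
    Prompt("Title", "Suggest a title for this transcript:\n{text}"),
    Prompt("Summary", "Summarize this transcript:\n{text}"),
])


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch runs offline.
    return f"[reply to a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    for name, reply in run_set(youtube, "Today we talk about wine.", fake_model).items():
        print(name, "->", reply)
```

Selecting a set and running it then amounts to a single `run_set` call, which is the "two clicks" experience the grouping is meant to provide.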

-![](/static/transcriber-ai-metadata-/NQR2.png)
+![](/static/transcriber-ai-metadata-NQR2.png)

What I just wrote may sound a bit ... “*pompous*” or “*self-praising*.” However, I tried very hard to think from the end user's point of view: organizing prompts into sets lets you generate an immense amount of content in just two clicks, first selecting the set and then running the analysis.

@@ -114,7 +114,7 @@ It can be said that the quality of the response is comparable to what would be o

I released Transcriber, perhaps my most successful application, a little over a year ago.

-![](/static/transcriber-ai-metadata-/PakSideSite_Transcriber_00000.jpg)
+![](/static/transcriber-ai-metadata-PakSideSite_Transcriber_00000.jpg)

I've talked about it **[here](/developer-case-studies/transcriber/)** but it's okay to repeat a little, right?

@@ -148,7 +148,7 @@ Ever since I started developing my applications, their main purpose was to autom

When ChatGPT came along, I was dazzled by the potential of the tool, as I'm sure you all were. And we were still talking about GPT-3 a couple of years ago. I had finally seen, with my very own eyes, *a machine pass the Turing test* brilliantly.

-![](/static/transcriber-ai-metadata-/TuringTest.jpg)
+![](/static/transcriber-ai-metadata-TuringTest.jpg)

But then, as with all things, I delved deeper, had my own experience, and realized which things LLMs do excellently and which they still struggle to do even adequately.

@@ -172,11 +172,11 @@ Always starting from the content I create, particularly the wine podcast, I deve

Initially I started with URLs: I wanted to know which links were mentioned in the broadcast, so I created this prompt:

-![](/static/transcriber-ai-metadata-/LinksPrompt.png)
+![](/static/transcriber-ai-metadata-LinksPrompt.png)

A relatively simple thing that, however, when I ran it, made me discover that there were many more references in an episode than I remembered:

-![](/static/transcriber-ai-metadata-/LinksResult.png)
+![](/static/transcriber-ai-metadata-LinksResult.png)

For me it was really a revelation: the clever “*stupidity*” of LLMs had made me discover something I had forgotten.

@@ -188,11 +188,11 @@ Simply put, they make us discover or, better yet, rediscover something about the

I went ahead and developed other prompts, such as this one that identifies brands:

-![](/static/transcriber-ai-metadata-/BrandsResult.png)
+![](/static/transcriber-ai-metadata-BrandsResult.png)

Or this one that tries to figure out who the participants are if they are mentioned:

-![](/static/transcriber-ai-metadata-/PeopleResult.png)
+![](/static/transcriber-ai-metadata-PeopleResult.png)

I realize that I have only begun to scratch the surface of what can be done. In the coming weeks, either at the request of app users or on my own initiative, I will be developing more such prompts.

@@ -204,19 +204,19 @@ For my experiments, for my podcast and YouTube show, since I am a subscriber, I

So I wrote a prompt that generates a prompt... Basically, instead of writing what I needed myself, as I always had, I asked the artificial intelligence, again via one of NQR's prompts, to write the image-generation prompt, which I would then copy and paste into Firefly.

-![](/static/transcriber-ai-metadata-/ImagePrompt.png)
+![](/static/transcriber-ai-metadata-ImagePrompt.png)

This first level of recursion is interesting: one prompt generating another prompt...

But then, since OpenAI has an API for generating images directly with the DALL-E models, I thought it would be nice to skip this whole roundabout. Despite a not insignificant cost (we're talking about a few cents, not a few thousandths of a dollar), I decided to go ahead with direct generation.

That said, as of now images can be generated directly from within Transcriber!

-![](/static/transcriber-ai-metadata-/ImageGeneration.png)
+![](/static/transcriber-ai-metadata-ImageGeneration.png)

You can choose the model, DALL-E 2 or DALL-E 3 (the quality of DALL-E 2 is frankly poor; I think they only keep it available because some applications still use it). With DALL-E 3 you can choose to generate a square image or a 16:9 one, in either horizontal or vertical orientation.

-![](/static/transcriber-ai-metadata-/ImageGeneratorSettings.png)
+![](/static/transcriber-ai-metadata-ImageGeneratorSettings.png)

You can also choose to generate a standard image or one in the “vivid” style, which creates more aesthetically pleasing results that look more like stock photos than regular photos.
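
For readers curious how these choices might map onto OpenAI's image generation endpoint, the sketch below builds a request body from them. How Transcriber assembles its requests is not described here, so the function name `build_image_request` and the `layout` labels are hypothetical; the `model`, `size`, and `style` fields and their values are the ones the endpoint accepts for DALL-E 3. Note that the wide and tall sizes (1792x1024 and 1024x1792) are close to 16:9 rather than exactly 16:9.

```python
# Hypothetical layout labels mapped to the sizes DALL-E 3 accepts.
SIZES = {
    "square": "1024x1024",
    "horizontal": "1792x1024",  # roughly 16:9, landscape
    "vertical": "1024x1792",    # roughly 9:16, portrait
}


def build_image_request(prompt: str, layout: str = "square",
                        vivid: bool = False,
                        model: str = "dall-e-3") -> dict:
    """Return a JSON-serializable body for an image generation request.

    This only builds the payload; sending it (and handling the API key)
    is left out so the sketch runs without network access.
    """
    if layout not in SIZES:
        raise ValueError(f"unknown layout: {layout!r}")
    if model != "dall-e-3" and layout != "square":
        # Assumption for this sketch: only DALL-E 3 gets non-square sizes.
        raise ValueError("non-square layouts require DALL-E 3 in this sketch")

    payload = {"model": model, "prompt": prompt, "n": 1, "size": SIZES[layout]}
    if model == "dall-e-3":
        # The style option only applies to DALL-E 3.
        payload["style"] = "vivid" if vivid else "natural"
    return payload


if __name__ == "__main__":
    print(build_image_request("A vineyard at sunset", "horizontal", vivid=True))
```

Keeping the payload construction separate from the network call makes the option handling easy to check on its own.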

@@ -226,7 +226,7 @@ Generating a 16:9 image comes in at a cost of $0.12.

### Money, Money, Money!

-![](/static/transcriber-ai-metadata-/Costs.jpg)
+![](/static/transcriber-ai-metadata-Costs.jpg)

But how much does this stuff cost?

@@ -276,7 +276,7 @@ If you have any questions, please leave a comment on this article!

### About Alex

-![](/static/transcriber-ai-metadata-/alexraccuglia.jpg)
+![](/static/transcriber-ai-metadata-alexraccuglia.jpg)

Alex Raccuglia, 50, from Milan, Italy, studied computer engineering but, fortunately for him, ended up as a director of TV commercials and promotional videos, accumulating a fair amount of experience in the field of visual effects.
