Add structured outputs. #7
Conversation
I am on it!

@sayakpaul which version of Gradio do you use?

Oh, it is a kinda old version.
That leads to:

```
ERROR: No matching distribution found for gradio==5.14.0
```

What am I missing? I am on a Mac.

Meanwhile, I ran a simple test with the following:

```python
from configs.responses import SummaryResponses
from google import genai
import os

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents=[
        ["Summarize Shakespeare's life work in a few sentences"]
    ],
    config={
        'response_mime_type': 'application/json',
        'response_schema': list[SummaryResponses],
    },
)
print(response.text)
print(response.parsed)
```

Gives me:

```
[{"previous_summary": "","updated_summary": "William Shakespeare's body of work includes 39 plays, 154 sonnets, two long narrative poems, and a few other verses, showcasing a mastery of language and exploration of universal human themes such as love, loss, ambition, and revenge. His plays are categorized into comedies, tragedies, and histories, each demonstrating his profound understanding of human nature and dramatic construction. His contributions to English literature and the theater are immeasurable and continue to resonate globally."}]
[SummaryResponses(previous_summary='', updated_summary="William Shakespeare's body of work includes 39 plays, 154 sonnets, two long narrative poems, and a few other verses, showcasing a mastery of language and exploration of universal human themes such as love, loss, ambition, and revenge. His plays are categorized into comedies, tragedies, and histories, each demonstrating his profound understanding of human nature and dramatic construction. His contributions to English literature and the theater are immeasurable and continue to resonate globally.")]
```
Okay, Python 3.10 resolved the issue. Tried with the ViT paper.

When I printed `response.parsed`:

```
[SummaryResponses(previous_summary="This paper introduces Vision Transformer (ViT), a pure transformer network for image recognition. Unlike previous approaches that combined transformers with convolutional neural networks (CNNs), ViT processes images directly by splitting them into patches and treating those patches as tokens, similar to words in natural language processing (NLP). The key finding is that while ViT's performance on mid-sized datasets like ImageNet is initially modest compared to CNNs, pre-training ViT on very large datasets (14M-300M images) dramatically improves its accuracy. When pre-trained at scale and transferred to various image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), ViT achieves state-of-the-art results, surpassing or matching the performance of ResNet-like CNNs while requiring substantially less computational resources for training. The authors attribute this success to the scalability of the transformer architecture and the advantage of large-scale training over relying on CNNs' inductive biases. The paper also explores variations of the architecture, including hybrid models combining CNNs and transformers, and investigates the impact of pre-training dataset size and self-supervised learning. Overall, the study demonstrates the significant potential of transformers for large-scale image recognition.", updated_summary='This ICLR 2021 paper, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," introduces Vision Transformer (ViT), a novel architecture applying the Transformer directly to image patches without using convolutional neural networks (CNNs). ViT treats image patches as "tokens," linearly projects them into embeddings, adds positional embeddings, and feeds the sequence into a Transformer encoder. The key architectural innovation is this direct Transformer application to image data, removing inductive biases of CNNs. When trained on large datasets (14M-300M images), ViT achieves excellent results on various image recognition benchmarks, outperforming CNN-based models with fewer training resources. The paper explores a hybrid CNN-Transformer approach, but its main contribution is demonstrating a pure Transformer architecture\'s effectiveness for large-scale image recognition.')]
```

So, I guess it's doing what it is supposed to be doing? What's next?
configs/responses.py (Outdated)

```python
previous_summary: str
updated_summary: str
```
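For context, these two fields presumably sit inside a Pydantic model roughly like the sketch below (the exact definition lives in `configs/responses.py`; subclassing `pydantic.BaseModel` is an assumption based on how the `google-genai` SDK consumes `response_schema`):

```python
# Sketch of the outdated schema; not the literal file contents.
from pydantic import BaseModel


class SummaryResponses(BaseModel):
    previous_summary: str  # summary carried over from the prior turn
    updated_summary: str   # newly generated summary
```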
Since this is the "response", I think we don't need `previous_summary` and `updated_summary`. Instead, just `summary`.
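The schema would then shrink to something like this (a sketch of the suggested change, using the field name from the comment above):

```python
# Proposed shape: the response only carries the newly generated summary.
from pydantic import BaseModel


class SummaryResponses(BaseModel):
    summary: str
```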
How do we keep track of the previous summary otherwise?
The LLM only generates the summary; it does not generate the previous summary (having `previous_summary` in the response means we are asking the LLM to generate the previous summary). The previous summary is something to be given to the LLM as input (we don't give it as input currently, though).
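In other words, the previous summary would travel in the prompt rather than in the response schema. A hypothetical illustration (the prompt wording and `state` layout are illustrative, not the actual `main.py` code):

```python
# Hypothetical sketch: the previous summary is model *input* (carried in
# the prompt), while the response schema only declares the new `summary`
# field as output.
state = {"summary": "Summary produced on the previous turn."}  # illustrative

prompt = (
    f"Previous summary:\n{state['summary']}\n\n"
    "Read the new document and produce an updated summary."
)
```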
Not sure I understand this bit:

> it is something to be given to the LLM as input (we don't give it as input currently though).

We're already doing it here, no?

Line 69 in 8ec75ed:

```python
previous_summary=state['summary'],
```

Or am I missing something?
Thanks!

The code below (in `main.py`) stores the summary from the response:

```python
state['summary'] = response.text
state['summary_history'].append(response.text)
```

Now, `response.text` is a JSON-formatted string. Hence, we need to parse it, extract the value under the `summary` key, and replace `response.text` in the original code with it. Something like below:

```python
state['summary'] = response.parsed.summary
state['summary_history'].append(response.parsed.summary)
```
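For reference, a minimal end-to-end sketch of how these pieces could fit together, assuming the schema is a single-object `SummaryResponses` model with just a `summary` field (if the schema stays `list[SummaryResponses]`, `response.parsed` would be a list and you would index into it first):

```python
import os

from google import genai
from pydantic import BaseModel


# Assumed schema shape for this sketch; the real one lives in configs/responses.py.
class SummaryResponses(BaseModel):
    summary: str


client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents=["Summarize Shakespeare's life work in a few sentences"],
    config={
        "response_mime_type": "application/json",
        "response_schema": SummaryResponses,  # single object, not list[...]
    },
)

# Store the parsed summary rather than the raw JSON string.
state = {"summary": "", "summary_history": []}
state["summary"] = response.parsed.summary
state["summary_history"].append(response.parsed.summary)
print(state["summary"])
```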
Thanks for the update!!
@deep-diver I updated the schema to only include `summary`.
Looks good to me!
Will fix #3.
Facing:

Happens with the `main` branch too. @deep-diver could you check?