
Add structured outputs. #7

Merged
merged 3 commits into from
Feb 9, 2025

Conversation

sayakpaul
Collaborator

Will fix #3.

Facing:

Traceback (most recent call last):
  File "/Users/sayakpaul/Downloads/AdaptSum/main.py", line 191, in <module>
    demo = main(args)
  File "/Users/sayakpaul/Downloads/AdaptSum/main.py", line 178, in main
    with gr.Column("chat-window", elem_id="chat-window"):
TypeError: __init__() takes 1 positional argument but 2 positional arguments (and 1 keyword-only argument) were given

Happens with main branch too. @deep-diver could you check?
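The traceback is consistent with `gr.Column` taking no positional arguments in the installed Gradio version, so the string `"chat-window"` passed positionally triggers the error. A minimal dependency-free sketch (this `Column` class is a stand-in, not Gradio itself) reproduces the failure mode:

```python
# Stand-in for gr.Column with a keyword-only signature: passing
# "chat-window" positionally raises the same kind of TypeError.
class Column:
    def __init__(self, *, elem_id=None):
        self.elem_id = elem_id

try:
    Column("chat-window", elem_id="chat-window")  # mimics the failing call
except TypeError as err:
    print("TypeError:", err)

col = Column(elem_id="chat-window")  # keyword-only call succeeds
print(col.elem_id)  # chat-window
```

Dropping the positional string and keeping only `elem_id="chat-window"` would be the corresponding fix in `main.py`.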

@deep-diver
Owner

I am on it!

@deep-diver
Owner

@sayakpaul which version of Gradio do you use?

@sayakpaul
Collaborator Author

3.36.1

@deep-diver
Owner

Oh, that's a pretty old version.
Could you please update it to 5.14.0?

@sayakpaul
Collaborator Author

That leads to:

ERROR: No matching distribution found for gradio==5.14.0

What am I missing? I am on Mac.

Meanwhile, I ran a simple test with the following:

from configs.responses import SummaryResponses
from google import genai
import os

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents=[
        ["Summarize Shakespeare's life work in a few sentences"]
    ],
    config={
        'response_mime_type': 'application/json',
        'response_schema': list[SummaryResponses],
    },
)
print(response.text)
print(response.parsed)

Gives me:

[{"previous_summary": "","updated_summary": "William Shakespeare's body of work includes 39 plays, 154 sonnets, two long narrative poems, and a few other verses, showcasing a mastery of language and exploration of universal human themes such as love, loss, ambition, and revenge. His plays are categorized into comedies, tragedies, and histories, each demonstrating his profound understanding of human nature and dramatic construction. His contributions to English literature and the theater are immeasurable and continue to resonate globally."}]
[SummaryResponses(previous_summary='', updated_summary="William Shakespeare's body of work includes 39 plays, 154 sonnets, two long narrative poems, and a few other verses, showcasing a mastery of language and exploration of universal human themes such as love, loss, ambition, and revenge. His plays are categorized into comedies, tragedies, and histories, each demonstrating his profound understanding of human nature and dramatic construction. His contributions to English literature and the theater are immeasurable and continue to resonate globally.")]
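`configs/responses.py` isn't shown in this thread, but the printed repr implies a schema with two string fields, most likely a Pydantic model (which would produce that `SummaryResponses(...)` repr). A dependency-free sketch of the same shape using a `TypedDict`, which `google-genai` also accepts as a `response_schema`:

```python
from typing import TypedDict

# Hypothetical reconstruction of the schema in configs/responses.py;
# the actual file likely uses a Pydantic BaseModel with the same fields.
class SummaryResponses(TypedDict):
    previous_summary: str
    updated_summary: str

example: SummaryResponses = {
    "previous_summary": "",
    "updated_summary": "Shakespeare wrote 39 plays and 154 sonnets.",
}
print(example["updated_summary"])
```

With `response_schema` set to `list[SummaryResponses]`, as in the snippet above, `response.parsed` comes back as a list of these objects.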

@sayakpaul
Collaborator Author

Okay, Python 3.10 resolved the issue.
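That fits the earlier pip error: Gradio 5.x targets Python 3.10 or newer, so an older interpreter sees "No matching distribution found". A small check (the 3.10 floor is the assumption implied by the fix):

```python
import sys

MIN_PY = (3, 10)  # assumed floor: Gradio 5.x wheels target Python >= 3.10

def can_install_gradio5(version_info=sys.version_info) -> bool:
    """True if this interpreter should find a gradio==5.14.0 distribution."""
    return tuple(version_info[:2]) >= MIN_PY

print(can_install_gradio5((3, 8, 10)))  # False: pip finds no matching distribution
print(can_install_gradio5((3, 10, 0)))  # True
```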

Tried with the ViT paper:

[screenshots of the summarization UI run on the ViT paper]

When I printed the response.parsed here, I got:

response.parsed=[SummaryResponses(previous_summary="This paper introduces Vision Transformer (ViT), a pure transformer network for image recognition.  Unlike previous approaches that combined transformers with convolutional neural networks (CNNs), ViT processes images directly by splitting them into patches and treating those patches as tokens, similar to words in natural language processing (NLP). The key finding is that while ViT's performance on mid-sized datasets like ImageNet is initially modest compared to CNNs,  pre-training ViT on very large datasets (14M-300M images) dramatically improves its accuracy.  When pre-trained at scale and transferred to various image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), ViT achieves state-of-the-art results, surpassing or matching the performance of ResNet-like CNNs while requiring substantially less computational resources for training.  The authors attribute this success to the scalability of the transformer architecture and the advantage of large-scale training over relying on CNNs' inductive biases.  The paper also explores variations of the architecture, including hybrid models combining CNNs and transformers, and investigates the impact of pre-training dataset size and self-supervised learning.  Overall, the study demonstrates the significant potential of transformers for large-scale image recognition.", updated_summary='This ICLR 2021 paper, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," introduces Vision Transformer (ViT), a novel architecture applying the Transformer directly to image patches without using convolutional neural networks (CNNs).  ViT treats image patches as "tokens," linearly projects them into embeddings, adds positional embeddings, and feeds the sequence into a Transformer encoder.  The key architectural innovation is this direct Transformer application to image data, removing inductive biases of CNNs.  
When trained on large datasets (14M-300M images), ViT achieves excellent results on various image recognition benchmarks, outperforming CNN-based models with fewer training resources.  The paper explores a hybrid CNN-Transformer approach, but its main contribution is demonstrating a pure Transformer architecture\'s effectiveness for large-scale image recognition.')]

So, I guess it's doing what it is supposed to be doing? What's next?

@sayakpaul sayakpaul requested a review from deep-diver February 7, 2025 03:22
@sayakpaul sayakpaul marked this pull request as ready for review February 7, 2025 03:22
Comment on lines 4 to 5
previous_summary: str
updated_summary: str
Owner

since this is the "response", I think we don't need previous_summary and updated_summary. Instead, just summary.

Collaborator Author

How do we keep track of the previous summary in a better manner otherwise?

Owner

The LLM only generates the summary; it does not generate the previous summary (keeping previous_summary in the response schema means we'd be asking the LLM to generate the previous summary). The previous summary is something to be given to the LLM as input (though we don't currently pass it as input).
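The division of labor being proposed can be sketched as follows (the helper name and prompt wording are hypothetical, not from the repo): the previous summary flows into the prompt as input, while the response schema only asks the model for the new summary.

```python
from typing import TypedDict

# Response schema: the only field the LLM should generate.
class SummaryResponse(TypedDict):
    summary: str

# Hypothetical prompt builder: previous_summary is model INPUT, not output.
def build_prompt(previous_summary: str, new_text: str) -> str:
    return (
        "Previous summary:\n" + previous_summary + "\n\n"
        "New content:\n" + new_text + "\n\n"
        "Produce an updated summary covering both."
    )

prompt = build_prompt("ViT splits images into patches.", "It scales with data.")
print("Previous summary:" in prompt)  # True
```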

Collaborator Author

Not sure I understand this bit:

it is something to be given to the LLM as input (we don't give it as input currently though).

Aren't we already doing that here:

previous_summary=state['summary'],

Or am I missing something?

Owner

@deep-diver deep-diver left a comment

Thanks!

the code (in main.py) below stores the summary from the response:

state['summary'] = response.text
state['summary_history'].append(response.text)

Now, response.text is a JSON-formatted string. Hence, we need to parse it, extract the value under the summary key, and use that in place of response.text in the original code, something like below:

state['summary'] = response.parsed.summary
state['summary_history'].append(response.parsed.summary)
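One caveat worth flagging (the objects below are hypothetical stand-ins, since the full `main.py` isn't shown here): if `response_schema` is declared as `list[SummaryResponses]`, as in the test snippet earlier in the thread, `response.parsed` is a list, so the summary has to be accessed by index. With a non-list schema, the attribute access suggested above works directly.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Gemini response object, showing the two
# shapes response.parsed can take depending on the declared response_schema.
@dataclass
class SummaryResponses:
    summary: str

@dataclass
class FakeResponse:
    parsed: object

state = {"summary": None, "summary_history": []}

# Schema declared as SummaryResponses -> parsed is a single object:
response = FakeResponse(parsed=SummaryResponses(summary="A concise summary."))
state["summary"] = response.parsed.summary
state["summary_history"].append(state["summary"])

# Schema declared as list[SummaryResponses] -> parsed is a list:
response = FakeResponse(parsed=[SummaryResponses(summary="An updated summary.")])
state["summary"] = response.parsed[0].summary
state["summary_history"].append(state["summary"])

print(state["summary_history"])  # ['A concise summary.', 'An updated summary.']
```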

@sayakpaul sayakpaul requested a review from deep-diver February 8, 2025 02:42
Owner

@deep-diver deep-diver left a comment

Thanks for the update!!

@sayakpaul
Collaborator Author

@deep-diver I updated the schema to only include summary. Could you review once again?

@deep-diver
Owner

> @deep-diver I updated the schema to only include summary. Could you review once again?

Looks good to me!
We can discuss more later, but I think this PR is ready to be merged :)

@sayakpaul sayakpaul merged commit b8711d7 into main Feb 9, 2025
@sayakpaul sayakpaul deleted the structured-outs branch February 9, 2025 03:42
Successfully merging this pull request may close these issues.

Structured output support