Is it possible to use multiple texts and/or images? #125

Open
hugolytics opened this issue Jan 18, 2023 · 3 comments

Comments

@hugolytics

Would it be possible to embed several texts at the same time (using the same, or possibly different text models)?

I'm working on a medical problem where different doctors evaluate different aspects of the patient's state.

The workaround I am using right now is to concatenate the different texts into one. While my dataset might have (free-text) columns like medical_history, general_impression, walking_test, etc., I build a single string such as medical_history: ... general_impression: ... walking_test: ...
and just hope the transformer will learn this.
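Concretely, the workaround looks something like this (a minimal pandas sketch; the column names are as above, the example texts are made up):

```python
import pandas as pd

# Toy dataset with several free-text columns (contents invented for illustration)
df = pd.DataFrame({
    "medical_history": ["hypertension, treated since 2015"],
    "general_impression": ["patient appears alert and oriented"],
    "walking_test": ["completed the walking test without assistance"],
})

text_cols = ["medical_history", "general_impression", "walking_test"]

# Concatenate the columns into a single string, prefixing each chunk with its
# column name so the transformer can (hopefully) tell the sections apart
df["combined_text"] = df[text_cols].apply(
    lambda row: " ".join(f"{col}: {row[col]}" for col in text_cols), axis=1
)

print(df["combined_text"].iloc[0])
```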

The other (nicer) way to do it would be to concatenate the embeddings, i.e. embed the columns separately and fuse them.

I think it should be possible. For me it would be fine to use the same text model for the different texts, so it would really be a matter of having trainer.fit(X_text=...) accept a List[Tuple], for example.

Would this be difficult for me to customise?
I'd be willing to contribute and document it.
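To make the suggestion concrete, the call could look something like this (purely hypothetical: neither this signature nor these variables exist in pytorch-widedeep today, they only illustrate the proposed API shape):

```python
# HYPOTHETICAL -- this signature does NOT exist in pytorch-widedeep today.
# The idea: pass one preprocessed array per text column instead of a single
# X_text array, with the text model shared (or not) across columns.
trainer.fit(
    X_tab=X_tab,
    X_text=[X_medical_history, X_general_impression, X_walking_test],
    target=y,
)
```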

@jrzaurin
Owner

Hey @hugolytics

Thanks for opening the issue!

At the moment it is not, but as I am about to start fully integrating with Hugging Face, maybe it is a feature we can bring in (bear in mind it involves multiple preprocessors, tokenizers, etc., which is fine, just not that straightforward).

Now, there are a few things I am not fully sure I get... when you say using the same text model, let's assume it is a simple RNN, do you want to use that RNN for each column, sequentially? Because if that is the case, I think that would be "catastrophic" for the learning process. The main point of these multimodal models is the "joint learning", and if one passes data to a model sequentially, there is nothing learned "jointly".

Anyway, we can start by allowing multiple text/image inputs and models. The fusion of the embeddings is easier, I think, as we could just fuse them via existing FC heads (or dot products, etc.).
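As a rough, generic PyTorch sketch of that fusion idea (this is not pytorch-widedeep code; the embedding sizes and the two-layer head are arbitrary choices):

```python
import torch
import torch.nn as nn

class ConcatFusionHead(nn.Module):
    """Fuse per-column text embeddings via concatenation + an FC head."""

    def __init__(self, embed_dims, hidden_dim=128, out_dim=1):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(embed_dims), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, embeddings):
        # embeddings: one tensor per text column, each of shape (batch, dim)
        return self.head(torch.cat(embeddings, dim=-1))

# Three text columns, each already encoded to a 768-dim vector
fusion = ConcatFusionHead(embed_dims=[768, 768, 768])
embs = [torch.randn(4, 768) for _ in range(3)]
out = fusion(embs)  # shape: (4, 1)
```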

But anyway, let's do something. Let's keep this issue open and post relevant material here, and maybe we can discuss it in more detail in the Slack channel: https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-1nao4o0hj-2FtP__8oASmyLsO6aMZQcA

@hugolytics
Author

By the same model I meant the same pre-trained transformer model: the same starting point, fine-tuned on the regression task simultaneously (though obviously the different models would end up with their own distinct fine-tuned weights).
The reason I brought it up is that, in the current API, the WideDeep class only takes a single deeptext module. So sharing the same model would not really be an issue for my use case (although it would not generalize well if one column were French and the other Chinese).

Anyway, I'll join the Slack channel. I'm also using Hugging Face models at the moment, so I am also interested in the integration!
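For reference, getting independent copies from the same pre-trained starting point is straightforward with transformers (the checkpoint name here is just an example):

```python
from transformers import AutoModel

# Two separate calls return two independent modules: the same pre-trained
# starting point, but each ends up with its own weights once fine-tuned
encoder_history = AutoModel.from_pretrained("distilbert-base-uncased")
encoder_impression = AutoModel.from_pretrained("distilbert-base-uncased")

# The parameters are not shared between the two copies
assert encoder_history.embeddings.word_embeddings.weight is not \
       encoder_impression.embeddings.word_embeddings.weight
```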

@jrzaurin
Owner
