Understanding Training Strategy of Supervised TCR repertoire classification on HIV dataset #80

Open
Albert-Shuai opened this issue Mar 3, 2023 · 0 comments

Albert-Shuai commented Mar 3, 2023

Hi, sorry to disturb you.

I am trying to understand the training strategy used for the HIV dataset so that I can replicate the results in your publication.

It seems that the dataset can be split into non-cognate groups (the CEF, AY9, and No Peptide conditions) and cognate groups (conditions with an epitope). That gives 3 * 3 non-cognate samples and 25 * 3 cognate samples. From the paper, I understand that DeepTCR can distinguish non-cognate samples from cognate samples, and that training kept two out of every three samples as training data.

My question is: when training, did you

  1. fit the model on all (3+25) * 2 samples at once, where 3 * 2 are non-cognate and 25 * 2 are cognate, and then test it on the remaining 3+25 samples to see whether it correctly predicts each sample as cognate or non-cognate?
  2. Or did you use (3+1) * 2 samples, where the 3 * 2 samples are non-cognate and the 1 * 2 samples come from one specific epitope, instead of using all 25 * 2 samples as the cognate group? You would then test the model on the remaining 3+1 samples to see whether it correctly predicts which one is the cognate sample,
    and repeat option 2 for each specific epitope (MSPRTLNAW, NTQGYFPDW, etc.).
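
To make the two options concrete, here is a minimal sketch of the splits as I picture them. It only counts samples and does not call DeepTCR. The function names, the `epitope_XX` placeholder labels, and the choice of which replicate is held out are all my own assumptions, not anything taken from the paper or the repository.

```python
from itertools import product

N_REPLICATES = 3  # assumed: three samples per condition
NON_COGNATE = ["CEF", "AY9", "NoPeptide"]
# Placeholder labels standing in for the 25 epitopes (MSPRTLNAW, NTQGYFPDW, ...).
COGNATE = [f"epitope_{i:02d}" for i in range(1, 26)]


def make_samples(conditions, label):
    """Return (condition, replicate, label) tuples; label 0 = non-cognate, 1 = cognate."""
    return [(cond, rep, label) for cond, rep in product(conditions, range(N_REPLICATES))]


def hold_out_one_replicate(samples, test_rep):
    """Keep two of every three samples for training and hold out the third."""
    train = [s for s in samples if s[1] != test_rep]
    test = [s for s in samples if s[1] == test_rep]
    return train, test


def strategy_1(test_rep=0):
    """Option 1: all 25 epitopes pooled into a single cognate class."""
    samples = make_samples(NON_COGNATE, 0) + make_samples(COGNATE, 1)
    return hold_out_one_replicate(samples, test_rep)


def strategy_2(epitope, test_rep=0):
    """Option 2: one epitope at a time against the three non-cognate conditions."""
    samples = make_samples(NON_COGNATE, 0) + make_samples([epitope], 1)
    return hold_out_one_replicate(samples, test_rep)


train_1, test_1 = strategy_1()
print(len(train_1), len(test_1))  # 56 28

train_2, test_2 = strategy_2(COGNATE[0])
print(len(train_2), len(test_2))  # 8 4
```

Under option 1 there would be a single model trained on 56 samples and tested on 28. Under option 2 there would be 25 separate models, each trained on 8 samples and tested on 4.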

Thanks and looking forward to your reply!
