Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NB for BERT models #114

Merged
merged 13 commits into from
Oct 4, 2022
Merged

Add NB for BERT models #114

merged 13 commits into from
Oct 4, 2022

Conversation

hkirvesl
Copy link
Collaborator

@hkirvesl hkirvesl commented Sep 29, 2022

Reference issues/PRs

Fixes #81 By adding a notebook on BERT models with topological interpretability methods built on top of one. The old tutorials on Q&A and Translation remain for now.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

  1. Add notebook for training HuggingFace Bert model with giotto-deep, and how to use topological pruning methods for such models.
  2. Add support for training HuggingFace models with an unwrapper in Trainer.py
  3. Modify models/utils.py to support the the HuggingFace models (now with an unwrapper)

Screenshots (if appropriate)

Any other comments?

Checklist

  • I have read the guidelines for contributing.
  • My code follows the code style of this project. I used flake8 to check my Python changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed. I used pytest to check this on Python tests.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Contributor

@matteocao matteocao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the CI pass, I will study your notebook in details, as I am really curious about it!

"kernelspec": {
"display_name": "Python [conda env:torch] *",
"language": "python",
"name": "conda-env-torch-py"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CI is complaining about this name:

jupyter_client.kernelspec.NoSuchKernel: No such kernel named conda-env-torch-py

Can you please change this to:

  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Should be fixed now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkirvesl can you please use os.path.join to build file path please? windows and posix OS have a different convention (back and forward slashes)

Screen Shot 2022-09-29 at 19 56 54

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. Done.

Copy link
Contributor

@matteocao matteocao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @hkirvesl ,

I think we are almost there: I only have a few minor comments.
I will then review the notebbook in detail.

One important note: did you pull master to your fork before committing? I ask because there is one item in the CI that took > 5h to run and has currently not yet finished (though it should really take 25 mins at most) and I wonder if it is related to not having integrated the fixes. If you have done so, than ignore this last comment

#- repo: https://github.com/psf/black
# rev: 22.8.0
# hooks:
# - id: black
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood from raphael that mypy is causing issues, however black should not: it should simply frmat the code properly: why are you commenting black out?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please keep the hook for black.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please wait until my pull request is accepted and then pull again from master? I fixed all mypy errors, and the pre-commit does work now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, Matteo and Raphael. I did recall some discussion about problems with the prehooks and commenting them out being the recommended temporary work around. I just forgot the details and ended up commenting out too much. All these should be fixed now. Thank you Raphael for the fix, this makes our lives much easier.

@@ -70,6 +70,6 @@ def __call__(
the output tensor of the module
"""
if isinstance(module_out, tuple):
self.outputs.append(module_out[0].detach())
self.outputs.append(module_out[0])#.detach())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functionalities are needed in the forward hooks in order to extract the activation values of the layers for example.
Could you please double check if the functionalities to get activations values still works properly without the detachment? Put differently, do you understand the consequences of removing the .detach() for the forward hooks? If so, could you please write a couple of sentences here below? thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment, this is a very good point indeed. I do understand the consequences and I agree what was proposed is not a good solution. To be honest, I forgot about this myself. It was just a quick and dirty fix to make the BERT models work and since it worked and seemingly didn't break any tests I just proceeded without worrying about it.

A good solution would keep the old functionality where applicable, in the same spirit as the change in trainer.py. I believe this has now been fixed.

@@ -266,9 +266,13 @@ def _send_to_device(
new_x.append(xi.to(DEVICE))
x = new_x
prediction = self.model(*x)
if(hasattr(prediction, "logits")): # unwrapper for HuggingFace BERT model
prediction = prediction.logits # unwrapper for HuggingFace BERT model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good here!

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you can simply use

from gdeep.utility import DEVICE

Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It is a more giotto-way to do this. In general, any other comments on how to make the notebook more giotto-like are highly appreciated.

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ca you please preceed this cell with a markdown cell describing what you are doing here?

if this cell and the above one are related to handling masks, then maybe it is worth putting a bit of explanation 2 cells above, details the steps that one needs to take


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.I added a brief note (and deleted an unnecessary chunk)

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a few inline comments to describe what you are doing here?


Reply via ReviewNB

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment here , I think, needs to be corrected: raw -> mean aggregated


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. Fixed.

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more comments here please


Reply via ReviewNB

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these two cells are "almost" duplicated, can you please create a function in one cell and call it with different parameters?

Probably same comments may work a few cells above


Reply via ReviewNB

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: At such --> As such


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe one sentence describe the two cells below.


Reply via ReviewNB

@@ -0,0 +1,1198 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ca you clarify why you compute the gradients (and of what) in a sentence in a .md cell above this one?


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some discussion on this. Also: If you know of a more giotto-like way of getting your hands on these gradients I think it would improve this notebook.

@hkirvesl
Copy link
Collaborator Author

hkirvesl commented Oct 1, 2022

Thanks for the good comments! I pulled the main and it did fix a lot, (and also managed to close this PR.) Will reopen once I've cleaned the notebook more a bit more as per the suggestions

@raphaelreinauer
Copy link
Collaborator

Could you please install pre-commit (see README.md)

@hkirvesl hkirvesl reopened this Oct 3, 2022
@hkirvesl
Copy link
Collaborator Author

hkirvesl commented Oct 3, 2022

Reopening the PR

Changelog:

The notebook:

  1. Imports

1.1) Removed unneeded imports:
from gdeep.visualisation import persistence_diagrams_of_activations
from gdeep.visualisation import Visualiser

1.2) Added import

from gdeep.utility import DEVICE

  1. Cleaned some variable names

  2. Added more explanation as per the suggested comments

  3. Removed unnecessary code chunks

  4. renamed variables

  5. Added functions to reduce redundant code

  6. Add legend to a persistence diagram plot.

Testing:

  1. Run all the prehooks, both black and mypy

models/utils.py

  1. Replace the modification to retain the existing functionality

Copy link
Contributor

@matteocao matteocao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this loks good and it is very interesting btw! thanks @hkirvesl !!

@matteocao matteocao merged commit f83811f into giotto-ai:master Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a tutorial on how to use Huggingface models with giotto-deep
3 participants