Adding GitHub Workflow Parser #7

Open: wants to merge 5 commits into base: master


@vsoch
Collaborator

vsoch commented May 28, 2020

This is a first shot at adding the parser as a GitHub action that, when an issue is submitted, will:

  • obtain the unique identifier, an md5 of the traceback from the new issue
  • look through the record of currently existing issues, represented as a folder structure in the repository
  • post a comment / update the issue if it's already been opened.
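
A minimal sketch of the first two steps, assuming each known issue is recorded as a folder named after the md5, holding a file with the issue number. The layout, the `issue.txt` file name, and all function names here are hypothetical and are not taken from the PR's script:

```python
import hashlib
from pathlib import Path


def traceback_id(traceback_text: str) -> str:
    """Return the md5 hex digest of the traceback text (whitespace-trimmed)."""
    return hashlib.md5(traceback_text.strip().encode("utf-8")).hexdigest()


def find_existing_issue(records_dir: Path, identifier: str):
    """Return the recorded issue number for this identifier, or None.

    Assumed (hypothetical) layout: <records_dir>/<identifier>/issue.txt
    """
    record = records_dir / identifier / "issue.txt"
    if record.is_file():
        return int(record.read_text().strip())
    return None


def record_issue(records_dir: Path, identifier: str, issue_number: int) -> None:
    """Create the folder for a new identifier and store the issue number."""
    folder = records_dir / identifier
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "issue.txt").write_text(f"{issue_number}\n")
```

The workflow would call `find_existing_issue` first and only fall back to `record_issue` when nothing matches; posting the comment itself would go through the GitHub API and is left out here.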

This should serve a twofold purpose: to help the user, and to keep a little database of reported issues. I suspect we will want to get a base merged, and then tweak details once the datalad PR is merged and we can adjust.

@yarikoptic
Member

Md5 of the full traceback might be too rigid. I would have made it a triple hierarchy:

  • exception class name (unfortunately message could be too specific - path names etc)
  • md5 of function names list up until getting into datalad code
  • similar md5 but within the datalad portion of the stack
  • and then store there detailed info per issue with full traceback etc which could differ (line numbers shift between changes, paths differ in messages etc). Additional matching could be done on that narrowed down set.
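
One way to read the three levels above, as a rough sketch: hash only function names (not line numbers or paths) so the fingerprint survives code shifts. The helper names are hypothetical, and the handling of frames that come after the datalad portion (ignored here) is an assumption, since the comment does not say what to do with them:

```python
import hashlib
import traceback
from pathlib import PurePosixPath


def md5_of(names):
    """md5 hex digest of a list of function names, one per line."""
    return hashlib.md5("\n".join(names).encode("utf-8")).hexdigest()


def in_package(filename, package):
    """True if `package` is one of the path components of `filename`."""
    return package in PurePosixPath(filename.replace("\\", "/")).parts


def fingerprint(exc_name, frames, package="datalad"):
    """Return (exception name, md5 before package, md5 inside package).

    `frames` is a list of (filename, function_name), outermost call first.
    """
    before, inside = [], []
    entered = False
    for filename, func in frames:
        if in_package(filename, package):
            entered = True
            inside.append(func)
        elif not entered:
            before.append(func)
        # Assumption: frames outside the package after entering it are skipped.
    return exc_name, md5_of(before), md5_of(inside)


def frames_from_exception(exc):
    """Extract (filename, function_name) pairs from a caught exception."""
    return [(f.filename, f.name) for f in traceback.extract_tb(exc.__traceback__)]
```

Two reports that enter datalad the same way but fail in different datalad functions would then share the first two levels and differ only in the third.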

Such levels could allow for matching similar, even if not identical, issues on the client side (e.g. the full repo could be cloned and updated in the cache). A GitHub action could be used to make records of new issues (which would already have that composite fingerprint in them). Although there might be benefits from collecting additional traceback and wtf details for already existing issues, I am afraid it might be too much chatter if we are to monitor this collection of issues.

@vsoch
Collaborator Author

vsoch commented May 29, 2020

Okay so just to make sure I have it right, you would do (these are just randomly derived values so we can see what it looks like)

  • RuntimeError
  • I understand the md5 generation - can you point me to which part of the wtf output is the "functions list"? Do you mean the dependency list? What I can do is update helpme to have a boolean that says "Don't md5-ize this for me, use a custom function instead." Created an issue: Add custom function to generate identifier vsoch/helpme#56.
  • similar md5 but within the datalad portion of the stack (show me again?)

So you are proposing it would look like:

RuntimeError-<md5-functions>-<md5-datalad>
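
To make the format concrete, here is a small sketch of building such an identifier and measuring how many leading levels two identifiers share. Both function names are hypothetical. Splitting on `-` is safe under the assumption that the first part is a plain Python class name and the other two are hex digests, none of which can contain a hyphen:

```python
def make_identifier(exc_name, md5_functions, md5_datalad):
    """Join the three levels as <exception>-<md5-functions>-<md5-datalad>."""
    return "-".join([exc_name, md5_functions, md5_datalad])


def match_level(identifier_a, identifier_b):
    """Count how many leading levels (0 to 3) two identifiers have in common."""
    level = 0
    for part_a, part_b in zip(identifier_a.split("-"), identifier_b.split("-")):
        if part_a != part_b:
            break
        level += 1
    return level
```

A match level of 3 would mean the same fingerprint, while 1 or 2 would flag issues that are similar but not identical.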

and then store there detailed info per issue with full traceback etc which could differ (line numbers shift between changes, paths differ in messages etc). Additional matching could be done on that narrowed down set.

If the traceback is part of the md5, and it's included in the issue, we would definitely be storing it. For the line numbers, I think that's probably overkill for the points that you mentioned.

My 0.02 for the above - I think the specific dependency lists and functions list might be too detailed for grouping errors. If we have an exception name, and then md5 based on the traceback, I think that could be enough for a human to browse, and to match issues that belong together. On the other hand, you are thinking that you would want to search based on md5 of just a functions list, or just a hash of functions? I have mixed feelings about this, because I don't think I fully understand what a functions list is. My instinct is that we should start with a simpler (less detailed) approach and only dive into more detail if we find it doesn't work well (meaning that two exceptions are labeled as the same but are very different to resolve / address, or we need to search for something and find that we cannot).

@vsoch
Collaborator Author

vsoch commented May 29, 2020

okay actually I think I figured it out re: the lists:

In [70]: datald
Out[70]: ['datalad', 'datalad/cmdline/main.py']

In [71]: others
Out[71]: 
['site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/async_helpers.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/async_helpers.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py']
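
The code that produced these two lists is not shown in the thread. As a guess at one way to get similar output, the sketch below pulls the `File "..."` entries out of a standard Python traceback and splits them by whether `datalad` appears as a path component. The regular expression, the path trimming, and the `split_paths` name are all assumptions:

```python
import re

# Matches the frame lines of a standard Python traceback.
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line \d+, in (?P<func>.+)$', re.MULTILINE
)


def split_paths(traceback_text, package="datalad"):
    """Return (package_paths, other_paths) in traceback order.

    Paths are trimmed to start at the package name or at site-packages,
    so that install prefixes do not leak into the lists.
    """
    package_paths, other_paths = [], []
    for match in FRAME_RE.finditer(traceback_text):
        parts = match.group("path").replace("\\", "/").split("/")
        if package in parts:
            package_paths.append("/".join(parts[parts.index(package):]))
        elif "site-packages" in parts:
            other_paths.append("/".join(parts[parts.index("site-packages"):]))
        else:
            other_paths.append("/".join(parts))
    return package_paths, other_paths
```

Hashing each list separately would then give the two md5 components of the identifier.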

@vsoch
Collaborator Author

vsoch commented May 29, 2020

okay just updated the script here to use the updated (more specific) hash.
