Adding GitHub Workflow Parser #7

vsoch · 2020-05-28T23:24:56Z

This is a first shot at adding the parser as a GitHub action that, when an issue is submitted, will:

obtain the unique identifier, an md5 of the traceback from the new issue
look through the record of currently existing issues, represented as a folder structure in the repository
post a comment / update the issue if it's already been opened.
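The first two steps above can be sketched roughly as follows. This is only a minimal illustration, not the script added in this PR: the function names `traceback_md5` and `find_existing_issue`, and the assumption that each known issue is stored as a folder named after its identifier, are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def traceback_md5(traceback_text: str) -> str:
    """Return the hex md5 digest of a traceback string."""
    return hashlib.md5(traceback_text.encode("utf-8")).hexdigest()


def find_existing_issue(records_root: Path, identifier: str) -> Optional[Path]:
    """Return the record folder for an identifier, or None if it is not known yet.

    Assumes (hypothetically) one folder per identifier directly under records_root.
    """
    candidate = records_root / identifier
    return candidate if candidate.is_dir() else None
```

If `find_existing_issue` returns a folder, the action would comment on or update the matching issue; otherwise it would create a new record.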

This should serve two purposes: helping the user, and keeping a little database of reported issues. I suspect we will want to get a base merged, and then tweak details once the datalad PR is merged and we can adjust.

Signed-off-by: vsoch <[email protected]>

yarikoptic · 2020-05-29T04:19:57Z

An md5 of the full traceback might be too rigid. I would have made it a triple hierarchy:

exception class name (unfortunately message could be too specific - path names etc)
md5 of function names list up until getting into datalad code
similar md5 but within the datalad portion of the stack
and then store there detailed info per issue with full traceback etc which could differ (line numbers shift between changes, paths differ in messages etc). Additional matching could be done on that narrowed down set.

Such levels could allow for matching similar, even if not identical, issues on the client side (e.g. the full repo could be cloned and updated in the cache). A GitHub action could be used to make records of new issues (which would already have that composite fingerprint in them). Although there might be benefits from collecting additional traceback and wtf details for already existing issues, I am afraid it might be too much chatter if we are to monitor this collection of issues.
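One possible reading of the three-level fingerprint described above, as a sketch only. The `(file path, function name)` frame representation, the function name `composite_fingerprint`, and the rule that a frame belongs to datalad when `datalad` is one of its path components are all assumptions made for illustration. Frames reached after entering datalad that live outside it are simply ignored here, which is also an assumption.

```python
import hashlib
from typing import List, Tuple

# Hypothetical frame representation: (file path, function name)
Frame = Tuple[str, str]


def _md5_of_names(names: List[str]) -> str:
    """Hash an ordered list of function names."""
    return hashlib.md5("\n".join(names).encode("utf-8")).hexdigest()


def composite_fingerprint(
    exc_class: str, frames: List[Frame], package: str = "datalad"
) -> Tuple[str, str, str]:
    """Return (exception class, md5 of names before the package, md5 of names inside it)."""
    # Index of the first frame whose path contains the package as a component
    first_inside = next(
        (i for i, (path, _) in enumerate(frames) if package in path.split("/")),
        len(frames),
    )
    outside = [name for _, name in frames[:first_inside]]
    inside = [
        name for path, name in frames[first_inside:] if package in path.split("/")
    ]
    return exc_class, _md5_of_names(outside), _md5_of_names(inside)
```

Because only function names are hashed, line-number shifts and differing install prefixes do not change the fingerprint, which is the property being asked for here.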

vsoch · 2020-05-29T05:16:03Z

Okay, so just to make sure I have it right, you would do (these are just randomly derived values so we can see what it looks like):

RuntimeError
I understand the md5 generation - can you point me to which part of the wtf output is the "functions list"? Do you mean the dependency list? What I can do is update helpme to have a boolean that says "Don't md5-ize this for me, use a custom function instead." I created an issue for this: Add custom function to generate identifier vsoch/helpme#56.
similar md5 but within datalad portion of stack (show me again?)

So you are proposing it would look like:

RuntimeError-<md5-functions>-<md5-datalad>

and then store there detailed info per issue with full traceback etc which could differ (line numbers shift between changes, paths differ in messages etc). Additional matching could be done on that narrowed down set.

If the traceback is part of the md5, and it's included in the issue, we would definitely be storing it. For the line numbers, I think that's probably overkill for the points that you mentioned.

My 0.02 for the above - I think the specific dependency lists and functions list might be too detailed for grouping errors. If we have an exception name, and then md5 based on the traceback, I think that could be enough for a human to browse, and to match issues that belong together. On the other hand, you are thinking that you would want to search based on md5 of just a functions list, or just a hash of functions? I have mixed feelings about this, because I don't think I fully understand what a functions list is. My instinct is that we should start with a simpler (less detailed) approach and only dive into more detail if we find it doesn't work well (meaning that two exceptions are labeled as the same but are very different to resolve / address, or we need to search for something and find that we cannot).

vsoch · 2020-05-29T16:54:36Z

okay actually I think I figured it out re: the lists:

In [70]: datald
Out[70]: ['datalad', 'datalad/cmdline/main.py']

In [71]: others
Out[71]: 
['site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/async_helpers.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/embed.py',
 'site-packages/IPython/terminal/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/async_helpers.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py',
 'site-packages/IPython/core/interactiveshell.py']
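A split like the one shown in the two lists above could be produced along these lines. This is a hedged sketch: the helper name `split_by_package` and the rule "a path belongs to datalad when `datalad` is one of its components" are assumptions, not necessarily what the script in this PR does.

```python
from typing import Iterable, List, Tuple


def split_by_package(
    paths: Iterable[str], package: str = "datalad"
) -> Tuple[List[str], List[str]]:
    """Partition traceback file paths into (inside package, others), keeping order."""
    inside: List[str] = []
    others: List[str] = []
    for path in paths:
        # A path counts as "inside" when the package name is one of its components
        (inside if package in path.split("/") else others).append(path)
    return inside, others
```

Each of the two lists can then be hashed separately to form the more specific identifier.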

Signed-off-by: vsoch <[email protected]>

vsoch · 2020-05-29T17:31:02Z

okay just updated the script here to use the updated (more specific) hash.

vsoch added 3 commits May 28, 2020 17:18

adding github workflow to parse issues

17eac96

Signed-off-by: vsoch <[email protected]>

adding parse issue script

f650c28

Signed-off-by: vsoch <[email protected]>

updating commit

682b87b

Signed-off-by: vsoch <[email protected]>

vsoch mentioned this pull request May 28, 2020

Adding cmdline support submission with helpme datalad/datalad#4586

Closed

vsoch added 2 commits May 29, 2020 11:30

updating script to use more nested identifier

80d04b5

Signed-off-by: vsoch <[email protected]>

updating hash for script download

2862ac9

Signed-off-by: vsoch <[email protected]>

vsoch mentioned this pull request May 29, 2020

Add GitHub action to check for duplicate issues based on identifier #8

Open
