Skip to content
This repository has been archived by the owner on May 5, 2023. It is now read-only.

new revision #15

Merged
merged 31 commits into from
Jul 13, 2021
Merged

new revision #15

merged 31 commits into from
Jul 13, 2021

Conversation

thiswillbeyourgithub
Copy link
Collaborator

@thiswillbeyourgithub thiswillbeyourgithub commented Jul 7, 2021

  • docs: added example files
  • new revision needs new libraries
  • added example script by users
  • new revision readme
  • modified: autocards.py

Sorry for not making more commits and ending up with one giant commit, I changed a lot of things so many times I decided no to commit during brainstorming :/ but please don't hesitate to tell me if I can make it better :) Or if you have any question.

In theory it's all been tested on a few examples but I think it would be nice of you tried it yourself to proofcheck it.

I'm also making a PR for the psionica website right this instant there.

thiswillbeyourgithub added 5 commits July 7, 2021 17:11
feat: added progress bar for paragraphs
feat: consume wikipedia summary from keyword
feat: consume directly from pdb
feat: consume directly from webpage, with several fallback methods and
    ability to change what matters to something else than "p"
feat: consume directly from user inpu
style: renamed a lot of function to make it more clear
fix: instead of using csv library, use pandas to output as csv
feat: output as json
feat: output directly the pandas dataframe
feat: all functions include docstrings
feat: all code is PEP compliant
fix: set an environnement variable for the tokenizer otherwise
    you get a warning written at the top of the output file
fix: prints a message when autocard is loading as it can be long
    on slow devicecs
refactor: self.qg is called differently, it is now wrapped so as to be
    part of a dataframe
feat: txt containing wikipedia style reference like [5] is sanitized
fix: when outputting as csv, all included commas are escaped
feat: consume_web uses several ways to find the title of the webpage
fix: renamed self.clear to self.clear_qa for clarity
feat: added self.string_output() to allow the user to get directly the
    output instead of only being able to print it
feat: self.print and self.pprint allow to print and pretty print the
    output
new: added a commented section that allows to pause the script
    using ctrl+c then resume it later. Very useful when running on very
    large text but could potentially cause issues when uploading it to
    PyPi so I commented it
mentions downloading punkt for first time users
_sanitize_text was missing a docstring
Copy link
Owner

@paulbricman paulbricman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I think this is a great first effort. I listed a few comments for you to look into, but overall the amount of added functionality is impressive. I'll tweak some of the non-Python stuff a bit (comments, README) after we eventually merge a more polished version of the code.

self.qa_pairs = []
global n, cur_n
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working with a class, variables which have to be shared across multiple functions should be fields of that class, through something like self.qa_pair_count = len(self.qa_pairs). Use of global variables is discouraged in Python (and mostly everywhere), that's why you have to explicitly state that you want those to be global.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I decided to do it like that because I was not sure of when the line "n= len(self.qa_pairs)" woud be executed otherwise. For example it has to be reset when clear_qa is run.

Looking back it's actually a way more terrible idea that I thought because I can imagine most users loading autocards already having set a "n" variable somewhere...

Unfortunately I don't feel comfortable correcting this myself as I'm unfamiliar with classes. It's actually my first ever class... Would you be interested in explaining me the best route or doing it yourself? I will surely have to ankify that...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be more clear: I don't feel up to solving when and how to set n and cur_n. I can take care of the issue of organizing qa_pairs differently later on myself

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, so I'll tackle this myself after the PR we decided.

autocards.py Outdated
+ "<br>{{c1::"\
+ self.qa_pairs[-i]['answer']\
+ "}}"
self.qa_pairs[-i] = {**self.qa_pairs[-i],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This processing of self.qa_pairs could help some readability. You can use range(start, end) to start from a certain value and avoid the weird negative index. self.qa_pairs could solve, well, the question/answer pairs, while another variable could be used to store this more complex structure with metadata and all maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I took a look and I can do this only after resolving the issue above IMO

autocards.py Outdated Show resolved Hide resolved
autocards.py Outdated Show resolved Hide resolved
autocards.py Outdated Show resolved Hide resolved
autocards.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@thiswillbeyourgithub thiswillbeyourgithub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I made a mistake, the last commit name is wrong and I don't really know if I can change it. Sorry :/

Copy link
Collaborator Author

@thiswillbeyourgithub thiswillbeyourgithub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies, as you can see I'm still learning git by trial and error and apparently screwed the naming of this commit and merged it with another.

I had to relocate the re.sub that were previously used in the pdf section because otherwise it messes up the "per paragraph" function.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants