new revision #15

thiswillbeyourgithub · 2021-07-07T15:35:57Z

docs: added example files
new revision needs new libraries
added example script by users
new revision readme
modified: autocards.py

Sorry for not making more commits and ending up with one giant commit. I changed so many things so many times that I decided not to commit during brainstorming :/ Please don't hesitate to tell me if I can make it better :) or if you have any questions.

In theory it has all been tested on a few examples, but it would be nice if you tried it yourself to double-check it.

I'm also opening a PR for the psionica website right now.

feat: added progress bar for paragraphs
feat: consume wikipedia summary from keyword
feat: consume directly from pdf
feat: consume directly from webpage, with several fallback methods and the ability to change which element is extracted to something other than "p"
feat: consume directly from user input
style: renamed a lot of functions to make them clearer
fix: instead of using the csv library, use pandas to output as csv
feat: output as json
feat: output the pandas dataframe directly
feat: all functions include docstrings
feat: all code is PEP compliant
fix: set an environment variable for the tokenizer, otherwise a warning is written at the top of the output file
fix: prints a message when autocards is loading, as it can take a long time on slow devices
refactor: self.qg is called differently; it is now wrapped so as to be part of a dataframe
feat: text containing wikipedia-style references like [5] is sanitized
fix: when outputting as csv, all included commas are escaped
feat: consume_web uses several ways to find the title of the webpage
fix: renamed self.clear to self.clear_qa for clarity
feat: added self.string_output() to let the user get the output directly instead of only being able to print it
feat: self.print and self.pprint allow printing and pretty-printing the output
new: added a commented section that allows pausing the script using ctrl+c and resuming it later. Very useful when running on very large text, but it could potentially cause issues when uploading to PyPi, so I commented it out
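
For reviewers who want a feel for the reference sanitizing mentioned above, here is a minimal sketch of one way it could work. It is not the code from this PR: the helper name `strip_reference_markers` and the exact regular expressions are assumptions made for illustration, and the real `_sanitize_text` may behave differently.

```python
import re

# Hypothetical pattern: a bracketed number such as [5] or [12].
_REFERENCE_MARKER = re.compile(r"\[\d+\]")


def strip_reference_markers(text: str) -> str:
    """Remove markers such as [5] and collapse the spaces left behind.

    Illustrative helper only, not the PR's _sanitize_text.
    """
    without_markers = _REFERENCE_MARKER.sub("", text)
    # Removing a marker surrounded by spaces leaves a double space behind.
    return re.sub(r"[ \t]{2,}", " ", without_markers).strip()


sample = "Anki is a flashcard program.[5] It uses spaced repetition [12] heavily."
print(strip_reference_markers(sample))
```

Only purely numeric markers are matched here, so bracketed text such as "[citation needed]" is left alone; whether that is desirable depends on the input.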

mentions downloading punkt for first time users

_sanitize_text was missing a docstring

paulbricman

Good job! I think this is a great first effort. I listed a few comments for you to look into, but overall the amount of added functionality is impressive. I'll tweak some of the non-Python stuff a bit (comments, README) after we eventually merge a more polished version of the code.

paulbricman · 2021-07-08T12:43:32Z

autocards.py

        self.qa_pairs = []
+        global n, cur_n


When working with a class, variables which have to be shared across multiple functions should be fields of that class, through something like self.qa_pair_count = len(self.qa_pairs). Use of global variables is discouraged in Python (and mostly everywhere), that's why you have to explicitly state that you want those to be global.

Yeah, I decided to do it like that because I was not sure when the line "n = len(self.qa_pairs)" would be executed otherwise. For example, it has to be reset when clear_qa is run.

Looking back, it's actually a far worse idea than I thought, because I can imagine most users who load autocards have already set an "n" variable somewhere...

Unfortunately I don't feel comfortable correcting this myself as I'm unfamiliar with classes. It's actually my first ever class... Would you be interested in explaining the best route to me, or in doing it yourself? I will surely have to ankify that...

To be clearer: I don't feel up to solving when and how to set n and cur_n. I can take care of organizing qa_pairs differently later on myself.

No worries, I'll tackle this myself after the PR, as we decided.
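
For anyone reading this thread later, a minimal sketch of the suggestion follows. The class is a stripped-down, hypothetical stand-in rather than the real `Autocards` class; `add_pair` and `count_before_last_batch` are made-up names, and the meaning given to `cur_n` is a guess. The point is that a property computes the count whenever it is read, so the question of when `n = len(self.qa_pairs)` runs goes away.

```python
class Autocards:
    """Hypothetical stand-in for the real class, for illustration only."""

    def __init__(self):
        self.qa_pairs = []
        # Assumed replacement for the global cur_n; its real role may differ.
        self.count_before_last_batch = 0

    @property
    def qa_pair_count(self):
        # Computed on every access, so it can never be stale.
        return len(self.qa_pairs)

    def add_pair(self, question, answer):
        # Hypothetical method standing in for the real consume_* methods.
        self.count_before_last_batch = self.qa_pair_count
        self.qa_pairs.append({"question": question, "answer": answer})

    def clear_qa(self):
        # Resetting the list is enough: qa_pair_count follows automatically.
        self.qa_pairs = []
        self.count_before_last_batch = 0


cards = Autocards()
cards.add_pair("What is Anki?", "A flashcard program")
print(cards.qa_pair_count)
cards.clear_qa()
print(cards.qa_pair_count)
```

Because the state lives on the instance, it cannot collide with an `n` defined in the user's own script, and two instances keep separate counts.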

paulbricman · 2021-07-08T12:53:58Z

autocards.py

+                + "<br>{{c1::"\
+                + self.qa_pairs[-i]['answer']\
+                + "}}"
+            self.qa_pairs[-i] = {**self.qa_pairs[-i],


This processing of self.qa_pairs could use some readability improvements. You can use range(start, end) to start from a certain value and avoid the weird negative index. self.qa_pairs could store, well, the question/answer pairs, while another variable could be used to store this more complex structure with metadata and all, maybe?

Okay so I took a look and I can do this only after resolving the issue above IMO
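
A minimal sketch of the `range(start, end)` idea, under stated assumptions: `add_cloze_field`, the `new_count` parameter, and the `"cloze"` key are hypothetical names, and the cloze string is only guessed from the visible part of the diff (the full expression is not shown above).

```python
def add_cloze_field(qa_pairs, new_count):
    """Add a cloze string to the last new_count entries of qa_pairs.

    Hypothetical helper for illustration; not the code from the PR.
    """
    # Index of the first newly added pair, clamped so it never goes negative.
    start = max(0, len(qa_pairs) - new_count)
    for i in range(start, len(qa_pairs)):
        pair = qa_pairs[i]
        cloze = pair["question"] + "<br>{{c1::" + pair["answer"] + "}}"
        # Build a new dict rather than mutating the existing one in place.
        qa_pairs[i] = {**pair, "cloze": cloze}
    return qa_pairs


pairs = [
    {"question": "Q1", "answer": "A1"},
    {"question": "Q2", "answer": "A2"},
]
add_cloze_field(pairs, new_count=1)
print(pairs[1]["cloze"])
```

Iterating forward from `start` visits the new pairs in the order they were added, which is easier to follow than counting back from the end with `-i`.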

autocards.py

thiswillbeyourgithub

It seems I made a mistake, the last commit name is wrong and I don't really know if I can change it. Sorry :/

fix:

thiswillbeyourgithub

My apologies, as you can see I'm still learning git by trial and error, and apparently I screwed up the naming of this commit and merged it with another.

I had to relocate the re.sub calls that were previously used in the pdf section because otherwise they mess up the "per paragraph" function.

thiswillbeyourgithub added 5 commits July 7, 2021 17:11

docs: added example files

c107544

new revision needs new libraries

2eaa499

added example script by users

96af48d

new revision readme

fd4f5f0

thiswillbeyourgithub mentioned this pull request Jul 7, 2021

updated according to new revision of autocards Psionica/psionica.github.io#11

Open

thiswillbeyourgithub added 2 commits July 7, 2021 15:56

Update README.md

db01b7e

mentions downloading punkt for first time users

Update autocards.py

03ca81c

_sanitize_text was missing a docstring

paulbricman suggested changes Jul 8, 2021

thiswillbeyourgithub added 5 commits July 9, 2021 17:03

better _sanitize_text function

1204b35

style: methods called after pandas

4cc55c5

remove function defined but used once

0142744

removed commented section launching pdb

d1a6abf

better docstring for sanitize text

9d5d8c3

thiswillbeyourgithub commented Jul 9, 2021

thiswillbeyourgithub added 5 commits July 9, 2021 19:06

removed unecessary import from the commented area

4355add

style: added better default titles

f18762c

PEP8: line was too long

3e57726

modified: autocards.py

e139367

fix:

minor: wrong fstring and extra newline at EOF

823e4ff

thiswillbeyourgithub commented Jul 9, 2021

thiswillbeyourgithub added 9 commits July 9, 2021 19:22

fix: better title for text file

c1d2b25

docs: csv_export is now to_csv, same for json

ea70298

docs: clearer text

6fcb6db

missing ebooklib + remove extra newline

213c121

docs: minor phrasing

90c69ae

remove useless notebook

de023d2

style: renamed qa_pairs to qa_dict

dea3d44

phrasing

6e94f6b

more robust epub extraction

bb82eb9

thiswillbeyourgithub added 5 commits July 12, 2021 16:27

adds basic cloze functionnality and notetype

088bfe6

adds basic cloze functionnality and notetype

0f40dd3

added docstring for main class

92eca6a

feat: added a store_content and watermark flag

3576c7f

minor style

978e9ff

paulbricman approved these changes Jul 13, 2021

paulbricman merged commit 879b2ff into paulbricman:master Jul 13, 2021

This was referenced Jul 13, 2021

Add the option to consume text paragraph after paragraph #6

Closed

Add PDF handling #7

Closed

Add web-scraping capabilities #8

Closed
