This repository has been archived by the owner on May 5, 2023. It is now read-only.

Add the option to consume text paragraph after paragraph #6

Closed
uuuser-name opened this issue Mar 13, 2021 · 3 comments

Comments

@uuuser-name

When a large body of text is consumed, the program sometimes looks for the answer to a question in the wrong part of the text.
Consuming the text one paragraph at a time might solve this,
i.e. a for loop that consumes one paragraph, then moves on to the next.

Note that this problem doesn't happen all the time, so consuming by paragraph may break things in other cases, which is why it should be optional.
This is more of an open-ended idea than an issue.
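
The idea above could be sketched roughly as follows. This is only an illustration: `split_paragraphs` and `consume_by_paragraph` are hypothetical helpers, not part of Autocards, and the `consume` callback stands in for whatever function ingests text (for instance `consume_text`, as used in the script later in this thread). It assumes paragraphs are separated by one or more blank lines.

```python
import re
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty chunks."""
    chunks = re.split(r"\n\s*\n", text)
    return [c.strip() for c in chunks if c.strip()]


def consume_by_paragraph(text: str, consume: Callable[[str], None]) -> int:
    """Feed each paragraph to `consume` in order; return how many were fed."""
    paragraphs = split_paragraphs(text)
    for paragraph in paragraphs:
        consume(paragraph)
    return len(paragraphs)
```

Passing the consumer as a callback keeps the splitting logic independent of the library, so the per-paragraph mode can stay optional.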

@thiswillbeyourgithub
Collaborator

FYI I wrote a simple script that reads a page (copy-pasted from Napoleon's English Wikipedia page) sentence by sentence. This seems to fix a buffer overflow problem I had, and it also allows skipping a sentence that just doesn't work.

Here's the code:

#!/usr/bin/env python3


from autocards import Autocards
from pathlib import Path

auto = Autocards()
auto.clear()

prefix = "On Napoléon : "
file = Path("./napoleon.txt")

if not file.exists():
    print("File not found!")
    raise SystemExit()
else:
    full_text = file.read_text()

# Split on periods, restore the final period, and drop empty fragments.
sentence_list = [f"{s.strip()}." for s in full_text.split(".") if s.strip()]

output_file = Path(f"{file.parent}/output_file.txt")
output_file.touch()

print("Initialization complete.")

n = len(sentence_list)
for i, s in enumerate(sentence_list, start=1):
    print(s)
    print(f"Progress : {i}/{n} ({round(i / n * 100, 1)}%)")
    try:
        auto.consume_text(s)
        pair = auto.qa_pairs[-1]  # most recent question/answer pair
        string = f'"{prefix}{pair["question"]}","{pair["answer"]}"\n'
    except IndexError:
        print(f"Skipped sentence {s}")
        string = f'"Skipped sentence : ","{s}"\n'
    # Append one line per sentence, whether it was consumed or skipped.
    with open(output_file, "a") as of:
        of.write(string)

auto.print(prefix)
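
One caveat with building the output lines by hand is that a question or answer containing a double quote or a comma would produce a malformed row. A possible alternative, sketched below, is to let Python's standard `csv` module handle the quoting. `write_pairs` is a hypothetical helper, not part of Autocards; it only assumes that each pair is a dict with `question` and `answer` keys, as in the script above.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def write_pairs(pairs: Iterable[Dict[str, str]], out: TextIO, prefix: str = "") -> None:
    """Write one fully quoted CSV row per question/answer pair."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for pair in pairs:
        writer.writerow([prefix + pair["question"], pair["answer"]])


if __name__ == "__main__":
    # Small demo with made-up data; embedded quotes are doubled by csv.
    buffer = io.StringIO()
    write_pairs([{"question": 'Who said "no"?', "answer": "Napoléon"}], buffer)
    print(buffer.getvalue(), end="")
```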

@thiswillbeyourgithub
Collaborator

Update: I was actually mistaken, so I ended up writing a script that reads paragraph by paragraph. I will open a PR in the coming weeks.

@paulbricman
Owner

@thiswillbeyourgithub addressed this in #15, but the repo still needs some polishing.
