Skip to content

Code and real data for "Enhancing Human Learning via Spaced Repetition Optimization", PNAS 2019

License

Notifications You must be signed in to change notification settings

Networks-Learning/memorize

Folders and files

NameName
Last commit message
Last commit date

Latest commit

a5fe694 · Jan 10, 2023

History

20 Commits
Jan 11, 2019
Nov 11, 2019
Jan 10, 2023
Jan 11, 2019
Jan 8, 2019
Feb 13, 2019
Jan 8, 2019
Jan 11, 2019
Jan 11, 2019
Jan 10, 2019
Jan 11, 2019
Jan 11, 2019

Repository files navigation

Memorize

This is a repository containing code and data for the paper:

B. Tabibian, U. Upadhyay, A. De, A. Zarezade, Bernhard Schölkopf, and M. Gomez-Rodriguez. Enhancing Human Learning via Spaced Repetition Optimization. Proceedings of the National Academy of Sciences (PNAS), March, 2019.

The paper is available from PNAS website and the supporting website also gives a description of our algorithm in a nutshell.

As a follow-up of this work, we tested a variant of the algorithm presented here (named Select) in the wild by means of a Randomized Trial and found that it performed significantly better than competitive baselines. We present those findings in the following paper:

U. Upadhyay, G. Lancashire, C. Moser and M. Gomez-Rodriguez. Large-scale randomized experiment reveals machine learning helps people learn and remember more effectively., npj Science of Learning, 6, Article number: 26 (2021).

Pre-requisites

This code depends on the following packages:

  1. numpy
  2. pandas
  3. matplotlib
  4. seaborn
  5. scipy
  6. dill
  7. click

Apart from this, the instructions assume that the Duolingo dataset has been downloaded, extracted, and saved at ./data/raw/duolingo.csv.

Code structure

  • memorize.py contains the memorize algorithm.
  • preprocesed_weights.csv contains estimated model parameters for the HLR model, as described in section 8 of supplementary materials.
  • observations_1k.csv contains a set of 1K user-item pairs and associated number of total/correct attempts by every user for given items. This dataset has been curated from a larger dataset released by Duolingo, available here.

Execution

The code can by executed as follows:

python memorize.py

The code will use default parameter value (q) used in the code.


Experiments with Duolingo data

Pre-processing

Convert to Python dict by user_id, lexeme_id and pruning it for reading it:

python dataset2dict.py ./data/raw/duolingo.csv ./data/duo_dict.dill --success_prob 0.99 --max_days 30 
python process_raw_data.py ./data/raw/duolingo.csv ./data/duolingo_reduced.csv

Plots

See the notebook plots.ipynb.