Merge floraxue repo 3 (#19)

* merge submitted answer types into one and clean leftovers
* add create method to question serializer to support posting questions
* get rid of the confusing two parse_schema.py files
* fix a programming error in creating topics
* add missing import Question in load_data
* adding backend documentation
* add missing Answer import in load_data, and remove references to any Question or Answer type in code
* removing deprecated post_question endpoint
* remove question type in loading data
* adding missing question field in highlight group serializer
Goodly · Oct 1, 2016 · 3969830 · 3969830
1 parent 3964845
commit 3969830

Showing 8 changed files with 250 additions and 427 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,45 +1,86 @@
-How the code in this repo works
-===
+#How the code in this repo works
+
 Hi! Glad you want to add code to this project. First, a brief overview of what's gone into this repo and some suggestions on how to get started using it.
 
-The frontend stack
----
+Formally, this repo contains a Redux app written with ES6 javascript and styled using Sass, served with hotloading and module support from webpack. It relies on a backend with a RESTful API found [here](https://github.com/Goodly/text-thresher-backend) which is a Django server running PostgreSQL in Docker.
+
+#Uh, where does the code even start?
+
+###The frontend code:
+
+`index.js` adds a root React component, which contains a `<Router>` that uses `routes.js` to decide which React component to render as its child. On the first load this is probably `App` from `app.js`; even if it isn't, loading `app.js` calls `configureStore` from `appStore.js`, which sets up Redux. This in turn initializes the reducers, which perform the proper (synchronous for now) API calls from `api.js` to the back end so there's data to display.
+
+###The backend code:
+
+First of all, raw data is loaded into the database by executing `load_data.py` (as described in the `README`). Sample raw data is located in `data/sample/schema/` and `data/sample/article/`. A schema is basically a set of questions (and child questions) related to an article. `load_data.py` calls the data parsers in `data/parse_schema.py` and `data/parse_document.py`. The two parser functions in `data/` (i.e. `parse_schema()` and `parse_document()`) parse specific articles and the schemas related to them.
+
+The backend features a RESTful API that you can browse in a web browser. `thresher/urls.py` defines the endpoints you can access. `thresher/views.py` provides functions and models that define how you view the data in the browser. It calls the serializers in `thresher/serializer.py` to output the data stored in the models in an organized way. The models are defined in `thresher/models.py`, which is the most important file to read to understand the data models used in the backend (and is also useful for understanding the frontend).
+
+`thresher_backend/` contains the management files for this Django project. `docker-compose.yml`, `Dockerfile`, `init_docker.sh` and `reset_db.sh` are for running the backend locally with Docker.
+
+#The frontend stack
 * Redux
 * React
 * ES6
 * Sass
 * Webpack
 
-Formally, this repo contains a Redux app written with ES6 javascript and styled using Sass, served with hotloading and module support from webpack. It relies on a backend with a RESTful API found [here](https://github.com/Goodly/text-thresher-backend) which is a Django server running PostgreSQL in Docker.
-
-Uh, where does the code even start?
----
-`index.js` adds a root React component, which contains a `<Router>` which uses `routes.js` to decide what React component to render as its child. On the first load this is probably `App` from `app.js`, but even if it isn't when `app.js` was loaded it called `configureStore` from `appStore.js` which set up Redux. This in turn initializes the reducers, which perform the proper (synchronous for now) API calls from `api.js` to the back end so there's data to display.
+###What's React?
 
-What's React?
----
 The React docs aren't so great - think of React as an extension to JavaScript which allows you to write markup inline with logic, and treat view code like funnels which accept data and produce the correct visual change.
 
-What's Redux?
----
+###What's Redux?
+
 Redux is a framework exceptionally good for building understandable and manageable UIs, because of its unified state, unidirectional data flow, and pure functional mutations of state. The Redux docs ARE good, and you should read them until at least like the section labeled 'Advanced'. This one is the hardest to understand, after React. Read up and ask questions.
 
 I've written a somewhat helpful gist on React and Redux [here](https://gist.github.com/phorust/b4e61af8600f0b2843675f926a9f8ee0).
 
-What's ES6?
----
+###What's ES6?
+
 Pretty dope. It's the next language spec of JavaScript, made available now by the lovely developer community making Babel, which transpiles ES6 to ES5 (the current js spec). It got renamed to ES2015 but no one uses that name. Basically now, if you have the feeling there's a better way to do what you're doing, there probably is: lambdas, classes, helper functions, better iterators and packing / unpacking.
 
-What's Sass?
---
+###What's Sass?
+
 Syntactically Awesome StyleSheets - one of the leading preprocessors of CSS, it compiles? (people use this word way too loosely) to plain CSS but makes writing stylesheets not a pain. Variables, calculations, mixins, and nesting (!!) all help CSS scale way better.
 
-What's Webpack?
----
+###What's Webpack?
+
 Man this one is hard. In the beginning there was nothing, and then people said wait javascript projects are getting big we should make a build system for javascript. Grunt was born, and compared to previous approaches it was revolutionary - instead of a fat IDE, instead of a build configuration file, you wrote actual code, which would execute and allowed you to be prescriptive rather than descriptive (I think there are two better words to use but I forget) about your build process. Then, people got tired of writing big gruntfiles, so they said hey let's use Gulp and started writing big gulpfiles instead. Gulp has ... better piping and IO redirection, so your tasks can be more powerful? Then, people got tired of the benefits of writing programmatic build files instead of configuration files and went back to writing huge configuration files and started using webpack. The main benefit of webpack is its extremely powerful hot module reloading and the efficiency with which it detects, packages, and sends over changes and modules. Webpack files are especially gross and hard to understand but... these benefits are worth it.
 
 If you can't tell, I still personally use grunt or gulp and I'm tired of spending more time writing efficient and cutting edge boilerplate than writing applications.
 
-What's ____?
----
+#The backend stack
+
+* Django REST Framework
+* PostgreSQL
+* Docker
+
+###What's Django REST Framework? 
+
+Django REST framework is a powerful and flexible toolkit for building Web APIs.
+
+Some reasons you might want to use REST framework:
+
+1. The Web browsable API is a huge usability win for your developers.
+2. Authentication policies including optional packages for OAuth1a and OAuth2.
+3. Serialization that supports both ORM and non-ORM data sources.
+4. Customizable all the way down - just use regular function-based views if you don't need the more powerful features.
+5. Extensive documentation, and great community support.
+
+And here's a brief intro to Django:
+
+Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
+
+###What's PostgreSQL?
+
+PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.
+
+###What's Docker?
+
+Docker is an open-source project that automates the deployment of applications inside software containers.
+
+Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.
+
+###What's ____?
+
 I've tried to enumerate all the interesting and useful parts of all the above, so that you can Google the pieces easily. Developer support for this stuff is all great since it's pretty much cutting edge and widely accepted as the way to go. The only thing we're not doing which would be great but not possible (Python has too many benefits for research) is isomorphic Redux, which just means the server also is in javascript and runs redux.
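
CONTRIBUTING.md describes a schema as questions (and child questions), and `TopicsSchemaParser` in `load_data.py` chains them through `default_next` and `next_question`. The sketch below mimics that chaining on plain dictionaries, without Django, to show the intended result. It is an illustration under assumptions, not the repository's code: the keys `id`, `questions`, `answers` and `type` come from the parser, `question_id`, `contingency` and `answer_id` are model fields it filters on, and every other name and value (including the helper functions) is hypothetical.

```python
# Hypothetical, simplified stand-in for one parsed topic of a schema.
# Field values are invented for illustration only.
topic = {
    "id": 1,
    "questions": [
        {"question_id": 1, "type": "radio", "contingency": False,
         "answers": [{"answer_id": 1}, {"answer_id": 2}]},
        {"question_id": 2, "type": "radio", "contingency": True,
         "answers": [{"answer_id": 1}]},
        {"question_id": 3, "type": "text", "contingency": False,
         "answers": [{"answer_id": 1}]},
    ],
}


def link_default_next(questions):
    """Mimic load_next_question/write_answers: point each non-contingency
    question, and all of its answers, at the next non-contingency question."""
    mandatory = sorted(
        (q for q in questions if not q["contingency"]),
        key=lambda q: q["question_id"],
    )
    for curr, nxt in zip(mandatory, mandatory[1:]):
        curr["default_next"] = nxt["question_id"]
        for answer in curr["answers"]:
            answer["next_question"] = nxt["question_id"]


def apply_dependency(questions, question_id, answer_id, next_question_id):
    """Mimic load_dependencies for one dependency: route the chosen answer
    (or every answer when answer_id is '*') to the contingency question,
    which then rejoins the parent's default path."""
    by_id = {q["question_id"]: q for q in questions}
    question = by_id[question_id]
    contingent = by_id[next_question_id]
    # The contingency question inherits the parent's default next question.
    contingent["default_next"] = question.get("default_next")
    for answer in contingent["answers"]:
        answer["next_question"] = contingent["default_next"]
    # Redirect the triggering answer(s) to the contingency question.
    for answer in question["answers"]:
        if answer_id == "*" or answer["answer_id"] == answer_id:
            answer["next_question"] = next_question_id


questions = topic["questions"]
link_default_next(questions)
# Assumed dependency: answer 2 of question 1 leads to question 2.
apply_dependency(questions, question_id=1, answer_id=2, next_question_id=2)
```

After these two calls, question 1 defaults to question 3, its second answer detours through question 2, and question 2 then continues to question 3. The last mandatory question has no `default_next`, which signals the end of the topic.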
diff --git a/load_data.py b/load_data.py
@@ -15,19 +15,155 @@
 
 from data.parse_document import parse_document
 from data.parse_schema import parse_schema
-from parse_schema import TopicsSchemaParser
-from thresher.models import Article, Topic, HighlightGroup, ArticleHighlight
+from thresher.models import (Article, Topic, HighlightGroup, 
+                             ArticleHighlight, Question, Answer)
 ANALYSIS_TYPES = {}
 HIGH_ID = 20000
 
+class TopicsSchemaParser(object):
+    """
+    Parses a json schema of topics and questions and populates the database
+    """
+    def __init__(self, topic_obj, schema, dependencies):
+        """
+        topic_obj: The Topic object that is the parent of subtopics in schema
+        schema: A json schema as a string or loaded json with subtopics
+        dependencies: The list of answers that point to another question
+        """
+        self.topic_obj = topic_obj
+        # if the schema is a string, tries to load it as json, otherwise,
+        # assumes it's already json
+        if isinstance(schema, (str, unicode)):
+            self.schema_json = json.loads(schema)
+        else:
+            self.schema_json = schema
+        # ensure that topic_obj is a Topic instance
+        if not isinstance(topic_obj, Topic):
+            raise ValueError("topic_obj must be an instance of Topic model")
+        self.dep = dependencies
+
+    def load_answers(self, answers, question):
+        """
+        Creates the answers instances for a given question.
+        answers: A list of answers 
+        question: The question that answers belongs to
+        """
+        for answer_args in answers:
+            # attach the parent question; next_question is filled in later
+            # by load_next_question
+            answer_args['question'] = question
+            # Create the answer in the database
+            Answer.objects.create(**answer_args)
+
+    def load_questions(self, questions, topic):
+        """
+        Creates the questions instances for the given topic.
+        questions: A list of questions
+        topic: The topic that questions belongs to
+        """
+        for question_args in questions:
+            # Set reference to the parent topic
+            question_args['topic'] = topic
+            # Store the answers for later
+            answers = question_args.pop('answers')
+            # No type for Questions any more; tolerate schemas without it
+            question_args.pop('type', None)
+            # Create the Question
+            question = Question.objects.create(**question_args)
+            # Load the Question's answers
+            self.load_answers(answers, question)
+
+    def load_topics(self):
+        """
+        Loads all the topics, their questions and their answers.
+        """
+        for topic_args in self.schema_json:
+            # Get the questions to add them later
+            questions = topic_args.pop('questions')
+            # Change id to order
+            topic_args['order'] = topic_args.pop('id')
+            # Set reference to parent
+            topic_args['parent'] = self.topic_obj
+            # Create the topic with the values in topic_args
+            topic = Topic.objects.create(**topic_args)
+            self.load_questions(questions, topic)
+        self.load_next_question()
+        self.load_dependencies()
+
+    def load_next_question(self):
+        """
+        Loads all mandatory next_questions to Answer objects. 
+        If an answer does not point to another question, that 
+        signals the end. Also populates each mandatory question 
+        with a default next question.
+        """
+        topics = Topic.objects.filter(parent=self.topic_obj)
+        for topic in topics:
+            questions = Question.objects.filter(topic=topic, 
+                                                contingency=False) \
+                                        .order_by('question_id')
+            for i in range(len(questions) - 1):
+                self.write_answers(questions[i], questions[i + 1])
+
+    def write_answers(self, curr_question, next_question):
+        """
+        Helper method for load_next_question.
+        Writes the default next question to the current question and its answers.
+        curr_question: the curr_question to be modified
+        next_question: the next_question curr_question should point to by
+                       default
+        """
+        curr_question.default_next = next_question
+        curr_question.save()
+        answers = Answer.objects.filter(question=curr_question)
+        for answer in answers:
+            answer.next_question = next_question
+            answer.save()
+
+    def load_dependencies(self):
+        """
+        Loads dependencies into targeted answers.
+        """
+        topics = Topic.objects.filter(parent=self.topic_obj)
+        for dep in self.dep:
+            topic = topics.filter(order=dep.topic)
+            question = Question.objects.filter(topic=topic, 
+                                               question_id=dep.question)[0]
+            answers = Answer.objects.filter(
+                question=question)
+            next_question = Question.objects.filter(
+                topic=topic, question_id=dep.next_question)[0]
+            next_question_answers = Answer.objects.filter(
+                question=next_question)
+
+            next_question.default_next = question.default_next
+            next_question.save()
+
+            # First we populate the contingency question's answers with the
+            # default next question
+            for answer in next_question_answers:
+                answer.next_question = next_question.default_next
+                answer.save()
+
+            # Now we point the current question's answer to the next question;
+            # '*' means every answer of the question
+            if dep.answer != '*':
+                answers = answers.filter(answer_id=dep.answer)
+            for answer in answers:
+                answer.next_question = next_question
+                answer.save()
+
+
 def load_schema(schema):
     schema_name = schema['title']
     schema_parent = schema['parent']
     if schema_parent:
         parent = Topic.objects.get(name=schema_parent)
     else:
         parent = None
-    schema_obj = Topic(
+    schema_obj = Topic.objects.create(
         parent=parent,
         name=schema_name,
         instructions=schema['instructions'],