Merge floraxue repo 3 (#19)
* merge submitted answer types into one and clean leftovers

* add create method to question serializer to support posting questions

* get rid of the confusing two parse_schema.py files

* fix a programming error in creating topics

* add missing import Question in load_data

* adding backend documentation

* add missing Answer import in load_data, and remove references to any Question or Answer type in code

* removing deprecated post_question endpoint

* remove question type in loading data

* adding missing question field in highlight group serializer
floraxue authored and normangilmore committed Oct 1, 2016
1 parent 3964845 commit 3969830
Showing 8 changed files with 250 additions and 427 deletions.
83 changes: 62 additions & 21 deletions CONTRIBUTING.md
@@ -1,45 +1,86 @@
# How the code in this repo works

Hi! Glad you want to add code to this project. First, a brief overview of what's gone into this repo and some suggestions on how to get started using it.

# Uh, where does the code even start?

### The frontend code

`index.js` adds a root React component containing a `<Router>`, which uses `routes.js` to decide which React component to render as its child. On the first load this is probably `App` from `app.js`. Even if it isn't, loading `app.js` called `configureStore` from `appStore.js`, which sets up Redux. This in turn initializes the reducers, which make the (currently synchronous) API calls in `api.js` to the backend so there's data to display.

### The backend code

First, raw data is loaded into the database by running `load_data.py` (as shown in the `README`). Sample raw data is located in `data/sample/schema/` and `data/sample/article/`. A schema is essentially a set of questions (and child questions) about an article. `load_data.py` calls the parsers in `data/parse_schema.py` and `data/parse_document.py`: `parse_schema()` parses a schema and `parse_document()` parses an article.
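To make the data-loading step more concrete, here is a framework-free sketch of the reshaping that `TopicsSchemaParser` (which this commit moves into `load_data.py`) applies to a parsed schema before creating rows: each topic's `id` becomes its `order`, questions and answers are split out and linked to their parents, and each question's `type` is dropped. Plain dicts stand in for the Django models, `flatten_schema` is a hypothetical helper, and the sample schema is invented; only its key names come from the loader.

```python
import copy

# Hypothetical sample schema. The keys mirror the ones TopicsSchemaParser
# reads in load_data.py ("id", "questions", "question_id", "type",
# "contingency", "answers", "answer_id"); the values are made up.
SAMPLE_SCHEMA = [
    {
        "id": 1,
        "name": "Event",
        "questions": [
            {"question_id": 1, "type": "RADIO", "contingency": False,
             "answers": [{"answer_id": 1}, {"answer_id": 2}]},
            {"question_id": 2, "type": "TEXT", "contingency": False,
             "answers": [{"answer_id": 1}]},
        ],
    },
]


def flatten_schema(schema_json, parent=None):
    """Return (topics, questions, answers) as flat lists of plain dicts.

    Mirrors the per-row transformations in load_topics/load_questions/
    load_answers, but builds dicts instead of creating Django model rows.
    """
    topics, questions, answers = [], [], []
    for topic_args in copy.deepcopy(schema_json):  # leave the input untouched
        topic_questions = topic_args.pop("questions")
        topic_args["order"] = topic_args.pop("id")  # a topic's id becomes its order
        topic_args["parent"] = parent
        topics.append(topic_args)
        for question_args in topic_questions:
            question_args["topic"] = topic_args["order"]
            question_answers = question_args.pop("answers")
            question_args.pop("type")  # questions no longer carry a type
            questions.append(question_args)
            for answer_args in question_answers:
                # (topic order, question_id) stands in for the Question foreign key
                answer_args["question"] = (topic_args["order"],
                                           question_args["question_id"])
                answers.append(answer_args)
    return topics, questions, answers


topics, questions, answers = flatten_schema(SAMPLE_SCHEMA, parent="root")
print(topics[0]["order"], len(questions), len(answers))  # 1 2 3
```

The real loader does the same reshaping but writes each piece straight to the database with `objects.create(**args)`, so the parent objects it links to are model instances rather than tuples.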

The backend exposes a RESTful API that you can browse in a web browser. `thresher/urls.py` defines the endpoints you can access. `thresher/views.py` defines the views that determine how data is presented, and it calls the serializers in `thresher/serializer.py` to output the data stored in the models in an organized way. The models are defined in `thresher/models.py`, which is the most important file to read to understand the backend's data models (and is also useful for understanding the frontend).
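The urls → views → serializers → models flow can be illustrated without Django. The sketch below is a framework-free analogy, not Django REST Framework code: a dict plays the role of `thresher/urls.py`, one function plays a view, another plays a serializer, and a dataclass stands in for a model. All names here (`Article`'s fields, `ARTICLES`, `serialize_article`, `article_list`, `URLS`, `handle`, and the `/api/articles/` path) are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Article:
    # Stand-in for a model in thresher/models.py; these fields are hypothetical.
    id: int
    text: str


# In-memory rows standing in for the database table behind the model.
ARTICLES = [Article(1, "First article"), Article(2, "Second article")]


def serialize_article(article):
    # Serializer role: turn a model instance into JSON-ready primitives.
    return {"id": article.id, "text": article.text}


def article_list(path):
    # View role: query the models and hand them to the serializer.
    # (A real Django view receives a request object; `path` is unused here.)
    return json.dumps([serialize_article(a) for a in ARTICLES])


# URL conf role: map endpoint paths to views.
URLS = {"/api/articles/": article_list}


def handle(path):
    """Dispatch a path the way urls.py routes requests to views."""
    view = URLS.get(path)
    if view is None:
        return 404, json.dumps({"detail": "Not found"})
    return 200, view(path)


status, body = handle("/api/articles/")
print(status, json.loads(body)[0])  # 200 {'id': 1, 'text': 'First article'}
```

Keeping each role in its own function mirrors why the backend splits these concerns across separate files: you can change how data is presented (serializer) without touching routing or storage.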

`thresher_backend/` contains the management files for this Django project. `docker-compose.yml`, `Dockerfile`, `init_docker.sh` and `reset_db.sh` are for running the backend locally with Docker.

# The frontend stack
* Redux
* React
* ES6
* Sass
* Webpack

Formally, this repo contains a Redux app written in ES6 JavaScript and styled with Sass, served by webpack with hot reloading and module support. It relies on a backend with a RESTful API, found [here](https://github.com/Goodly/text-thresher-backend), which is a Django server running PostgreSQL in Docker.

### What's React?

The React docs aren't so great. Think of React as an extension to JavaScript that lets you write markup inline with logic, and treat view code like a funnel that accepts data and produces the correct visual change.

### What's Redux?

Redux is a framework that is exceptionally good for building understandable, manageable UIs, thanks to its single unified state, unidirectional data flow, and state updates made by pure functions. The Redux docs ARE good, and you should read them at least up to the section labeled 'Advanced'. After React, this is the hardest piece to understand, so read up and ask questions.
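Redux's core loop (one state tree, actions flowing in one direction, and a pure reducer that returns new state instead of mutating the old one) is small enough to sketch in a few lines. The Python below is only an analogy to Redux's `createStore` / `dispatch` / `subscribe`, not the Redux API itself; `create_store`, `counter_reducer`, and the `INCREMENT` / `RESET` action types are hypothetical.

```python
def create_store(reducer, initial_state):
    """Tiny Redux-style store: state changes only via dispatch(action)."""
    state = initial_state
    listeners = []

    def get_state():
        return state

    def dispatch(action):
        nonlocal state
        state = reducer(state, action)  # the only place state is replaced
        for listener in list(listeners):
            listener()  # notify subscribers after every update
        return action

    def subscribe(listener):
        listeners.append(listener)
        return lambda: listeners.remove(listener)  # call to unsubscribe

    return get_state, dispatch, subscribe


def counter_reducer(state, action):
    # Pure: returns a new dict rather than mutating `state`.
    if action["type"] == "INCREMENT":
        return {**state, "count": state["count"] + 1}
    if action["type"] == "RESET":
        return {**state, "count": 0}
    return state  # unknown actions leave state unchanged


get_state, dispatch, subscribe = create_store(counter_reducer, {"count": 0})
seen = []
unsubscribe = subscribe(lambda: seen.append(get_state()["count"]))
dispatch({"type": "INCREMENT"})
dispatch({"type": "INCREMENT"})
unsubscribe()
dispatch({"type": "RESET"})
print(get_state(), seen)  # {'count': 0} [1, 2]
```

Because the reducer never mutates its input, every past state stays intact, which is what makes Redux state changes easy to reason about and debug.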

I've written a somewhat helpful gist on React and Redux [here](https://gist.github.com/phorust/b4e61af8600f0b2843675f926a9f8ee0).

### What's ES6?

Pretty dope. ES6 is the next version of the JavaScript language spec, usable today thanks to the lovely developer community behind Babel, which transpiles ES6 to ES5 (the version browsers broadly support). It was officially renamed ES2015, but hardly anyone uses that name. Basically, if you have the feeling there's a better way to do what you're doing, there probably is: arrow functions (lambdas), classes, helper functions, better iterators, and destructuring and spread (packing / unpacking).

### What's Sass?

Syntactically Awesome Style Sheets: one of the leading CSS preprocessors. It compiles (people use this word way too loosely) to plain CSS but makes writing stylesheets not a pain. Variables, calculations, mixins, and nesting (!!) all help CSS scale way better.
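Nesting is the Sass feature that most changes how stylesheets read. As a toy illustration of what compiling nesting down to plain CSS means (not how Sass is actually implemented, and ignoring `&`, variables, and mixins), the sketch below flattens nested rules, written as hypothetical Python dicts, into flat descendant selectors. `flatten_rules` and `to_css` are illustrative names only.

```python
def flatten_rules(rules, parent_selector=""):
    """Yield (selector, declarations) pairs from nested rule dicts.

    Keys whose values are dicts are nested rules; other keys are CSS properties.
    """
    declarations = {k: v for k, v in rules.items() if not isinstance(v, dict)}
    if parent_selector and declarations:
        yield parent_selector, declarations
    for key, value in rules.items():
        if isinstance(value, dict):
            # Descendant combinator: ".nav" + "a" -> ".nav a"
            selector = f"{parent_selector} {key}".strip()
            yield from flatten_rules(value, selector)


def to_css(rules):
    blocks = []
    for selector, decls in flatten_rules(rules):
        body = " ".join(f"{prop}: {val};" for prop, val in decls.items())
        blocks.append(f"{selector} {{ {body} }}")
    return "\n".join(blocks)


# Roughly the SCSS:  .nav { color: black; a { color: blue; } }
nested = {".nav": {"color": "black", "a": {"color": "blue"}}}
print(to_css(nested))
# .nav { color: black; }
# .nav a { color: blue; }
```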

### What's Webpack?

Man, this one is hard. In the beginning there was nothing, and then people said, "wait, JavaScript projects are getting big, we should make a build system for JavaScript." Grunt was born, and compared to previous approaches it was revolutionary: instead of a fat IDE or a build configuration file, you wrote actual code, which executed and let you be imperative rather than declarative about your build process. Then people got tired of writing big Gruntfiles, so they said, "hey, let's use Gulp," and started writing big Gulpfiles instead. Gulp has ... better piping and IO redirection, so your tasks can be more powerful?

Then people got tired of the benefits of writing programmatic build files, went back to writing huge configuration files, and started using webpack. The main benefit of webpack is its extremely powerful hot module reloading and the efficiency with which it detects, packages, and sends over changes and modules. Webpack config files are especially gross and hard to understand, but... these benefits are worth it.

If you can't tell, I personally still use Grunt or Gulp, and I'm tired of spending more time writing efficient, cutting-edge boilerplate than writing applications.

# The backend stack

* Django REST Framework
* PostgreSQL
* Docker

### What's Django REST Framework?

Django REST framework is a powerful and flexible toolkit for building Web APIs.

Some reasons you might want to use REST framework:

1. The Web browsable API is a huge usability win for your developers.
2. Authentication policies including optional packages for OAuth1a and OAuth2.
3. Serialization that supports both ORM and non-ORM data sources.
4. Customizable all the way down - just use regular function-based views if you don't need the more powerful features.
5. Extensive documentation, and great community support.

And here's a brief intro to Django:

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.

### What's PostgreSQL?

PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.

### What's Docker?

Docker is an open-source project that automates the deployment of applications inside software containers.

Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, and system libraries (anything you can install on a server). This lets the software run the same way regardless of the environment it is running in.

### What's ____?

I've tried to enumerate all the interesting and useful parts of the above so that you can Google the pieces easily. Developer support for all of this is great, since it's pretty much cutting edge and widely accepted as the way to go. The only thing we're not doing, which would be great but isn't possible here (Python has too many benefits for research), is isomorphic Redux, which just means the server is also written in JavaScript and runs Redux.
142 changes: 139 additions & 3 deletions load_data.py
@@ -15,19 +15,155 @@

```python
from data.parse_document import parse_document
from data.parse_schema import parse_schema
from thresher.models import (Article, Topic, HighlightGroup,
                             ArticleHighlight, Question, Answer)
ANALYSIS_TYPES = {}
HIGH_ID = 20000

class TopicsSchemaParser(object):
    """
    Parses a json schema of topics and questions and populates the database
    """
    def __init__(self, topic_obj, schema, dependencies):
        """
        topic_obj: The Topic object that is the parent of subtopics in schema
        schema: A json schema as a string or loaded json with subtopics
        dependencies: The list of answers that point to another question
        """
        self.topic_obj = topic_obj
        # if the schema is a string, try to load it as json; otherwise,
        # assume it's already json
        if isinstance(schema, (str, unicode)):
            self.schema_json = json.loads(schema)
        else:
            self.schema_json = schema
        # ensure that topic_obj is a Topic instance
        if not isinstance(topic_obj, Topic):
            raise ValueError("topic_obj must be an instance of Topic model")
        self.dep = dependencies

    def load_answers(self, answers, question):
        """
        Creates the Answer instances for a given question.
        answers: A list of answers
        question: The question that the answers belong to
        """
        for answer_args in answers:
            # link the answer to its question; next_question is filled in
            # later by load_next_question and load_dependencies
            answer_args['question'] = question
            # Create the answer in the database
            Answer.objects.create(**answer_args)

    def load_questions(self, questions, topic):
        """
        Creates the Question instances for the given topic.
        questions: A list of questions
        topic: The topic that the questions belong to
        """
        for question_args in questions:
            # Link the question to its topic
            question_args['topic'] = topic
            # Store the answers for later
            answers = question_args.pop('answers')
            # Questions no longer have a type
            question_args.pop('type')
            # Create the Question
            question = Question.objects.create(**question_args)
            # Load the Question's answers
            self.load_answers(answers, question)

    def load_topics(self):
        """
        Loads all the topics, their questions and their answers.
        """
        for topic_args in self.schema_json:
            # Get the questions to add them later
            questions = topic_args.pop('questions')
            # Change id to order
            topic_args['order'] = topic_args.pop('id')
            # Set reference to parent
            topic_args['parent'] = self.topic_obj
            # Create the topic with the values in topic_args
            topic = Topic.objects.create(**topic_args)
            self.load_questions(questions, topic)
        self.load_next_question()
        self.load_dependencies()

    def load_next_question(self):
        """
        Loads all mandatory next_questions to Answer objects.
        If an answer does not point to another question, that
        signals the end. Also populates each mandatory question
        with a default next question.
        """
        topics = Topic.objects.filter(parent=self.topic_obj)
        for topic in topics:
            # evaluate once so the pairing below doesn't re-query the database
            questions = list(Question.objects.filter(topic=topic,
                                                     contingency=False)
                             .order_by('question_id'))
            for curr_question, next_question in zip(questions, questions[1:]):
                self.write_answers(curr_question, next_question)

    def write_answers(self, curr_question, next_question):
        """
        Helper method for load_next_question.
        Writes the default next question to the current question and its answers.
        curr_question: the question to be modified
        next_question: the question curr_question should point to by default
        """
        curr_question.default_next = next_question
        curr_question.save()
        answers = Answer.objects.filter(question=curr_question)
        for answer in answers:
            answer.next_question = next_question
            answer.save()

    def load_dependencies(self):
        """
        Loads dependencies into targeted answers.
        """
        topics = Topic.objects.filter(parent=self.topic_obj)
        for dep in self.dep:
            # a single Topic instance, not a queryset, for the lookups below
            topic = topics.get(order=dep.topic)
            question = Question.objects.filter(topic=topic,
                                               question_id=dep.question)[0]
            answers = Answer.objects.filter(question=question)
            next_question = Question.objects.filter(
                topic=topic, question_id=dep.next_question)[0]
            next_question_answers = Answer.objects.filter(
                question=next_question)

            # The contingency question continues to wherever the current
            # question would have gone by default
            next_question.default_next = question.default_next
            next_question.save()

            # First we populate the contingency question's answers with the
            # default next question
            for answer in next_question_answers:
                answer.next_question = next_question.default_next
                answer.save()

            # Now point the current question's answers ('*' means all of
            # them) to the contingency question
            if dep.answer != '*':
                answers = answers.filter(answer_id=dep.answer)
            for answer in answers:
                answer.next_question = next_question
                answer.save()


def load_schema(schema):
    schema_name = schema['title']
    schema_parent = schema['parent']
    if schema_parent:
        parent = Topic.objects.get(name=schema_parent)
    else:
        parent = None
    schema_obj = Topic.objects.create(
        parent=parent,
        name=schema_name,
        instructions=schema['instructions'],
        # (remaining lines of this hunk are collapsed in the diff view)
```
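The linking logic in `load_next_question`, `write_answers`, and `load_dependencies` is the least obvious part of the loader, so here is a framework-free sketch of it using plain Python objects instead of Django models. It assumes a single topic, questions keyed by `question_id`, and dependencies given as hypothetical `(question_id, answer_id, next_question_id)` tuples where `answer_id` may be `'*'`; `None` plays the role of an empty `next_question`, which the `load_next_question` docstring says signals the end. `QuestionNode` and `link_questions` are illustrative names, not part of the codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuestionNode:
    # Plain-Python stand-in for a Question row plus its Answer rows.
    question_id: int
    contingency: bool
    answer_ids: List[int]
    default_next: Optional[int] = None  # question_id of the default next question
    answer_next: Dict[int, Optional[int]] = field(default_factory=dict)


def link_questions(questions, dependencies):
    """Fill in default_next and per-answer next questions for one topic.

    dependencies: hypothetical (question_id, answer_id, next_question_id)
    tuples; answer_id may be '*' to redirect every answer.
    """
    by_id = {q.question_id: q for q in questions}
    for q in questions:
        q.answer_next = {a: None for a in q.answer_ids}  # None marks the end

    # Like load_next_question/write_answers: chain mandatory questions in order.
    mandatory = sorted((q for q in questions if not q.contingency),
                       key=lambda q: q.question_id)
    for curr, nxt in zip(mandatory, mandatory[1:]):
        curr.default_next = nxt.question_id
        curr.answer_next = {a: nxt.question_id for a in curr.answer_ids}

    # Like load_dependencies: splice each contingency question into the chain.
    for question_id, answer_id, next_id in dependencies:
        question, contingent = by_id[question_id], by_id[next_id]
        contingent.default_next = question.default_next
        contingent.answer_next = {a: contingent.default_next
                                  for a in contingent.answer_ids}
        for a in question.answer_ids:
            if answer_id == "*" or a == answer_id:
                question.answer_next[a] = next_id
    return by_id


qs = [QuestionNode(1, False, [1, 2]),
      QuestionNode(2, False, [1]),
      QuestionNode(3, True, [1, 2])]  # question 3 is a contingency question
linked = link_questions(qs, [(1, 2, 3)])  # answer 2 of question 1 leads to 3
print(linked[1].answer_next)  # {1: 2, 2: 3}
print(linked[3].default_next, linked[3].answer_next)  # 2 {1: 2, 2: 2}
```

In this reading, a contingency question is a detour: answering it returns you to wherever the triggering question would have gone by default, which is why `load_dependencies` copies `default_next` before redirecting answers.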