July 11, 2016

tf-idf

http://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting

summary of ‘Key-value Memory Networks’

https://gist.github.com/shagunsodhani/a5e0baa075b4a917c0a69edc575772a8

An example of calculating cosine similarity using tf-idf

http://stackoverflow.com/a/18914884/5361448

ElasticSearch

Official Documentation worth your time reading

https://www.elastic.co/guide/en/elasticsearch/guide/current/getting-started.html

Inverted Index

https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html

difference between ‘Term’, ‘Query String’ and ‘Match Phrase’

http://stackoverflow.com/questions/26001002/elastic-search-difference-between-term-match-phrase-and-query-string

Analyzers

https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html
When we index a document, its full-text fields are analyzed into terms that are used to create the inverted index.

Mapping

Match value of each field in document to certain data type

Document

A key is the name of a field or property, and a value can be a string, a number ....
A document also has metadata – information about the document.
Documents are indexed – stored and made searchable – by using the index API.

Sorting and Relevance

By default, results are returned sorted by relevance – with the most relevant docs first.

Doc Values Intro

When searching, we need to be able to map a term to a list of documents.
When sorting, we need to map a document to its terms. We need to ‘uninvert’ the inverted index.
Doc vaules are created at index-time: when a field is indexed, ElasticSearch adds the tokens to the

inverted index for search. But it also extracts the terms and adds them to the doc values.

Trec-CDS

Workflow I know so far

Use ICONEngine to parse input to get concepts
Use ElasticSearch on wiki pages (leading paragraphs) with concepts to select possible wiki pages
Use ElasticSearch on Pubmed (abstract) with wiki titles to select possible papers.

Understand how ICON extract concepts from input

Possible places to improve

Some concepts are meaningless
- topic keywords (come from ICON or regex?)
Use simple elastic query

Comments on the code

Run pipeline from ICON

TopicAnalyser.topicCreator

TopicAnalyser.topicKeyConceptsCreator1

use ElasticSearch to get wiki pages from key concepts
return KeyConcept - a mapping from key concept to a list of diagnoses (aka wiki pages)

TopicANalyser.topicDiagnosesCreator1

concept_score comes from ICON
calculate diagnosis score based on concept_score
filter diagnosis based on demographic info (WikiSearcher.filterByDemographic)
there is a mapping from disorder to gender

generate keywords for treatment
generate keywords for tes

UIMA

An Analysis Engine is a program that analyzes artifacts (e.g. documents) and infers information

from them.

An annotator is a component that contains analysis logic.
Annotators produce their analysis results in the form of typed Feature Structures, which has

simple data structures that have a type and a set of (attribute, value) pairs.

All feature structures, including annotations, are represented in the UIMA Common Analysis Structure (CAS)

Key-Value Memory Networks

Select certain sections from wiki pages

Limit number of sentences from medical notes

Store wiki pages on sentence level

use tf-idf to extract key words

July 12, 2016

Key Value Memory Network

Fight against numpy array error

a list of list issue

import numpy as np
# You want to feed (2,2) to tensorflow
# [1,2,3] [1,2] don't have the number of element
l = [[[1,2,3]], [[1,2]]]
# we don't get error from numpy
a = np.array(l)
a.shape

How to read wiki content from json (pad each sentence to the same length / select certain sections)

Only select certain sections from wiki pages

embed links in knowledge graph to connections between keys and values

use xavier initializer in tensorflow

W = tf.get_variable("W", shape=[d1, d2],
           initializer=tf.contrib.layers.xavier_initializer())

Git tips

git pull from master into the development branch

http://stackoverflow.com/a/20103414/5361448

the difference between ‘git pull’ and ‘git fetch’

http://stackoverflow.com/a/292359/5361448

Run docker behind proxy

After install docker, start it with following command

sudo HTTP_PROXY=http://<PROXY_DETAILS>/ docker -d &

This works on CentOS 6.8.

July 13, 2016

Successfully run tensorflow on CentOS 6

Update glibc

Upldate gcc

Virtualenv in Python

Key Value Memory Networks

remove gradient noise when training

use single gru

July 15, 2016

Emacs plugin for Eclipse

https://marketplace.eclipse.org/content/emacs

nested search in ElasticSearch

https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-query.html

Helpful video to get started with ElasticSearch

https://youtu.be/60UsHHsKyN4

A set of helpful videos to get started with Latex

https://www.youtube.com/playlist?list=PL1D4EAB31D3EBC449

Combining Queries and Bool Query in ElasticSearch

https://www.elastic.co/guide/en/elasticsearch/guide/current/bool-query.html

July 17, 2016

Wide & Deep Learning

The idea behind this paper is to combine wide linear models

with cross-product feature transformations and deep neural networks with dense embeddings

ConfigProto for session in Tensorflow

inter_op_parallelism_threads
intra_op_parallelism_threads

Protocol Buffer

https://developers.google.com/protocol-buffers/docs/pythontutorial

July 19, 2016

A good blog about git workflow

https://sandofsky.com/blog/git-workflow.html

High-quality Latex tutorial

https://www.sharelatex.com/blog/latex-guides/beginners-tutorial.html

Using Emacs Serise (Video Tutorial)

http://cestlaz.github.io/stories/emacs/

Video tutorial on OrgMode

https://www.youtube.com/playlist?list=PLVtKhBrRV_ZkPnBtt_TD1Cs9PJlU0IIdE

Git Data Transport Commands

git stash – useful command to save your unstaged changes

https://git-scm.com/book/en/v1/Git-Tools-Stashing

Sparsemax

simplex
$ρ(\mathdd{0}) = \mathdd{1}/K$
$ρ(\mathdd{z}) = ρ(\mathdd{z} + c\mathdd{1})$
- $ρ is invariant to adding a constant to each coordinate$
logistic loss

bold math symbol in ”Latex

\mathbf{<characters>}

July 24, 2016

use batch training to speed up training process

this reduces the variance in the parameter update and can lead to more stable convergence
this allows the computation to take advantage of highly optimized matrix operations that

should be used in a well vectorized computation of the cost and gradient

One important point regarding SGD is the order in which we present the data to the algorithm
If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence.

Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.

chain rule of gradient leads to vanishing gradient issue

－ Slides about gradient and chain rule – https://math.dartmouth.edu/archive/m9f07/public_html/m9lect1119.pdf

easy-to-understand video tutorial on vanishing gradient

– https://youtu.be/SKMpmAOUa2Q

Geek letters in ”LaTex

https://www.latex-tutorial.com/symbols/greek-alphabet/

Whitening transformation

https://en.wikipedia.org/wiki/Whitening_transformation
http://cs231n.github.io/neural-networks-3/
http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

July 25, 2016

High accuracy hyperparameters

python train.py –batch_size=50 –embedding_dim=300 –feature_size=300 –hops=4 –vocab_size=10000

July 27, 2016

Dockerfile

https://docs.docker.com/engine/reference/builder/

learn how to use dockerfile

Docker image

https://docs.docker.com/v1.8/userguide/dockerimages/

Get index of the top n values of a list in python

http://stackoverflow.com/a/13070505/5361448
use lambda function and sorted()

Tensorflow - How to restore a previously saved model

http://stackoverflow.com/a/33762168/5361448

upload my sample code to restore a saved model

Solarized

http://ethanschoonover.com/solarized
Precision colors for machines and people

Terminal - command output redirect to file and terminal

/command/ |& tee /filename/

Python - extract file extension

http://stackoverflow.com/a/541394/5361448

Java - An example of sending post request

https://gist.github.com/siyuanzhao/268116573b81cb8542db456a7c28eb37

July 29, 2016

Java - sort a Map<Key, Value> by values

http://stackoverflow.com/a/2581754/5361448

Java - iterate through a HashMap

http://stackoverflow.com/a/1066603/5361448

Tensorflow - how to uninstall it

http://askubuntu.com/a/764087

set up proxy for apt-get

http://askubuntu.com/a/711906

Pandas - shuffle data frame

http://stackoverflow.com/a/29576803/5361448

August 9, 2016

Short-term To-do

TODO Train a model with deep reinforcement learning
TODO Generative Adversial model

Files

july_daily_notes.org

Latest commit

History