july_daily_notes.org

July 11, 2016

tf-idf

summary of ‘Key-value Memory Networks’

An example of calculating cosine similarity using tf-idf
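A minimal sketch of the calculation, using only the standard library. The whitespace tokenizer, the smoothed idf formula, and the sample sentences are assumptions made for illustration; they are not taken from the linked example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw tf * smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # documents sharing several terms
sim_far = cosine(vecs[0], vecs[2])    # documents sharing no terms
```

Documents with no terms in common get a similarity of exactly zero, while a document compared with itself scores 1.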

ElasticSearch

Official Documentation worth your time reading

Inverted Index

difference between ‘Term’, ‘Query String’ and ‘Match Phrase’

Analyzers

Mapping

  • Match value of each field in document to certain data type

Document

  • A key is the name of a field or property, and a value can be a string, a number, and so on.
  • A document also has metadata – information about the document.
  • Documents are indexed – stored and made searchable – by using the index API.

Sorting and Relevance

  • By default, results are returned sorted by relevance – with the most relevant docs first.

Doc Values Intro

  • When searching, we need to be able to map a term to a list of documents.
  • When sorting, we need to map a document to its terms. We need to ‘uninvert’ the inverted index.
  • Doc values are created at index time: when a field is indexed, ElasticSearch adds the tokens to the inverted index for search, and it also extracts the terms and adds them to the doc values.
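The two mappings can be illustrated with a toy in-memory sketch. This is only a conceptual model: the dictionaries, the sample documents, and the whitespace tokenizer are assumptions, and ElasticSearch's real on-disk structures are different.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick dog"}

inverted_index = defaultdict(set)  # term -> document ids (used for search)
doc_values = {}                    # document id -> terms (used for sorting)

# Both structures are filled in the same pass, at index time.
for doc_id, text in docs.items():
    tokens = text.split()
    for token in tokens:
        inverted_index[token].add(doc_id)
    doc_values[doc_id] = sorted(tokens)

# Search: map a term to the documents that contain it.
hits = sorted(inverted_index["quick"])

# Sort: map each hit back to its terms (the "uninverted" direction)
# and order the hits by their alphabetically smallest term.
ranked = sorted(hits, key=lambda doc_id: doc_values[doc_id][0])
```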

Trec-CDS

Workflow I know so far

  1. Use ICONEngine to parse input to get concepts
  2. Use ElasticSearch on wiki pages (leading paragraphs) with concepts to select possible wiki pages
  3. Use ElasticSearch on Pubmed (abstract) with wiki titles to select possible papers.

  • Understand how ICON extracts concepts from input

Possible places to improve

  • Some concepts are meaningless
    • topic keywords (come from ICON or regex?)
  • Use simple elastic query

Comments on the code

  • Run pipeline from ICON
TopicAnalyser.topicCreator
TopicAnalyser.topicKeyConceptsCreator1
  • use ElasticSearch to get wiki pages from key concepts
  • return KeyConcept - a mapping from key concept to a list of diagnoses (aka wiki pages)
TopicAnalyser.topicDiagnosesCreator1
  • concept_score comes from ICON
  • calculate diagnosis score based on concept_score
  • filter diagnosis based on demographic info (WikiSearcher.filterByDemographic)
  • there is a mapping from disorder to gender
  • generate keywords for treatment
  • generate keywords for tes

UIMA

  • An Analysis Engine is a program that analyzes artifacts (e.g. documents) and infers information from them.

  • An annotator is a component that contains analysis logic.
  • Annotators produce their analysis results in the form of typed Feature Structures, which are simple data structures that have a type and a set of (attribute, value) pairs.

  • All feature structures, including annotations, are represented in the UIMA Common Analysis Structure (CAS)

Key-Value Memory Networks

Select certain sections from wiki pages

Limit number of sentences from medical notes

Store wiki pages on sentence level

use tf-idf to extract key words

July 12, 2016

Key Value Memory Network

Fight against numpy array error

  • a list of list issue
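import numpy as np
# You want to feed a (2,2) array to tensorflow
# [1,2,3] and [1,2] don't have the same number of elements
l = [[[1,2,3]], [[1,2]]]
# numpy (as of 2016) doesn't raise an error here; it silently builds an object array
# (recent numpy versions raise a ValueError instead)
a = np.array(l)
a.shape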
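One way to avoid the ragged list is to pad every row to a common length before building the array. The sketch below is pure Python; the helper name `pad_rows` and the pad value of 0 are assumptions chosen for illustration.

```python
def pad_rows(rows, pad_value=0):
    """Right-pad every row to the length of the longest row."""
    width = max(len(row) for row in rows)
    return [list(row) + [pad_value] * (width - len(row)) for row in rows]

ragged = [[1, 2, 3], [1, 2]]
padded = pad_rows(ragged)  # every row now has the same length
```

Passing `padded` to `np.array` then yields a regular numeric array instead of an object array.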

How to read wiki content from json (pad each sentence to the same length / select certain sections)

Only select certain sections from wiki pages

embed links in knowledge graph to connections between keys and values

use xavier initializer in tensorflow

W = tf.get_variable("W", shape=[d1, d2],
           initializer=tf.contrib.layers.xavier_initializer())
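For reference, the uniform variant of Xavier (Glorot) initialization draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The pure-Python sketch below shows that rule only; the function name `xavier_uniform` and the nested-list representation are assumptions, not TensorFlow's implementation.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out matrix sampled from U(-limit, limit)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]

W_sketch = xavier_uniform(4, 3, seed=0)
```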

Git tips

git pull from master into the development branch

the difference between ‘git pull’ and ‘git fetch’

Run docker behind proxy

After installing Docker, start the daemon with the following command:

sudo HTTP_PROXY=http://<PROXY_DETAILS>/ docker -d &

This works on CentOS 6.8.

July 13, 2016

Successfully run tensorflow on CentOS 6

Update glibc

Update gcc

Virtualenv in Python

Key Value Memory Networks

remove gradient noise when training

use single gru

July 15, 2016

Emacs plugin for Eclipse

nested search in ElasticSearch

Helpful video to get started with ElasticSearch

A set of helpful videos to get started with Latex

Combining Queries and Bool Query in ElasticSearch

July 17, 2016

Wide & Deep Learning

  • The idea behind this paper is to combine wide linear models with cross-product feature transformations and deep neural networks with dense embeddings

ConfigProto for session in Tensorflow

  • inter_op_parallelism_threads
  • intra_op_parallelism_threads

Protocol Buffer

July 19, 2016

A good blog about git workflow

High-quality Latex tutorial

Using Emacs Series (Video Tutorial)

Video tutorial on OrgMode

Git Data Transport Commands

git.png

git stash – useful command to save your uncommitted changes

Sparsemax

  • simplex
  • $\rho(\mathbf{0}) = \mathbf{1}/K$
  • $\rho(\mathbf{z}) = \rho(\mathbf{z} + c\mathbf{1})$
    • $\rho$ is invariant to adding a constant to each coordinate
  • logistic loss
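The two properties above can be checked numerically. The sketch below assumes ρ is the sparsemax map (the Euclidean projection onto the probability simplex) and uses the usual sort-and-threshold procedure; the function name and test vectors are chosen for illustration.

```python
def sparsemax(z):
    """Project z onto the probability simplex (sort-and-threshold method)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k, cumsum_k = 0, 0.0
    for j, z_j in enumerate(z_sorted, start=1):
        cumsum += z_j
        # Keep the largest j whose j-th sorted coordinate stays in the support.
        if 1.0 + j * z_j > cumsum:
            k, cumsum_k = j, cumsum
    tau = (cumsum_k - 1.0) / k  # threshold subtracted from every coordinate
    return [max(z_i - tau, 0.0) for z_i in z]

uniform = sparsemax([0.0, 0.0, 0.0, 0.0])  # rho(0) = 1/K with K = 4
p = sparsemax([0.5, 1.0, 0.2])
p_shifted = sparsemax([0.5 + 3.0, 1.0 + 3.0, 0.2 + 3.0])  # same input plus c*1
```

Unlike softmax, the output can contain exact zeros: the smallest coordinate of `p` is clipped to 0.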

bold math symbol in LaTeX

\mathbf{<characters>}

July 24, 2016

use batch training to speed up training process

  • this reduces the variance in the parameter update and can lead to more stable convergence
  • this allows the computation to take advantage of highly optimized matrix operations that should be used in a well vectorized computation of the cost and gradient

  • One important point regarding SGD is the order in which we present the data to the algorithm
  • If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence. Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.
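A minimal sketch of mini-batching with a fresh shuffle before every epoch. The generator name `minibatches`, the fixed seed, and the toy data are assumptions for illustration; a real training loop would compute the gradient on each yielded batch.

```python
import random

def minibatches(data, batch_size, epochs, seed=0):
    """Yield (epoch, batch) pairs, reshuffling the order before each epoch."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    for epoch in range(epochs):
        rng.shuffle(indices)  # new presentation order every epoch
        for start in range(0, len(indices), batch_size):
            batch = [data[i] for i in indices[start:start + batch_size]]
            yield epoch, batch

batches = list(minibatches(list(range(10)), batch_size=4, epochs=2))
```

Each epoch still visits every example exactly once; only the order changes.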

chain rule of gradient leads to vanishing gradient issue

  • Slides about gradient and chain rule – https://math.dartmouth.edu/archive/m9f07/public_html/m9lect1119.pdf
  • easy-to-understand video tutorial on vanishing gradient – https://youtu.be/SKMpmAOUa2Q

Greek letters in LaTeX

Whitening transformation

July 25, 2016

High accuracy hyperparameters

  • python train.py --batch_size=50 --embedding_dim=300 --feature_size=300 --hops=4 --vocab_size=10000

July 27, 2016

Dockerfile

learn how to use dockerfile

Docker image

Get index of the top n values of a list in python
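One possible approach uses `heapq.nlargest` over the index range, keyed by the list values. The helper name and sample scores are made up for illustration.

```python
import heapq

def top_n_indices(values, n):
    """Return the indices of the n largest values, largest first."""
    return heapq.nlargest(n, range(len(values)), key=values.__getitem__)

scores = [0.1, 0.7, 0.3, 0.9, 0.5]
top = top_n_indices(scores, 2)
```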

Tensorflow - How to restore a previously saved model

upload my sample code to restore a saved model

Solarized

Terminal - command output redirect to file and terminal

/command/ |& tee /filename/

Python - extract file extension
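A short sketch using `os.path.splitext` from the standard library. The wrapper name `file_extension` and the sample paths are assumptions for illustration.

```python
import os.path

def file_extension(path):
    """Return the last extension of path, including the leading dot."""
    _root, ext = os.path.splitext(path)
    return ext

ext_org = file_extension("notes/july_daily_notes.org")
ext_tar = file_extension("archive.tar.gz")  # only the final suffix is returned
ext_dotfile = file_extension(".bashrc")     # a leading dot is not an extension
```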

Java - An example of sending post request

July 29, 2016

Java - sort a Map<Key, Value> by values

Java - iterate through a HashMap

Tensorflow - how to uninstall it

set up proxy for apt-get

Pandas - shuffle data frame

August 9, 2016

Short-term To-do

  • TODO Train a model with deep reinforcement learning
  • TODO Generative Adversarial model