- https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html
- When we index a document, its full-text fields are analyzed into terms that are used to create the inverted index.
- The value of each field in a document is mapped to a certain data type
- A key is the name of a field or property; a value can be a string, a number, etc.
- A document also has metadata – information about the document.
- Documents are indexed – stored and made searchable – by using the index API.
- By default, results are returned sorted by relevance – with the most relevant docs first.
- When searching, we need to be able to map a term to a list of documents.
- When sorting, we need to map a document to its terms. We need to ‘uninvert’ the inverted index.
- Doc values are created at index time: when a field is indexed, ElasticSearch adds the tokens to the inverted index for search, but it also extracts the terms and adds them to the doc values (see the sketch below).
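A minimal illustration of the two structures as plain Python dicts, not ElasticSearch's actual on-disk format:

# Inverted index: term -> documents containing it (supports search).
docs = {1: "quick brown fox", 2: "quick dogs"}
inverted = {}
for doc_id, text in docs.items():
    for term in text.split():
        inverted.setdefault(term, []).append(doc_id)
# {'quick': [1, 2], 'brown': [1], 'fox': [1], 'dogs': [2]}

# Doc values: document -> its terms (the "uninverted" view; supports sorting/aggregations).
doc_values = {doc_id: text.split() for doc_id, text in docs.items()}
# {1: ['quick', 'brown', 'fox'], 2: ['quick', 'dogs']}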
- Use ICONEngine to parse input to get concepts
- Use ElasticSearch on wiki pages (leading paragraphs) with concepts to select possible wiki pages
- Use ElasticSearch on PubMed (abstracts) with wiki titles to select possible papers.
- Understand how ICON extracts concepts from input
- Some concepts are meaningless
- topic keywords (come from ICON or regex?)
- Use a simple elastic query (sketched below)
- Run pipeline from ICON
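A sketch of the "simple elastic query" step, assuming the elasticsearch-py client; the index name "wiki" and the fields "leading_paragraph" and "title" are made-up stand-ins for the real schema:

from elasticsearch import Elasticsearch

es = Elasticsearch()
concepts = ["chest pain", "dyspnea"]   # concepts extracted by ICON (example values)
resp = es.search(index="wiki", body={
    "query": {"match": {"leading_paragraph": " ".join(concepts)}},
    "size": 10,
})
candidate_pages = [hit["_source"]["title"] for hit in resp["hits"]["hits"]]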
- TopicAnalyser.topicCreator
- TopicAnalyser.topicKeyConceptsCreator1
- use ElasticSearch to get wiki pages from key concepts
- returns KeyConcept, a mapping from key concept to a list of diagnoses (aka wiki pages)
- TopicAnalyser.topicDiagnosesCreator1
- concept_score comes from ICON
- calculate diagnosis score based on concept_score
- filter diagnoses based on demographic info (WikiSearcher.filterByDemographic)
- there is a mapping from disorder to gender (a sketch of scoring and filtering follows this list)
- generate keywords for treatment
- generate keywords for tests
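A sketch of the scoring and filtering steps; the data shapes and the sum aggregation are assumptions, and the real filter lives in WikiSearcher.filterByDemographic:

# Hypothetical shapes for the scoring/filtering steps.
concept_score = {"chest pain": 0.9, "dyspnea": 0.6}          # from ICON
diagnoses = {"angina": ["chest pain", "dyspnea"],            # diagnosis -> supporting concepts
             "endometriosis": ["chest pain"]}
disorder_gender = {"endometriosis": "female"}                # disorder -> gender mapping

# Assumed aggregation: a diagnosis scores the sum of its supporting concepts' scores.
diag_score = {d: sum(concept_score.get(c, 0.0) for c in cs)
              for d, cs in diagnoses.items()}

def filter_by_demographic(scores, patient_gender):
    # Drop diagnoses whose gender restriction conflicts with the patient.
    return {d: s for d, s in scores.items()
            if disorder_gender.get(d, patient_gender) == patient_gender}

filter_by_demographic(diag_score, "male")   # {'angina': 1.5}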
- An Analysis Engine is a program that analyzes artifacts (e.g. documents) and infers information
from them.
- An annotator is a component that contains analysis logic.
- Annotators produce their analysis results in the form of typed Feature Structures, which are simple data structures that have a type and a set of (attribute, value) pairs.
- All feature structures, including annotations, are represented in the UIMA Common Analysis Structure (CAS).
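UIMA itself is Java; purely to illustrate what a typed feature structure holds (a type plus (attribute, value) pairs), a Python sketch that mimics the idea rather than the actual CAS API:

# Illustration only: a type name plus (attribute, value) pairs; not the UIMA API.
feature_structure = {
    "type": "org.example.DiagnosisAnnotation",   # hypothetical type
    "features": {"begin": 12, "end": 23, "label": "angina"},
}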
- a list-of-lists issue
import numpy as np
# You want to feed a regular tensor to TensorFlow,
# but [1,2,3] and [1,2] don't have the same number of elements.
l = [[[1,2,3]], [[1,2]]]
# Older NumPy raises no error here: it silently builds an object array
# (newer NumPy raises ValueError for ragged input without dtype=object).
a = np.array(l)
a.shape  # (2, 1) with dtype=object, not the numeric shape you expected
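One way out, continuing the snippet above (an assumed fix, zero-padding): pad the inner lists to a common length so np.array produces a regular numeric tensor:

# Pad the ragged inner lists to the same length.
max_len = max(len(row[0]) for row in l)
padded = [[row[0] + [0] * (max_len - len(row[0]))] for row in l]
a = np.array(padded)
a.shape  # (2, 1, 3): a regular integer array TensorFlow will accept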
import tensorflow as tf
# Xavier (Glorot) initialization, TF 1.x contrib style; d1 and d2 are the layer's dimensions.
W = tf.get_variable("W", shape=[d1, d2],
                    initializer=tf.contrib.layers.xavier_initializer())
After installing Docker, start the daemon with the following command:
sudo HTTP_PROXY=http://<PROXY_DETAILS>/ docker -d &
This works on CentOS 6.8 (docker -d is the old daemon invocation, later replaced by dockerd).
- The idea behind this paper is to combine wide linear models
with cross-product feature transformations and deep neural networks with dense embeddings
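TensorFlow 1.x shipped a canned estimator for exactly this wide-and-deep architecture; a minimal sketch with made-up feature names and sizes:

import tensorflow as tf

gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["male", "female"])
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    "occupation", hash_bucket_size=1000)

# Wide side: cross-product feature transformations for a linear model.
wide_columns = [tf.feature_column.crossed_column(
    [gender, occupation], hash_bucket_size=10000)]
# Deep side: dense embeddings fed to a DNN.
deep_columns = [tf.feature_column.embedding_column(occupation, dimension=8),
                tf.feature_column.indicator_column(gender)]

model = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])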
- inter_op_parallelism_threads
- intra_op_parallelism_threads
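These are TF 1.x session knobs: inter_op controls how many independent ops may run concurrently, intra_op how many threads a single op (e.g. a large matmul) may use. For example:

import tensorflow as tf

config = tf.ConfigProto(
    inter_op_parallelism_threads=2,   # concurrent independent ops
    intra_op_parallelism_threads=4)   # threads inside one op
sess = tf.Session(config=config)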
- simplex
- $\rho(\mathbf{0}) = \mathbf{1}/K$
- $\rho(\mathbf{z}) = \rho(\mathbf{z} + c\mathbf{1})$: $\rho$ is invariant to adding a constant to each coordinate
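These two properties are exactly those of the softmax; assuming $\rho$ denotes softmax here, a quick numerical check:

import numpy as np

def softmax(z):
    # Subtracting max(z) exploits the shift invariance for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax(z), softmax(z + 5.0)))   # True: rho(z) == rho(z + c*1)
print(softmax(np.zeros(3)))                        # [1/3 1/3 1/3], i.e. 1/K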
- logistic loss: for a label $y \in \{0, 1\}$ and predicted probability $\hat{p}$, the standard form is $-[y \log \hat{p} + (1 - y) \log(1 - \hat{p})]$
- minibatching reduces the variance in the parameter update and can lead to more stable convergence
- it also lets the computation take advantage of the highly optimized matrix operations used in a well-vectorized implementation of the cost and gradient
- One important point regarding SGD is the order in which we present the data to the algorithm
- If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence.
Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.
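A minimal numpy sketch of minibatch SGD with a fresh shuffle each epoch; the gradient function and data are placeholders:

import numpy as np

def sgd(grad, w, X, y, lr=0.01, batch_size=50, epochs=10):
    n = len(X)
    for _ in range(epochs):
        perm = np.random.permutation(n)        # reshuffle each epoch to avoid order bias
        for i in range(0, n, batch_size):
            idx = perm[i:i + batch_size]
            w -= lr * grad(w, X[idx], y[idx])  # minibatch gradient step
    return w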
- Slides about gradient and chain rule – https://math.dartmouth.edu/archive/m9f07/public_html/m9lect1119.pdf
- easy-to-understand video tutorial on vanishing gradient: https://youtu.be/SKMpmAOUa2Q
- https://en.wikipedia.org/wiki/Whitening_transformation
- http://cs231n.github.io/neural-networks-3/
- http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/
- python train.py --batch_size=50 --embedding_dim=300 --feature_size=300 --hops=4 --vocab_size=10000
- http://stackoverflow.com/a/13070505/5361448
- use a lambda function as the key for sorted()
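A generic illustration of the idiom (not the exact snippet from the linked answer):

# Sort (name, score) pairs by score descending, then name ascending.
pairs = [("bob", 7), ("alice", 9), ("carol", 7)]
ranked = sorted(pairs, key=lambda p: (-p[1], p[0]))
# [('alice', 9), ('bob', 7), ('carol', 7)]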
- http://ethanschoonover.com/solarized
- Precision colors for machines and people
command |& tee filename   # |& pipes both stdout and stderr into tee (bash 4+)
- TODO Train a model with deep reinforcement learning
- TODO Generative Adversarial model