\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[document]{ragged2e}
\usepackage{algpseudocode}
\usepackage[]{algorithmicx}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage[]{listings}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{flafter}
\usepackage{subfig}
\usepackage{dsfont}
\graphicspath{ {images/} }
\begin{document}
\begin{titlepage}
\centering
\includegraphics[width=0.15\textwidth]{IIIT-B_logo.jpg}\par\vspace{1cm}
{\scshape\LARGE International Institute of Information Technology, Bangalore \par}
\vspace{1cm}
{\scshape\Large Business Understanding Document\par}
{\Large DS 707 Data Analytics\par}
\vspace{1.5cm}
{\huge\bfseries Blockchain understanding and Cryptocurrency Analysis\par}
\vspace{2cm}
{\Large\itshape Akanksha Dwivedi - MT2016006\par}
{\Large\itshape Hitesha Mukherjee - MS2016007\par}
{\Large\itshape Nayna Jain - MS2017003\par}
{\Large\itshape Tarini Chandrashekhar - MT2016144\par}
\vfill
Instructors : \par
Prof. Ramanathan Chandrashekhar
\par
Prof. Uttam Kumar
\vfill
% Bottom of the page
{\large \today\par}
\end{titlepage}
\newpage
\tableofcontents
\newpage
\justify
\section{Determining Business Objectives}
\subsection{Background}
\textbf{Cryptocurrency} (built on blockchains) is a new technology that seemed to emerge out of nowhere. At its most fundamental level it is a breakthrough in computer science, one that builds on 20 years of research into cryptographic currency and 40 years of research in cryptography by thousands of researchers around the world. It gives one Internet user a way to transfer a unique piece of digital property to another Internet user such that the transfer is guaranteed to be safe and secure, everyone knows that the transfer has taken place, and nobody can challenge the legitimacy of the transfer.
\newline
Blockchains (and the consensus protocols that support them) were invented by developers trying to solve the bold problem of creating digital, untraceable money. By combining cryptography, game theory, economics, and computer science, they managed to create an entirely new set of tools for building decentralized systems.
\subsection{Business Goals}
The business objectives of this project are:
\begin{itemize}
\item To understand and describe the recent surge of interest in cryptocurrencies.
\item To explore the volatile nature of cryptocurrencies and the correlation between their price fluctuations.
\item To predict the future prices of cryptocurrencies.
\item To identify and understand the factors contributing to the overall behaviour of cryptocurrencies, so that prediction becomes easier.
\item To identify fake or dangerous users, thereby helping to prevent fraud.
\item To lend cryptocurrency more legitimacy, and thereby encourage greater adoption, by performing in-depth analysis and pattern recognition across thousands of transactions so that users are better protected.
\end{itemize}
\subsection{Business Success Criteria}
The success of our analytics endeavour depends on the value it adds in the form of new information that potential cryptocurrency adopters can use in their investment decisions.
\begin{itemize}
\item Correct prediction of future prices of blockchain tokens.
\item Highlight fraudulent use and/or theft of cryptocurrency.
\item Identification of factors leading to fluctuations in the currency evaluation of tokens.
\item Identification of the more favourable currency based on the predicted trends and volatility.
\end{itemize}
\subsection{Business Benefits}
\begin{itemize}
\item \textbf{Exploratory and Descriptive Analytics:} By analyzing the historical prices of different cryptocurrencies, we can identify and project their trends, which will help potential investors make informed decisions.
\item \textbf{Classification:} Identifying fraudulent transactions will help distinguish legitimate from illegitimate transactions, guarding against a dishonest network and against use of the network for running scams.
\item \textbf{Clustering:} Anomalies in the network of users and/or transactions can be highlighted using various clustering methodologies.
\item \textbf{Association rules:} Associations may be established between various factors and the prices of the cryptocurrencies. The data can be analysed for frequent if-then relationships, using the criteria of \textbf{support} and \textbf{confidence} to identify the most important ones. This can eventually help to predict blockchain behaviour.
\end{itemize}
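As a brief illustration of the support and confidence criteria, the standard definitions for a rule $X \Rightarrow Y$ over $N$ observations can be written as follows. The symbols $X$, $Y$ and $N$, and the numbers in the example, are introduced here purely for illustration and are not taken from the dataset.
\begin{align*}
\mathrm{supp}(X \Rightarrow Y) &= \frac{\text{number of observations containing both } X \text{ and } Y}{N}, \\
\mathrm{conf}(X \Rightarrow Y) &= \frac{\mathrm{supp}(X \Rightarrow Y)}{\mathrm{supp}(X)},
\end{align*}
where $\mathrm{supp}(X)$ is the fraction of observations containing $X$. As a hypothetical example, suppose that out of $N = 100$ trading days, 20 show a rise in volume ($X$) and 15 show both a rise in volume and a rise in price ($X$ and $Y$). The rule then has support $15/100 = 0.15$ and confidence $0.15/0.20 = 0.75$.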
\subsection{Target Users}
Our analytics on cryptocurrencies will benefit not only the miners, who are actively engaged in the network, but also many other stakeholders who have been interested in using cryptocurrencies since their advent but have held back for lack of predictive information on their trends and patterns.
\begin{itemize}
\item The miners who validate transactions could benefit from the future prediction of a token price to know whether it's worth validating or not.
\item Individual investors participating in a token sale would know beforehand the criteria by which to evaluate it.
\item Consumers who pay with cryptocurrency could choose the currency that the analysis shows to be more stable.
\item Merchants who accept cryptocurrency will learn about fraudulent transactions, which will help them avoid and report the users responsible.
\item Entrepreneurs who are building new applications on top of blockchain technology can choose the more versatile blockchain technology based on the tokens used on top of it.
\end{itemize}
\section{Assessing the Situation}
\subsection{Inventory of Resources}
The dataset contains historical price information for some of the top cryptocurrencies by market capitalization. The currencies included are Bitcoin,
Ethereum, Ripple, Bitcoin Cash, Bitconnect, Dash, Ethereum Classic, Iota, Litecoin, Monero, Nem, Neo, Numeraire, Stratis and Waves.
Each currency has one CSV file with the following attributes extracted from \textbf{coinmarketcap}. Price history is available on a daily basis from April 28, 2013. The columns in each CSV file are:
\begin{itemize}
\item Date : Date of observation
\item Open : Opening price on the given day
\item High : Highest price on the given day
\item Low : Lowest price on the given day
\item Close : Closing price on the given day
\item Volume : Volume of transactions on the given day
\item Market Cap : Market capitalization in USD
\end{itemize}
In addition, 25 hand-crafted features are available for Bitcoin and Ethereum.
\subsection{Requirements, Assumptions and Constraints}
\begin{itemize}
\item Correlations calculated between non-stationary time series are often spurious and do not represent any actual relationship between the data sets. We therefore transform the data to make it stationary before computing correlations.
\item We can ignore the cryptocurrencies that have comparatively few data points because they are relatively young.
\item Our data covers more tokens than blockchain technologies, so at best we can gauge more information about the tokens than about the blockchains.
\item We don't have individual block information in terms of network addresses, so performing anomaly detection and tagging fraudulent users/transactions will be a challenge.
\end{itemize}
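One common way to obtain an approximately stationary series from daily closing prices is to work with log returns rather than raw prices. This is a sketch of one possible transformation, not necessarily the one that will finally be adopted; $P_t$ denotes the closing price on day $t$ (notation introduced here).
\[
r_t = \ln P_t - \ln P_{t-1} = \ln\frac{P_t}{P_{t-1}}
\]
For small price changes, $r_t$ is approximately the relative change $(P_t - P_{t-1})/P_{t-1}$. Correlations would then be computed between the return series of two currencies rather than between their price series.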
\subsection{Risks and Contingencies}
Even though we conduct analytics to identify trends and predict prices, there are risks and contingencies that our analysis cannot fully anticipate:
\begin{itemize}
\item A transaction or user that we tag as fraudulent may turn out to be a false positive, caused by an error we cannot explain.
\item The prices we predict might be subject to sudden unforeseen fluctuations due to a new world event, something our analysis might not account for.
\end{itemize}
\subsection{Terminology}
\textbf{What is a 'Blockchain'?}
A blockchain is a digitized, decentralized, public ledger of all cryptocurrency transactions. Constantly growing as ‘completed’ blocks (the most recent transactions) are recorded and added to it in chronological order, it allows market participants to keep track of digital currency transactions without central record keeping. Each node (a computer connected to the network) gets a copy of the blockchain, which is downloaded automatically.
Originally developed as the accounting method for the virtual currency Bitcoin, blockchains use what is known as \textbf{distributed ledger technology (DLT)}. Currently, the technology is primarily used to verify transactions within digital currencies, though it is possible to digitize, code and insert practically any document into the blockchain. Doing so creates an indelible record that cannot be changed; furthermore, the record's authenticity can be verified by the entire community using the blockchain instead of a single centralized authority.
Blockchain technology and the associated cryptocurrencies have gained popularity due to the following advantages:
\begin{itemize}
\item Efficiencies resulting from DLT can add up to some serious cost savings. DLT systems make it possible for businesses and banks to streamline internal operations, dramatically reducing the expense, mistakes, and delays caused by traditional methods for reconciliation of records. Electronic ledgers are much cheaper to maintain than traditional accounting systems; the employee headcount in back offices can be greatly reduced. Nearly fully automated DLT systems result in far fewer errors and the elimination of repetitive confirmation steps. Minimizing the processing delay also means less capital being held against the risks of pending transactions.
\item Cryptocurrency uses a “push” mechanism that allows the cryptocurrency holder to send exactly what he or she wants to the merchant or recipient with no further information, thus preventing identity theft.
\item Bitcoin contracts can be designed and enforced to eliminate or add third party approvals, reference external facts, or be completed at a future date or time for a fraction of the expense and time required to complete traditional asset transfers, resulting in immediate settlement.
\end{itemize}
\section{Determining Data Mining Goals}
\subsection{Data Mining Goals}
\begin{itemize}
\item How did the historical prices / market capitalization of various currencies change over time?
\item Predicting the future price of the currencies.
\item Which currencies are more volatile and which ones are more stable?
\item How do the price fluctuations of the currencies correlate with each other?
\item Identifying seasonal trends in the price fluctuations.
\item Visualizing market trends by constructing cross-correlation maps between different cryptocurrencies.
\item Statistical analysis of the fundamental factors affecting the network, to draw basic inferences from the Bitcoin blockchain.
\end{itemize}
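The volatility and correlation questions above can be made concrete as follows. This is a sketch under assumed notation: $r_{i,t}$ is the daily return of currency $i$ on day $t$ (for instance the log return of its closing price), $T$ is the number of days observed, and $\bar{r}_i$ is the mean return of currency $i$.
\begin{align*}
\sigma_i &= \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(r_{i,t}-\bar{r}_i\right)^2}, \\
\rho_{ij} &= \frac{\sum_{t=1}^{T}\left(r_{i,t}-\bar{r}_i\right)\left(r_{j,t}-\bar{r}_j\right)}{\sqrt{\sum_{t=1}^{T}\left(r_{i,t}-\bar{r}_i\right)^2}\,\sqrt{\sum_{t=1}^{T}\left(r_{j,t}-\bar{r}_j\right)^2}}.
\end{align*}
A larger $\sigma_i$ indicates a more volatile currency. The coefficients $\rho_{ij}$, each lying in $[-1, 1]$, form the matrix that a correlation map at zero lag would display.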
\subsection{Data Mining Success Criteria}
The success of the data mining process will be judged by:
\begin{itemize}
\item The accuracy of different regression models (measured by, say, specificity and precision) that predict the sign of the future change in price of cryptocurrencies at varying levels of granularity (e.g.\ 10-minute or 10-second intervals). This evaluation metric measures the success rate of our exploratory and predictive analytics on the historical data of cryptocurrencies.
\item The $R^2$ score (coefficient of determination) of the predictive models estimating the change in prices or the future prices of the cryptocurrencies.
\item Visualizations of the descriptive analytics performed on various cryptocurrencies, comparing individual currencies with the overall market.
\item A dual evaluation metric (based on a combination of outliers in a user-based graph and a transaction-based graph), which can be used to evaluate our identification of suspicious transactions and dubious users in the network.
\end{itemize}
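For reference, the metrics named above have the following standard definitions. Here $TP$, $FP$, $TN$ and $FN$ count true positives, false positives, true negatives and false negatives for the predicted sign of the price change (treating a price increase as the positive class is an assumption made here), while $y_t$, $\hat{y}_t$ and $\bar{y}$ denote the observed value, the predicted value and the mean of the observed values.
\begin{align*}
\text{precision} &= \frac{TP}{TP+FP}, & \text{specificity} &= \frac{TN}{TN+FP}, \\
R^2 &= 1 - \frac{\sum_{t}\left(y_t-\hat{y}_t\right)^2}{\sum_{t}\left(y_t-\bar{y}\right)^2}.
\end{align*}
An $R^2$ close to 1 means the model explains most of the variance in the observed prices, while a value near 0 means it does no better than always predicting the mean.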
\section{Producing a Project Plan}
\subsection{Project Plan}
Data mining can be defined as the extraction of implicit, previously unknown and potentially useful information from data. Machine learning provides the technical basis for data mining. In this project, we attempt to apply machine learning algorithms to predict the Bitcoin price. Our data set consists of many features relating to Bitcoin and the payment network, recorded over a period of time. Using this information we aim to predict the sign of the daily price change. The following steps can be taken during the course of the project:
\begin{itemize}
\item First, the data will be cleaned and pre-processed, and missing values will be interpolated.
\item Regression and predictive models can be used to predict price changes as well as future prices of the currencies.
\item Unsupervised learning methods can be used for pattern recognition, which will help identify trends, and for anomaly detection, which will help identify dubious transactions and users.
\item Classification algorithms can be used to distinguish between legitimate and illegitimate transactions.
\end{itemize}
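As an illustration of the interpolation step, linear interpolation is one simple option; the plan above does not fix the method, so this choice is an assumption. If the value on day $t$ is missing and the nearest observed values are $x_{t_0}$ and $x_{t_1}$ with $t_0 < t < t_1$, the missing value is estimated as
\[
\hat{x}_t = x_{t_0} + \frac{t - t_0}{t_1 - t_0}\left(x_{t_1} - x_{t_0}\right).
\]
For a single missing day between two observed days, this reduces to the average of the two neighbouring values.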
\subsection{Initial Assessment of Tools and Techniques}
We will use a combination of Python and R libraries to meet the machine learning needs of our analytics project. The various supervised and unsupervised algorithms we plan to use are available as libraries. We can also use Tableau for effective visualization.
\end{document}