Forecasting-intraday-fluctuations-of-the-DJIA-by-leveraging-news-headlines-and-financial-indicators

Combining GPT-2 based embeddings with a simple machine learning classifier outperformed the baseline models in the comparative analyses. This hybrid approach pairs the context-rich representations generated by GPT-2 with traditional machine learning techniques and shows improved predictive performance. The result suggests that combining advanced language embeddings with domain-specific machine learning models can yield better results in classification tasks that use historical stock prices and news headline features.

Introduction

This study applies sentiment analysis to information sourced from news outlets and merges it with stock market data to forecast the volatility movement of the DJIA index. Diverging from prevalent methods, which center primarily on stock price prediction, it explores whether contextual analysis of textual data can improve the precision of DJIA index volatility forecasts. The primary research question is: Can sentiment analysis of news headlines contribute to predicting stock volatility movements in the DJIA index? To address this question, two sub-questions are formulated:

How do stock-market-related news headlines enhance the accuracy scores of deep neural network models in predicting DJIA index volatility movement?
In what ways do Natural Language Processing techniques and qualitative analysis of news headlines contribute to forecasting DJIA index volatility movements?

Dataset

The dataset used in this investigation is sourced from Kaggle and covers the period from June 8th, 2008, to July 1st, 2016. Originally designed for students participating in a Deep Learning and NLP course, it comprises both stock and news data, provided in .csv format and available on the associated website. Link: https://www.kaggle.com/aaron7sun/stocknews

Requirements

Data Preprocessing: Loading and processing datasets, particularly using Pandas (pandas library). Tokenizing and encoding text data for model input.
Model Building: Utilizing the Hugging Face Transformers library (transformers) to work with pre-trained models for NLP tasks. Creating and configuring a text classification model.
Model Training: Training the text classification model on the provided dataset. Fine-tuning the pre-trained transformer model for specific tasks using custom data.
Evaluation: Evaluating the trained model's performance on a validation dataset, likely using metrics such as accuracy, precision, recall, and F1 score.
Prediction: Using the trained model to make predictions on new, unseen text data.
Integration with Scikit-Learn: Leveraging Scikit-Learn (scikit-learn) for machine learning functionalities, such as splitting the dataset into training and validation sets.
TensorFlow and Keras Integration: Combining the Hugging Face Transformers library with TensorFlow (tensorflow) and Keras (keras) for building and training models.
Logging and Reporting: Logging information during the training process, possibly for monitoring training progress and model performance.
Custom Tokenization and Padding: Handling tokenization and padding of text sequences for model input.
Data Visualization (Possibly): Plotting and visualizing data or model performance using libraries like Matplotlib or Seaborn.
Requirements File: Organizing project dependencies using a requirements.txt file.
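
The data preprocessing step listed above could look roughly like the sketch below. It is a minimal illustration, not the repository's actual code. It assumes the Kaggle file has a Date column, a Label column, and headline columns named Top1 to Top25, and that some raw headlines are wrapped in b'...' or b"..." markers; the helper name combine_headlines is hypothetical.

```python
import pandas as pd


def combine_headlines(df: pd.DataFrame, n_headlines: int = 25) -> pd.DataFrame:
    """Join the per-day headline columns into one text field per row.

    Assumes columns Date, Label and Top1..Top<n_headlines> (an assumption
    about the dataset layout, not something this README specifies).
    """
    cols = [f"Top{i}" for i in range(1, n_headlines + 1)]
    present = [c for c in cols if c in df.columns]

    # Missing headlines become empty strings so the join never fails.
    text = df[present].fillna("").astype(str)

    # Unwrap headlines stored as b'...' or b"..." (assumed raw-data quirk).
    text = text.apply(
        lambda col: col.str.replace(r"""^b(["'])(.*)\1$""", r"\2", regex=True)
    )

    out = df[["Date", "Label"]].copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out["headlines"] = text.apply(
        lambda row: " ".join(s for s in row if s), axis=1
    )
    # Keep rows in chronological order for any later time-based split.
    return out.sort_values("Date").reset_index(drop=True)
```

With the real data, this would typically be called on the frame returned by pd.read_csv for the combined news and DJIA file, and the resulting headlines column passed to the tokenizer.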

Features

Financial Indicators: ['Stochastic_K', 'Stochastic_D', 'Momentum', 'Rate_of_Change', 'William_R', 'A/D_Oscillator', 'Disparity_5']
Textual Features: 25 news headlines extracted from the Reddit World-News Channel
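
The seven financial indicators named above can be derived from daily High, Low, and Close prices. The sketch below uses common textbook definitions. The window lengths (n = 14, a 3-day %D average, a 4-day momentum lag) and the sign convention for Williams %R are assumptions, since this README does not state which variants the project uses. The function name add_indicators is hypothetical.

```python
import pandas as pd


def add_indicators(ohlc: pd.DataFrame, n: int = 14) -> pd.DataFrame:
    """Append seven technical indicators to a frame with High, Low, Close.

    Window sizes and formulas are illustrative assumptions, not the
    project's confirmed settings.
    """
    high, low, close = ohlc["High"], ohlc["Low"], ohlc["Close"]
    nan = float("nan")

    highest = high.rolling(n).max()   # highest high over the last n days
    lowest = low.rolling(n).min()     # lowest low over the last n days
    span = (highest - lowest).replace(0, nan)  # avoid division by zero

    out = ohlc.copy()
    out["Stochastic_K"] = 100 * (close - lowest) / span
    out["Stochastic_D"] = out["Stochastic_K"].rolling(3).mean()
    out["Momentum"] = close - close.shift(4)
    out["Rate_of_Change"] = 100 * close / close.shift(n)
    out["William_R"] = -100 * (highest - close) / span
    out["A/D_Oscillator"] = (high - close.shift(1)) / (high - low).replace(0, nan)
    out["Disparity_5"] = 100 * close / close.rolling(5).mean()
    return out
```

The first rows of each indicator are NaN until its window fills, which is one reason an imputation or row-dropping step is needed before training.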

Installation

Pandas: Used for data manipulation and analysis.

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

Transformers (Hugging Face): Used for working with transformer models.

from transformers import GPT2Tokenizer, GPT2Model

PyTorch: Used for building and training neural networks.

import torch

NumPy: Used for numerical operations.

import numpy as np

Keras (with TensorFlow backend): Used for building and training neural networks.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding, Dropout
from keras.optimizers import Adam

SimpleImputer (scikit-learn): Used for imputing missing values in data.

from sklearn.impute import SimpleImputer

Matplotlib: Used for data visualization (not explicitly shown in the provided code).

import matplotlib.pyplot as plt

Conv1D, MaxPooling1D, GRU (Keras layers): Used for building convolutional and recurrent neural network layers.

from keras.layers import Conv1D, MaxPooling1D, GRU
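
The scikit-learn imports above suggest a preprocessing pipeline in front of the classifier. One way they might fit together is sketched below. This is an illustration under stated assumptions, not the project's code: the feature matrix is assumed to hold the indicator columns first and the headline embedding columns after them, LogisticRegression stands in for the "simple machine learning model", the data is synthetic, and build_classifier is a hypothetical name.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_classifier(n_indicator_cols: int, n_embedding_cols: int) -> Pipeline:
    """Impute and scale indicator columns, pass embeddings through, classify."""
    indicator_idx = list(range(n_indicator_cols))
    embedding_idx = list(
        range(n_indicator_cols, n_indicator_cols + n_embedding_cols)
    )
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill warm-up NaNs
        ("scale", StandardScaler()),
    ])
    features = ColumnTransformer([
        ("indicators", numeric, indicator_idx),
        ("embedding", "passthrough", embedding_idx),
    ])
    return Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000)),  # stand-in classifier
    ])


# Synthetic stand-in data: 7 indicator columns plus a 16-dim embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7 + 16))
X[::10, 0] = np.nan                  # simulate missing indicator values
y = (X[:, 1] > 0).astype(int)        # toy up/down label

# shuffle=False keeps time order, so validation rows come after training rows.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
model = build_classifier(7, 16).fit(X_train, y_train)
val_accuracy = accuracy_score(y_val, model.predict(X_val))
```

Keeping the imputer and scaler inside the pipeline means they are fitted on the training rows only, which avoids leaking validation statistics into training.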

Deployment

Features

LSTM Model: The app utilizes a pre-trained LSTM model for predicting DJIA movements.
GPT-2 Embedding: News headlines are transformed using the GPT-2 model for better input representation.
User Interaction: Users can input 25 news headlines and select a date for prediction.
Technical Indicators: Additional features such as Stochastic K/D, Momentum, Rate of Change, etc., are fetched from Yahoo Finance.
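
The GPT-2 embedding step listed above could be sketched as follows. This is an assumption about how the headlines might be turned into fixed-size vectors: it mean-pools the final hidden states over non-padding tokens and uses the public "gpt2" checkpoint with a maximum length of 64 tokens. The README does not confirm the pooling strategy, checkpoint, or length, and the names mean_pool and embed_headlines are hypothetical.

```python
import numpy as np


def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    hidden: (batch, tokens, dim) hidden states
    mask:   (batch, tokens) with 1 for real tokens and 0 for padding
    """
    mask = mask[..., None].astype(hidden.dtype)
    summed = (hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1, None)  # guard against empty rows
    return summed / counts


def embed_headlines(headlines: list[str], model_name: str = "gpt2") -> np.ndarray:
    """Return one mean-pooled GPT-2 vector per headline string."""
    # Imported lazily so mean_pool stays usable without the heavy dependencies.
    import torch
    from transformers import GPT2Model, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = GPT2Model.from_pretrained(model_name)
    model.eval()

    enc = tokenizer(
        headlines,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=64,  # assumed cap, not a value from this README
    )
    with torch.no_grad():
        out = model(**enc)
    return mean_pool(out.last_hidden_state.numpy(), enc["attention_mask"].numpy())
```

Calling embed_headlines downloads the checkpoint on first use. The resulting vectors can then be combined with the technical indicators before being passed to the prediction model.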

Installation

Clone the repository (the app lives in the djia_app folder):

git clone https://github.com/Jeetanand/IntradayDJIAForecast.git

Install dependencies:

pip install -r requirements.txt

Run the app:

streamlit run app.py
