This project provides a Python-based solution for integrating with the Marketstack API to retrieve stock market data. It incorporates a caching mechanism using DuckDB to minimise API calls and improve performance. The code is structured around two core classes: `MarketstackClient` and `DatabaseManager`. A Streamlit application provides a user-friendly interface to explore the retrieved data.
The project consists of:
- `MarketstackClient`: A class responsible for interacting with the Marketstack API, fetching data for various endpoints (End-of-Day, Intraday, Ticker Information, etc.). It also handles caching of API responses.
- `DatabaseManager`: A class that manages a DuckDB database for caching API responses. It provides methods for storing, retrieving, and managing cached data.
- Utility Functions: Helper functions for data conversion (e.g., converting API responses to Pandas DataFrames) and visualisation (creating stock charts).
- Streamlit Application: A Streamlit app to visualise and interact with the data retrieved by the `MarketstackClient`. (Further refinement is planned.)
The primary goals of this project are:
- Efficient Data Retrieval: To streamline the process of fetching stock market data from the Marketstack API.
- Reduced API Usage: To minimise the number of API calls by caching frequently accessed data. The Marketstack API has rate limits, so this is crucial for sustainable use.
- Improved Performance: To provide faster access to data by retrieving it from the cache when available.
- Data Exploration: To offer a user-friendly way to explore and visualise the retrieved data.
- Flexibility and Extensibility: To provide a modular design that can be easily extended to support additional Marketstack API endpoints and data analysis techniques.
- API Key Management: The `MarketstackClient` securely loads the Marketstack API key from an environment variable (`MARKETSTACK_API_KEY`) using the `python-dotenv` library.
- Data Fetching: The `MarketstackClient` provides methods for fetching data from various Marketstack API endpoints (e.g., `fetch_stock_data`, `fetch_intraday_data`, `fetch_ticker_info`). These methods construct API requests, handle responses, and manage caching. The `_make_request` method centralises the API calling logic.
- Caching Mechanism (see the first sketch after this list):
  - The `DatabaseManager` uses DuckDB, an in-process analytical database, to store API responses.
  - API calls are cached in the `api_calls` table.
  - Before making an API call, the `MarketstackClient` checks the cache for a matching response. If found, the cached data is returned.
  - API responses are stored in the cache along with the request parameters and timestamp.
- Data Conversion (see the second sketch after this list):
  - The `convert_to_dataframe` function converts API responses to Pandas DataFrames for easier manipulation and analysis.
  - The `create_stock_chart` function uses Plotly to create interactive stock charts from the data.
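A minimal sketch of how this fetch-and-cache flow could fit together. This is illustrative, not the project's exact code: the `api_calls` column layout, the helper names `get_cached_response` and `cache_response`, and the base URL are all assumptions.

```python
import json
import os
from datetime import datetime

import duckdb
import httpx
from dotenv import load_dotenv

BASE_URL = "https://api.marketstack.com/v1"  # assumed base URL


class DatabaseManager:
    """Caches API responses in a DuckDB table named api_calls."""

    def __init__(self, db_path: str = "cache.duckdb"):
        self.con = duckdb.connect(db_path)
        self.con.execute(
            """CREATE TABLE IF NOT EXISTS api_calls (
                   endpoint   VARCHAR,
                   params     VARCHAR,
                   response   VARCHAR,
                   created_at TIMESTAMP
               )"""
        )

    def get_cached_response(self, endpoint: str, params: dict) -> dict | None:
        # Look for an earlier call with the same endpoint and parameters.
        row = self.con.execute(
            "SELECT response FROM api_calls WHERE endpoint = ? AND params = ?",
            [endpoint, json.dumps(params, sort_keys=True)],
        ).fetchone()
        return json.loads(row[0]) if row else None

    def cache_response(self, endpoint: str, params: dict, response: dict) -> None:
        # Store the response alongside the request parameters and a timestamp.
        self.con.execute(
            "INSERT INTO api_calls VALUES (?, ?, ?, ?)",
            [endpoint, json.dumps(params, sort_keys=True),
             json.dumps(response), datetime.now()],
        )


class MarketstackClient:
    def __init__(self, db: DatabaseManager):
        load_dotenv()  # reads MARKETSTACK_API_KEY from the .env file
        self.api_key = os.getenv("MARKETSTACK_API_KEY")
        self.db = db

    def _make_request(self, endpoint: str, params: dict) -> dict:
        # Serve from the cache when an identical call was made before.
        cached = self.db.get_cached_response(endpoint, params)
        if cached is not None:
            return cached
        response = httpx.get(
            f"{BASE_URL}/{endpoint}",
            params={**params, "access_key": self.api_key},
        )
        response.raise_for_status()
        data = response.json()
        self.db.cache_response(endpoint, params, data)
        return data

    def fetch_stock_data(self, symbol: str) -> dict:
        return self._make_request("eod", {"symbols": symbol})
```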
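And a sketch of the two utility functions. Marketstack wraps results in a top-level `data` list; the candlestick columns below assume the standard `/eod` fields (`date`, `open`, `high`, `low`, `close`).

```python
import pandas as pd
import plotly.graph_objects as go


def convert_to_dataframe(response: dict) -> pd.DataFrame:
    # Marketstack responses carry their rows in a top-level "data" list.
    df = pd.DataFrame(response["data"])
    if "date" in df.columns:
        df["date"] = pd.to_datetime(df["date"])
    return df


def create_stock_chart(df: pd.DataFrame, symbol: str) -> go.Figure:
    # Build an interactive candlestick chart from the OHLC columns.
    fig = go.Figure(
        go.Candlestick(
            x=df["date"],
            open=df["open"],
            high=df["high"],
            low=df["low"],
            close=df["close"],
        )
    )
    fig.update_layout(title=f"{symbol} price history",
                      xaxis_title="Date", yaxis_title="Price")
    return fig
```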
The Streamlit application is intended to provide a user-friendly interface for:
- Selecting Stock Symbols: Allowing users to specify the stock symbol to retrieve data for.
- Choosing API Endpoints: Providing options to select different Marketstack API endpoints (e.g., End-of-Day data, Intraday data, Ticker Information).
- Visualising Data: Displaying the retrieved data in tabular format (using Streamlit's `st.dataframe`) and as interactive charts (using Plotly).
- Cache Management: Providing tools to view cache statistics, clear the cache, and remove old cache entries.
- Secure Key Handling: The key is obtained via the `python-dotenv` package and is not exposed to the front end.
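A rough sketch of what such a page could look like, reusing the pieces from `marketstack.py`. The widget layout and the way the client is constructed are illustrative assumptions, not the final `st_market_data.py`.

```python
import streamlit as st

from marketstack import (DatabaseManager, MarketstackClient,
                         convert_to_dataframe, create_stock_chart)

st.title("Marketstack Explorer")

client = MarketstackClient(DatabaseManager())  # assumed constructor

symbol = st.text_input("Stock symbol", value="AAPL")
endpoint = st.selectbox("Endpoint", ["End-of-Day", "Intraday", "Ticker Info"])

if st.button("Fetch"):
    if endpoint == "End-of-Day":
        df = convert_to_dataframe(client.fetch_stock_data(symbol))
        st.dataframe(df)                                 # tabular view
        st.plotly_chart(create_stock_chart(df, symbol))  # interactive chart
    elif endpoint == "Intraday":
        st.dataframe(convert_to_dataframe(client.fetch_intraday_data(symbol)))
    else:
        st.json(client.fetch_ticker_info(symbol))
```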
- `marketstack.py`: Contains the `MarketstackClient` and `DatabaseManager` classes, along with the `convert_to_dataframe` and `create_stock_chart` functions.
- `st_market_data.py`: (in progress) Contains the Streamlit application logic (UI elements, data fetching, and display).
The `MarketstackClient` currently supports the following Marketstack API endpoints:
- End-of-Day Data (`/eod`): Retrieves end-of-day data for a specified stock symbol, providing open, high, low, close, and volume information for a given date. Accessed via `fetch_stock_data`.
- Intraday Data (`/intraday`): Retrieves intraday data for a specified stock symbol, providing price and volume information at smaller intervals within a trading day. Accessed via `fetch_intraday_data`.
- Ticker Information (`/tickers/{symbol}`): Retrieves general information about a specific stock ticker symbol (e.g., company name, exchange, sector). Accessed via `fetch_ticker_info`.
- Exchange Information (`/exchanges/{mic}`): Retrieves information about a specific stock exchange, identified by its Market Identifier Code (MIC). Accessed via `fetch_exchange_info`.
- Exchange Tickers (`/exchanges/{mic}/tickers`): Retrieves a list of tickers traded on a specific stock exchange. Accessed via `fetch_exchange_tickers`.
- Latest End-of-Day Data (`/eod/latest`): Retrieves the most recent end-of-day data for a given stock symbol. Accessed via `fetch_latest_data`.
- Tickers List (`/tickers`): Retrieves a list of all supported tickers, providing a comprehensive catalogue of available symbols. Accessed via `fetch_tickers_list`.
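A brief usage sketch of these methods. The positional arguments and the client construction are assumptions about the signatures rather than confirmed ones:

```python
from marketstack import DatabaseManager, MarketstackClient

client = MarketstackClient(DatabaseManager())

eod = client.fetch_stock_data("AAPL")           # /eod
intraday = client.fetch_intraday_data("AAPL")   # /intraday
info = client.fetch_ticker_info("AAPL")         # /tickers/{symbol}
exchange = client.fetch_exchange_info("XNAS")   # /exchanges/{mic}
listed = client.fetch_exchange_tickers("XNAS")  # /exchanges/{mic}/tickers
latest = client.fetch_latest_data("AAPL")       # /eod/latest
tickers = client.fetch_tickers_list()           # /tickers
```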
- Install Dependencies:

  - Production:

    ```bash
    uv add httpx duckdb pandas plotly python-dotenv loguru streamlit watchdog
    ```

  - Development:

    ```bash
    uv add --dev pytest black
    ```

- Set up Environment Variables:

  - Create a `.env` file in the project directory.
  - Add your Marketstack API key to the `.env` file:

    ```
    MARKETSTACK_API_KEY=YOUR_MARKETSTACK_API_KEY
    ```

  - Important: The `.env` file is listed in `.gitignore` so that it is excluded from the repository.

- Streamlit app: Run the Streamlit application:

  ```bash
  streamlit run src/st_market_data.py
  ```
See test/README.md.
- More Streamlit Features: Implement all the planned Streamlit functionalities (cache management, endpoint selection, etc.).
- Asynchronous API Calls: Convert the API calls to asynchronous operations using `asyncio` and `httpx` for improved performance (see the sketch after this list).
- Cache Expiration: Implement a mechanism to automatically expire old cache entries.
- Error Handling: Add more robust error handling and logging.
- Data Analysis Tools: Integrate more advanced data analysis techniques (e.g., moving averages, technical indicators).
- Unit Tests: Add unit tests to improve code quality and reliability.
- Comprehensive Documentation: Expand the documentation to cover all aspects of the project in detail.
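For the asynchronous item above, one possible direction using `httpx.AsyncClient`. This is a sketch under assumed names (`fetch_eod`, the base URL), not the project's code:

```python
import asyncio
import os

import httpx
from dotenv import load_dotenv

BASE_URL = "https://api.marketstack.com/v1"  # assumed base URL


async def fetch_eod(client: httpx.AsyncClient, symbol: str) -> dict:
    response = await client.get(
        f"{BASE_URL}/eod",
        params={"symbols": symbol,
                "access_key": os.getenv("MARKETSTACK_API_KEY")},
    )
    response.raise_for_status()
    return response.json()


async def main() -> None:
    load_dotenv()
    async with httpx.AsyncClient() as client:
        # Fetch several symbols concurrently rather than one at a time.
        results = await asyncio.gather(
            *(fetch_eod(client, s) for s in ("AAPL", "MSFT", "GOOG"))
        )
    print([r["data"][0]["close"] for r in results])


asyncio.run(main())
```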
This project is intended for educational and informational purposes only. It is not financial advice. Use at your own risk. Always consult with a qualified financial advisor before making any investment decisions.
While options like Redis or Memcached are commonly used for caching, and simple file-based caches are possible, DuckDB offers a compelling alternative in this context.
- File-Based Caching (e.g., JSON or Pickle files): A basic approach involves saving API responses directly to files (e.g., as JSON or pickled Python objects). This is simple to implement, but it lacks efficient querying and analysis capabilities, requires manual cache invalidation, and can become slow with large datasets.
- Shelve: The `shelve` module provides a dictionary-like interface backed by disk files. While offering persistence, it's generally slower than in-memory caches and less suitable for complex queries.
- Redis/Memcached: These in-memory data stores offer very fast caching but introduce an external dependency. They are ideal for high-performance caching in distributed systems but are overkill for a single-application scenario.
- DuckDB: Offers a compelling alternative due to its zero-dependency, in-process nature and its ability to perform SQL-based analytical queries directly on the cached data. This makes it particularly well-suited for scenarios where you need both efficient caching and data analysis without the overhead of a separate caching server. It provides the persistence of file-based caches with the query capabilities of a database.
DuckDB provides a balance between simplicity, performance, and analytical capabilities, making it a suitable (though not unique) choice for this application.
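As a concrete illustration of that last point, the cache is a real database, so it can be inspected and pruned with plain SQL. This assumes the `api_calls` schema sketched earlier (`endpoint`, `params`, `response`, `created_at`):

```python
import duckdb

con = duckdb.connect("cache.duckdb")

# How many responses are cached per endpoint, and how fresh are they?
print(con.execute(
    """SELECT endpoint, COUNT(*) AS calls, MAX(created_at) AS newest
       FROM api_calls
       GROUP BY endpoint
       ORDER BY calls DESC"""
).fetchdf())

# Drop entries older than seven days (a simple expiration policy).
con.execute("DELETE FROM api_calls WHERE created_at < now() - INTERVAL 7 DAY")
```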