Implementing data sources is split into three parts:
- Creating the data source class (`MyCustomDataType.cs`)
- Creating the data downloader/processor (`process.*`)
- Creating tests and a demonstration algorithm
- Fork this repository to your own GitHub profile
- Install .NET 5.0 SDK
- Open the `MyCustomDataType.cs` file for editing
- Rename the class `MyCustomDataType` to the data you'll be offering, starting with your vendor name (e.g. `MyCompanyFlightData`)
- Remove the `SomeCustomProperty` property
- Add your dataset's fields/properties
  - Add `[ProtoMember(n)]` to each field/property you add, where `n` starts at `10` and increments by `1` per field/property added
- Implement `GetSource(...)` to point to where your data lives
  - Replace `mycustomdatatype` with your vendor name (all lowercase), followed by the name of the directory that contains your data
  - Specify the file in which your data is expected to be found
  - Use the `date` variable to get the date of the data being requested
  - Use `config.Symbol.Value` to get the current ticker. Make sure that the ticker capitalization matches your file names. The default is uppercase.
- Implement `Reader(...)` to parse your data
  - Set `Symbol = config.Symbol` when creating the instance of the class
  - Set `EndTime` equal to the time the data first became available for consumption
- Implement `Clone()` to allow Lean to create copies of your data
- If your dataset is NOT for equities data, make `RequiresMapping()` return `false`, otherwise return `true`
  - See the data sources related to equities section for more details
- Make `IsSparseData()` return `true`
- Make `DefaultResolution()` return the resolution of your data if the user does not specify a resolution
- Make `SupportedResolutions()` return the resolutions that your data supports
- Set the timezone that your data is saved as in `DataTimeZone()`
- (Optional) Implement `ToString()` to return pretty output
- Rename the file `MyCustomDataType.cs` to the name of the class contained within
- Open the `QuantConnect.DataSource.csproj` file for editing
  - Add `<AssemblyName>QuantConnect.DataSource.{{dataSourceClassName}}</AssemblyName>` below `<RootNamespace>QuantConnect.DataSource</RootNamespace>`
  - Replace `{{dataSourceClassName}}` with the name of the class you implemented
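
The `GetSource(...)` and `Reader(...)` steps come down to two pieces of logic: building a lowercase file path from the uppercase ticker, and turning one line of the file into a record with a `Symbol` and an `EndTime`. The sketch below shows that logic in plain Python purely as an illustration; the actual implementation belongs in your C# class. The vendor name `mycompany`, the dataset name `flightdata`, the `yyyyMMdd,value` line layout, and the one-day availability delay are all hypothetical assumptions, not part of the SDK.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Hypothetical names for illustration only.
VENDOR = "mycompany"
DATASET = "flightdata"


def get_source_path(data_folder: str, ticker: str) -> str:
    """Build the per-ticker CSV path, lowercasing the (default uppercase) ticker."""
    path = PurePosixPath(data_folder) / "alternative" / VENDOR / DATASET / f"{ticker.lower()}.csv"
    return str(path)


def read_line(line: str, ticker: str) -> dict:
    """Parse one line in the assumed 'yyyyMMdd,value' layout into a record."""
    date_text, value_text = line.strip().split(",")
    time = datetime.strptime(date_text, "%Y%m%d")
    return {
        "symbol": ticker,
        "time": time,
        # Assumption: a daily record becomes available one day after its timestamp.
        "end_time": time + timedelta(days=1),
        "value": float(value_text),
    }
```

For example, `get_source_path("/Data", "WW")` yields `/Data/alternative/mycompany/flightdata/ww.csv`, which matches the lowercase output layout the processing script is expected to produce.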
- Create one of the following files to download/process your data:
  - Python: `process.py`
  - Bash: `process.sh`
  - Jupyter Notebook: `process.ipynb`
- In `process.*`, output your processed/final data to: `/temp-output-directory/alternative/{{vendorName}}/{{dataSourceName}}/`
  - Replace `{{vendorName}}` with your vendor name (e.g. `quantconnect`)
  - Replace `{{dataSourceName}}` with the name of your data (e.g. `corporate-flights`)
  - The path should be completely lowercase, unless absolutely required
  - Do not use special characters in your output path (prefer `-` over `_` in directories, and `_` over `-` for file names)
  - Output should be in CSV format (comma delimited)
  - Example output directory: `/temp-output-directory/alternative/quantconnect/fred`
  - Example output file: `/temp-output-directory/alternative/quantconnect/fred/oecdrecd.csv`
- If you are processing data that is associated with stocks/equities, review the data sources related to equities section
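
As a rough starting point for `process.py`, the sketch below writes one comma-delimited CSV file per ticker into the output layout described above. It is a minimal sketch, not the SDK's own script: the function name `write_output`, the `root` parameter, and the shape of `rows_by_ticker` are hypothetical, and the vendor and dataset names are simply the examples used in this section.

```python
import csv
from pathlib import Path

# Example names from this section; replace them with your own.
VENDOR_NAME = "quantconnect"
DATA_SOURCE_NAME = "corporate-flights"


def write_output(rows_by_ticker, root="/temp-output-directory"):
    """Write one comma-delimited CSV per ticker and return the paths written."""
    output_dir = Path(root) / "alternative" / VENDOR_NAME / DATA_SOURCE_NAME
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for ticker, rows in rows_by_ticker.items():
        # Lowercase file names keep the whole path lowercase.
        output_file = output_dir / f"{ticker.lower()}.csv"
        with output_file.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        written.append(output_file)
    return written
```

Calling `write_output({"WW": [["20210624", 3]]})` would create `ww.csv` under `/temp-output-directory/alternative/quantconnect/corporate-flights/`. The `root` parameter exists only so the function can be pointed at a scratch directory while you test it.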
- Edit `Demonstration.cs` and create an example of how to load and use your data
  - Rename the algorithm class name to the name of the class created in part 1
  - The algorithm should be very simple and minimal
- Open the `tests/MyCustomDataTypeTests.cs` file for editing
  - Scroll to the bottom of the code and make `CreateNewInstance()` return your new data type
  - Data can be fake data, it doesn't have to be real
  - Set all fields/properties of your class when creating your new data type
- Ensure that tests are passing. Run the following commands in order to check for test status:

      dotnet build tests/Tests.csproj
      dotnet test tests/bin/Debug/net5.0/Tests.dll

- Rename `tests/MyCustomDataTypeTests.cs` to the name of the class you created in part 1, ending with "Tests.cs"
Your data source is related to equities whenever the following is true:
- The data source describes data about a specific equity Symbol, e.g. AAPL
- The data source is directly linked to the equity, i.e. if my data source describes data for AAPL, then this data only applies to the AAPL equity Symbol
For equity-related data sources, update `RequiresMapping()` to return `true` in the data source class you created in part 1.
(Note: ticker `WW` is used for example purposes)
If your source/raw data is "point in time", then no further special handling is required. Example:
- Ticker name as of today (2021-06-24) is `WW`
- Ticker `WTW` was renamed to `WW` on 2019-04-19
- Data before 2019-04-19 has ticker `WTW`, not `WW`
Otherwise, you'll need to use QuantConnect data to get the ticker's previous name at a given point in time.
To do so, follow the steps below (Python/Jupyter Notebooks only):
- Import the required classes (`datetime` is needed for the dates used in the later steps):

      from datetime import datetime
      from QuantConnect.Data.Auxiliary import *
      from QuantConnect import *

- Create a `MapFileResolver` instance:

      resolver = MapFileResolver.Create(Globals.DataFolder, Market.USA)

- For each ticker you encounter, resolve the map file, and provide the current time:

      map_file = resolver.ResolveMapFile('WW', datetime.now())

- Get the ticker symbol for the date provided. Provide the time of the data you're processing that contains the ticker:

      data_time = datetime(2018, 1, 1)
      ticker = map_file.GetMappedSymbol(data_time)

- (Optional) If you need a Symbol, you can create one:

      first_date = map_file.FirstDate
      symbol = Symbol(SecurityIdentifier.GenerateEquity(first_date, ticker, Market.USA), ticker)

`symbol` should now represent `WTW`.
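
To make the idea of a point-in-time lookup concrete without a Lean installation, here is a plain-Python sketch that uses the `WW`/`WTW` rename from the example above. The hard-coded `WW_HISTORY` table and the `ticker_as_of` helper are hypothetical stand-ins; they only illustrate the concept and are not how `MapFileResolver` or map files are implemented.

```python
from datetime import datetime

# Hypothetical rename history, sorted by the date each ticker took effect.
WW_HISTORY = [
    (datetime.min, "WTW"),
    (datetime(2019, 4, 19), "WW"),
]


def ticker_as_of(history, data_time):
    """Return the ticker that was in effect at data_time."""
    current = None
    for effective_from, ticker in history:
        # Keep the latest entry that had already taken effect.
        if effective_from <= data_time:
            current = ticker
    return current
```

With this table, `ticker_as_of(WW_HISTORY, datetime(2018, 1, 1))` resolves to the pre-rename ticker, while any date on or after 2019-04-19 resolves to the current one.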