Standardised Financial Data #103
Comments
Hey there, the goal of this tool is certainly not to standardize financial data. That is basically the goal of the XBRL standard itself. How well the data is standardized depends solely on the financial regulators and the creator of the XBRL document. I guess your question is probably: "How can I use this tool to collect and compare data from different companies?" With py-xbrl you can extract any information that is tagged in an XBRL or iXBRL document. If you are not familiar with XBRL, maybe have a look at this iXBRL viewer. All values that are "clickable" are tagged with XBRL and can be read in with py-xbrl. For example, the following code extracts "Earnings per share" from Apple and Microsoft:

    import logging
    from xbrl.cache import HttpCache
    from xbrl.instance import XbrlParser, XbrlInstance

    # Cache downloaded filings and taxonomy files locally
    cache: HttpCache = HttpCache('./cache')
    xbrlParser = XbrlParser(cache)

    # Ticker -> URL of the iXBRL filing on SEC EDGAR
    subs = {
        "AAPL": "https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm",
        "MSFT": "https://www.sec.gov/Archives/edgar/data/789019/000156459022035087/msft-10q_20220930.htm"
    }

    for ticker in subs.keys():
        inst: XbrlInstance = xbrlParser.parse_instance(subs[ticker])
        # Print every fact tagged with the basic EPS concept
        for fact in inst.facts:
            if fact.concept.name == 'EarningsPerShareBasic':
                print(f"On {fact.context.end_date} {ticker} had an EPS of {fact.value}")

Output:
With py-xbrl you can extract thousands of different facts from thousands of companies directly from the source (the actual financial report from the company) instead of going through an API. |
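Once facts have been extracted, comparing companies is mostly a matter of grouping them by concept and period. The following is a minimal sketch of that step, not part of py-xbrl: `FactRow` and `pivot_by_concept` are hypothetical names, the tickers are made up, and the values are placeholders rather than reported figures. It assumes each extracted fact has already been flattened into ticker, concept name, period end, and value.

```python
from collections import defaultdict
from typing import NamedTuple


class FactRow(NamedTuple):
    """Hypothetical flat record: one row per extracted fact."""
    ticker: str
    concept: str
    period_end: str
    value: float


def pivot_by_concept(rows):
    """Group rows as {concept: {period_end: {ticker: value}}}."""
    table = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        table[row.concept][row.period_end][row.ticker] = row.value
    return table


# Placeholder values for illustration only, not real reported figures.
rows = [
    FactRow("AAA", "EarningsPerShareBasic", "2022-09-30", 1.0),
    FactRow("BBB", "EarningsPerShareBasic", "2022-09-30", 2.0),
    FactRow("AAA", "EarningsPerShareBasic", "2021-09-30", 0.5),
]

table = pivot_by_concept(rows)
# Print one line per period, with all tickers side by side.
for period_end, by_ticker in sorted(table["EarningsPerShareBasic"].items()):
    print(period_end, by_ticker)
```

This only lines up facts that share the same concept name; companies that tag the same economic quantity with different concepts would still need a mapping step.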
Pretty damn cool. What would be the difference between what you are doing and what Ties de Kok did with https://github.com/TiesdeKok/fast_xbrl_parser? It seems that you are parsing the htm file https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm and that he is parsing the xml file https://www.sec.gov/Archives/edgar/data/1652044/000165204423000016/goog-20221231_def.xml. Do you know if these datasets are meant to contain the same information (facts/concepts)? I wonder what the advantages and disadvantages of using one over the other would be. |
Hi All,
I think that in this context it would be good to point out the difference
between two commonly used forms of XBRL.
* the XBRL V2.1 standard
  (https://specifications.xbrl.org/xbrl-essentials.html) and
* the inline XBRL standard.
The former is a "pure XML-based standard" where both the instance
"document" (the container for the facts) and the taxonomy are represented
in XML. The latter (a newer standard) represents the instance "document"
in the XHTML format while the supporting taxonomy is still represented in
the XML format.
The reason for this was to support a more "styled" representation of the
facts document to make it human-readable and machine readable at the same
time.
Note that the required submission format across the globe is now pretty
much inline XBRL (hence the Apple filing in the example shown). The second
example shown is part of the taxonomy (the definition linkbase), which will
always be in the XML format. The filings in Europe have always been in the
inline XBRL format, while in the US the first 15 years or so were fully in
the XBRL XML format but have now switched to the inline XBRL standard. Also,
many of the new regulatory filing requirements (such as the ESG filings in
the EU and, soon, the USA) will be done in inline XBRL. Remember that the
inline XBRL standard is based on the XBRL V2.1 standard, so any effort
spent on XBRL code and process development carries over to inline XBRL
processing.
To process a filing/submission the processor needs to be able to
read/process the inline XHTML instance document to extract the facts and
fact-metadata from the file and the associated XML taxonomy files.
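To make the "facts embedded in XHTML" idea concrete, here is a minimal sketch using only the Python standard library. It is not how py-xbrl or any other processor is implemented. The sample document, the `ex:` taxonomy prefix, and the value 1.23 are made up for illustration, and the sketch ignores the `format`, `scale`, and `sign` attributes as well as contexts, units, and the taxonomy, all of which a real processor has to resolve.

```python
import xml.etree.ElementTree as ET

# Namespace of the Inline XBRL 1.1 elements (ix:nonFraction, ix:nonNumeric, ...)
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Made-up sample document; the "ex" taxonomy and the value are illustrative.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ex="http://example.com/taxonomy">
  <body>
    <p>Basic earnings per share:
      <ix:nonFraction name="ex:EarningsPerShareBasic" contextRef="FY2022"
                      unitRef="usdPerShare" decimals="2">1.23</ix:nonFraction>
    </p>
  </body>
</html>"""


def extract_numeric_facts(xhtml):
    """Return the numeric facts tagged with ix:nonFraction as plain dicts."""
    root = ET.fromstring(xhtml)
    facts = []
    for el in root.iter(f"{{{IX_NS}}}nonFraction"):
        facts.append({
            "name": el.get("name"),
            "contextRef": el.get("contextRef"),
            "unitRef": el.get("unitRef"),
            # Simplification: assumes the displayed text is a plain number.
            "value": float("".join(el.itertext()).strip()),
        })
    return facts


facts = extract_numeric_facts(SAMPLE)
print(facts)
```

The human-readable sentence and the machine-readable fact live in the same element, which is the point of the inline format; the `contextRef` and `unitRef` attributes are only references that still have to be looked up elsewhere in the document.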
Hope this clarifies things a little.
Please ping me if I should clarify more.
Cordially
Raynier van Egmond (https://www.linkedin.com/in/rayniervanegmond/)
|
@rayniervanegmond Thank you for the great explanation! I can only agree entirely with what @rayniervanegmond said! It is true that the SEC also provides XBRL files for iXBRL submissions. However these are converted from the original iXBRL filings, this is a service the SEC provides for compatibility reasons. But I would always prefer to parse iXBRL since it has several benefits. Regarding your second question (@firmai ): |
Hi All,
As you may read from my LinkedIn profile, I have a fair amount of
experience with developing commercial XBRL reporting solutions. In my
latest position (as an AI/ML engineer in the FinTech industry), the client
had done a very extensive evaluation of the Arelle processor. As observed,
it has a lot of functionality; IMO the Arelle processor is bloated with
functionality and impossible to adapt to one's own requirements.
The outcome, however, was that the processor was very slow and consumed
vast amounts of memory, basically making it unusable for the system to be
developed. The solution required a very fast turnaround on any filing
submitted to the SEC. Conclusion: the public domain version of the Arelle
processor will not work for commercial server-side solutions.
The client asked me to help them implement their own XBRL processor, which
we did. It runs filings and normalizes them onto a single consolidated
taxonomy very fast. Because the code does one thing ("read, process and
validate XBRL") and none of the ancillary XBRL specification stuff, it is
easily maintainable and adaptable to new use cases.
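As a rough illustration of what "normalizing onto a single consolidated taxonomy" can mean at its simplest, the sketch below maps several source concept names onto one standardized label. This is an assumption about the general idea, not the processor described above: `CONCEPT_MAP`, `normalize`, and the target labels are hypothetical, the mapping is deliberately tiny, and the values are placeholders.

```python
# Hypothetical mapping from source concept names to standardized labels.
# A real consolidated taxonomy would be far larger and carefully curated.
CONCEPT_MAP = {
    "Revenues": "revenue",
    "RevenueFromContractWithCustomerExcludingAssessedTax": "revenue",
    "SalesRevenueNet": "revenue",
    "NetIncomeLoss": "net_income",
    "EarningsPerShareBasic": "eps_basic",
}


def normalize(facts, concept_map=CONCEPT_MAP):
    """Split (concept, value) pairs into normalized facts and unmapped concepts."""
    normalized, unmapped = [], []
    for concept, value in facts:
        target = concept_map.get(concept)
        if target is None:
            # Keep track of concepts the mapping does not cover yet.
            unmapped.append(concept)
        else:
            normalized.append((target, value))
    return normalized, unmapped


# Placeholder values for illustration only, not real reported figures.
company_a = [("Revenues", 100.0), ("NetIncomeLoss", 10.0)]
company_b = [("SalesRevenueNet", 200.0), ("SomeCustomExtensionConcept", 5.0)]

norm_a, missing_a = normalize(company_a)
norm_b, missing_b = normalize(company_b)
print(norm_a, missing_a)
print(norm_b, missing_b)
```

Reporting the unmapped concepts separately, instead of silently dropping them, makes it visible where company-specific extensions still need a manual decision.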
My advice is to go with a "clean-single-purpose" processor and build from
there.
Again - take care - René
https://www.linkedin.com/in/rayniervanegmond/
On Tue, Feb 7, 2023 at 2:01 PM Manuel Schmidt wrote:
Regarding your second question (@firmai <https://github.com/firmai>):
TBH, I did not try the "fast_xbrl_parser" from "TiesdeKok". It seems to be
written in Rust, while 'py-xbrl' is purely Python-based.
Another great open-source library for parsing XBRL is Arelle
<https://github.com/Arelle/Arelle>. It offers many functionalities, way
more than 'py-xbrl'. However, this vast range of functionalities also
increases complexity. The goal of 'py-xbrl' was always to parse filings and
get all of the data as easily as possible, never XBRL validation, which is
also a huge part of a proper XBRL processor.
|
How can this be used to develop standardised financial data? The tool looks promising, but I am struggling to find a good example. Thanks so much for your work :)