# Explore data retrieved from SEC
In this section, we delve into the structure of the SEC data obtained from the API. The data follows a structured format in compliance with SEC reporting standards, utilizing XBRL (eXtensible Business Reporting Language). XBRL is a standardized language designed to enhance the accuracy and reliability of financial reporting.
## Understanding XBRL
**XBRL (eXtensible Business Reporting Language):** XBRL is a structured, machine-readable format ensuring consistency and comparability across companies and filings. As a specialized language for business and financial data, XBRL enables standardized communication and presentation of financial information. It uses XML (eXtensible Markup Language) to tag financial data with labels that provide context and meaning. This tagging allows for easier analysis, comparability, and extraction of data by both humans and machines.
### Why XBRL?
1. **Consistency:** XBRL ensures consistency in financial reporting by providing a standardized set of tags for financial concepts, making it easier to compare data across different companies and periods.
2. **Efficiency:** The use of XBRL streamlines the process of collecting, analyzing, and disseminating financial information, reducing the need for manual data entry and interpretation.
3. **Transparency:** XBRL enhances the transparency of financial data, as each financial concept is tagged with a specific label, providing clear identification and context.
### Key Elements of XBRL in SEC Data
Within the SEC data retrieved through the API, XBRL is utilized to structure financial information.
A basic understanding of XBRL[^02_data_exploration-1] is necessary for navigating and interpreting the structured financial data retrieved from the SEC. It ensures a standardized approach to financial reporting, facilitating meaningful analysis and comparison.
[^02_data_exploration-1]: The definitions provided on this website are crafted for non-practitioners and may lack the rigor and precision required by experts. For a more thorough understanding of XBRL financial reporting, we recommend referring to [XBRL-based structured digital financial reporting](http://xbrl.squarespace.com "XBRL-based structured digital financial reporting") where you can find a comprehensive and rigorous explanation. To delve deeper into the subject, we suggest reading the following authoritative texts: [Essentials of XBRL-based Digital Financial Reporting](http://xbrlsite.azurewebsites.net/2021/essentials/EssentialsOfXBRLBasedDigitalFinancialReporting.pdf) and [Mastering XBRL-based Digital Financial Reporting](http://xbrl.squarespace.com/mastering-xbrl/) by C. Hoffman.
First, we load the required libraries and files.
```{r Functions_source, message=FALSE, warning=FALSE}
source("setup.qmd")
```
Now, let's retrieve the necessary objects for data exploration:
```{r load_object1, eval=TRUE, warning=FALSE, message=FALSE}
# Load company_data and cik saved in the previous step
company_data <- readRDS("company_data.RDS")
cik <- readRDS("cik.RDS")
```
The data retrieved from the SEC API is structured financial data compliant with SEC reporting standards.
To navigate through this structure for analysis, consider the following key elements:
- **`cik`**: The Central Index Key (CIK) is a unique identifier assigned to each company filing reports with the U.S. Securities and Exchange Commission (SEC).
- **`entityName`**: This is the name of the company, in our case "JAKKS Pacific, Inc."
- **`facts`**: The main container for financial data, housing two critical sections:
    - **`dei`**: Contains Document and Entity Information (DEI), providing basic company details.
    - **`us-gaap`**: Holds financial data following the U.S. Generally Accepted Accounting Principles (GAAP).
Within these sections, financial data is organized as a list of items, each with specific attributes:
- **Concept**: Represents a financial measure or item (e.g., "Assets") with an associated label and data type.
- **Facts**: Actual numerical values associated with the concept, with attributes like "value," "unitRef," and "contextRef."
- **Attributes**: Additional attributes providing context for the data, such as reporting period, currency unit, or data precision.
Effectively analyzing this data involves selecting the relevant concepts and facts, and potentially transforming or pivoting the data for further processing and visualization.
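The hierarchy described above can be illustrated with a small hand-built list. This is only a sketch: `toy_facts`, the company name, and all values are made up for illustration, and the real object returned by the API contains many more elements.

```r
# Toy stand-in for the parsed API response (all names and values are hypothetical)
toy_facts <- list(
  cik = 1234567L,
  entityName = "Example Corp",
  facts = list(
    dei = list(
      EntityCommonStockSharesOutstanding = list(
        label = "Entity Common Stock, Shares Outstanding"
      )
    ),
    `us-gaap` = list(
      Assets = list(
        label = "Assets",
        units = list(
          USD = data.frame(
            end  = c("2021-12-31", "2022-12-31"),
            val  = c(1000, 1200),
            form = c("10-K", "10-K")
          )
        )
      )
    )
  )
)

# Top-level elements: cik, entityName, facts
names(toy_facts)

# The two sections inside `facts`
names(toy_facts$facts)

# Drill down to one concept and its reported values
assets <- toy_facts$facts$`us-gaap`$Assets
assets$label
assets$units$USD
```

Note that `us-gaap` must be wrapped in backticks because the hyphen is not valid in a plain R name.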
### Understanding the dataset
Let's start by exploring the structure of **`company_data`**:
```{r str_company_data, eval=TRUE, message=FALSE, warning=FALSE, class.output="custom-str-output"}
# Visualize structure of the company_data
str(company_data, max.level = 1)
```
The output reveals that **`company_data`** comprises three lists, each containing nested elements; for example, **`company_Metadata`** has 22 elements.
::: {.Note .custom-note}
| |
|:-----------------------------------------------------------------------|
| **Note** |
| In R, a **list** is a powerful data structure that can hold elements of **different data types**. This flexibility allows each element to be unique and cater to specific data requirements. For the `company_data` object, the list structure plays a crucial role in organizing various pieces of information, including financial data, company descriptions, and filing details. This heterogeneous nature of the data necessitates a flexible data structure like a list to accommodate these diverse data types. Understanding the **organization and structure** of the `company_data` list is essential for effective navigation and extraction of specific information. By grasping the relationships between the list elements, users can efficiently retrieve the desired data elements for analysis and interpretation. |
:::
To access the lists, we use the `$` operator after the object name, e.g. `company_data$company_Metadata`.
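As a minimal sketch of list access in R, the example below uses a made-up list (`toy`, with hypothetical values) that mimics the three top-level names. It contrasts `$` with the double-bracket operator `[[`, which is useful when the element name is stored in a variable.

```r
# Hypothetical list mimicking the top-level names of company_data
toy <- list(
  company_Metadata = list(name = "Example Corp", tickers = "EXMP"),
  company_Facts    = list(),
  company_Concept  = list()
)

# `$` and `[[` return the element itself
toy$company_Metadata$name
toy[["company_Metadata"]][["name"]]

# `[[` also works when the name is held in a variable
element <- "company_Metadata"
toy[[element]]$tickers

# Single brackets return a list of length one, not the element
class(toy["company_Metadata"])
length(toy["company_Metadata"])
```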
Next, we split the `company_data` into separate lists: **`company_Metadata`**, **`company_Facts`**, **`company_Concept`**.
```{r split_company_data, eval=TRUE, warning=FALSE, message=FALSE}
# Split the lists in company_data
company_Metadata <- company_data$company_Metadata
company_Facts <- company_data$company_Facts
company_Concept <- company_data$company_Concept
```
Now, let's examine the structures and content of these lists:
#### Company Metadata
Let's start with `company_Metadata`, which includes 22 elements of different types: character vectors, integers, sub-lists (nested lists), etc.
```{r view_company_Metadata, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of the company_Metadata
str(company_Metadata, max.level = 1)
```
For our purpose, the most relevant elements of `company_Metadata` are:
1. `cik`, as described above.
2. `sic`, the Standard Industrial Classification (SIC) code. SIC codes classify companies into industry segments based on their primary business activities. Each four-digit SIC code represents a different industry or sector.
3. `sicDescription`, the description of the Standard Industrial Classification code.
4. `name`, the name of the company.
5. `tickers`, the symbols identifying a publicly traded company's stock on a particular stock market.
6. `filing`, which includes the filing attributes.
Here is the structure of `company_Metadata$filing`:
```{r str_company_Metadata_Filing, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of company_Metadata$filing
str(company_Metadata$filing, max.level = 2)
```
As printed, the structure is not easy to read. Still, it contains useful information on the form types (e.g. 10-K) and dates (e.g. filing dates).
For now we will keep it as is; we will come back later to improve its readability.
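One common way to make such a structure more readable is to convert a list of equal-length vectors into a data frame. The sketch below is an illustration only: `toy_filing`, its `recent` element, the field names, and all values are assumptions made for the example and may not match the actual layout of `company_Metadata$filing`.

```r
# Hypothetical filing list: parallel vectors, one entry per filing
toy_filing <- list(
  recent = list(
    accessionNumber = c("0000000000-23-000001", "0000000000-23-000002"),
    filingDate      = c("2023-03-15", "2023-05-10"),
    form            = c("10-K", "10-Q")
  )
)

# Equal-length vectors convert directly into columns
filings_df <- as.data.frame(toy_filing$recent, stringsAsFactors = FALSE)

# Store dates as Date objects so they can be sorted and compared
filings_df$filingDate <- as.Date(filings_df$filingDate)

# Keep only annual reports
annual_reports <- filings_df[filings_df$form == "10-K", ]
annual_reports
```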
#### Company Facts
`company_Facts` is relatively simple and is a list of 3 elements.
```{r view_company_Facts, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of the company_Facts
str(company_Facts, max.level = 1)
```
For our purpose, the last element `company_Facts$facts` is the most relevant one.
```{r view_company_Facts2, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of company_Facts$facts
str(company_Facts$facts, max.level = 1)
```
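Besides `str()`, base R functions such as `length()` and `names()` can be used to count and list the elements of a section. The following sketch runs on a made-up list (`toy_us_gaap`, with hypothetical labels) rather than on the real data.

```r
# Hypothetical section with three concepts
toy_us_gaap <- list(
  AccountsPayable = list(label = "Accounts Payable"),
  Assets          = list(label = "Assets"),
  Liabilities     = list(label = "Liabilities")
)

# Number of concepts in the section
length(toy_us_gaap)

# Names of the first two concepts
head(names(toy_us_gaap), 2)

# Collect the label of every concept into a named character vector
labels <- vapply(toy_us_gaap, function(x) x$label, character(1))
labels

# Find concepts whose name starts with "A"
grep("^A", names(toy_us_gaap), value = TRUE)
```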
The **`us-gaap`** list includes relevant Facts, containing 487 nested elements. Let's examine the first five.
```{r str_company_Facts_us_gaap_head, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of the first five us-gaap elements
# (str() prints its output and returns NULL, so the result is not assigned)
str(company_Facts$facts$`us-gaap`[1:5], max.level = 1)
```
These elements, to which we will refer as `us_gaap_reference`, include essential fundamentals of the company and are themselves nested lists. Let's explore the structure of the first one.
```{r str_company_Facts_us_gaap, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of the first us-gaap element
str(company_Facts$facts$`us-gaap`[1], max.level = 4)
```
The output presents the structure of the information related to the first element, `AccountsPayable`:
- **AccountsPayable**: Represents a financial concept within the **`us-gaap`** section, specifically referring to "Accounts Payable (Deprecated 2009-01-31)."
    - **`label`**: The human-readable name of the financial concept or measure, here "Accounts Payable (Deprecated 2009-01-31)."
    - **`description`**: A detailed explanation of the concept, providing insight into the carrying value as of the balance sheet date.
    - **`units`**: A data frame with 2 observations and 8 variables. The variables include information such as:
        - `end`: Indicates the end date of the reporting period associated with the financial data.
        - `val`: Represents the numerical value associated with the financial concept. It includes the unit of measure (e.g., million - "M").
        - `accn`: Stands for accession number, a unique identifier assigned by the SEC to each submission.
        - `fy`: Represents the fiscal year associated with the financial data.
        - `fp`: Represents the fiscal period (e.g., Q3 for the third quarter, FY for the full fiscal year).
        - `form`: Indicates the type of form filed with the SEC (e.g., 10-Q for quarterly reports, 10-K for annual reports).
        - `filed`: Represents the date on which the form was filed with the SEC.
        - `frame`: Identifies the reporting frame, a code that combines the calendar year, the calendar quarter, and an optional suffix. For instance, "CY2008Q3I" combines the calendar-year prefix (CY), the year (2008), the quarter (Q3), and the suffix "I", which the SEC uses to mark instantaneous (point-in-time) values rather than values covering a period.
This hierarchical structure provides a detailed view of the financial concept "Accounts Payable (Deprecated 2009-01-31)" within the **`us-gaap`** section, including its label, description, and historical data with unit details.
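To make the variables above more concrete, the sketch below builds a made-up data frame with the same eight columns and decodes the `frame` code with a regular expression. The object `toy_units` and all its values are hypothetical, and the pattern assumes the `CY<year>`, `CY<year>Q<quarter>`, and `CY<year>Q<quarter>I` formats, where the trailing "I" marks an instantaneous value.

```r
# Hypothetical data frame with the same columns as a `units` entry
toy_units <- data.frame(
  end   = c("2008-09-30", "2008-12-31"),
  val   = c(1500000, 1750000),
  accn  = c("0000000000-08-000001", "0000000000-09-000001"),
  fy    = c(2008L, 2008L),
  fp    = c("Q3", "FY"),
  form  = c("10-Q", "10-K"),
  filed = c("2008-11-07", "2009-03-10"),
  frame = c("CY2008Q3I", "CY2008Q4I")
)

# Convert the date columns from character to Date
toy_units$end   <- as.Date(toy_units$end)
toy_units$filed <- as.Date(toy_units$filed)

# Split each frame code into year, optional quarter, and optional "I" suffix
pattern <- "^CY([0-9]{4})(Q([1-4]))?(I)?$"
parts <- regmatches(toy_units$frame, regexec(pattern, toy_units$frame))

frame_info <- data.frame(
  frame         = toy_units$frame,
  year          = as.integer(vapply(parts, function(p) p[2], character(1))),
  quarter       = as.integer(vapply(parts, function(p) p[4], character(1))),
  instantaneous = vapply(parts, function(p) p[5] == "I", logical(1))
)
frame_info
```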
::: {.Note_List2 .custom-note}
------------------------------------------------------------------------
*Most of the SEC data required for fundamental analysis is stored in a structure of nested lists. We will see in the next section how to properly retrieve this data.*
------------------------------------------------------------------------
:::
#### Company Concept
Finally, let's examine the structure associated with the company's Concept.
```{r str_company_Concept, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of company_Concept
str(company_Concept, max.level = 3)
```
The output shows the structure associated with the company's `Assets` concept under the `us-gaap` taxonomy.
For the purpose of our analysis, we will use `company_Facts`, which also includes the `us_gaap_reference` elements.