12 changes: 12 additions & 0 deletions .editorconfig
@@ -7,3 +7,15 @@ charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

[*.yml]
indent_style = space
indent_size = 2

[*.js]
indent_style = space
indent_size = 2

[*.json]
indent_style = space
indent_size = 2
78 changes: 78 additions & 0 deletions .github/workflows/on-pullrequest.yml
@@ -0,0 +1,78 @@
# SPDX-License-Identifier: Apache-2.0
# Licensed to the Ed-Fi Alliance under one or more agreements.
# The Ed-Fi Alliance licenses this file to you under the Apache License, Version 2.0.
# See the LICENSE and NOTICES files in the project root for more information.

name: On Pull Request and Push to Main
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

  workflow_dispatch:

permissions: read-all

jobs:
  edfi-repo-scan:
    name: Scan GitHub Actions and BIDI attacks
    uses: ed-fi-alliance-oss/ed-fi-actions/.github/workflows/repository-scanner.yml@main

  dependency-review:
    name: Scan repo dependencies for security issues
    runs-on: ubuntu-latest
    # Dependency review needs to compare with another branch, so it should only
    # run on PR. Keeping it as a separate workflow because there is no way to
    # specify which path, and thus does not make sense inside of a
    # package-specific workflow.
    if: github.event_name == 'pull_request'
    steps:
      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Dependency Review ("Dependabot on PR")
        uses: actions/dependency-review-action@3c4e3dcb1aa7874d2c16be7d79418e9b7efd6261 # v4.8.2

  test-and-security:
    name: Test and Security Checks
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Install Poetry
        working-directory: ./python
        run: pipx install poetry

      - name: Setup Python
        uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
        with:
          python-version: '3.11'
          cache: 'poetry'

      - name: Install dependencies
        working-directory: ./python
        run: poetry install

      - name: Run Tests
        working-directory: ./python
        run: poetry run pytest

      - name: Run Linters
        working-directory: ./python
        run: poetry run flake8 .

      - name: Initialize CodeQL
        uses: github/codeql-action/init@5fe9434cd24fe243e33e7f3305f8a5b519b70280 # v4.3.11
        with:
          languages: python

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@5fe9434cd24fe243e33e7f3305f8a5b519b70280 # v4.3.11


68 changes: 68 additions & 0 deletions .github/workflows/scorecard.yml
@@ -0,0 +1,68 @@
# Originally sourced from GitHub with implicit lack of license

name: Scorecard supply-chain security
on:
  # To guarantee Maintained check is occasionally updated. See
  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
  schedule:
    - cron: '15 23 * * 0'
  push:
    branches: [ "main" ]
  workflow_dispatch:

# Declare default permissions as read only.
permissions: read-all

jobs:
  analysis:
    name: Scorecard analysis
    runs-on: ubuntu-latest
    permissions:
      # Needed to upload the results to code-scanning dashboard.
      security-events: write
      # Needed to publish results and get a badge (see publish_results below).
      id-token: write
      # Uncomment the permissions below if installing in a private repository.
      # contents: read
      # actions: read

    steps:
      - name: Checkout code
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          persist-credentials: false

      - name: Run analysis
        uses: ossf/scorecard-action@05b42c624433fc40578a4040d5cf5e36ddca8cde # v2.4.2
        with:
          results_file: scorecard.sarif
          results_format: sarif
          # (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
          # - you want to enable the Branch-Protection check on a *public* repository, or
          # - you are installing Scorecard on a *private* repository
          # To create the PAT, follow the steps in https://github.com/ossf/scorecard-action#authentication-with-pat.
          #repo_token: ${{ secrets.SCORECARD_TOKEN }}

          # Public repositories:
          #   - Publish results to OpenSSF REST API for easy access by consumers
          #   - Allows the repository to include the Scorecard badge.
          #   - See https://github.com/ossf/scorecard-action#publishing-results.
          # For private repositories:
          #   - `publish_results` will always be set to `false`, regardless
          #     of the value entered here.
          publish_results: true

      # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
      # format to the repository Actions tab.
      - name: Upload artifact
        uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8 # v4.3.0
        with:
          name: Scorecard SARIF file
          path: scorecard.sarif
          retention-days: 5

      # Upload the results to GitHub's code scanning dashboard.
      - name: Upload to code-scanning
        uses: github/codeql-action/upload-sarif@48ab28a6f5dbc2a99bf1e0131198dd8f1df78169 # v3.28.0
        with:
          sarif_file: scorecard.sarif
7 changes: 6 additions & 1 deletion README.md
@@ -1,6 +1,11 @@
# json-validator

Coming soon!
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/Ed-Fi-Exchange-OSS/json-validator/badge)](https://securityscorecards.dev/viewer/?uri=github.com/Ed-Fi-Exchange-OSS/json-validator)

This repository contains sample code for validating JSON payloads using an OpenAPI specification file as the schema definition. This sample code can help satisfy use cases including:

- Confirming that JSON data are valid for submission to an Ed-Fi Resource API (such as the Ed-Fi ODS/API) before issuing an HTTP `POST` or `PUT` request.
- Validating that JSON files stored in a data lake conform to the Ed-Fi Data Standard, as represented by the Ed-Fi API Specification.
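
To make the first use case concrete, here is a minimal sketch of a pre-submission check. It is not this repository's validator: it implements only a tiny, hypothetical subset of JSON Schema (the `required` list and each property's `type`), and the `precheck` function and `student_schema` fragment are made up for illustration. Real use would call for a complete JSON Schema validator, such as the third-party `jsonschema` package.

```python
import json

# Map JSON Schema type names to the Python types that json.loads produces.
_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def precheck(payload: dict, schema: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means this minimal check passed)."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required property '{name}'")
    for name, prop in schema.get("properties", {}).items():
        if name not in payload or "type" not in prop:
            continue
        expected = prop["type"]
        py_type = _TYPE_MAP.get(expected)
        if py_type is None:
            continue  # Unsupported type keyword (e.g. a list of types): skip rather than guess.
        value = payload[name]
        # bool is a subclass of int in Python, so reject booleans unless "boolean" is expected.
        if (isinstance(value, bool) and expected != "boolean") or not isinstance(value, py_type):
            problems.append(f"property '{name}' should be of type {expected}")
    return problems


# Hypothetical fragment shaped like an Ed-Fi resource schema.
student_schema = {
    "type": "object",
    "required": ["studentUniqueId", "birthDate", "firstName", "lastSurname"],
    "properties": {
        "studentUniqueId": {"type": "string"},
        "birthDate": {"type": "string", "format": "date"},
        "firstName": {"type": "string"},
        "lastSurname": {"type": "string"},
    },
}

payload = json.loads('{"studentUniqueId": 604822, "firstName": "Lisa", "lastSurname": "Woods"}')
for problem in precheck(payload, student_schema):
    print(problem)
```

Collecting every problem instead of stopping at the first one matches the goal of catching everything wrong with a payload before the request is sent.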

## Legal Information

4 changes: 4 additions & 0 deletions python/.flake8
@@ -0,0 +1,4 @@
[flake8]
max-line-length = 110
extend-ignore = E203, W503, E501
exclude = .git, __pycache__, .venv
72 changes: 72 additions & 0 deletions python/.gitignore
@@ -0,0 +1,72 @@
# Ed-Fi JSON Validator

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Testing
.coverage
.pytest_cache/
cover/
htmlcov/
.tox/
.nox/
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/

# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Logs
*.log

# Temporary files
*.tmp
*.temp
84 changes: 84 additions & 0 deletions python/README.md
@@ -0,0 +1,84 @@
# Ed-Fi JSON Validator

A Python script that validates JSON files in a data lake against Ed-Fi OpenAPI specifications.

## Overview

This tool validates JSON files stored in a data lake structure against their corresponding schema definitions in an Ed-Fi OpenAPI specification. It maps directory structures to schema names and performs JSON Schema validation. The OpenAPI specification file can be defined as JSON or YAML, and it can be provided as a local file or as an HTTP URL.
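
As a rough illustration of the spec-loading step, the sketch below reads a specification from either a local path or an HTTP(S) URL and looks up a named schema. It is an assumption, not this tool's code: `load_openapi_spec`, `get_schema`, and `resolve_ref` are hypothetical helpers; the lookup assumes the OpenAPI 3 `components/schemas` layout with a fallback to Swagger 2 `definitions`; and YAML input relies on the third-party PyYAML package (`yaml.safe_load`), which is only imported when the text is not valid JSON.

```python
import json
import urllib.request
from pathlib import Path


def load_openapi_spec(location: str) -> dict:
    """Load an OpenAPI document from a local path or an http(s) URL (hypothetical helper)."""
    if location.startswith(("http://", "https://")):
        with urllib.request.urlopen(location) as response:
            text = response.read().decode("utf-8")
    else:
        text = Path(location).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not JSON, so assume YAML. PyYAML is an assumed third-party dependency.
        import yaml

        return yaml.safe_load(text)


def get_schema(spec: dict, name: str) -> dict:
    """Return a named schema from OpenAPI 3 `components.schemas` or Swagger 2 `definitions`."""
    schemas = spec.get("components", {}).get("schemas") or spec.get("definitions", {})
    if name not in schemas:
        raise KeyError(f"schema '{name}' not found in the OpenAPI specification")
    return schemas[name]


def resolve_ref(spec: dict, ref: str) -> dict:
    """Follow a local JSON Pointer reference such as '#/components/schemas/edFi_student'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = spec
    for part in ref[2:].split("/"):
        # JSON Pointer unescaping: '~1' becomes '/', then '~0' becomes '~' (in that order).
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node
```

Resolving `$ref` pointers matters because OpenAPI schemas commonly reference one another, so a schema pulled out of the spec is usually validated with the full document available for reference lookups.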

## Installation

```bash
poetry install
```

## Usage

### Command Line

```bash
poetry run python json_validator --data-lake-root /path/to/data/lake --openapi-spec /path/to/openapi.json
```

### Python API

```python
from json_validator.validator import DataLakeValidator

validator = DataLakeValidator(
    data_lake_root="/path/to/data/lake",
    openapi_spec_path="/path/to/openapi.json"
)

results = validator.validate_all()
```

See [USAGE.md](./USAGE.md) for more information.

## Data Lake Structure

The validator expects a data lake file system like the following:

```text
root/
├── ed-fi/
│   ├── academicWeeks/
│   │   ├── academicWeek-1.json
│   │   └── academicWeek-2.json
│   └── students/
│       ├── student-1.json
│       └── student-2.json
└── extension/
    └── extension-entity/
        └── extension-entity-1.json
```

* Files in `ed-fi/academicWeeks/` → `edFi_academicWeek` schema
* Files in `ed-fi/students/` → `edFi_student` schema
* Files in an extension namespace such as `tpdm/candidates/` → `tpdm_candidate` schema
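
These examples suggest a naming rule: the namespace directory is camelCased (`ed-fi` → `edFi`) and the plural resource directory is singularized (`academicWeeks` → `academicWeek`). The sketch below encodes that rule as inferred from these three examples only; `schema_name_for` is a hypothetical helper, and its drop-a-trailing-`s` singularization is naive (a directory named `people`, for instance, would map to `edFi_people` rather than a singular form).

```python
def schema_name_for(namespace_dir: str, resource_dir: str) -> str:
    """Hypothetical helper: derive e.g. 'edFi_academicWeek' from 'ed-fi' and 'academicWeeks'."""
    # camelCase the namespace: drop hyphens and upper-case the letter that follows each one.
    head, *rest = namespace_dir.split("-")
    prefix = head + "".join(part[:1].upper() + part[1:] for part in rest)
    # Naive singularization: strip one trailing 's' if present; irregular plurals are not handled.
    resource = resource_dir[:-1] if resource_dir.endswith("s") else resource_dir
    return f"{prefix}_{resource}"


for namespace, resource in [("ed-fi", "academicWeeks"), ("ed-fi", "students"), ("tpdm", "candidates")]:
    print(f"{namespace}/{resource}/ -> {schema_name_for(namespace, resource)}")
```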

The file names do not need to match any specific format other than ending in `.json`.
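
Because only the `.json` extension matters, finding the files to validate can be a recursive glob. The following sketch is an assumption about how such a walk could be done, not this tool's actual code: `group_json_files` is a hypothetical helper that groups files by their `(namespace, resource)` directory pair relative to the data lake root and skips anything that is not exactly two directories deep.

```python
from collections import defaultdict
from pathlib import Path


def group_json_files(data_lake_root: str) -> dict[tuple[str, str], list[Path]]:
    """Hypothetical helper: group *.json files by (namespace, resource), e.g. ('ed-fi', 'students')."""
    root = Path(data_lake_root)
    groups: dict[tuple[str, str], list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*.json")):
        if not path.is_file():
            continue  # rglob also matches directories whose names end in .json
        parts = path.relative_to(root).parts
        # Expect exactly <namespace>/<resource>/<file>.json; skip anything else.
        if len(parts) != 3:
            continue
        namespace, resource, _ = parts
        groups[(namespace, resource)].append(path)
    return dict(groups)
```

Each `(namespace, resource)` key can then be mapped to a schema name as described above, and every file in the group validated against that schema.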

## Features

* Validates all JSON files in the data lake against OpenAPI schemas
* Supports OpenAPI specifications in JSON or YAML, from either a local file or a remote HTTP URL
* Detailed validation error reporting
* Configurable logging levels
* Performance metrics and summary reporting

## Testing

This repository contains a [simulated data lake file system](../simulated-lake) with sample JSON files for demonstration purposes.

Example 1: running the validation using a `swagger.json` file from a running instance of the Ed-Fi ODS/API, using Data Standard 5.2 with the TPDM extension.

```shell
poetry run python json_validator --data-lake-root ../simulated-lake --openapi-spec https://api.ed-fi.org/v7.2/api/metadata/data/v3/resources/swagger.json
```

Example 2: running with a `.yaml` file in this repository, using Data Standard 6. Note that the TPDM extension no longer exists in Data Standard 6 and that the Assessment data model has a breaking change compared to Data Standard 5.2. Therefore, every sample file fails validation against Data Standard 6.

```shell
poetry run python json_validator --data-lake-root ../simulated-lake --openapi-spec ../../Schemas/OpenAPI/Ed-Fi-Resource-API-Specification.yaml
```