[DOC-151] Import documentation from Confluence (#361)
stephenfuqua authored Dec 12, 2024
1 parent f9b7a5a commit 1bf503c
Showing 68 changed files with 3,097 additions and 6 deletions.
8 changes: 3 additions & 5 deletions README.md
@@ -14,14 +14,12 @@
Project Meadowlark is a research and development effort to explore potential for use of new technologies, including managed
cloud services, for starting up a "cloud native" Ed-Fi compatible API.


- [Milestone 0.3.0](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/releases/tag/v0.3.0) has been released with Docker and
* [Milestone 0.3.0](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/releases/tag/v0.3.0) has been released with Docker and
real OAuth2 support.

- [Milestone 0.4.0](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/releases/tag/v0.4.0) includes full PostgreSQL support,
* [Milestone 0.4.0](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/releases/tag/v0.4.0) includes full PostgreSQL support,
load balancer support with NGINX, instructions to use Kafka and performance evaluation.

See [Project Meadowlark - Exploring Next Generation Technologies](https://techdocs.ed-fi.org/x/RwJqBw) in Tech Docs for more
👀 See [Vision](./docs/VISION.md) in Tech Docs for more
information on the background and design decisions for this project.

## Getting Started
2 changes: 1 addition & 1 deletion docker/kafka/Dockerfile
@@ -8,7 +8,7 @@ COPY --chown=gradle:gradle /ed-fi-kafka-connect-transforms /home/gradle/src
WORKDIR /home/gradle/src
RUN gradle installDist --no-daemon

FROM debezium/connect:2.3@sha256:dfa59c008a03f45c7b286d2874f2e6dbe04f3db6f26b6f01806c136abb07381a
FROM debezium/connect:2.7.0-Final@sha256:a69c0bf30a269a0c53a98d9caf61a45f74a7bab18ebac6081a53af64ceba78b4
LABEL maintainer="Ed-Fi Alliance, LLC and Contributors <[email protected]>"

ARG package=opensearch-connector-for-apache-kafka-3.1.0.tar
282 changes: 282 additions & 0 deletions docs/ARCHITECTURE.md


48 changes: 48 additions & 0 deletions docs/FINDINGS-AND-QUESTIONS.md
@@ -0,0 +1,48 @@
# Other Findings and Questions

The development of the Meadowlark proof-of-concept organically raised questions about alternative ODS API features or patterns that might support the Ed-Fi ecosystem equally well. This document discusses a few of these.

## Authorization

The ODS API's main authorization pattern is based on establishing relationships from resources to education organizations – subclasses of EducationOrganization, or EdOrg for short. API clients are assigned one or more EdOrgs and a strategy that specifies CRUD permissions over API classes for which specific resources can be traced to one of these EdOrgs.

This strategy is powerful and logical but also complex to implement. On the implementation side, each new authorization scheme needs to be driven by relational database views that materialize how each API resource can be traced to an EdOrg. Such views are custom code.

This strategy has also created complexity for API clients. The relationships that drive authorization are opaque and not easily presented to an API client. This strategy also results in strange interaction scenarios, such as a client being unable to read a Student or Parent resource that it just wrote (because that resource has no relationship to an EdOrg yet).

This is not to say that the ODS API approach is wrong, only that in some cases the complexity may not be justified. For example, for a SIS client providing data to an API whose scope is a single LEA, these permissions probably suffice:

* *For this particular API instance, your client has the ability to Create API resources for any of the following API classes:* *(list classes here)*
* *For any resource you write, your client can also Read, Update or Delete that same resource.*

Implementing these rules is considerably simpler and demands no customized SQL or other materialized means to connect each resource to an EdOrg.
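
The two rules above could be enforced with very little code. The following TypeScript sketch is illustrative only: the `ClientAuthorization` and `StoredDocument` shapes, the `createdBy` ownership field, and the `isAuthorized` function are hypothetical, not Meadowlark's actual data model. Unlike the EdOrg pattern, it lets a client read a Student it has just written.

```typescript
// Hypothetical shapes for illustration; not Meadowlark's actual data model.
type Action = "Create" | "Read" | "Update" | "Delete";

interface ClientAuthorization {
  clientId: string;
  // API resource classes this client may create, e.g. "students".
  creatableResources: ReadonlySet<string>;
}

interface StoredDocument {
  id: string;
  resourceName: string;
  // clientId of the client that originally created the document (assumed field).
  createdBy: string;
}

function isAuthorized(
  client: ClientAuthorization,
  action: Action,
  resourceName: string,
  existing?: StoredDocument,
): boolean {
  if (action === "Create") {
    // Rule 1: Create is allowed for any resource class on the client's list.
    return client.creatableResources.has(resourceName);
  }
  // Rule 2: Read/Update/Delete apply only to a document this same client created.
  return (
    existing !== undefined &&
    existing.resourceName === resourceName &&
    existing.createdBy === client.clientId
  );
}

// Usage: a SIS client that writes students, and an unrelated assessment client.
const sis: ClientAuthorization = {
  clientId: "sis-vendor",
  creatableResources: new Set(["students", "studentSchoolAssociations"]),
};
const assessment: ClientAuthorization = {
  clientId: "assessment-vendor",
  creatableResources: new Set(["studentAssessments"]),
};
const student: StoredDocument = { id: "a1", resourceName: "students", createdBy: "sis-vendor" };
```

No SQL views or EdOrg traversal are involved; the only state needed is which client created each document.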

Clearly, when data is being read out of the API, the ODS EdOrg authorization pattern becomes potentially much more useful. But in many data-out cases, particularly early ones, the scope of that authorization in field work still tends to be "all district data across these API resources for school year X".

In summary, the ODS API pattern of using EdOrg relationships to drive authorization is powerful and worth preserving, but the Meadowlark project suggests that a set of simpler patterns might eliminate complexity from many early field projects. As an implementation advances in complexity, an API host may choose to enable more powerful and complex designs.

## Validation Flexibility

The ODS API's use of a relational database system for storage reduces the ability of the API to adapt to disparate validation needs. This can also be seen as a strength: the ODS API generally won't accept data that has not met a fairly high benchmark for quality, and this has pushed data quality back to the source systems and responsibility for data quality back to vendors.

Meadowlark's architecture opens up new possibilities, simple to implement, for more tunable validation. Using a document store means the product can annotate unvalidated documents for deferred validation, or annotate each document with how thoroughly it has been validated (e.g., to support Level 2-style validation as an add-on).
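
As a rough illustration of deferred validation, the TypeScript sketch below accepts a document immediately, records how far validation has progressed, and lets consumers decide how much validation they require. The level names, the `validationLevel` field, and the functions are all hypothetical; the actual schema and reference checks are passed in as callbacks rather than implemented here.

```typescript
// Hypothetical validation levels, ordered from least to most thorough.
type ValidationLevel = "unvalidated" | "schemaValid" | "referencesValid";

const LEVEL_RANK: Record<ValidationLevel, number> = {
  unvalidated: 0,
  schemaValid: 1,
  referencesValid: 2,
};

interface AnnotatedDocument {
  id: string;
  body: Record<string, unknown>;
  validationLevel: ValidationLevel;
}

type Check = (body: Record<string, unknown>) => boolean;

// Accept the document right away, deferring all checks.
function acceptDeferred(id: string, body: Record<string, unknown>): AnnotatedDocument {
  return { id, body, validationLevel: "unvalidated" };
}

// A later pass records how far validation got; reference checks only run after the schema passes.
function revalidate(doc: AnnotatedDocument, schemaCheck: Check, referenceCheck: Check): AnnotatedDocument {
  let level: ValidationLevel = "unvalidated";
  if (schemaCheck(doc.body)) {
    level = referenceCheck(doc.body) ? "referencesValid" : "schemaValid";
  }
  return { ...doc, validationLevel: level };
}

// Each consumer chooses the minimum level it is willing to use.
function meetsLevel(doc: AnnotatedDocument, required: ValidationLevel): boolean {
  return LEVEL_RANK[doc.validationLevel] >= LEVEL_RANK[required];
}
```

A strict consumer would call `meetsLevel(doc, "referencesValid")`, while an exploratory analytics job might accept `"schemaValid"` documents.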

Of course, the open question here is when (if ever) it is appropriate to lower validation requirements for data coming in via the API.

## Native Storage to Support Eventing

The ability to retain a JSON document opens many possibilities for downstream processing and eventing. Because a document posted to the API represents "one logical event" in the operations of a school district (e.g., "student X was marked absent on day Y"), other data consumers can consume that pre-packaged data as a document (e.g., the document could be posted to a log of attendance events to which other systems subscribe). Meadowlark itself uses this mechanism to index the documents in a search engine for query support.
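
The fan-out pattern can be sketched in a few lines of TypeScript. This is a toy, in-memory stand-in for a real log such as a Kafka topic; `EventLog` and the simplified `AttendanceEvent` shape are hypothetical and deliberately flatter than the actual Ed-Fi attendance resources, and the descriptor value is illustrative. Each subscriber receives the posted document unchanged, which is what makes the document-as-event approach attractive.

```typescript
// Toy in-memory event log; a production system would use something like a Kafka topic.
type Subscriber<T> = (event: T) => void;

class EventLog<T> {
  private readonly events: T[] = [];
  private readonly subscribers: Subscriber<T>[] = [];

  subscribe(subscriber: Subscriber<T>): void {
    this.subscribers.push(subscriber);
  }

  // Append the document to the log, then hand it unchanged to every subscriber.
  publish(event: T): void {
    this.events.push(event);
    for (const subscriber of this.subscribers) {
      subscriber(event);
    }
  }

  get size(): number {
    return this.events.length;
  }
}

// Hypothetical, simplified attendance document (not the Ed-Fi resource shape).
interface AttendanceEvent {
  studentUniqueId: string;
  eventDate: string;
  attendanceEventCategoryDescriptor: string;
}

const attendanceLog = new EventLog<AttendanceEvent>();

// Subscriber 1: a search index keyed by student.
const searchIndex = new Map<string, AttendanceEvent[]>();
attendanceLog.subscribe((event) => {
  const existing = searchIndex.get(event.studentUniqueId) ?? [];
  existing.push(event);
  searchIndex.set(event.studentUniqueId, existing);
});

// Subscriber 2: a downstream system that only counts absences.
let absenceCount = 0;
attendanceLog.subscribe((event) => {
  if (event.attendanceEventCategoryDescriptor.endsWith("Absence")) {
    absenceCount += 1;
  }
});

attendanceLog.publish({
  studentUniqueId: "604822",
  eventDate: "2022-09-12",
  attendanceEventCategoryDescriptor: "uri://ed-fi.org/AttendanceEventCategoryDescriptor#Unexcused Absence",
});
```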

The relational format of the ODS data storage delivers other benefits, such as the ability to perform complex validations based on SQL, so it is not a case of one storage format being better than the other; rather, each has use cases where it brings benefits. Indeed, there are certainly ways in which both technologies could be mixed.

## Analytics Modules

The Meadowlark team experimented with downstream analytics processing using the above eventing mechanism. API documents were made accessible to AWS Athena, which allows for interactive queries with large-scale data sets. The team made simple visualizations from the API data in Athena with AWS QuickSight, the cloud-native BI tool.

In addition to QuickSight, tools like Power BI Desktop also include support for creating reports and dashboards driven by Athena. It would be interesting to create real use-case driven analytics modules that work with a Meadowlark framework designed for community extensibility.

## Reuse of Meadowlark Technology

Meadowlark makes use of MetaEd to generate API document schema validations and to locate natural key and foreign key references in API documents. Some of this is done in a "pre-processing" step that mirrors the behavior of a MetaEd plugin, while the rest is done at API invocation time. This could be moved entirely into MetaEd plugins that generate standard JSON Schema and JSONPath API data from a MetaEd model. This information could be used by the ODS/API platform, for example, to support its own schema validation.
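
To make the idea concrete, here is a TypeScript sketch of the kind of metadata such a plugin might emit and how an API could use it to extract a reference's natural key. The `ReferenceSpec` format, the `extractReference` function, and the path syntax are all hypothetical: `resolvePath` supports only dotted keys plus `*` for arrays, a deliberately tiny subset standing in for real JSONPath. The key values in the usage example are illustrative.

```typescript
// Tiny stand-in for JSONPath: dotted keys, with "*" fanning out over array elements.
function resolvePath(document: unknown, path: string): unknown[] {
  let current: unknown[] = [document];
  for (const segment of path.split(".")) {
    const next: unknown[] = [];
    for (const value of current) {
      if (segment === "*" && Array.isArray(value)) {
        next.push(...value);
      } else if (typeof value === "object" && value !== null && segment in value) {
        next.push((value as Record<string, unknown>)[segment]);
      }
    }
    current = next;
  }
  return current;
}

// Hypothetical generated metadata: where each natural-key field of the
// referenced resource lives inside the referring document.
interface ReferenceSpec {
  targetResource: string;
  keyPaths: Record<string, string>;
}

// Returns the natural key, or undefined if any path is missing or ambiguous.
function extractReference(document: unknown, spec: ReferenceSpec): Record<string, unknown> | undefined {
  const naturalKey: Record<string, unknown> = {};
  for (const [field, path] of Object.entries(spec.keyPaths)) {
    const matches = resolvePath(document, path);
    if (matches.length !== 1) {
      return undefined;
    }
    naturalKey[field] = matches[0];
  }
  return naturalKey;
}

const gradingPeriodSpec: ReferenceSpec = {
  targetResource: "gradingPeriods",
  keyPaths: {
    gradingPeriodDescriptor: "gradingPeriodReference.gradingPeriodDescriptor",
    periodSequence: "gradingPeriodReference.periodSequence",
    schoolId: "gradingPeriodReference.schoolId",
    schoolYear: "gradingPeriodReference.schoolYear",
  },
};
```

Because the spec is plain data, the same output could in principle be consumed by a different API implementation, which is the reuse argument made above.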

This could also be part of a broader modularization of Meadowlark to enable extensions of Meadowlark created by the Ed-Fi community. With a clean separation of Meadowlark document validation and reference extraction from a web framework, alternatives like Azure Functions or even simple on-premise web application servers become possible. Similarly, separation of Meadowlark's back-end storage, querying and reference validation could allow for community-contributed alternatives like Azure Cosmos DB or local MongoDB instances.
98 changes: 98 additions & 0 deletions docs/PARITY-GAPS.md
@@ -0,0 +1,98 @@
# Meadowlark and API Parity

## What is "API Parity"?

Meadowlark is designed so that a platform host can adopt it without causing breaking changes for API clients: a host should be able to substitute the Ed-Fi API provided by Meadowlark for the API provided by the Ed-Fi ODS/API, and API clients should continue to function without realizing that they are communicating with a different API. We refer to this as "API parity."

API parity for the project is defined in terms of the [Meadowlark Use Cases](./use-cases.md); that is, if a feature was not critical to satisfying one of these use cases, it was generally left out. For example, extensibility, eTags, and change queries are unquestionably useful for some API clients, but the belief is that the core Meadowlark use cases do not generally depend on these features, or that those features, if used, are nice-to-haves.

Such a calculus is imperfect: there is always the possibility that some API client relies on a particular feature.

Broadly speaking, the proof-of-concept achieves API parity according to the definition above, but with some gaps. This document provides a list of the known gaps to API parity.

## Will these gaps be closed?

Some may, but it is unlikely that all such gaps will be closed. Ed-Fi is both an effort to build open-source data infrastructure AND an effort to provide blueprints for standardized data flows. In service of the latter goal of standardization, it is highly useful to compare API differences across API implementations: these are opportunities to better understand what needs to be standard and what does not.

Rather than try to close all these gaps, the goal should be to clearly define which API features are required and which should be allowed to vary. Doing so will allow for the development of alternative API implementations, whether through the open-source effort of the Ed-Fi community or through efforts independent of that community work.

## List of API Parity Gaps

### No extension support

Meadowlark does not support API extensibility.

Given that the Meadowlark use cases focus on LEA data sourcing, where extensibility should not be needed, this feature is unlikely to be prioritized.

Note however that the Alliance has looked to extensibility as a means to evolve the API interface, as in the case of the release of an early access, revised Finance API (see [ED-FI RFC 18 - FINANCE API](https://edfi.atlassian.net/wiki/spaces/EFDSRFC/pages/25363138/ED-FI+RFC+18+-+FINANCE+API)). If this pattern becomes standard practice, there will be more of an argument for the utility of such support.

### Support for "link" objects in JSON

In the ODS/API, the JSON is annotated by "link" elements that show the path to the element using a GET by the resource ID. These elements appear like this:

```json
"gradingPeriodReference": {
  "gradingPeriodDescriptor": "uri://ed-fi.org/GradingPeriodDescriptor#First Six Weeks",
  "periodSequence": 1,
  "schoolId": 255901001,
  "schoolYear": 2022,
  "link": {
    "rel": "GradingPeriod",
    "href": "/ed-fi/gradingPeriods/0d4a8d72801240fd805ee118b2641b0f"
  }
},
```

These elements do not appear in the GET responses provided by Meadowlark.

It is unlikely that these will be supported, and in general the direction is to continue to omit these from Ed-Fi API specifications.

* The utility of these elements is doubtful: they appear to be an implementation decision made by the ODS/API project and do not seem to be in wide use. The intention seems to be to deliver HATEOAS-style information to clients, but that model of interaction has generally not emerged as a best practice in REST APIs.
* Since Meadowlark takes a document-centric approach to collection and data management, annotating the documents would create additional complexity for any APIs of this kind; without compelling value for this feature, it was judged to be better to simply omit the feature.
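
For anyone comparing ODS/API and Meadowlark GET responses side by side (for example, in parity testing), the link annotations are noise. Below is a minimal TypeScript sketch that removes them, assuming a link annotation is any property named `link` whose value has both `rel` and `href`; the `stripLinks` helper is hypothetical and not part of either codebase.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// A "link" annotation is assumed to be an object carrying both "rel" and "href".
function isLinkObject(value: Json): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    "rel" in value &&
    "href" in value
  );
}

// Returns a deep copy of the value with every "link" annotation removed.
function stripLinks(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(stripLinks);
  }
  if (typeof value === "object" && value !== null) {
    const result: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (key === "link" && isLinkObject(child)) {
        continue; // drop the annotation, keep everything else
      }
      result[key] = stripLinks(child);
    }
    return result;
  }
  return value;
}

// Usage: an ODS/API-style response body becomes comparable to a Meadowlark one.
const odsResponse: Json = {
  gradingPeriodReference: {
    gradingPeriodDescriptor: "uri://ed-fi.org/GradingPeriodDescriptor#First Six Weeks",
    periodSequence: 1,
    schoolId: 255901001,
    schoolYear: 2022,
    link: {
      rel: "GradingPeriod",
      href: "/ed-fi/gradingPeriods/0d4a8d72801240fd805ee118b2641b0f",
    },
  },
};
const comparable = stripLinks(odsResponse);
```

Checking for both `rel` and `href` keeps an ordinary data field that happens to be named `link` from being discarded.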

### Support for "discriminator" fields on abstract class EducationOrganization

The ODS API provides discriminators that inform the API client which specific subclass of an abstract class is being referenced. This is done via a "link" object that includes a "rel" field indicating the class of the referenced object. See below for an example of this on the /course API resource.

```json
{
  "id": "16904b88d3c144b4a43af2924f4c4590",
  "educationOrganizationReference": {
    "educationOrganizationId": 255901001,
    "link": {
      "rel": "School",
      "href": "/ed-fi/schools/c81a158d7caf49f299ff3c22b503b334"
    }
  },
  "courseCode": "03100500",
  "courseDefinedByDescriptor": "uri://ed-fi.org/CourseDefinedByDescriptor#SEA",
  "courseDescription": "Algebra I",
  ...
}
```

This feature was added to the ODS API in the interest of simplifying data usage for outbound/pulling API clients, especially for cases in which there is a high priority on API simplicity, as for the roster/enrollment API.

However, those use cases are not the focus of the initial Meadowlark scope, so it is unclear if this should be addressed. We will likely await further feedback, and if this emerges as a need, possibly look at other implementation options for solving the same problem (e.g., might it be better to ask a client to maintain a cache of EdOrgs, and possibly add support that allows them to do that more easily?). To insert the capability to annotate JSON documents would add complexity that is not clearly justified.
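
The client-side cache alternative mentioned above might look something like the following TypeScript sketch. It assumes the client pages through each EdOrg subclass resource (for example `/ed-fi/schools`) once and records which subclass owns each `educationOrganizationId`; the `EdOrgCache` class is hypothetical, the ids are illustrative, and only a few of the EducationOrganization subclasses are listed.

```typescript
// Only a few EducationOrganization subclasses are listed here, for brevity.
type EdOrgSubclass = "School" | "LocalEducationAgency" | "StateEducationAgency";

class EdOrgCache {
  private readonly subclassById = new Map<number, EdOrgSubclass>();

  // Record the ids returned by one subclass collection, e.g. GET /ed-fi/schools.
  load(subclass: EdOrgSubclass, educationOrganizationIds: readonly number[]): void {
    for (const id of educationOrganizationIds) {
      this.subclassById.set(id, subclass);
    }
  }

  // Resolve the concrete subclass behind an educationOrganizationReference.
  subclassOf(educationOrganizationId: number): EdOrgSubclass | undefined {
    return this.subclassById.get(educationOrganizationId);
  }
}

// Usage: resolve the reference on a course document that carries no "link".
const edOrgs = new EdOrgCache();
edOrgs.load("School", [255901001]);
edOrgs.load("LocalEducationAgency", [255901]);

const course = {
  educationOrganizationReference: { educationOrganizationId: 255901001 },
  courseCode: "03100500",
};
const owner = edOrgs.subclassOf(course.educationOrganizationReference.educationOrganizationId);
```

This keeps the stored documents unannotated, at the cost of the client refreshing its cache when EdOrgs change.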

### Full authentication support

Meadowlark's current authentication is hard-coded to two key/secret pairs with hard-coded claims.

If project development continues, this would be a candidate for further work. However, because this authentication pattern is well known, exploring it is not seen as a high-value part of the proof-of-concept, so it is likely to be a lower priority.

### Over-posting: posting fields not part of the JSON schema

The Ed-Fi ODS API allows for extraneous fields to be posted without error; such fields are simply ignored. In Meadowlark, these are schema violations and a 4xx error is returned.

Allowing over-posting is generally a bad practice, as it often indicates that the API client is not following the schema and can lead to hard-to-detect errors. However, over-posting can be employed as a simple API client strategy to support multiple versions of an API with less complexity.

This is unlikely to be prioritized: this permissiveness has both pros and cons, and it is unclear which matter more.
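
The two behaviors are easy to contrast in code. The TypeScript sketch below is illustrative: `checkOverPosting` and its `"reject"`/`"ignore"` modes are hypothetical names, the allowed-field set stands in for a real JSON Schema, only top-level fields are checked, and the student values are made up. `"reject"` mirrors the Meadowlark behavior (report a schema violation, which an API would turn into a 4xx), while `"ignore"` mirrors the ODS API (silently drop the extra fields).

```typescript
type OverPostingMode = "reject" | "ignore";

interface OverPostingResult {
  accepted: boolean;
  body: Record<string, unknown>;
  extraneousFields: string[];
}

// Checks top-level field names only; a real implementation would rely on JSON Schema.
function checkOverPosting(
  body: Record<string, unknown>,
  allowedFields: ReadonlySet<string>,
  mode: OverPostingMode,
): OverPostingResult {
  const extraneousFields = Object.keys(body).filter((field) => !allowedFields.has(field));
  if (extraneousFields.length === 0) {
    return { accepted: true, body, extraneousFields };
  }
  if (mode === "reject") {
    // The caller would map this to a 4xx response listing the offending fields.
    return { accepted: false, body, extraneousFields };
  }
  // "ignore": keep only the known fields and accept the request.
  const cleaned: Record<string, unknown> = {};
  for (const field of Object.keys(body)) {
    if (allowedFields.has(field)) {
      cleaned[field] = body[field];
    }
  }
  return { accepted: true, body: cleaned, extraneousFields };
}

const studentFields = new Set(["studentUniqueId", "firstName", "lastSurname", "birthDate"]);
const overPosted = {
  studentUniqueId: "604822",
  firstName: "Ada",
  lastSurname: "Example",
  birthDate: "2010-01-13",
  favoriteColor: "green", // not part of the schema
};
```

Returning `extraneousFields` in both modes means even the permissive path can log what was dropped, which softens the "hard-to-detect errors" concern.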

> [!WARNING]
>
> To test out Meadowlark on your own:
> 1. Make sure that you have an AWS subscription and a user account with permissions to create resources.
> 2. Install [Node.js 16](https://nodejs.org/) locally to manage the deployment.
> 3. Clone the [source code repository](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/) using Git.
> 4. Follow the [install instructions](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/tree/main/docs).
36 changes: 36 additions & 0 deletions docs/PROVIDER-PARITY.md
@@ -0,0 +1,36 @@
# Meadowlark Provider Parity Analysis

As mentioned in the [Meadowlark Architecture](./ARCHITECTURE.md), Meadowlark was developed on Amazon Web Services (AWS), but the principle was to use only AWS managed services that have an analogous option from the other major providers. This would make it (relatively) easy to migrate from one platform to another. On-premises options were also explored.

The Alliance, following community feedback, has strongly factored open source availability of technology components into its technology roadmap choices (e.g., the work to port the Ed-Fi ODS platform storage to PostgreSQL). This choice has played an important role in expanding availability of the platform and lowering costs. This principle is likely to play an important role if the Meadowlark project is expanded (e.g., move to MongoDB from provider-specific options).

This document reviews the services used and identifies the equivalent tools (or gaps) in Azure, Google Cloud, and on-premise.

| Purpose | AWS Service | Azure | Google | On-Premises | Additional Notes |
| --- | --- | --- | --- | --- | --- |
| Load balancing and reverse proxy | [API Gateway](https://aws.amazon.com/api-gateway/) | [Azure Application Gateway](https://azure.microsoft.com/en-us/services/application-gateway/#overview) | [Cloud Endpoints](https://cloud.google.com/endpoints) | [NGINX](https://www.nginx.com/), among others | |
| Serverless Application | [AWS Lambda](https://aws.amazon.com/lambda/) | [Azure Functions](https://azure.microsoft.com/en-us/services/functions/#overview) | [Google Cloud Functions](https://cloud.google.com/functions) | [OpenFaaS](https://www.openfaas.com/) or [Fn](http://fnproject.io/) | The Meadowlark application is written in TypeScript using the [Serverless package](https://www.npmjs.com/package/serverless), making it theoretically easy to reuse these components with any platform's serverless functions. <br><br>Could consider refactoring to OpenFaaS or Fn for a single, cloud-agnostic system (they run on Kubernetes and Docker, respectively). |
| Key-value data store and Change Data Capture | [DynamoDB](https://aws.amazon.com/dynamodb/) with [DynamoDB Change Data Capture](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/streamsmain.html) | [CosmosDB](https://azure.microsoft.com/en-us/services/cosmos-db/#overview) in Cassandra API mode with [CosmosDB Change Feed with Azure Functions](https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-functions) | [Firestore](https://cloud.google.com/firestore) ❌ see note below about change streams | [Apache Cassandra](https://cassandra.apache.org/_/index.html) with [Cassandra Triggers](https://medium.com/rahasak/publish-events-from-cassandra-to-kafka-via-cassandra-triggers-59818dcf7eed) | See detailed info below |
| Search engine | [Amazon OpenSearch](https://aws.amazon.com/opensearch-service/) | [Elastic on Azure](https://azure.microsoft.com/en-us/overview/linux-on-azure/elastic/) | [Elastic on Google Cloud Platform](https://www.elastic.co/about/partners/google-cloud-platform) | Either [Elasticsearch](https://www.elastic.co/elastic-stack/) or [OpenSearch](https://opensearch.org) can run on-premises | |

## Key-Value Data Detailed Notes

The differences among these key-value stores may be great enough that some adjustment of the storage model would be required.

Switching to MongoDB may be a useful alternative, as it is available on all platforms:

* [Amazon DocumentDB](https://aws.amazon.com/documentdb/) with [Change Streams](https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html)
* Azure CosmosDB has a MongoDB mode
* [MongoDB Atlas](https://www.mongodb.com/atlas/database) running on any of the three
* MongoDB can also run on-premises, with [Change Data Capture Handlers](https://docs.mongodb.com/kafka-connector/current/sink-connector/fundamentals/change-data-capture/).

Another option would be to switch to Cassandra for a single database platform available on all providers:

* [Amazon Keyspaces](https://aws.amazon.com/keyspaces/)
* Azure CosmosDB in Cassandra API mode
* [Astra DB](https://www.datastax.com/products/datastax-astra/) from DataStax, running on any of the three
* Cassandra can also run on-premises

## Google Firestore Warning

Google Firestore might not have a direct equivalent of Change Data Capture; at least, searching for one does not turn up functionality that is clearly the same as in the other products. However, one of these techniques might be capable of writing out to a stream: [Extend... with Cloud Functions](https://firebase.google.com/docs/firestore/extend-with-functions) or [onSnapshot](https://firebase.google.com/docs/firestore/query-data/listen).