Skip to content

Conversation

@jalvord1
Copy link
Contributor

Description & motivation

Link to Jira ticket with longer description

Breaking changes introduced by this PR:

This feature by default will change nothing, so it's not inherently breaking. However, this feature has major impacts when turned on and must be carefully configured to ensure proper data access for historical assessment records.

PR Merge Priority:

  • Low

Changes to existing files:

  • fct_student_assessment : When configured, student assessment records will be 'duplicated' across all tenants where a student (based on student_unique_id) has a current enrollment.
  • dim_assessment : When this feature is configured, assessment records will be 'duplicated' across any tenants where there are student records and the metadata for that assessment/tenant/year combo does not exist.
  • eventually objective assessment models as well

New files created:

  • bld_ef3__student_assessment_cross_tenant : This model is where we determine which tenants students are currently enrolled in, and then associate an original k_student_assessment value to all of those tenants, regardless of the tenant it was originally associated to. This is basically a linking model, and then all downstream models can use this information as-needed.

Tests and QC done:

  • Currently in testing for accuracy, performance, etc

edu_wh PR Review Checklist:

Make sure the following have been completed before approving this PR:

  • Description of changes has been added to Unreleased section of CHANGELOG.md. Add under ## New Features for features, etc.
  • Code has been tested/checked for Databricks and Snowflake compatibility - EA engineers see Databricks checklist here
  • Reviewer confirms the grain of all tables are unchanged, OR any changes are expected, communicated, and this PR is flagged as a breaking change (not for patch release)
  • If a new configuration xwalk was added:
    • The code is written such that the xwalk is optional (preferred), and this behavior was tested, OR
    • The code is written such that the xwalk is required, and the required xwalk is added to edu_project_template, and this PR is flagged as breaking change (not for patch release)
    • A description for the new xwalk has been added to EDU documentation site here
  • If a new configuration variable was added:
    • The code is written such that the variable is optional (preferred), and this behavior was tested, OR
    • The code is written such that the variable is required, and a default value was added to edu_project_template, and this PR is flagged as breaking change (not for patch release)
    • A description for the new variable has been added to EDU documentation site here

Future ToDos & Questions:

  • This feature assumes that student_unique_id is globally unique. That might not be true for some multi-tenant implementations, but not currently so we are not making the ID type configurable at this time.

@jalvord1 jalvord1 requested a review from rlittle08 November 10, 2025 20:05
Copy link
Collaborator

@rlittle08 rlittle08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a very smart design, and easy to read, which is surprising given how confusing of a problem it is. I have a lot of small ideas but wouldn't change the overall approach at all

join dim_student
on fct_student_school.k_student = dim_student.k_student
-- if this source is configured, remove these students instead of using default logic below
-- TODO: should this join on student unique id instead?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the argument for student_unique_id instead of k_student? does it depend how we expect that the removed_students_source will be created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about how it would be created. I was mainly considering whether we want to remove all instances across years of a particular stu unique id, or only a specific year

count(distinct dim_student.birth_date) over (partition by dim_student.student_unique_id)
+ count(distinct lower(dim_student.first_name)) over (partition by dim_student.student_unique_id)
+ count(distinct lower(dim_student.last_name)) over (partition by dim_student.student_unique_id)
) <= 5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this <=5 logic came from me but now realizing.. if someone had 4 different first names but 1 birth date and 1 last name, that would = 6.. in that case what should we do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this scenario, are you thinking this would be 4 different records across tenants?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's hard, because I still think 4 records with different first names gives me pause, but I could see someone plausibly wanting those records to be included. Granted, we give everyone an option to decide on who should and shouldn't be included. Idk, it's hard to make a decision on a default rule, but I think I'd rather be stricter?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this approach is overall super clean, especially given how messy the problem is

where normalized_score_name = 'other'
group by 1
),
access as (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very picky but can we rename this cte to something like 'combined_with_cross_tenant'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes 🙂

coalesce(student_assessment_cross_tenant.k_student, student_assessments.k_student) as k_student,
coalesce(student_assessment_cross_tenant.k_student_xyear, student_assessments.k_student_xyear) as k_student_xyear,
coalesce(student_assessment_cross_tenant.tenant_code, student_assessments.tenant_code) as tenant_code,
coalesce(student_assessment_cross_tenant.school_year, student_assessments.school_year) as school_year,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think school_year might be the only case we want to prefer student_assessments over cross_tenant.. i have this on my whiteboard bc my head hurts hahah

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're 100% right! really good catch!

coalesce(student_assessment_cross_tenant.tenant_code, student_assessments.tenant_code) as tenant_code,
coalesce(student_assessment_cross_tenant.school_year, student_assessments.school_year) as school_year,
coalesce(student_assessment_cross_tenant.k_student_assessment__original, student_assessments.k_student_assessment) as k_student_assessment__original,
is_original_record,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we coalesce these

coalesce(student_assessment_cross_tenant.is_original_record, False),
coalesce(student_assessment_cross_tenant.original_tenant_code, student_assessments.tenant_code)

from student_assessments_wide
left join object_agg_other_results
on student_assessments_wide.k_student_assessment = object_agg_other_results.k_student_assessment
-- TODO: I don't want this column to be part of the output
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this todo is taken care of now, right?

retest_indicator,
when_assessed_grade_level
when_assessed_grade_level,
is_original_record,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i kinda think these should go right after access.tenant_code so they're easy to find

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had actually wanted to put them in the end so that they aren't that easy to find and maybe are less confusing for people who are not using this feature. But I could be convinced otherwise

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah that's fair.. right now they are in front of the score cols and i would be game for putting them at the very very end instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh,yeah. I wanted to put them at the very end but that doesn't work with the group by. I suppose that is good reason to move them to the beginning!

),
-- goal of this model is to determine which tenants have active enrollments
-- this will contain duplicates because the grain is of a school enrollment
active_enrollments as (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anticipating future tickets: what should the behavior be during summer? I believe now they'd disappear if folks actually end-date enrollments, and would stick around until we have next year's data if they leave end dates null.

If we thought these records should still be shown, it's very possible the right place to address that is the definition of is_active_enrollment itself rather than here, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good point, and is exactly what we ran into in Rally this year. That does make me think that we should handle this in the active flag itself so that the definition can be the same for all use cases for a partner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants