Add Status Audit Management Command #667

Open
wants to merge 1 commit into
base: master
98 changes: 98 additions & 0 deletions backend/alert/management/commands/statusaudit.py
@@ -0,0 +1,98 @@
import json
import logging

from django.core.management.base import BaseCommand
from tqdm import tqdm

from alert.models import Course, Section
from courses import registrar
from courses.util import get_course_and_section, get_current_semester, translate_semester_inv


class Command(BaseCommand):
    help = """
    Generate an audit report that demonstrates the differences between our
    database and the course statuses received from using the OpenData
    endpoint directly.
    Note that this script DOES NOT make any changes to the database, just
    generates a textfile report
    """

    def handle(self, *args, **options):
        root_logger = logging.getLogger("")
        root_logger.setLevel(logging.DEBUG)

        semester = get_current_semester()
        statuses = registrar.get_all_course_status(semester)
        stats = {
            "missing_data": 0,
            "section_not_found": 0,
            "duplicate_updates": 0,
            "unsynced_updates": 0,
        }
        unsynced_courses = []
        for status in tqdm(statuses):
            data = status
            section_code = data.get("section_id_normalized")
            if section_code is None:
                stats["missing_data"] += 1
                continue

            course_status = data.get("status")
Contributor: Is this status per section or per course?

Contributor Author: It should be section_status; will change.

            if course_status is None:
                stats["missing_data"] += 1
                continue

            course_term = data.get("term")
            if course_term is None:
                stats["missing_data"] += 1
                continue
            if any(course_term.endswith(s) for s in ["10", "20", "30"]):
                course_term = translate_semester_inv(course_term)

            # Ignore sections not in db
            try:
                _, section = get_course_and_section(section_code, semester)
            except (Section.DoesNotExist, Course.DoesNotExist):
                stats["section_not_found"] += 1
                continue

            # Ignore duplicate updates
            last_status_update = section.last_status_update
            current_status = section.status
            if current_status == course_status:
                stats["duplicate_updates"] += 1
                continue

            stats["unsynced_updates"] += 1
            unsynced_courses.append(
                (section_code, last_status_update.new_status, current_status, course_status)
            )

        # Write out statistics and missing courses to an output file.
        with open("./status_audit.txt", "w") as f:
Contributor: If we want to run this on a cron, we may want to allow the file name to be passed in or to include the UNIX timestamp.

Contributor Author: Yup, sounds good. Do you think we should write it to S3 or something like that? It might be cool to set up a Slack integration and get notifications that way, but that might be a lot of extra work for little reward.
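
A minimal sketch of the file-name suggestion above, not the change that was actually made in this PR. The `--output` flag and the helpers `default_report_name`, `add_output_argument`, and `resolve_output_path` are hypothetical names chosen for illustration. It uses plain `argparse` so it runs without Django; in a management command the same `add_argument` call would presumably live in `add_arguments(self, parser)`.

```python
import argparse
import time


def default_report_name(now=None):
    """Return a timestamped report name, e.g. status_audit_1700000000.txt."""
    # `now` is injectable so the naming scheme can be tested deterministically.
    timestamp = int(time.time() if now is None else now)
    return f"status_audit_{timestamp}.txt"


def add_output_argument(parser):
    """Register a hypothetical --output option on an argparse-style parser."""
    parser.add_argument(
        "--output",
        default=None,
        help="Path of the report file; defaults to a timestamped name.",
    )


def resolve_output_path(options, now=None):
    """Prefer an explicit --output value, else fall back to the timestamped name."""
    return options.get("output") or default_report_name(now)


# Usage: parse an explicit path, as a cron entry might pass it.
parser = argparse.ArgumentParser()
add_output_argument(parser)
options = vars(parser.parse_args(["--output", "audit.txt"]))
report_path = resolve_output_path(options)
```

With this shape, a cron job that passes no `--output` would write a new file per run instead of overwriting `./status_audit.txt`.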

            f.write("Summary Statistics\n")
            f.write(json.dumps(stats) + "\n\n")

            f.write(
                """Courses Out of Sync\nCourse Code / Last Update Status /
Our Stored Status / Actual Status\n"""
Contributor: What does "actual status" mean? It might be good to elaborate, because I'm getting it mixed up in my head.

Contributor Author: Actual status is the Path@Penn (i.e., the accurate) status of a course. I'll rename it a bit and also add an inline comment explaining it.
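
One possible way to make the four report columns self-describing, sketched under assumptions: the class name `UnsyncedSection` and its field names are hypothetical and do not appear in this PR. The idea is to replace the positional tuple `(course[0], ..., course[3])` with named fields so that "actual status" is visibly the value fetched from the registrar.

```python
from typing import NamedTuple


class UnsyncedSection(NamedTuple):
    """One out-of-sync row of the audit report (hypothetical field names)."""

    section_code: str
    last_update_status: str  # new_status of the most recent recorded status update
    stored_status: str       # status currently stored on our Section row
    registrar_status: str    # status just fetched from the registrar (Path@Penn)

    def report_line(self):
        """Format the row in the same slash-separated layout the report uses."""
        return (
            f"{self.section_code} / {self.last_update_status} / "
            f"{self.stored_status} / {self.registrar_status}\n"
        )


# Usage with the first row of the committed report.
row = UnsyncedSection("ANTH-2550-401", "C", "C", "O")
line = row.report_line()
```

Because a `NamedTuple` is still a tuple, existing code that indexes `course[1]` or `course[2]` would keep working while new code can use the names.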

            )
            f.write("Our Status Matches Last Update\n")
            f.writelines(
                [
                    f"{course[0]} / {course[1]} / {course[2]} / {course[3]}\n"
                    for course in unsynced_courses
                    if course[1] == course[2]
                ]
            )

            f.write("\nOur Status Does Not Match Last Update\n")
            f.writelines(
                [
                    f"{course[0]} / {course[1]} / {course[2]} / {course[3]}\n"
                    for course in unsynced_courses
                    if course[1] != course[2]
                ]
            )
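
The classification performed by `handle` can be sketched as a pure function that runs without Django or the registrar API, which may make the logic easier to unit-test. This is an illustration only: `audit_statuses`, `split_by_last_update`, and the `stored_sections` dictionary are hypothetical stand-ins for the database lookup done by `get_course_and_section`, and the term translation step is omitted.

```python
def audit_statuses(registrar_rows, stored_sections):
    """Compare registrar rows against stored statuses.

    registrar_rows: iterable of dicts with "section_id_normalized", "status", "term".
    stored_sections: dict mapping section code -> (last_update_status, stored_status);
        a hypothetical stand-in for the Section lookup.
    """
    stats = {
        "missing_data": 0,
        "section_not_found": 0,
        "duplicate_updates": 0,
        "unsynced_updates": 0,
    }
    unsynced = []
    for row in registrar_rows:
        code = row.get("section_id_normalized")
        status = row.get("status")
        term = row.get("term")
        # Any missing field counts as missing data, as in the command.
        if code is None or status is None or term is None:
            stats["missing_data"] += 1
            continue
        if code not in stored_sections:
            stats["section_not_found"] += 1
            continue
        last_update_status, stored_status = stored_sections[code]
        if stored_status == status:
            # Registrar agrees with what we already store.
            stats["duplicate_updates"] += 1
            continue
        stats["unsynced_updates"] += 1
        unsynced.append((code, last_update_status, stored_status, status))
    return stats, unsynced


def split_by_last_update(unsynced):
    """Partition rows by whether the stored status matches the last update."""
    matches = [r for r in unsynced if r[1] == r[2]]
    mismatches = [r for r in unsynced if r[1] != r[2]]
    return matches, mismatches


# Usage with made-up section codes.
rows = [
    {"section_id_normalized": "A-1", "status": "O", "term": "t"},
    {"status": "O", "term": "t"},
    {"section_id_normalized": "B-1", "status": "C", "term": "t"},
    {"section_id_normalized": "C-1", "status": "O", "term": "t"},
    {"section_id_normalized": "D-1", "status": "C", "term": "t"},
]
stored = {"A-1": ("O", "O"), "B-1": ("O", "O"), "C-1": ("O", "C")}
stats, unsynced = audit_statuses(rows, stored)
matches, mismatches = split_by_last_update(unsynced)
```
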
57 changes: 57 additions & 0 deletions backend/status_audit.txt
@@ -0,0 +1,57 @@
Summary Statistics
{"missing_data": 0, "section_not_found": 0, "duplicate_updates": 5711, "unsynced_updates": 49}

Courses Out of Sync
Course Code / Last Update Status /Our Stored Status / Actual Status
Our Status Matches Last Update
ANTH-2550-401 / C / C / O
BE-2000-201 / O / O / C
BE-5650-001 / C / C / O
BEPP-2020-402 / C / C / O
BEPP-2030-001 / C / C / O
CHEM-2412-142 / C / C / O
CHEM-2510-220 / O / O / C
COMM-6030-401 / O / O / C
EDUC-5256-001 / C / C / O
ESE-2040-001 / O / O / X
ESE-2040-201 / O / O / X
ESE-2180-001 / C / C / O
ESE-2180-101 / C / C / O
FNAR-1070-402 / C / C / O
FNAR-5011-402 / C / C / O
FNCE-2020-402 / C / C / O
GRMN-1800-001 / O / O / C
HSOC-2002-403 / C / C / O
LGST-2190-001 / O / O / C
LGST-2920-402 / O / O / C
MGMT-2920-402 / O / O / C
MGMT-3010-001 / C / C / O
MGMT-3010-005 / C / C / O
NURS-1640-110 / C / C / O
OIDD-2920-402 / O / O / C
PSCI-1800-204 / C / C / O
PSYC-1777-001 / C / C / O
SAST-2550-401 / C / C / O
SOCI-2000-403 / C / C / O
SPAN-1800-303 / C / C / O
SSPP-6030-401 / O / O / C
SWRK-6020-001 / C / C / O
SWRK-6030-003 / C / C / O
SWRK-7770-001 / O / O / C

Our Status Does Not Match Last Update
ANTH-0120-404 / C / O / C
BIOL-1101-102 / O / C / O
CHEM-1102-185 / O / C / O
CIS-1210-214 / C / O / C
EALC-0730-406 / C / O / C
HIST-0550-406 / C / O / C
LALS-4240-401 / C / O / C
MEAM-2100-202 / O / C / O
MEAM-2470-101 / O / C / O
MSSP-6340-001 / C / O / C
PHIL-1380-301 / C / O / C
PSCI-0200-202 / O / C / O
SOCI-2910-404 / C / O / C
SOCI-2931-401 / C / O / C
SWRK-6020-005 / O / C / O
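
To summarize a report like the one above, the rows could be tallied by the pair (stored status, registrar status). The following is a rough sketch, assuming the slash-separated four-column layout shown in `status_audit.txt`; `tally_transitions` is a hypothetical helper, not part of this PR, and the sample text reuses three rows from the report.

```python
from collections import Counter


def tally_transitions(report_text):
    """Count (stored status, registrar status) pairs among the data rows."""
    counts = Counter()
    for line in report_text.splitlines():
        parts = [part.strip() for part in line.split(" / ")]
        # Data rows have exactly four fields; headings and the summary do not.
        if len(parts) != 4 or parts[0].startswith("Course Code"):
            continue
        _code, _last_update, stored, registrar = parts
        counts[(stored, registrar)] += 1
    return counts


# Usage on a small excerpt of the report.
sample = (
    "Our Status Matches Last Update\n"
    "ANTH-2550-401 / C / C / O\n"
    "BE-2000-201 / O / O / C\n"
    "\n"
    "Our Status Does Not Match Last Update\n"
    "ANTH-0120-404 / C / O / C\n"
)
counts = tally_transitions(sample)
```
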