OSEP #10: Action Classification Update

Author(s) @sruthivedantham
Implementer(s) @sruthivedantham
Status
Draft PR(s) #4182
Approval PR(s)
Created 2022-03-11
Updated 2022-25-11

Abstract

Action classification for bills largely happens inside of the scrape() method (or somewhere within bills.py). I propose we isolate everything related to action classification in its own file for each jurisdiction.

Specification

In this new design, all code related to action classification will be moved to a separate file (actions.py). This will untangle action classification from scraping and make the scrape() method and the rest of bills.py simpler and easier to read.

The contents of the action classification file should be standardized across jurisdictions to utilize the BaseCategorizer and customize rules as appropriate. Regex patterns will match action phrases to defined OS classifications.

Every actions.py will contain:

  • custom rules that apply to the jurisdiction's bill language
  • a categorize_action() function that takes in a phrase and returns the appropriate classification utilizing the BaseCategorizer class
  • anything else that is required for action classification

Examples of Rules that could be found in an actions.py include:

    Rule(r"amendment not adopted", "amendment-failure"),
    Rule(r"(?i)third reading, (?P<pass_fail>(passed|failed))", "reading-3"),
    Rule(r"Read first time", "reading-1"),
    Rule(r"(?i)first reading, referred to (?P<committees>.*)\.", "reading-1"),
    Rule(r"(?i)And refer to (?P<committees>.*)", "referral-committee"),
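To illustrate how patterns like these behave, here is a minimal sketch that applies two of the regexes above using only Python's standard re module. It does not use the real Rule class; match_action() and the sample action strings are hypothetical and exist only for this illustration.

```python
import re

# Patterns copied from the example rules above, each paired with its classification.
PATTERNS = [
    (r"(?i)third reading, (?P<pass_fail>(passed|failed))", "reading-3"),
    (r"(?i)first reading, referred to (?P<committees>.*)\.", "reading-1"),
]


def match_action(text):
    """Return (classification, named groups) for the first pattern that matches."""
    for pattern, classification in PATTERNS:
        found = re.search(pattern, text)
        if found:
            # groupdict() only contains the named groups, e.g. pass_fail or committees.
            return classification, found.groupdict()
    return None, {}


# Hypothetical action strings, invented for illustration.
print(match_action("Third Reading, Passed"))
print(match_action("First reading, referred to Judiciary."))
```

The named groups (pass_fail, committees) are what let a rule carry extra information, such as the referred committee, alongside the classification.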

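An example categorizer in a jurisdiction's actions.py, subclassing BaseCategorizer and overriding its categorize() method, would look like this:

    from utils.actions import BaseCategorizer  # import path assumed; adjust as needed

    class Categorizer(BaseCategorizer):
        def categorize(self, text):
            """Wrap BaseCategorizer.categorize; jurisdiction-specific changes to attrs go here."""
            attrs = super().categorize(text)
            return attrs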
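As a rough sketch of how the pieces of an actions.py could fit together, the block below defines simplified stand-ins for Rule and BaseCategorizer so that it runs on its own. These stand-ins are assumptions made for illustration: the real classes may differ in constructor signature and in the shape of what categorize() returns. The module-level categorize_action() follows the name used in the Specification.

```python
import re


class Rule:
    """Simplified stand-in for the real Rule: one regex plus its classifications."""

    def __init__(self, pattern, *classifications):
        self.regex = re.compile(pattern)
        self.classifications = list(classifications)


class BaseCategorizer:
    """Simplified stand-in: collects results from every rule that matches."""

    rules = []

    def categorize(self, text):
        attrs = {"classification": []}
        for rule in self.rules:
            found = rule.regex.search(text)
            if found:
                attrs["classification"].extend(rule.classifications)
                attrs.update(found.groupdict())  # keep named groups, e.g. committees
        return attrs


class Categorizer(BaseCategorizer):
    # Jurisdiction-specific rules, taken from the examples above.
    rules = [
        Rule(r"amendment not adopted", "amendment-failure"),
        Rule(r"(?i)And refer to (?P<committees>.*)", "referral-committee"),
    ]


def categorize_action(phrase):
    """Entry point a scraper could call for each action phrase."""
    return Categorizer().categorize(phrase)["classification"]
```

With this layout, bills.py would only need to import categorize_action and call it once per action, leaving all rule maintenance in actions.py.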

Rationale

Action classification is hard because so much varies from jurisdiction to jurisdiction. By implementing these changes, we can standardize the process even if the content remains variable.

Additionally, moving all action classification code for each jurisdiction into its own actions.py will help simplify the scrape() method and bills.py, which can be very long and complex for some jurisdictions. We are currently working to enable post-processing per jurisdiction, which means that post-processors will run as the scrape runs. Right now the scrape writes objects directly to disk; the scrape step will be updated to instead send each ScrapedBill through a chain of configured post-processors (one of which would be Actions per Jurisdiction). Moving action classification code into its own actions.py will make it easier to eventually link it in as an individual post-processor.
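The post-processor chain described above is still being designed, so the following is only a sketch of the general idea under stated assumptions. The ScrapedBill dataclass here is a hypothetical stand-in that models nothing but actions, and classify_actions() and run_post_processors() are invented names, not part of any existing API.

```python
from dataclasses import dataclass, field


@dataclass
class ScrapedBill:
    """Hypothetical stand-in for a scraped bill; only its actions are modeled."""

    identifier: str
    # Each action is a dict with "description" and "classification" keys.
    actions: list = field(default_factory=list)


def classify_actions(bill):
    """Hypothetical post-processor: fill in a classification for each action."""
    for action in bill.actions:
        if "read first time" in action["description"].lower():
            action["classification"] = ["reading-1"]
    return bill


def run_post_processors(bill, processors):
    """Pass the bill through each configured post-processor, in order."""
    for process in processors:
        bill = process(bill)
    return bill


bill = ScrapedBill("HB 1", [{"description": "Read first time", "classification": []}])
bill = run_post_processors(bill, [classify_actions])
```

In this arrangement the scraper only records the raw action text, and classification becomes one step in the chain that can be configured per jurisdiction.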

Drawbacks

  • Every scraper is different and classifies actions in its own way, so pulling action classification code out of the scrape() method will be easier for some jurisdictions than for others.

Implementation Plan

  • Ideally, this would be implemented in every jurisdiction's scraper, although some already separate action classification to a greater degree than others.

Copyright

This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license.