Fix unique name for CivicPlus sites #80

Open
zstumgoren opened this issue Dec 25, 2020 · 2 comments
zstumgoren commented Dec 25, 2020

Related to #82 and #63

We need a more robust strategy for determining a unique name for sites/assets.

The current strategy for determining a unique ID for a site assumes that the site's subdomain contains place-related information.

But not all CivicPlus sites have a place-oriented subdomain. For example, Napa County has a broken CivicPlus subdomain but a working AgendaCenter site on its own county domain (as of late Dec. 2020):

# Broken
https://napa-county.civicplus.com/AgendaCenter

# Working
https://www.countyofnapa.org/AgendaCenter

With the current strategy, we end up producing asset names such as the one below, where the place portion is just "www":

civicplus_www_12072020-1666_agenda.pdf
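To illustrate the failure mode, here is a minimal sketch of a subdomain-based lookup. The helper name place_slug_from_subdomain is hypothetical and is not the project's actual code; it only assumes that the identifier is taken from the first label of the hostname.

```python
from urllib.parse import urlparse


def place_slug_from_subdomain(url):
    """Return the first label of the hostname (hypothetical helper)."""
    hostname = urlparse(url).hostname or ""
    return hostname.split(".")[0]


# A place-oriented CivicPlus subdomain yields a useful slug.
print(place_slug_from_subdomain("https://napa-county.civicplus.com/AgendaCenter"))
# A site hosted on its own domain yields only "www".
print(place_slug_from_subdomain("https://www.countyofnapa.org/AgendaCenter"))
```

Any site served from a www host would collapse to the same slug, so names built this way are not unique.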
@zstumgoren zstumgoren created this issue from a note in Tasks (To do) Dec 25, 2020
@zstumgoren zstumgoren added the bug Something isn't working label Dec 25, 2020
@zstumgoren zstumgoren self-assigned this Dec 25, 2020
@zstumgoren

Whoops. This is a non-issue. I was using an alternate domain for Columbus, which does indeed have a proper CivicPlus subdomain:

http://wi-columbus.civicplus.com/AgendaCenter

Tasks automation moved this from To do to Done Dec 25, 2020
@zstumgoren zstumgoren reopened this Dec 25, 2020
Tasks automation moved this from Done to In progress Dec 25, 2020
@zstumgoren zstumgoren moved this from In progress to To do in Tasks Dec 25, 2020
zstumgoren commented Dec 26, 2020

We could add logic that derives a unique asset name from something other than the CivicPlus subdomain. For example, given a non-standard URL such as Napa County's (https://www.countyofnapa.org/AgendaCenter), we could derive the name from the base domain. However, this would cause problems in downstream code that expects the state_or_province and place data to be present. Both are currently derived from the subdomain.
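A rough sketch of the base-domain fallback, for discussion only. The function name site_name is hypothetical, and the base-domain logic is deliberately naive: it takes the second-to-last hostname label, which would be wrong for multi-part suffixes such as .co.uk.

```python
from urllib.parse import urlparse


def site_name(url):
    """Pick a name from the subdomain or, failing that, the base domain.

    Hypothetical helper; not part of the existing code base.
    """
    hostname = (urlparse(url).hostname or "").lower()
    labels = hostname.split(".")
    # Standard case: the place-oriented CivicPlus subdomain.
    if hostname.endswith(".civicplus.com"):
        return labels[0]
    # Non-standard case: naive base domain (second-to-last label).
    if len(labels) >= 2:
        return labels[-2]
    return hostname


print(site_name("http://wi-columbus.civicplus.com/AgendaCenter"))
print(site_name("https://www.countyofnapa.org/AgendaCenter"))
```

This gives a distinct name for Napa County (countyofnapa), but nothing in that string can be split into state_or_province and place, which is the downstream problem.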

Two possible strategies that would be more robust:

  1. Use the subdomain generally and for edge cases look up the state and place data in a corrections dictionary.
  2. Rely on a canonical list of known domains and their verified geographic information.
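A minimal sketch of the first strategy, under stated assumptions. The CORRECTIONS dictionary, the place_info function, the slug formats, and the "state-place" subdomain split are all hypothetical; the real subdomain parsing may differ.

```python
from urllib.parse import urlparse

# Hypothetical corrections table keyed by hostname. Entries are
# verified by hand for sites whose hostname cannot be parsed.
CORRECTIONS = {
    "www.countyofnapa.org": {"state_or_province": "ca", "place": "napa-county"},
}


def place_info(url, corrections=CORRECTIONS):
    """Return state/place data, preferring a manual correction."""
    hostname = (urlparse(url).hostname or "").lower()
    if hostname in corrections:
        # Copy so callers cannot mutate the shared table.
        return dict(corrections[hostname])
    # Assumption: standard subdomains look like "<state>-<place>".
    subdomain = hostname.split(".")[0]
    state, sep, place = subdomain.partition("-")
    if not sep:
        return {"state_or_province": None, "place": subdomain}
    return {"state_or_province": state, "place": place}


print(place_info("http://wi-columbus.civicplus.com/AgendaCenter"))
print(place_info("https://www.countyofnapa.org/AgendaCenter"))
```

The second strategy would use the same kind of table but treat it as the only source of truth, with no subdomain fallback. Note that a subdomain such as napa-county would still be split incorrectly by the fallback here, so it would need its own corrections entry.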

We've already started manually correcting state/place data in aw-scripts because this info is hard to ascertain even from subdomains, for example for water districts. We will likely need some form of the "corrections" approach here in the core framework to generate accurate, standardized place information for each site.

NOTE: The court-scraper project takes a similar approach insofar as it relies on a canonical file of site metadata.
