Improve the OGE Public Financial Disclosure Reports parsing #11

Open
2 tasks
bnsmith3 opened this issue Apr 13, 2017 · 1 comment

Comments

@bnsmith3

There are at least two different types of files on the OGE site that would benefit from parsing:

  • Form 278
  • Ethics Agreement

Right now, Form 278 files can be parsed using this code, but the parser could be improved.

The output of a parsed file should be a JSON object that can be ingested by almost any other service or tool.
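A minimal sketch of what that JSON output could look like, assuming Python dataclasses. The field names (`form_type`, `filer`, `report_date`, `source_url`, `sections`) are hypothetical and not taken from the existing parser:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Filer:
    # Hypothetical fields; the real report layout may call for more.
    name: str
    position: Optional[str] = None


@dataclass
class ParsedReport:
    form_type: str      # e.g. "278e" or "278-T"
    filer: Filer
    report_date: str    # ISO 8601 date string, e.g. "2017-03-30"
    source_url: str     # link back to the original PDF
    # Section name -> list of rows, each row a plain dict of column -> value.
    sections: Dict[str, List[dict]] = field(default_factory=dict)


def to_json(report: ParsedReport) -> str:
    # asdict() recurses into nested dataclasses, so Filer becomes a plain dict
    # and None values become JSON null.
    return json.dumps(asdict(report), indent=2, sort_keys=True)
```

Using `asdict()` keeps the schema defined in one place while still emitting only plain dicts, lists, and strings, which is what makes the output easy for other tools to consume.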

@gregoryfoster
Collaborator

gregoryfoster commented Apr 13, 2017

Here's an example of a parseable OGE Form 278e Public Financial Disclosure Report (Steve Bannon: March 30, 2017). I believe this PDF format may be generated by OGE's Integrity.gov website, where everyone except Presidential candidates must file (candidates file with the FEC) (source). This parseable PDF format is therefore our primary target.

We have observed one other format for OGE Form 278e, which appears to be a scan of a primary source document (Michael Flynn: March 31, 2017). Depending on how often these documents turn up and whether they are parseable, this format might be a secondary target.
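One cheap way to tell text-based 278e output apart from scans like the Flynn report is to check how much text the PDF's text layer yields per page. A sketch, assuming page text has already been extracted by some PDF library (for example pypdf's `page.extract_text()`); the 50-character threshold and the half-the-pages rule are hypothetical and would need tuning against real files:

```python
from typing import Sequence

# Hypothetical threshold: pages with fewer non-whitespace characters than this
# are treated as having no usable text layer.
MIN_CHARS_PER_PAGE = 50


def looks_scanned(page_texts: Sequence[str], min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True if most pages have little or no extractable text."""
    if not page_texts:
        return True
    # Count pages whose extracted text (whitespace removed) is substantial.
    text_pages = sum(
        1 for text in page_texts
        if len("".join(text.split())) >= min_chars
    )
    # Treat the document as a scan when fewer than half the pages carry text.
    return text_pages < len(page_texts) / 2
```

Documents flagged this way could be set aside (or routed to OCR later) instead of failing inside the main 278e parser.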

President Barack Obama used the older OGE Form 278, and the available documents appear to be scans (Barack Obama: May 12, 2016). It's unclear how many historical records we'll encounter that were filed on Form 278 rather than 278e.

Finally, here is a different format used for a judicial branch employee, issued by the Committee on Financial Disclosure in the Administrative Office of the United States Courts (Neil Gorsuch: August 11, 2016). This also appears to be a scan.

There is also an OGE Form 278-T for reporting periodic transactions (e.g., purchases and sales of stocks). Here is an example (Charles F. Bolden Jr.: Feb 2, 2016). I would argue that this parseable format is the next most important target, because it is where potentially interesting and timely information could be uncovered.
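Transaction amounts on these reports are given as dollar ranges rather than exact figures, so the parser will likely need to normalize them into numbers. A sketch, assuming the extracted text looks like `$1,001 - $15,000` or `Over $1,000,000` (actual extraction output may differ, e.g. an en dash instead of a hyphen); `parse_amount_range` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Assumed text shapes: "$1,001 - $15,000" (hyphen or en dash) and "Over $X".
_RANGE = re.compile(r"\$(\d[\d,]*)\s*[-\u2013]\s*\$(\d[\d,]*)")
_OVER = re.compile(r"over\s+\$(\d[\d,]*)", re.IGNORECASE)


def _to_int(digits: str) -> int:
    # Strip thousands separators before converting.
    return int(digits.replace(",", ""))


def parse_amount_range(text: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (low, high) in dollars; high is None for open-ended buckets."""
    m = _RANGE.search(text)
    if m:
        return _to_int(m.group(1)), _to_int(m.group(2))
    m = _OVER.search(text)
    if m:
        # "Over $X" has no upper bound; use X + 1 as an inclusive lower bound.
        return _to_int(m.group(1)) + 1, None
    return None  # unrecognized amount text
```

Returning `None` for unrecognized text, rather than guessing, makes extraction problems visible in the JSON output instead of silently producing wrong numbers.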

Re: Ethics Agreements, here is an example (Rex Tillerson: Jan 3, 2017). These documents appear to be business letters rather than a standard parseable format. However, they do appear to contain important information, so it would be good to figure out a way to point to these files.
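The priorities in this comment, including pointing to ethics agreements rather than parsing them, could be encoded as a small triage step. A minimal sketch; the form-type labels (`"278e"`, `"278-T"`, `"ethics_agreement"`), the bucket names, and `link_record` are all hypothetical:

```python
from enum import IntEnum


class Handling(IntEnum):
    """Hypothetical handling buckets; lower values are processed first."""
    PARSE_278E = 1   # text-based 278e PDFs: primary target
    PARSE_278T = 2   # periodic transaction reports: next target
    DEFERRED = 3     # scans (278e, older 278, judicial) and unknown formats
    LINK_ONLY = 4    # ethics agreements: keep a pointer, don't parse


def triage(form_type: str, has_text_layer: bool) -> Handling:
    """Choose a bucket from a form-type label and whether the PDF has text."""
    if form_type == "ethics_agreement":
        return Handling.LINK_ONLY
    if not has_text_layer:
        return Handling.DEFERRED
    if form_type == "278e":
        return Handling.PARSE_278E
    if form_type == "278-T":
        return Handling.PARSE_278T
    return Handling.DEFERRED


def link_record(url: str) -> dict:
    """JSON-ready record that points to an unparsed ethics agreement."""
    return {"document_type": "ethics_agreement", "source_url": url, "parsed": False}
```

Because `Handling` is an `IntEnum`, a queue of documents can simply be sorted by bucket so the 278e and 278-T work happens first.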
