Where are you getting the mappings data? #39
Comments
If I had one regret about how this library is set up, it's that I wish I had used a git submodule so that the CSV files from https://github.com/dwillis/fech-sources would be the source of mapping data. At the time, the Senate was still doing its filings on paper, and those Senate filings were important to my employer, so I spent a fair amount of time extending the js/json file that @chriszs had built for fec-parse (which also uses fech-sources as its upstream source of data). I found the JSON easy to hand-edit, but getting the mappings I added back into fech-sources has been on my to-do list for a long time.

The most definitive source of data for these mappings is the xls/xlsx files the FEC hosts here. Click to expand "Electronically filed reports" and "Paper filed reports", then download the "File formats, header file and metadata" files. I also highly recommend reading through those files before embarking on writing a parser; they will be a huge help in understanding how the filings are structured.

To answer your two questions:
What I see as the pros/cons of each are:
One additional reason to use the CSVs as your data source: if and when you find issues in them and make PRs into fech-sources to fix them, the fixes will benefit everyone, since we are all downstream of that repo. Good luck! Please don't hesitate to reach out if you have any questions.
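As a rough sketch of how a parser might consume such mappings at parse time: pick the header list whose version pattern matches the filing's version. The version patterns and header lists below are invented for illustration; they are not the actual fech-sources data.

```python
import re

# Hypothetical, trimmed-down mapping table. The real fech-sources CSVs
# key each form type's column headers by filing-version; the patterns
# and headers here are illustrative only.
MAPPINGS = {
    "F3": [
        (r"^8\.[34]$", ["form_type", "filer_committee_id_number", "committee_name"]),
        (r"^[67]", ["form_type", "filer_committee_id_number"]),
    ],
}

def headers_for(form_type: str, version: str) -> list[str]:
    """Return the header list whose version pattern matches this filing."""
    for pattern, headers in MAPPINGS.get(form_type, []):
        if re.match(pattern, version):
            return headers
    raise KeyError(f"no mapping for {form_type} v{version}")

print(headers_for("F3", "8.3"))
```

The point of matching on a pattern rather than an exact version string is that many consecutive filing versions share a layout, so one row can cover a whole range.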
I spent some time a year ago trying to reproduce the CSVs from the original source Excel files. Well, actually, to merge the two and create JSON Schemas with typing information. This won't come as a surprise to you, but I found that both are fairly dirty: the CSVs are sometimes incorrect, there are a ton of records once you multiply the number of fields by the number of versions, JSON Schema has a lot of depth, and it's a difficult task. Some of that work was the basis of the draft PR you used as the starting point for your fech-sources contribution. A contractor for the FEC has been slowly working on a similar project, fecfile-validate, but with a slightly different scope (just current filings). FastFEC uses a version of the two .json files, including mappings.json, which I converted from Derek's original Ruby and which Evan and I improved over time. That's as close to a clean source as you'll find, though it originally derives from the CSVs.
Oh, also: Evan is correct about F3s. There's a PDF technical manual somewhere on the FEC site that details some of this; if I can find it again I'll link to it.
Thank you both so much for this. Oh man, I just got overwhelmed ;) I've been looking into JSONSchema for a while, and I think I've concluded it is overkill for what we need, but here are a few thoughts I had that I wanted to write down.

JSONSchema musings

- fecfile-validate looks as canonical as you can get. It appears they source their schemas from the .xls files you mentioned above, but it also looks like they don't trust those .xls files and have to hand-edit them.
- @chriszs, by "current filings" do you mean fecfile-validate only supports filing versions 8.3+? That wouldn't be adequate for my needs (and, I bet, others'). I doubt the FEC will be motivated to support older versions, so we would need to supplement this.
- @chriszs mentions the combinatorial explosion, but I think we could get around this by re-using sub-schemas. Am I missing something there?
- Still, I'm not sure we need the full power of JSONSchema, and therefore I'm not sure it's worth bringing in that complication. Am I right that all we need beyond the column names are the dtypes that should get parsed? I don't think we need the full validation vocabulary that JSONSchema provides.

Path Forward

OK, it sounds like updating fech-sources is what both of you are most supportive of, and I think that would work just fine for me. Adding types would be great, but just having the mappings be functional would be fine. I think the to-dos would be:

- [ ] merge dwillis/fech-sources#11

CC @mjtravers from fecfile-validate, if you have any thoughts on how we could team up at all.
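To make the sub-schema re-use idea concrete: JSON Schema's "$ref" lets shared field groups be defined once and referenced from every form/version that uses them, which is one way to tame the fields × versions explosion. The field names, groupings, and the tiny resolver below are invented for illustration.

```python
# Sketch of "$ref" re-use: shared definitions in "$defs", referenced
# from multiple forms. Not real FEC schema data.
SCHEMA = {
    "$defs": {
        "committee_id": {"type": "string", "pattern": "^C[0-9]{8}$"},
        "amount": {"type": "number"},
    },
    "forms": {
        "F3.v8": {
            "filer_committee_id_number": {"$ref": "#/$defs/committee_id"},
            "col_a_total_receipts": {"$ref": "#/$defs/amount"},
        },
        "F3X.v8": {
            "filer_committee_id_number": {"$ref": "#/$defs/committee_id"},
            "col_a_total_disbursements": {"$ref": "#/$defs/amount"},
        },
    },
}

def resolve(schema: dict, node: dict) -> dict:
    """Follow a local '#/$defs/...' reference to its definition."""
    if "$ref" in node:
        name = node["$ref"].rsplit("/", 1)[-1]
        return schema["$defs"][name]
    return node

print(resolve(SCHEMA, SCHEMA["forms"]["F3.v8"]["col_a_total_receipts"]))
# -> {'type': 'number'}
```

Each form lists only which shared definitions it uses, so adding a version mostly means adding references rather than re-stating every field's type.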
Hi! Firstly: sorry I haven't been able to find time to get to your PR in FastFEC. (Though I have verified there is no performance difference with your version, I did notice some diffs that I'm still working through.) We will be focusing more on FEC work at The Post later this year (I'm hoping to find time sooner). But I do want to chime in here to say:
Looking forward to seeing what you come up with. And thanks for organizing this discussion.
Yes, my design heavily uses sub-schemas. Yes, there are a lot of edge cases. Correct that fecfile-validate only seems interested in the current version. I think a plan that focuses on improving the CSVs and converting from there sounds reasonable.
Hi!
I'm working on a port of this to Rust, and I'm trying to decide where to source the schema mappings from. Possible options I've found are:
I found that FastFEC chose to use this repo as their upstream.
Could you explain what you see as the pros/cons of each of these?
So far, what I see are:
Are there other considerations I'm missing?
I would love for there to be one complete and accurate listing of the schemas, so that the wide range of parsers wouldn't each have to duplicate this effort. Any idea what would be required to make that happen?
CC @esonderegger @dwillis @freedmand
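One way such a single source of truth could work is a small generator that turns the canonical CSVs into a mappings.json-style artifact that each parser (Ruby, JS, Rust, ...) consumes. This is only a sketch: the inlined CSV layout and column names are invented, not the actual fech-sources format.

```python
import csv
import io
import json

# Hypothetical canonical CSV, inlined for a self-contained example.
# One row per (form type, version pattern, field), pre-sorted by position.
CANONICAL_CSV = """\
form_type,version_pattern,position,field
F3,^8,1,form_type
F3,^8,2,filer_committee_id_number
F99,^8,1,form_type
"""

def build_mappings(csv_text: str) -> dict:
    """Group canonical CSV rows into form -> version pattern -> field list."""
    mappings: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields = mappings.setdefault(row["form_type"], {}).setdefault(
            row["version_pattern"], []
        )
        fields.append(row["field"])  # rows are assumed pre-sorted by position
    return mappings

print(json.dumps(build_mappings(CANONICAL_CSV), indent=2))
```

If the generated JSON were checked in (or published) alongside the CSVs, downstream parsers could consume it directly and any fix made upstream would flow to everyone.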