After researching all three, I'm inclined to go with @open-city's library. @cjdd3b's appeals to me, but I think the former is going to be simpler to get running.
A few months later, I'm not so sure. I'm keeping this data in Elasticsearch now, and I'm persuaded that it's the proper vehicle for manipulating it. But of course it remains essential to de-dupe donor and vendor records.
I'm also thinking that this problem could be offloaded by geocoding every address (basically farming out the problem to a more intelligent service) and then using the lat/lon pair, combined with the name, to figure out whether it's the same vendor or contributor. At 0.8¢/query via Yahoo, that could get expensive fast. (Over $900, by my math!) Google sells the same service, but it's apparently so expensive that they're not even naming the price. :-/
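To make the geocoding idea concrete, here's a rough sketch of how the matching could work once every record already has a lat/lon pair. Everything in it is an assumption for illustration: the `name`/`lat`/`lon` field names, the helper names, the name normalization, and the choice to round coordinates to four decimal places. It doesn't call any geocoding service, and it isn't how any of the libraries listed in this issue work.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lowercase, drop punctuation, and collapse whitespace (illustrative only)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def dedupe_key(record, precision=4):
    """Build a match key from the normalized name and rounded coordinates.

    Rounding to 4 decimal places puts points within roughly 11 m of latitude
    into the same cell. Two nearby points on opposite sides of a cell
    boundary will still get different keys.
    """
    return (
        normalize_name(record["name"]),
        round(record["lat"], precision),
        round(record["lon"], precision),
    )


def group_duplicates(records):
    """Group records that share a key; each group is a candidate duplicate set."""
    groups = defaultdict(list)
    for record in records:
        groups[dedupe_key(record)].append(record)
    return list(groups.values())


def geocoding_cost_dollars(n_queries, cents_per_query=0.8):
    """Estimate the total geocoding cost in dollars at a flat per-query rate."""
    return n_queries * cents_per_query / 100


# Hypothetical sample records, already geocoded.
records = [
    {"name": "Acme Corp.", "lat": 37.54072, "lon": -77.43612},
    {"name": "ACME  CORP", "lat": 37.54068, "lon": -77.43608},
    {"name": "Smith & Sons", "lat": 38.03021, "lon": -78.47812},
]
groups = group_duplicates(records)
```

Here the two Acme records land in one group and Smith & Sons in another. On the cost side, 0.8¢ per query works out to $900 at 112,500 lookups, so `geocoding_cost_dollars` is just a quick way to rerun that estimate for a different record count or rate.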
There are a few good tools for this:
https://github.com/cjdd3b/fec-standardizer
https://github.com/huffpostdata/campfin-linker
https://github.com/open-city/dedupe