Review reweighting targets #5
Comments
Thanks @donboyd5 - I think we can just keep them in the doc, since we don't need to add excess work here. Discussion on this thread would be good though!
Suggested principles and general approach
I suggest we make this discussion primarily about setting targets. If we find ourselves talking about technical methods for hitting/approximating targets, or evaluating goodness of fit, I can open a separate issue for that.
Comments below are about long-run approach, without regard to easier/shorter methods we may need to use in the near term. Clearly we may only implement some (or possibly even none) of this in Phase 1, and may not get to all of it even in Phases 1-3. But it's good to think about where we want to end up.
We absolutely should include in this issue discussion of how much targeting, and how, to do in Phase 1. My preference is to waste no work. What I mean by that is, let's not construct anything elaborate in Phase 1 that we'll have to discard or tear down later. Let's make progress on target setting and hitting methods in Phase 1, even if we don't use anything sophisticated in Phase 1 and rather just apply some simple growth rates.
Filers and nonfilers approach
All of this is important for getting a file that can represent the last-best historical year well -- essential to representing future years plausibly. (If you extrapolate from the wrong base using the right growth, you'll still get the wrong future.)
Principles for establishing targets -- which items to target?
IRS aggregates create an opportunity to target many hundreds of variables, as do the CPS and the ACS. Certainly in the short run we won't be able to hit them all, and in the long run we may not want to. We need principles for what to target and, because our technical methods will not allow us to hit all targets well, principles for which targets matter most. We'd also like good ways to operationalize those principles. Possible principles -- we should target variables that are:
Are there other important principles? Does filer age come into play under any of the above? (TBD)
@donboyd5 and @nikhilwoodruff, thanks for all the work so far. My understanding of our Feb work plan is that we will create a flat-file version of a PolicyEngine-US (PEUS) hierarchical input dataset. @nikhilwoodruff has already started that work in PR #4. My responsibility is validation. I have been able to download the TSY and JCT tax expenditure estimates for FY2023, which I will need in order to compare them with estimates generated from our Feb dataset, which looks like it will be for CY2023 (see PR #4). But when I look at the IRS SOI aggregate tables pointed to by @donboyd5 in issue #5, the latest available information is for CY2021. Maybe I'm missing something obvious, but I don't see how we can use any IRS-SOI data in the Feb work given the lag in IRS-SOI publication.
@martinholmer @nikhilwoodruff The agreed Phase 1 plan only requires that we "construct a flattened version of the PolicyEngine file suitable for input into Tax-Calculator", so we do not need to do this specific kind of targeting in Phase 1. If the basic PE flat file will be an already-targeted CY 2023 file, which is how I interpret the screenshot in PR #4 (if not correct, please say so @nikhilwoodruff), then we don't need to do any specific targeting in Phase 1. This thread is focused primarily on longer-term approaches.
That said, for completeness and not for work in Phase 1, there are ways to use CY 2021 target information when producing a CY 2023 file. One approach would be to forecast key targets forward 2 years, allowing us to have targets for aggregates in 2023, as well as distributional targets. Then reweight the file to hit/approximate those targets for filers. But we don't need to do that in Phase 1.
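To make the "forecast targets forward, then reweight" idea concrete, here is a minimal sketch. It is not the project's method: the record layout, the target values, the growth rates, and the function names `project_target` and `rake` are all hypothetical, and simple raking (iterative proportional fitting) stands in for whatever reweighting technique is eventually chosen.

```python
# Hypothetical illustration only: every number and name below is made up.

def project_target(value_base, annual_growth, years=2):
    """Grow a base-year aggregate forward at a constant annual rate."""
    return value_base * (1.0 + annual_growth) ** years


def rake(weights, indicators, targets, iterations=200):
    """Iterative proportional fitting on record weights.

    indicators[k][i] is 1 if record i counts toward target k, else 0.
    Each pass rescales the weights of the records in one group so that
    the group's weighted count matches its target, cycling over groups.
    """
    w = list(weights)
    for _ in range(iterations):
        for flags, target in zip(indicators, targets):
            current = sum(wi for wi, f in zip(w, flags) if f)
            if current > 0:
                ratio = target / current
                w = [wi * ratio if f else wi for wi, f in zip(w, flags)]
    return w


# Assumed CY 2021 targets (number of returns) and assumed growth rates.
targets_2021 = [300.0, 150.0]   # returns with wages, returns that itemize
growth_rates = [0.01, 0.02]
targets_2023 = [project_target(t, g) for t, g in zip(targets_2021, growth_rates)]

# Four filer records with equal starting weights; record 2 is in both groups.
start_weights = [100.0, 100.0, 100.0, 100.0]
indicators = [
    [1, 1, 1, 0],   # has wages
    [0, 0, 1, 1],   # itemizes
]
new_weights = rake(start_weights, indicators, targets_2023)

# Weighted counts after reweighting, for comparison with targets_2023.
achieved = [
    sum(w for w, f in zip(new_weights, flags) if f) for flags in indicators
]
```

Raking only handles count-style targets on groups of records; dollar-amount or distributional targets would need a different formulation, which is the kind of method question that belongs in a separate issue.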
Thanks, @martinholmer. I missed your earlier request for status update. The thread was about longer term issues that we discussed extensively with @nikhilwoodruff during Phases 1-3, although not all were addressed. Fine to close. We can revisit if / when appropriate in the future.
Here is a link to @donboyd5's notes, from a prior project, about IRS spreadsheets as a source of historical target data.
Should I add them below, or keep them as a separate resource doc, and start a conversation here about what to do and how?