pseudolog transformer #707

Open
ParadaCarleton opened this issue Oct 31, 2023 · 8 comments


@ParadaCarleton

ParadaCarleton commented Oct 31, 2023

Request for an inverse hyperbolic sine (a.k.a. asinh or pseudolog) transformer. x -> arcsinh(x/2) behaves like ln(x) for large positive x (and like -ln|x| for large negative x), but is approximately linear (about x/2) near zero; this behavior is very useful for values that are almost lognormal but take on both positive and negative values (e.g. net worth).
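
As a quick sanity check of that asymptotic behavior, here is a minimal sketch using only the Python standard library. The helper name `pseudolog` is just illustrative and not an existing function in any library.

```python
import math


def pseudolog(x):
    """Illustrative helper (hypothetical name): asinh(x / 2)."""
    return math.asinh(x / 2.0)


# Large |x|: asinh(x/2) tracks sign(x) * ln|x|.
for x in (1e3, 1e6, -1e6):
    reference = math.copysign(math.log(abs(x)), x)
    print(f"x={x:>10}: pseudolog={pseudolog(x):.6f}  sign(x)*ln|x|={reference:.6f}")

# Small |x|: asinh(x/2) is close to the straight line x/2.
for x in (1e-3, -1e-3):
    print(f"x={x:>10}: pseudolog={pseudolog(x):.9f}  x/2={x / 2:.9f}")
```

Note that near zero the slope is 1/2 rather than 1, because of the division by 2 inside the asinh.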

Ideally, this should provide location and scale parameters that can either be tuned or fixed at loc = 0 and scale = 1, making the transformation x -> arcsinh((x + loc) / (2 * scale)).

@solegalli
Collaborator

Hi Carlos,

Thanks for the suggestion. I am not familiar with this transformation, so I don't understand what you mean by location and scale parameters and tune to 0/1.

Do you have a resource with more details about this transformation that you could share? Like when is it used? who developed it, or whatever you have at hand? We would need that in any case to create the documentation.

Thank you!

@ParadaCarleton
Author

ParadaCarleton commented Nov 1, 2023

> Thanks for the suggestion. I am not familiar with this transformation, so I don't understand what you mean by location and scale parameters and tune to 0/1.

You can find more information here or here.

By location and scale parameters, I just mean that the transformation is of the form:

x -> asinh( (x + loc) / scale / 2)

This has two parameters, loc and scale, which need to be estimated (usually by maximum likelihood).

However, people will sometimes set loc to 0, giving a simplified transform of the form:

x -> asinh(x / scale / 2)

This has only one estimated parameter (scale).

Some people will even set scale to 1, just giving asinh(x/2).
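
The three variants above can be sketched as a single function with defaults, together with its inverse. This is only an illustration using NumPy; the names `pseudolog` and `inverse_pseudolog` and the example values are assumptions, not part of any existing API.

```python
import numpy as np


def pseudolog(x, loc=0.0, scale=1.0):
    """Hypothetical helper: asinh((x + loc) / scale / 2), elementwise."""
    x = np.asarray(x, dtype=float)
    return np.arcsinh((x + loc) / (2.0 * scale))


def inverse_pseudolog(y, loc=0.0, scale=1.0):
    """Inverse of the sketch above: x = 2 * scale * sinh(y) - loc."""
    y = np.asarray(y, dtype=float)
    return 2.0 * scale * np.sinh(y) - loc


# Made-up example values with both signs (think of net worth).
net_worth = np.array([-250_000.0, -1_000.0, 0.0, 1_000.0, 250_000.0])

# loc = 0 gives the one-parameter form; scale = 1 as well gives asinh(x / 2).
transformed = pseudolog(net_worth, loc=0.0, scale=1_000.0)
recovered = inverse_pseudolog(transformed, loc=0.0, scale=1_000.0)
print(transformed)
print(recovered)
```

With loc = 0 the transform is odd, so values of equal magnitude and opposite sign map to opposite values, and zero maps to zero.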

@glevv
Contributor

glevv commented Nov 4, 2023

This one is interesting, but is it numerically stable?

@ParadaCarleton
Author

> This one is interesting, but is it numerically stable?

Yes, there shouldn't be any problems with it. The only potential numerical issue is in fitting loc and scale, which may be difficult if the data aren't scaled and mean-centered. This should probably be mentioned in the docs.
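
For the evaluation of the function itself (not the fitting of loc and scale), a small sketch can illustrate the point: a library asinh such as `np.arcsinh` stays finite for very large inputs of either sign, whereas a hand-rolled version of the textbook identity log(u + sqrt(u^2 + 1)) does not. The function `naive_asinh` below is purely illustrative.

```python
import numpy as np


def naive_asinh(u):
    """Textbook identity, written naively; shown only for comparison."""
    return np.log(u + np.sqrt(u * u + 1.0))


u = np.array([-1e10, -1.0, 0.0, 1.0, 1e10, 1e300])

# The naive form cancels to log(0) for large negative inputs and
# overflows in u * u for very large inputs, so silence those warnings.
with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
    naive = naive_asinh(u)

stable = np.arcsinh(u)

print("naive :", naive)
print("stable:", stable)
```

So as long as the transformer calls the library routine rather than the explicit logarithm formula, the forward transform should be safe. The inverse (sinh) can still overflow for large transformed values, which is a separate consideration.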

@solegalli
Collaborator

Hey guys! Thank you for the links and discussion. It looks good to me. Would you like to give it a go at drafting a class?

@ParadaCarleton
Author

> Hey guys! Thank you for the links and discussion. It looks good to me. Would you like to give it a go at drafting a class?

I think so, but I'm a bit stuck on how to do fitting, in that there are two approaches:

  1. Choose the parameters that maximize the normality of the transformed predictor variable. (Easy, but not as accurate.)
  2. Maximum likelihood/minimum loss estimation, where we estimate the scale parameter by minimizing the loss in the predictions. (More principled and more accurate.)
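
One possible reading of approach 1 is sketched below, under assumptions: "normality" is measured by the Gaussian profile log-likelihood of the transformed values, including the log-Jacobian of the transform, and scale is picked from a grid with loc fixed at 0. The function names, the grid, and the synthetic data are all illustrative and not taken from this thread.

```python
import numpy as np


def gaussian_profile_loglik(x, scale, loc=0.0):
    """Profile log-likelihood (up to a constant), assuming
    y = asinh((x + loc) / (2 * scale)) is normal with free mean and variance."""
    u = x + loc
    y = np.arcsinh(u / (2.0 * scale))
    # dy/dx = 1 / sqrt(4 * scale^2 + u^2), so this is log|dy/dx| per point.
    log_jacobian = -0.5 * np.log(4.0 * scale**2 + u**2)
    return -0.5 * x.size * np.log(y.var()) + log_jacobian.sum()


def fit_scale(x, candidates, loc=0.0):
    """Return the candidate scale with the highest profile log-likelihood."""
    scores = [gaussian_profile_loglik(x, s, loc) for s in candidates]
    return candidates[int(np.argmax(scores))]


# Synthetic data whose pseudolog is exactly normal at scale = 3.
rng = np.random.default_rng(0)
true_scale = 3.0
x = 2.0 * true_scale * np.sinh(rng.normal(0.0, 2.0, size=5000))

candidates = np.logspace(-2, 3, 101)
estimated_scale = fit_scale(x, candidates)
print("estimated scale:", estimated_scale)
```

A grid is used here only to keep the sketch dependency-free; a one-dimensional optimizer over log(scale) would be a natural replacement.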

I think I've worked out how to do 1, but not how to do 2, or whether it's even possible to do using the sklearn API.
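
For approach 2, one option that fits the sklearn API (an assumption on my part, not something settled in this thread) is to treat scale as a hyperparameter of the transformer and let cross-validation choose the value that minimizes the downstream prediction loss. The class name `PseudoLog`, the candidate grid, and the synthetic data below are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class PseudoLog(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: asinh((X + loc) / (2 * scale))."""

    def __init__(self, loc=0.0, scale=1.0):
        self.loc = loc
        self.scale = scale

    def fit(self, X, y=None):
        # Nothing is learned here; loc and scale are tuned from outside.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.arcsinh((X + self.loc) / (2.0 * self.scale))


# Synthetic regression problem: y is linear in asinh(X / (2 * 5)).
rng = np.random.default_rng(0)
z = rng.normal(0.0, 2.0, size=(2000, 1))
X = 2.0 * 5.0 * np.sinh(z)
y = 3.0 * z[:, 0] + rng.normal(0.0, 0.1, size=2000)

pipe = Pipeline([("pseudolog", PseudoLog()), ("reg", LinearRegression())])
grid = GridSearchCV(
    pipe,
    param_grid={"pseudolog__scale": [0.05, 0.5, 5.0, 50.0, 500.0]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
best_scale = grid.best_params_["pseudolog__scale"]
print("scale chosen by cross-validation:", best_scale)
```

This is minimum-loss selection over a grid rather than a gradient-based fit, but it needs nothing beyond `Pipeline` and `GridSearchCV`, and loc could be added to the grid in the same way.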

@ParadaCarleton
Author

@solegalli do you know how I can add a new transformer to the existing tests? I'm not sure where I can find the tests.

@solegalli
Collaborator

You'd probably create a new .py with your transformer within the transformation folder.

Then, you need to create another script within this folder where you'd add the tests.

Plus, you'd need to add your transformer to this file for the generic tests. Those may fail, but then I can help you troubleshoot.


3 participants