
Kernel Density Estimation: Improved Sheather-Jones algorithm #457

Open
orlandothoeny opened this issue Sep 28, 2022 · 4 comments

@orlandothoeny

The KernelDensityEstimation class currently falls back to the normal distribution approximation bandwidth estimator (see KernelDensityEstimation::getDefaultBandwith()) when no bandwidth is passed to the constructor.
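For comparison, here is a minimal sketch of a normal-reference ("rule of thumb") bandwidth in plain PHP. The thread does not spell out the formula behind getDefaultBandwith(). This sketch assumes the common form h = (4σ̂⁵/(3n))^(1/5), where σ̂ is the sample standard deviation. The name normalReferenceBandwidth() is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not MathPHP's code): normal-reference bandwidth
 * h = (4 * sigma^5 / (3n))^(1/5), with sigma the sample standard deviation (n - 1 denominator).
 */
function normalReferenceBandwidth(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points.');
    }
    $mean = array_sum($data) / $n;
    $sumSquares = 0.0;
    foreach ($data as $x) {
        $sumSquares += ($x - $mean) ** 2;
    }
    $sigma = sqrt($sumSquares / ($n - 1));
    return (4 * $sigma ** 5 / (3 * $n)) ** (1 / 5);
}
```

Since (4/3)^(1/5) ≈ 1.06, this is the familiar h ≈ 1.06 σ̂ n^(−1/5). The rule is derived for Gaussian data, so it tends to oversmooth multimodal samples.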

It would be useful to be able to choose the Improved Sheather-Jones (ISJ) algorithm as the bandwidth function, especially when working with datasets that are not normally distributed.
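The following is a rough sketch of what an ISJ helper could look like in plain PHP. It follows the structure of the reference MATLAB code (kde.m) that accompanies Botev, Grotowski and Kroese (2010), "Kernel density estimation via diffusion". The steps are: bin the data on a padded grid, take a discrete cosine transform, and solve the fixed-point equation t = ξγ^[l](t) for the squared bandwidth on the unit grid. Several details are assumptions of this sketch: the function name isjBandwidth, the 1024-point grid, the direct (non-FFT) cosine transform, and the bisection root finder. Results may differ slightly from other implementations (for example KDEpy's) because of binning and grid choices.

```php
<?php
/**
 * Hypothetical helper, not part of MathPHP: Improved Sheather-Jones (ISJ) bandwidth,
 * following the structure of Botev et al.'s reference kde.m. Uses a direct
 * O(grid * bins) cosine transform instead of an FFT, and bisection for the root.
 *
 * @param float[] $data sample with at least two distinct values
 * @param int     $grid number of grid points
 */
function isjBandwidth(array $data, int $grid = 1024): float
{
    $min = min($data);
    $max = max($data);
    $range = $max - $min;
    if ($range <= 0.0) {
        throw new InvalidArgumentException('Data must contain at least two distinct values.');
    }

    // Grid over [lo, lo + R], padded by 10% of the data range on each side (as in kde.m).
    $lo = $min - $range / 10;
    $R  = ($max + $range / 10) - $lo;
    $dx = $R / ($grid - 1);

    // Histogram on the grid; weights are count / n, so they sum to 1.
    $hist = array_fill(0, $grid, 0.0);
    foreach ($data as $x) {
        $hist[min($grid - 1, (int) floor(($x - $lo) / $dx))] += 1.0;
    }
    $total = count($data);
    $bins = array_filter($hist);                   // non-empty bins only (key = bin index j)
    $N = count(array_unique($data, SORT_REGULAR)); // number of distinct values, as in kde.m

    // I[k] = k^2 and a2[k] = (sum_j w_j cos(pi k (2j+1) / (2 grid)))^2 for k = 1..grid-1,
    // i.e. squared DCT-II coefficients of the binned data.
    $I = [];
    $a2 = [];
    for ($k = 1; $k < $grid; $k++) {
        $c = 0.0;
        foreach ($bins as $j => $count) {
            $c += ($count / $total) * cos(M_PI * $k * (2 * $j + 1) / (2 * $grid));
        }
        $I[] = (float) ($k * $k);
        $a2[] = $c * $c;
    }

    // 2 * pi^(2s) * sum_k I_k^s * a2_k * exp(-I_k * pi^2 * time)
    $functional = function (int $s, float $time) use ($I, $a2): float {
        $acc = 0.0;
        foreach ($I as $i => $k2) {
            $acc += $k2 ** $s * $a2[$i] * exp(-$k2 * M_PI ** 2 * $time);
        }
        return 2.0 * M_PI ** (2 * $s) * $acc;
    };

    // t - xi * gamma^[l](t) with l = 7; its root t* is the squared bandwidth on the unit grid.
    $fixedPoint = function (float $t) use ($functional, $N): float {
        $f = $functional(7, $t);
        for ($s = 6; $s >= 2; $s--) {
            $K0 = 1.0;
            for ($m = 1; $m <= 2 * $s - 1; $m += 2) {
                $K0 *= $m; // double factorial (2s-1)!!
            }
            $K0 /= sqrt(2 * M_PI);
            $const = (1 + 0.5 ** ($s + 0.5)) / 3;
            $time = (2 * $const * $K0 / $N / $f) ** (2 / (3 + 2 * $s));
            $f = $functional($s, $time);
        }
        return $t - (2 * $N * sqrt(M_PI) * $f) ** (-2 / 5);
    };

    // Bracket the root on [0, tol] (initial tol as in kde.m, doubled up to 0.1), then bisect.
    $tol = 1e-12 + 0.01 * (max(50, min(1050, $N)) - 50) / 1000;
    while ($fixedPoint($tol) <= 0.0) {
        if ($tol >= 0.1) {
            throw new RuntimeException('No root of the ISJ fixed-point equation in [0, 0.1].');
        }
        $tol = min(2 * $tol, 0.1);
    }
    $a = 0.0;  // fixedPoint(0) < 0
    $b = $tol; // fixedPoint(tol) > 0
    while ($b - $a > 1e-15) {
        $mid = ($a + $b) / 2;
        if ($fixedPoint($mid) > 0.0) {
            $b = $mid;
        } else {
            $a = $mid;
        }
    }
    return sqrt(($a + $b) / 2) * $R;
}
```

The result is a plain float. It could therefore be passed to the existing float bandwidth parameter without changing the class.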

Some resources about Sheather-Jones:

@markrogoyski
Owner

Hi @orlandothoeny,

Thanks for your interest in MathPHP.

Thanks for suggesting a new kernel function as a feature improvement. We'll look into it and see whether this is something we can add.

In the meantime, you can add your own custom kernel function by supplying a PHP callable to the setKernelFunction method of a KernelDensityEstimation object.
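Below is a minimal sketch of such a callable. The cosine kernel is an illustrative choice, not something taken from MathPHP. The sketch assumes the KDE calls the kernel with the scaled distance u = (x − xᵢ)/h and expects K(u) back. It also checks numerically that the kernel integrates to 1.

```php
<?php
// Hypothetical custom kernel (illustrative, not from MathPHP):
// the cosine kernel K(u) = (pi/4) * cos(pi*u/2) for |u| <= 1, and 0 otherwise.
// Assumption: the KDE passes the scaled distance u = (x - x_i) / h.
$cosineKernel = function (float $u): float {
    return abs($u) <= 1.0 ? (M_PI / 4) * cos(M_PI * $u / 2) : 0.0;
};

// Sanity check: a kernel should integrate to 1.
// Midpoint rule over [-1, 1], where the kernel is non-zero.
$steps = 10000;
$width = 2.0 / $steps;
$area = 0.0;
for ($i = 0; $i < $steps; $i++) {
    $area += $cosineKernel(-1.0 + ($i + 0.5) * $width) * $width;
}
printf("integral ~ %.6f\n", $area);

// With MathPHP, the closure would then be handed to an existing KernelDensityEstimation
// object, e.g. $kde->setKernelFunction($cosineKernel);
```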

Mark

@Beakerboy
Contributor

@markrogoyski I believe this request refers to the bandwidth, not the kernel. Currently the bandwidth parameter accepts a float or null; if it is null, a default bandwidth is calculated and used.

@orlandothoeny if you are able to implement the calculation, we could easily add it as a static method, so that a user could call something like:

```php
// ISJBandwidth() is the proposed static method; it does not exist in MathPHP yet
$bandwidth = KernelDensityEstimation::ISJBandwidth($data);
$kde->setBandwidth($bandwidth);
```

This would be the most backward-compatible strategy.

@orlandothoeny
Author

@Beakerboy Yes, that's correct. This would be an additional method for calculating the bandwidth.

That would be one option with regard to backward compatibility. Another option would be to allow callables as an additional type for the $bandwith parameter, but the option you described is probably simpler.
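Here is a sketch of the callable alternative. It uses a hypothetical KdeSketch class rather than MathPHP's actual KernelDensityEstimation. Its setBandwidth() accepts either a positive number or a callable that computes a bandwidth from the data. The existing "null means default" behaviour is left out for brevity.

```php
<?php
/**
 * Hypothetical sketch (not MathPHP code): the bandwidth may be a positive number
 * or a callable that computes one from the data.
 */
final class KdeSketch
{
    private array $data;
    private float $h;

    public function __construct(array $data, $bandwidth)
    {
        $this->data = $data;
        $this->setBandwidth($bandwidth);
    }

    /** @param float|int|callable $bandwidth */
    public function setBandwidth($bandwidth): void
    {
        // A callable (e.g. an ISJ or rule-of-thumb selector) is evaluated once on the data.
        $h = is_callable($bandwidth) ? $bandwidth($this->data) : $bandwidth;
        if (!is_int($h) && !is_float($h)) {
            throw new InvalidArgumentException('Bandwidth must be a number or a callable returning one.');
        }
        if ($h <= 0) {
            throw new InvalidArgumentException("Bandwidth must be > 0, got $h.");
        }
        $this->h = (float) $h;
    }

    public function getBandwidth(): float
    {
        return $this->h;
    }
}
```

Plain float arguments keep working unchanged. The trade-off against the static-method approach is a looser parameter type that has to be checked at runtime.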

I'd have to brush up on my math a bit to implement it myself; a few years have passed since I last used that stuff :)
I'm not sure I'll have the time to do that, though.

I understand that it's an open-source project, so no pressure on you guys. It's your free time. But if someone wants to implement it, I'd be grateful.

@markrogoyski
Owner

@orlandothoeny,

What could help speed up an implementation is providing test data to write unit tests against.

For example:

  • GIVEN input X
  • WHEN computing the KDE
  • THEN the result is Y

Having data to write unit tests against gives us confidence that we are building the right calculation.
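The following standalone check illustrates that GIVEN/WHEN/THEN shape for a Gaussian-kernel KDE. Its expected value can be derived by hand rather than taken from a reference tool. gaussianKde() is a hypothetical helper, not MathPHP's implementation. For ISJ itself, the expected values would still need to come from a trusted tool.

```php
<?php
// Hypothetical helper (not MathPHP's implementation): Gaussian-kernel KDE
// f(x) = 1/(n h) * sum_i phi((x - x_i) / h), with phi the standard normal density.
function gaussianKde(array $data, float $h, float $x): float
{
    $sum = 0.0;
    foreach ($data as $xi) {
        $u = ($x - $xi) / $h;
        $sum += exp(-0.5 * $u * $u) / sqrt(2 * M_PI);
    }
    return $sum / (count($data) * $h);
}

// GIVEN data points -1 and 1 and bandwidth h = 1
$data = [-1.0, 1.0];
$h = 1.0;
// WHEN computing the KDE at x = 0
$y = gaussianKde($data, $h, 0.0);
// THEN the result is phi(1) = exp(-1/2) / sqrt(2 pi), since both points are one bandwidth away
$expected = exp(-0.5) / sqrt(2 * M_PI);
if (abs($y - $expected) > 1e-12) {
    throw new RuntimeException("Expected $expected, got $y");
}
```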

Another option is to research and provide instructions for producing test data with a trustworthy tool such as R or NumPy.
