Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KS Tester Class #109

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

KS Tester Class #109

wants to merge 1 commit into from

Conversation

RylieWeaver
Copy link
Contributor

Attempting to implement a KolmogorovSmirnovTester class, which will
(1) Mainly allow us to abstract away different function names for each distribution's 'observed_pdf' and
(2) Give a good framework for other types of tests to be done in the future as '______Tester'

@RylieWeaver RylieWeaver marked this pull request as draft August 29, 2024 03:59
@RylieWeaver
Copy link
Contributor Author

RylieWeaver commented Aug 29, 2024

@rileyjmurray

I've found something interesting when trying to implement the class inheritance I mentioned before. In a nutshell, it seems that C++ does not to have the same level of flexibility with inheritance as python.

In more detail, I want to be able to simply call different versions of observed_pdf and true_pdf without having to alter other functions in the class. This way, all the difference in implementation is in the generation of the pdfs. However, this will require different input arguments to observed_pdf and true_pdf for different instances of the class, and **I haven't found an equivalent notion of kwargs in C++.

Is there a good way to implement this that I could look into? Or, is this type of inheritance is more of a 'Pythonic' thing not meant for C++?

@rileyjmurray
Copy link
Contributor

Hi, @RylieWeaver! Thanks for taking the initiative and starting to work this out.

Can you paste simple example code here showing what you tried to do in C++? It's fine if the code doesn't compile. I'm just trying to understand intent.

@RylieWeaver
Copy link
Contributor Author

RylieWeaver commented Aug 29, 2024

Yes, I would say this is the key part:

    // Functions for pdf, cdf
    static std::vector<float> true_cdf(**kwargs)
    {
        true_pdf = tester.true_pdf(kwargs)
        return weights_to_cdf(true_pdf.size(), true_pdf.data());
    }
    static std::vector<float> observed_cdf(**kwargs)
    {
        observed_pdf = tester.observed_pdf(kwargs)
        return weights_to_cdf(observed_pdf.size(), observed_pdf.data());
    }

All-in-all, I would want it to look like this:

class KolmogorovSmirnovTester(**kwargs)
{
private:
    int n;               // Sample size
    double significance; // Significance level
    list kwargs;

public:
    // Constructor
    KolmogorovSmirnovTester(int n, double significance, **kwargs)
        : n(n), significance(significance), kwargs(**kwargs) {}

    // Function to check the critical value
    std::tuple<bool, double> check_critval(const std::vector<double> &cdf1, const std::vector<double> &cdf2, double critical_value) const
    {
        assert(cdf1.size() == cdf2.size());

        for (size_t i = 0; i < cdf1.size(); ++i)
        {
            double diff = std::abs(cdf1[i] - cdf2[i]);
            if (diff > critical_value)
            {
                return {false, diff};  //Test fails
            }
        }
        return {true, 0.0}; // Test passes
    }

    // Functions for pdf, cdf
    static std::vector<float> true_cdf(**kwargs)
    {
        true_pdf = tester.true_pdf(kwargs)
        return weights_to_cdf(true_pdf.size(), true_pdf.data());
    }
    static std::vector<float> observed_cdf(**kwargs)
    {
        observed_pdf = tester.observed_pdf(kwargs)
        return weights_to_cdf(observed_pdf.size(), observed_pdf.data());
    }

    // Function to run the test given the number of samples, significance level, observed_cdf, and true_cdf
    static void run_test(int n, double significance, **kwargs)
    {
        KolmogorovSmirnovTester tester(n, significance, **kwargs);
        double critical_value = KolmogorovSmirnovConstants::critical_value_rep(n, significance);
        true_cdf = tester.true_cdf(kwargs)
        observed_cdf = tester.observed_cdf(kwargs)
        auto [result, diff] = tester.check_critval(true_cdf, observed_cdf, critical_value);
        if (!result)
        {
            std::cout::endl;
            std::cout << "KS test failed with difference " << diff << " and critical value " << critical_value << std::endl;
            std::cout << "Test parameters: " << "n=" << n << " " << "significance=" << significance << std::endl;
        }
    }

    // Virtual destructor
    virtual ~KolmogorovSmirnovTester() = default;
};

This way, I can pass through the kwargs and the only difference for how an instance would be executed would be how it calculates true_pdf and observed_pdf.

@RylieWeaver
Copy link
Contributor Author

@rileyjmurray Oops, forgot to tag you here originally and been pretty busy the last 2 weeks. In any case, let me know if you think this type of inheritance is well-suited to C++. If not, we can talk about the next thing for me to take on. I have been thinking about FFT-based sketching in particular.

Also, congrats on the release! And thanks for the shoutout.

@rileyjmurray
Copy link
Contributor

Hi @RylieWeaver, thanks for the ping. I'm scrambling with deadlines for the end of Sandia's fiscal year. Rest assured I haven't forgotten about this PR, or your broader interest in the project!

Michael Mahoney has someone in his lab who's started work on SRHTs.

If you're looking for something "functional" to work on (as opposed to testing infrastructure), I'd like to recommend exposing dedicated functions for sparse matrix-vector multiply (SPMV). I allude to this as something that's useful in the context of sketch_vector on FAQ page, under the heading "No support for negative values of “incx” or “incy” in sketch_vector." But it's actually of broader use than that. I mention a starting point for SPMV here: https://github.com/BallisticLA/RandBLAS/blob/main/test/test_matmul_wrappers/DevNotes.md#tests-for-sketch_vector.

@RylieWeaver
Copy link
Contributor Author

@rileyjmurray Hey, thanks! Not a worry, it was the past two weeks for me haha. I'll look into that and get back when I've got questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants