Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GFD mining #465

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Add GFD mining #465

wants to merge 5 commits into from

Conversation

AntonChern
Copy link
Contributor

@AntonChern AntonChern commented Sep 25, 2024

This PR implements an algorithm for mining graph functional dependencies based on article "Discovering Graph Functional Dependencies" by Fan Wenfei, Hu Chunming, Liu Xueli, and Lu Ping. Algorithm, given an input graph, returns a set of dependencies satisfied on this graph. The algorithm also has two configurable parameters: k is the maximum number of vertices in the pattern of the mined dependency and sigma is its minimum frequency.
In addition, the PR implements the ability to run the algorithm in Python, and also contains examples of its use.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 29. Check the log or trigger a new build to see more.

src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
@AntonChern AntonChern force-pushed the gfd branch 2 times, most recently from c3eeecf to a59e3cc Compare October 3, 2024 08:42
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/python_bindings/gfd/bind_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.h Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
: query_(query_), iso_(iso_), res_(res_) {}

template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
bool operator()(CorrespondenceMap1To2 f, CorrespondenceMap2To1) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second parameter is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boost Graph Library requires such syntax from callback function. Unfortunately, if I change the function signature, the code will not compile

src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
@xJoskiy
Copy link
Contributor

xJoskiy commented Oct 8, 2024

Add PR description

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
@xJoskiy
Copy link
Contributor

xJoskiy commented Oct 9, 2024

You are doing great! Please don't forget to notify when you are done with changes requested. Also please mark conversations if they are resolved

@AntonChern AntonChern force-pushed the gfd branch 2 times, most recently from 60aeadc to 9ce90a3 Compare October 16, 2024 15:46
@AntonChern AntonChern requested a review from xJoskiy October 16, 2024 15:47
src/python_bindings/gfd/bind_gfd_mining.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
Token name_token = Token(i, name.first);
for (auto& value : name.second) {
Token value_token = Token(-1, value);
result.push_back(Literal(name_token, value_token));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result.reserve(name.second.size() * attrs_info.at(label).size())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, name.second can have different number of values, depending on name.first. The reserve may not work correctly.

src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/python_bindings/bindings.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
@AntonChern AntonChern force-pushed the gfd branch 2 times, most recently from 11d6a2e to 13b76ae Compare November 5, 2024 16:38
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
examples/basic/mining_gfd/mining_gfd1.py Outdated Show resolved Hide resolved
examples/basic/mining_gfd/mining_gfd2.py Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/tests/all_paths.cpp Outdated Show resolved Hide resolved
@AntonChern AntonChern force-pushed the gfd branch 4 times, most recently from 10ff8ca to b1a77f0 Compare December 11, 2024 17:41
@xJoskiy
Copy link
Contributor

xJoskiy commented Dec 12, 2024

graph_descriptor.h structures and aliases should not be in global namespace, add them to gfd namespace

@xJoskiy
Copy link
Contributor

xJoskiy commented Dec 12, 2024

Change commit name Correct changes to previous algorithms so it's clear, what was changed

// Defines a specific attribute of the pattern.
// The first element is the index of the vertex,
// the second is the name of the attribute.
// An alias for user convenience.
using Token = std::pair<int, std::string>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are indexes supposed to have negative values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I encode the constant token with the value -1

src/core/algorithms/gfd/gfd.h Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.h Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.h Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.h Outdated Show resolved Hide resolved
src/core/algorithms/gfd/comparator.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/comparator.h Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
std::vector<Literal> premises = {};
Gfd gfd = {pattern, premises, conclusion};
if (Validate(graph, gfd, embeddings, sigma)) {
rules.push_back({{}, l});
Copy link
Contributor

@xJoskiy xJoskiy Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid unnecessary temporary object creation

Suggested change
rules.push_back({{}, l});
rules.emplace_back({}, l);

Consider everywhere else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the suggested line gives a compiler error:
no matching function for call to ‘std::vector<std::pair<std::vector<std::pair<std::pair<int, std::__cxx11::basic_string >, std::pair<int, std::__cxx11::basic_string > > >, std::pair<std::pair<int, std::__cxx11::basic_string >, std::pair<int, std::__cxx11::basic_string > > > >::emplace_back(, gfd::Literal&)’

Copy link
Collaborator

@ol-imorozko ol-imorozko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review gfd_miner.cpp tomorrow (:

src/core/algorithms/gfd/comparator.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/comparator.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd.h Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/core/algorithms/gfd/gfd_miner.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
Comment on lines 37 to 48
std::unique_ptr<algos::GfdMiner> algorithm =
CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
algorithm->Execute();
std::vector<Gfd> gfd_list = algorithm->GfdList();
algorithm = CreateGfdMiningInstance(graph_path, {.k = 3, .sigma = 3});
algorithm->Execute();
std::size_t expected_size = gfd_list.size();
gfd_list = algorithm->GfdList();
ASSERT_EQ(expected_size, gfd_list.size());
for (std::size_t index = 0; index < gfd_list.size(); ++index) {
ASSERT_TRUE(gfds.at(index) == gfd_list.at(index));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we call CreateGfdMiningInstance and Execute two times in a row?
Should't this be just

Suggested change
std::unique_ptr<algos::GfdMiner> algorithm =
CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
algorithm->Execute();
std::vector<Gfd> gfd_list = algorithm->GfdList();
algorithm = CreateGfdMiningInstance(graph_path, {.k = 3, .sigma = 3});
algorithm->Execute();
std::size_t expected_size = gfd_list.size();
gfd_list = algorithm->GfdList();
ASSERT_EQ(expected_size, gfd_list.size());
for (std::size_t index = 0; index < gfd_list.size(); ++index) {
ASSERT_TRUE(gfds.at(index) == gfd_list.at(index));
}
std::unique_ptr<algos::GfdMiner> algorithm = CreateGfdMiningInstance(graph_path, {.k = 2, .sigma = 3});
ExecuteAndCompare(algorithm, expected_gfds);

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test tests the minimality of the dependencies, that is, independence from the choice of k. In this example, the algorithm should not return patterns with 3 vertices, since they are not minimal. So, the result with k = 2 and k = 3 should be the same

src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
src/tests/test_gfd_mining.cpp Outdated Show resolved Hide resolved
Add class for GFD miner
Add minimal GFD and GFD with multiple conclusion tests
Added two examples with searching for dependencies in small graphs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants