Skip to content

Refactor value types #93

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Sep 13, 2018
Merged

Conversation

hendrikmuhs
Copy link
Contributor

@hendrikmuhs hendrikmuhs commented Aug 22, 2018

  • Changes the template logic to give a dictionary type enum value instead of inner details
  • The need for a persistence choice is hidden in a lower layer.
  • Value Stores are separated by 3 different modes of writing: compile, merge, append merge.
  • Implement support for append merge of string dictionaries.

fixes #93

old:

todo:

  • separate value store for append merge and normal merge
  • fix string value store append merge (actually seems to be broken independent of this PR)

@coveralls
Copy link

coveralls commented Aug 22, 2018

Coverage Status

Coverage decreased (-0.09%) to 95.212% when pulling 0fcf738 on hendrikmuhs:refactor-value-types into c1343d9 on KeyviDev:master.

@hendrikmuhs
Copy link
Contributor Author

hendrikmuhs commented Sep 6, 2018

almost done, got bigger and bigger

would be good to get some 1st feedback on the changes, IMHO it simplifies usage and the code base despite the need of a bit more boiler plate code

I also see some more improvements, e.g. further separating writing and merging (requires some thought on the generator), but I prefer doing this in separate follow up PR's.

keyvi::dictionary::DictionaryCompiler<keyvi::dictionary::fsa::internal::SparseArrayPersistence<BucketT>,
keyvi::dictionary::fsa::internal::IntInnerWeightsValueStore>
compiler(value_store_params);
keyvi::dictionary::DictionaryCompiler<keyvi::dictionary::dictionary_type_t::INT_WITH_WEIGHTS> compiler(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here we can use types defined in dictionary_types.h ?

* Comparison for the string value store which does not use length prefixes
*/
template <class PersistenceT, class HashCodeTypeT = int32_t>
struct RawPointerForCompareString final {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will be removed once current StringValueStore is deprecated, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to keep the reader for some time, but the writing part can be removed as part of the reimplementation.

JSON_VALUE_STORE_DEPRECATED = 4, // !< JsonValueStoreDeprecated
JSON_VALUE_STORE = 5, // !< JsonValueStore
INT_INNER_WEIGHTS_VALUE_STORE = 6, // !< IntInnerWeightsValueStore
KEY_ONLY = 1, //!< NullValueStore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this refactoring is a good time to use scoped enums here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent Idea!

class IntInnerWeightsValueStore final : public IntInnerWeightsValueStoreBase {
public:
explicit IntInnerWeightsValueStore(const keyvi::util::parameters_t& parameters = keyvi::util::parameters_t()) {}
IntInnerWeightsValueStore(const IntInnerWeightsValueStore& that) = delete;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we delete these methods in base class, we don't need to delete them in derived ones.

*/
class IntInnerWeightsValueStore final : public IntInnerWeightsValueStoreBase {
public:
explicit IntInnerWeightsValueStore(const keyvi::util::parameters_t& parameters = keyvi::util::parameters_t()) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can implement this constructor in base class, and later just make use of constructor inheritance.


class IntInnerWeightsValueStoreAppendMerge final : public IntInnerWeightsValueStoreBase {
public:
explicit IntInnerWeightsValueStoreAppendMerge(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this constructor needed for append merger ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because the generator constructs ValueStore if it isn't passed into it, similar to the AddValue problem. We should look into this in 1 change.

@@ -99,6 +106,13 @@ class IntInnerWeightsValueStoreReader final : public IValueStoreReader {
std::string GetValueAsString(uint64_t fsa_value) const override { return std::to_string(fsa_value); }
};

template <>
struct Dict<value_store_t::INT_WITH_WEIGHTS> {
typedef IntInnerWeightsValueStore value_store_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think modern C++ is encouraging use of using statement over typedef.

GeneratorAdapterInterface<PersistenceT, ValueStoreT>::CreateGenerator(size_t size_of_keys,
const keyvi::util::parameters_t& params,
ValueStoreT* value_store) {
typename GeneratorAdapterInterface<ValueT>::AdapterPtr GeneratorAdapterInterface<ValueT>::CreateGenerator(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we are passing 2 value store related template params, maybe we can eliminate one of them ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks strange but note that ValueT is the template of the class, PersistenceT, ValueStoreT are templates of the method.

template <class PersistenceT, class ValueStoreT = fsa::internal::NullValueStore>
template <int DictionaryType = fsa::internal::value_store_t::KEY_ONLY,
typename ValueStoreMergeT = typename fsa::internal::Dict<DictionaryType>::value_store_merge_t,
typename ValueStoreAppendMergeT = typename fsa::internal::Dict<DictionaryType>::value_store_append_merge_t>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have this type deduction as a default behavior which is possible to override, or it should be enforced and be the only one ?

@@ -196,7 +196,7 @@ class Generator final {

// get value and mark final state
bool no_minimization = false;
uint64_t value_idx = value_store_->GetValue(value, &no_minimization);
uint64_t value_idx = value_store_->AddValue(value, &no_minimization);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hendrikmuhs looks like this function is not used anymore, and can be deleted. What do you think ?
(this will also eliminate passing value_store_ to generator).

Note: Probably we can do that in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is basically the reason why the value stores for merging still require the constructor and the Add(key, value) method.

Still, I would like to work on that in a separate PR.

}

} else {
if (KeyDeleted(segment_it.segmentIndex(), top_key) == false) {
Copy link
Member

@narekgharibyan narekgharibyan Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if ( ! KeyDeleted(...)) ?

}
}

void Write(std::ostream& stream) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt if we need this function for merger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's afaik not used at the moment, but that's true for the compiler as well. Not added, but just some strange merge artifact.

@hendrikmuhs
Copy link
Contributor Author

hendrikmuhs commented Sep 12, 2018

I did a couple of the suggested re-factorings.

Thanks for the review @narekgharibyan! Looks much better now. Would be nice if you can have another look. I also changed vector to use the new template typing system.

Separation is still not perfect, to be continued in #102

/**
* An IntValue store which uses the values as inner weights.
*/
class IntInnerWeightsValueStore final : public IntInnerWeightsValueStoreBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we don't need this class anymore. Maybe it's worth to delete it and just rename IntInnerWeightsValueStoreBase -> IntInnerWeightsValueStore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #102 the AddValue(...) methods will move out of the base class into here.


uint64_t GetValue(const char* p, uint64_t v, bool* no_minimization) { return v; }
class IntValueStore final : public IntValueStoreBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -37,10 +37,11 @@
namespace keyvi {
namespace vector {

template <class ValueStoreT>
template <keyvi::dictionary::fsa::internal::value_store_t DictionaryType>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truth be told I'm not sure about this change, as this adds a requirement of Dict< ... > to be defined for value store, which theoretically can be missing for some special value store, implemented for keyvi::vector only. No strong feelings though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds more like a naming issue to me: I haven't thought about vector when calling the type Dict<...> - thinking about it, it would make sense to rename. All this does is simplifying the selection of a value store.

Any suggestions for a better name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think that this class should have a member for value store reader as well, which will ultimately eliminate a need of having ValueStoreFactory as a separate class/method. Also it will ensure that someone implementing new value store, implements all it's components.

So maybe value_store_components ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, I doubt we can get rid of the factory as this is part of runtime abstraction but I think we can do some improvements

@narekgharibyan
Copy link
Member

@hendrikmuhs This PR is really great, looking forward to have this merged!
Added a few more comments, apart from that LGTM!

And more idea to be discussed later on: In a long run I see weights to be something independent from value store itself, so for example we can have completion dictionary with json values, but this will require addition of new interfaces as well.

@hendrikmuhs
Copy link
Contributor Author

I improved the naming, there were some overlaps, I hope this improves readability. The readers are also now part of the new helper type.

@narekgharibyan Can you create an issue for your idea about weights - not sure if it can really be independent or if we need a JsonValueStoreWithWeights but lets discuss it in the issue.

@hendrikmuhs hendrikmuhs merged commit 20c7895 into KeyviDev:master Sep 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants