Remove liboauthcpp #8030
Conversation
Force-pushed from c91992f to f3b1e1d
Force-pushed from f3b1e1d to c153cf6
```cpp
uint32_t digest[5];
sha1.get_digest(digest);
for (auto & b : digest)
  b = boost::core::byteswap(b);
```
- Why do you think a byte swap is necessary compared to the previous implementation? Why are implementations different?
- How critical is ExtractHash performance? Does it make sense to paste here boost's implementation and fix it to avoid bits rotation? Or maybe paste here the necessary part from liboauth implementation? Which implementation is faster, or are they the same?
> Why do you think a byte swap is necessary compared to the previous implementation?

The digest from liboauthcpp is little-endian (controlled by `SHA1_LITTLE_ENDIAN`):

```cpp
// organicmaps/3party/liboauthcpp/src/SHA1.h, lines 55-57 at 967ffc1
#if !defined(SHA1_LITTLE_ENDIAN) && !defined(SHA1_BIG_ENDIAN)
#define SHA1_LITTLE_ENDIAN
#endif
```

The digest from boost is little-endian.

> Why are implementations different?
I can only speculate, but it's probably just a preference.

> How critical is ExtractHash performance?

Not critical. It's used to verify downloaded map data in a separate thread.

> Does it make sense to paste here boost's implementation and fix it to avoid bits rotation? Or maybe paste here the necessary part from liboauth implementation?

It's up to you to decide ;)

> Which implementation is faster, or are they the same?

Since the code is not critical, I don't see a reason to write benchmarks.
> Digest from boost is little-endian.

You likely meant big-endian? Does the result depend on the current arch?

Maybe use `boost::endian::endian_reverse_inplace`?
> You likely meant big-endian?

Yes.

> Does the result depend on the current arch?

Hmm, I have no idea how to verify that.
Signed-off-by: Osyotr <[email protected]>
Force-pushed from c153cf6 to 5939751
```cpp
SHA1::Hash ExtractHash(boost::uuids::detail::sha1 & sha1)
{
  uint32_t digest[5];
```
`boost::uuids::detail::sha1::digest_type`?
```cpp
SHA1::Hash result;
static_assert(result.size() == sizeof(digest));
std::copy_n(reinterpret_cast<uint8_t const *>(digest), sizeof(digest), std::begin(result));
```
Can it be avoided in favor of writing into result directly?
I'm inclined to copy-paste one of the implementations into the codebase, but you need to decide:
Licenses are not a big deal, they're already included in copyright.txt.

Does the currently printed sha match standard Linux/mac utilities output? I've checked it with ... and the output doesn't match. Does Boost's default output without modifications match it?

@vng the checksum calculation can be safely migrated to a different algo, right? A new version will contain a new algo with newly generated checksums. Are there better/faster algos with a shorter hash?
If the algo can be easily replaced, then this PR can be merged and the replacement can be done later. If it's better to keep the same implementation, then the less ugly/faster one is preferred. The worst case is when a user downloads 65+ GB of mwm (the whole planet) and all hashes are calculated.
Why is the SHA different? The unit test shows the same sha? It should be the same for the current countries.txt
I honestly don't know why. PowerShell and various online services give the correct output.
https://github.com/Cyan4973/xxHash?tab=readme-ov-file#benchmarks
My bad, I forgot.
Thanks! What's the best/simplest that can be used?
Depends on the usage.
We don't need cryptographic strength here. Would it be hard to integrate https://xxhash.com/ ?
Here's a drop-in replacement:

```cmake
include(FetchContent)
FetchContent_Declare(
  xxHash
  GIT_REPOSITORY https://github.com/Cyan4973/xxHash.git
  GIT_TAG bbb27a5efb85b92a0486cf361a8635715a53f6ba # v0.8.2
  SOURCE_SUBDIR cmake_unofficial
  #FIND_PACKAGE_ARGS NAMES xxHash # CMake 3.24+
)
set(BUILD_SHARED_LIBS OFF)
set(XXHASH_BUILD_XXHSUM OFF)
FetchContent_MakeAvailable(xxHash)
target_link_libraries(${PROJECT_NAME} xxHash::xxhash)
```

```cpp
uint64_t Calculate(std::string const & filePath)
{
  uint32_t constexpr kFileBufferSize = 8192;
  try
  {
    base::FileData file(filePath, base::FileData::OP_READ);
    uint64_t const fileSize = file.Size();

    struct XXH3StateDeleter
    {
      void operator()(XXH3_state_t * p) const { (void)XXH3_freeState(p); }
    };
    using XXH3StatePtr = std::unique_ptr<XXH3_state_t, XXH3StateDeleter>;
    XXH3StatePtr state(XXH3_createState());
    if (!state || XXH3_64bits_reset(state.get()) != XXH_OK)
    {
      LOG(LERROR, ("Could not create XXH3 state."));
      return {};
    }

    uint64_t currSize = 0;
    unsigned char buffer[kFileBufferSize];
    while (currSize < fileSize)
    {
      auto const toRead = std::min(kFileBufferSize, static_cast<uint32_t>(fileSize - currSize));
      file.Read(currSize, buffer, toRead);
      if (XXH3_64bits_update(state.get(), buffer, toRead) != XXH_OK)
      {
        LOG(LERROR, ("Could not update XXH3 state."));
        return {};
      }
      currSize += toRead;
    }
    return XXH3_64bits_digest(state.get());
  }
  catch (Reader::Exception const & ex)
  {
    LOG(LERROR, ("Error reading file:", filePath, ex.what()));
  }
  return {};
}

uint64_t CalculateForString(std::string_view data)
{
  return XXH3_64bits(data.data(), data.size());
}
```

Note that you need to update
What are the next steps here? LGTM, but first of all we can ensure that the tests-only commit works on current master.
- Getting rid of some 3p library is good
- Introducing a change brings potential risks
- Changing to boost instead of an already working 3p lib without clear benefits brings a bit of risk
- Changing to a faster, more optimal implementation may bring benefits that outweigh the risks of introducing changes
Using FetchContent in CMake looks risky to me compared to using submodules or plain copies of the code in the main repo, because if there's no internet connection then you're stuck. Maybe we need to get used to it first.
Regarding the choice of the implementation: we don't need constexpr calculations at compile time, do we?
There is a BSD-licensed C++17 header-only implementation that can be easily copied into the main repo without any dependencies or complexities: https://github.com/RedSpah/xxhash_cpp
Changing the client's code will also require modifying the code in the python generator part, in tools/python/post_generation/hierarchy_to_countries.py:

```python
def get_mwm_hash(path, name):
    filename = os.path.join(path, f"{name}.mwm")
    h = hashlib.sha1()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return str(base64.b64encode(h.digest()), "utf-8")
```
We could likely use this drop-in replacement: https://pypi.org/project/xxhash/ , though it will require adding one more dependency to the generator.
WDYT about it?
This PR makes our codebase much better and doesn't break existing behavior. Changing the hash for countries is a more complex task and may break existing behavior. We can investigate it in a separate PR.
Depends on #8048