-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loadAt #323
base: master
Are you sure you want to change the base?
loadAt #323
Conversation
( fixed travis wrt |
Fixed CI on Windows by not forgetting to close test database before its removal. Python38 failure is due to "Failed to connect to github.com port 443" and so should be transient. interdiff--- a/src/ZODB/tests/testDemoStorage.py
+++ b/src/ZODB/tests/testDemoStorage.py
@@ -357,6 +357,10 @@ def getObjAt(at):
self.assertEqual(getObjAt(atUnlink), 2)
self.assertRaises(ZODB.POSException.POSKeyError, getObjAt, atGC)
+ # end
+ db.close()
+ zdemo.close() # closes zbase and zoverlay as well
+
def test_suite():
suite = unittest.TestSuite(( |
( rebased on top of master since #320 was merged ) |
When zodbdump input says to copy an object, we first load that object. However if object does not exist loadBefore raises POSKeyError, and when object at copied-from revision was deleted loadBefore returns None. -> Handle that explicitly to provide failure details to the user, so that instead of cryptic === RUN TestLoad/δstart=0285cbac75555580 Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 133, in <module> main() File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 129, in main return command_module.main(argv) File "<decorator-gen-6>", line 2, in main File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 103, in _ return f(*argv, **kw) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 94, in main zodbrestore(stor, asbinstream(sys.stdin), _) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 43, in zodbrestore zodbcommit(stor, at, txn) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 122, in zodbcommit _() File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 91, in _ data, _, _ = stor.loadBefore(obj.oid, p64(u64(obj.copy_from)+1)) TypeError: 'NoneType' object is not iterable xtesting.go:483: /tmp/demo009767458/δ0285cbac75555580/δ.fs: zpyrestore: exit status 1 it fails with something more understandable: === RUN TestLoad/δstart=0285cbac75555580 Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 133, in <module> main() File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 129, in main return command_module.main(argv) File "<decorator-gen-6>", line 2, in main File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 103, in _ return f(*argv, **kw) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 94, in main zodbrestore(stor, asbinstream(sys.stdin), _) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 43, in zodbrestore zodbcommit(stor, at, txn) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 129, in zodbcommit _() File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 97, in _ (stor.getName(), ashex(obj.oid), ashex(obj.copy_from))) ValueError: /tmp/demo358030847/δ0285cbac75555580/δ.fs: object 0000000000000003: copy from @0285cbac70a3d733: no data xtesting.go:483: /tmp/demo358030847/δ0285cbac75555580/δ.fs: zpyrestore: exit status 1 For the implementation it would be easier to use loadAt (zopefoundation/ZODB#323), but we don't have that yet. /reviewed-by @jerome /reviewed-on https://lab.nexedi.com/nexedi/zodbtools/merge_requests/20
loadAt is new optional storage interface that is intended to replace loadBefore with more clean and uniform semantic. Compared to loadBefore, loadAt: 1) returns data=None and serial of the removal, when loaded object was found to be deleted. loadBefore is returning only data=None in such case. This loadAt property allows to fix DemoStorage data corruption when whiteouts in overlay part were not previously correctly taken into account. zopefoundation#318 2) for regular data records, does not require storages to return next_serial, in addition to (data, serial). loadBefore requirement to return both serial and next_serial is constraining storages unnecessarily, and, while for FileStorage it is free to implement, for other storages it is not - for example for NEO and RelStorage, finding out next_serial, after looking up oid@at data record, costs one more SQL query: https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508 https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482 https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264 https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199 next_serial is not only about execution overhead - it is semantically redundant to be there and can be removed from load return. The reason I say that next_serial can be removed is that in ZODB/py the only place, that I could find, where next_serial is used on client side is in client cache (e.g. in NEO client cache), and that cache can be remade to work without using that next_serial at all. In simple words whenever after loadAt(oid, at) -> (data, serial) query, the cache can remember data for oid in [serial, at] range. Next, when invalidation message from server is received, cache entries, that had at == client_head, are extended (at -> new_head) for oids that are not present in invalidation message, while for oids that are present in invalidation message no such extension is done. This allows to maintain cache in correct state, invalidate it when there is a need to invalidate, and not to throw away cache entries that should remain live. This of course requires ZODB server to include both modified and just-created objects into invalidation messages ( zopefoundation/ZEO#160 , zopefoundation#319 ). Switching to loadAt should thus allow storages like NEO and, maybe, RelStorage, to do 2x less SQL queries on every object access. zopefoundation#318 (comment) In other words loadAt unifies return signature to always be (data, serial) instead of POSKeyError object does not exist at all None object was removed (data, serial, next_serial) regular data record used by loadBefore. This patch: - introduces new interface. - introduces ZODB.utils.loadAt helper, that uses either storage.loadAt, or, if the storage does not implement loadAt interface, tries to mimic loadAt semantic via storage.loadBefore to possible extent + emits corresponding warning. - converts MVCCAdapter to use loadAt instead of loadBefore. - changes DemoStorage to use loadAt, and this way fixes above-mentioned data corruption issue; adds corresponding test; converts DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt. - adds loadAt implementation to FileStorage and MappingStorage. - adapts other tests/code correspondingly. /cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
I decoupled this patch from #322 - that Once again @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch, please take a look. Thanks beforehand, |
So that it is generally possible to zodbrestore[1,2] zodbdumped[3] database/transactions into DemoStorage. Because without this patch, when dump contains deletion records, e.g. txn 0285cbacc06d3a4c " " user "" description "" extension "" obj 0000000000000007 delete it fails like this: Traceback (most recent call last): ... File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 94, in main zodbrestore(stor, asbinstream(sys.stdin), _) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 43, in zodbrestore zodbcommit(stor, at, txn) File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 131, in zodbcommit _() File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 118, in _ stor.deleteObject(obj.oid, current_serial(obj.oid), txn) AttributeError: 'DemoStorage' object has no attribute 'deleteObject' demo_test.go:105: demo:(file:///tmp/demo470143633/δ0285cbac258bf266/base.fs)/(file:///tmp/demo470143633/δ0285cbac258bf266/δ.fs): zpyrestore: exit status 1 To be able to implement DemoStorage.deleteObject, we have to fix FileStorage.deleteObject first: currently FileStorage.deleteObject raises POSKeyError if an object is not present in the database, but for situation where - demo=base+δ, - object is present in base, and - object is not present in δ it creates a problem because there is no way to add whiteout record for the object into δ. This change should be generally good; let me quote https://lab.nexedi.com/kirr/neo/commit/543041a3 for why: ---- 8< ---- Even though currently it is not possible to create such data record via FileStorage(py).deleteObject (because it raises POSKeyError if there is no previous object revision), being able to use such data records is semantically useful in overlayed DemoStorage settings, where δ part marks an object that exists only in base with delete record whiteout. It is also generally meaningful to be able to create "delete" record even if object was not previously existing: "deleteObject" is actually similar to "store" (and so should be better named as "storeDelete"). If one wants to store deletion, there should not be a reason to reject it, because deleteObject already has seatbelt in the form of oldserial, and if the user calls deleteObject(oid, oldserial=z64), he/she is already telling that "I know that this object does not exist in this storage (oldserial=z64), but still please create a deletion record for it". Once again this is useful in overlayed DemoStorage settings described above. For the reference, such whiteout deletion records pass ZODB/scripts/fstest just fine. ---- 8< ---- DemoStorage.deleteObject implementation is straightforward, but builds on loadAt[4] as precondition. /cc @jimfulton, @jamadden, @perrinjerome [1] https://lab.nexedi.com/nexedi/zodbtools/blob/129afa67/zodbtools/zodbrestore.py [2] https://lab.nexedi.com/nexedi/zodbtools/merge_requests/19 [3] https://lab.nexedi.com/nexedi/zodbtools/blob/129afa67/zodbtools/zodbdump.py [4] zopefoundation#323
|
For the reference - see also https://lab.nexedi.com/kirr/neo/commit/4fb6bd0a?expanded=1 which proides DemoStorage-analogue for ZODB/go with Load organization similar to loadAt-based DemoStorage as implemented in this pull-request. |
/cc @d-maurer |
Anyone? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ZODB logic relating to historical data (including MVCC) was largely centered around before
. You have changed this to at
- requiring wide spread modifications. I would much prefer to keep the before
centered approach and implement an optimized version of loadBefore
(for backward compatibility under a new name or with a new keyword parameter) instead of introducing a new loadAt
.
I suggest to keep loadBefore
(without deprecation), define loadBefore_opt
with streamlined return value and implement it by default in BaseStorage
via loadBefore
and change calls to loadBefore
to calls to loadBefore_opt
in the ZODB code base where the third return value of loadBefore
is not necessary.
src/ZODB/BaseStorage.py
Outdated
def loadBefore(self, oid, tid): | ||
"""Return most recent revision of oid before tid committed.""" | ||
return None | ||
# do not provide loadAt/loadBefore here in BaseStorage - if child forgets | ||
# to override it - storage will always return "no data" instead of failing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer raise NotImplementedError
over not defining the methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> see navytux@d968a9a8
@d-maurer suggests[1]: The ZODB logic relating to historical data (including MVCC) was largely centered around before. You have changed this to at - requiring wide spread modifications. I would much prefer to keep the before centered approach... (zopefoundation#323 (review)) So let's change "at"-based logic to "before"-based logic and rename the new method from loadAt to loadBeforeEx.
…rface not provided" Suggested by @d-maurer: zopefoundation#323 (comment)
@d-maurer suggests to keep loadBefore without deprecation (zopefoundation#323 (review)). -> Don't emit warnings about deprecating loadBefore. -> Keep the deprecation text in loadBefore interface, since loadBeforeEx should practically provide wider functionality without putting unnecessary constraint on storage implementations. In other words loadBefore deprecation is still there, but less aggressively advertised with the idea to make transition for outside-of-ZODB code to loadBeforeEx more smooth and with a bit more steps (we might want to reinstate the deprecation warnings at a later time).
Kirill Smelkov wrote at 2021-5-11 10:29 -0700:
***@***.*** commented on this pull request.
***@***.*** thanks for feedback.
> My favorite solution would be:
> - change the `loadBefore` specification regarding the handling of object deletion records.
This would make external storage implementations non-compliant but we could adapt the primary user of the method to ensure that this does not cause worse behavior than the current one.
> - support a simplified loadBefore (no third result component) variant as an optional optimization.
Would you please clarify this? I, frankly, don't understand how it will fully work. Even if we change ZODB code to expect that loadBefore can return both `None` or (`None`, `serial`), there are other storages - wrapper storages that I cited above - that call loadBefore and check for `None`. Do you suggest we change them all as well?
I had not expected many external (outside `ZODB`) users
of `loadBefore`.
I have not yet seen "deletion records" -- seems to be a newer concept,
potentially not yet widely used. Maybe, we can break the interface
with a version bump and an incompatibility note in the documentation?
...
Maybe I'm missing something, but I don't see how we could change the interface without breaking things somewhere.
All you did was initiated by a `DemoStorage` problem. You used this
to demontrate that the `loadBefore` lacks information for
"deletion record"s. I am in favor to fix this.
That's why I suggest to keep existing interface intact and, for the new behaviour, introduce new interface.
Once again, maybe I'm missing something, so would you please explain your idea in more detail?
You have understood my proposal - no need to describe it again.
It will break the interface with potential consequences for
users. Let's hear what those (or others) have to say.
...
> > In zopefoundation/ZEO#174 I argued that it is not necessary to have atomic invalidation
> > processing for the ZEO client cache because "normal" (MVCC) loads
> > load only earlier object state and will not see partially invalidated
> > cache state.
> > ...
> > Those operations may observe partial invalidation processing.
> > I do not exclude that this can have bad effects.
> While this problem was noticed via looking at the new code added by this PR,
> there are existing places in ZODB5 that use this approach:
> ...
> - `load_current` is also implemented via `loadBefore(maxtid)`:
> ...
> and `load_current` is used throughout ZODB in many places:
>
> https://github.com/search?q=load_current+repo%3Azopefoundation%2FZODB&type=code
>
> I guess the only solution to fix this is to return to process invalidations
> atomically, like I was suggesting during review of zopefoundation/ZEO#167.
Dieter, would you please comment on this?
I am again convinced that the loads mentioned in my original
comment do not require atomic invalidation processing:
they ignore the returned data and only look for the serial.
If the information comes form the cache, the result is the same
as if invalidation processing had been slightly postponed;
If a server load is necessary, it is automatically delayed
until invalidation processing is terminated (as invalidation
processing and server load are performed by the (mostly sequential)
IO thread).
I also do not think that `load_current` requires
atomic invalidation processing: `load_current` affects a single
oid; invalidation processing for other oids should not matter
as far as `load_current` is concerned.
Thus, as long as invalidation processing for a single oid is
atomic (ensured by the cache lock), `load_current` should be safe.
|
@navytux wrote:
This will not give the same cache performance as before: formerly, the cache had precise information about the currency of the object revision and for non current revisions about its validity range. Now an object revision is considered current only if it had been loaded for tid >= cache_head(at load time) (and was not invalidated since that time); however, loads for tid < cache_head can retrieve current revisions as well which is no longer recognized. For revisions loaded with tid < cache_head, the validity range is only approximated which may result in additional loads in the future. I do not know whether the performance loss is significant in practice because most loads are likely to be for tid >= cache_head. Nevertheless, I see the potential loss as an argument against the deprecation of |
@d-maurer thanks for feedback, and first of all I apologize for the delay with replying on my side. While, probably, it was my fault not to present things clearly, let me explain Imagine the situation when the database is under constant change. Transactions In order to be efficient with putting entries into the cache, the client
This organization makes sure that for "current" connections - that were not Yes, for historical connections - that are opened via For the reference: support for historic connections was added in late 2007 in 5f2b2d70a18a.
Deletion records were added in late 2008 in d2d83b88131b (see also http://web.archive.org/web/20120426053905/http://wiki.zope.org/ZODB/ExternalGC).
I really don't understand why we need to break things to fix other things, Making a major version change with compatibility breakage also does not work For the reference, where I work ZODB4 Personally I do not need, nor want, to make an experiment to see what happens if we break things. Once again I really don't understand why we need to break things to fix other things, Kirill |
For the reference: deletion records can also appear without any external garbage-collection - as a result of regular undo of object creation. |
They were known and handled at least from 2002 - see 6c7d0c429b19 and: ZODB/src/ZODB/FileStorage/fsdump.py Lines 35 to 38 in 1fb097b
|
Kirill Smelkov wrote at 2021-5-21 05:31 -0700:
... let me explain
why returning only `(data, serial)` without `next_serial` should not decrease
cache performance for current - non-historical - connections:
Imagine the situation when the database is under constant change. Transactions
are committed on server and propagate via invalidations to client.
`Client.last_tid` is potentially a bit behind on-server `last_tid`. When an
on-client thread opens a `Connection`, that connection gets assigned an `at`
that denotes the Connection's snapshot of the database state. `Connection.at`,
is potentially a bit behind `Client.last_tid`, because during the time when
that client transaction is processed, client could receive some further
invalidations from the server. At any time, there might be several "current"
on-client `Connection`s that potentially all have different `Connection.at`.
In order to be efficient with putting entries into the cache, the client
maintains `ΔTail` that keeps history of all ZODB invalidations from
`Cache.head` till the "at" of its oldest "current" `Connection`. Then, when an
object is loaded for a connection with `Connection.at` < `Cache.head`, the
client looks into `Tail` in `(Connection.at, Cache.head]`
range to find out what is the first next serial that changed the object. If
there is an entry - that entry's serial is used as `next_serial` when putting
data into the cache after the call to load. If there is no such an entry - it
means that the data, that were loaded via `at` < `Cache.head`, is still current
even at `Cache.head` state, and so the data can be put into the cache with
`next_serial=None`.
What are you describing above? How things currently work or how things
could be changed to be efficient without "end_tid" information?
I have looked at the ZEO [5] code and have not found a "tail":
the cache is updated in a completely straight forward way which
relies on `end_tid` to distinguish current from "historical" data.
What you are calling "client" above is actually a layered entity
where the cache is on a low (storage) layer and `Connection.at`
is maintained at a higher (`MVCCAdapter[Instance]`) layer.
Currently, the `MVCCAdapter` layer does not seem to have the API
to implement the scheme above. Likely, this should even be
implemented by the `MVCCAdapter` layer because all storages with
an associated cache would need it.
I believe (with you) that it is possible to get cache
management efficient even without "end_tid" information.
Unlike you, I fear that it requires significant (architectural) changes.
And until those changes have been done, `loadBefore` should not
get deprecated: `FileStorage` is still a major base storage and can
deliver `end_tid` information without any performance loss.
...
I really don't understand <ins>*why we need to break things to fix other things*</ins>,
especially when compatibility-preserving solution is possible.
The solution you currently have deprecates `loadBefore` in favor
of a method which does not provide validity range information.
As far as I see it, ZEO [5] currently relies on such information
for efficient cache management.
I see your argument that for some storages validity range
information is costly (and used only for special cases).
But this does not apply for e.g. `FileStorage`.
The use of its `loadBefore` should not be deprecated
until validity range information is no longer used for
cache management.
... in our case the benefit
of fixed DemoStorage wrt deletion records is imho low for an average ZODB user,
Indeed, this may contribute to my reluctance to approve your PR:
it is quite large and proposes a major storage API change including the
deprecation of one of its central methods -- mainly to fix
the marginal problem above.
Of course, I, too, have proposed an (even more drastic) API change
involving potential breakage: you have found an API weakness (there
are cases when it is important to distinguish between an unknown object
and one explicitly deleted and this is not API supported) and
I thought is a good idea to fix this weakness.
When I made the proposal, I was unaware of the API callers outside the ZODB.
It would be trivial for the authors of the respective code to
adapt to the change (handle `data is None` in a straight forward way);
thus, I still feel it is a good idea - even though it is backward
incompatible.
|
Dieter Maurer notes that loadBefore cannot be deprecated yet because ZEO essentially depends on the `end_tid` information returned by loadBefore to update its cache: zopefoundation#323 (comment) And to remove this dependency it would require to rework ZODB caching layer: zopefoundation#323 (comment) So we cannot deprecate loadBefore until this rework is implemented first. -> Remove general loadBefore deprecation, and emit loadBefore vs loadBeforeEx warning only when actually hitting a "deletion" record, because only that case is known to lead to data corruption.
@d-maurer, thanks for feedback. I indeed explained how things could be changed to work without You are, again, right, that the caching layer, that I explained, is indeed
Thanks.
You are probably right here. Until we don't have that efficient caching scheme I cannot allow myself to break things, but I have amended my change to Hopefully this could help us to find some common ground. Kirill P.S. And |
This PR is already approved, is there still something to do before merging it? |
@icemac, thanks for asking. @jamadden, 6 months ago, at #318 (comment), you said you "hope to find time to examine these PRs in depth "soon." 5 months ago, at #323 (comment) after @d-maurer's approval, I've asked whether you want to provide any input on this patch. Are you still interested in reviewing it? We can wait if you still plan to have a look. So could you please explicitly state what's in your mind regarding this pull request. Thanks beforehand for feedback, |
to resolve trivial conflict on CHANGES.rst * origin/master: (22 commits) Fix TypeError for fsoids (zopefoundation#351) Fix deprecation warnings occurring on Python 3.10. fix more PY3 incompatibilities in `fsstats` fix Python 3 incompatibility for `fsstats` add `fsdump/fsstats` test fsdump/fsstats improvements - add coverage combine step - first cut moving tests from Travis CI to GitHub Actions - ignore virtualenv artifacts [ci skip] tests: Run race-related tests with high frequency of switches between threads tests: Add test for load vs external invalidation race tests: Add test for open vs invalidation race fixup! doc/requirements: Require pygments < 2.6 on py2 doc/requirements: Require pygments < 2.6 on py2 fixup! buildout: Fix Sphinx install on Python2 buildout: Fix Sphinx install on Python2 Update README.rst Security fix documentation dependencies (zopefoundation#342) changes: Correct link to UnboundLocalError fsoids.py fix fsrefs: Optimize IO (take 2) (zopefoundation#340) ...
( PR refreshed by syncing to master to resolve trivial conflict on CHANGES.rst ) |
I appreciate the reminder, and I apologize for the no-show; there are various personal and professional reasons I haven't found the time to review this PR in depth. I will do my best to look at it within the next week; feel free to move ahead without me if that doesn't happen. I have some vague memories about some concerns I might have had:
|
I created some merge conflicts by merging #357, sorry. |
OK, I think I'm allowed to relay this now: The company that I co-founded and which sponsored all my open source work for the last decade or so…will, umm...not be doing that…or any development work. It's been all-hands-on-deck to focus on things that could prevent that, but at this point it seems unlikely. I still care, deeply, about these projects. They're important to me on a personal level, and I think they are important to the wider community. But my time, at this point, is no longer going to be subsidized to work on them. I sincerely hope that changes! But in the meantime, I will have to focus on finishing the most important work I had outstanding to the community (currently I think that means gevent and greenlet), and then I have to find some way to pay the bills. I am deeply sorry for any turmoil this has caused. I didn't want that. |
On Wed, Nov 10, 2021 at 3:53 PM Jason Madden ***@***.***> wrote:
OK, I think I'm allowed to relay this now: The company that I co-founded
and which sponsored all my open source work for the last decade or so…will,
umm...not be doing that…or any development work. It's been
all-hands-on-deck to focus on things that could prevent that, but at this
point it seems unlikely.
I'm very sorry to hear that. You've done a ton of great work! Thank you!!!
Jim
…--
Jim Fulton
http://jimfulton.info
|
zc.zlibstorage wraps base storage and includes generic __getattr__ to forward all unknown attribute access to base: https://github.com/zopefoundation/zc.zlibstorage/blob/6d5a3c75/src/zc/zlibstorage/__init__.py#L52-L53 But if base implements loadBeforeEx (zopefoundation/ZODB#323) the following scenario is then possible: ZODB sees that zlibstorage provides loadBeforeEx and loads data via loadBeforeEx instead of loadBefore, but the data are loaded not decompressed, and ZODB complains about "broken pickle". -> let's fix this by explicitly wrapping loadBeforeEx if base storage provides it.
Resolve many conflicts after zopefoundation#357. * master: Let the year float. Configuring for pure-python Specify a PyPy2 version. Lint the code. Configuring for pure-python
Sorry, but some of flake8 complaints are really stupid...
loadBeforeEx is like loadBefore, but simpler, provides better information for object delete records and can be more efficiently implemented by many storages: zopefoundation/ZODB#323 On RelStorage loadBefore is currently implemented via 3 SQL queries: 1) check whether object record exists at all 2) retrieve object state 3) retrieve serial of next object revision Compared to that loadBeforeEx is implemented via only one SQL query "2" from the above - "retrieve object state". It is exactly the same query that loadBefore uses and after the patch loadBefore actually invokes loadBeforeEx for step 2. This change was outlined in zopefoundation/ZODB#318 (comment) and zopefoundation/ZODB#318 (comment) and as explained in the first link this patch is also semantically coupled with zodb#484 This patch passes tests with both ZODB5 and with ZODB5+zopefoundation/ZODB#323: - when ran with ZODB5 it verifies that loadBefore implementation does not become broken. - when ran with ZODB5+zopefoundation/ZODB#323 it verifies that loadBeforeEx implementation is correct. For tests to pass with ZODB5+zopefoundation/ZODB#323 we also need zopefoundation/zc.zlibstorage#11 because without that fix zc.zlibstorage does not decompress data on loadBeforeEx.
@jamadden thanks for feedback. And I'm very sorry to hear that things are unfortunate on your side. In the meantime please feel to skip what I will post below for the record.
@icemac, no problem. I resolved those conflicts. Kirill |
zc.zlibstorage wraps base storage and includes generic __getattr__ to forward all unknown attribute access to base: https://github.com/zopefoundation/zc.zlibstorage/blob/6d5a3c75/src/zc/zlibstorage/__init__.py#L52-L53 But if base implements loadBeforeEx (zopefoundation/ZODB#323) the following scenario is then possible: ZODB sees that zlibstorage provides loadBeforeEx and loads data via loadBeforeEx instead of loadBefore, but the data are loaded not decompressed, and ZODB complains about "broken pickle". -> let's fix this by explicitly wrapping loadBeforeEx if base storage provides it.
A bit of argument to maybe reconsider |
* master: - vb [ci skip] - force recognizing ellipsis - try to fix changed traceback representation - select more specific pypy versions - prepare release 5.7.0 - revert 5b8e2dc
( synced to master to resolve conflict on CHANGES.rst ) |
loadAt is new optional storage interface that is intended to replace loadBefore
with more clean and uniform semantic. Compared to loadBefore, loadAt:
returns data=None and serial of the removal, when loaded object was found to
be deleted. loadBefore is returning only data=None in such case. This loadAt
property allows to fix DemoStorage data corruption when whiteouts in overlay
part were not previously correctly taken into account.
DemoStorage does not take whiteouts into account -> leading to data corruption #318
for regular data records, does not require storages to return next_serial,
in addition to (data, serial). loadBefore requirement to return both
serial and next_serial is constraining storages unnecessarily, and,
while for FileStorage it is free to implement, for other storages it is
not - for example for NEO and RelStorage, finding out next_serial, after
looking up oid@at data record, costs one more SQL query:
https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508
https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482
https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264
https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199
next_serial is not only about execution overhead - it is semantically
redundant to be there and can be removed from load return. The reason
I say that next_serial can be removed is that in ZODB/py the only place,
that I could find, where next_serial is used on client side is in client
cache (e.g. in NEO client cache), and that cache can be remade to
work without using that next_serial at all. In simple words whenever
after
query, the cache can remember data for oid in [serial, at] range.
Next, when invalidation message from server is received, cache entries,
that had at == client_head, are extended (at -> new_head) for oids that
are not present in invalidation message, while for oids that are present
in invalidation message no such extension is done. This allows to
maintain cache in correct state, invalidate it when there is a need to
invalidate, and not to throw away cache entries that should remain live.
This of course requires ZODB server to include both modified and
just-created objects into invalidation messages
( Include both modified and just created objects into invalidations ZEO#160 ,
interface: Require invalidations to be called with full set of objects and not to skip transactions #319 ).
Switching to loadAt should thus allow storages like NEO and, maybe,
RelStorage, to do 2x less SQL queries on every object access.
DemoStorage does not take whiteouts into account -> leading to data corruption #318 (comment)
In other words loadAt unifies return signature to always be
instead of
used by loadBefore.
This patch:
or, if the storage does not implement loadAt interface, tries to mimic
loadAt semantic via storage.loadBefore to possible extent + emits
corresponding warning.
data corruption issue; adds corresponding test; converts
DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt.
/cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch