Long make calls lock table metadata
## Bug Report

### Description
A client holds the table metadata lock for the entire duration of a `make` function call. When other clients attempt to drop or declare child tables, those calls block until the first client's `make` finishes. This approach scales poorly with the number of clients and the number of child tables.
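As a toy illustration (pure Python, no MySQL; the function names, sleeps, and the `threading.Lock` standing in for the metadata lock are all invented for the sketch), a lock held for the whole of a slow `make` forces a concurrent drop to wait:

```python
import threading
import time

metadata_lock = threading.Lock()  # stands in for the table's metadata lock
events = []

def slow_make():
    with metadata_lock:      # lock held for the entire make call
        events.append("make started")
        time.sleep(0.2)      # long computation inside the transaction
        events.append("make finished")

def drop_child_table():
    time.sleep(0.05)         # arrives while make is still running
    with metadata_lock:      # blocked until make releases the lock
        events.append("drop ran")

t1 = threading.Thread(target=slow_make)
t2 = threading.Thread(target=drop_child_table)
t1.start(); t2.start()
t1.join(); t2.join()
assert events == ["make started", "make finished", "drop ran"]
```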
### Reproducibility
Include:
- OS: Any
- Python Version: Any
- MySQL Version: Any
- MySQL Deployment Strategy: Any
- DataJoint Version: 0.14.1
- Minimum steps required to reproduce:
  - See the test case presented in https://github.com/LorenFrankLab/spyglass/issues/1030
## Proposed Solution
As an alternative to writing a `Computed.make` function, allow the user to write three functions:

- `make_fetch`, for reading inputs
- `make_compute`, which is not run in a transaction and is passed the return value of `make_fetch`
- `make_insert`, which inserts computed values using the same transaction semantics as `make`
In pseudocode, these three functions would be used in the following routine:
```python
if hasattr(table, "make"):
    return make()
else:
    assert hasattr(table, "make_fetch")
    assert hasattr(table, "make_compute")
    assert hasattr(table, "make_insert")
    input = make_fetch()
    conn.disconnect()  # I assume this disconnect step is to ensure that make_compute cannot insert?
    result = make_compute(input)
    tx = conn.start_transaction()
    input2 = make_fetch()
    if hash(serialize(input2)) == hash(serialize(input)):
        result = make_insert(result)
        tx.commit()
        return result
    else:
        print("ERROR: inputs have changed")
        tx.abort()
        return None
```
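Under this proposal, a table's single `make` would be split into the three methods. A minimal sketch, assuming a hypothetical `FilteredSignal` table (the class name and the fetch/compute/insert bodies are invented; real methods would fetch from and insert into actual DataJoint tables):

```python
class FilteredSignal:
    """Stand-in for a dj.Computed subclass using the proposed three-part make."""

    def make_fetch(self, key):
        # Read all inputs; in DataJoint this would fetch from upstream tables.
        return {"signal": [1.0, 3.0, 2.0, 5.0], "cutoff": 2.5}

    def make_compute(self, inputs):
        # Long-running work, executed outside any transaction; no inserts here.
        return [x for x in inputs["signal"] if x <= inputs["cutoff"]]

    def make_insert(self, key, result):
        # Insert computed values; in DataJoint this would run in a short transaction.
        return {"key": key, "filtered": result}

table = FilteredSignal()
inputs = table.make_fetch(key={"id": 1})
result = table.make_compute(inputs)
row = table.make_insert({"id": 1}, result)
assert row["filtered"] == [1.0, 2.0]
```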
## Additional Research and Context

### Related Issues
- https://github.com/LorenFrankLab/spyglass/issues/1030
- https://github.com/LorenFrankLab/spyglass/pull/1067
cc: @dimitri-yatsenko @ttngu207 @CBroz1 @samuelbray32 @peabody124
This will live inside `populate` and will follow all of `populate`'s conventions.
Yes, it looks correct. If we want to be fancy, we can prohibit `insert` calls in `make_fetch`, `insert` and `fetch` calls in `make_compute`, and `fetch` calls in `make_insert`.
@ethho, our blob serialization serializes most data types into binary strings. You can hash the serialized data to compare `input` to `input2`.
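For example (a minimal sketch using `hashlib` and `pickle` as a stand-in for DataJoint's blob serialization):

```python
import hashlib
import pickle

def input_hash(data):
    # Serialize then hash; pickle stands in for DataJoint's blob serialization.
    return hashlib.sha256(pickle.dumps(data)).hexdigest()

inputs = {"a": [1, 2, 3], "b": "x"}
inputs2 = {"a": [1, 2, 3], "b": "x"}   # re-fetched, unchanged inputs
changed = {"a": [1, 2, 4], "b": "x"}   # inputs modified between fetches

assert input_hash(inputs) == input_hash(inputs2)
assert input_hash(inputs) != input_hash(changed)
```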
I am following this. I see #1171. Can this issue be updated regularly as this is implemented / reaches a testable state? Thanks for taking care of this!
This is a high priority for multiple labs.
This has not been merged / solved yet, right?
Looks like the last commit was 5m ago
We gave users a "check threads" tool to check for hold-ups and see whose process might be slowing things down.