datajoint-python

Long make calls lock table metadata

Open ethho opened this issue 1 year ago • 4 comments

Bug Report

Description

A client locks table metadata for the entire duration of a make function call. When other clients attempt to drop or declare child tables, their calls block until the first client's make finishes. This approach scales poorly with the number of clients and the number of child tables.

Reproducibility

Include:

  • OS: Any
  • Python Version: Any
  • MySQL Version: Any
  • MySQL Deployment Strategy: Any
  • DataJoint Version: 0.14.1
  • Minimum steps required to reproduce:
    • See the test case presented https://github.com/LorenFrankLab/spyglass/issues/1030

Proposed Solution

As an alternative to writing a Computed.make function, allow the user to write three functions:

  1. make_fetch for reading inputs
  2. make_compute, which is not run in a transaction, and is passed the return value of make_fetch
  3. make_insert, which inserts computed values using the same transaction semantics as make.

In pseudocode, the three functions would be used in a routine like this:

if hasattr(table, "make"):
    return make()
else:
    assert hasattr(table, "make_fetch")
    assert hasattr(table, "make_compute")
    assert hasattr(table, "make_insert")
    input = make_fetch()
    conn.disconnect() # I assume this disconnect step is to ensure that make_compute cannot insert?
    result = make_compute(input)
    tx = conn.start_transaction()
    input2 = make_fetch()
    if hash(serialize(input2)) == hash(serialize(input)):
        result = make_insert(result)
        tx.commit()
        return result
    else:
        print("ERROR: inputs have changed")
        tx.abort()
        return None
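The routine above can be sketched as runnable Python. This is only an illustration under stated assumptions: `InMemoryTable` and `populate_one` are hypothetical names, not DataJoint APIs; `pickle` stands in for DataJoint's blob serialization; a `threading.Lock` stands in for a database transaction; and the serialized bytes are compared directly for brevity, where the pseudocode hashes them first.

```python
import pickle
import threading


class InMemoryTable:
    """Hypothetical stand-in for a Computed table and its upstream data."""

    def __init__(self, upstream):
        self.upstream = dict(upstream)  # key -> input value
        self.rows = {}                  # key -> computed result
        self.lock = threading.Lock()    # stand-in for a transaction

    def make_fetch(self, key):
        # Read the inputs only; no inserts here.
        return self.upstream[key]

    def make_compute(self, key, fetched):
        # Long-running work goes here, outside any transaction.
        return fetched * 2

    def make_insert(self, key, result):
        # Write the computed values; runs inside the "transaction".
        self.rows[key] = result


def populate_one(table, key):
    """Return True if the result was inserted, False if the inputs changed."""
    fetched = table.make_fetch(key)
    snapshot = pickle.dumps(fetched)            # serialized inputs, first read
    result = table.make_compute(key, fetched)   # no lock held during compute
    with table.lock:                            # short "transaction"
        if pickle.dumps(table.make_fetch(key)) != snapshot:
            return False                        # inputs changed: skip the insert
        table.make_insert(key, result)
        return True
```

With this sketch, `populate_one(InMemoryTable({"a": 21}), "a")` inserts the doubled value, while a table whose upstream value changes during `make_compute` is left without a row. The point of the design is that the lock is held only for the re-fetch and the insert, not for the computation.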

Additional Research and Context

Related Issues

  • https://github.com/LorenFrankLab/spyglass/issues/1030
  • https://github.com/LorenFrankLab/spyglass/pull/1067

cc: @dimitri-yatsenko @ttngu207 @CBroz1 @samuelbray32 @peabody124

ethho avatar Aug 21 '24 19:08 ethho

This will be inside populate and will follow all the conventions of populate.

Yes, it looks correct. If we want to be fancy, we can prohibit insert calls in make_fetch, insert and fetch calls in make_compute, and fetch calls in make_insert.

dimitri-yatsenko avatar Aug 21 '24 20:08 dimitri-yatsenko

@ethho, our blob serialization serializes most types of data into binary strings. You can use a hash of the serialized data to compare input to input2.

dimitri-yatsenko avatar Aug 21 '24 20:08 dimitri-yatsenko
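A minimal sketch of such a comparison, assuming `pickle` as a stand-in for DataJoint's blob serialization (the real serializer is not shown here) and with `inputs_digest` and `inputs_unchanged` as hypothetical helper names. It uses a `hashlib` digest rather than Python's built-in `hash`, because `hash` on `bytes` is salted per process unless `PYTHONHASHSEED` is set, so its values cannot be compared across processes or stored for later.

```python
import hashlib
import pickle


def inputs_digest(inputs):
    """Hex digest of the serialized inputs (pickle is a stand-in serializer)."""
    serialized = pickle.dumps(inputs, protocol=4)
    return hashlib.sha256(serialized).hexdigest()


def inputs_unchanged(first, second):
    """True when both fetches serialize to the same bytes."""
    return inputs_digest(first) == inputs_digest(second)
```

Note that this check is conservative: two values that compare equal can still serialize differently (for example, dicts built with a different insertion order), in which case the inputs would be reported as changed.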

I am following this. I see #1171. Can this issue be updated once this is implemented or in a testable state? Thanks for taking care of this!

horsto avatar Aug 30 '24 21:08 horsto

This is a high priority for multiple labs.

dimitri-yatsenko avatar Sep 07 '24 00:09 dimitri-yatsenko

This has not been merged / solved yet, right?

horsto avatar Jan 16 '25 23:01 horsto

This has not been merged / solved yet, right?

Looks like the last commit was 5m ago

We gave users a 'check threads' tool to check for hold-ups and see whose process might be slowing things down

CBroz1 avatar Jan 17 '25 14:01 CBroz1