
New persist() method

Open dreadatour opened this issue 1 year ago • 2 comments

Follow-up to https://github.com/iterative/datachain/issues/327

It is sometimes useful to save the intermediate state of a chain: operations are lazy, so a chain is not executed immediately and its intermediate results are not stored.

For example, if we derive both dc_filtered_1 and dc_embeddings from dc without saving the intermediate state, the dc chain will be executed twice, once for each child.
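To illustrate the double execution described above, here is a minimal toy sketch in Python. `LazyChain`, its `map`/`filter`/`collect`/`persist` methods and the `run` helper are all hypothetical stand-ins, not datachain's API; the point is only that each child of a lazy chain re-runs the parent's steps unless the parent's result is materialized once.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Step = Tuple[str, Callable[[List[int]], List[int]]]


class LazyChain:
    """Toy lazy chain (hypothetical, not datachain's API).

    Steps are only recorded; nothing runs until collect() is called.
    Every executed step name is appended to the shared `log` list.
    """

    def __init__(self, source: Callable[[], Iterable[int]],
                 steps: Optional[List[Step]] = None,
                 log: Optional[List[str]] = None) -> None:
        self._source = source
        self._steps: List[Step] = list(steps or [])
        self.log: List[str] = log if log is not None else []

    def _then(self, name: str, fn: Callable[[List[int]], List[int]]) -> "LazyChain":
        # Return a new chain with one more recorded step; self is unchanged.
        return LazyChain(self._source, self._steps + [(name, fn)], self.log)

    def map(self, name: str, f: Callable[[int], int]) -> "LazyChain":
        return self._then(name, lambda rows: [f(r) for r in rows])

    def filter(self, name: str, pred: Callable[[int], bool]) -> "LazyChain":
        return self._then(name, lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[int]:
        # Re-runs the whole recorded pipeline from the source every time.
        rows = list(self._source())
        for name, fn in self._steps:
            self.log.append(name)
            rows = fn(rows)
        return rows

    def persist(self) -> "LazyChain":
        # Run the recorded steps once and keep the result; children of the
        # returned chain start from the stored rows instead of re-running them.
        rows = self.collect()
        return LazyChain(lambda: rows, [], self.log)


def run(persist_parent: bool) -> List[str]:
    log: List[str] = []
    dc = LazyChain(lambda: range(6), log=log).map("expensive", lambda x: x * 10)
    if persist_parent:
        dc = dc.persist()
    dc_filtered_1 = dc.filter("filter", lambda x: x > 20)
    dc_embeddings = dc.map("embed", lambda x: x + 1)
    dc_filtered_1.collect()
    dc_embeddings.collect()
    return log


print(run(persist_parent=False))  # ['expensive', 'filter', 'expensive', 'embed']
print(run(persist_parent=True))   # ['expensive', 'filter', 'embed']
```

In this toy, persist() runs the pipeline eagerly and keeps the rows in memory; in datachain the intermediate result would be stored as a dataset instead, which the issue notes save() without a name can already do.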

This is already possible by calling the save() method without the name param, and there is also the exec() method, but persist() seems like a better, more explicit name for this operation.

Once the persist() method is implemented, we may want to make the name param of the save() method mandatory.

dreadatour, Aug 27 '24 07:08

How about materialise instead of persist? Just a suggestion.

mattseddon, Aug 27 '24 07:08

.persist() is the name of the method in the dataframe API standard. I think that's what we should use, assuming it works exactly as described in the standard.

rlamy, Aug 27 '24 12:08

This was added in https://github.com/iterative/datachain/pull/1029

dreadatour, May 28 '25 09:05