sqlmesh icon indicating copy to clipboard operation
sqlmesh copied to clipboard

loader.reload_needed() doesn't take into account new files

Open crericha opened this issue 2 years ago • 2 comments

crericha avatar Feb 06 '23 21:02 crericha

I can take a stab at this soon. I guess we just need to scan the directories of interest (i.e. models, macros, hooks) and do a reload if we find any path that doesn't appear in path_mtimes.

georgesittas avatar Feb 11 '23 02:02 georgesittas

I'm actually a bit skeptical about this... My idea was to do something like the following:

    def reload_needed(self) -> bool:
        """
        Checks for any modifications to the files the macros and models depend on
        since the last load.

        Returns:
            True if a modification is found; False otherwise

        Note:
            File additions and removals are considered as modifications. For example, if a new
            file is added to one of the directories of interest, then a reload needs to happen.
        """
        if any(
            path.stat().st_mtime > initial_mtime
            for path, initial_mtime in self._path_mtimes.items()
        ):
            return True

        # Scan hook, model, macro, audit directories with glob and check if all files appear in `_path_mtimes`.
        # I assumed that we care about file removals too, so that'd mean we need to do the inverse check as well.

However, this will run on every refresh command, which will cause a slowdown in the notebooks because plan, evaluate, render and dag all call refresh. This delay may be significant for big projects where we have lots of models etc.

Another approach could be to monitor the directories on (a) separate thread(s) and have a flag that indicates when we should update (i.e. a change has been made to a directory of interest).

  • https://thepythoncorner.com/posts/2019-01-13-how-to-create-a-watchdog-in-python-to-look-for-filesystem-changes/

georgesittas avatar Feb 17 '23 16:02 georgesittas