http-kernel icon indicating copy to clipboard operation
http-kernel copied to clipboard

Disable cache

Open mmoreram opened this issue 5 years ago • 26 comments

Cache is not useful anymore. In fact, it produces a lot of invalid and non expected behaviors (like having to work with dev or cleaning cache by hand each time we reload the server.

This task should be

  • Delete cache folder recreation, or, if that step is too complicated (in terms of code duplication), find a place where deleting the generated cache when it's completely filled.
  • Keep var/ folder for log files.

mmoreram avatar Sep 04 '20 09:09 mmoreram

can you please be more specific when saying "cache is not useful anymore" ?

  1. Which caching parts are not useful?
  2. What are the side effects of having the current cache logic?
  3. What is the expected behavior from the async kernel when booted?

I'm asking it because some of the cache logic in the kernel is to "warmup" (prepare) some files that the environment will be able to deal with later on. (as explained here). For example: Symfony\Bundle\FrameworkBundle\CacheWarmer\RouterCacheWarmer

nivpenso avatar Feb 02 '21 21:02 nivpenso

@nivpenso sure!

  1. In fact, all parts are now absolutely useless. Cache in Symfony is built to share pre-processed logic across requests. This is because memory is not shared among them all, so all this data that should be prebuilt like annotations, twig templates, doctrine metadata, container services architecture... is built exactly once, and written in disk in a way that you only need to require a file or call a method, and everything is done. In fact, all this pre-built data is precisely done for one and only one purpose: know how to build services with their in-memory data.
  2. What happens in Drift is that this cache is built once and again. As expected in Symfony. And each time something changes in terms of service definition, or a route... well, then Drift needs to rely on this cache logic on top of environments and console commands.
  3. As Drift shares memory across Requests, this pre-processing actions is only done once, just before the server starts listening the HTTP socket. Because it doesn't matter if we talk about 100ms or 1 second building this cache, we can delete this cache each time before starting the server (to avoid cache zombie files), and because we can delete the cache each time we want to start the server, then there's no need to have this cache. We don't even need to build it. At all.

So, what if we want to preload everything before we start listening to HTTP socket? Like something loaded from our database, or a file content, or a configuration file to avoid file reading each time we handle a request... then we can rely on this new kernel event named preload. This is a blocking event (can be blocking because the event loop has not even started), so you can do whatever you need, even using blocking actions.

You can see a bit more about this kernel.preload event in the documentation

So yes. Cache is not useful, and only adds some troubles when working on development (if you don't remember to delete the cache after some configuration changes).

The main goal of this PR is to find a way of, literally, do not generate this cache.

mmoreram avatar Feb 02 '21 21:02 mmoreram

Here is where I got so far in my investigation after digging into symfony/http-kernel

Kernel::initializeContainer() is the function that initiates the cache for the container. it first checks whether the cache dir exists or not - if not:

  1. it creates the cache directory
  2. create a lock file in the directory
  3. create annotation file (in here Kernel::buildContainer())
  4. compiling the container - this part doesn't include writing the files into the file system, and it doesn't set the kernel's container yet. It just preparing the files.
  5. dump the compiled container to the file system (cache folder)
  6. removing the lock file
  7. loading the compiled container from the file system

It is a little bit tricky to remove the cache from the file system because it used by the kernel to load the compiled container (from the file system).

I'm now investigating 2 approaches:

  1. writing the compiled container to PHP streams instead of file system
  2. immediately load all the compiled container to the kernel using eval() function

A different approach can be: just leave the cache directory as it is now, and make sure to delete it (if it exists) on kernel preload. @mmoreram WDYT?

nivpenso avatar Feb 09 '21 00:02 nivpenso

@nivpenso I like the container compilation into PHP streams. Basically because as soon as the kernel shuts down, this data is lost (expected behavior).

About the second option, I don't understand this approach. Can you explain a bit more? And about the last one, delete the whole cache on preload, can be a good option for now, yes. I would go to avoid generating it, basically because is an amount of time we can win here (not much, but this is something).

Finally, take in account that this cache is not only used by the container, but as well by Twig, Doctrine... so the last option would solve them all.

mmoreram avatar Feb 09 '21 12:02 mmoreram

I agree with you that working with streams sounds like a good solution but unfortunately, the kernel code for building the container and especially writing it into the filesystem is scattered across 2-4 different places and requires many code changes.

As of the 2nd approach, the compiled container is basically a group of PHP files that include things like configuration and other optimizations embedded within these files. So instead of dumping these files into the filesystem and load the files using require, we can use the eval() function on each one of the compiled container classes to load it into the memory. It is not that easy task of course, since the loading order is critical.

As more I think about it the more I agree with you. The 3rd approach of removing the cache directory on the server's preload state is the easiest idea to implement and will require the least changes and code duplication.

nivpenso avatar Feb 09 '21 16:02 nivpenso

Just to update it seems that the preload kernel event can't help us here because the event is being fired after the container has created.

nivpenso avatar Feb 10 '21 13:02 nivpenso

Merged! Well done!

mmoreram avatar Feb 15 '21 10:02 mmoreram

Disabling the cache in general has a big impact on tests based on KernelTestCase. The kernel is rebooted for each individual test and the container is now rebuilt each time, which takes a long time in total. In my case, 198 kernel tests ran in 10 seconds before the change, now they take over 30 minutes.

As we can see, the cache is not useless and should not be disabled, or disabling it should be optional.

raebbar avatar Feb 22 '21 09:02 raebbar

@raebbar you're. totally. right.

My appologies for that. I was completely focused on production, but in testing, working with the kernel cache, is something very useful.

We have 2 options here

  • Disabling the cache only on prod (which makes sense)
  • Adding a --no-cache flag in the server, and use it in the kernel. This change would explicitly modify the kernel adapter interface in the server package.

WDYT?

mmoreram avatar Feb 22 '21 09:02 mmoreram

@raebbar good catch!

@raebbar, Out of curiosity what is the reason to reboot the kernel for every test?

nivpenso avatar Feb 22 '21 09:02 nivpenso

@nivpenso each test should be isolated, so working with different kernels would have sense. The same happens when you have functional tests with a running Drift server, Several isolated servers would work with the same kernel definition, and rebuilding the same kernel once and again would result a dramatic decrease of performance.

mmoreram avatar Feb 22 '21 09:02 mmoreram

@nivpenso Just as mmoreram has already described, tests should be isolated in order to have a clear state at the beginning and not influence each other.

The question I am asking is whether we even need this logic for deleting the cache in the kernel. php bin/console cache:clear --env=prod --no-debug clears the cache already. If you use containers, you simply build this into the Docker entrypoint or you can always execute it directly before the php vendor/bin/server call. What do you think?

raebbar avatar Feb 22 '21 09:02 raebbar

@raebbar yes, you're right. Cache is something that is useful. But is only useful when you need velocity in terms of building and shutting up kernel instances. And this only happens in tests.

So, the question is. Why don't we just keep the kernel and that's it? Why don't we just use rm -Rf var/cache/*? Well, you're right. We could do that. But I can tell you that I've seen people an entire hour working against time, looking for the reason why a change is not seen in the server. And that was because the cache. And I've seen this scenarios dozens and dozens of times, both in Symfony and Drift.

In Symfony is something that you cannot bypass, basically because ALL requests re-build the same Kernel, and need this cache to do it efficiently. But in Drift, the cache is never reused across requests.

So. My thoughts are that, in the main Drift experience, the cache should not be present at all. That will improve the developer experience so much. In tests, cache makes sense, and I think that we should find a way (as invisible as possible) to keep using the cache only in tests.

mmoreram avatar Feb 22 '21 10:02 mmoreram

@mmoreram I understand your point ... ok let's make it convenient :-)

"Disabling the cache only on prod (which makes sense)" - doesn't solve the problem completely because I don't want to have a cache when developing locally (for example)

"Adding a --no-cache flag in the server" - With this option, I first have to look at the server package myself to decide whether it is the right way or not. You know that better.

raebbar avatar Feb 22 '21 10:02 raebbar

@raebbar what about working with cache ONLY in test environment?

mmoreram avatar Feb 22 '21 10:02 mmoreram

I think we can start with that.

Another possibility would be an environment variable to keep it flexible for all cases. By default, we would deactivate the cache unless the DRIFT_CACHE_ENABLED variable is not empty. DRIFT_CACHE_ENABLED is then set in the .env.test.

raebbar avatar Feb 22 '21 11:02 raebbar

I like this idea as well. That makes it completely configurable, no matter the environment. Said that, in that case, that option should be properly documented.

mmoreram avatar Feb 22 '21 11:02 mmoreram

Just wanted to go deeper into that isolation thing - I totally understand and agree that each test should not be affected by another. With that said, it doesn't mean that they can not share resources. An isolated environment for testing can also be done without restarting the server or rebooting the kernel each time.

Before trying to provide an in-framework solution. I think we need to understand better the problem we are facing here. The KernelTestCase class is a "Symfony" class, maybe it doesn't fit the Drift type of testing?

It is still not very clear to me why it is so important to reboot the kernel and load it again from cache instead of just working with the existing one? can you give an example of two tests that use the same kernel which one affects the other?

nivpenso avatar Feb 22 '21 12:02 nivpenso

Functional tests check the integration of the different layers of an application (from the routing to the response). Most of the application's services are used as they are. Only a few special ones may be mocked. Any service with a state can change when a test is run. If tests are added or existing ones are changed, the initial state changes for subsequent tests. You would have to reset the status manually for each test run. This is time-consuming, error-prone, sometimes impossible and requires constant maintenance when services change and creates many dependencies. It is better to start a new "clean" system for each test.

raebbar avatar Feb 22 '21 13:02 raebbar

Just a thing here.

Drift can ONLY work with stateless services. The thing to work with several kernels would be for parallelism features, but that's it.

mmoreram avatar Feb 22 '21 15:02 mmoreram

@mmoreram Just for clarification. I meant services in the sense of Symfony Container. And of course they can have a state.

raebbar avatar Feb 23 '21 06:02 raebbar

@raebbar nopes. In Drift (neither in any ReactPHP application) you can not use stateful services. Basically because you cannot guess the order of requests accesses.

The only state that a service can have in an emulation of a stack, like a list of logs, a repository or something similar.

mmoreram avatar Feb 23 '21 08:02 mmoreram

You have just given simple examples yourself. Would you say the Redis server implementation in ReactPHP by clue (https://github.com/clue/php-redis-server) has a state? Write your Varnish clone in reactphp, imaginable - state. Configs that are loaded at startup and can be changed at runtime - state. Statistics, metrics - state. DNS-Resolver mit Cache - state. Connections to external resources - state. And so on ...

I guess I know what your point was now, but saying "Drift can ONLY work with stateless services" is wrong in my opinion.

But our discussion is now a bit offtopic for this issue :-)

raebbar avatar Feb 23 '21 13:02 raebbar

@raebbar whan I talk about services, I mean this, state per request. So yes, can have state, but this state change will be shared per all request at any order.

Said that, I would go for an environment variable for this cache usage. As soon as I have a PR, I will send it here :)

mmoreram avatar Feb 28 '21 11:02 mmoreram

@raebbar @nivpenso

https://github.com/driftphp/http-kernel/pull/45

WDYT? It's a simple workaround that can work properly in dev environments.

mmoreram avatar Mar 01 '21 22:03 mmoreram

Looks good. Thank you for the implementation.

raebbar avatar Mar 02 '21 06:03 raebbar