
April 10, 2009 – 12:56pm symfony cache system: cache growing too large

Symfony has a really powerful cache mechanism, but if you turn it on and don’t configure it, it will cache one file for *every possible URL*. On a dynamic site with thousands of pages, this grows to many gigabytes very fast. This is especially a problem if you have a dynamic search feature with search-friendly URLs, where you convert query parameters to /a/friendly/path/like/this. The number of unique paths that generate content on your site is then effectively unlimited, since the URL can include anything the user types in the search box.

Changing the cache duration (or “expiration date”) doesn’t affect this issue, since there is no cron job that travels through the files and deletes old ones. Rather, when symfony encounters a file that is older than the expiration date, it replaces it with a new one. So you can (and will) end up with thousands of files in your cache, some of which may have been accessed only once and will sit there taking up precious space.
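A standalone sweep can reclaim the space those stale files occupy, though it treats the symptom rather than the cause. Here is a minimal sketch in plain PHP (7 or later); the function name sweep_stale_cache() is hypothetical and not part of symfony, and it assumes cache files end in .cache as they do in the listings in this post:

```php
<?php
// Hypothetical helper, not part of symfony: delete *.cache files under $dir
// whose modification time is more than $maxAgeSeconds in the past.
// Returns the number of files removed.
function sweep_stale_cache(string $dir, int $maxAgeSeconds): int
{
    if (!is_dir($dir)) {
        return 0;
    }

    $cutoff = time() - $maxAgeSeconds;
    $stale  = [];

    // Collect candidates first so nothing is deleted while the tree is being walked.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()
            && $file->getExtension() === 'cache'
            && $file->getMTime() < $cutoff
        ) {
            $stale[] = $file->getPathname();
        }
    }

    $removed = 0;
    foreach ($stale as $path) {
        if (unlink($path)) {
            $removed++;
        }
    }

    return $removed;
}
```

You could call something like sweep_stale_cache('/path/to/project/cache', 7 * 86400) from a nightly cron script (the path is a placeholder). It leaves directories in place and only removes old files, so it does nothing about a cached view that is stale because the underlying object changed a minute ago.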

Instead, you have to be more specific about what you want symfony to cache and what you want it to ignore. I generally *turn off* the general cache (in apps/frontend/config/cache.yml) and then go through and specify which partials I want to cache. This is useful because if you set “contextual: false” in your partial cache configuration, symfony will use the same partial across every page (say, a header or footer or a “what’s new” block) and will only store that file *once* regardless of the URL used to get to that page. Or, more specifically, it will cache that file once for every combination of parameters you pass into the partial, but we’ll get to the details in a minute.

As an example, in the main layout.php for our site, we use a partial for our navigation menu. It shows a list of cities that is populated from the database, and there is no reason to fetch this on every request. Let’s say this partial is in our app’s layout folder (i.e. not within a module context, but rather the “global” context) and is called _headerNav.php.

    <?php include_partial('global/headerNav') ?>

We can tell symfony to use/cache this same partial regardless of the context (or “URL” used to access this page). Since this is a global partial, the settings go inside apps/frontend/config/cache.yml:

_headerNav:
  enabled:     on
  contextual:  false

So now symfony will store this headerNav partial inside:

cache/[app]/[env]/template/[hostname]/all/sf_cache_partial/global/_headerNav.cache

rather than

cache/[app]/[env]/template/[hostname]/all/path/to/your/unique/url/that/is/different/every/time/_headerNav.cache

Next up, say that rather than having the same exact partial for *every page*, you have a few different contexts… a few, but not one for every URL that may access that partial. A great example is a blog. Say you want to cache the partial that shows you a blog post summary. You use this partial on your home page, on the blog category page, and the search results page. Once you generate the partial for blog ID #2, you can use that partial on all of those pages. But, you don’t want to use the same exact cached partial for different blog posts, because that obviously wouldn’t work. So, you set the cache key:

<?php include_partial('blog/post_short', array('post' => $post, 'sf_cache_key'=>$post->getId())) ?>

This tells symfony to use the same cached partial every time that sf_cache_key is the same. So, the cache directory…

cache/[app]/[env]/template/[hostname]/all/sf_cache_partial/blog/_post_short/

ends up containing only 1 file per blog post:

  103.cache  63.cache  68.cache  74.cache  80.cache  85.cache  90.cache  95.cache
  104.cache  64.cache  69.cache  76.cache  81.cache  86.cache  91.cache  96.cache

… where the cache file name is the sf_cache_key you set when you included the partial. Another nice thing about setting the sf_cache_key is that you can easily identify which cache file refers to which object. Otherwise, symfony generates a unique hash key for each cache file… and this can be a problem if you want to manually clear out specific cache files (you’ll see this at the end).

If you do this enough then your massive cache which is storing cached files for every single URL that is used to access your site ends up being consolidated into the shared sf_cache_partial directory, which won’t get nearly as large.
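To see why the file count drops, here is a small illustration in plain PHP. It only mimics the directory layout described in this post; partial_cache_path() is a hypothetical function, not symfony’s actual code, and the URLs and post IDs are made up:

```php
<?php
// Illustration only: mimics the directory layout described in this post,
// not symfony's actual implementation. All names here are hypothetical.
function partial_cache_path(string $url, string $module, string $partial, string $key, bool $contextual): string
{
    if ($contextual) {
        // Contextual: the path mirrors the URL that rendered the page.
        return 'all/' . trim($url, '/') . '/_' . $partial . '/' . $key . '.cache';
    }

    // Non-contextual: one shared location, keyed only by sf_cache_key.
    return 'all/sf_cache_partial/' . $module . '/_' . $partial . '/' . $key . '.cache';
}

// The same three posts shown on three different pages.
$urls  = ['/blog', '/blog/category/php', '/search/symfony/cache'];
$posts = [63, 64, 68];

$contextual = [];
$shared     = [];
foreach ($urls as $url) {
    foreach ($posts as $id) {
        // Array keys de-duplicate identical paths.
        $contextual[partial_cache_path($url, 'blog', 'post_short', (string) $id, true)] = true;
        $shared[partial_cache_path($url, 'blog', 'post_short', (string) $id, false)]    = true;
    }
}

echo count($contextual), " files when contextual\n"; // 9: one per URL per post
echo count($shared), " files when shared\n";         // 3: one per post
```

With contextual caching the count grows with the number of URLs times the number of posts; with a shared key it grows only with the number of posts.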

The trick is that you must set “contextual: false” for your partials… otherwise the partials end up being distributed across a directory structure that resembles the URL used to access the content.

The end goal is to keep your cache directory from being a million files large. It will *only* get that large if you have sections of your pages that are **contextual**, meaning that symfony will store them inside a directory structure that mirrors the URL that generated the cache in the first place. If you find that you still have lots of cache/…/path/to/your/unique/url/and/it/keeps/going then you should look inside those cache files to find out what is being saved there and either stop caching it, or try to figure out how you can move that into a shared cache file somewhere by setting a shared sf_cache_key.

A major problem with all this caching is that it gets really tough to keep track of what actions on the backend require you to clear out what cache on the frontend. In the case of the blog post, I want to clear out any cache for that particular post. This is what my Post::save() method looks like:

public function save($con = null) {
  $return = parent::save($con);
  $this->clearCache();
 
  return $return;
}
 
protected function clearCache()
{
  if (sfConfig::get('sf_cache'))
  {
    /* does not work cross app
    $cacheManager = sfContext::getInstance()->getViewCacheManager();
    $cacheManager->remove('@sf_cache_partial?module=blog&action=_recentPosts&sf_cache_key=1');
    $cacheManager->remove('@sf_cache_partial?module=blog&action=_tagList&sf_cache_key=1');
    $cacheManager->remove('@sf_cache_partial?module=blog&action=_post_short&sf_cache_key='.$this->getId());
    $cacheManager->remove('@sf_cache_partial?module=threeOneThird&action=_headerUpdates&sf_cache_key=1');
    $cacheManager->remove('blog/index');
    */
    $sf_root_cache_dir = sfConfig::get('sf_root_cache_dir');
    $cache_dir = $sf_root_cache_dir.'/frontend/*/template/*/all';
 
    sfToolkit::clearGlob($cache_dir.'/sf_cache_partial/blog/_post_short/'.$this->getId().'.cache');
    sfToolkit::clearGlob($cache_dir.'/sf_cache_partial/blog/_tag_list/*');
    sfToolkit::clearGlob($cache_dir.'/sf_cache_partial/blog/_recentPosts/*');
    sfToolkit::clearGlob($cache_dir.'/sf_cache_partial/threeOneThird/_headerUpdates/*');
    sfToolkit::clearGlob($cache_dir.'/third-word/index*');
 
    if ($this->getStrippedTitle()) { 
      //for some reason this doesn't work, perhaps due to cross app?
      //$cacheManager->remove('@blog_show?stripped_title='.$this->getStrippedTitle());
      // note this path is incorrect if no_script_name is off
      sfToolkit::clearGlob($cache_dir.'/third-word/read/'.$this->getStrippedTitle().'.cache');
    }
  }
}

In theory you are supposed to be able to rely on the cacheManager to remove cache entries based on a pretty path like:

    $cacheManager->remove('[module]/[action]');

or

    $cacheManager->remove('@sf_cache_partial?module=[myModule]&action=_[partialName]&sf_cache_key=1');

But if you are clearing your cache across apps (backend actions are trying to clear frontend partials), this totally breaks down because the cacheManager does not understand the routing rules for the other application. So, you have to specify the files you want to delete manually. This isn’t so bad if you use sfToolkit to remove files using a file pattern.
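If you want to see what removing files by pattern amounts to, here is a rough equivalent in plain PHP. This is a sketch and not the sfToolkit source; clear_by_pattern() is a hypothetical name and it relies only on the built-in glob() and unlink():

```php
<?php
// Hypothetical stand-in for a pattern-based delete; uses only PHP built-ins.
// Returns the list of files that were actually removed.
function clear_by_pattern(string $pattern): array
{
    $removed = [];

    // glob() returns false on error, so fall back to an empty list.
    foreach (glob($pattern) ?: [] as $path) {
        // Skip directories; only regular files are deleted.
        if (is_file($path) && unlink($path)) {
            $removed[] = $path;
        }
    }

    return $removed;
}
```

Wildcards work in the middle of the path too, so a pattern such as $cacheRoot.'/frontend/*/template/*/all/sf_cache_partial/blog/_post_short/63.cache' (with $cacheRoot standing in for your cache directory) would match that post’s file in every environment and for every hostname.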

This post has grown far too long! But hopefully it gives you some insight into how you can best utilize the caching system to speed up your site without ending up with gigabytes of useless cache files.

Posted in Web Development

16 Responses to symfony cache system: cache growing too large

  1. nicolas says:

    Nice article about caching.
    Concerning cache deletion between applications, the doc gives us a solution for Symfony 1.1 and 1.2 :
    http://www.symfony-project.org/book/1_2/12-Caching#chapter_12_sub_clearing-cache-across-applications-new-in-symfony-1-1

    Hope this will help.

  2. David says:

    Very interesting concept… I think though that your data model in no way should have to know what partials it is used in. From an OO point of view this is so far from nice that someone might consider it ugly design.

    Just to make sure you understand why this is bad: consider an application with 100+ model classes, each using something like your clearCache method. What happens if only 1 single part of the symfony cache system changes? Yes, you have to rewrite the clearCache method of 100+ classes. Good luck!

    I understand your explicit problem but clearing the cache inside the data model and having to know the names of the cache files MUST NOT be the solution. Unfortunately I haven’t played with the cache system myself and still didn’t have to use this yet. I hope I will find another approach (and if I do so, I’ll definitely publish it too).

  3. Pingback: Pages tagged "image"

  4. Yuriy Voziy says:

    Clearing cross-app cache with the cache manager was impossible in symfony 1.0. But it is possible now: http://www.symfony-project.org/book/1_2/12-Caching#chapter_12_sub_clearing-cache-across-applications-new-in-symfony-1-1

  5. Scott Meves says:

    Another useful link for people landing on this post: If your routing cache is getting too large in symfony 1.2, read this: http://www.symfony-project.org/blog/2009/04/03/lazy-routing-deserialization

  6. Scott Meves says:

    @David, I agree. In theory this type of cache clearing should occur in the controller, but the problem with that is there are potentially many different actions that affect the same objects in your model. The logical place to put an event trigger anytime an object is saved is the $myModelObj->save() method, don’t you think? Maybe the solution is to abstract this clearCache method found in each model class and create an event listener that applies to all save() methods. Then, to avoid putting any mention of templates in the model, you could create a config file that contains a list of templates that are related to each data type. An interesting problem I ran into with this is that you don’t have the option to include any custom logic for the cache-clearing for a particular model. For example, if you have a “photo” object and it may or may not be related to a “user”, if a photo was saved and it was related to a user, you may have to clear out the user profile cache too, but if it’s related to a “business”, you would want to clear out a different partial from the cache. With that said, the cost of just blindly removing all the potentially cache files whenever an object is saved is probably not worth worrying about.

  7. David says:

    @Scott: using the symfony event notification system for this is exactly what I’d also propose. Yet another configuration file for the caches is probably a solution, but I would prefer something a little less redundant.

    What about placing some kind of comment (like PHPDoc) at the beginning of each template, stating what objects are used inside this template? Of course this has to be kept up-to-date as well, but it’s much easier IMHO than having this in a different file. And there would be one more advantage: if you remove the template, the system won’t try to remove its cache any more. So you are more able to re-use stuff as long as you use the same cache cleanup mechanism everywhere.

  8. Colin says:

    I couldn’t help but notice nobody has mentioned the auto cleaning factor

    http://trac.symfony-project.org/browser/branches/1.0/lib/cache/sfFileCache.class.php#L73

    Could be that the only documentation of this setting is the source code itself :P

    – CH

  9. I am curious, wouldn’t it be easier to write a cron that lives outside of Symfony (or maybe do this as a task) and which periodically sweeps through the cache and deletes any file that is older than a certain period? That would cut down on those files that arise simply because of some random search that someone did 4 months ago. Like David, above, I’m uncomfortable having the model classes know about specific partials. That seems like a major violation of MVC principles.

    By the way, you write:

    say this partial is in our app’s layout folder and is called _headerNav.php.
    <?php include_partial(‘global/searchTop’) ?>

    Is that a typo? Shouldn’t it be “headerNav”?

  10. Scott Meves says:

    You’re right about the typo, thank you!

    A cron job would work well for the cases where you end up with lots of old cache files sitting around; symfony won’t delete them until it needs them again and sees they are too old. But it doesn’t fix the case where an object from your model is modified and you don’t want those old cached files lingering around between the time it was modified and the time the cron script runs. For example, if you had a blog post and you cached the blog/show?id=xx view, and then you made an edit to the blog post content, you’d want to see your edits right away and not wait until your cache file reached the age limit.

    So, the core of the issue is this: You need some way to link modifications of your model objects to a list of cached files that are affected by the change. As long as you listen in to when the object is modified, you can move the actual list of which files are affected out of the model and into some other class or even a listener to keep your MVC components clean.

  11. Hi, I’m trying to show the cache datetime on my template. can you help me?

    on my action I do it:


    $cacheManager = $this->getContext()->getViewCacheManager();
    $component_uri = "@sf_cache_partial?module=dashboard&action=_translations&sf_cache_key=40cd750bba9870f18aada2478b24840a";

    $this->lastModified = $cacheManager->getLastModified( $component_uri );
    $this->lifetime = $cacheManager->getLifetime( $component_uri );

    and on my template I did it:

    this cache was generated at , next update will be in seconds

    It worked fine, but the value sf_cache_key=40cd750bba9870f18aada2478b24840a is hardcoded. How can I get this key dynamically?

    thank you,

    Nei

  12. Robin Corps says:

    In Symfony 1.2 (at least) you can do the following to clear cache cross application from within an action (eg. clear frontend cache from backend app):

    $configuration = ProjectConfiguration::getApplicationConfiguration('frontend', sfContext::getInstance()->getConfiguration()->getEnvironment(), false);
    $context = sfContext::createInstance($configuration);
    $cache_manager = $context->getViewCacheManager();
    $cache_manager->remove('@sf_cache_partial?module=[your_module]&action=[your_action]&sf_cache_key=[your_id]');

    This example would clear a given partial in the frontend app from the backend app.

    Hope that helps someone!
    Robin

  13. Back on Jun 6, 2009 I pointed out a typo that you’ve since corrected. But what about the yaml example that follows? You have:

    _searchTop:
    enabled: on
    contextual: false

    Where does this _searchTop come from? This is the first mention of _searchTop on the page. Is this a typo? Should it be _headerNav?

    Anyway, this is a great post. I was just showing it to a co-worker.

  14. Scott Meves says:

    Thanks Lawrence, I think you are right! I have updated the post.

  15. EP Factory says:

    Hi,
    I’m creating a social network with 10 cultures and I have to configure the symfony cache. So I use the contextual cache with parameters (sf_culture, slug…) to cache each user page (there are more than 2000 users). But in less than 4 hours, the cache folder is more than 1.5 GB!
    Do you know an alternative to the symfony cache when the contextual cache is not sufficient?
    Thank you

  16. Scott Meves says:

    I would suggest turning off caching of any user-context cache. The savings you get from just one individual user revising their own profile during the time when the cache is fresh probably isn’t worth it. Instead, you might want to look into APC or caching your query results rather than the actual html templates.
