HTTP Accelerators and Dynamic Content
  • Hi all

    I'm very new to the concept of reverse proxies and content accelerators and I wondered if someone who knows more could explain a fundamental question I have about how they work with dynamic content (in my case PHP generated pages). I suspect I'm being boneheaded or am missing the point so any enlightenment is much appreciated! :)

    To keep things simple, suppose I have Apache, PHP, MySQL and Varnish all installed on a single slice (in the real world I'd probably have more than one slice hosting these services). Everything is set up so that an incoming HTTP request for a PHP URL hits Varnish first and then (assuming a cache miss) gets passed back to Apache and PHP for processing. As I understand it, the output generated by the PHP script is then sent to the client as assembled HTML and is cached by Varnish for some time period x (along with any CSS or images referenced by the HTML, if so desired). If another request for the same PHP URL comes in, Varnish serves the already assembled HTML, together with any images and CSS, from the cache rather than passing the request back to Apache/PHP for processing.
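
The hit/miss flow described above can be sketched roughly as follows. This is a toy model in Python, not how Varnish is implemented or configured: `TtlPageCache`, `backend` and `clock` are hypothetical names, and `backend` stands in for Apache/PHP rendering a page.

```python
import time
from typing import Callable, Dict, Tuple


class TtlPageCache:
    """Toy stand-in for an HTTP accelerator: caches rendered pages by URL."""

    def __init__(
        self,
        backend: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.backend = backend          # hypothetical: renders a page for a URL
        self.ttl_seconds = ttl_seconds  # the "x time period" from the post
        self.clock = clock              # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Tuple[str, str]:
        """Return ("HIT" or "MISS", body) for the given URL."""
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy in cache: the backend is not contacted at all.
            return "HIT", entry[1]
        # Missing or expired: ask the backend and remember the result.
        body = self.backend(url)
        self._store[url] = (now + self.ttl_seconds, body)
        return "MISS", body
```

Note that the cache key here is the URL alone, which is exactly why session-dependent pages are a problem: two users requesting the same URL would get the same stored body.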

    My question is this:

    Assuming that my understanding of HTTP acceleration/content caching is correct, I can see how the above would work for PHP pages that are designed for anonymous viewing, i.e. where limited (or no) personalisation takes place and any dynamic content is rendered exclusively from POST and GET variables. But how does this work for dynamic pages that include custom content based on session data? Is the answer that this sort of content is not cacheable, or am I missing something?

    The only answer I can come up with is that for SESSION based php pages the best you can do is cache the opcode (using APC for example). Is this right or am I making a fool of myself?

    Final question: If I am right, does this mean that you have to painstakingly tell Varnish (or whatever caching app you use) which pages create content based on session data and therefore should be excluded from caching? This seems a bit error-prone for larger sites, where you may forget to explicitly mark one or two pages that mustn't be cached.
  • I don't think you are missing anything. In order to cache pages that are personalized from session or database data (e.g. to display "Logged in as John Doe" at the top of the page), you need to have some application-aware caching logic. Memcached can be used for this type of caching. Assuming you can break your page up into logical blocks, several blocks likely do not need to be personalized on that level. You can cache the blocks independently based on unique keys, then reassemble the blocks from cache on demand. This will not be as fast as a Varnish cache hit, but will be much faster than regenerating the entire PHP page from scratch.

    In my example, the block containing the login message would be stored in memcached with a key that is unique to the user (pulled from the session, most likely), while the rest of the blocks might be stored with a key that is unique to the page.
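
A minimal sketch of the block-level caching described above, written in Python for illustration. A plain dictionary stands in for memcached, and `FragmentCache`, `render_page` and the key formats are hypothetical names chosen for this example, not part of any real library.

```python
from typing import Callable, Dict


class FragmentCache:
    """In-memory stand-in for memcached: get-or-render by string key."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.renders = 0  # counts how often a block had to be regenerated

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        cached = self._data.get(key)
        if cached is None:
            cached = render()
            self.renders += 1
            self._data[key] = cached
        return cached


def render_page(cache: FragmentCache, page_id: str,
                user_id: str, user_name: str) -> str:
    # Personalized block: the key is unique to the user.
    login = cache.get_or_render(
        f"user:{user_id}:login",
        lambda: f"Logged in as {user_name}",
    )
    # Shared block: the key is unique to the page, so all users reuse it.
    body = cache.get_or_render(
        f"page:{page_id}:body",
        lambda: f"<article>Content of {page_id}</article>",
    )
    # The page is reassembled from cached blocks on every request.
    return f"<header>{login}</header>{body}"
```

Rendering the same page for two different users regenerates the login block twice but the shared body only once, which is where the saving over a full page render comes from.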
  • Thanks artagesw. I was worried someone would say this :) So, basically my cunning lazy idea of just sticking varnish in front of apache for instant caching benefits is not going to be possible with dynamic content? Is it even possible to put varnish in front and tell it not to touch certain pages (which I would then optimise individually by hand with memcached)?

    Thanks for the heads up on memcached - I need to go and read up on this.
  • Yes, that's right. I don't have experience with Varnish in particular so I can't help you there. Most web caches do have tuning capabilities to designate things like non-cacheable pages. I'd be surprised if Varnish did not support that.
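
One common way to avoid listing every personalized page by hand is to decide per request rather than per page, for example by bypassing the cache whenever a session cookie is present. The sketch below shows that decision in Python purely as an illustration. Varnish expresses rules like this in its own configuration language (VCL), which is not shown here. The function name `is_cacheable` and the excluded path prefixes are assumptions made for this example; `PHPSESSID` is PHP's default session cookie name.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Sequence


def is_cacheable(
    method: str,
    path: str,
    headers: Mapping[str, str],
    session_cookie: str = "PHPSESSID",
    excluded_prefixes: Sequence[str] = ("/account", "/cart"),  # hypothetical
) -> bool:
    """Return True if a shared cache may answer this request."""
    # Only safe, read-only methods are candidates for caching.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Explicit opt-out list for pages known to be personalized.
    if any(path.startswith(prefix) for prefix in excluded_prefixes):
        return False
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    cookies = SimpleCookie()
    cookies.load(lowered.get("cookie", ""))
    # A session cookie means the response may be user-specific: bypass.
    return session_cookie not in cookies
```

With a rule like this, forgetting to mark a page fails safe for logged-in users: their requests go to the backend, and only cookie-less (anonymous) traffic is served from cache.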
  • Ok, thanks for the heads up. Time for me to get intimate with the varnish docs I think.