Caching

Pesto contains support for setting and parsing cache control headers, including entity tags (ETags).

Cache control headers

The pesto.caching.no_cache decorator modifies response headers to instruct browsers and caching proxies not to cache a page. See the module API docs (below) for sample usage.

ETags

The minimum you need to know about ETags

An ETag (entity tag) is an HTTP header sent by an HTTP server to identify a particular revision of a document. Typically a web server will supply an ETag along with the content on the first request. Sample HTTP response headers including an ETag might look like this:

200 OK
Content-Type: text/html; charset=UTF-8
ETag: "xyzzy"

When the browser makes subsequent requests for the same URL, it will send the ETag back in an If-None-Match header:

GET /index.html HTTP/1.1
If-None-Match: "xyzzy"

The server can then compare the If-None-Match ETag value with the current value. If they match, then rather than serving the content again, the server can reply with a 304 Not Modified status and the browser will display its cached copy.
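
The following framework-free sketch shows the server side of a conditional GET. It is not part of Pesto: the function conditional_get and its arguments are hypothetical. It also simplifies the comparison by splitting If-None-Match on commas and comparing tags as exact strings.

```python
def conditional_get(request_headers, current_etag, render_body):
    """Serve a GET request, answering 304 when If-None-Match matches.

    request_headers: dict of request headers (exact-case keys, for simplicity)
    current_etag: the quoted ETag of the current revision, e.g. '"xyzzy"'
    render_body: zero-argument callable that builds the full body
    """
    response_headers = [('ETag', current_etag)]
    if_none_match = request_headers.get('If-None-Match')
    if if_none_match is not None:
        # Simplification: split on commas and compare tags as exact strings.
        candidates = [tag.strip() for tag in if_none_match.split(',')]
        if current_etag in candidates or '*' in candidates:
            # The client's copy is current: send headers only, no body.
            return '304 Not Modified', response_headers, b''
    response_headers.append(('Content-Type', 'text/html; charset=UTF-8'))
    return '200 OK', response_headers, render_body()
```

render_body is only called when the tags differ. That is where the savings come from.
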

Any string can be used as an ETag. When serving static content you could concatenate the file’s inode, file size and modification time to generate an ETag; this is what Apache does. Alternatives include using an incrementing sequence number or a string that represents an object’s state in memory.
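
A file-based ETag along these lines might be built as shown below. file_etag is a hypothetical helper. The hex, dash-separated layout is an illustrative choice, not a copy of Apache's exact format.

```python
import os


def file_etag(path):
    """Build a quoted ETag from a file's inode, size and modification time."""
    st = os.stat(path)
    # Hex fields joined by dashes: an illustrative layout, not Apache's exact one.
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, int(st.st_mtime))
```

Because the size is part of the tag, appending to the file changes the ETag even if the modification time has not visibly changed.
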

ETags come in two flavours: strong and weak. The above examples show strong ETags. A weak ETag looks much the same, but has a W/ prefix:

ETag: W/"notsostrong"

A weak ETag signifies that the content is semantically equivalent, even if the byte-for-byte representation may have changed. The HTTP RFC gives an example of a hit counter image, which does not absolutely need to be refreshed by the client on every request.
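
The difference matters when tags are compared. The sketch below follows the strong and weak comparison functions defined by the HTTP specification (RFC 7232, section 2.3.2). The function names are hypothetical and this is not Pesto code.

```python
def split_etag(value):
    """Split an ETag header value into (weak, quoted_tag)."""
    if value.startswith('W/'):
        return True, value[2:]
    return False, value


def strong_equal(a, b):
    # Strong comparison: neither tag may be weak, and the quoted values must match.
    weak_a, tag_a = split_etag(a)
    weak_b, tag_b = split_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_equal(a, b):
    # Weak comparison: ignore any W/ prefix and compare the quoted values.
    return split_etag(a)[1] == split_etag(b)[1]
```

Under strong comparison, two tags match only if neither is weak and the quoted values are identical. Weak comparison simply ignores the W/ prefix.
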

How to add ETag support

Let’s build a hit counter application. Here’s a version without ETag support. You’ll need the Python Imaging Library (PIL) installed to try this example. Also note that this is a very simplified example that would not be suitable for a real application:

from PIL import Image, ImageDraw
from StringIO import StringIO
from pesto import to_wsgi
from pesto.response import Response

class HitCounter(object):

        current_count = 0

        def counter(self, request):

                self.current_count += 1

                img = Image.new('RGB', (50, 30))
                draw = ImageDraw.Draw(img)
                draw.text((10, 10), str(self.current_count))
                buf = StringIO()
                img.save(buf, 'PNG')

                return Response(
                        [buf.getvalue()],
                        content_type='image/png'
                )

if __name__ == '__main__':
        from wsgiref.simple_server import make_server
        counter = HitCounter()
        app = to_wsgi(counter.counter)
        httpd = make_server('', 8000, app)
        print "Now load http://localhost:8000/ in a web browser"
        httpd.serve_forever()

Save and run this script, then browse to http://localhost:8000/ and you should see a hit counter.

To make this cacheable, we need to add an ETag header to the response. Let’s suppose we only want the image to be cached for up to seven hits. We would start off by defining a method that generates an ETag to reflect this:

def hitcounter_etag(self, request):
        return self.current_count // 7
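
Here is a standalone version of the same calculation, showing how the integer division groups hits. hitcounter_etag_value is a hypothetical name; it takes the count directly instead of reading it from self.

```python
def hitcounter_etag_value(current_count):
    # Floor division buckets the count: 0-6 -> 0, 7-13 -> 1, 14-20 -> 2, ...
    return current_count // 7

print([hitcounter_etag_value(n) for n in range(15)])
# [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2]
```

Counts 0 to 6 share the ETag 0, counts 7 to 13 share 1, and so on, so the ETag only changes once every seven hits.
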

Then we can use the pesto.caching.with_etag decorator to apply this to the counter function, and pesto.caching.etag_middleware to make the application return a 304 Not Modified response when the ETag matches. I have also moved the image generation into a separate generator function so that the image is generated lazily, which means it will not be regenerated when the ETag matches:

from PIL import Image, ImageDraw
from StringIO import StringIO
from pesto import to_wsgi
from pesto.response import Response
from pesto.caching import with_etag, etag_middleware

class HitCounter(object):

        current_count = 0

        def hitcounter_etag(self, request):
                return self.current_count // 7

        @with_etag(hitcounter_etag, weak=True)
        def counter(self, request):

                self.current_count += 1

                def image():
                        yield ''
                        img = Image.new('RGB', (50, 30))
                        draw = ImageDraw.Draw(img)
                        draw.text((10, 10), str(self.current_count))
                        buf = StringIO()
                        img.save(buf, 'PNG')
                        yield buf.getvalue()

                return Response(image(), content_type='image/png')

if __name__ == '__main__':
        from wsgiref.simple_server import make_server
        counter = HitCounter()
        app = to_wsgi(counter.counter)
        app = etag_middleware(app)
        httpd = make_server('', 8000, app)
        print "Now load http://localhost:8000/ in a web browser"
        httpd.serve_forever()

Load this in your browser and examine the headers using the LiveHTTPHeaders Firefox extension or a similar tool. You will see that an ETag header has been added:

200 OK
Content-Type: image/png
ETag: W/"0"
...

Refresh a few times and you should see the server sending a 304 Not Modified response to repeated requests.

More on ETag generation

The function pesto.caching.with_etag expects to be passed a function which must return an object to be used as an ETag. It converts that object into an ETag using the following rules:

  • If passed a numeric value or short string, it is used as-is.
  • If passed a long string, an MD5 signature is computed and used as the ETag.
  • If passed any other object, the object is pickled and the MD5 signature of the pickle used as the ETag.
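
The rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not Pesto's implementation. The docs don't say how short a "short" string must be, so SHORT_STRING_LIMIT is a hypothetical cut-off. etag_value is likewise a hypothetical name.

```python
import hashlib
import numbers
import pickle

# Hypothetical cut-off: the docs do not say how long a "short" string may be.
SHORT_STRING_LIMIT = 32


def etag_value(obj):
    """Turn the result of an ETag function into an ETag string."""
    if isinstance(obj, numbers.Number):
        # Numbers are used as-is (in their string form).
        return str(obj)
    if isinstance(obj, str):
        if len(obj) <= SHORT_STRING_LIMIT:
            return obj
        # Long strings are replaced by their MD5 digest.
        return hashlib.md5(obj.encode('utf-8')).hexdigest()
    # Anything else is pickled, and the MD5 of the pickle is used.
    return hashlib.md5(pickle.dumps(obj)).hexdigest()
```
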

The pesto.caching.etag_middleware calls the WSGI handler so that it can set the ETag header. It then either aborts the response and returns a 304 Not Modified status, or proceeds with the response as normal.

If the underlying application is a pesto handler, this means the handler will be invoked and the first iteration of the content iterator called (that is, execution proceeds as far as the first yield statement in the image function above). On an ETag match the content iterator is then closed without being iterated any further.
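
Here is a rough plain-WSGI sketch of the behaviour described above. It is not Pesto's etag_middleware, and all names are hypothetical. It uses weak comparison with naive comma splitting and only answers 304 for 200 responses. It also does not support the WSGI write() callable.

```python
class _ClosingIterable(object):
    """Replays the already-fetched first chunk, then the rest, and forwards close()."""

    def __init__(self, first_chunks, iterator, result):
        self.first_chunks = first_chunks
        self.iterator = iterator
        self.result = result

    def __iter__(self):
        for chunk in self.first_chunks:
            yield chunk
        for chunk in self.iterator:
            yield chunk

    def close(self):
        close = getattr(self.result, 'close', None)
        if close is not None:
            close()


def _if_none_match_hits(etag, if_none_match):
    """Weak comparison of etag against a comma-separated If-None-Match value.

    Naive: splitting on commas breaks tags that themselves contain commas.
    """
    def opaque(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith('W/') else tag

    candidates = [opaque(t) for t in if_none_match.split(',')]
    return '*' in candidates or opaque(etag) in candidates


def etag_middleware_sketch(app):
    """Answer matching If-None-Match requests with 304 Not Modified."""

    def middleware(environ, start_response):
        captured = {}

        def deferred_start_response(status, headers, exc_info=None):
            # Remember the response line instead of sending it straight away.
            captured['status'] = status
            captured['headers'] = headers
            captured['exc_info'] = exc_info

            def write(data):
                raise NotImplementedError('this sketch does not support write()')
            return write

        result = app(environ, deferred_start_response)
        iterator = iter(result)
        # Pulling the first chunk makes sure start_response has been called,
        # even when the application body is a generator.
        try:
            first_chunks = [next(iterator)]
        except StopIteration:
            first_chunks = []

        etag = None
        for name, value in captured['headers']:
            if name.lower() == 'etag':
                etag = value
        if_none_match = environ.get('HTTP_IF_NONE_MATCH')
        if (etag is not None and if_none_match is not None
                and captured['status'].startswith('200')
                and _if_none_match_hits(etag, if_none_match)):
            # Match: close the body without generating the rest of it.
            close = getattr(result, 'close', None)
            if close is not None:
                close()
            start_response('304 Not Modified', [('ETag', etag)])
            return []

        start_response(captured['status'], captured['headers'], captured['exc_info'])
        return _ClosingIterable(first_chunks, iterator, result)

    return middleware
```
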

For best performance, a good pattern for handlers is to:

  • return a generator that produces the response content lazily
  • start that generator with yield '', so that nothing expensive runs before the first iteration.
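
Plain Python generators, independent of Pesto, show why the leading yield '' helps. Here the iterator is pulled once and then closed, which is what happens on an ETag match. It is then consumed in full, which is what happens on a miss.

```python
calls = []


def very_expensive_calculation():
    calls.append('calculated')  # record each time the costly work runs
    return 'The answer is 42'


def generate_content():
    yield ''  # cheap first chunk: nothing expensive has run yet
    yield very_expensive_calculation()


# ETag match: pull the first chunk, then close the iterator.
content = generate_content()
next(content)
content.close()
print(calls)  # []

# ETag miss: consume the whole iterator.
print(''.join(generate_content()))  # The answer is 42
print(calls)  # ['calculated']
```
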

For example, the following is bad: very_expensive_calculation will be called on every request, even when the ETag matches:

from pesto.caching import with_etag, etag_middleware
from pesto.response import Response
from pesto import to_wsgi

def very_expensive_calculation():
        print "Calculating!"
        return "The answer is 42"

@to_wsgi
@with_etag(lambda request: 'foo')
def my_handler(request):
        return Response([very_expensive_calculation()])

app = etag_middleware(my_handler)

We can see that very_expensive_calculation is called every time by calling this in a test rig:

>>> from pesto.testing import TestApp
>>> print TestApp(app).get('/').text()
Calculating!
200 OK
Content-Type: text/html; charset=UTF-8
ETag: "foo"

The answer is 42

>>> print TestApp(app).get('/', HTTP_IF_NONE_MATCH='"foo"').text()
Calculating!
304 Not Modified
ETag: "foo"

Now we rewrite this so that very_expensive_calculation will only be called when the ETag does not match:

@to_wsgi
@with_etag(lambda request: 'foo')
def my_handler(request):
        def generate_content():
                yield ''
                yield very_expensive_calculation()
        return Response(generate_content())

app = etag_middleware(my_handler)

And we can see that very_expensive_calculation is not called when we supply the If-None-Match header:

>>> print TestApp(app).get('/', HTTP_IF_NONE_MATCH='"foo"').text()
304 Not Modified
ETag: "foo"

pesto.caching API documentation

pesto.caching

Utilities to add caching and ETag support.

pesto.caching.etag_middleware(app)

Interpret If-None-Match headers and only send the response on to the client if the upstream app doesn’t produce a matching ETag.

Note that the upstream application will be called on every request.

The response’s content iterator will not be called on cached responses.

pesto.caching.etags_match(tag, tags, allow_weak=False)

Return True if tag matches any entry in tags.

Parameters:
  • tag – a tuple of (weak, entity-tag)
  • tags – a list of tuples of the same format

Synopsis:

# Strong comparison function
>>> etags_match((False, 'a'), [(False, 'a'), (False, 'b')], allow_weak=False)
True
>>> etags_match((False, 'a'), [(True, 'a'), (False, 'b')], allow_weak=False)
False

# Weak comparison function
>>> etags_match((False, 'a'), [(True, 'a'), (False, 'b')], allow_weak=True)
True

# Weak comparison function
>>> etags_match((True, 'b'), [(True, 'a'), (False, 'b')], allow_weak=True)
True

# The special case '*' tag
>>> etags_match((False, 'a'), [(False, '*')])
True
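
For reference, the matching rules illustrated by the doctests above can be sketched as follows. etags_match_sketch is a hypothetical reimplementation that reproduces those results; it is not Pesto's source.

```python
def etags_match_sketch(tag, tags, allow_weak=False):
    """Return True if tag matches any entry in tags.

    tag and each entry of tags are (weak, entity_tag) tuples. With
    allow_weak=False the strong comparison applies, so a weak tag on either
    side prevents a match; with allow_weak=True the weak flags are ignored.
    The entity tag '*' matches anything.
    """
    weak, value = tag
    for other_weak, other_value in tags:
        if other_value == '*':
            return True
        if other_value == value and (allow_weak or not (weak or other_weak)):
            return True
    return False
```
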

pesto.caching.make_etag(s, weak=False)

Return string s formatted correctly for an ETag header.

Example usage:

>>> make_etag('r1089')
'"r1089"'
>>> make_etag('r1089', True)
'W/"r1089"'

pesto.caching.no_cache(func)

Add standard no cache headers to a response:

>>> from pesto.testing import TestApp
>>> from pesto.core import to_wsgi
>>> from pesto.response import Response
>>> from pesto.caching import no_cache
>>> @no_cache
... def view(request):
...     return Response(['cache me if you can!'])
...
>>> print TestApp(to_wsgi(view)).get().text()
200 OK
Cache-Control: no-cache, no-store, must-revalidate
Content-Type: text/html; charset=UTF-8
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Pragma: no-store
<BLANKLINE>
cache me if you can!
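
At the WSGI level, the same idea can be sketched as middleware that replaces any caching headers with a no-cache set. no_cache_middleware and NO_CACHE_HEADERS are hypothetical names. Unlike the output above, this sketch sends Pragma: no-cache, the directive defined by HTTP/1.0.

```python
NO_CACHE_HEADERS = [
    ('Cache-Control', 'no-cache, no-store, must-revalidate'),
    ('Expires', 'Mon, 26 Jul 1997 05:00:00 GMT'),  # a date in the past
    ('Pragma', 'no-cache'),  # understood by HTTP/1.0 caches
]
_NO_CACHE_NAMES = set(name.lower() for name, _ in NO_CACHE_HEADERS)


def no_cache_middleware(app):
    """Wrap a WSGI app so that every response carries no-cache headers."""
    def middleware(environ, start_response):
        def no_cache_start_response(status, headers, exc_info=None):
            # Drop any caching headers the app set, then append ours.
            kept = [(name, value) for name, value in headers
                    if name.lower() not in _NO_CACHE_NAMES]
            return start_response(status, kept + NO_CACHE_HEADERS, exc_info)
        return app(environ, no_cache_start_response)
    return middleware
```
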

pesto.caching.parse_entity_tags(s)

Parse entity tags as found in an If-None-Match header, which may consist of multiple comma-separated quoted strings, as per RFC 2616 section 3.11.

Example usage:

>>> from pesto.caching import parse_entity_tags
>>> parse_entity_tags(r'"tag a", W/"tag b"')
[(False, 'tag a'), (True, 'tag b')]

>>> parse_entity_tags(r'"\"a\"", "b"')
[(False, '"a"'), (False, 'b')]

>>> parse_entity_tags(r'"\"a\",\\b", "b"')
[(False, '"a",\\b'), (False, 'b')]

>>> parse_entity_tags(r'"\"a\",\\b\\", "b"')
[(False, '"a",\\b\\'), (False, 'b')]

>>> parse_entity_tags(r'"some longer \"text\"", "b"')
[(False, 'some longer "text"'), (False, 'b')]
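
A hand-written parser that reproduces the doctest results above might look like this. parse_entity_tags_sketch is hypothetical, not Pesto's code. Treating a bare * as the tag (False, '*') is an assumption chosen to fit etags_match.

```python
def parse_entity_tags_sketch(s):
    """Parse an If-None-Match value into a list of (weak, tag) tuples."""
    tags = []
    i, n = 0, len(s)
    while i < n:
        # Skip whitespace and the commas separating list items.
        if s[i] in ' \t,':
            i += 1
            continue
        if s[i] == '*':
            # Assumption: represent the wildcard the way etags_match expects it.
            tags.append((False, '*'))
            i += 1
            continue
        weak = s.startswith('W/', i)
        if weak:
            i += 2
        if i >= n or s[i] != '"':
            raise ValueError('expected a quoted string at position %d' % i)
        i += 1  # step past the opening quote
        chars = []
        while True:
            if i >= n:
                raise ValueError('unterminated quoted string')
            c = s[i]
            if c == '\\' and i + 1 < n:
                # quoted-pair: the next character is taken literally
                chars.append(s[i + 1])
                i += 2
            elif c == '"':
                i += 1  # the closing quote ends this tag
                break
            else:
                chars.append(c)
                i += 1
        tags.append((weak, ''.join(chars)))
    return tags
```
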

pesto.caching.quoted_string(s)

Return a quoted string, as per RFC 2616 section 2.2

Synopsis:

>>> from pesto.caching import quoted_string
>>> quoted_string(r'"this" is quoted')
'"\\"this\\" is quoted"'
>>> quoted_string(r'this is \"quoted\"') == r'"this is \\\"quoted\\\""'
True
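
Both quoted_string and make_etag can be sketched in a few lines. The names below are hypothetical, and building make_etag on top of quoted_string is an assumption. The sketch reproduces the doctest results shown for both functions.

```python
def quoted_string_sketch(s):
    # Escape backslashes first, then double quotes, then wrap in quotes.
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'


def make_etag_sketch(s, weak=False):
    # A weak ETag is the quoted string with a W/ prefix.
    tag = quoted_string_sketch(s)
    return 'W/' + tag if weak else tag
```
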

pesto.caching.with_etag(etag_func, weak=False)

Decorate the function to add an ETag header to the response object.

etag_func is a function which will be called with the request object as an argument and should return an identifier. This could be a timestamp, a revision number, a string, or any other object that identifies the revision of the entity.

Synopsis:

>>> from pesto.core import to_wsgi
>>> from pesto.testing import TestApp
>>> from pesto.response import Response
>>> from pesto.caching import with_etag
>>> def generate_etag(request):
...     return "whoa nelly!"
...
>>> @with_etag(generate_etag, False)
... def view(request):
...     return Response(["This response should have an etag"])
...
>>> print TestApp(to_wsgi(view)).get()
200 OK
Content-Type: text/html; charset=UTF-8
ETag: "whoa nelly!"

This response should have an etag
