#summary Punchcards and Mainframes

<wiki:toc max_depth="2"></wiki:toc>

This basic tutorial shows via pseudocode how you can get started with integrating memcached into your application. If you're an application developer, it isn't something you just "turn on" and then your site goes faster. You have to pay attention.

If you're confused on how memcached works and integrates into an application, you may want to read the [TutorialCachingStory] if you haven't yet.


= Basic Data Caching =

The "hello world" of memcached is to fetch "something" from somewhere, maybe process it a little, then shove it into the cache, to expire in N seconds.

== Initializing a Memcached Client ==

Read the documentation carefully for your client.

Some rare clients will allow you to add the same servers over and over again, without harm. Most will require that you carefully construct your memcached client object *once* at the start of your request, and perhaps persist it between requests. Initializing it multiple times may cause memory leaks in your application or stack up connections against memcached until you cause a failure.
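
For instance, with a Python client such as pymemcache, you would typically build the object once at startup and reuse it everywhere (the server address is a placeholder):

{{{
from pymemcache.client.base import Client

# Construct the client once, at application startup, and reuse it.
# Re-creating it on every request can leak memory or pile up connections.
memcli = Client(('10.0.0.10', 11211), connect_timeout=2, timeout=2)
}}}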

== Wrapping an SQL Query ==

Memcached is famous for reducing load on SQL databases. Unlike a query cache, which tends to be centralized, implemented in slow middleware, or invalidated en masse, caching query results yourself is an easy way to get moving.
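
A rough sketch of the pattern, again in Python with pymemcache; the query, the `run_sql` stand-in, and the key scheme are invented for illustration:

{{{
import hashlib
import json
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

def run_sql(query, user_id):
    # Stand-in for your real database call; return plain rows,
    # never a statement handle.
    return [{'user_id': user_id, 'name': 'example'}]

def get_user_rows(user_id):
    query = 'SELECT * FROM user WHERE user_id = ?'
    key = 'sql:%d:%s' % (user_id, hashlib.md5(query.encode()).hexdigest())
    cached = memcli.get(key)
    if cached is not None:
        return json.loads(cached)
    rows = run_sql(query, user_id)
    # Cache the rows for five minutes.
    memcli.set(key, json.dumps(rows), expire=5 * 60)
    return rows
}}}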

Wow, zippy! When you cache this user's row(s), they will now see that same data for up to five minutes. Unless you actively invalidate the cache when a user makes a change, it can take up to five minutes for them to see a difference.

Often this is enough to help. If you have some complex queries, such as a count of users or the number of posts in a thread, it might be acceptable to limit how often those queries are issued by having a flat cache like this.

== Wrapping Several Queries ==

The more processing that you can turn into a single memcached request, the better. Often you can replace several SQL queries with a single, fast, memcached lookup.

When you load a user, you fetch the user itself *and* their site preferences (whether they want to be seen by other users, what theme to show, etc). What was once two queries and possibly many rows of data, is now a single cache item, cached for five minutes.
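
A hedged sketch of that, with made-up table names and the same sort of `run_sql` stand-in:

{{{
import json
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

def run_sql(query, user_id):
    return {'example': 'row'}  # stand-in for the real query

def get_user_with_prefs(user_id):
    key = 'user:full:%d' % user_id
    cached = memcli.get(key)
    if cached is not None:
        return json.loads(cached)
    user = run_sql('SELECT * FROM user WHERE user_id = ?', user_id)
    prefs = run_sql('SELECT * FROM user_prefs WHERE user_id = ?', user_id)
    blob = {'user': user, 'prefs': prefs}
    # Two queries collapse into one cache item, good for five minutes.
    memcli.set(key, json.dumps(blob), expire=5 * 60)
    return blob
}}}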

== Wrapping Objects ==

Some languages allow you to configure objects to be serialized. Exactly how to do this in your language is beyond the scope of this document, but a few tips apply regardless:

 * Consider if you actually need to serialize a whole object. Odds are your constructor could pull from cache.
 * Serialize it as efficiently and simply as possible. Spending a lot of time in object setup/teardown can drag down your CPU.

Further, consider that if you're deserializing a huge object for a request and then using only one small part of it, you might want to cache those parts separately.

== Fragment Caching ==

Once upon a time ESI (Edge Side Includes) were all the rage. Sadly they require special proxies/caching/etc. You can do this within your app for dynamic, authenticated pages just fine.

Memcached isn't just all about preventing database queries. You can cache computed items or objects as well.

In this grossly oversimplified example, we load the user data (which could be coming from cache!), load the raw template for the "bio" part of the webpage (which could also be coming from cache!), then load the main template, which includes the header and footer.

Finally, we process all of that together into the main page and return it. Applying templates can be costly, so you can cache the assembled bio fragment on its own, which helps if you're rendering a custom header for the viewing user and can't cache the whole page. Or, if that doesn't matter, cache the whole 'page' output.
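
A very rough sketch of caching just the rendered bio fragment; the `load_user` and `render` helpers and the key scheme are hypothetical:

{{{
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

def load_user(user_id):
    return {'user_id': user_id, 'bio': 'hello'}   # stand-in (could be cached!)

def render(template_name, data):
    return '<div>%s</div>' % data['bio']          # stand-in for templating

def bio_fragment(user_id):
    key = 'frag:bio:%d' % user_id
    html = memcli.get(key)
    if html is not None:
        return html.decode()
    user = load_user(user_id)
    html = render('bio_template', user)  # the costly templating step
    memcli.set(key, html, expire=5 * 60)
    return html
}}}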

See? Why do more work than you have to. The more you can roll up, the faster your pages will render and the happier your users will be.

= Extended Functions =

Beyond `set`, there are `add`, `incr`, `decr`, etc. They are simple commands but require a little finesse.

== Proper Use of `add` ==

`add` allows you to set a value if it doesn't already exist. You use this when initializing counters, setting locks, or otherwise setting data you don't want overwritten as easily. There can be some odd little gotchas and race conditions in handling of `add` however.
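
For example, a sketch of using `add` as a crude lock; the key name and timeout are arbitrary:

{{{
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

# add only succeeds if the key doesn't already exist, so whoever stores
# it first "owns" the work. noreply=False so we see whether it stored.
got_lock = memcli.add('lock:rebuild_front_page', '1', expire=30, noreply=False)
if got_lock:
    # ... do the expensive rebuild, then release the lock ...
    memcli.delete('lock:rebuild_front_page')
else:
    # Someone else is already rebuilding; serve stale data or wait.
    pass
}}}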

== Proper Use of `incr` or `decr` ==

`incr` and `decr` commands can be used to maintain counters, such as how many hits a page has received, rate limiting a user, and so on. They add to or subtract from an unsigned 64-bit value by whatever amount you supply, from 1 on up; there are no negative deltas, so use `decr` when you need to subtract.

They do not, however, initialize a missing value.
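
Here's a sketch of counting a hit while handling the missing-counter case (the key name is arbitrary):

{{{
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

def count_hit(key='hits:front_page'):
    # incr returns None if the counter doesn't exist yet.
    if memcli.incr(key, 1, noreply=False) is None:
        # Initialize with add, not set: if another client beat us to it,
        # add fails instead of stomping their count.
        if not memcli.add(key, '1', noreply=False):
            # Lost the race; the counter exists now, so increment it.
            memcli.incr(key, 1, noreply=False)
}}}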

If you're not careful, you could miss counting that hit :) You can doll this up and retry a few times, or not at all, depending on how important you think it is. Just don't run a `set` when you mean to do an `add` in this case.

= Cache Invalidation =

Levelling up in memcached requires that you learn about actively invalidating (or revalidating) your cache.

When a user comes along and edits their user data, you should be attempting to keep the cache in sync in some way, so the user has no idea they're being fed cached data.

== Expiration ==

A good place to start is to tune your expiration times. Even if you're actively deleting or overwriting cached data, you'll still want the cache to expire occasionally, in case your app has a bug, a crash, a network blip, or some other issue that lets the cache fall out of sync.

There isn't a "rule of thumb" when picking an expiration time. Sit back and think about your users, and what your data is. How long can you go without making your users angry? Be honest with yourself, as "THEY _ALWAYS_ NEED FRESH DATA" isn't necessarily true.

Expiration times can be set from `0`, meaning "never expire", up to 30 days. Anything higher than 30 days is interpreted as a unix timestamp date. If you want to expire an object on January 1st of next year, that's how you do it.
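
For example, in Python you could compute that timestamp and hand it over as the expiration; the client, key, and value here are placeholders:

{{{
import calendar
import time
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

# Unix timestamp for January 1st of next year, 00:00 UTC.
next_year = time.gmtime().tm_year + 1
expire_at = calendar.timegm((next_year, 1, 1, 0, 0, 0, 0, 0, 0))

# Anything larger than 30 days' worth of seconds is read as an absolute date.
memcli.set('promo:new_years_banner', 'happy new year', expire=expire_at)
}}}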

== `delete` ==

The simplest method of invalidation is to simply delete the item, and have your website re-cache the data the next time it's fetched.

So user Bob updates his bio. You want Bob to see his latest info when he so vainly reloads the page. So you:
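
(A hedged sketch; the key names just follow the made-up scheme from the earlier examples.)

{{{
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

def on_bio_update(user_id):
    # Throw away whatever is cached for this user; the next page view
    # will hit the database and repopulate the cache.
    memcli.delete('user:full:%d' % user_id)
    memcli.delete('frag:bio:%d' % user_id)
}}}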

... and next time he loads the page, it will fetch from the database and repopulate the cache.

== `set` ==

The most efficient idea is to actively update your cache as your data changes. When Bob updates his bio, take Bob's bio object and shove it into the cache via `set`. You can pass the new data into the same routine that normally checks for data, or however you want to structure it.
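
One sketch of that write-through idea, with hypothetical helpers and keys:

{{{
import json
from pymemcache.client.base import Client

memcli = Client(('localhost', 11211))

def save_bio_to_db(user_id, new_bio):
    pass  # stand-in for the real database write

def update_bio(user_id, new_bio):
    # Write to the database first...
    save_bio_to_db(user_id, new_bio)
    # ...then push the fresh data straight into the cache, so the next
    # read never notices anything changed.
    memcli.set('user:bio:%d' % user_id, json.dumps({'bio': new_bio}),
               expire=5 * 60)
}}}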

Play your cards right, and your database only ever handles writes, and data it hasn't seen in a long time.

== Invalidating by Tag ==

TODO: link to namespacing document + say how this isn't possible.

= Key Usage =

Thinking about your keys can save you a lot of time and memory. Memcached is a hash, but it also remembers the full key internally. The longer your keys are, the more bytes memcached has to hash to look up your value, and the more memory it wastes storing a full copy of your key.

On the other hand, it should be easy to figure out exactly where in your code a key came from. Otherwise many laborious hours of debugging await you.

== Avoid User Input ==

It's very easy to compromise memcached if you use arbitrary user input for keys. The ASCII protocol delimits commands with spaces and newlines; ensure that neither shows up in your keys, and live long and prosper. The binary protocol does not have this issue.
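
If part of a key has to come from user input, one defensive sketch is to hash that part first:

{{{
import hashlib

def safe_key(prefix, user_supplied):
    # Hashing strips spaces, newlines, and control characters, and also
    # caps the key length. The prefix keeps it recognizable.
    digest = hashlib.sha1(user_supplied.encode('utf-8')).hexdigest()
    return '%s:%s' % (prefix, digest)

key = safe_key('search', 'anything the user typed, even with spaces\n')
}}}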

== Short Keys ==

64-bit UIDs are a clever way to identify a user, but suck when printed out: 18446744073709551615 is 20 characters! Using base64 encoding, or even just hexadecimal, you can cut that down by quite a bit.
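
A purely illustrative snippet:

{{{
uid = 18446744073709551615  # worst case: 20 decimal characters
hex_key = 'u:%x' % uid      # 'u:ffffffffffffffff': the ID part is now 16
}}}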

With the binary protocol, it's possible to store anything, so you can pack an ID's raw bytes directly into the key (4 bytes for a 32-bit ID, 8 for a 64-bit one). This makes it impossible to read back via the ASCII protocol, though, so you should have tools available to easily determine what a key is.

== Informative Keys ==
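
Keying the cache off nothing but a hash of the query, as in this made-up fragment:

{{{
import hashlib

key = hashlib.md5(b'SELECT blah blah blah').hexdigest()
}}}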

A key like that might be clever, but if you're looking at it via tcpdump, strace, etc., you won't have any clue where it's coming from.

In this particular example, you may want to put your SQL queries into an outside file with the md5sum noted next to them. Or, more simply, append a unique query ID to the key.
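
For instance (names invented):

{{{
import hashlib

query = 'SELECT blah blah blah'
# 'get_user_bio' tells a human what this is; the hash still keeps it unique.
key = 'sql:get_user_bio:' + hashlib.md5(query.encode()).hexdigest()
}}}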