Designing for Brobdingnag

| categories: sysadmin, design

On the topic of talks: in late 2013, while at Google, I was involved in an interesting SRE Classroom event. These events were run by many of the SRE offices as a way of sharing design approaches with software and systems engineers who might lack exposure to large-scale systems.

I later gave my talk from the Dublin event to devops.ie in February 2014. PDF with speaker notes from that event:

Designing for Brobdingnag: considerations in scaling large web systems.

Today, we're focusing on technical designs in large-scale software environments, and on patterns that we've found work well.

These can be relatively difficult to get exposure to.

This won't be an exhaustive treatment, but we hope to give you:

  • an appreciation of the kinds of constraints and opportunities we have at scale;
  • some tools for approaching design;
  • and a better idea of the kinds of systems SREs engage with.

Note the "Resources" slide towards the end: lots of good links on design and distributed systems there.

Based on presentations at the London and Mountain View SRE Classroom events by Matt Brown, John Neil and Robert Spier. I also include ideas from Jeff Dean's talk Building Software Systems At Google and Lessons Learned. Many thanks to Niall Richard Murphy, Alex Perry, Laura Nolan and Pete Nuttall for assistance, ideas and review.


monendi te salutant

| categories: sysadmin, fun

Talking with Niall on IRC today, I had a brainwave: with just one letter changed, the famous Latin quote morituri te salutant - "those who are about to die salute you" - could become monituri te salutant - "those who are about to be paged salute you".

A noble and fitting handover note for oncall engineers?

Sadly, as is often the case when I take enthusiastic flights into classical translation, I'm off here: moriturus is the (somewhat irregular) future participle of the deponent verb morior, "to die", while moniturus is the future active participle of moneo. It is active, not passive: it means "about to warn/advise/notify".

To get this right, we need a future passive participle, which, it turns out, is supplied by the gerundive. So that gives us monendus, or in the plural, monendi.

monendi te salutant

Not as sweet a solution as the one-letter change complete with "monitoring" embedded, but not bad. Vale!


Alerting in production systems

| categories: sysadmin

Anyone who has been (or is going) through the wringer of a noisy pager rotation may enjoy this talk, delivered at intercom.io in Dublin on July 1st:

Pager Bound: high-signal alerting in production systems.

  • good alerting needs to be designed: organic growth tends not to go so well;
  • page, ticket or log: eliminate email alerts;
  • if we must be paged out of bed, it should be for something that really needs human attention;
  • we can only handle ~2 events well per shift;
  • service-level objectives are a really useful way to orient our alerting to customer experience & business priorities;
  • page on the symptom as it relates to our SLOs, not the cause.

The talk follows Rob Ewaschuk's philosophy on alerting.
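To make "page on the symptom, not the cause" and "page, ticket or log" concrete, here is a minimal Python sketch, assuming a request-based availability SLO measured as the fraction of successful requests. The names `Window`, `burn_rate` and `route_alert`, the 99.9% target and both threshold values are hypothetical illustrations, not taken from the talk. The router looks only at the user-visible error ratio (the symptom), never at causes such as CPU load, and there is deliberately no email option.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page"      # wake a human: the SLO is at risk and needs attention now
    TICKET = "ticket"  # needs a human, but can wait for working hours
    LOG = "log"        # recorded for later analysis; nobody is notified


@dataclass
class Window:
    """Request counts observed over one measurement window (hypothetical)."""
    requests: int  # total requests served
    errors: int    # failed requests: the symptom users actually see


def burn_rate(window: Window, slo_target: float) -> float:
    """Error ratio as a multiple of the error ratio the SLO allows.

    1.0 spends the error budget exactly over the SLO period;
    larger values exhaust it early.
    """
    if window.requests == 0:
        return 0.0
    error_ratio = window.errors / window.requests
    allowed = 1.0 - slo_target  # e.g. 0.001 for a 99.9% target
    return error_ratio / allowed


def route_alert(window: Window,
                slo_target: float = 0.999,
                page_threshold: float = 10.0,   # hypothetical value
                ticket_threshold: float = 1.0,  # hypothetical value
                ) -> Action:
    """Decide page, ticket or log from the symptom alone, never the cause."""
    rate = burn_rate(window, slo_target)
    if rate >= page_threshold:
        return Action.PAGE
    if rate >= ticket_threshold:
        return Action.TICKET
    return Action.LOG


if __name__ == "__main__":
    # 2% of requests failing against a 99.9% SLO: burn rate of about 20.
    print(route_alert(Window(requests=10_000, errors=200)))  # Action.PAGE
```

Expressing the error ratio as a multiple of what the SLO allows means the same thresholds can be reused across services with different targets.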


Beyond Corp

| categories: sysadmin

For the last couple of years at work, I've been part of "Beyond Corp", a programme whose aim is to:

Re-architect corporate services to remove any privileges associated with having a corporate network address.

Doing this at a large, 15-year-old company with an extensive legacy IT infrastructure is hard. It's been interesting.

Earlier this month, my colleagues Jan Monsch and Harald Wagener presented a talk about the programme at LISA '13. It's a detailed overview of our background, vision and architecture, along with a discussion of challenges we've met along the way. I had planned to present, but family life intervened and Harald heroically stepped in. :o)

It's a great talk; strongly recommended to anyone with an interest in modern security and management of large, mobile client networks.


Keeping a lab book

| categories: sysadmin

Many years ago, a friend mentioned that he kept a lab book for systems work, so I started to do the same.

I've found it works well for less-defined, experimental or tentative work, such as performance optimization or exploring new technology. It keeps state organized and external to my poor stinging brain, and leaves me with documentation (as well as tips and tricks) to look back on. Being explicit about expectations, hypotheses, and which variables you're changing as you work is also a useful discipline.

Also, it's a good idea to keep a record of how you produced those fascinating results: perhaps you (or someone else) would like to repeat your experiment on a different binary or configuration; perhaps you'll need the raw data in the flamewar when you post your results. ;o)

Here's the template I use. If you like keeping notes as you hypothesize, measure, rinse and repeat, you might find it useful. Maintaining it in a wiki works well, and I like to use collapsible sections so I can record excessive detail but hide it later.

Lab book: Title of this experiment/lab session.

Dates: The period you ran the experiment over.

Team members: Who was involved.

Purpose

The background of your experiment; what you intend to find out; any hypotheses you already have; references to supporting documentation or experiments.

Materials

What you're using to produce your results: for example, the version of a binary or the revision number of a configuration you're experimenting with, plus links to any supporting scripts or other tools you're using.

Procedure and data

What you did, how you did it, and what you found out. Consider pasting in command lines and results. Write for your reader (who may be your future self): be reasonably detailed and thorough. If you have raw data dumps, link them and include just a few representative lines here. Link in any interesting graphs.

Analysis

What you think it all means, and what actions you're going to take as a result of your experiment. Do you need to open some bugs? Do you need to do some more experiments?
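If you drive experiments from a shell, the Procedure and data section is easier to keep accurate when command lines and output are captured rather than retyped. Here is a minimal Python sketch of that idea; `record_step` is a hypothetical helper, not part of the template itself, and the plain-text entry format is an assumption you would adapt to your own wiki.

```python
import shlex
import subprocess
import sys
from datetime import datetime, timezone


def record_step(argv: list[str], max_lines: int = 5) -> str:
    """Run a command and format it as a lab-book 'Procedure and data' entry.

    Hypothetical helper: records a UTC timestamp, the exact command line,
    the exit status and the first few lines of standard output. Longer
    output is truncated with a note to link the full dump instead.
    """
    started = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    result = subprocess.run(argv, capture_output=True, text=True)
    lines = result.stdout.splitlines()

    entry = [f"[{started}] $ {shlex.join(argv)}",
             f"exit status: {result.returncode}"]
    # Keep only a few representative lines, indented so they stand out.
    entry += [f"    {line}" for line in lines[:max_lines]]
    if len(lines) > max_lines:
        hidden = len(lines) - max_lines
        entry.append(f"    ... ({hidden} more lines; link the full dump)")
    return "\n".join(entry)


if __name__ == "__main__":
    # Use the current Python interpreter as a portable stand-in for a
    # real benchmark or configuration command.
    print(record_step([sys.executable, "-c", "for i in range(8): print(i)"]))
```

Pasting the returned text into the lab book keeps the exact command line alongside its results, so you (or someone else) can repeat the step later on a different binary or configuration.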

