#185 July 14, 2022

Writing, Learning and Tech, with Ian Miell

Hosts: Craig Box

Ian Miell is a partner at consultancy Container Solutions, and an author of books on Bash, Git, Terraform and Docker. He explains to Craig how writing, whether runbooks, blog posts, training courses, or “real” books, can help you learn and make your team more effective.


CRAIG BOX: Hi, and welcome to the Kubernetes Podcast from Google. I'm your host, Craig Box.


CRAIG BOX: Warm here this week. As in, they expect this weekend might break the record for hottest day ever in the UK. I'll certainly follow up on that piece of news in next week's show.

My advice for keeping cool, especially for those without a large garden — get a small pool. If you have a spare square meter, you can get one of those little round pools, which you could fill with ice and comfortably stand a couple of people in. If you've got space to spare, get one of those two-ring rectangular inflatable pools, which are large enough to while away an afternoon in. Modern phones are waterproof, are they not? I still feel like I'm living dangerously while trying to take a pool selfie.

We're keeping the doors open overnight, though I'm still cautious about that because the first time we did that here, a fox ate one of my shoes. At least, I hope it was a fox.

Anyway, too hot to chat. Hope it's nice where you are. Personally, I enjoy most of my podcasts when I'm out walking, and there's not much of that at the moment. Let's get to the news.


CRAIG BOX: The Gateway API, formerly known as the Services API, and commonly referred to as Ingress v2, has hit beta. Gateway is a modern set of APIs for deploying L4 and L7 routing in Kubernetes, designed to be generic, expressive, extensible, and role-oriented. Version 0.5 of the project is the first with a v1beta1 API for the common components, and introduces release channels so you can kick the tires on some of the uncommon components. The project also introduced a new initiative for using the Gateway API for service mesh, with representatives from the Istio and SMI communities coming together to standardize mesh use cases.

Two projects have stepped out of the sandbox and into incubation at the CNCF this week. Kyverno, with a K, is a policy engine that joined the CNCF in November 2020, and was originally created at Nirmata. Keptn, also with a K, is an event-driven orchestration engine that joined in June 2020, and more importantly, was discussed on this show in September of that year. It was originally created at Dynatrace. Both projects have shown substantial progress since joining, with many more features on their respective roadmaps.

Google Cloud has introduced Arm instances on Compute Engine, launching the Tau T2A family based on the Ampere Altra server CPU. GKE support launched on the same day, with Arm available on both GKE Standard and Autopilot. All you have to do is mark your workloads as wanting to run on Arm, and Autopilot will handle the rest, letting you run clusters with both x86 and Arm nodes. T2A VMs are now in preview in three regions spread around the world.

In mergers and acquisitions news, KubeShop has acquired a majority stake in Botkube, a ChatOps bot for monitoring and debugging Kubernetes clusters. KubeShop is a self-described first-of-its-kind Kubernetes-focused accelerator and incubator, and currently operates open source projects focused on developers and testers.

All code has bugs, but one particularly unlucky piece of code had three bugs in the same line. The IAM authenticator for AWS lets you authenticate to an EKS cluster using AWS identities. Gafnit Amiga from Lightspin discovered three vulnerabilities, whereby someone with an allow-listed IAM identity might be able to modify their username and escalate privileges. The line of code in question would lowercase request parameters, allowing you to set capital-A-Action and lower-case-A-Action in a token request and have them not marked as duplicate. It has been fixed with a 17-line patch and a new version released. Amiga also recently published new ways to exploit the Kubernetes Ingress-NGINX controller, which were fixed in the most recent version.
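As a rough illustration of the class of bug described here, the following Python sketch shows how a duplicate check that runs before parameter names are lowercased can let `Action` and `action` through together. It is based only on the description above and is not the authenticator's actual code; the allow-list, function names, and parameter values are all hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical allow-list of parameter names, for illustration only.
ALLOWED = {"action", "version"}


def validate_naive(query):
    """Flawed check: duplicates are counted per raw key, before lowercasing."""
    params = parse_qs(query, keep_blank_values=True)
    normalized = {}
    for key, values in params.items():
        if key.lower() not in ALLOWED:
            raise ValueError(f"parameter not allowed: {key}")
        # "Action" and "action" are different raw keys, so each has one value
        # and this check passes for both.
        if len(values) != 1:
            raise ValueError(f"duplicate parameter: {key}")
        # The later key silently overwrites the earlier one.
        normalized[key.lower()] = values[0]
    return normalized


def validate_strict(query):
    """Stricter check: a collision after lowercasing counts as a duplicate."""
    params = parse_qs(query, keep_blank_values=True)
    normalized = {}
    for key, values in params.items():
        lowered = key.lower()
        if lowered not in ALLOWED:
            raise ValueError(f"parameter not allowed: {key}")
        if len(values) != 1 or lowered in normalized:
            raise ValueError(f"duplicate parameter: {key}")
        normalized[lowered] = values[0]
    return normalized


if __name__ == "__main__":
    query = "Action=first&action=second"
    print(validate_naive(query))  # accepted; the second value wins
    try:
        validate_strict(query)
    except ValueError as err:
        print("rejected:", err)
```

The design point is simply that normalization and the duplicate check have to agree on what counts as "the same parameter"; the stricter variant compares names after lowercasing.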

Another CNCF project has had its bugs caught by the fuzz. KubeEdge underwent a foundation-sponsored security audit performed by Ada Logics. Ten fuzzers were developed, which have been applied to its CI system. Two CVEs were found through fuzzing, and another eight found by manual testing. All have since been fixed by the KubeEdge team.

Finally, congratulations to OpenShift head Matt Hicks, who has just been announced as Red Hat's new CEO, and to episode 85 guest Clayton Coleman, who, in unrelated news, has joined Google as a distinguished engineer.

And that's the news.


CRAIG BOX: Ian Miell is a partner at Container Solutions and the author of books on Docker, Git, Bash, and Terraform. Welcome to the show, Ian.

IAN MIELL: Hello, Craig. Lovely to be here.

CRAIG BOX: I think that finding out you were a history graduate and had worked as a journalist immediately put you in a perspective that I didn't really have just knowing that you were a container consultant.

IAN MIELL: My path to IT, like many people's, was slightly circuitous. I did a degree in history. I studied maths, English, and history at school, and then went to university and studied history for three years, and fully intended, because I didn't know what else to do with my life — I didn't want to be a lawyer, which is what most of my friends wanted to do — I wanted to do a master's in history. And then that didn't really work out, so I decided to run away for a year and teach English abroad in my mother's home country, Austria.

And during that time, I was quite handy with computers because I'd been into computers as a kid. Some of my fellow teachers in this Austrian school said, hey, you're really good at this stuff, you ought to teach it. And I was like, yeah, great. And then I kind of got into that. And then I thought, I don't know what to do with my life, so why don't I study computing? Because that seems to be where all the cool kids are. It was sort of the dotcom era. For those of you [under] the age of 30, that's about the turn of the century, when there was lots and lots of venture capital money being thrown at technology, which sounds familiar, right?

And at that time, yeah, it was the thing to go into. And I thought, well, that'll pay the bills. It might be some sort of career, and it's quite an open field and kind of fun. So I got into that and did a conversion master's course in computer science at University of London, working three jobs and living in a broom closet. And then yeah, to cut a long story short, ended up working for an online gambling backend systems company who specialized in optimizing to take as many bets as possible as quickly as possible.

CRAIG BOX: It's very interesting, the different opinions about online gambling around the world.

IAN MIELL: Yes. It's a hot topic because I think the US market finally opened up after many years of us talking about it. And I know that the UK has another gambling bill that it's talking about or reviewing, or something's going on politically with that.

CRAIG BOX: If only there was a government to approve things.

IAN MIELL: Well, yes. In fact, that was how I was reminded of it. This morning on the radio, someone was saying, oh, I had to resign as minister, but I was really keen to get into the gambling bill and influence that for the good. So it's a bit like social media, right? It's an area where technology has really changed the industry in the last 20 years. And as a result, people want to get a grip on it with legislation. But that takes a long time. But that's probably a whole other subject.

CRAIG BOX: Well, history and journalism does feel like the kind of thing that might lead one down the political path. Have you considered throwing your hat in for the open position at the moment?

IAN MIELL: Well, it's funny you should say that. I was thinking about this because, when I was deciding to do my university degree, a teacher at my school — because I was the only person applying to the university I was applying to — and the teacher said, oh, there's this subject called politics, philosophy, and economics. And I said, oh, that's fantastic, I'd love to do that. And then he said, that's the subject all the prime ministers do. So I immediately ruled myself out for that subject, and thought history would be a safer bet. But no, I don't have any political ambitions. If I were doing politics, I'd probably be some sort of background glad hander or something.

CRAIG BOX: It does, however, take a very unique combination of skills and background to post something titled, "Business value, soccer canteens, engineer retention, and the bricklayer fallacy."

IAN MIELL: Yeah, that was a fun post to write. So I think you're referring to a post on my blog, zwischenzugs.com, which is impossible to spell.

CRAIG BOX: Thank you for saying it out loud first so that I had a reference. I'm pretty sure I would have got that right. Didn't practice it, though.

IAN MIELL: [LAUGHING] Yeah, actually, it's a German word, because my mother's Austrian, so German is back there. But it's also a chess word. And I was at a German conference recently, and I said, oh, my blog's called zwischenzugs.com. And of course, you're a German-speaking crowd, so you know what I'm talking about. And they all looked at me blankly. It was like, we don't know what that word means. And it turns out it's not a word in general use in German at all. It's really only used in the chess world.

CRAIG BOX: It's a bit like a zugzwang?

IAN MIELL: Yes. Zugzwang is another chess word, but that's better known. Zugzwang means compulsion to move, or specifically, you have to move when you don't want to. And zwischenzugs is an in-between move. So it's a move you make in between taking your opponent's queen. You might give a check first. I called my blog that because my blogging is an in-between move between what I do at work.

But that particular post sprang from reading Alex Ferguson's book on leadership, which I recommend to anyone who wants to read an earthy book about the nature of leadership. For those that don't know, Alex Ferguson was a soccer, or, as we say in the UK, football coach or head coach or manager for over 20 years, leading a big UK team to international success. And he was famous for being quite a demanding individual, shall we say.

And he wrote this book, and it's clearly very authentically in his voice. So it's like you can hear him — it's ghost-written, but you can hear him coming through the pages. He wrote about how he spent some of his time, at least, talking to junior players in the canteen and actually getting involved in the architecture of the new buildings for the younger players.

Yeah, that led me to reflect on some of the deeper questions about business value. And I had what I thought was quite a humorous scene of a scrum master going up to Alex Ferguson and saying things like, well, are we in the business of winning soccer matches, Mr. Ferguson, or are we in the business of canteen design, thinking about how those conversations would go, and from there reflecting on what business value is and so on.

So yeah, I like to read. I like to think about the broader issues. And the history degree I did was a great kind of training for blogging, really, because every week, I was handed a list of books and a question. And it was like, see you next week. And next week, you're supposed to have an essay in your hand. And you'd read out the essay, and they would tell you what they thought.

But how you spent that week was entirely up to you. You could go to lectures if you wanted. Historians tended not to. Or you can go to the library and figure it all out for yourself. So if you want to write a blog post or study something quickly or do journalism, it's a great training because you just have to quickly get something together, which inculcates a great bias for action, by the way, because you have to write before you're necessarily comfortable about knowing what you're writing about, which, again, is a useful skill for life, I think.

CRAIG BOX: Do you think that writing makes you a better technologist?

IAN MIELL: Yes is the short answer, because writing involves delivery. It also involves often thinking about things from others' perspectives. And if you're doing technology, that can be quite important. But it imposes on you a certain self-discipline and structure and so on. But also, there's really basic stuff.

Like if you're asked to summarize something for a project leader or program manager or someone senior, a client or in your company, the ability to ask yourself, what are the main points they want to hear, the basic history essay structure of say what you're going to say, say it, and say what you've said, these are things that are actually quite useful. I'm often surprised when I talk to technologists who don't have much experience in writing and see some of the emails they send out, or probably these days, big Slack updates, how they're sort of written. All the information's there, it's just not presented in a way that's easily consumable for the audience. So definitely that's helpful.

CRAIG BOX: That violates the dictum of "don't repeat yourself".

IAN MIELL: Yeah. Yeah, exactly. So I've written books on Bash and Git. And the reason I wrote books on those things was because a lot of the material out there is clearly written for people who kind of already know the subject quite well. And it's really difficult to place yourself in the position of someone who doesn't know the jargon, or the jargon is really new to them, or it's all a bit chaotic, they don't really know what's going on, whereas if you've been writing Bash for 20 years, certain things just seem so obvious just because you've been around it for so long.

And when you're writing, it's sort of the same thing. Like, you know the subject, and now you've got to try and get that over to someone else. And doing that in a way that's kind of entertaining and interesting and consumable for them without them getting frustrated, it's a really useful discipline, I think, forcing you to try and empathize with the reader.

CRAIG BOX: How much Bash and Git did you know when you set out to write these books?

IAN MIELL: It's a great question because it gets to the heart of something which I think — a lot of people think that people who write books know the subject first completely and they're super experts in it, and then they write the book. And actually, with the Git and Bash, that very much wasn't the case. In fact, I started writing the books out of frustration at my own incomplete knowledge. So with Bash, I'd been using it every day for 20 years, and I just sort of sat down and thought, hang on, I read the Bash man page, and there's so much of it I just don't really fully understand. And that's ludicrous. I should be able to read it and just go, I know what's going on here.

So by writing "Learn Bash the Hard Way," I forced myself to actually go and try and explain it to others. And by doing that, of course, I learned a lot along the way. And more importantly, a lot of the stuff that was sort of semi-formed in my mind, that was like, oh, I vaguely know what this does, and I can kind of use that in a script, I was forced to actually ask myself difficult questions about how exactly does that work, and why is that different to that. Why is two square brackets different from one square bracket? That kind of question, which I'd always fudged over, had to be confronted.

CRAIG BOX: Do you consider yourself a programmer?

IAN MIELL: No, not anymore. I did program for a couple of years, doing proper programming. Did a bit of C, did a bit of — the gambling company I worked for did Tcl a lot. So they had a C application server, which was quite a beautiful thing. Very, very, very fast, very powerful. And as a business logic language, they used a language called Tcl, or "Tickle", which I think some of your older, more experienced listeners will have come across. At one time, it was actually a rival for Java at Sun, I believe, and lost out to Java.

One way to look at it is it's kind of LISP for humans – or Perl for sane people, is how I've heard it described. It's a very small string-based language. Anyway, I spent a long time writing Tcl and have a fondness for it. But it's not very commercially valuable. So I spent a few years big into database optimization. We had a monolithic database at this gambling company. And if a query went a couple of milliseconds slower, then that could be a big problem. So I really got into the internals of database optimization. And I loved all that stuff. But again, it's not a widely applicable skill.

As I worked as an engineer or programmer, I ended up accidentally leading teams. And I always had a bent towards communication in that role. I was a terrible manager when I was 26. But I learned a little bit about that. But I was often good with clients in the sense that I could explain to them why they could not have feature x, or why it was going to take a long time to do something, or why something had taken a long time and do it in such a way that didn't annoy them.

And so I guess I've never been able to fully focus on the engineering side. And of course, I was around people who were just way better at it than me. So you let them take the lead. But I have had my moments. I wrote a huge automation tool a few years ago for my company to try and speed up a bunch of stuff we wanted to do. And I got into the internals of terminals and got into all sorts of cool stuff. And that was a programming moment for me. Yeah, it's a side of my life I think I'm going to have to, a door that's now pretty shut to me.

CRAIG BOX: I first met you in 2015, and I went back and checked my notes – I had written down that you were three weeks into your new job at the bank at the time we first chatted.

IAN MIELL: Wow. You kept notes! I've only recently started doing that in the last few years. So yeah, was it really three weeks?


IAN MIELL: It was three weeks in when we met? My god.

CRAIG BOX: Yet you still came across as competent and valuable.

IAN MIELL: Well, that's a skill. I didn't feel it.


But it's good to know that that's how it comes across. I recall, Craig, you sitting in a very mahogany-walled room presenting [virtually] to 200-plus Barclays engineers about Google Cloud Platform. That's my memory of that. And empathizing with you, how difficult that must be to talk to a void of people. Of course, we're all used to it now. We're post-pandemic.

CRAIG BOX: It is a skill. We don't see a lot of people in person these days. But it's always nice when we do get to have a chance to catch up, as you and I did when we were in Spain. You were looking at the role that you'd taken on then moving from your gambling company, which obviously had grown over time to a much larger company in Barclays, as you mentioned. You wrote in 2018 about why enterprises are so slow. What was the difference in experience to you, between those two different classes of company?

IAN MIELL: Yeah, absolutely enormous. And that's what drove me to write that. Actually, what drove me to write that piece was frustration with people within Barclays and outside who just thought, why does it take you so long to deliver something? How hard can it be? You just install the software, and write some docs, and you're done.

I worked for the infrastructure team at Barclays. And so we had all sorts of roadblocks in our way, which made it extremely difficult to get anything done. And even people in charge at Barclays who were seniors had no idea of how difficult it was to get something to production. So I had started in my career in a roomful of people, 30 people in a room, 2001. And there I was, in 2015, in a company with, I think, 125,000 employees.

And so all my skills that had been honed, of bias for action, and JFDI, and all these things which served us well in a startup trying to deliver features fast for our customers, completely ran into the ground, just couldn't even get started. And so my choice was either, I don't know, I suppose I could quit, and just go back to a startup, and do that. Or I could just figure out what's going on.

So I did that. And I ended up, yeah, writing that piece. So I should say a little about how I ended up at a bank, I suppose. So while I was at the gambling company, I was working in third line support, which was essentially being shouted at by bookies because they were losing money by the second when systems were down.

And one of the things that frustrated us was that we couldn't get environments up quickly. We would fix stuff ourselves. So if we had a fix for a broken piece of code, we'd want somewhere to try that fix out to see if it worked before going to production. And often, we couldn't. And we would just stick it on production and see if it worked. Restart the service. Does it work? Great. Put it in source. Does it not work? Roll it back.

I read about this thing over one weekend called Docker, which allowed you to do lightweight VMs. We tried big VMs. But they were 10 gig. You couldn't cart them around anywhere. You couldn't share them. They got stale pretty quickly. It didn't really work for us. So my fellow engineers would have these huge environments they spent two weeks setting up when they joined a team. And they would be their pet environments. That was the norm.

And I said, look, there's just a bunch of scripts people run to get this stuff working. And they overcome all these different things. But this Docker thing allows us to programmatically build these containers. And then, you can run it on your desktop. And I was thinking about this the other day, actually, that Docker has this great feature, the copy-on-write layers.

We used to use that in development all the time. So the developer would have a huge monolithic Docker image of the whole system. And then, they would make a change to the system and see what files changed by looking at the diff, the docker diff. And you could pull someone else's changes on top of your changes and try them out.

So if someone was working on something, you want to collaborate on it, you could "docker pull" from them. And it would be — all this stuff has been lost as we focused on just building the image in a CI and deploying it to Kubernetes. And there's part of me that thinks that's a real shame because that mangling of Git and Docker could have been a really powerful idea, I think, but maybe only for some use cases.

CRAIG BOX: Yeah, and again, as someone who's written a book on it, you can talk perhaps about the difference between Git and GitHub.

IAN MIELL: Yes, exactly. So actually, yeah, there's a great analog there for Git because it's this distributed software tool where people could in theory push and pull independent of a centralized server. But we like the idea of a centralized server. It makes things mentally simple for us to grasp. Actually, the stimulus for me writing a book on Git was when I was working at Barclays, my fellow infrastructure team — hundreds of them — were building this thing. And there were 12 teams working on a single platform.

They came to me one day and said, oh, well, you know Git, Ian. So can you tell us where this change came from? And I was like, well, that'll be easy. Just send me the repo and I will tell you where that change came from. And I did a "git log --graph --oneline --all", thinking I could trace the thing through and see the basic structure. And it was all pipes, my whole screen. And I maximized the screen. It was still all pipes.

And I said, how do your teams merge stuff? And they said, oh, yeah, we just merge. I said, do you rebase? No, no, we just merge. And we just pile it all in. So you couldn't tell where anything was. And they didn't know what I was talking about when I said rebase. So my goal was to do a training program, where they get to the point where they understand what a rebase is, because if you could understand that, you've actually had to understand how distributed Git works and all this other stuff. But most people just want to use pull, push, merge because it's easier to grasp.

CRAIG BOX: Let's all go back to Subversion. Life was easier then.


IAN MIELL: Yeah, I never liked sub. I was a CVS guy, just because our company used CVS. And that was, when I joined in 2001, I think CVS was considered the nouveau option. And RCS was the thing. And some people objected to version control at all. They had tar files and email. And what's the problem?

CRAIG BOX: So that evolution that led you to Docker, did that also lead the gambling company to Kubernetes?

IAN MIELL: No, so it's quite a sad story, really. I did a skunkworks project because previous attempts to go through the company machine didn't work. I had an ill-fated attempt to use Erlang because it was particularly appropriate for certain gambling problems. I loved Erlang. It was a completely new paradigm, fascinating language. You can get incredible performance out of it and scalability.

CRAIG BOX: Slightly more modern than Tcl?

IAN MIELL: Slightly more modern. I think it's quite old actually, though. It's from, is it from the '90s?

CRAIG BOX: Ericsson invented it. I thought it might go back even further than that?

IAN MIELL: It could be. Joe Armstrong, but I think, yeah, anyway, but it was fashionable at the time or becoming fashionable.

CRAIG BOX: Real-time follow-up, 1986.

IAN MIELL: 1986, yeah. So yeah, so I tried going the official route and introducing Erlang. And that didn't really work out because I lost control of it. So I was like, OK, this time, I'm going to skunk-works this. So with Docker, I got two really bright, keen, young engineers. And they were working on different customers. So we spread the work between us. And we ended up with these customers saving about 20% of developer time just by using Docker, not in production, not even in staging, just in development.

And that 20%, I think, was actually higher. But I based it on a survey. I did a survey of the developers using it and asked them, basically, how much time do you think you're saving and why? So I did this. And I was like, right, OK, I'm going to go to the board. I'm going to try and get a meeting with the company's senior execs and try and get some funding for this because I've got enough now to do it.

And they said, yeah, we don't really want to speak to you for another six months. And at the same time, this person at Barclays really wanted to hire me. So it became harder and harder to say no to them. So I thought, OK, I may not like working for a bank. I may have sworn that I would never, ever, ever work for a bank. But what's the worst that can happen?

So I took a big leap, after 14 years, took a big leap into the unknown and joined Barclays as a senior OpenShift architect. I'd never used OpenShift. I'd never used Kubernetes, really, at that point. I'd never been an architect formally until that point. So it was a big leap. But fortunately, I worked with a really great team. And it worked out well.

CRAIG BOX: When you go to a big company like that, with the comfort of being in a group of 30, were you able to find a group of 30, a community of practice, or a team, or something in Barclays to be able to recreate that environment?

IAN MIELL: Yes, kind of. So before I joined Barclays, I spoke to someone I knew at Red Hat. And I said, Red Hat do stuff at Barclays? And they said, yeah. And I said, do you know people who work there? And they said, yeah. I said, well, what are they like? And he said, well, Barclays is a big place, my lad.

So who are you talking about? And I gave the name of the person who was hiring me. And they said, oh, they're the best team in finance. You definitely want to go and work for them. So that was a big tick in the box. And when I joined that team, they were core of about five or six people who were—

CRAIG BOX: All named Jonathan, if I remember rightly. All four of them.


IAN MIELL: They had very English names, people at Barclays. I came across more Ians working there than anywhere else. In fact, I rarely meet an Ian. But at Barclays, there were lots of them. So yeah, there's certainly a certain demographic there. But yeah, these were really driven people who wanted to do good stuff with software and also knew how to fight the machine.

So they had already got OpenShift v2 into production, which I think I heard Daniel Walsh talking about recently. That stirred a few memories when he was talking about that because it was basically Docker before Docker. You had cgroups, and SELinux, and various other techniques all mushed together to separate workloads. And yeah, so they'd done v2. And they were going to move to v3.

CRAIG BOX: Is it possible in a group like that to block out the fact that there are another 120,000 employees around the world and just focus on building something for the team that you're working with?

IAN MIELL: Well, if we go back to Alex Ferguson, it takes more than just 12 strikers to make a team. So we had people who could fight the machine on their behalf. So in a way, yes, you could block it out. I guess one of the really challenging things about working for a large organization like that is that you can do a lot of really good work and it goes to nothing because some accountant put a line through an Excel spreadsheet, or some manager was feeling bad that week about something, or you just didn't fit with some other strategy somewhere else.

There's all sorts of reasons why your work can come to nothing. But if within that team, you can feel that the work you're doing is progressing even if it's like, oh, we can rule that out because that doesn't work. We could move really fast. And you would get a feeling for that. The other thing that I think helps is there's a certain game you have to play in those teams or those organizations, where you almost have to translate the agile day-to-day into a traditional project management organization way of looking at things.

And there was one person whose job almost all day, every day was to sit on the phone and explain or convert the agile plans into six-month, 12-month specific goals and commitments, which is insane. But it's pretty hard to explain to a bank why we might do some work and it might not work out, because we don't really know whether what we're doing actually works. That's the level of uncertainty they need to be protected from.

CRAIG BOX: Is it possible to build software that suits both the startup and the enterprise? Or do you think that they're just so chalk and cheese that you can't even tier those requirements differently?

IAN MIELL: Oh, I don't know, actually. I wonder if the world has changed a little bit. Or maybe my experience of the world has changed because I don't hear of the same stories of the real open fields we had in 2001 to just build any stuff the way we liked it. It's just not there anymore. So for example, when I worked at the gambling company, if we needed to do, I don't know, some security tokenization, we would just write it ourselves.

If we needed to do authentication, we would write stuff ourselves. And so it was such an open field then, you could have a go at doing it yourself. And then eventually, you might realize, actually, you know what? It's crazy to roll our own crypto, cryptography. Let's use a product for that. But generally, the emphasis was on using our own skills and knowledge to get the solution right. And that paid off in all sorts of ways.

So the C application server was written over a weekend in a fit of pique because the founders of the company hated the IBM application server that they were using. And to this day, they still use it. And it's a great differentiator for their business. But you'd never do that now. That would be insane.

CRAIG BOX: Well, it's easy to look at it and say, the way that the companies might operate internally is different. But if you take something like Kubernetes, for example, is it going to be suitable to scale down to the small use cases and to still support the large enterprise use cases as well?

IAN MIELL: Well, this relates very firmly to another piece I wrote about, saying that AWS is the new Windows and Kubernetes is the new Linux. So you could say the same about Linux. Linux is really complicated. Or it was. Well, it is complicated. But 20 years ago, it was really complicated and really hard to use. And you had to compile your own kernel modules and recompile this, that, and the other to try and make it work, or things wouldn't work the way you wanted them to.

And then gradually, over time, you got two main lineages of distro, the Debian and the Red Hat. And you can buy solutions for whatever you want. And it'll probably be good enough. And it'll be relatively easy to use. I think the same thing has happened with Kubernetes. So at the moment, and certainly five years ago, Kubernetes was a lot of home experimentation and trying to figure out how to make things run and make things work.

And now, we have the concept of Kubernetes distros, which wasn't really a thing until about four or five years ago. There was OpenShift and open source. And that was it. But now, you've got Rancher. And I'm sure there are lots of others out there now. I think over time, it will become - and I think for a lot of startups, actually, they just consume the cloud ones. And they don't really think about it. And that's fine. They'll use EKS, or AKS, or Google Cloud, GKE. They won't need to think about it. It's almost invisible to them. It's like spinning up a Linux machine.

CRAIG BOX: But in saying that the tooling, the YAML, the higher level stuff, is that an effort that a small company shouldn't bother going to?

IAN MIELL: It depends how small. So I work for a consultancy that helps companies go to cloud native software. And surprisingly often, we tell companies, you don't need Kubernetes. Or if you need it, you don't need it yet. Don't just bring it in for the sake of bringing it in. So a typical profile of a company would be, theoretically, a single product company. But they actually have 12 different teams, B2B, with 12 different customers. And it's all a bit of a mess. And each one is doing their own auth solution, and doing their own security solutions, and so on.

And you look at that and you say, OK, there's an obvious candidate for abstraction here. And Kubernetes can solve your problems in the sense that we put it all in one box and say, that's the box that everyone's using. And we're not in the business of writing our own auth systems or our own security solutions. We can just use the ones off the shelf.

So for example, I'm looking at helping a company lock down their Kubernetes distribution. And I could write all sorts of crazy shell scripts that check whether you're using secrets as environment variables or using them as files. Or you could just install the bog-standard Gatekeeper or Kyverno that everyone else is using, and you can be fairly [sure] that that's going to be maintained by someone else for years, and years, and years. That's the sweet spot for it. But the very small company, it's probably not — you need some really motivated and talented engineer who wants to maintain it for the rest of the business to do that.
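As an illustration of the kind of hand-rolled check Ian describes, here is a minimal Python sketch that inspects a pod spec (already parsed into a dictionary) and reports which secrets are exposed as environment variables and which are mounted as files. The function names and the example spec are hypothetical; only the field names (`env[].valueFrom.secretKeyRef`, `envFrom[].secretRef`, `volumes[].secret`) come from the Kubernetes pod API. It is not how Gatekeeper or Kyverno work internally, and a maintained policy engine would normally be the better choice, as he says.

```python
from typing import Any


def secret_env_refs(pod_spec: dict[str, Any]) -> list[str]:
    """List 'container/VAR' entries for env vars sourced from Secrets.

    Hypothetical helper: walks containers and initContainers and looks for
    env[].valueFrom.secretKeyRef and envFrom[].secretRef.
    """
    findings: list[str] = []
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        for env in container.get("env", []):
            # A single variable pulled from one key of a Secret.
            if "secretKeyRef" in (env.get("valueFrom") or {}):
                findings.append(f"{name}/{env['name']}")
        for source in container.get("envFrom", []):
            # Every key of a Secret injected as environment variables.
            if "secretRef" in source:
                secret_name = source["secretRef"].get("name", "?")
                findings.append(f"{name}/envFrom:{secret_name}")
    return findings


def secret_volumes(pod_spec: dict[str, Any]) -> list[str]:
    """List the names of volumes that are backed by a Secret (file mounts)."""
    return [v["name"] for v in pod_spec.get("volumes", []) if "secret" in v]


# Made-up pod spec, for illustration only.
example_spec = {
    "containers": [
        {
            "name": "api",
            "env": [
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": {"secretKeyRef": {"name": "db", "key": "password"}},
                },
                {"name": "LOG_LEVEL", "value": "info"},
            ],
            "envFrom": [{"secretRef": {"name": "api-keys"}}],
        },
        {"name": "worker"},
    ],
    "volumes": [{"name": "tls", "secret": {"secretName": "tls-cert"}}],
}

env_findings = secret_env_refs(example_spec)
file_findings = secret_volumes(example_spec)
```

Even this small check has to know about init containers, `envFrom`, and volumes, which hints at why maintaining such scripts yourself gets expensive.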

CRAIG BOX: Something else we've talked about over the years has been the concept of runbooks. You're very big on the idea of having processes that people can follow. When do you think something should be codified as a runbook versus it being automated as a script?

IAN MIELL: The story was that I was in charge of this team doing third line support. And we had many, many priority issues, I think 2,000 accounts in two years of major priority incidents. And a lot of them were repetitions. And we always moaned, we need more documentation. We need more documentation.

I got really tired of this refrain. So I said, OK, it's not happening through bottom up. So I'm going to do it. So I took seven months, and I collated all these incidents. I classified them. I went through them, and I gathered the wisdom from previous incidents based on the notes and then turned them into runbooks.

And the effects were huge. Things got a lot quieter. Things got a lot calmer after about four months of doing this writing. And over time, we saved energy. We were able to onboard juniors much more easily. We had much more control over the process. It was just a virtuous circle once that seven months was done.

But the really hard thing was the time spent getting to that point of critical mass, where it was used and useful. And also, it's very hard to maintain the discipline of, have you checked for runbooks? Have you checked for runbooks? Because people just carry on asking around like they used to.

CRAIG BOX: Where is the balance, though, between documenting what to do when something goes wrong and then going back into the code and fixing it such that it just never goes wrong?



IAN MIELL: So we had a process. We had two teams, actually, in the end. We called them the incident and the problem team. This was back in 2006/7, I think, because we were still in ITIL land. And many of our listeners won't know what ITIL was. But basically, it was a set of concepts that were taken from industry best practice and gave everyone a language that they could all - when I say incident, you know what I mean. If I say problem, you know what I mean. That's the idea. That was the idea of ITIL. A really powerful idea, actually.

CRAIG BOX: The L is for library.

IAN MIELL: Library, exactly. Yeah, and it was done by the British government. I didn't know this until recently.

CRAIG BOX: That says a lot.


IAN MIELL: So we had an incident team and a problem team. And the task of the incident team was to respond to incidents, and fix them, hopefully, and get them resolved. And then, we had a problem team whose task it was to go look for recurring problems, and ideally find the low-hanging fruit that had the most benefit, and go and fix them properly.

And so they had a budget to do that. Basically, it was a competitive thing. The incident team would complain, oh, this thing happens over and over again. Why aren't you problem guys solving it? And the problem team would go, we're working on it, we're working on it. And there was always a judgment like, how much does this actually cost us? How much toil, in modern parlance? Or previously, how many hours are we spending on this incident?

If it was not that much, then, yeah, and it would take us two weeks to fix, say, then we might not do it. But if it was something where it takes two weeks to fix, but we're spending two weeks on it over every three-month period, then yep, that's the candidate to attack. And that works really well, because if you kept the metrics properly, you could actually see like, oh, look, we cut this problem out. And the call rate went down by 10%. So that really improves things.
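The trade-off in that judgment can be written down as simple arithmetic. The sketch below is only an illustration: the function names, the one-year planning horizon, and the "small toil" figure are assumptions, and only the two-week fix and the two weeks of toil per three-month period come from the conversation.

```python
def payback_quarters(fix_cost_weeks: float, toil_weeks_per_quarter: float) -> float:
    """Quarters of saved toil needed to repay the one-off cost of a proper fix."""
    if toil_weeks_per_quarter <= 0:
        return float("inf")  # no recurring cost, so the fix never pays for itself
    return fix_cost_weeks / toil_weeks_per_quarter


def worth_fixing(
    fix_cost_weeks: float,
    toil_weeks_per_quarter: float,
    horizon_quarters: int = 4,  # assumed planning horizon of one year
) -> bool:
    """True if the toil saved over the horizon exceeds the cost of the fix."""
    return toil_weeks_per_quarter * horizon_quarters > fix_cost_weeks


# The case from the conversation: a two-week fix for a problem that costs
# two weeks of effort in every three-month period.
recurring_case = payback_quarters(fix_cost_weeks=2, toil_weeks_per_quarter=2)

# An assumed low-toil case: the same two-week fix for a problem that costs
# about half a day (0.1 weeks) per quarter.
minor_case = worth_fixing(fix_cost_weeks=2, toil_weeks_per_quarter=0.1)
```

With these numbers the recurring problem repays its fix after a single quarter, while the minor one would not repay it within the assumed year.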

So to take a bit of a step back, the team I was on was composed of programmers or engineers who were rotated out of development into the centralized support team. And customers paid a lot of money for this support, so it had a lot of weight within the company. And so we could go back to dev teams and say, this piece of work you've done is just not fit for purpose.

It's not well-architected. It's not well-written. You really need to go and look at improving it. And we would sometimes help them with that. Or sometimes, we would send it back to them. But because we were ex-engineers, it wasn't just, oh, there's some support guy throwing something back at me. We would really try and fix it ourselves first.

CRAIG BOX: It really does feel like what we now think of as a SRE function as opposed to a support function.

IAN MIELL: It was. It was in many ways. When Google came out with the SRE book, I was like, oh, damn, we've been doing this for years. And we never talked to Google about it because we never thought there was any - it was pretty much common sense.

CRAIG BOX: If only you'd thought of an acronym.

IAN MIELL: [CHUCKLES] Exactly, yeah. The reason we were in that spot, actually, is because, wind back 20 years, we had these customers when we were 30 people in a room. We had these customers. And they said, we want to buy a support contract off you. And we're like, what are you talking about? We've given you the software.

Oh, no. We want to buy a support contract. So we'll pay you x million a year just to maintain the systems. We're like, OK, sure. So we went out to market and found a bunch of support engineers by googling them. And we built this team. And the customers were like, no, we don't want those guys. They don't know the code. I just phone up the tech lead and they fix it straight away. We want that.

So we had to get rid of that team. And we said, right, engineers, now you get rotated into support. So this is like DevOps. This was back in 2002. So it's like DevOps. But we couldn't do DevOps purely because the systems were too big. You had to have a centralized team. So crudely, I think of SRE as DevOps, but too big for one team. You have to centralize aspects of it.

Another way to look at it is platform. You're running a platform on which your software runs. But anyway, this team was composed of engineers. And engineers would rotate in and out of the support function. And so because we did that, we could rewrite stuff or think of things in that way. So yeah, we had this concept of incident/problem. And we had this way of deciding what to attack. And usually, it was fairly obvious. Usually, there was some problem that was happening every week. And it was like, OK, we really need to fix that. Let's do it.

CRAIG BOX: A couple of weeks ago, we spoke to Steve Wade, who is an independent Kubernetes consultant. You've written about why it's great to be a consultant. And indeed, you are now working for a consulting company, leading teams there. How do you summarize why it's the right choice for you at this point?

IAN MIELL: So one of the things I got frustrated about working for big corporate machines was the fact that, and for very good reason, you are in a box. You are hired as x. And if you step outside that area, then that's problematic. And yet you have experience. You know how things work. And you think you can actually diagnose problems, deeper problems than just why my Helm chart isn't working.

So I looked at various consultancies. And I joined Container Solutions because they have this approach, quite a holistic approach, of — and a lot of companies have this, but we really emphasize it — which is that usually when a company is struggling to implement cloud native, it's because of deeper strategic challenges. And it's because of deeper misalignment between engineering teams and the rest of the business.

And if you don't sort that out, then you're going to struggle to make it work. One of the great things about being a consultant is you're outside the political machinations of the company. You can actually go and tell them, the reason we think your project is not going to work is not because of the tech, but because of this. Or it may be because of the tech. But mostly, it's not. Mostly it's because of things they're not considering.

And if you say these things, the worst they can do is fire you as a consultant. But because we have many customers, we don't actually depend on one client. Or one conversation is not going to ruin our business. Whereas, if you're an employee and you irritate a senior leader because you're getting into stuff you shouldn't be getting into, that can be the end of your career there. And you have to go look for another job.

So that freedom that you get to talk openly and honestly to customers, I find very liberating because you don't have that cognitive dissonance doublethink you have to have in an organization where, I know this project is going to fail, but I will keep my mouth shut. That's the thing I most enjoy.

CRAIG BOX: What are you learning about at the moment? And when does the next book come out?

IAN MIELL: The thing that's come out to me in the last few years is a thing I called finance topologies. There's a great book called "Team Topologies," which describes the different kinds of teams and best practices around structuring teams for IT or for software projects. And I've started writing a book called "Finance Topologies," which is about the deeper causes of technology transformation failure and specifically how they are funded.

So working with various companies trying to move from traditional software development project ways of working into product ways of working, or from noncentralized platforms to centralized platforms, when we dig into why projects haven't worked, we often find that, fundamentally, the resources of the business haven't been assigned appropriately to that task.

And you debug. You ask the five whys. With one customer, we asked the five whys. And we ended up realizing that the way the accountant saw the project was different to the way the engineer saw the project. And so the accounting method for the project was different, which meant that the reporting method was different. So success was valued differently.

And so on it went. And so there was just this fundamental miscommunication between the parts of the business about what was being done and why it was being done. And so my thesis is that if you get the finance topologies right, then the team topologies will fall out of them. And the team behaviors and the culture will all follow.

One of the frustrations I have reading about IT transformation in general is that there's a lot of talk about culture as though it were this thing that has no causes. You say your culture is wrong. It's like, great. Now what do I do? How do I change my culture? Do I buy everyone a beer on a Friday and it's sorted or?

CRAIG BOX: You fix it.

IAN MIELL: [LAUGHING] Exactly, yeah, you fix it. And this is where the history thing comes back because, as a historian, everyone is a Marxist. They believe that material things, money, crudely, make the world go round. And as an engineer, you're trying to find, what's the oil that makes the system flow? And of course, it's money at the end of the day.

And if the money isn't flowing in the right places in the right way, then a lot of other behaviors are a lot harder to make right. It's not that you can't fight against the tide. But you are fighting against the tide. And that's hard. So yeah, I'm exploring that idea in some depth at the moment and trying to get together a book on it.

I've written a few blog posts on it. Yeah, it's been really fun to see the reaction to that. But I think it's underexplored. I think Conway's law — I actually went back and read Conway's law paper again, recently. And I noticed that at some point, because Conway's law, for those that don't know, says that the structure of your organization will determine its technology solution. I'm paraphrasing, but something along those lines.

And at some point, I think he talks about the administrative function at the top that decides how the teams are formed or something, really glosses over it. And so I want to flesh out that bit and say like, OK, the way you think about the money in your business, that's going to determine a lot of aspects of how your organization is structured, which then determines culture and, ultimately, your software components.

CRAIG BOX: Are you going to call it "Learn Sociology The Hard Way"?

IAN MIELL: [LAUGHTER] Yeah, I suppose it is a mix of all these different things. Yeah, I think "Finance Topologies" is probably going to be the title because it's nice and catchy, and sounds techie as well.

CRAIG BOX: I expect it will fly off the airport bookstore shelves.

IAN MIELL: [LAUGHING] Let's hope so.

CRAIG BOX: All right. Well, thank you very much for joining us, Ian.

IAN MIELL: Well, thanks, Craig. It's been a lot of fun.

CRAIG BOX: You can find Ian on Twitter at @IanMiell, on the web at zwischenzugs.com with two Z's, and you can find Container Solutions at container-solutions.com.


CRAIG BOX: Thanks again for joining us. If you've enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod or reach us by email at kubernetespodcast@google.com. You can also check out the website at kubernetespodcast.com, where you will find transcripts and show notes, as well as links to subscribe. Thanks for listening, and we'll see you next week.