Can I Wireshark this?

As mentioned in my previous posts, I spent many years working primarily in the networking industry. Even my first full-time role was at a broadband company (admittedly in the call centre, but it still counts!).

Now, I work as a DevOps Engineer. How did that happen?

The network world

The network industry has been around almost as long as computing has existed beyond gears and levers. From the early days of ARPANET, through the advent of TCP/IP and IP addressing, to IPv6 later on, there have always been people working in the terminal, plugging away at a CLI to make it all work.

For over a decade, I was one of them. I took great joy in configuring a Cisco 6500/ASR9000/887/Nexus 7k, or a Juniper MX/SRX, or an HP ProCurve, or an HP/H3C 5800 or 5900, or Mikrotiks, or Brocades, and any of the Linux-based tools that talk routing protocols to them (Quagga, BIRD, OpenBGPD etc.).

I spent my days working with BGP, MPLS, MTU issues (it’s ALWAYS MTU), Spanning Tree (running it across a 200-mile distance worked…somehow!), PPP, IPSec, VxLAN, EVPN, and loving every second of it.

Firewalling goes hand in hand with networking (in terms of knowledge and blame), so I also tried my hand at Cisco ASAs and Checkpoint, learning IPTables, pining for Palo Alto, and a lot more.

Forming and forging core networks, routing packets all over the world, it felt almost magical at times. I was enamoured, to the point of being addicted to studying for certifications in my free time (to learn, and to prove what I knew).

I loved networking. It was my thing. I felt like I was in the networking industry for the long haul. So what changed?

Automation?

Automation has been a bad word to some in the networking industry. Many believe it will replace them. Others want to avoid being “programmers”, happy to plug away at the CLI. Compared to the systems administration world, and especially the cloud offerings, the networking industry (and its vendors) has lagged behind.

Most networking equipment and vendors still put the CLI first. Automation often involves complex Expect-style scripts, effectively simulating someone typing in commands and reacting to the output.

Libraries and modules have come along to make this job easier (e.g. Netmiko for Python, NAPALM, and Ansible support). Often, though, they aren’t much of an abstraction: you are still expected to know the commands and the expected output. There is also little support for agent-based configuration management (e.g. Puppet, Salt) on anything but the latest equipment.
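To give a feel for what that looks like, here is a minimal sketch using Netmiko (with made-up device details). The library handles the SSH session and the prompts, but the command you send and the text you get back are exactly what you would see at the CLI, and parsing that text is still your problem.

```python
from netmiko import ConnectHandler

# Hypothetical device details, purely for illustration
device = {
    "device_type": "cisco_ios",   # you still tell it the vendor/OS
    "host": "192.0.2.1",
    "username": "admin",
    "password": "example-password",
}

conn = ConnectHandler(**device)

# The same command you would type by hand at the CLI...
output = conn.send_command("show ip interface brief")

# ...and the same raw text back, left to you to parse
print(output)

conn.disconnect()
```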

Some vendors are starting to include APIs, YANG models and/or gRPC endpoints. The problem is that many providers still maintain legacy environments. I have worked at places still using Cisco 6500s (first released over 20 years ago!) as the backbone of their network, more recently than you would hope (i.e. last year!).

Winds of change

In early 2015, I took a role at one of the largest telecoms companies in the UK (and in the world), working on their burgeoning cloud/data centre offering. They were heavily invested in HP/H3C networking. My previous company’s core network was nearly all HP/H3C, so I was a good fit.

I worked on improving their core network (merging legacy networks together, decommissioning old suppliers, and re-merging their autonomous system). During this, I noticed that changes could be made to the core network (usually by me…) and nobody had any idea. No alerting, no version control, nothing.

I investigated. It turned out Solarwinds NPM was installed as a plugin for our monitoring platform (Solarwinds, oddly enough). It would periodically retrieve the configuration from the network kit, and it tried to email the day’s configuration changes to the NOC team.

This was great, except that when it sent the emails with configuration diffs, it included the full configuration of every device (before and after the changes) AND whatever differed (if anything). Fine with a couple of devices, but we had around 40 or 50 in total. Every email was 80 MB. Our email quota was 2 MB per email. Spot the problem.

A later version of Solarwinds fixed this issue (sending just the differences, rather than the full configuration), but we had a lot of custom alerts that would need re-writing when upgrading.

So I decided to build a tool to do it myself. I could have implemented something like RANCID, but that would have required opening up network access to it, and people there were happy with the current tooling and didn’t see the problem with the status quo.

Fun with Python

I had dabbled with Python before, but I hadn’t done anything of note in it. Basic user input, search and replace, nothing complex. I used this as a project to learn it.

The tool I built would log on to each device, get its current configuration, and save a copy locally. Once all the devices in the list had their configurations retrieved, they were committed to Git. As those who use Git know, it is very easy to show the differences between the current commit and the last one (i.e. the previous day’s run).

Using Netmiko (which supported most of the vendors we used), and Paramiko (for vendors that Netmiko didn’t support), I felt happy with what I had put together. It involved classes, native Python Git modules, and functions.
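The heart of it looked roughly like the sketch below. This is a cut-down illustration with invented hostnames, credentials and paths, not the original code; the real thing had classes, error handling and more vendors, but the flow was the same: pull each config with Netmiko, commit with a Python Git module (GitPython here), and let Git produce the diff against the previous day’s run.

```python
from pathlib import Path

from git import Repo              # GitPython
from netmiko import ConnectHandler

REPO_PATH = Path("/srv/config-backups")   # hypothetical local Git repository
DEVICES = [
    {"device_type": "cisco_ios", "host": "192.0.2.10",
     "username": "backup", "password": "example-password"},
    # ...one entry per device on the list
]

repo = Repo(str(REPO_PATH))

for device in DEVICES:
    # Log on, grab the running configuration, log off
    conn = ConnectHandler(**device)
    config = conn.send_command("show running-config")
    conn.disconnect()

    # One file per device, overwritten on every run
    (REPO_PATH / f"{device['host']}.cfg").write_text(config)

repo.git.add(all=True)
if repo.is_dirty():
    repo.index.commit("Nightly configuration backup")
    # What changed on the network since the previous run
    print(repo.git.diff("HEAD~1", "HEAD"))
```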

Seasoned Python developers will think this is quite basic. Now when I look back on it, I do too. I have since rewritten/refactored similar scripts for other purposes (to include parallelism, reporting etc).

It’s worth knowing that about 2 years prior (in my previous company), I nearly gave up on a project that required writing basic Bash scripts. To quote myself at the time, “I am not a programmer and I never will be”. That mindset was changing.

Not fun with networking

I had scratched an itch; I had built a tool that filled a gap in our processes. People came to rely on it. This was good, and it gave me the confidence to do more automation and scripting. I also gained familiarity with Netmiko, a Python module which automates basic shell interaction with a wide range of networking kit. It automatically recognises prompts, and you can tell it that certain commands are for configuration and others just for verification. It became my go-to library for network interaction from then on.
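That configuration-versus-verification split is baked into Netmiko’s API. Roughly (again with invented device details), it looks like this:

```python
from netmiko import ConnectHandler

# Hypothetical device, for illustration only
conn = ConnectHandler(device_type="cisco_ios", host="192.0.2.1",
                      username="admin", password="example-password")

# Verification: send a single command and get its output back
print(conn.send_command("show version"))

# Configuration: Netmiko enters config mode, applies the lines, then exits
conn.send_config_set([
    "interface GigabitEthernet0/1",
    "description Uplink to core",
])

conn.disconnect()
```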

Unfortunately, I had also seen first-hand how difficult it was to automate equipment (i.e. most network kit) that was never intended to be automated.

A lot of the kit in our legacy network was not natively supported by Netmiko. We also had some interesting HP firewalls that I had never heard of before that role and have never seen since. For these, I would need to use Paramiko instead (which does basic SSH interaction). I had to tell it the exact prompt to expect, how long to wait on commands, the exact responses to expect from commands, and sometimes the terminal type (not everything worked well with VT100, for example) so it wouldn’t screw up the output.
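For the unsupported kit, the interaction ended up looking something like the rough sketch below. The hostname, credentials, prompt and terminal type are all invented; the point is how much has to be spelled out by hand: open an interactive shell, set the terminal type, send a command, and keep polling the channel until the prompt you told it to expect (or a timeout) turns up.

```python
import time

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("192.0.2.20", username="admin", password="example-password")

# Some kit misbehaved with vt100, so the terminal type had to be set explicitly
shell = client.invoke_shell(term="dumb", width=200)

EXPECTED_PROMPT = "<HP-FW>"       # the exact prompt this particular box uses


def run(command, timeout=10):
    """Send a command and read output until the expected prompt (or timeout)."""
    shell.send((command + "\n").encode())
    output = ""
    deadline = time.time() + timeout
    while EXPECTED_PROMPT not in output and time.time() < deadline:
        if shell.recv_ready():
            output += shell.recv(4096).decode("utf-8", errors="replace")
        else:
            time.sleep(0.5)       # the fixed waits mentioned above
    return output


print(run("display current-configuration"))
client.close()
```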

Compared with my colleagues, who were dabbling with Ansible to manage multiple machines and Jenkins to deploy changes to server and code configuration, something didn’t feel right.

I had also reached a point in networking where I hadn’t learnt anything groundbreaking in a long time. I could learn the finer points of what was already there, but the new developments in the industry seemed like a rehash of what had gone before (how many different ways have we tried stretched Layer 2 now?).

It is still an industry that moves slowly, and even the “big” changes are not the wholesale paradigm shifts you find in the cloud and systems world. Facebook may be implementing something like Open/R (a distributed system for routing information), but some poor helpless souls are still using RIP unironically.

Kuberwhatnow?

The systems guys on my team were very much Windows-first, with a lot of VMware knowledge. The company (or at least the tech lead) was pushing more towards open source, meaning that anyone in the team with Linux skills became immediately useful. I had been using Linux for longer than I had been in networking (although with less professional experience), so I started getting more involved on the systems side.

The developers were rolling out Docker containers, but they only had a single machine to do it with. Every now and then (whether due to power cuts/maintenance or whatever) the Docker box would fail. This needed to change.

I’d heard about this Kubernetes thing (through the wonders of Twitter) and decided to give it a go. It was a new world of automation and orchestration: being able to interact with APIs and watch the cluster just do its thing. Throw in some Ansible, and I could spin up a virtual machine and watch it join the cluster 5 minutes later. A world away from the artisanal core network configs required to make everything run just about right.

I had a slight eureka moment, seeing where the industry was heading. At the time, Docker Swarm was still an option and Mesosphere was in use in many companies. Little more than 3 years later, Kubernetes has come to dominate the container world.

Strangely, one of the first things deployed on the cluster was a Graphite-based tool to gather stats from Checkpoint firewalls and push them to a server running Graphite and a Grafana dashboard. Why did we need to create this tool? Because Solarwinds couldn’t poll the same device in two different ways via the same management IP, and Solarwinds were unwilling to implement the latest SNMP MIB that would have worked around the issue.

I can probably thank Solarwinds (begrudgingly) for a lot of my current career.

Moving on

I moved on to a different role, a Solutions Architect kind of job. It seemed like a good fit for my family and me, in terms of the location and the pay increase. A role working at a higher level (i.e. overall design and architecture rather than configuration) would also help me see the bigger picture.

Unfortunately, the role was very hands-off. I spent 99% of my time in Excel and Word, and that did not work for me. I did gain more of an appreciation of budgeting, purchase orders and high-level design, but I’m at my happiest in front of a black-background terminal with white/green text.

So I applied for a hands-on role again, working for a small ISP. This seemed perfect. Back in my earlier days in the industry, I always dreamed of working for ISPs. Working with large core networks, making changes at a large scale, it all seemed exciting.

Not living the dream

I started at the role. And it hit home. This wasn’t for me any longer. I grew more and more dissatisfied with networking.

While doing customer rollouts, I built dashboards (using Python, Flask, Bootstrap and Postgres) that would automatically update when engineers uploaded their job spreadsheets. I built configuration management tools for our network infrastructure. Whenever I actually had to log on to a device directly, it made me feel a bit sad.

The love for networking was gone. It didn’t help that the past few jobs had issues with management, workload, and a work/life balance that was a mystery to them. In those times I used to fall back on how much I loved networking; now I found that the more I had to work on the network, the more disenchanted with the networking industry I became.

Time for a change

I decided the right path for me was the burgeoning DevOps world. My operational experience of managing networks, Linux and virtualisation would help me. My newfound love for development would help too. I started looking into the tooling and infrastructure.

I brushed up on the AWS basics, made myself familiar with Azure, and started playing around with Terraform. I had worked with Ansible a little before, so I brushed up on that too. And then I applied for DevOps jobs. Every day. For months.

I struggled to get much feedback. There was no shortage of jobs that wanted someone to come in and put some basic Python scripts together to do network rollouts, but that really wasn’t what I wanted. I’d still be managing the network, and still end up logging in manually to see why the automation hadn’t worked.

“Use this one trick!”

I decided to try something. My resume listed my current role as Network Engineer. While this was technically my job title, I was also managing all the virtualisation, the MySQL database, and all the additional tooling I had created. So I changed my job title to Network and Systems Engineer.

Next thing I knew, I had three or four phone interviews within the week. A couple didn’t work out, primarily because they were very Microsoft-focussed. I can do some of the basics within Windows (I had even migrated two separate Active Directory domains into a single one in the role I was in), but Windows is really not where my skillset lies.

One of the phone interviews I had was with a company I’d already applied to twice before, for the same role (unbeknownst to me when I applied the third time!). It required someone with Linux skills, knowledge of the cloud and virtualisation, some development experience, and any knowledge of carrier networking would be beneficial too. This seemed like a perfect fit.

3 months later I started there as a DevOps Engineer.

Whoami

I still feel odd about having left the networking industry: the excitement I used to feel for it, the comfort I had in my knowledge. I still have a hand in the network infrastructure in my current role, but to a much lesser degree than I used to.

Maybe when the industry has moved to being truly automated, and not just at the megascale companies (Amazon, Facebook, Google), it will become interesting to me again.

For now though I’m excited about my job again, learning every day, and finding a lot of room to grow.