OpenMarket – August 17, 2021
by Peter McCarthy
In November 2020, OpenMarket was acquired by our once-competitor Infobip. After many months, my team and I have started being exposed to Infobip’s engineering infrastructure, platforms, products and general approach to software engineering. Some of my previous conceptions of software development were blown away, as it became clear how the sheer scale of the company made these sophisticated and meticulously planned conventions possible. But while it is almost certainly best practice to have a log store and some interface to filter the logs, there’s something special about manually SSHing into a machine and simply tail -f’ing the file. It’s these contrasts that have me torn between the behemoth software powerhouse of Infobip and the humble yet agile technical layout of OpenMarket.
Automation over People
One of the things that struck me immediately was how much automation IB (Infobip) had in almost all facets of their CI/CD and monitoring. As mentioned previously, all logs are automatically streamed to a log store which can then be queried through an Elasticsearch interface. This one web application lets you filter and stream logs, set up alerts and get a thorough overview of what a specific system (or one instance of it) is doing and what’s going wrong. At OM (OpenMarket), in contrast, each service and instance logged to a file on its VM’s disk, which was then collected and archived on another machine. In short, to see what a program was doing, one would SSH in and simply run Bash commands on the log file(s). Naturally, automation is part of what makes a good software engineer, and IB’s logging and metrics solution is exemplary. For a company of this scale, it is the ideal solution. However, manually inspecting enormous files has its merits too: not only does it require one to become proficient in Bash and learn how to get the data one needs out of a large file, but it also provides valuable experience with fundamental tools such as SSH and SSH keys, and with owning your own servers.
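To give a flavour of the manual approach, here is a minimal Python sketch of roughly what piping grep into tail does on a log file. The function names, the sample log lines and the "ERROR" pattern are all made up for illustration; this is not how OM's tooling actually worked.

```python
import os
import re
import tempfile
from collections import deque
from typing import Iterator, List


def matching_lines(path: str, pattern: str) -> Iterator[str]:
    """Lazily yield lines that match the pattern, one line at a time."""
    regex = re.compile(pattern)
    # Iterating over the file object reads line by line, so a very
    # large log never has to fit in memory.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if regex.search(line):
                yield line.rstrip("\n")


def last_matches(path: str, pattern: str, count: int) -> List[str]:
    """Return only the final `count` matching lines."""
    # A bounded deque discards older matches as newer ones arrive.
    return list(deque(matching_lines(path, pattern), maxlen=count))


# Hypothetical sample log, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(
        "10:00:01 INFO started\n"
        "10:00:02 ERROR timeout\n"
        "10:00:03 INFO retry\n"
        "10:00:04 ERROR refused\n"
        "10:00:05 ERROR gave up\n"
    )
    sample_path = tmp.name

recent_errors = last_matches(sample_path, r"ERROR", 2)
os.remove(sample_path)
print(recent_errors)
```

Because the file is streamed, the same function works on a multi-gigabyte log as well as on this five-line sample.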
Similarly, IB’s CI/CD platform is one of the most polished I’ve seen. What astounded me most is that it was all built in-house and worked seamlessly with all of their projects, something I believed cloud providers like AWS were partly designed to offer. It really went to show how much can be achieved when the resources are available to build and maintain an integration system at this scale. Jenkins automatically builds and releases new versions, and IB’s own deployment manager service sets up Docker containers, ports and inter-service communication with literally the click of a button. OM, on the other hand, generally had bespoke release and deployment processes for each project. Naturally, the older ones were more of a challenge to standardise, but having to document in each README how to release and deploy an application becomes cumbersome and unnecessarily complex. That’s not to say there were no benefits to this: being exposed to so many different frameworks and plugins, not to mention how much one can learn about Git and its tagging system for releases, builds skills that take one further towards becoming a well-rounded engineer. It’s nice when something does what you want by itself, especially if you know what you want to do, but building and using it yourself teaches you far more about software integration and gives you a much more substantial appreciation of how CI/CD can and should be approached.
Setting the Standard
Using a microservices approach to software design means communication is the bedrock of the infrastructure. Services need to be able to discover each other and forward data in a common language or encoding, taking other requirements like load balancing and queueing into account. While this can all be done with a common library, in a system of this scale it is much more pragmatic to have a separate system orchestrate data transfer between services, which is how IB has solved this problem. All network communication goes through this system, which acts as a kind of registry where services “check in”. This type of communication abstraction is perfect for a large number of teams, many of which are building microservices and don’t need to worry about how to discover or ping a service. What makes this so seamless is the use of RPC (Remote Procedure Call) at the application-code level. This effectively means that functionality is exposed as library imports rather than API calls. For example, suppose one service has a method “double(int x)” which simply doubles its input x. Instead of calling this explicitly over HTTP, the user can import this service’s library and call the method from the application code.
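The double example can be sketched in a few lines of Python. This is only an in-process illustration of the idea, not IB's actual system: Registry, ServiceProxy, the "math-service" name and the JSON "wire format" are all assumptions made for the example, and no real networking takes place.

```python
import json
from typing import Any, Callable, Dict


class Registry:
    """Hypothetical stand-in for the system that services check in to."""

    def __init__(self) -> None:
        self._services: Dict[str, Dict[str, Callable[..., Any]]] = {}

    def check_in(self, name: str, methods: Dict[str, Callable[..., Any]]) -> None:
        # A service announces itself and the methods it exposes.
        self._services[name] = methods

    def dispatch(self, request: str) -> str:
        # Decode the "wire" message, run the method, encode the result.
        message = json.loads(request)
        method = self._services[message["service"]][message["method"]]
        return json.dumps({"result": method(*message["args"])})


class ServiceProxy:
    """Client-side stub: remote methods look like ordinary method calls."""

    def __init__(self, registry: Registry, service: str) -> None:
        self._registry = registry
        self._service = service

    def __getattr__(self, method: str) -> Callable[..., Any]:
        # Only reached for names not found on the proxy itself, so any
        # unknown attribute is treated as a remote method name.
        def call(*args: Any) -> Any:
            request = json.dumps(
                {"service": self._service, "method": method, "args": list(args)}
            )
            reply = self._registry.dispatch(request)
            return json.loads(reply)["result"]

        return call


registry = Registry()
registry.check_in("math-service", {"double": lambda x: 2 * x})

# The caller never builds a request or handles a response explicitly.
math_service = ServiceProxy(registry, "math-service")
print(math_service.double(21))  # prints 42
```

The point is in the last two lines: from the caller's side, double is just a method on an object, and the encoding and lookup are hidden behind the proxy.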
This is where IB’s standard service architecture comes in. New services are recommended to use IB’s standard project layout, which includes the functionality for RPC. That is, if a new project is needed, RPC and this communications registry are already included in the standard template. Thus all services elegantly communicate with each other without having to worry as much about the networking layer. This solution is a masterclass in abstraction, removing arguably unnecessary requirements from projects and streamlining the development of actual business features.
While this works well for an organisation of this scale, again, for OpenMarket it would likely be more of a hindrance than a boon to productivity. IB’s approach is very inflexible and has significantly more moving parts than OM’s, and with those crucial infrastructure layers abstracted away, teams are less able to diagnose and troubleshoot issues themselves, leaving it to the networking team to solve them. OM used a message passing approach for its core SMS platform. This was simply a library that a new service would import and integrate if it wanted to communicate with other services using it. The library handled service discovery, encoding, load balancing and service redundancy. With all of this compacted into a single library, factors like networking were left more open to developers, which made for a much more agile approach. As before, it meant team members needed some understanding of the networking layer that the IB approach perhaps doesn’t require; packet captures were a somewhat uncommon occurrence at OM, but even the need for them shows how open and flexible this solution was.
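As a rough illustration of what a library like this does, here is a minimal Python sketch covering discovery, encoding, round-robin load balancing and failover in a single class. Everything in it is hypothetical: MessageBus, the "sms-router" service name and the handler functions are invented for the example and are not OM's actual library.

```python
from typing import Callable, Dict, List

Handler = Callable[[bytes], bytes]


class MessageBus:
    """Hypothetical message passing library imported by each service."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Handler]] = {}
        self._cursors: Dict[str, int] = {}

    def register(self, service: str, handler: Handler) -> None:
        # Discovery: each instance announces itself under a service name.
        self._instances.setdefault(service, []).append(handler)
        self._cursors.setdefault(service, 0)

    def send(self, service: str, payload: str) -> str:
        instances = self._instances.get(service, [])
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        start = self._cursors[service]
        for offset in range(len(instances)):
            index = (start + offset) % len(instances)
            try:
                # Encoding: callers pass text, the wire carries bytes.
                reply = instances[index](payload.encode("utf-8"))
            except ConnectionError:
                continue  # Redundancy: fall through to the next instance.
            # Load balancing: the next call starts after this instance.
            self._cursors[service] = (index + 1) % len(instances)
            return reply.decode("utf-8")
        raise ConnectionError(f"all instances of {service!r} failed")


def healthy(name: str) -> Handler:
    def handle(data: bytes) -> bytes:
        return f"{name}:{data.decode('utf-8').upper()}".encode("utf-8")

    return handle


def broken(data: bytes) -> bytes:
    raise ConnectionError("instance down")


bus = MessageBus()
bus.register("sms-router", healthy("a"))
bus.register("sms-router", broken)
bus.register("sms-router", healthy("b"))

replies = [bus.send("sms-router", "hi") for _ in range(3)]
print(replies)  # prints ['a:HI', 'b:HI', 'a:HI']
```

Because everything lives in one importable class, a developer debugging a delivery problem can read straight through the discovery and failover logic, which is the kind of openness described above.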
Summary & Final Thoughts
What this transition has shown me so far is how closely engineering practice needs to match the scale of the organisation; an approach that works for a hundred engineers won’t necessarily be the most pragmatic for a thousand. It is a trade-off between flexibility and homogenisation. A small group of developers needs to be fast and agile (whatever works best for the team), while an enormous one with a constantly shifting and evolving catalog of services needs to be robust and uniform (whatever works best for all the teams).
While there was room for alignment at OM, I believe our message passing approach was lightweight enough to be flexible while still offering a common platform for a large number of our services. Obviously, when resources are tight (which was the case at OM compared to a behemoth like IB), sleeker solutions are called for, and when resources are plentiful, entire teams focused on this infrastructure can streamline engineering to its maximum.
My time at OM was invaluable for learning about things like SSH and manually setting up a project with logging, config and so on: the sorts of things that every project requires but that you won’t necessarily need to implement yourself. It impressed upon me a feeling of ownership, in the sense that the VM my application was running on was, for the most part, truly my responsibility. I think this is fantastic experience, especially for newcomers to software engineering, as I was when I joined OM.
At the same time, I believe IB is the next logical step in this learning process: once you understand and appreciate all the moving parts of your software, it should become the responsibility of another team to manage them for you. Your job as a developer becomes much more streamlined, focusing on your service’s code rather than on everything that supports it. So while I think IB’s engineering infrastructure is on the right path, it always helps to know how some of the stuff works under the hood!