When you start working in networking and get to know the big brands, what these companies do seems incredible. And it is: what they design, develop and build can seem impossible to understand, let alone to achieve.
With the evolution of communications, the internet turned, among other things, into a repository of all kinds of data and information, and the world of software began to expand. The emergence of research and development communities drove a gradual opening up of software. Hardware opened up as well, but given the resources each one requires, software development clearly has the edge.
It used to be impossible to think of having a router with advanced functionalities in the palm of your hand. Today, however, it is very easy to have an environment with the same characteristics, for example, to develop educational kits in robotics and programming or to test different integrations combining several components.
Our motivations can differ, but nowadays we can set out to create and design scalable and cost-effective solutions with general-purpose hardware and software components. Even though there is a lot of information available, every design project presents several challenges. In the following paragraphs I will present two of them.
What do we want to do? Why do we want to do it? What is the need that we must cover? The purpose sets our direction. How we fulfill it depends on the resources we have and on our creativity. Once the purpose is defined, the degree of complexity will indicate how many parts to divide the problem into, what methodology to use, and other variables, but in all cases the key is to have a clear objective.
How do we achieve reliability in a production environment?
Mainly, we can consider taking the following steps:
- Using dependable elements
- Carrying out “black-box” and “white-box” tests
To explain further what we can do, we will review a practical use case below.
Let's imagine that we have a solution in our datacenter that serves thousands of customers. This solution is not new; it is a legacy system so vital to our service that no one dares to refactor it, since doing so would be very expensive.
In this case, we need to guarantee the high availability of the service at all levels, but above all, we must monitor not only network reachability but also the service itself. This means the application must be up and running.
There are many different types of solutions to this problem, but in our case, we seek to focus on one that, in a practical way, allows us to have a cost-effective and reliable solution. It is worth clarifying that everything can always be improved, and the idea is to go through one of the possible paths between the problem and the solution.
A good approach is to monitor the service from a user's point of view: if for some reason it stops responding, we can trigger corrective actions automatically.
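As a minimal sketch of what checking the service "like a user" could look like, assume the legacy application exposes an HTTP endpoint. The function name `service_is_up` and the idea of an HTTP endpoint are assumptions made for illustration; a different protocol would need a different probe.

```python
import http.client
import urllib.error
import urllib.request


def service_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the URL answers with a 2xx status within the timeout.

    A successful response proves more than network reachability: the
    application itself accepted the request and produced an answer.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status or malformed URL:
        # from the user's point of view the service is not available.
        return False
```

A short timeout matters here: a service that answers after thirty seconds is, for most users, as good as down.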
The following is the list of elements which meet the reliability requirements:
BGP is a proven and mature protocol, responsible for the control plane of the Internet. Its extension to other services such as L3VPN, L2VPN, MVPN and EVPN shows the versatility of the protocol.
Python is a high-level, general-purpose language whose reference implementation, CPython, is written in C. It is so versatile that it is used in practically every area of technology. Its learning curve is gentle, and there are thousands of special-purpose libraries available.
ExaBGP is an application that allows you to establish BGP sessions and manage the sending and receiving of prefixes programmatically. It is not a router like Quagga; it is closer to a standalone control plane. ExaBGP exposes an API through which routes can be added or removed, among other things. Additionally, this application runs as a service, which can also be monitored and restored if, for some reason, it fails.
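To make the API idea concrete: ExaBGP can launch a helper process and read text commands from that process's standard output, one per line. The sketch below builds `announce route` and `withdraw route` lines. The helper names `route_command` and `send` are hypothetical, the addresses come from documentation ranges, and the exact command syntax should be checked against the ExaBGP version in use.

```python
import ipaddress
import sys
from typing import Optional, TextIO


def route_command(action: str, prefix: str, next_hop: str) -> str:
    """Build one line of ExaBGP's text API, validating the addresses first."""
    if action not in ("announce", "withdraw"):
        raise ValueError(f"unsupported action: {action}")
    network = ipaddress.ip_network(prefix)     # raises ValueError if malformed
    gateway = ipaddress.ip_address(next_hop)   # raises ValueError if malformed
    return f"{action} route {network} next-hop {gateway}"


def send(command: str, stream: Optional[TextIO] = None) -> None:
    """Write a command where ExaBGP reads it (stdout by default) and flush.

    Flushing matters: without it the line may sit in a buffer and the
    route change would reach ExaBGP late or not at all.
    """
    stream = sys.stdout if stream is None else stream
    stream.write(command + "\n")
    stream.flush()


if __name__ == "__main__":
    # Hypothetical service prefix and next-hop, for illustration only.
    send(route_command("announce", "203.0.113.10/32", "192.0.2.1"))
```

Validating the prefix and next-hop before writing keeps a typo in the script from turning into a malformed command on a production BGP session.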
Use Case Diagram
The proposal is to design a solution that behaves like a client, and that in the event of a service outage can generate changes in the network and restore itself quickly. The following diagram describes the relationship between the service and the network through ExaBGP.
The ExaBGP process has a BGP session established towards a router. It can be a Route-Reflector or any device on the network that has BGP reachability and can announce routes. Depending on the interconnection context, either iBGP or eBGP can be chosen, since the ExaBGP process can work with both types of session.
In order to control our service, we will use a Python script. As long as the service responds, the script advertises a certain route or set of routes through the API of the ExaBGP process. If for some reason the service stops responding, two types of action could be taken:
- Update the route to the 'BACKUP' service, modifying the next-hop
- Activate a new route to the 'BACKUP' service by modifying the Destination Prefix
Personally, I consider the first option to be simpler, and it will be complemented by the 'ANYCAST' functionality to avoid a change at the DNS level.
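The first option can be sketched as a small loop that keeps the same service prefix and only changes the next-hop. Everything named here is hypothetical: the prefix, both next-hop addresses, and the `check` callable that reports whether the service answered. The sketch also assumes that announcing the same prefix again replaces the earlier announcement, as a BGP update for an existing prefix normally does; this should be verified against the ExaBGP version in use.

```python
import sys
import time
from typing import Callable, Optional, TextIO

SERVICE_PREFIX = "203.0.113.10/32"  # hypothetical anycast service address
PRIMARY_NEXT_HOP = "192.0.2.1"      # hypothetical primary server
BACKUP_NEXT_HOP = "192.0.2.2"       # hypothetical 'BACKUP' server


def select_next_hop(healthy: bool) -> str:
    """Point the route at the primary while it answers, else at the backup."""
    return PRIMARY_NEXT_HOP if healthy else BACKUP_NEXT_HOP


def failover_step(healthy: bool, current: Optional[str], out: TextIO) -> str:
    """Announce the route only when the wanted next-hop differs from the current one."""
    wanted = select_next_hop(healthy)
    if wanted != current:
        # Assumption: re-announcing the prefix replaces the previous next-hop.
        out.write(f"announce route {SERVICE_PREFIX} next-hop {wanted}\n")
        out.flush()
    return wanted


def run(check: Callable[[], bool], interval: float = 5.0,
        out: Optional[TextIO] = None, max_cycles: Optional[int] = None) -> None:
    """Poll the service and keep the advertised next-hop in line with its health."""
    out = sys.stdout if out is None else out
    current: Optional[str] = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        current = failover_step(check(), current, out)
        cycles += 1
        time.sleep(interval)
```

Sending an update only on a state change keeps the BGP session quiet while the service is stable. A production version would probably also require several consecutive failures before switching, so that a single lost probe does not move traffic.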
Today we have tools and functionalities that allow us to design and implement solutions that meet our needs in a reliable and sustainable manner. These can be as simple or complex as our creativity allows. The case study mentioned as an example is fully feasible, and there are many more.
Relying on components, protocols and functionalities already known and mature in the market allows us to guarantee the stability and growth of the solution.
In case you want to explore more about the possibilities and case studies: