A complicated equation of DevOps, analytics and security
15.02.2022
A guest post by Ram Chakravarti *
With the DevOps transition, companies need to focus on strengthening service management. This has led to the development of site reliability engineering to support IT operations.
Better teamwork and AI-supported analyses help to find the right mix of security and agility.
(© kentoh – stock.adobe.com)
Site Reliability Engineering (SRE) is a set of principles and practices that cover aspects of software engineering and are applied to problems of IT infrastructure and IT operations (I&O). The main goal is the creation of highly scalable and highly reliable software systems.
Instead of focusing on one area, the SRE team works on both operational activities (e.g. incident management) and development tasks (e.g. feature development, automatic scaling and automation of manual tasks).
However, to support a true DevOps environment, the focus should be on the implementation and practice of more comprehensive Service and Site Reliability Engineering (SSRE). SSRE strikes a balance between agility and quality and relies on both IT service and operations management (ITSM/ITOM).
This equilibrium requires orchestrating multiple datasets across ITSM and ITOM and applying machine learning (ML) algorithms to deliver differentiated reliability analyses. These insights then allow the SSRE team to release new features with a high level of confidence while maintaining the balance between agility and quality.
How do DevOps and ITSM play well together?
While teams in the more agile DevOps area tend to be more autonomous and focus on their own products, companies should work with them to implement more comprehensive capacity optimization and monitoring, and integrate both into the service management system.
Implementing swarming is a great way to align service teams with DevOps. Swarming replaces the typical multi-level support model, in which a problem is passed from tier to tier until it is solved, with a model in which one person brings together the right colleagues to find the solution jointly.
For a successful transition to a swarming model, automate as many mundane tasks as possible (for example, password resets) so that your support team can focus on the big challenges. You should also have a robust real-time collaboration system (e.g. Microsoft Teams or Slack) to connect the people involved.
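The split between automated routine work and swarmed problems can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular ITSM product: the `Ticket` type, the `route` function and the category names in `AUTOMATABLE` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical categories that a bot could resolve without human help.
AUTOMATABLE = {"password_reset", "account_unlock"}


@dataclass
class Ticket:
    ticket_id: str
    category: str
    summary: str


def route(ticket: Ticket) -> str:
    """Return "automation" for routine requests, otherwise "swarm"."""
    if ticket.category in AUTOMATABLE:
        return "automation"
    # Everything else goes to a swarm: one owner pulls in the right people.
    return "swarm"


tickets = [
    Ticket("T-1", "password_reset", "Forgot my password"),
    Ticket("T-2", "checkout_latency", "Checkout is slow after last release"),
]
routing = {t.ticket_id: route(t) for t in tickets}
print(routing)
```

In practice the list of automatable categories would grow over time as the support team identifies further recurring requests.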
With swarming and value-adding teamwork models, the service organization can work better with the DevOps team to exchange information, create knowledge and prevent developers from being overwhelmed with support tickets. However, it is crucial that these interventions and collaborations are targeted and create added value so that the flow of DevOps processes is not interrupted.
One thing to watch for when implementing value-adding teamwork is performance metrics that are incompatible with each other. If developers are incentivized to deliver code as often as possible, but operations or IT security personnel are measured only by uptime or production incidents, the two groups are unlikely to agree on priorities.
Give all employees shared responsibility for the overall goals, measured in a standardized way regardless of job title or functional role. By fostering value-adding teamwork between the DevOps and service organizations, you also enable more comprehensive risk analysis that can surface issues not visible within the toolchain, using a process that can be deployed across the enterprise.
What about security?
Every person in an IT organization should also be responsible for security, regardless of their role. This philosophy has guided the natural evolution from DevOps to DevSecOps. One way to implement DevSecOps is to use technologies such as artificial intelligence (AI).
With the help of AI, you can enrich data sets with context, provide insights, drive action and optimize ITOps even for the most complex companies. By applying AI to user and entity behavior analytics, you can also detect anomalies and thus gain even better insights.
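To make the idea of behavioral anomaly detection concrete, here is a deliberately simple sketch. It uses a plain z-score against a historical baseline as a stand-in for the ML models the article has in mind; the function name `find_anomalies`, the threshold of 3.0 and the login counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, recent, threshold=3.0):
    """Return values in `recent` that deviate strongly from `baseline`.

    A value is flagged when it lies more than `threshold` sample
    standard deviations away from the baseline mean.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [x for x in recent if x != mu]
    return [x for x in recent if abs(x - mu) / sigma > threshold]


# Hypothetical daily login counts for one user account.
baseline_logins = [4, 5, 6, 5, 4, 6, 5, 5]
recent_logins = [5, 6, 42]

anomalies = find_anomalies(baseline_logins, recent_logins)
print(anomalies)
```

A production system would look at many signals per user or entity and would typically learn the baseline continuously, but the principle of comparing current behavior against an expected range is the same.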
It is interesting that DevSecOps can look different for every industry. For example, it is already being expanded to include areas of security, criticality and regulations for industries such as public utilities. When it comes to electricity, water and critical infrastructures, security must of course be included in all facets of the organization – and this also includes the software.
Integrating better security checks into the development process can lead to more secure code by the time it is released to production. However, implementation can be doubly challenging, as these companies are often already hesitant to change, for example when moving from on-premises to the cloud.
Given the strict regulations and processes, any change can be difficult. Much of this is culturally driven, so the introduction of shared responsibility for the overall goals can help balance the cultural issues between development, operations and security teams.
Conclusion
For DevOps to work properly, organizations need to achieve a balance between organizational, operational, and technical aspects, including security, across their entire infrastructure. Companies need to develop a perspective on how much is enough when it comes to security (i.e. the level of required security controls based on risk appetite and the trade-off between speed and quality).
In parallel, DevOps teams should adopt strategies such as automating security testing as part of continuous integration and automating security updates as part of broader vulnerability and compliance management. Introducing these practices in small increments allows companies to roll out new software at the desired pace and quality, thereby achieving DevOps nirvana.
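One way to picture automated security testing in continuous integration, combined with a defined risk appetite, is a small gate that blocks a build when scanner findings exceed an agreed severity. The sketch below is an assumption-laden illustration: the findings format, the `gate` function and the severity scale are hypothetical and not tied to any specific scanner or CI system.

```python
# Hypothetical severity scale; higher numbers mean more severe findings.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(findings, max_allowed="medium"):
    """Decide whether a build may proceed.

    `max_allowed` encodes the risk appetite: findings above this
    severity block the build. Returns (passed, blocking_findings).
    """
    limit = SEVERITY_RANK[max_allowed]
    blocking = [f for f in findings if SEVERITY_RANK[f["severity"]] > limit]
    return len(blocking) == 0, blocking


# Hypothetical output of a security scan step in the pipeline.
findings = [
    {"id": "F-1", "severity": "low"},
    {"id": "F-2", "severity": "high"},
]

passed, blocking = gate(findings)
print("build passed" if passed else f"build blocked by {len(blocking)} finding(s)")
```

In a real pipeline the script would end with a non-zero exit code when the gate fails, so that the CI system stops the release. Raising or lowering `max_allowed` is where the trade-off between speed and quality becomes an explicit, reviewable setting.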
* Ram Chakravarti is Chief Technology Officer at BMC Software.