Self-Adaptation in Industry: A Survey

Authors:
Danny Weyns, Katholieke Universiteit Leuven and Linnaeus University (ORCID 0000-0002-1162-0817)
Ilias Gerostathopoulos, Vrije Universiteit Amsterdam (ORCID 0000-0001-9333-7101)
Nadeem Abbas, Linnaeus University (ORCID 0000-0002-7555-7300)
Jesper Andersson, Linnaeus University (ORCID 0000-0001-5471-551X)
Stefan Biffl, TU Wien (ORCID 0000-0002-3413-7780)
Premek Brada, University of West Bohemia (ORCID 0000-0001-5617-6396)
Tomas Bures, Charles University Prague (ORCID 0000-0003-3622-9918)
Amleto Di Salle, European University of Rome (ORCID 0000-0002-0163-9784)
Matthias Galster, University of Canterbury (ORCID 0000-0003-3491-1833)
Patricia Lago, Vrije Universiteit Amsterdam (ORCID 0000-0002-2234-0845)
Grace Lewis, Carnegie Mellon Software Engineering Institute (ORCID 0000-0001-9128-9863)
Marin Litoiu, York University (ORCID 0000-0003-0383-920X)
Angelika Musil, Katholieke Universiteit Leuven and TU Wien (ORCID 0000-0002-1025-1626)
Juergen Musil, TU Wien (ORCID 0000-0002-2163-3603)
Panos Patros, Raygun Application Performance (ORCID 0000-0002-1366-9411)
Patrizio Pelliccione, Gran Sasso Science Institute (ORCID 0000-0002-5438-2281)

ACM Transactions on Autonomous and Adaptive Systems, Volume 18, Issue 2, Article No. 5, pp. 1–44. https://doi.org/10.1145/3589227

Published: 28 May 2023

Abstract

Computing systems form the backbone of many areas in our society, from manufacturing to traffic control, healthcare, and financial systems. When software plays a vital role in the design, construction, and operation, these systems are referred to as software-intensive systems. Self-adaptation equips a software-intensive system with a feedback loop that either automates tasks that otherwise need to be performed by human operators or deals with uncertain conditions. Such feedback loops have found their way to a variety of practical applications; typical examples are an elastic cloud to adapt computing resources and automated server management to respond quickly to business needs. To gain insight into the motivations for applying self-adaptation in practice, the problems solved using self-adaptation and how these problems are solved, and the difficulties and risks that industry faces in adopting self-adaptation, we performed a large-scale survey. We received 184 valid responses from practitioners spread over 21 countries. Based on the analysis of the survey data, we provide an empirically grounded overview of the state of the practice in the application of self-adaptation. From that, we derive insights for researchers to check their current research against industrial needs, and for practitioners to compare their current practice in applying self-adaptation. These insights also provide opportunities for applying self-adaptation in practice and pave the way for future industry-research collaborations.

1 INTRODUCTION

Computing systems form the backbone of our factories, traffic control systems, healthcare, telecommunication, financial systems, and so forth. When software plays a vital role in their design, construction, and operation, these systems are often referred to as software-intensive systems [21]. The trustworthiness and sustainability of these systems is vital for our society [5, 32]. Yet, building and maintaining trustworthy and sustainable systems is challenging due to complexity that arises from the growing demands on these systems, their continued integration, the uncertain operating conditions they face, the fast speed of technological progress, and so forth. These challenges have been a continuous driver for new and innovative approaches to design, develop, and operate software-intensive systems. One common approach today is the so-called DevOps in which development and operation are blended, allowing system components to be easily evolved and redeployed without impacting their operation [7].

A classic approach to address the increasing complexity of software-intensive systems is transferring control from humans [27] to software components by equipping systems with feedback loops that automate tasks that otherwise need to be performed by human operators. These feedback loops monitor the system and its environment, reason about the system behaviour and its goals, and adapt the system to ensure its goals under changing conditions, or gracefully degrade if necessary. Such goals can be quite diverse, ranging from ensuring a required level of performance under uncertain workload conditions, to dealing with errors caused by external services that are difficult to predict, to defending the system against malicious attacks and the problems they may cause. A typical example is a feedback loop deployed in a cloud environment that expands or decreases computing resources to meet changing demands while minimising the cost of operation. Another example is a container framework that performs auto-scaling in a microservice deployment.
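As a minimal sketch of such a feedback loop, the following Python fragment monitors a utilisation metric, analyses it against a target band, and adapts the number of provisioned instances. The names (`Cluster`, `control_step`, `run`), the thresholds, and the one-instance-per-step policy are illustrative assumptions and are not taken from any system described in this article.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical managed system: a pool of identical instances."""
    instances: int
    min_instances: int = 1
    max_instances: int = 10


def control_step(cluster: Cluster, load: float, capacity_per_instance: float,
                 low: float = 0.3, high: float = 0.8) -> float:
    """Run one monitor-analyse-adapt cycle and return the observed utilisation.

    `low` and `high` are assumed utilisation thresholds (fractions of capacity).
    """
    # Monitor: derive utilisation from the current load and provisioned capacity.
    utilisation = load / (cluster.instances * capacity_per_instance)

    # Analyse and adapt: add an instance when overloaded, remove one when idle,
    # always staying within the configured bounds.
    if utilisation > high and cluster.instances < cluster.max_instances:
        cluster.instances += 1
    elif utilisation < low and cluster.instances > cluster.min_instances:
        cluster.instances -= 1
    return utilisation


def run(loads, capacity_per_instance=100.0):
    """Feed a sequence of load samples through the loop; return instance counts."""
    cluster = Cluster(instances=1)
    history = []
    for load in loads:
        control_step(cluster, load, capacity_per_instance)
        history.append(cluster.instances)
    return history
```

Calling `run` with a rising and then falling load shows the pool growing and shrinking one instance at a time. A production auto-scaler would typically add cooldown periods and cost considerations, which this sketch omits.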

The principles of applying feedback control to software-intensive systems have been the subject of active study in academia. Back in 1998, Oreizy et al. [33] presented a seminal paper at the International Conference on Software Engineering (ICSE), where the authors introduced the notion of self-adaptation that comprises two simultaneous processes: system adaptation that is concerned with detecting and handling changing circumstances, and system evolution that is concerned with the consistent application of change over time. A few years later, Garlan et al. [15] stated the crucial role of architectural models as first-class citizens that enable a system to reason about system-wide change and adapt itself accordingly to achieve or maintain its goals. Blair et al. [4] consolidated and elaborated on these principles in what is now generally known as “models at runtime.” In 2007, Kramer and Magee [25] stated the crucial role of software architecture in the realisation of self-adaptive systems, distinguishing adaptation management from goal management. Over the past decade, the research community has developed a vast body of knowledge and know-how on principles, e.g., see [2, 4, 13, 37], models and languages [23, 31, 43, 52, 54], processes and methods [1, 6, 8, 48], patterns [26, 35, 53], and frameworks [10, 15, 36] to engineer self-adaptive systems. Researchers have documented a substantial number of literature reviews and surveys on various topics in self-adaptive systems, such as the benefits of self-adaptation [51], requirements for self-adaptive systems [56], approaches to realise self-adaptation [26, 28, 30, 39], the use of formal methods in self-adaptive systems [49], self-protection [57], the notion of uncertainty [20, 29], and the use of machine learning in the realisation of self-adaptation [17], among others. There are several basic research works in the field of self-adaptation, e.g., see [7, 9, 22, 38, 44].

In parallel, the principles of feedback control have been studied and applied in industry. For example, about two decades ago, IBM launched its legendary initiative on autonomic computing [24]. Inspired by the autonomic nervous system of the human body, the central idea of autonomic computing was to enable computing systems to manage themselves based on high-level goals. Four classic goals are self-optimisation, self-healing, self-protection, and self-configuration. Autonomic computing delegates the complexity of system operation to the machine aiming to reduce the time required by operators to resolve system difficulties and other maintenance tasks such as software updates. Over the years, industrial solutions based on feedback loops have found their way to practical applications—for instance, in the domain of elastic cloud to adapt computing resources and automated management of server parks to deal with changing business needs (e.g., [3, 40]).

Although the output of academic research is documented in research articles, journal volumes, and books, the current practice of self-adaptation in industry has never been systematically described.

1.1 Objective and Research Questions

Our general objective is to better understand the state of practice of self-adaptation in industry. To that end, we perform a large-scale survey with active practitioners. Concretely, this survey aims at shining a light on what motivates practitioners to apply self-adaptation, what kind of problems they solve using self-adaptation, how practitioners design and develop self-adaptive systems, whether they follow any established practices, what difficulties and risks they face in adopting self-adaptation, and what future opportunities industry sees for the application of self-adaptation.

To the best of our knowledge, no systematic study has been done that investigates these issues. Investigating industrial practice on self-adaptation and answering the questions targeted by this study will help narrow the gap between industry and academia. The study aims at helping researchers in academia to get a better picture of how self-adaptation is applied in practice, the industrial needs in realising self-adaptation, and what problems practitioners face. We conjecture that having a better picture of industry practice will help the research community to position their efforts with respect to industrial needs and make well-informed decisions to set future research objectives, both fundamental and applied. In addition, drawing a picture of the state of the practice can benefit industry by sharing the motivations and potential benefits of self-adaptation, directing practitioners towards relevant sources of information such as best practices, and identifying opportunities for collaboration with researchers to address the problems they face.

We aim to answer the following concrete research questions:

RQ1:	What drives practitioners to apply self-adaptation in software-intensive systems?
RQ2:	How do practitioners characterise self-adaptation?
RQ3:	How do practitioners apply self-adaptation in industrial software-intensive systems?
RQ4:	What are the experiences of practitioners with applying self-adaptation, and do they see opportunities for how and where to apply self-adaptation?

With RQ1, we want to investigate the motivations of practitioners for applying self-adaptation, the kinds of industrial systems for which self-adaptation is applied, and the types of problems they solve using self-adaptation. In academic research, self-adaptation has been proposed for two main complementary problems [44]: (1) to automate the management of complex software-intensive systems based on high-level goals provided by operators, and (2) to deal with operating conditions that are hard to predict before deployment and need to be resolved during operation (i.e., mitigating uncertainties). Key management tasks for self-adaptation are self-healing, self-optimisation, self-protection, and self-configuration. We want to understand whether industry uses the principles of self-adaptation to deal with the same or different problems, and whether and how they relate to the classic system and software management tasks. Answering RQ1 will shine a light on application areas, motivations, and concrete problems for which self-adaptation is applied by practitioners or could be applied by practitioners who currently do not use self-adaptation. This may provide academics with insights in relevant areas to drive and validate research results on self-adaptation. The results may also indicate applications and problems that are not yet explored in industry and may benefit both academia and industry.

With RQ2, we aim to investigate the perception of practitioners on the concept of self-adaptation. We are particularly interested in how practitioners characterise self-adaptation as a property that enables a system to adapt itself at runtime. To that end, we will elicit concrete examples of what they understand by self-adaptation. This will give us a better understanding of whether and how practitioners understand the concept of self-adaptation, what terminology they use, whether there are any differences in the viewpoints on what constitutes self-adaptation, and whether they consider self-adaptation altogether useful. This may also shine a light on whether there are any (emerging) industrial standard practices (e.g., a technology stack or tools). Answering RQ2 will help researchers get a better picture of how practitioners understand the concept of self-adaptation. In addition, the insights may reveal potential opportunities for practitioners to benefit from expertise of other practitioners as well as knowledge developed by researchers.

With RQ3, we aim at examining how self-adaptation has been realised and used in industry. We are particularly interested in mechanisms, tools, benchmarks, and processes employed in the industry to engineer self-adaptive solutions. We will pay attention to the degree of automation and the role of humans in runtime adaptation, as this is commonly considered important for trust in software-intensive systems (e.g., see [50]). Furthermore, we are interested in comparing industrial practices with solutions developed by academics, such as modelling techniques, frameworks, and verification techniques. We also want to understand how practitioners obtain trust in the self-adaptive solutions they employ. Answering RQ3 will provide insights into best practices on how practitioners realise self-adaptation. It will highlight the criteria that practitioners use to apply and realise self-adaptation solutions and may shine a light on to what extent solutions from the research community have been adopted in industry. These insights will open opportunities for both academia and industry to steer future research and improve practical applications.

Finally, with RQ4, we want to understand the difficulties and risks, if any, that practitioners experience in the design, implementation, and other engineering activities of self-adaptive systems. We also will probe whether practitioners face problems for which they would appreciate support from researchers. Finally, we elicit opportunities that practitioners see for applying self-adaptation that are not exploited yet. Answering RQ4 may help fill the gap between academia and industry. Furthermore, identifying problems and risks may trigger new collaborative studies to investigate and address these challenges. Such studies are likely to bridge the gap and result in more targeted research and improved industrial applications of self-adaptive systems.

1.2 Contributions

By drawing a landscape of the use of self-adaptation in industry, the survey results benefit both researchers and practitioners. Concretely, the contributions of this study are as follows:

•	An empirically grounded overview of the state of the practice in the application of self-adaptation
•	Insights for researchers to assess their current research in relation to industrial needs
•	Insights for practitioners to assess the level of their current practice in applying self-adaptation
•	Additional prospects for applying self-adaptation in practice and opportunities for industry-research collaborations.

Preliminary results of this study were reported in previous work [46]. That work only considered a small subset of questions (focusing on the motivations of practitioners to apply self-adaptation, concrete use cases in practice, and difficulties practitioners face when applying self-adaptation) and reported initial results based on one batch of data (113 participants). This work extends that study with the view of practitioners on self-adaptation, the drivers for using self-adaptation, methods used, experiences with applying self-adaptation in industry, and opportunities for the future. In this article, we consider the full dataset of 184 participants from more parts of the world.

1.3 Outline

In Section 2, we present the study design with the survey questions and analysis methods used. Section 3 presents the results and key insights for each research question. In Section 4, we derive insights from the study results for researchers and practitioners. Section 5 discusses threats to validity. Finally, we wrap up and conclude in Section 6.

2 RESEARCH METHOD

In this study, we use a survey as the research method [18]. In the following, we discuss the population and sample, the questionnaire, and the data analysis methods we used.

2.1 Population and Sampling

Our target population included practitioners actively involved in the engineering of industrial software-intensive systems in any domain—architects, designers, developers, testers, maintainers, operators, and other people who have technical expertise and are actively involved in the development and maintenance of these software systems.

Concretely, we contacted 355 practitioners from a wide variety of companies¹ via the networks of the researchers involved in this study (i.e., the authors of this article) to complete the survey. We used two criteria to invite people: (1) participants should be active in different domains that are representative of software-intensive systems, and (2) participants have the required expertise to answer the questions. The invited practitioners were spread over 21 countries.² The invitations were sent by personalised emails in two batches during the period from November 30, 2020 until July 31, 2022. We sent reminders according to a predefined schedule of 1, 2, and 6 weeks after the invitation.

2.2 Survey Instrument

The survey used a questionnaire to collect data based on a set of predefined questions [18]. Because practitioners are not necessarily familiar with the term self-adaptation, the survey started with a gentle introduction of the core idea of what constitutes a self-adaptive system using basic terminology commonly used in industry, and illustrated this with a few characteristic examples to make it concrete. We used both closed and open questions. Closed questions have a predefined set of answers, such as yes/no or multiple choice. We also allowed participants to add extra options for answering several closed questions using a text field. Open questions provide a space that participants can use to provide an answer. Closed questions allow acquiring a clear view on a particular topic using basic statistics, whereas open questions allow acquiring in-depth insights using qualitative analysis. We provide a replication package with all study materials, including the study protocol, the questionnaire, the raw data, and the analysis results.³
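To illustrate the kind of basic statistics that closed questions permit, the sketch below tallies answers on the seven-point frequency scale used by several questions in the questionnaire (Never to Always) and reports counts and percentages. The response list is made up for illustration and is not survey data; the function name `tally` is hypothetical.

```python
from collections import Counter

# Seven-point frequency scale used by several closed questions in the questionnaire.
SCALE = ["Never", "Very Rarely", "Rarely", "Sometimes",
         "Frequently", "Very Frequently", "Always"]


def tally(responses):
    """Return {option: (count, percentage)} for every option on the scale.

    Options nobody selected are reported with a count of zero so that the
    resulting distribution always covers the full scale.
    """
    unknown = set(responses) - set(SCALE)
    if unknown:
        raise ValueError(f"answers outside the scale: {sorted(unknown)}")
    counts = Counter(responses)
    total = len(responses)
    return {
        option: (counts[option],
                 round(100 * counts[option] / total, 1) if total else 0.0)
        for option in SCALE
    }


# Made-up answers for illustration only (NOT data from the survey).
example = ["Sometimes", "Frequently", "Sometimes", "Rarely", "Always"]
for option, (count, pct) in tally(example).items():
    print(f"{option:<16}{count:>3}{pct:>7.1f}%")
```

Answers to open questions do not fit such a tally directly; they first need to be coded into categories through qualitative analysis.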

For this study, we used a self-administered anonymous online questionnaire (Survey & Report hosted by Linnaeus University, Sweden). The main motivation to use an online questionnaire is to involve a large set of participants with relatively low cost (both time-wise and financially). We created an initial list of survey questions that were directly derived from the research questions of this study. The initial list of questions was composed by two members of the research team and then crosschecked by the other team members.

We validated the questionnaire in a pilot with eight randomly selected participants from the target population. For this pilot, we added meta-questions to the questionnaire about clarity of terminology and questions, relevance of the questions, scope of the questions, and the time required to complete the survey. For both clarity of terminology and clarity of the questions, we obtained an average score of 4.38 on a scale from 1 (Not clear at all) to 5 (Very clear). None of the participants indicated that questions should be removed or modified. Six participants indicated that no important aspects were missing. One participant hinted that we may also probe whether the use of self-adaptation requires a specialised team in the company or alternatively infrastructure to share knowledge. Another participant suggested adding a question about scalability of solutions for self-adaptation. One participant stated that the example system we used to introduce self-adaptation may create some bias, and further that answers to questions may differ depending on roles in the engineering teams. The average reported time to complete the survey was 24 minutes. Based on the feedback, we adjusted the introductory part of the questionnaire. We did not revise the questions, as they were perceived as clear and well scoped. The finalised questionnaire was then distributed to the participants as explained earlier.

The first part of the questionnaire (Table 1) solicited whether the participant applies self-adaptation and collected general demographic information. This allowed us to check whether the participant had experience with self-adaptation (Q0.1) and to confirm good coverage across participants of the kinds of software-intensive systems (Q0.2), the size of the companies (Q0.3), the participants' roles (Q0.4), and their years of experience (Q0.5).

Table 1.

ID	Question	Response Options
Q0.1	Have you worked with concrete self-adaptive systems?	Yes; No
Q0.2	What kind of software systems does your organisation build?	Free text
Q0.3	Approximately, how many people are working on engineering software in your organisation?	1–10; 11–20; 21–50; 51–100; more than 100
Q0.4	What is your role in your organisation?	Project Manager; Designer; Programmer; Tester; Operator; Maintainer; Other (free text)
Q0.5	How many years of software engineering experience do you have in total?	1–3 Years; 4–8 Years; 9–20 Years; If other, please specify (free text)

ID	Question	Response Options
Q1.1	For which problems do you or your organisation apply self-adaptation capabilities, i.e., a managing system that monitors and adapts a managed system to achieve some objectives?	To automate tasks; To deal with changes in the environment; To deal with changes in business goals; To optimise system performance; To detect and resolve errors; To detect and protect a system against threats; To configure/reconfigure a system; Other (free text)
Q1.2	What are the main business motivations for you or your organisation to apply self-adaptation?	To improve user satisfaction; To reduce costs; To mitigate risks; To open up new application opportunities; Other (free text)
Q1.3	What could be the benefit of self-adaptation in one of the systems you worked with? Please explain briefly.	Free text

ID	Question	Response Options
Q3.1	What mechanisms or tools does the self-adaptive system you worked with use to monitor a managed system during operation? By monitor, we mean tracking properties of the system or its environment.	Free text
Q3.2	What mechanisms or tools does the self-adaptive system you worked with use to analyse conditions of a managed system during operation? By analyse, we mean examining conditions of the system or its environment and determining whether any adaptation is required or not.	Free text
Q3.3	What mechanisms or tools does the self-adaptive system you worked with use to change a managed system or parts of it during operation? By change, we mean adjusting parameters of the system, or adding, removing, or changing any parts of it.	Free text
Q3.4	What is the degree of automation of the majority of the self-adaptive solutions you work with in your organisation?	Semi-automated; Fully automated; Mixed (Semi- and Fully Automated); Other (free text)
Q3.5	Do you reuse solutions to realise self-adaptation in systems you work with?	Never; Very Rarely; Rarely; Sometimes; Frequently; Very Frequently; Always
Q3.6	Please provide a concrete example of reuse you used to realise self-adaptation.	Free text
Q3.7	Why do you not often reuse solutions when realising self-adaptive systems? What hinders the reuse? Please provide a short answer.	Free text
Q3.8	How do you ensure that you can trust the self-adaptive solutions you build? Examples could be extensive testing or human supervision, but you may use other means. Please describe briefly.	Free text

ID	Question	Response Options
Q4.1	Did you encounter particular difficulties or challenges when engineering or maintaining self-adaptive systems you worked with?	Never; Very Rarely; Rarely; Sometimes; Frequently; Very Frequently; Always
Q4.2	Please give one or two examples of the difficulties or challenges that you encountered when engineering or maintaining self-adaptive systems.	Free text
Q4.3	Did you face any risks when engineering self-adaptive systems you worked with?	Never; Very Rarely; Rarely; Sometimes; Frequently; Very Frequently; Always
Q4.4	Please briefly describe one or two risks that you faced when engineering self-adaptive systems.	Free text
Q4.5	How did you mitigate the risks that you faced? Please explain briefly.	Free text
Q4.6	Have you faced or seen any problems of self-adaptation for which you would appreciate support from researchers?	Never; Very Rarely; Rarely; Sometimes; Frequently; Very Frequently; Always
Q4.7	For which problems of self-adaptation would you appreciate support from researchers? Please briefly explain one or two such problems.	Free text
Q4.8	In your organisation or in industry in general, do you see application opportunities for self-adaptation that are currently not exploited?	Yes; No
Q4.9	Please describe or give examples of the application opportunities for self-adaptation that are currently not exploited.	Free text

Categories/Codes	#	Example Quotes
Improved utility	61
Robustness	21	“[F]ault tolerance, one node dies, a new one is spawned without manual intervention.” “[B]etter error handling and prompt disaster recovery.”
Performance	16	“Improve performance and quality-of-service.” “[I]ncrease in the speed of adaptation.”
Availability	8	“The main benefit for us is the 99.9999% availability, which is crucial for some customers of these cloud-specific solutions.”
Other	16	“[F]or IoT: optimized operations, improved energy usage.” “[A]n important part to guarantee the safety . . . of the overall system.”
Savings	38
Costs	25	“The primary benefit is cost reduction.” “[T]he cheaper bills for running this in an efficient manner in e.g., a cloud service.”
Resources	13	“[S]cales down resources during hours when traffic is low, and scales up during peak hours, without any manual interference.”
Improved human interaction	37
User experience	19	“Keep Telco network in optimal condition so that QoS and user experience is maximized, and churn minimized.” “[B]etter user satisfaction because of prompt website responses.”
Engineers support	18	“[R]emoves most of the optimization burden from programmers, so they can be more productive.” “Reduce workload on human operators; make (the results of) certain actions . . . repeatable and predictable.”
Handle dynamics	22
Load dynamics	12	“Change AGV behavior depending of the workload with the goal to save energy (battery life).”
Context dynamics	10	“Each machine is unique and its optimal operational parameters change over time due to ware, location, task and seasonal factor.”
Other improvements	16
Various	16	“In case of spikes in incoming events the system is able to adapt . . . avoiding bottlenecks.” “Easier and faster market integration.” “It’s fundamental in huge infrastructure systems otherwise we can’t make it happen.”
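Each category total in the table above is the sum of the counts of its codes. A short Python sketch, using only the counts reported in the table, makes this structure explicit; the dictionary layout and variable names are illustrative choices.

```python
# Counts copied from the table above: category -> (reported total, {code: count}).
benefits = {
    "Improved utility": (61, {"Robustness": 21, "Performance": 16,
                              "Availability": 8, "Other": 16}),
    "Savings": (38, {"Costs": 25, "Resources": 13}),
    "Improved human interaction": (37, {"User experience": 19,
                                        "Engineers support": 18}),
    "Handle dynamics": (22, {"Load dynamics": 12, "Context dynamics": 10}),
    "Other improvements": (16, {"Various": 16}),
}

# Each reported category total should equal the sum of its code counts.
for category, (reported, codes) in benefits.items():
    assert sum(codes.values()) == reported, category

# Total number of code occurrences across all categories.
total_coded = sum(reported for reported, _ in benefits.values())
print(total_coded)
```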

(1)	Self-adaptation is widely applied in industry across a wide variety of domains.
(2)	Practitioners primarily apply self-adaptation to optimise performance, automate tasks, and deal with changes in the deployment environment.
(3)	The dominating business motives to apply self-adaptation in industry are primarily improving user satisfaction and reducing costs, and secondarily mitigating risks.
(4)	The main benefits of applying self-adaptation are improved utility (in robustness and performance), savings (costs and resources), improved human interaction (user experience and engineers support), and handling dynamics (in the context and system load).

Categories/Codes	#	Example Quotes
Subject of adaptation	99
System	28	“Our company develops safety critical systems for railway. Systems architecture is often with redundancy - e.g., 2 out of 3 system, where is automatic reconfiguration implemented. Purpose is high safety and availability.” “A flexible manufacturing system . . . the system and the individual station within the system can ‘sense’ what kind of work piece it has in front of itself and what it or another machine should do with it in the next step.”
Module	22	“Environment compensation system for capacitive touch interface. Such system is influenced by envirenmental change (for example temperature).” “We manage the memory usage of the process. Once memory usage over a limit (i.e., 90%), we start throttling the workload.”
Platform layer	13	“Monitoring the memory/CPU/disk consumption of our servers and suggesting measures to fix it through human intervention.”
Application layer	11	“HotSpot JVM . . . reads a program’s Java bytecode, and adaptively tunes the performance of the program at runtime, adapting to runtime profiles.”
Cluster	10	“Spark executor auto-scaling system. We built this system to automatically add or remove nodes to our Spark cluster when we have a high demand of resources from our Spark jobs.”
Network	6	“Our radios apply ‘channel assessment’ . . . that optimizes the radio channels used during BLE communication. Our radios also apply very aggressive power management. peripherals and cores are switched off whenever possible to minimize the system’s power usage.”
Mixed	6	“Enterprise-cloud environment consisting of dozens of different (micro) services providing functionality to 3rd parties as well as internal employees - data management, authentication and authorization, business process automation, as well as internal development process support (build servers, logging, etc.).”
CI/CD pipeline	3	“Sacling up and down our infrastructure (CI/CD) chain to build and integrate the truck software.”

Categories/Codes	#	Example Quotes
Type of adaptation	99
Auto-scaling	33	“Automated horizontal scaling of AWS EC2 instances for medical data processing systems.” “[A]utoscale a cluster based on the resource usage of the nodes of the cluster.”
Auto-tuning	28	“A mink feeding robot, that can adjust the food amount according to a set of feeding rules and the food left over from last feeding.”
Monitor/Analysis	22	“We configured AWS alarms to monitor performance of our systems in case we get more than few number of HTTP 400/500 errors.” “Monitoring the memory/CPU/disk consumption of our servers and suggesting measures to fix it through human intervention.”
Automated reconfiguration	11	“Continuous integration system - Other & starts building & testing a new version as soon as it detects code changes Build alignment - Creates a new release whenever a subsystem builds successfully.”
Other	5	“Our mobile robots scan their environments using laser scanners and other sensors and plan their behavior accordingly.” “[S]elf healing automotive systems.”
Trigger for adaptation	78
System properties	27	“Auto-scaling functionality of an Azure Service Fabric cluster running a transformation load for processing AGV statistical and playback data.” “Realtime focused data streaming protocol . . . must take care to avoid exhausting the network resources and thus incurring packet loss and latency spikes, which are very noticeable in games.”
Environment properties	18	“An IoT system running in Kubernetes and used to monitor water leaking for household insurance.” “A flexible manufacturing system . . . can ‘sense’ what kind of work piece it has in front of itself and what it or another machine should do with it in the next step.”
System load	14	“Kubernetes, for handling load intensive periods for scaling up, and self recover from crashes.” “Autoscaling of SaaS applications in function of load on AWS and Azure clouds.”
Events	12	“We use kubernetes which provides notification callbacks on any event such as host/pod not available, based on these events we auto mark the node was inactive and do not use those nodes for further write or read operations.” “Auto Scaling an EMR cluster in AWS based on incoming event data.”
User actions	7	“[Adapt] cache warm up strategy based on user interactions.” “[S]cammers . . . To decide the users that are most likely to be a scammer, the system tracks the past performance of models responsible for flagging potential scammers.”

(1)	Self-adaptation is applied at different levels of industrial software-intensive systems: from a complete system to parts of a system and support systems.
(2)	The dominating types of adaptations applied in industry are auto-scaling, auto-tuning, and monitoring/analysis.
(3)	Adaptations in industrial software-intensive systems are triggered by changes in properties of systems and their environments, dynamics in system load, relevant events, and user actions.
(4)	Technologies such as elastic cloud and auto-scalers are key enablers for the realisation of self-adaptation in practice.

Categories/Codes	#	Example Quotes
Monitoring metric	75
Resource usage	23	“Active sessions counting, resource utilisation (e.g., RAM) monitoring given by VM.” “Typically CPU and Memory usage.” “Helsim: uses CPU counters to measure time or power consumption to process particles.”
Load	18	“Number of incoming HTTP requests.” “The system polls the queue of the Spark job scheduler in our cluster every 5 seconds via REST API, using a NiFi flow.” “Number of queries.” “[N]umber of requests.”
Reliability metrics	13	“AWS lambda error metric is monitored to see if the sum of 400/500 errors for every part 5 mins is less than some specified amount.”
Performance metrics	12	“We track the response times for the users’ requests.” “[M]onitored systems implement specific features to provide data about their performance.”
Application state	9	“Tracking properties are - correct integrity - functionality of memorries (RAM, ROM), correct values and integrity of data among redundant parts.”
Monitoring mechanism	20
Environment sensors	9	“Based on external information (external sensors like Lidar, Camera, GPS, . . .) making sure no accident were to happen.” “Exteroceptive are aggregated to create a snapshot of the world’s state. These are LIDAR and Image sensors. We use Proprioceptive sensors to determine the robot’s state. These are encoders only.”
Logging mechanisms	6	“Logging software triggered whenever an incoming request is made.” “The system logs all interactions, both errors and successful operations.”
System sensors	4	“Based on internal information (internal sensors like Wheel speed, steering angle, yaw and roll sensors, . . .) optimize the performance to support the driver to drive optimal.”
Humans	1	“Human review decisions are used to monitor the precision of models.”
Monitoring tool	34
Kubernetes monitoring	9	“Kubernetes clusters are made out of master and worker machine nodes. On the worker nodes runs a process called kubelet that monitors the state of the worker nodes in the Kubernetes cluster.” “Probes implemented in the application, metrics provided by K8s metrics server (goes down to cgroups via kubelet).”
Prometheus	9	“[E]very service exposes a defined set of metrics. We collect metrics regarding every layer of the distributed system. We mainly use Prometheus and Splunk to collect these metrics.” “Prometheus and grafana for monitoring health of services.”
AWS monitoring	8	“We use AWS CloudWatch service to monitor and act on any event with ServerLess AWS lambda functions.” “AWS Lambda based monitor which monitor aprox number of message in SQS queue.”
Other: Azure monitoring, Datadog, Splunk, cAdvisor, Elasticsearch	8	“Default tooling from Azure/AWS in combination with splunk.” “We are using Datadog to collect relevant metrics.” “AKS monitors the system load and response time to start-up more instances. It also checks for malfunctioning applications and restarts them when stalled, providing high availability.”

Categories/Codes	#	Example Quotes
Analysis mechanism	73
Data analysis methods	18	“I think it uses some rolling average or some similar algorithm to estimate whether to scale up or down.” “[S]imple statistical inferences based on metrics and simple rules encoded by developers.” “[S]tatistical analysis of data.”
Comparison to threshold	16	“Comparing the error rate with constant/dynamic thresholds.” “Hard coded critical boundaries like min max values which lead to switching over to emergency modes. . . .” “[W]hen it falls below Service Level Agreements this indicates a need for auto-scaling.”
Metric(s) calculation	12	“Failure rate is used to measure quality of adaptation parameters.” “Capturing performance of each node.” “Measurement of traffic load, CPU utilization, and general availability metrics (reachability, status, . . .).”
Learning	12	“Each station has a kind of edge computing component that performs some analysis based on machine learning results.” “It tracks both the internal working conditions (load) of itself as a serving component, and learns about overall serving conditions.” “The system uses biosensory feedback to determine the riders’ happiness. . . .”
Custom rules	9	“Mostly a simple ruleset gleaned by experimentation and observing how the resulting adaption steps perform at runtime.” “[W]e have alertmanager to set up some rules that are known to be issues that have clear solutions.”
Auto-scaling policy	5	“[T]he response of the scheduler is parsed and the queue length is evaluated. If greater than zero, the flow performs a SCALE UP operation. If equal to zero, the flow performs a SCALE DOWN operation.”
Semantic reasoning	1	“Reasoning on knowledge graphs.”
Analysis tool	23
AWS analysis tools	9	“Analytics functions native to the cloud environment the system runs in (AWS).” “AWS based auto-scaling conditions as provided in the Cloud formation setup of the cluster.”
Kubernetes stack	7	“The master nodes have all sorts of different components such as the kube-scheduler, controllers and state db (etcd), that are managed via the kube-apiserver.” “Built-in Kubernetes/Openshift mechanisms. . . .”
Dynatrace	2	“[A]nalyze was done by Dynatrace or by Keptn itself by checking against thresholds.”
Other	5	“We mainly use rule-based systems like Splunk to automatically analyse production metrics against patterns.” “Default tooling from Azure.” “Kibana.”
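
To make the most frequently reported analysis mechanisms in the table above more concrete, the following is a minimal Python sketch; it is not taken from any respondent's system. It pairs a rolling average (one plausible reading of "data analysis methods") with a comparison to fixed thresholds, and it encodes the queue-length rule quoted under "Auto-scaling policy". The names `queue_policy` and `RollingThresholdAnalyzer`, the window size, and the threshold values are illustrative assumptions.

```python
from collections import deque
from statistics import mean


def queue_policy(queue_length: int) -> str:
    """Rule from the quoted response: scale up if the queue is non-empty, else scale down."""
    return "SCALE_UP" if queue_length > 0 else "SCALE_DOWN"


class RollingThresholdAnalyzer:
    """Hypothetical analyser: rolling average of one metric compared to two thresholds."""

    def __init__(self, window: int, upper: float, lower: float) -> None:
        if lower > upper:
            raise ValueError("lower threshold must not exceed upper threshold")
        # A bounded deque drops the oldest sample once the window is full.
        self.samples = deque(maxlen=window)
        self.upper = upper
        self.lower = lower

    def observe(self, value: float) -> str:
        """Record a new sample and return the decision for the current window."""
        self.samples.append(value)
        avg = mean(self.samples)
        if avg > self.upper:
            return "SCALE_UP"
        if avg < self.lower:
            return "SCALE_DOWN"
        return "NO_CHANGE"


if __name__ == "__main__":
    analyzer = RollingThresholdAnalyzer(window=3, upper=0.8, lower=0.3)
    for cpu in (0.5, 0.9, 0.95, 0.99):
        print(cpu, analyzer.observe(cpu))
    print(queue_policy(4), queue_policy(0))
```

Averaging over a window means a single spike does not trigger scaling on its own, which is one way to soften the "reacting too quickly" concern that some respondents raise; the cost is a slower reaction to genuine load changes.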

Problem/Benefit	Improve User Satisfaction	Reduce Costs	Mitigate Risks	New Opportunities
Automate tasks	42	45	34	15
Environment changes	43	44	28	17
Optimise performance	12	10	6	5
Changes business goals	55	55	35	17
Handle errors	34	32	27	12
Protect system	22	23	24	11
(Re)configure system	35	36	26	16

Company Size	Relying on Tools/Infrastructure	Custom Mechanisms
1–10	5 (56%)	4 (44%)
11–20	3 (27%)	8 (73%)
21–50	5 (36%)	9 (64%)
51–100	4 (40%)	6 (60%)
>100	7 (13%)	47 (87%)
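
The percentages in the table above are row-wise shares: each count is divided by the number of respondents in the same company-size class. The short Python sketch below recomputes them from the counts in the table; the dictionary layout and the function name `row_shares` are illustrative only.

```python
from typing import Dict, Tuple

# Counts copied from the table: (relying on tools/infrastructure, custom mechanisms).
counts: Dict[str, Tuple[int, int]] = {
    "1-10": (5, 4),
    "11-20": (3, 8),
    "21-50": (5, 9),
    "51-100": (4, 6),
    ">100": (7, 47),
}


def row_shares(tools: int, custom: int) -> Tuple[int, int]:
    """Return both counts as whole-number percentages of the row total."""
    total = tools + custom
    return round(100 * tools / total), round(100 * custom / total)


if __name__ == "__main__":
    for size, (tools, custom) in counts.items():
        tools_pct, custom_pct = row_shares(tools, custom)
        print(f"{size}: tools {tools_pct}%, custom {custom_pct}%")
```

Read this way, the share of custom mechanisms is highest in the largest size class (47 of 54 respondents).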

Subject Adaptation	Support	Type Adaptation	Support	Trigger Adaptation	Support
System	12 (26.7%)	Auto-tuning	16 (36.4%)	System properties	11 (31.4%)
Module	9 (20.0%)	Auto-scaling	13 (29.5%)	Environment properties	8 (22.9%)
Application layer	9 (17.8%)	Monitor/analysis	9 (20.5%)	System load	6 (17.1%)

Categories/Codes	#	Example Quotes
Change mechanism	83
Scaling mechanisms	36	“The server-side system has a load balancer. Hence we increase the number of workers behind the load balancer to decrease the average response time for the users.” “It adjusts the number of worker nodes.” “Adding a completely similar server/serverless Lambda instance.”
Reconfiguration	25	“The adaptation directly adjusts the period between the packet send events, as well as the number of packets allowed during each send event. . . .” “Depending on context, controlled variables are managed through different automation systems.” “[R]econfiguration of the management entity . . . to support a larger (or smaller) scale distributed system.” “[L]oad balancer/director that may support controlling the exposure facade towards the system environment.”
Non-automated	12	“To effect change on the managed system, the results from the tool need to be approved by an engineer, and are then acted on by the mining and plant teams. These processes are for the most part not automated. . . .” “Generating alerts and expecting humans to resolve the error manually based on suggestions.” “Did not do this . . . Based on safety protocols this could not be secured.”
Restarting/deploying	7	“Mostly just restarting the managed subsystems. In the case of Kubernetes HPA, its the horizontal scaling (up/down) of the Pods.” “Generally restarts the unhealthy workload, but in the case of autoscaling can also be used to add or remove replicas.” “[O]ur pipelines use simple bash scripts to deploy previous versions when new versions fail.”
Migration	3	“Once the control process informs the control plane, it starts a workflow what we call as instance warming workflow which will dump items that supposed to go to that node from another replica and fills them.” “[V]irtual machine (VM) migration or creation.”
Change enacting tool	19
Kubernetes	9	“Mostly just restarting the managed subsystems. In the case of Kubernetes HPA, its the horizontal scaling (up/down) of the Pods.” “[T]o change topology we simply use K8S api to add/remove worker pods.”
AWS	7	“AWS based in-built auto scaling capabilities.” “Use the AWS ElasticLoadBalancer and also trigger actions via AWS Lamda functions when required.”
Other	3	“IBM ITM, Log Analyzer, TCAM.” “UC4 Automation Engine workflows that orchestrate kubernetes clusters.” “Build-in Openshift mechanisms.”

Categories/Codes	#	Example Quotes
Code	33
Modules	18	“Self adaptation mechanisms used for speech recognition . . . are also used for computer assisted coding solutions.” “Different parts of the Behavior tree can be reused in different robots.”
Scripts and algorithms	8	“The same scripts and solutions are constantly reused - because it’s the easiest way to create new with a constant lack of time.” “Threshold algorithms are reused frequently, with the threshold value adapted for the specific use case.”
Libraries	7	“[I]nternal libraries that simplify monitoring, interaction with external tools, etc.”
Design artifacts	22
Patterns	7	“We try to reuse design patterns (e.g., autoscaling) for all cloud native applications we build.” “Re-use of design patterns like MAPE-K.”
Architecture	7	“AWS stack . . . can be used as a generic template cross different applications which are based on a job processing.”
Know-how	5	“We use similar principles in different product.” “We reused knowledge of driver parameter adaptation from FDM (3 axis) printer while designing a SLA (single axis) printer.”
Models	3	“[M]achine learning cost models can be reused by different systems.”
Specifications	18
Policies and rules	5	“[A]uto-scaling policies . . . have a standard definition which can be reused in different systems or use-cases.”
Configuration files	5	“K8s config files for different cloud native application can be similar.”
Templates	4	“We reuse very similar set of configuration templates of container deployment.”
Metrics	4	“Kibana alerts.”
IT infrastructure	11
Frameworks and platforms	7	“[A] framework for monitoring metrics that allows labels to be given to properties, the time-series data to be tracked in a database, and then hooks to visualization database and alert systems.”
Tools	4	“Use the same tools AWS provides for all our different product deployments.”
Procedures	7
Processes	3	“Writing ‘watchdog’ processes for systems that aren’t deployed to kubernetes.”
Pipelines	2	“[P]ipeline (Application - Datadog - custom logic - AWS API) is replicated with different settings for different use-cases.”
Schedules	2	“Most of the approaches we use for digital twins share some history . . . An example of that is in the scheduling space, where schedules need to adapt to changes in resources or the inclusion and removal of tasks.”

Categories/Codes	#	Example Quotes
Reuse hurdles	19
Different problems	11	“In my case every self-tuning problem is different and prevents easy reuse.” “Our applications and application domains are very different and since we do research we actively look for new and different challenges.”
Lack of experience/maturity	4	“I think lack of competence is a huge thing to overcome, though most of the organisations around us try to catch up.”
System structure	2	“The solutions were too coupled, too integrated and not enough modularized.”
Organisational concerns	2	“We have to go through a legal department in order to reuse code from outside . . . That poses a large problem.”

Categories/Codes	#	Example Quotes
Testing and verification	71
Extensive testing	58	“We use extensive testing (unit, module, system).” “We have extensive testing on test k8s clusters, provisioned for these purposes.” “We have countless amount of testing and verification code built as part of the OpenJDK to ensure the quality of the product is appropriate.”
Benchmarking	10	“As a lot of the self adaptation logic involves optimization opportunities, we also regularly run many benchmarks and immediately report regressions.” “We do testing of the machine learing models, but we also have pilot factories where we test our methods and design to see if all station perform as itended.”
Verification	3	“[E]xpert testing, supervision, verification when applicable.” “Testing, but also some human verification as part of the Cloud Operations team.”
Stakeholder-centred techniques	45
Human supervision	22	“Human supervision until confident.” “Extensive system testing and gradual release of human supervision levels upon system going live.”
Rigorous design and development	10	“[V]irtual training to ensure operators understand and are comfortable with the conditions in which the safety system will engage.”
Trust in third-party software	8	“[F]or features like auto-scaling compute . . . we use trusted vendors and deploy these features mainly for analytics use cases which are not business-critical.”
Operational constraints	5	“[T]he concrete actions that are taken by the system are defined by the user. [S]o there is never a surprise. [T]he system only decides if and when to apply these actions.” “Our autotuning algorithms never fail for particular (exactly specified) set of systems. If the system fulfils these assumptions, it works always.”
Online techniques	36
Runtime monitoring and alerting	27	“In cases where an existing system is not being replaced but rather new capability is being added, results will be tracked over time to ensure accuracy.” “[W]e have deployed some alert to track the high-level properties of the system.”
Continuous testing during operation	6	“[T]here is gradual canary testing in the real production system.” “Automated test scripts, automated ‘synthetic transactions’ in production, model performance validation.”
Mitigation strategies	3	“This automation can provide alter with all the steps and rollback automatically if there is any issue.”

(1)	Resource usage and system load are the main types of monitoring metrics used in practice. These metrics are primarily tracked by sensors in the environment and the system.
(2)	Practitioners use various mechanisms for analysis in realising self-adaptation, with data analysis methods and comparison to thresholds as the main mechanisms.
(3)	A wide range of mechanisms are used to enact self-adaptation in industrial systems, with auto-scaling and reconfiguration as the top mechanisms.
(4)	Practitioners extensively rely on tools such as Kubernetes and AWS to support the realisation of different functions of self-adaptation.
(5)	Industrial systems apply a mix of semi- and fully automated adaptation.
(6)	A majority of practitioners reuse solutions when applying self-adaptation, mainly in the form of code, design artifacts, and specifications.
(7)	Ensuring trust in industrial self-adaptive systems is mainly achieved through extensive testing, runtime monitoring and alerting, and human supervision.

Categories/Codes	#	Example Quotes
Design issues	43
Reliable/optimal design	26	“With high availability requiremets, the chance something fails somewhere sometime is close to a 100%. The systems needs to be designed to still provide service despite erroneouse behavior or failing parts in the system.” “[T]he main challenge is to design adaptation function with respect to computation context.”
Design complexity	17	“Complexity in defining the adaptation rules. Conditions are not always obvious.” “Self-adaptiveness or resilience have to be taken into consideration at each stage of the . . . workflow. This is really a challenge as more often than not these are concepts that are completely obscure to the average programmer/devop mind.”
Lifecycle issues	42
Tuning/debugging	19	“Debugging the root cause of a scaling failure might be time-consuming: also, in some cases the problem might be outside of your control (e.g., temporary lack of EC2 Spot capacity in AWS).”
Limitations tools/methods	13	“The metrics available are not always fully transparent and built with auto-scaling in mind.” “IAM permissions are hard to deal with when configuring these self-adaptive systems. Usually, the permission to scale or to notify is not properly configured.”
System/environment evolution	10	“If the functionality is not designed in from the beginning then it is a huge amount of work to implement later.” “System architecture over lifetime (nee features to be added. . .).”
Runtime issues	30
Runtime uncertainty	17	“Many self-adaptive systems are based on unproven heuristics. Therefore, they usually do not work in many cases.” “It is hard to guess how much can the environment affect the system . . . It is hard to extend the parameters to cover whole production.”
Data collection/evaluation	7	“Gathering quantitative data samples to evaluate the performance is very complicated.” “[S]ensors gives wrong reading values.”
Resources required	3	“Sometimes it doesn’t react fast enough. It also takes computation resources for this self-adaptive software, and the compute resources use increases with the number of incoming requests.”
Delayed/missing runtime changes	3	“Autoscaling is often too slow or triggered too late.” “Notifications are delayed or missed.”
People and process issues	24
Skills/experience	14	“Every self-adapt system must be tuned up which is sometimes tricky and needs high skilled engineers.” “The Kubernetes/Openshift cloud and centralized log storage . . . require experienced administration staff and vast knowledge of many networking concepts (. . . DNS, NAT).”
Process and management	9	“We are not yet very experienced . . . the main challenges were to convince the central IT department this was the way to go, then to design the system, and obviously to master the technology itself.”
Automation	1	“[O]ften automation is not trusted enough by humans. [H]umans want to stay in the loop.”

Categories/Codes	#	Example Quotes
Faults	20
Incorrect functionality	7	“Automation can lead to unexpected values.” “The process might be OOM killed if the self-adaptive system doesn’t function correctly (i.e., bugs).”
Wrong results	4	“[I]ncorrect results.” “Wrong decisions based on faulty models.”
Misconfiguration	4	“Tuning autoscaling settings can be problematic resulting in unexpected results.” “Wrong threshold levels may lead to unwanted responses.”
Network failure	2	“Giving control to software that can change production environments can cause network failure.”
Other	3	“[D]ata loss.” “[H]euristics that work well on some applications, do not always perform the best for all applications.”
Difficulties with development/operation	16
Difficult to manage environment uncertainty	6	“We face a risk of underestimating environment variability.” “Legacy monitoring solutions don’t cope well with environments that scale back.” “Risk may be encountered if the incoming event stream is completely unpredictable and have huge spike differences in data for a considerable period of timr.”
Difficult to test	4	“[I]f the executed actions that will be done by the self-adopting system are not tested before, it might introduce some risks.” “It is also difficult to do reliable performance testing in non-production environments.”
Difficult to build	4	“[I]mplementing and designing self-adaptive systems may initially seem to take longer time – hence the risk of not being allowed to implement it as good as it can be done.” “Costs of building own (self-hosted) environment. . . .”
Other	2	“[L]ife updates (no downtime).” “There is always a lingering concern of quis custodiet ipsos custodes - or ‘who watches the watchmen.”

Categories/Codes	#	Example Quotes
Impact on qualities	16
Performance degradation	5	“[R]isk of degrading the performance instead of improving it, and degrading the user experience as a result.” “Performance impact on the running system when applying auto-scaling (e.g., scaling down).” “[S]ometimes a sequence of perfectly acceptable self-adaptive automatic actions can lead to outages worse than the root cause.”
Reduced availability	4	“If the system did not behave properly this could result in an outage. . . .” “Availability of the system during the auto-scaling rules being applied.”
Safety and security threats	4	“If a system is self-adaptive, how can we secure that it is safe during production (some parts can be powered for self test during assembly and we need to know it is safe)? If we use machine learning on a self-adaptive system, how do we secure safety?” “There is a risk of misconfiguration that can lead to lost nodes and applications, security exposures etc. There are also security risks involved with the base building components, such as docker images from untrusted sources. . . .”
Extra resource consumption	2	“Risk of all resources being eaten up by a self-adaptive process.” “[I]t may use up too many unnecessary hardware and software resources.”
Reliability issues	1	“Reliability issues in case of non-converging oscillations or plain wrong output due to prolonged failures in the metrics collection pipelines or simply wrong algorithms.”
Impact on business	14
Increased cost	5	“Regarding autoscaling, the main issue was to fail and so increasing the infra cost of the users due to bugs in the system.” “Lost control over system size. This also impacted the approx. total cost agreed with the customer.”
Losing trust and control	4	“Trust. Because flexible manufacturing systems have some kind of autonomous behavior with tasks that have been done manually, our clients are initially very sceptial and to not trust the systems initally.” “[R]isk of losing (manual) control of the system for the sake of automation.”
Harder to understand/fix	3	“[T]he whole system becomes more complex, hence fewer people understand all details of its behaviour.” “More difficult troubleshooting for a self-adapting, distributed system.”
Not useful	2	“The self-adaptive system might not perform better than the baseline when dealing with dynamic shapes, as the cost model might not be generic enough to predict the performance.”

Categories/Codes	#	Example Quotes
Stakeholder-centred techniques	25
Rigorous design and development	8	“[C]areful engineering so that there are open doors for manual intervention, when necessary, without lost of system availability nor hindering the automation mechanisms.” “We try to have design sessions . . . and possibly enhance the design in the early phases of development.” “Engineering analysis, testing, controlled deployment, . . .”
Code review	4	“As always, planning, design reviews, code reviews, testing on several levels, monitoring the production.” “Each incident is taken into consideration and rules are always reviewed.”
Human supervision	4	“The responsibility was left to a human operator.” “Mainly by performing tests and human supervision (monitoring resource utilization).”
Outsource	3	“Outsource the cloud operation to a specialized provider (RedHat, AWS) where possible. In other cases, customers had to hire experienced administrators/go through extensive period of testing to gain the necessary experience.”
Other (post-mortem analysis, hiring experts, work in pairs, documentation)	6	“When we hit a problem years after the fact, we perform a detailed post-mortem and try to think about other possible failures we may have missed.” “We hired (multiple) external consultancy firms to tap into their experience in deploying such a system.” “Work in pairs, Document architectural decisions.”
Offline techniques	18
Extensive testing	15	“[T]est each action in isolation before it is provided to the system.” “Automated and human testing. In addition for complex algorithms, we run parallel, correlated analysis.” “With automated and manual testing while injecting non-determinism to the test suite.” “Extensive testing at the customers factory and fine tuning of the models.”
Set operational boundaries	2	“Defined max-amount of resources a system functionality/component is allowed to consume.” “Thresholds and some manual monitoring.”
Encryption	1	“State of the art encryption, encryption, and encryption.”
Online techniques	9
Runtime monitoring and analysis	6	“Alerts tracking high-level properties that can give us some assurance that the system is working fine.” “Monitor/review the automated actions.”
Roll-out/roll-back strategies	2	“Slow roll - only send the new system traffic in small increments (10%, 20%, . . .) until production baselines are established for load, actual latency, etc. This helped us determine what the MIN and MAX pod settings should be as well as VM heap sizes.” “Manual roll back to previous stable state of user profiles.”
Run in hours not critical to the business	1	“We run our processes during the night, when there is less chance of interference with business critical (customer facing) systems.”
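
The "slow roll" and roll-back strategies quoted in the table above can be sketched as a small loop. This is a hedged illustration under stated assumptions, not a description of any respondent's tooling: `slow_roll` and the `healthy` callback are hypothetical names, and in practice the health check would compare production baselines such as load and latency.

```python
from typing import Callable, Iterable


def slow_roll(steps: Iterable[int], healthy: Callable[[int], bool]) -> int:
    """Shift traffic to the new version in increments.

    Returns the traffic share (in percent) served by the new version at the end:
    the last step if every check passed, or 0 if a check failed and the
    rollout was rolled back to the previous stable version.
    """
    current = 0
    for share in steps:
        if not healthy(share):
            return 0  # roll back: all traffic returns to the previous version
        current = share
    return current


if __name__ == "__main__":
    steps = [10, 20, 50, 100]
    # Assumed health checks for illustration only.
    print(slow_roll(steps, lambda share: True))
    print(slow_roll(steps, lambda share: share < 50))
```

Rolling back to zero on the first failed check is a deliberately conservative choice for this sketch; holding at the last healthy share would be an equally plausible variant.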

Categories/Codes	#	Example Quotes
Engineering	48
Architecture and reuse	16	“Best Practices for implementation and architectural design guidelines.” “I’d love to see a taxonomy of self-adaptive techniques. Perhaps a set of techniques could be added to Kazman’s Architecture Tactics checklist?”
Adoption	10	“We lack interaction with development teams that are facing similar problems. We have a huge problem explaining this area to the management structure . . . They have basically no ability to lead due to lack of competence.” “[N]ew organisational structures and workflows that lead to the design of more self-adaptive and resilient platforms.”
Platforms and frameworks	4	“[T]o my knowledge there is no framework on what is ‘safe’ or not safe to be automatically executed by a self-adaptation system.” “To provide a platform for capturing the domain knowledge i.e., extensible . . . to manage the managed systems what kind . . . KPIs can be captured, and how they are related.”
Tools	4	“Outlier detection . . . is well understood but existing commercial tools are usually pretty weak and custom code is required to optimize.” “One of the main problems is to get tools that can profile the running systems under certain loads.”
Testing and debugging	4	“Assurance of the behavior of highly dynamic systems is still the big hurdle. Test budgets and schedules do not grow with system complexity.” “[A] pre-production cloud test environment to try them first.”
Advanced features	10	“Coordinate multiple, potentially conflicting, objectives - in changing environment . . . reacting too quickly [is] often sub-optimal.” “[R]esearch on network protocols, these should include some level of self-awareness and should automatically provide common network self-adaptation features.” “How a feedback loop can be designed in a way that you later can adapt to changes.”
Guarantees	25
Trustworthiness	20	“Formal verification of the algoritmic behaviour of the overall system (correctness).” “[V]alidate my algorithms.” “Safety protocols for Machine learnign in self-adapting systems.” “What are the mechanisms should be integrated into self-adapting system to identify malicious input?”
Unknowns	5	“We normally capture this using some form of process based models, but these struggle with thin[g]s like unknowns.” “[N]ot just anomaly detection, but actually responding appropriately to the anomalies (what is appropriate?).”

Abstract

1 INTRODUCTION

1.1 Objective and Research Questions

1.2 Contributions

1.3 Outline

2 RESEARCH METHOD

2.1 Population and Sampling

2.2 Survey Instrument

2.3 Data Analysis

3 RESULTS

3.1 Demographic Information

3.1.1 Experience with Self-Adaptation (Q0.1).

3.1.2 Software Systems Built by Organisations (Q0.2).

3.1.3 Software Engineers Working at Companies (Q0.3).

3.1.4 Roles of Participants in Their Organisation (Q0.4).

3.1.5 Experience of Participants (Q0.5).

3.2 Drivers for Applying Self-Adaptation (RQ1)

3.2.1 For Which Problems Do You Apply Self-Adaptation? (Q1.1).

3.2.2 What Are the Main Business Motivations to Apply Self-Adaptation? (Q1.2).

3.2.3 What Could Be Benefits of Applying Self-Adaptation? (Q1.3).

3.3 RQ2: Characterisation of Self-Adaptation

3.3.1 Explain a Concrete Self-Adaptive System You Worked With. (Q2.1).

3.4 RQ3: Application of Self-Adaptation

3.4.1 What Mechanisms or Tools Does the Self-Adaptive System You Worked with Use to Monitor a Managed System During Operation? (Q3.1).

3.4.2 What Mechanisms or Tools Does the Self-Adaptive System You Worked with Use to Analyse Conditions of a Managed System During Operation? (Q3.2).

3.4.3 What Mechanisms or Tools Does the Self-Adaptive System You Worked with Use to Change a Managed System or Parts of It During Operation? (Q3.3).

3.4.4 What Is the Degree of Automation of the Majority of the Self-Adaptive Solutions You Work with in Your Organization? (Q3.4).

3.4.5 Do You Reuse Solutions to Realise Self-Adaptation in Systems You Work With? (Q3.5).

3.4.6 Please Provide a Concrete Example of Reuse You Used to Realise Self-Adaptation. (Q3.6).

3.4.7 Why Do You Not Often Reuse Solutions When Realising Self-Adaptive Systems? What Hinders Their Reuse? Please Provide a Short Answer. (Q3.7).

3.4.8 How Do You Ensure That You Can Trust the Self-Adaptive Solutions You Build? (Q3.8).

3.5 RQ4: Difficulties, Problem Support, and Opportunities

3.5.1 Did You Encounter Particular Difficulties When Engineering or Maintaining Self-Adaptive Systems You Worked With? (Q4.1).

3.5.2 Please Give One or Two Examples of the Difficulties That You Encountered When Engineering or Maintaining Self-Adaptive Systems. (Q4.2).

3.5.3 Did You Face Any Risks When Engineering Self-Adaptive Systems? (Q4.3).

3.5.4 Briefly Describe One or Two Risks That You Faced When Engineering Self-Adaptive Systems. (Q4.4).

3.5.5 How Did You Mitigate the Risks That You Faced? (Q4.5).

3.5.6 Have You Faced or Seen Any Problems of Self-Adaptation for Which You Would Appreciate Support from Researchers? (Q4.6).

3.5.7 For Which Problems of Self-Adaptation Would You Appreciate Support from Researchers? Please Briefly Explain One or Two Such Problems. (Q4.7).

3.5.8 In Your Organisation or in Industry in General, Do You See Application Opportunities for Self-Adaptation That Are Currently Not Exploited? (Q4.8).

3.5.9 Please Describe or Give Examples of the Application Opportunities for Self-Adaptation That Are Currently Not Exploited. (Q4.9).

3.6 Confidence

4 DISCUSSION

4.1 Observations

4.2 Benefits of Applying Self-Adaptation in Practice

4.3 Difficulties and Risks of Applying Self-Adaptation in Practice

4.4 Research Support to Address Problems in Practice

5 THREATS TO VALIDITY

5.1 Construct Validity

5.2 External Validity

5.3 Reliability

6 CONCLUSION

ACKNOWLEDGMENTS

Footnotes

REFERENCES

Categories/Codes	#	Example Quotes
Data	21
Data governance	8	“Data alignment and . . . its integration.” “[G]etting data from application behaviour helps a lot in analyzing how application performance can be further improved.” “Adaptive AI systems to manage huge document contents.”
Data access	8	“Support for data science as to extract correct cause relationships vs apparently correlations.” “For example, how much data is shared across threads, how many objects are thread-local, how much performance is lost due to locality issues.”
Machine learning	5	“[I]f the data/metrics can be structured and labelled in some way (i.e., scored), then perhaps it should be possible to apply ML to help identify opportunities and figure out automatically how to respond.” “How to use machine learning to solve the self-adaptation problems and demonstrate its performance bound.”
User interaction	19
Automation	9	“[V]olume of data gets to large for people to process. People get to be the bottleneck for throughput.” “Automatic synthesis of predictive and or reconfiguration models.” “Approaches whereby systems of reasonable scale can monitor and fix themselves as necessary without human intervention.”
User experience	7	“[M]ost of the problems that we faced are related to help the customer to understand the benefits of self-adaptative systems.” “Autoscaling should become commodity products . . . As users, the complexity should be abstracted away.”
User involvement	3	“User response can also be used for adaption (e.g., if a user constantly overrides the managed systems settings there managing system should ‘learn’ from the user and adapt the control algorithm for that specific user).”

Categories/Codes	#	Example Quotes
System activities	72
Autonomous operation	37	“E.g., manufacturing production line with visual inspection operators who remove defects, . . . the production line can further be adapted based on the defect rate/type.” “Self adaption could have a lot of benefits in building automation systems, like smart heating and lighting systems that takes peoples habits into consideration.” “[M]aking the system adaptive to adjust and act instantly based on the data without waiting would be beneficial and efficient.”
Data management and machine learning	26	“Methods to automatically handle changes in the machine learning models and to efficiently deploy them to the edge. There is still lots of manual fine tuning that delays a timely new release.” “The query optimizer of database (i.e., MySQL) could utilize self-adaptation technic.”
Auto-scaling	9	“The ‘managed service,’ which is a stateful service/data store, is provisioned for the peak capacity, which means resources are idle most of the time. If we can build reliable and efficient system that can automatically scale stateful services based on the demand, we can reduce the cost.” “Our microservices do not dynamically scale.”
System properties	47
Quality improvement	26	“Based on the alarm certain counter actions could be initiated in order to deal with the faulty behaviour and reach a stable system state.” “Congestion prognosis.” “[F]ault tolerance.” “Power consumption.” “[R]esource optimization.” “There are many opportunities to split up [current monolithic systems] and then make them scalable such that outages are more contained. [E].g., screens on trains.”
Security improvement	10	“Security of e.g., mobile devices that adapts based on locally identified threats as well as knowledge of risks in the environment.” “Automating changes in Security levels based on threat levels.” “Detecting in-vehicle threats, detecting a system being compromised.” “[R]eact to attack patterns.”
Cost-effectiveness	8	“IT cost reduction (e.g., software asset mgmt).” “The question really is: How do you do these things on the cheap (with non Silicon Valley billion dollar funding) and in contexts where mistakes might be extremely critical?”

Categories/Codes	#	Example Quotes
Engineering activities	21
Maintenance and reuse	15	“[S]elf-adapting CI/CD infrastructure based on demand.” “Preventive maintenance.” “Carriers are eager to get rid of human factors to improve operation and maintenance capabilities and network quality. Therefore the ICT field pays much attention to self-adaption systems.” “Software provisioning and automatic updates.”
Patterns and libraries	6	“Developing a comprehensive library of algorithms on top of the industrial monitoring systems which can be applied to analysis portion of the chain in order to drive correct self-adaptation actions would benefit the self-adaptation adoption.” “Cross-cloud self-adaptation.” “[P]atterns to provide solutions to common problems.”
Human involvement	7
Personalisation	4	“[I]t would be interesting to adapt the player experience itself based on the player, mostly to better challenge them.” “Healtcare decision making systems witch are changing outcomes and advices basd on patient status.”
Human-machine interaction	3	“I consider that the biggest opportunities are found within the Human Machine Interaction or Building Machine Interaction. There will be a future in which talking to a device that can modify the environment (e.g., a robot but not a phone) will be as natural as talking to a person, or seeing a machine interacting with another machine (e.g., robot taking the elevator).”

(1)	A majority of participants face difficulties when engineering or maintaining self-adaptive systems, mainly with reliable/optimal design, design complexity, and tuning/debugging.
(2)	About half of the participants encounter risks when using self-adaptation. The main risks relate to incorrect functionality and difficulty in managing environment uncertainty, as well as degraded performance and increased cost.
(3)	Approximately 40% of the practitioners report that they would appreciate support from researchers to deal with problems they face, particularly problems related to the engineering of self-adaptive systems, guarantees, and management of data.
(4)	About half of the participants see future opportunities for applying self-adaptation, particularly in relation to autonomous operation, data management, and machine learning.

Adaptation Problem (Top Occurrences/Total)	Top Kind of System
To optimise system performance (12/78)	Embedded/cyber-physical/IoT
To automate tasks (10/61)	Cloud
To deal with changes in the environment (9/60)	Embedded/cyber-physical/IoT
To detect and resolve errors (8/46)	Web/mobile
To configure/reconfigure a system (8/51)	Web/mobile and Cloud
To detect and protect a system against threats (6/46)	Web/mobile
To deal with changes in business goals (5/15)	ICT communication and networks
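
The ratios in the table above are easier to compare as percentages. The following is a minimal sketch, not part of the survey's own analysis: it assumes, based on the column header, that the first number in each pair is the occurrence count for the top kind of system and the second is the total number of occurrences of that adaptation problem. The names `ROWS`, `top_share`, and `shares` are hypothetical and introduced only for illustration.

```python
# Pairs copied from the table: (adaptation problem, top occurrences, total).
# Assumption: "top occurrences" counts the most frequent kind of system only.
ROWS = [
    ("To optimise system performance", 12, 78),
    ("To automate tasks", 10, 61),
    ("To deal with changes in the environment", 9, 60),
    ("To detect and resolve errors", 8, 46),
    ("To configure/reconfigure a system", 8, 51),
    ("To detect and protect a system against threats", 6, 46),
    ("To deal with changes in business goals", 5, 15),
]


def top_share(top: int, total: int) -> float:
    """Percentage of all occurrences that fall in the top kind of system."""
    return 100 * top / total


def shares(rows):
    """Map each adaptation problem to its top-kind share, rounded to 0.1%."""
    return {problem: round(top_share(top, total), 1) for problem, top, total in rows}


if __name__ == "__main__":
    # Print the problems ordered by how concentrated they are in one kind of system.
    for problem, pct in sorted(shares(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{pct:5.1f}%  {problem}")
```

Under this reading, the top kind of system accounts for roughly 13% to 17% of occurrences for most adaptation problems, with "changes in business goals" the exception at about a third (5 of 15). This suggests that most adaptation problems are spread across several kinds of systems rather than concentrated in one.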