Software Architecture Robustness Analysis: A Case Study Approach
Abstract—Software Solution providers sell products saying their system is robust. To this day, no hard and fast rules exist that demarcate the standards for measuring this quality attribute. Implementation of software architecture is often conceived through patterns. Each pattern has its own way of employing tactics. Considering robustness to be a deciding factor, the competing architectural patterns can be evaluated and analyzed so that best application can be chosen. This paper provides an outline of Software system robustness and a study of patterns and impact of tactics on them. It discusses generic means through which a system can be designed to be stronger and less prone to faults. Also, the study encompasses different elements in architectural style that contribute to the aspect. Moreover, in support of the applicability of suggested analysis methods, it adds a case study to garner evidence towards the claim. An attempt is made to justify the selection of a pattern based on documented and undocumented design experiences as well as comparative exploration for achieving quality of robustness.
Keywords—Robustness; Software Architecture; Architectural style; Patterns; Tactics
Introduction
As we move into the age of machines, software systems have become the means to control everything. From microwave ovens to aircraft, and from large servers with thousands of nodes and petabytes of memory to hand-held devices, we depend on reliable software for an uninterrupted usage experience. Software is no longer a luxury; it is fast becoming a necessity that we cannot live without. Hence it is of utmost importance that software is designed and developed with careful study.
From a usage perspective, software solutions can broadly be thought of as belonging to two categories: 1) utility applications used by individuals and 2) commercial software used by corporations for their business needs. Most commercial software incurs a higher cost, so developing and deploying it involves more intricate planning and follows calculated steps. Large software solutions in public services are also extremely complex and need to be flawless.
A fault in some of these systems can be considered insignificant and ignored, but in others it may cause substantial economic loss. Still others directly impact our lives, and any failure in them can lead to fatalities. Software for these systems needs to be extremely stable, and thus its architecture needs to be designed so that it is robust enough to sustain virtually any adverse incident. Such incidents can result from external activities or natural calamities.
Imagine we need to send a critical message to someone. There are multiple options available. Let us see what they are and what limitations and impediments each may have.
As mentioned above, each system can be thought of as an alternative, so we can say that there is redundancy in the messaging service. If there is no network, the message remains saved on the phone and is sent when the network is back up; a similar thing applies to email. Although slow, the physical mail service rests on a tried and tested system and offers a way to send original documents. Some P2P messaging apps available today, like WhatsApp, indicate whether a message has been delivered or read.
From the above example, we can infer that providing a solution to a requirement needs consideration of all available means, and we have to think of a design that has provision to pursue an alternative when one fails. Furthermore, it is worth noting that the choice depends on the requirement (the type of message) and on which alternative would produce a consistent result.
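The idea of pursuing an alternative when one means fails can be sketched in a few lines. The following is only an illustrative sketch (C++17): the `Channel` type and the `sendWithFallback` function are hypothetical names introduced here, and the send functions stand in for real services such as SMS, email or post.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical delivery channel: a name plus a send function that returns
// true on success. Real channels (SMS, email, ...) are not modelled here.
struct Channel {
    std::string name;
    std::function<bool(const std::string&)> send;
};

// Try each channel in priority order and return the name of the first one
// that succeeds, or std::nullopt when every alternative fails.
std::optional<std::string> sendWithFallback(const std::vector<Channel>& channels,
                                            const std::string& message) {
    for (const Channel& channel : channels) {
        try {
            if (channel.send(message)) {
                return channel.name;
            }
        } catch (...) {
            // A channel that throws is treated like one that failed;
            // move on to the next alternative.
        }
    }
    return std::nullopt;
}
```

The caller lists the channels in order of preference, so the type of message can decide which alternative is tried first.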
An unexpected scenario can be handled only if it is envisaged before the system is built. For smaller systems, developers can list improper ways of usage or write test cases with invalid or illegal inputs in mind (the Extreme Programming approach). However, for complex systems, such an effort would be immensely difficult. In addition to requiring huge manpower and time, it is almost impossible to figure out whether some cases have been missed. This in turn can leave subtle but dangerous bugs in the system. If such an issue propagates, it may cause the entire system to crash.
Therefore, we need to think one step ahead and design the system by breaking it down into components, which communicate through connectors and whose rules can be defined by configuration. We can also put constraints on them. This higher-level design is called software architecture. Thus, we employ a divide-and-conquer mechanism to make components and their connections enduring, thereby achieving overall system strength.
Although design in terms of implementation techniques and methodologies has been a well-established field of ongoing research for quite some time, software architectural design has drawn relatively less attention. Over the last couple of decades, however, the trend seems to be changing. As systems become increasingly complex, analysis and design at a higher level of abstraction is perceived as the key to tackling most of the issues that arise in later stages. This also benefits investors immensely, as maintenance costs are reduced by a huge margin. Although it is not obvious where and how the design can be optimized, architects use empirical data to measure effectiveness. Furthermore, software architecture analyses are often executed on a need basis for individual projects, and there are very few structured comparisons available for a given attribute. To find a solution to design goals, an architect must perform painstaking study and manually assess software architectures for product evolution and trade-off analysis.
First, we need to lay out a considered definition of robustness against the backdrop of software quality and non-functional requirements. We need to understand that different stakeholders have distinct perspectives on robust system design. Lung and Kalaichelvan [1] presented the concept of software architecture robustness. In their paper, they lay down a framework by recognizing more concrete influences of robustness on software architecture. They also provide an outline of metrics for analysis through case studies.
Because of the ever-growing struggle to come up with revolutionary ideas, the evolution of software engineering is probably one of the fastest among all industries. Hence, the parameters and processes used for assessment could be outdated. Measurable metrics must be identified first to be able to compare different patterns. From a low-level design point of view, various parameters are available at the code level. Some researchers have also shed light on design metrics. On the other hand, no widely accepted and codified method exists for analyzing high-level design parameters, partly because this field is still evolving and is open to new findings, and partly because such parameters differ greatly with the size and complexity of the system under consideration.
An architectural pattern can be assessed from a qualitative as well as a quantitative perspective. This paper proposes some ways of quantitative evaluation to find out how patterns compare with each other in achieving a robust system. This can help architects make informed decisions as they go ahead with the design.
The paper is organized as follows. Part II explains some terms that we will encounter later in the paper. Part III explores what software architecture robustness is. Part IV contrasts robustness with similar qualities such as reliability and availability. Part V discusses how requirement specification and design can be approached with robustness in mind. Part VI shows some sub-characteristics, depicts a scenario and puts forward thoughts on tactics that can be employed. Finally, Part VIII shares what this research achieves through a case study. To conclude, Part IX identifies future directions in this area.
Background and Terminologies
Software Intensive System: A generic term for the multitude of systems in which software development and integration play a dominant role and occupy the lion’s share of the whole system. Such systems are usually complex to configure and comprise many sub-components working in tandem to achieve a common end goal. As a result, software intensive systems are extremely difficult to maintain and even harder to design. Large-scale deployment and multi-point communication are their salient features.
System Under Test: A system that has to be tested to see whether it performs properly with respect to various kinds of input. The application under test should respond uniformly and consistently, as expected. In our case study section, which is about an integrated automation testing framework, we will briefly describe the system under test and point out how it has a major impact on the design of the framework itself.
Pattern: From the software architecture point of view, patterns are generic solutions to recurring problems that can be fine-tuned to produce the desired outcome and attain qualities. Christopher Alexander [2] first formulated the idea from the point of view of physical architecture, but we can correlate it to a software architect’s view as well. Each pattern serves a particular set of purposes and, depending on the requirement, patterns can be mixed and matched with dedicated interfacing.
Quality Attribute and Tactics: Quality attributes are the factors that potentially impact the application’s behavior. They affect the system’s design and user experience and can usually be thought of as non-functional requirements. Tactics are design decisions that influence the attainment of quality attributes; they concern how the system responds to a stimulus.
Software Architecture Robustness
In this section, we try to articulate the quality attribute of robustness. The literature on software architecture has expressed it in many ways, and there seems to be no universally accepted definition, although architects and clients alike consider it a critical aspect. A system is ‘robust’ when it does not crash too often and can withstand a wide range of inputs. In an extreme situation such as a malformed command or a power failure, the system should at least a) log the reason and b) terminate by closing all open resources. Based on a study of the literature, we can express it as:
“Robustness is the degree to which a system can handle unexpected or abnormal conditions and terminate gracefully.”
Let us examine this with examples. First, consider a simple program that divides one number by another. This can be implemented in the C++ programming language in a very basic way:
—————————————————————————-
#include <iostream>
int main() {
    int x, y;
    std::cout << "Enter 2 numbers: " << std::endl;
    std::cin >> x >> y;
    std::cout << "Result: " << x/y << std::endl;
    return 0;
}
This program follows neither procedural nor object-oriented principles. However, even setting that aside, it is a terrible program. Think about the points below:
In a software project, the software requirements specification (SRS) document provides guidelines on how to approach the design. Even for a trivial application like the one above, we should at minimum consider the above-mentioned points. Getting a clear picture of what to consider when designing a robust application can be challenging. It also depends on the size and type of the intended user base. For a fellow programmer, comments and the signature of a method can tell the story, but for a naïve end user they would be impossible to comprehend. Hence, notifying the user of program behavior is very important as well. This adds to a robust application experience and can be further analyzed to make a quantitative estimate of the productivity of the system. Continuous delivery is the need of the hour and very much in demand.
With these in mind, the program can be rendered as below:
—————————————————————————-
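#include <iostream>

// Returns numerator / denominator. When the denominator is zero, the user is
// notified and 0 is returned.
int divideIntegers(int numerator, int denominator) {
    if (denominator == 0) {
        std::cout << "Sorry, denominator is zero" << std::endl;
        return 0;
    }
    else {
        return numerator / denominator;
    }
}

int main() {
    int x = 0, y = 0;
    std::cout << "Enter 2 numbers: " << std::endl;
    try {
        // operator>> does not throw by default, so the stream state is
        // checked explicitly to reject non-numeric input.
        if (!(std::cin >> x >> y)) {
            std::cout << "Sorry, the input is not a pair of integers" << std::endl;
            return -1;
        }
        std::cout << "Result: ";
        std::cout << divideIntegers(x, y) << std::endl;
    } catch (...) {
        std::cout << "Error in program";
        return -1;
    }
    return 0;
}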
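One remaining weakness of the program above is that a return value of 0 for a zero denominator cannot be told apart from a legitimate quotient of 0. As a possible alternative (not the version discussed in this paper), the failure can be reported through the return type. The sketch below assumes C++17; `safeDivide` is a hypothetical name.

```cpp
#include <limits>
#include <optional>

// Returns the quotient, or std::nullopt when the division cannot be carried
// out, so that an error is never confused with a valid result of 0.
std::optional<int> safeDivide(int numerator, int denominator) {
    if (denominator == 0) {
        return std::nullopt;  // division by zero
    }
    if (numerator == std::numeric_limits<int>::min() && denominator == -1) {
        return std::nullopt;  // the quotient does not fit in an int
    }
    return numerator / denominator;
}
```

The caller is then forced to check whether a value is present before printing it, which keeps the notification to the user in one place.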
Social media applications are very popular these days. An application cannot rise to popularity if it tends to fail every now and then. In fact, it must cater to users from a wide range of social, cultural and technical backgrounds. One thing that stands out for this category of application is the constant change in requirements. Based on user experience analysis and feedback, a feature may need to be entirely redesigned. For instance, one of the most fundamental features is logging in. Adding the ability to log in using a mobile number instead of a username would be a daunting task. Also, if the application has global reach, the government of each country may have specific rules to be applied or extra validation to be done. We can characterize a few extreme situations:
These sorts of situations put the system under stress. This stress can result in a) internal faults and b) external faults. Any individual component of the system can develop a snag, which in turn makes the downstream components process incorrect data, and eventually the entire system fails. Corrupt data, a communication channel pointing to a different endpoint, memory leaks and the loading of wrong configuration values are some of the symptoms of erroneous design. The end user may not notice an internal fault unless a monitoring alert is configured for it. Left unattended, however, the fault generates further faults in other components and can ultimately result in system failure. So, even though we handle exceptional scenarios at a granular level, it is the chain of responsibility that defines what happens in the long run. We can employ a combination of monitoring and notification tactics to overcome such a situation.
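A combination of monitoring and notification can be sketched as follows. This is a simplified illustration, not a prescribed design: the `HealthMonitor` class is a hypothetical name, time is modelled as a plain tick counter, and the notification is an arbitrary callback.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical monitor: each component reports the tick at which it was last
// alive; a component that stays silent for more than `timeout` ticks triggers
// a notification.
class HealthMonitor {
public:
    HealthMonitor(long timeout, std::function<void(const std::string&)> notify)
        : timeout_(timeout), notify_(std::move(notify)) {}

    // Record that `component` was alive at tick `now`.
    void heartbeat(const std::string& component, long now) {
        lastSeen_[component] = now;
    }

    // Notify about every component whose last report is older than the
    // timeout and return how many were reported.
    int check(long now) const {
        int stale = 0;
        for (const auto& entry : lastSeen_) {
            if (now - entry.second > timeout_) {
                notify_(entry.first);
                ++stale;
            }
        }
        return stale;
    }

private:
    long timeout_;
    std::function<void(const std::string&)> notify_;
    std::map<std::string, long> lastSeen_;
};
```

With such a monitor in place, an internal fault in one component is reported before the downstream components have processed much incorrect data.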
The environment plays a key role in generating external faults. From bad input data to running out of memory, and from running out of disk space to a network cable getting unplugged, all fall under this category. Each module should have strong validation rules so that, before processing starts, a check is done to make sure the input can be processed. Architects use various tactics such as ping/echo and heartbeat to periodically take stock of system health. Serialization gives the application a way to dump its current state into a binary file in case of sudden failures such as a power outage. This state can later be loaded to resume processing from where it left off. System and application logs are scanned to analyze the most common modes of failure, and this knowledge can later be used to enhance the system. This kind of dynamic self-healing mechanism is also gaining ground in robustness research.
Robust architecting involves handling all possible combinations of scenarios that can potentially make the system unstable and making sure it does not end abruptly. The steps to be performed in case of abnormal termination have to be well defined. Often these steps are combined into a module that can be invoked if any of the termination conditions is met. Memory deallocation, temporary file cleanup and saving changed configuration data all fall under this set. Another effective way is the tactic of separation of concerns: identify the smallest units of the application that can act independently and run them detached from the main application. This helps the main application keep track of failed sub-processes and take the necessary steps. The Google Chrome browser is an apt example. Each tab is responsible for handling specific URLs, so there is a mapping from web application to tab. Each tab runs as a separate process in its own address space, and when a web application crashes it takes down only its tab; the application window can notify the user, who can try to restore it (i.e. relaunch the application).
Software architecture robustness can be assessed in many ways, through quantitative as well as qualitative evaluation. In this paper, we identify sub-factors of the robustness attribute and discuss ways to measure them, dividing them along these two lines. One thing to note here is that the specified requirement has the highest priority during the design phase.
Robustness & Other Quality Attributes
Although quality is an abstract term, we humans have always found ways to justify marking one item as being of better quality than another. For physical products, we can distinguish based on size, color etc. When it comes to software products, we need to come up with parameters of measurement and find out which parameters affect which qualities, and how. For instance, increasing the virtual memory of a system, or using a different algorithm, can result in better performance of the application. Quality can also be specified with respect to how closely the product conforms to the requirement. The first is referred to as ‘Quality of Design’ and the second as ‘Quality of Conformance’. In the modern world, software producers strive to provide the best user experience to their clients. Naturally, quality attributes and their interdependencies have caught the eye of researchers, who have worked to model them.
Many efforts have been made to define quality models. The earliest of them is probably McCall’s, which lists high-level factors to present a view of quality. Boehm extended it and added a level of abstraction for utility; he also lists two-stage constructs of quality attributes. ISO 9126, also based on McCall’s model, hierarchically categorizes six main qualities into sub-characteristics. These models, however, do not explicitly mention robustness as an independent quality. Karl Wiegers lists it in his taxonomy and maps it to criteria such as error handling and hazard analysis.
As we go through the attributes, we find that some of them have sub-characteristics in common. Measuring closely related quality attributes would thus involve common steps. Let us contrast the robustness quality attribute with the two most closely related ones.
Reliability and Robustness:
Reliability and dependability are interchangeable concepts. They refer to the quality of consistently producing the same output for a given set of inputs. If, for some reason, the system fails on a certain portion, the same behavior can be expected as long as the design remains the same. Some experts consider robustness a part of reliability. This would imply that a system is always reliable if it is robust. We know from experience that this may not always hold. For instance, a system can continue to operate for a prolonged period without any failures and still produce different output for the same input from different sources or at different times. That is, there can be inherent bugs that affect the output but never generate faults.
Reliability has been widely covered in the literature, most notably in the Handbook of Reliability Engineering. There is no doubt, though, that the two share mutual interests and that their design goals cross paths in characteristics such as fault tolerance, exception management and recoverability. A fault that makes the output of the system contradict the expected result creates a reliability issue. A robustness specification can tell us how that kind of fault can be handled. We can further amend the system so that the handler produces the correct response. In this way a robust system can be very reliable.
Availability and Robustness:
Another quality attribute that attracts a lot of attention and shares characteristics with robustness is availability. Bass et al. say that “The availability of a system can be calculated as the probability that it will provide the specified services within required bounds over a specified time interval.” A system can be shut down intentionally or it can terminate unexpectedly. The system is said to adhere to its availability requirements if the total uptime remains within the limits. We cannot say the same for robustness: if an abnormal input brings the system down, it fails to meet the quality requirement. However, fault management is an indispensable part of both attributes.
If a web server receives only a limited number of requests per day, it is likely to remain available unless an internal bug makes it crash. If the number of requests rises steeply or the server is relocated, it has to handle different kinds of input and the chances of failure increase manifold. So, we can argue that a system that is not robust is more likely to be unavailable. Conversely, a system can be robust and still be unavailable.
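To make the uptime criterion concrete, availability over an observation interval is commonly estimated as uptime divided by the sum of uptime and downtime. The sketch below uses that estimate; the function names, the use of hours as the unit and the handling of an empty interval are assumptions made for illustration.

```cpp
// Fraction of the observation interval during which the system was up.
// Returns 0.0 for an empty interval (an assumption of this sketch).
double availability(double uptimeHours, double downtimeHours) {
    double total = uptimeHours + downtimeHours;
    return total > 0.0 ? uptimeHours / total : 0.0;
}

// True when the measured availability meets the required level. The check
// does not distinguish planned shutdowns from crashes, which is why it says
// nothing about robustness.
bool meetsAvailability(double uptimeHours, double downtimeHours, double required) {
    return availability(uptimeHours, downtimeHours) >= required;
}
```

For instance, a system that was up for 999 hours and down for 1 hour meets a 99% requirement whether that hour was scheduled maintenance or the result of a single malformed input.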
Fig: The three quality attributes share characteristics.
The triangle marks the factors common to all three, such as fault tolerance.
Characteristics and Measuring of Robustness
Levy et al. have paved the way by demonstrating how sub-characteristics can be used to assess software architecture for quality attributes. Based on their work, we now present the characteristics of robustness and discuss how they can be used to obtain qualitative and quantitative measures of robustness.
The motivation comes from the ISO 9126-1 quality model, in which sub-characteristics are given for quality characteristics such as Functionality and Efficiency. It is noteworthy that some of the sub-characteristics are re-used across multiple quality characteristics. As mentioned in our comparison section, we want to re-use the fault tolerance characteristic. However, we instead introduce an umbrella characteristic called ‘Fault Handling’, which deals not only with tolerating faults but also with fault reporting and notification. Compliance with requirements continues to be a sub-characteristic, consistent with the others.
Robustness
Fault handling | Productivity | Failsafe behavior | Compliance |
Let us go through each of them and explain what they are and why they are considered sub-characteristics.
Robustness Scenarios and Tactics
Figure: Source; Stimulus: invalid input; Artifact: process (exception handler invoked); Response: issue logged, user notified; Response measure: resume next operation.
Fig: Sample concrete robustness scenario
In the above figure, we can see an instance of a robustness scenario: when a user inputs invalid data, an exception is raised in the process. Control then passes to the handler section, which writes a log message, prints an appropriate message to the user and then moves on to the next operation.
The table below describes the general scenario for robustness.
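The concrete scenario can be sketched in code as follows. The names `ScenarioResult` and `processInputs` are hypothetical, and parsing an integer stands in for whatever operation the process performs on the user's input.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record of what the scenario produced.
struct ScenarioResult {
    std::vector<std::string> log;            // issues logged by the handler
    std::vector<std::string> notifications;  // messages shown to the user
    std::vector<int> parsed;                 // values processed successfully
};

// Process each input in turn. Invalid input raises an exception; the handler
// logs the issue, notifies the user and processing resumes with the next
// operation.
ScenarioResult processInputs(const std::vector<std::string>& inputs) {
    ScenarioResult result;
    for (const std::string& input : inputs) {
        try {
            std::size_t used = 0;
            int value = std::stoi(input, &used);  // throws on invalid input
            if (used != input.size()) {
                throw std::invalid_argument("trailing characters");
            }
            result.parsed.push_back(value);
        } catch (const std::exception& e) {
            result.log.push_back("rejected '" + input + "': " + e.what());
            result.notifications.push_back("Invalid input: " + input);
        }
    }
    return result;
}
```

The response measure of the scenario corresponds to the loop continuing after the handler: one bad input costs one log entry and one notification, not the whole run.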
Portion of scenario | Possible values |
Source | People, hardware, software, physical infrastructure, physical environment |
Stimulus | Fault: exception, crash, incorrect timing |
Artifact | Processors, communication channels, persistent storage, processes, RAM |
Environment | Normal operation, startup, shutdown, failsafe mode |
Response | Exception gracefully handled, resources deallocated, handles relinquished |