Managing Software Development and QC/QA:

Software Quality

Oct 21 2008   12:14PM GMT

Role of Software Reliability Estimation in Software Design



Posted by: Zohair Chentouf

Recently, I had to settle a design issue. A new functionality had to be implemented, and two modules were candidates to host it, so I had to choose one of them. Obviously, such a design decision has to observe certain criteria. I listened to my team leaders debate the issue during a design meeting. Some of them favored one module while the others favored the other. Most of their arguments were objective, and all of them had in mind the real-time constraints and the future maintainability of the whole system. Finally, I decided which of the two modules the functionality had to go in. In this article, I give an account of how to deal with such a software design process.

My role as leader of both development and quality assurance allows me to build a bridge between the development and test activities. In design meetings, I often bring more information about the quality of the modules we want to change, and this helps us make the right decisions. As I said above, my team leaders focused on software capacity (real-time) constraints and long-term software maintainability. This is good practice. But I was able to identify another important decision criterion: software reliability. I had this information because I am always involved in the testing process.

Let’s now abstract the problem: how do we put together the estimated software capacity, maintainability and reliability in order to solve such a software design problem? I would say that it depends on the software development context. The software production I’m personally involved in has objectives and constraints that differ from those that govern software development in other organizations. That’s why a software architect in another workplace might not rank those three criteria as I did. Given the nature of the two modules I had to choose between, and the degree of capacity and maintainability they had already reached, reliability was more important than maintainability and capacity. Moreover, from what I know about the two modules, one of them was more maintainable but less reliable because of the specific environment (third-party libraries) it has to interact with.
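
One way to write such a ranking down is a small weighted decision matrix. The following is only a sketch: the weights, the 1-to-5 scores and the module names are hypothetical and are not the values from my actual decision.

```python
# Hypothetical weights: reliability ranked first, as in the context described above.
RELIABILITY_FIRST = {"reliability": 0.5, "maintainability": 0.3, "capacity": 0.2}

# Hypothetical 1-to-5 scores for the two candidate modules.
CANDIDATES = {
    "module_a": {"reliability": 2, "maintainability": 4, "capacity": 3},
    "module_b": {"reliability": 4, "maintainability": 3, "capacity": 3},
}


def weighted_score(scores, weights):
    """Return the weighted sum of the criterion scores."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_candidates(candidates, weights):
    """Return candidate names ordered from best to worst weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


ranking = rank_candidates(CANDIDATES, RELIABILITY_FIRST)

# Another workplace might rank maintainability first and reach another decision.
MAINTAINABILITY_FIRST = {"reliability": 0.2, "maintainability": 0.6, "capacity": 0.2}
other_ranking = rank_candidates(CANDIDATES, MAINTAINABILITY_FIRST)
```

With these made-up numbers the two sets of weights rank the modules in opposite orders, which is the point: the same estimates can lead to different decisions in different contexts.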

My software design problem needed those three criteria to be addressed, but other design situations may involve other software quality elements, such as robustness. Whatever the criteria are, the designer has to keep all of them in mind. I personally prefer to write the criteria down instead of relying on my unconscious reflexes to carry them.

Let’s now focus on software reliability estimation. As I said above, I got that information from my software quality process. Yet within that process, reliability estimation is often not straightforward. Depending on the workplace’s goals and constraints, the software quality manager has to decide what each of the conceptual elements of software reliability means. These elements are the following:
- Failure’s severity (major vs. minor).
- Failure’s impact (system wide vs. local).
- Failure’s probability.
- Likelihood of recovery, time to recovery, extent of recovery (complete vs. partial).

Once all those conceptual elements are well defined, the manager has to organize them within the software quality process and integrate them into his/her software quality metrics. The first two variables (failure severity and impact) may already be available in the bug tracking system. The other two may need to be extracted from the system’s logs, usage statistics, and customers’ or beta testers’ reports.
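
To make this concrete, here is a minimal sketch of how those elements could be folded into one risk index per module. The formula, the weights and the field names are assumptions made for illustration, not an established metric, and time to recovery is deliberately left out to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical weights; each workplace has to choose its own.
SEVERITY_WEIGHT = {"minor": 1.0, "major": 3.0}
IMPACT_WEIGHT = {"local": 1.0, "system": 2.0}


@dataclass
class FailureMode:
    name: str
    severity: str               # "minor" or "major" (from the bug tracker)
    impact: str                 # "local" or "system" (from the bug tracker)
    probability: float          # estimated from logs and usage statistics
    recovery_likelihood: float  # 0.0 to 1.0
    recovery_extent: float      # 0.0 (nothing restored) to 1.0 (complete)


def risk(failure):
    """Weighted probability of the failure, discounted by expected recovery."""
    unrecovered = 1.0 - failure.recovery_likelihood * failure.recovery_extent
    return (
        failure.probability
        * SEVERITY_WEIGHT[failure.severity]
        * IMPACT_WEIGHT[failure.impact]
        * unrecovered
    )


def module_risk(failure_modes):
    """Sum the risk of all known failure modes of a module (lower is better)."""
    return sum(risk(failure) for failure in failure_modes)


crash = FailureMode("crash", "major", "system", 0.1, 0.5, 1.0)
glitch = FailureMode("glitch", "minor", "local", 0.2, 1.0, 1.0)
total = module_risk([crash, glitch])
```

Comparing `module_risk` across candidate modules gives a rough, explicit reliability ranking that can be brought to a design meeting.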

Aug 8 2008   11:25PM GMT

Software Quality Assurance at Design Time



Posted by: Zohair Chentouf

The practice of software engineering is called service engineering when it focuses on building telecommunications services as software components. Examples of popular services are Call Forward, Call Hold, Call Waiting, Voicemail, etc. Interestingly, service engineering is highly software quality oriented. In this post, I conjecture that the “general” SDLC (that is, software development other than building telecommunications services) has a lot to learn from service engineering. The focus will be on software quality assurance at design time.

One of the differences between service engineering and “general” software engineering is the telecommunications Feature Interaction (FI) problem. It attracted the interest of a small international academic community, mainly between 1985 and 2006. During that period, the problem was thoroughly studied, and many effective applications were deployed by the big players of the telecommunications industry. Small companies, and many of the big but emerging telecommunications companies, often build development teams that have no telecommunications background and have never heard of the FI problem. In such conditions, an important software quality assurance process is missing.

A feature is a small service. A service is built by assembling several features. For example, Call Forward is developed using the feature that receives a call request, plus the one that routes a call to a given destination, plus a timer, plus an announcer that plays “your call is being transferred”, etc. A feature interaction (FI) is the undesirable situation that arises when two or more features, running together, interact so that at least one of them displays unexpected behavior. FIs are considered software integration defects.

For example, you have programmed your home phone to block calls to a given number because you don’t want your kids to dial it. This is called Call Screening or Call Blocking. Your smart little monsters, however, discovered the benefits of FI long ago, and they have a workaround ready. A friend programs his cell phone to forward calls to the forbidden number, and then they simply dial that friend’s number in order to be forwarded to the forbidden one. This is an FI between the Call Forward service and the Call Screening service.
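
The interaction can be reproduced in a few lines. This is a toy model, not how a real switch works: the phone numbers are made up, and it assumes a single place where both the screening list and the forwarding table are visible, which is rarely true in a real network.

```python
BLOCKED = {"555-0100"}               # numbers the home phone must never reach
FORWARDS = {"555-0199": "555-0100"}  # the friend's cell forwards to the blocked number


def screen(number):
    """Call Screening: True if calls to this number are allowed."""
    return number not in BLOCKED


def resolve(number, forwards):
    """Call Forward: follow forwarding entries to the final destination."""
    seen = set()
    while number in forwards and number not in seen:  # 'seen' guards against loops
        seen.add(number)
        number = forwards[number]
    return number


def place_call_naive(dialed):
    """Screen only the dialed number, then forward. Returns the destination or None."""
    if not screen(dialed):
        return None
    return resolve(dialed, FORWARDS)


def place_call_checked(dialed):
    """Screen the dialed number and the final destination."""
    if not screen(dialed):
        return None
    destination = resolve(dialed, FORWARDS)
    return destination if screen(destination) else None
```

Each feature is correct on its own. `place_call_naive("555-0199")` still reaches the blocked number, and only the version that considers both features together closes the gap.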

Any software development or quality assurance manager wants to see as many software defects as possible avoided at the earliest stages of the SDLC. I guess service engineering managers are among the happiest software managers in the world, because FI situations are detected at feature design time, so the detection process is part of the SDLC. When a new service has been designed, a model of that design is compared with the models of all the services that have already been deployed. If there is any FI, the service design is modified until no interaction remains. To model a service, languages like SDL or LOTOS are often used, and the comparison of service models is performed by automatic formal verification. The details of that verification are not our purpose here.

Let’s go to the most important part of the story: FI causes. These are the runtime execution conditions that produce FI. I think that being aware of those causes can be useful at design time and when writing integration test cases.

The following is a list of FI causes. I will not give examples from the telecommunications world. Rather, I will try to show their generality through general but real examples. I’m confident that readers will easily be able to project them onto their own software domain. Since we are thinking in terms of “general” software engineering, “feature” will mean any software function.

1. Assumption violation

1.1 Feature A uses data that is supposed to be static. However, feature B can modify it. Example: in a billing system there are administrator accounts, sales agent accounts, user accounts, etc. Only the administrators and agents can change the prices. A new feature has been added to let sales agents have their own sales agents. The designers forgot to restrict the rights of second-level agents so that they cannot change prices in the system.
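
A minimal sketch of this case, assuming a hypothetical account model in which a `level` field separates agents created by an administrator from agents created by another agent:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    name: str
    role: str       # "administrator", "sales_agent" or "user"
    level: int = 1  # 1 = created by an administrator, 2 = created by an agent


def can_change_prices_original(account):
    """Rule written before sub-agents existed: the role alone decides."""
    return account.role in {"administrator", "sales_agent"}


def can_change_prices_restricted(account):
    """Rule revisited for the new feature: second-level agents are excluded."""
    if account.role == "administrator":
        return True
    return account.role == "sales_agent" and account.level == 1


sub_agent = Account("sub-agent", "sales_agent", level=2)
```

The original rule was never wrong by itself. It silently grants price rights to `sub_agent` only because the new feature reuses the `sales_agent` role.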

1.2 Feature A is triggered by an event that is supposed to be produced under certain conditions. However, feature B can intercept that event, so feature A never runs. Example: a server feature is supposed to react to a given TCP packet, but a newly developed feature intercepts and modifies all the packets received on that socket.

1.3 Feature A gives a data item a meaning that is different from the one given by feature B. Example: a SOAP client and server for which a given field has two different meanings.

1.4 Feature A uses a data item that is supposed to be unique. Feature B violates this assumption. Example: an IP address is supposed to be unique, but a Web interface screen allows users to set the same IP for several network appliances.
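
One way to protect such an assumption is to enforce it where the data is stored rather than trusting every screen that writes it. The registry below is a hypothetical sketch built on Python's standard `ipaddress` module; the class and appliance names are invented for the example.

```python
import ipaddress


class DuplicateAddressError(ValueError):
    """Raised when an address is already assigned to another appliance."""


class ApplianceRegistry:
    def __init__(self):
        self._owner_by_ip = {}  # parsed IP address -> appliance name

    def assign(self, appliance, ip_text):
        """Assign an address to an appliance, refusing duplicates."""
        ip = ipaddress.ip_address(ip_text)  # raises ValueError if malformed
        owner = self._owner_by_ip.get(ip)
        if owner is not None and owner != appliance:
            raise DuplicateAddressError(f"{ip} is already assigned to {owner}")
        # Release any address this appliance held before.
        for old_ip, name in list(self._owner_by_ip.items()):
            if name == appliance:
                del self._owner_by_ip[old_ip]
        self._owner_by_ip[ip] = appliance

    def owner_of(self, ip_text):
        """Return the appliance that holds the address, or None."""
        return self._owner_by_ip.get(ipaddress.ip_address(ip_text))
```

Any new screen that goes through `assign` inherits the uniqueness check, so a later feature cannot break the assumption by accident.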

2. Contradictory actions

2.1 Feature A has to perform an action that is forbidden by feature B. Example: the system admin can lock some database tables while some customers need to access them.

3. Ambiguous event semantics

3.1 Two different situations create the same event. Example: many implementations of SIP (a VoIP signaling protocol) send back the response 500 Server Internal Error in situations where issuing a 403 Forbidden or 603 Decline would be much better.
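
The sketch below shows one way to spot this kind of ambiguity in a response table at design time. The outcome names are hypothetical; the status codes and reason phrases are standard SIP ones.

```python
from collections import defaultdict
from enum import Enum


class Outcome(Enum):
    """Hypothetical internal outcomes of call processing."""
    POLICY_REJECTED = "policy_rejected"
    CALLEE_DECLINED = "callee_declined"
    UNKNOWN_USER = "unknown_user"
    UNEXPECTED_ERROR = "unexpected_error"


# Ambiguous table: every situation produces the same event.
AMBIGUOUS = {outcome: (500, "Server Internal Error") for outcome in Outcome}

# Specific table: one distinct response per situation.
SPECIFIC = {
    Outcome.POLICY_REJECTED: (403, "Forbidden"),
    Outcome.CALLEE_DECLINED: (603, "Decline"),
    Outcome.UNKNOWN_USER: (404, "Not Found"),
    Outcome.UNEXPECTED_ERROR: (500, "Server Internal Error"),
}


def shared_codes(table):
    """Return {status_code: [outcome names]} for codes used by several outcomes."""
    by_code = defaultdict(list)
    for outcome, (code, _reason) in table.items():
        by_code[code].append(outcome.name)
    return {code: names for code, names in by_code.items() if len(names) > 1}
```

A check like `shared_codes` can run as a unit test, so that a receiving feature never has to guess which situation produced the event.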

As a “supplement”, the following is an FI cause for which I couldn’t find general examples, maybe because I’m a telecommunications guy.

4. Race condition

4.1 Feature A is supposed to run when a given event timeout T expires. The new feature B must never run if A can run. But feature B is programmed to run on a timeout that is shorter than T. So B will always run, and A never gets the chance. Example: Voicemail programmed on 3 rings vs. Call Forward on 4 rings. If it rings 3 times and nobody answers, the call will go to voicemail instead of being forwarded to the secretary.
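
This kind of timer conflict is easy to check when features are configured. The sketch below is a toy model that assumes each feature fires after a fixed number of rings and that the first one to fire takes the call; the feature names and ring counts are illustrative only.

```python
def first_to_fire(ring_timeouts):
    """Return the feature with the smallest ring count (ties: first listed)."""
    return min(ring_timeouts, key=ring_timeouts.get)


def is_preempted(ring_timeouts, preferred):
    """True if some other feature fires strictly before the preferred one."""
    limit = ring_timeouts[preferred]
    return any(
        rings < limit
        for name, rings in ring_timeouts.items()
        if name != preferred
    )


# Two illustrative configurations in which call_forward is the feature that should win.
conflicting = {"call_forward": 4, "voicemail": 3}
consistent = {"call_forward": 3, "voicemail": 4}
```

In the `conflicting` configuration the preferred feature can never run, because the other timer always expires first. Running `is_preempted` whenever a timeout is changed would catch that before deployment.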


Jul 11 2008   1:56PM GMT

Overcommunication, Hypocommunication, and Miscommunication



Posted by: Zohair Chentouf

My purpose here is to try to use the concepts of overcommunication, hypocommunication, and miscommunication to figure out some common workplace communication problems.

Modern management theories pay a lot of attention to communication. That is because communication is at the foundation of all the psychosocial processes in the workplace: command, control, cooperation, coordination, delegation, information and feedback. In other words, whatever the external business environment and the internal management and production processes are, communication is at the core of all of them.

So many people talk about communication, notably because it is a multidisciplinary concept. That is why, I think, people invented words like overcommunication, hypercommunication, and hypocommunication. No dictionary is needed to understand what those words mean: they address the quantitative aspect of communication. Miscommunication is another new word, which denotes the situation where A tells B something and B understands something completely different. Let us now apply these concepts to some common workplace communication problems.

I think everybody will agree that communication has to come in the right quantity. If you do not have enough communication in your organization or workgroup, one or more of the psychosocial processes will fail and you risk not getting the job done. Such a situation may be described as hypocommunication. Too much communication in the workplace, however, may have several negative impacts whose seriousness depends on the communication content, the communicators’ roles, and the context. We can call that overcommunication. For example, I hold a monthly meeting with my developers in order to brief them on the company’s business news. Once a month is just enough communication. Overcommunication would be to do that every week. Hypocommunication would be a once-a-year event.

The misuse of communication does not relate only to its quantity. Communication must convey the right content between the right communicators at the right time. Otherwise, it may turn into miscommunication. This is a very subtle point. People often spend a lot of time talking about something they wrongly believe they have to talk about. Suppose you have a new project that starts in six months and involves coding in Smalltalk, and your team has never used that language before. The following are miscommunication situations:

- Two developers spend time discussing Smalltalk and then surfing the Web to find articles about it. This is miscommunication because it is not the right time to do it.

- You spend too much time talking about Smalltalk and OO programming with the sales manager. This is miscommunication because the sales manager is not supposed to learn Smalltalk in detail.

Such situations are a real waste of time, effort and focus. Things become even worse when miscommunication causes conflicts and frustration. That may happen when senior management is involved. Here are some examples:

Senior management assigns tasks to developers without going through the manager. This would harm:
- The work schedule
- The manager’s authority

Senior management collects work effort estimates from developers without going through the manager. This would affect:
- The manager’s credibility, because developers often try to impress senior management and their estimates are therefore too optimistic
- The manager’s authority, because this would encourage developers to compete with him

Senior management attends software design meetings. Because developers often try to impress senior management, such an event would affect:
- The meeting duration, because developers would talk too much
- The meeting agenda, because the discussion would turn into defending personal opinions instead of objective debate
- The manager’s authority, because developers would try to show they are smarter than him
- The team members’ relations, because some developers would try to demonstrate that others are wrong
- The meeting outcome, because developers would push for wrong decisions merely to oppose others’ opinions