Uncharted Waters

Apr 9 2019   7:29PM GMT

The Data Comparator, Your Long Lost Friend

Matt Heusser


[Image: Comparator: Dolls]

It seems like every time we do large-scale test tooling, we also end up writing a data comparator program. That is, a program to compare two different things to determine if they are the same in the ways that matter. No, that’s not a typo. The middle step, where we “zero out” the differences that do not matter, is something I call data swizzling. Sadly, when I talk about data swizzling, most people think I am talking about data on how to cook a steak.

This problem of how to compare the “different” to determine if the differences “matter” turns out to be an incredibly common problem in software testing. Without the terms and concepts, teams generally end up inventing their own way of doing it, writing code that eventually becomes a gnarly mess. They reinvent the wheel.

Here’s the basic use case for a comparator, and how to do it.

Data Comparator: What

If you think about it, after the second build, seeing if the differences matter is “the ballgame” for software testing. We only have three components. First, what the software did before that it is supposed to still do. Second, the new functionality that works as expected, “the good difference.” Third, the bad differences, which are functional and regression bugs. Of course, we can also find the bugs that were always there. Still, all the other bugs are “the bad difference.”

At the user interface level, that is the game. We look for differences, oddities, corner cases, things that are not right. Sometimes we can offload some of that to software.

Then there are things below the user interface level. API responses. Customer listings. Order history details. These all look the same, except, well, something is different. The order number is different. Today’s date is different. The time is different. In the case of Windows, the operating system or the size of the screen is different. None of that matters, except the font. If the font is different, that totally matters.

What we need to do is swizzle some of the data, the data that doesn’t matter. Then we compare what is left.

Sometimes, the swizzling needs to be pretty smart. The values might be computed. Even though they are different from run to run, and that is okay, they still need to be correct.
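Here is a minimal sketch of a "smart" swizzle in Python, under some loud assumptions. The record layout, the field names (order_date, due_date, customer), and the 30-day payment rule are all hypothetical, invented for illustration. The idea is that a computed value gets checked against its rule first, and only then replaced with a marker, so a wrong value still shows up as a difference.

```python
from datetime import date, timedelta

# Assumed business rule, for illustration only: payment is due 30 days after the order.
PAYMENT_TERMS = timedelta(days=30)


def swizzle_due_date(record):
    """Check the computed field, then replace run-specific values with markers."""
    clean = dict(record)  # work on a copy so the original record is untouched
    expected_due = record["order_date"] + PAYMENT_TERMS
    # The due date differs on every run, so we keep only whether it was correct.
    clean["due_date"] = "<VALID>" if record["due_date"] == expected_due else "<INVALID>"
    # The order date itself does not matter once the due date has been checked.
    clean["order_date"] = "<DATE>"
    return clean


last_week = swizzle_due_date(
    {"order_date": date(2019, 3, 1), "due_date": date(2019, 3, 31), "customer": "Acme"}
)
today = swizzle_due_date(
    {"order_date": date(2019, 4, 9), "due_date": date(2019, 5, 9), "customer": "Acme"}
)
off_by_one = swizzle_due_date(
    {"order_date": date(2019, 4, 9), "due_date": date(2019, 5, 10), "customer": "Acme"}
)
```

With this sketch, last_week and today compare as equal even though every date in them differs, while off_by_one does not, because its due date breaks the rule.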

Data Comparator: How

There is more than one way to do it, but here are a few common examples.

First, you can split the object up into component parts. For example, a spreadsheet is basically rows and columns. In memory, this could be a data structure (“struct”), where each column is a named element, and the rows are an array or list.

The data swizzler runs through each element in each structure, changing it or leaving it alone. Dates might turn into sysdate or 1/1/1900. Integers that are sequence values in a database might be renumbered, with the first becoming one and each later value adding one. If you separate the swizzle from the compare, the compare is a straight in-memory diff. If you don’t, then the comparator software will need a custom comparison function for each column.
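To make that concrete, here is a rough Python sketch of the separated approach: swizzle first, then do a plain diff. Rows are dictionaries in a list, and the column names (order_id, order_date, total) are made up for the example. This is one way to do it, not the only one.

```python
from datetime import date

FIXED_DATE = date(1900, 1, 1)  # every date column collapses to this value


def swizzle(rows, date_columns=("order_date",), sequence_columns=("order_id",)):
    """Return copies of the rows with the don't-care columns normalized."""
    renumber = {col: {} for col in sequence_columns}
    result = []
    for row in rows:
        clean = dict(row)
        for col in date_columns:
            if col in clean:
                clean[col] = FIXED_DATE
        for col in sequence_columns:
            if col in clean:
                seen = renumber[col]
                # The first distinct value becomes 1, the next 2, and so on.
                clean[col] = seen.setdefault(row[col], len(seen) + 1)
        result.append(clean)
    return result


def compare(expected_rows, actual_rows):
    """Plain in-memory diff of two already-swizzled row lists."""
    diffs = []
    if len(expected_rows) != len(actual_rows):
        diffs.append(("row count", len(expected_rows), len(actual_rows)))
    for index, (exp, act) in enumerate(zip(expected_rows, actual_rows)):
        for col in sorted(set(exp) | set(act)):
            if exp.get(col) != act.get(col):
                diffs.append((f"row {index}, column {col}", exp.get(col), act.get(col)))
    return diffs


baseline = [
    {"order_id": 5001, "order_date": date(2019, 4, 8), "total": 20.0},
    {"order_id": 5002, "order_date": date(2019, 4, 8), "total": 35.5},
]
current = [
    {"order_id": 7310, "order_date": date(2019, 4, 9), "total": 20.0},
    {"order_id": 7311, "order_date": date(2019, 4, 9), "total": 36.5},
]
diffs = compare(swizzle(baseline), swizzle(current))
```

In this sketch the order numbers and dates differ between the two runs but drop out after the swizzle, so the only difference reported is the changed total in the second row.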

A Data Comparator in Practice

Once you’ve created a comparator, the actual needs of what it has to do will slowly grow, until eventually it becomes “real” software. In the past, I have worked with teams that had to test their comparators, or looked into purchasing software from IBM and HP to do test data setup and compare. This is not a trivial topic, and this blog post is only a start. All I wanted to do today was to introduce the concept of the comparator, the swizzling process, and one way to do it. It is about giving you a language to use.

It is also about the immensity of the problem. Think about it: Code to know what differences matter. Data comparators are, in essence, a sort of rudimentary artificial intelligence. Here’s a good example:

[Image: A Comparator problem]

Does the person in the middle matter? It might. It might not. It depends on the context.

A good comparator is very often more complex than the software change itself.

The Final Analysis

All this reminds me of a scroll we used to have in the cadet office of the Frederick Composite building, twenty-five years ago. The scroll read:

“We the willing, led by the unknowing,
Have been doing the impossible, for the ungrateful.
We have done so much for so long with so little,
That we are now qualified to do anything for nothing.”

Do it right, and your comparator software can actually do that for you.

That’s kind of a big deal.

