When Karen Johnson told a barista that she was working on a data warehouse, the barista said, “Oh, where is it located?”
While most of us that work with software realize that a data warehouse is sort of a mega-database, not a physical building, we still might feel a little lost when tasked with testing data on a data warehouse project.
“A BI [Business Intelligence] project is like a foreign language. You can’t really see it. If you’re in management and you’re trying to staff for it, you’re not sure how to hire for it,” said Johnson, talking about the difficulty of trying to explain the skills needed for testing data on BI/DW systems.
I attended Johnson’s presentation this morning at STPCon titled, “Data Testing on Business Intelligence and Data Warehouse Projects.” She gives a quick overview in the short clip below.
[kml_flashembed movie="http://www.youtube.com/v/5QZYWwQ-yq4" width="425" height="350" wmode="transparent" /]
Johnson described a lot of the basics, including the term ETL: Extract, Transform, Load. She explained that this really was all about sucking data out of systems, transforming it to a common format, and then loading it into a database which will allow for some sophisticated analytics. One simple example she gave was with date formats. If you are working with multiple systems that are using different date formats, you would first extract the appropriate data, use an algorithm to transform the data into a consistent format, and then load it into a table in that common format so that reporting could be done.
So what should a tester look for when testing data? Here are some bug opportunities:
- data type mismatches
- boundary conditions
- data accuracy
- is the data current
- data refreshes
- data security
What about when you’re working in an Agile environment? Check out the interview I had with Ken Collier about his book, Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing.