Posted by: Jack Vaughan
Last month we talked with SPSS. The company is in the news at the moment, as IBM has launched a $1-billion-plus bid to buy the firm.
When we spoke, the topic was XML standards for analyzing data. This is one of the places where SPSS is clearly in the vanguard.
SPSS reps discussed Version 4.0 of the Predictive Model Markup Language (PMML) for statistical and data mining models from the Data Mining Group (DMG). The company is incorporating PMML 4.0 into upcoming versions of its PASW Modeler (formerly Clementine) data mining workbench and PASW Statistics (formerly SPSS Statistics).
PMML Version 4.0 is an amazing example of what XML (the "X," recall, is for "extensible") can do. The newly released version of PMML adds support for time series models, support for multiple models (both segmented models and ensembles of models), and improved preprocessing of data. Preprocessing here refers primarily to handling "outliers," those dubious data points that work for Malcolm Gladwell but tend to muck up ordinary statistical analysis.
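To make the outlier point concrete, here is a minimal sketch of how a PMML consumer might honor the outlier treatment declared on a `MiningField` element, whose `outliers` attribute can be `asIs`, `asMissingValues`, or `asExtremeValues`, with `lowValue`/`highValue` as the bounds. The field name `income` and its bounds are invented for illustration, the PMML namespace and header are omitted for brevity, and the `treat_outlier` helper is hypothetical, not part of any SPSS or DMG tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical MiningSchema fragment: field name and bounds are made up,
# and the PMML namespace/header are left out to keep the sketch short.
SCHEMA = """<MiningSchema>
  <MiningField name="income" outliers="asExtremeValues"
               lowValue="0" highValue="250000"/>
</MiningSchema>"""


def treat_outlier(field: ET.Element, value: float) -> float:
    """Apply the outlier treatment declared on a PMML MiningField (sketch)."""
    mode = field.get("outliers", "asIs")
    if mode == "asIs":
        return value
    # A missing bound is treated as unbounded on that side.
    low = float(field.get("lowValue", "-inf"))
    high = float(field.get("highValue", "inf"))
    if low <= value <= high:
        return value
    if mode == "asExtremeValues":
        # Clamp the outlier to the nearest declared bound.
        return low if value < low else high
    if mode == "asMissingValues":
        # Represent "missing" as NaN in this sketch.
        return float("nan")
    raise ValueError(f"unknown outlier treatment: {mode}")


root = ET.fromstring(SCHEMA)
field = root.find("MiningField")
print([treat_outlier(field, v) for v in (-5.0, 42000.0, 1e7)])
# [0.0, 42000.0, 250000.0]
```

Clamping keeps the record usable by the downstream model instead of discarding it, which is one reason a standard place to declare the treatment matters when models move between tools.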
Going forward, said Jing Shyr, SVP & Chief Statistician at SPSS Inc., PMML will allow corporations to more readily embed statistical models into operational systems that enhance business processes. Shyr takes her role as chief SPSS statistician seriously, but still has some fun with it. "If any number looks suspicious, it's my fault," she told us.
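What "embedding a model" can look like in practice: an operational system reads the PMML document and scores records itself, with no statistics package in the loop. The sketch below assumes a simple linear `RegressionModel` whose coefficients and field names are invented; it ignores namespaces, transformations, and every other model type, so treat it as an illustration rather than a conforming PMML scorer. The `score` function is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical linear model: coefficients and field names are invented,
# and the PMML namespace/header are omitted for brevity.
MODEL = """<RegressionModel functionName="regression">
  <MiningSchema>
    <MiningField name="age"/>
    <MiningField name="visits"/>
    <MiningField name="spend" usageType="predicted"/>
  </MiningSchema>
  <RegressionTable intercept="10.0">
    <NumericPredictor name="age" coefficient="0.5"/>
    <NumericPredictor name="visits" coefficient="2.0"/>
  </RegressionTable>
</RegressionModel>"""


def score(model_xml: str, record: dict[str, float]) -> float:
    """Evaluate the first RegressionTable of a linear RegressionModel (sketch)."""
    model = ET.fromstring(model_xml)
    table = model.find("RegressionTable")
    result = float(table.get("intercept", "0"))
    for pred in table.findall("NumericPredictor"):
        # PMML's NumericPredictor exponent defaults to 1.
        exponent = int(pred.get("exponent", "1"))
        coefficient = float(pred.get("coefficient"))
        result += coefficient * record[pred.get("name")] ** exponent
    return result


print(score(MODEL, {"age": 40, "visits": 3}))
# 36.0  (10.0 + 0.5 * 40 + 2.0 * 3)
```

Because the model lives in a data file rather than in code, an analyst could retrain and republish it without the operational application being rebuilt, which is the kind of deployment Shyr describes.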
What does the Data Mining Group's Bob Grossman think of the latest moves with PMML, and how might these play out in typical corporations? "With Version 4.0, PMML now handles all of the common use cases that occur when deploying analytic models in practice," Grossman said in a prepared statement.
We asked for more, and he replied by email: "PMML calls multiple models, which include segmented models and ensembles of models. After that the most important new features in PMML are improved support for preprocessing data and for time series data."