Enterprise IT Watch Blog

Feb 3 2015   9:27AM GMT

Probabilistic programming languages in data science

Michael Tidmarsh

Data Science
Programming Languages


By James Kobielus (@jameskobielus)

Data scientists are key programmers in the new era of big-data and cognitive-computing applications. They specialize in business problems that are addressed, in whole or in part, through statistical analysis.

As with any programmer, a data scientist’s core job is to specify the structured, repeatable logic that drives business computing applications. The key practical difference between data scientists and other programmers is that the former specify execution logic that is grounded in probabilistic application patterns. By contrast, traditional programmers specify deterministic application logic, such as if/then/else, case-based and other rules that were deduced from functional analysis of some problem domain.

Data scientists do statistical analysis, which is all about probabilities and uncertainties. An application instantiates a probabilistic pattern when its execution rules incorporate statistical models that are grounded in uncertain inputs (e.g., customer behavioral propensities revealed from historical data) and/or uncertain outcomes (e.g., the likelihood that a customer will accept specific offers over others in various circumstances).
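The following minimal Python sketch makes the contrast concrete using the offer example above. It is not drawn from the article. The offer names, the segment label and the historical counts are hypothetical. The sketch turns historical accept/shown counts into an acceptance probability by taking the posterior mean under a Beta(1, 1) prior, which is one common approach among several.

```python
from __future__ import annotations


# Deterministic logic: a hand-written if/then/else rule (hypothetical segments).
def choose_offer_deterministic(customer_segment: str) -> str:
    if customer_segment == "frequent":
        return "loyalty_discount"
    return "free_shipping"


# Probabilistic logic: estimate each offer's acceptance rate from history.
def acceptance_estimate(accepted: int, shown: int,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the acceptance rate under a Beta(alpha, beta) prior."""
    return (accepted + alpha) / (shown + alpha + beta)


def choose_offer_probabilistic(history: dict[str, tuple[int, int]]) -> str:
    """Pick the offer whose estimated acceptance probability is highest."""
    return max(history, key=lambda offer: acceptance_estimate(*history[offer]))


# Hypothetical history: offer -> (times accepted, times shown).
history = {"loyalty_discount": (30, 200), "free_shipping": (45, 220)}
print(choose_offer_probabilistic(history))
```

With these made-up counts, the estimates are about 0.153 for `loyalty_discount` (31/202) and about 0.207 for `free_shipping` (46/222), so the probabilistic rule picks `free_shipping`. Unlike the hard-coded rule, its choice can change as new history arrives.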

In keeping with this professional focus, most data scientists use statistically oriented languages, especially R, and other analytic modeling tools such as SAS, SPSS and MATLAB. In addition, some data scientists also use probabilistic programming languages, such as those discussed in this recent article.

Probabilistic programming is an emerging approach that is still unfamiliar to many working data scientists. Probabilistic programming languages facilitate the specification of Bayesian reasoning in machine-learning models for applications with uncertain data or outcomes. To enable this, the languages include operators for inferring probability distributions from uncertain data sets. They may support estimation of distributions via sampling; direct computation of distributions via value-flow analysis and other techniques; and/or inference of distributions despite the absence of key variables, via machine learning and other approaches.
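You can illustrate estimating a distribution via sampling without a probabilistic programming language. The sketch below is a toy built on stated assumptions. It does not show how any particular language implements inference. It writes a generative model for a coin's unknown bias and approximates the posterior by rejection sampling: it keeps only the prior draws whose simulated data match the observation. The function names and the data (7 heads in 10 flips) are hypothetical.

```python
from __future__ import annotations

import random
import statistics


def model(n_flips: int, rng: random.Random) -> tuple[float, int]:
    """Generative model: draw a bias from a uniform prior, then simulate flips."""
    p = rng.random()                                        # prior: p ~ Uniform(0, 1)
    heads = sum(rng.random() < p for _ in range(n_flips))   # data: Binomial(n_flips, p)
    return p, heads


def infer_by_rejection(observed_heads: int, n_flips: int,
                       n_samples: int = 20000, seed: int = 0) -> list[float]:
    """Approximate the posterior over p by keeping draws that reproduce the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        p, heads = model(n_flips, rng)
        if heads == observed_heads:     # condition on the observation
            accepted.append(p)
    return accepted


posterior = infer_by_rejection(observed_heads=7, n_flips=10)
print(len(posterior), statistics.mean(posterior))
```

Under this model the exact posterior is Beta(8, 4), with mean 2/3, so the sample mean should land close to that value. Dedicated languages automate this conditioning step and typically use far more efficient samplers than naive rejection.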

In a world where more application logic is derived (aka “learned”) at run time from probabilistic patterns found in multistructured data, probabilistic programming is indispensable. Cognitive computing applications, in particular, depend on probabilistic programming to specify, for example, how user experience (UX) interfaces should dynamically adjust to reflect changes in users’ browsing behavior, sentiments, intentions, locations, and myriad other situational variables. Each of these variables is probabilistic in isolation, and in combination their shifting mosaic may make it impossible to build a priori UX logic that optimizes each user’s satisfaction under every possible dynamic circumstance.
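The following sketch illustrates logic that is learned at run time rather than fixed a priori. It uses Thompson sampling to decide which UI layout to show. Thompson sampling is a standard Bayesian bandit technique, chosen here for illustration because the article does not name a method. The layout names and their "true" click rates are hypothetical and serve only to simulate users.

```python
from __future__ import annotations

import random


class ThompsonLayoutChooser:
    """Pick a UI layout by Thompson sampling over Beta posteriors of click rate."""

    def __init__(self, layouts: list[str], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per layout, stored as [clicks + 1, non-clicks + 1].
        self.params = {name: [1.0, 1.0] for name in layouts}

    def choose(self) -> str:
        # Draw one plausible click rate per layout and show the best draw.
        draws = {name: self.rng.betavariate(a, b)
                 for name, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, layout: str, clicked: bool) -> None:
        # Bayesian update of the chosen layout's Beta posterior.
        if clicked:
            self.params[layout][0] += 1
        else:
            self.params[layout][1] += 1


# Simulated users with hypothetical click rates the chooser does not know.
true_rates = {"compact": 0.04, "detailed": 0.15, "visual": 0.08}
env = random.Random(1)
chooser = ThompsonLayoutChooser(list(true_rates), seed=42)
shown = {name: 0 for name in true_rates}
for _ in range(5000):
    layout = chooser.choose()
    shown[layout] += 1
    chooser.update(layout, env.random() < true_rates[layout])
print(shown)
```

Each choice is driven by a draw from the current posterior. As a result, the chooser keeps exploring while it is uncertain and shifts most traffic to the best-performing layout as evidence accumulates.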

If you’re a working data scientist, you need to incorporate probabilistic programming into your core repertoire. Here’s a good technical paper on the topic for data scientists and other programmers who want to bootstrap their understanding without delay.
