Thursday, May 2, 2013

Big Data and HR: Everything you might want to know (maybe?)




Big Data:

If you feel like you are hearing a LOT about Big Data, you are absolutely right! The above graph shows that use of the term "Big Data" on Google has exploded in the last two years. Not only has the term "Big Data" been used a lot in writing, but the actual application of Big Data is changing the world. To give some perspective on how broadly Big Data is affecting the world, it is even covered in this month's edition of Foreign Affairs!

Times have changed! 
I was really lucky to work at the Circuit City E-Commerce unit when the whole web thing was kicking off. We were at the cutting edge of retail, and had to figure out how to slow the tide of e-commerce fraud that was just starting. It is fun to reflect on how the team developed a model that could predict which orders were fraudulent, and we built the original model in Microsoft Access! Looking back, I am even more amazed that we built a model that was better than Visa's system.

So what is Big Data? 
Probably the best explanation of Big Data I have heard is N=All. It used to be that all analysis was based on samples of data, so there were all sorts of rules and statistics for figuring out whether your sample was actually going to be useful. Now that the information is already digitized, and computer storage and processing power are rarely an issue, folks can run their analysis on the entire set of data.
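To make N=All concrete, here is a minimal Python sketch built on made-up attrition flags (not real data; the 10% rate is an assumption for illustration). The old approach estimates the rate from a random sample, which carries sampling error; with every record in hand, the rate for that population is simply computed exactly.

```python
import random

def attrition_rate(records):
    """Share of records flagged as having left (True = left)."""
    return sum(records) / len(records)

# Hypothetical population: 20,000 employees, every 10th one left (10%).
population = [i % 10 == 0 for i in range(20_000)]

# Old school: estimate the rate from a random sample of 500 records.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 500)
sample_estimate = attrition_rate(sample)

# N=All: use every record, so there is no sampling error to worry about.
exact_rate = attrition_rate(population)

print(f"Sample estimate: {sample_estimate:.1%}")
print(f"N=All rate:      {exact_rate:.1%}")
```

The sample estimate will land near 10% but rarely exactly on it; the N=All figure is exact for the records you have.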

How Much Data is there? 
Since we have moved from an analog world to a digital world, almost everything we do creates a digital footprint that can be captured. Estimates are that 1.7 Megabytes of information is being created for every human every second! Think about your regular day: you are woken up by the alarm on your phone, you check e-mail, you get in your car, which has a digital black box that records your every move, your phone creates a log of its GPS location and every cell phone tower you pass, at work you do almost everything on a PC, your conference calls are recorded digitally, and your meals are paid for via credit card. I am sure you get the point!
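Taking the 1.7-megabyte estimate at face value and using decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB), here is a rough sense of what that means for a single person:

```latex
\begin{aligned}
1.7\ \text{MB/s} \times 86{,}400\ \text{s/day} &= 146{,}880\ \text{MB/day} \approx 147\ \text{GB/day} \\
146.88\ \text{GB/day} \times 365\ \text{days/year} &\approx 53{,}611\ \text{GB/year} \approx 53.6\ \text{TB/year}
\end{aligned}
```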

There is now so much data being created that the international body that standardizes units of measure can barely keep up with names for the sizes. It used to be that Gigabytes were a lot, then Terabytes, and then Petabytes. Well now, it is up to Yottabytes, which is a septillion (10^24) bytes!
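A small Python sketch of that ladder of units, assuming decimal (SI) prefixes where each step is 1,000 times the last (some operating systems count in steps of 1,024 instead). The `human_readable` helper is hypothetical, written just for this illustration.

```python
# Decimal (SI) prefixes: each step up is 1,000 times the one before.
PREFIXES = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count with the largest prefix that keeps the number at 1 or more."""
    if num_bytes < 1000:
        return f"{num_bytes} bytes"
    # Walk down from yottabytes (1000**8 = 10**24, a septillion) to kilobytes.
    for power in range(len(PREFIXES) - 1, 0, -1):
        if num_bytes >= 1000 ** power:
            return f"{num_bytes / 1000 ** power:,.1f} {PREFIXES[power]}"

print(human_readable(10**24))              # 1.0 YB
print(human_readable(1_700_000 * 86_400))  # one person's day at 1.7 MB/s: 146.9 GB
```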

Data Mining is Old School - it's now about Predictive Analytics!
As the amount of data has increased, and the processing power of computers has increased, the statistical methods have also kept pace. It used to be that doing correlation analysis was pretty hip, and if you really wanted to strut your stuff, you would get the Minitab application and do some multivariate analysis! Now, that is about as dated as wearing a Members Only jacket!

So what is predictive analytics? I am oversimplifying this, but it essentially comes down to the following steps:

1. Take a huge set of data that includes the results and a ton of variables, and divide it into two sets.
For instance, make a file of everyone who left your company and everybody still at the company, and add every potential data point you can think of: their office type (cubicle or office), their vacation usage, their performance data, their pay. Throw it all in the pot. Now hold some of the records of the people who actually left out of the analysis (so if you have 20,000 departure records, use only 10,000). Like a recipe, set the other 10,000 records aside for later.
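Here is a minimal Python sketch of this step, assuming each employee is a dict with a hypothetical `left` flag plus whatever other fields you collected (the field names and values below are made up). It uses a common variant of the split: shuffle all the records, stayers and leavers alike, and set a share aside as the holdout, so the later test can check both kinds of prediction.

```python
import random

def split_holdout(records, holdout_share=0.5, seed=0):
    """Shuffle records and split them into (training, holdout) lists."""
    shuffled = list(records)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_share))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: field names and values are invented for illustration.
employees = [
    {"id": i,
     "office_type": "cubicle" if i % 3 else "office",
     "vacation_days_used": i % 15,
     "left": i % 4 == 0}
    for i in range(20_000)
]

training, holdout = split_holdout(employees)
print(len(training), len(holdout))  # 10000 10000
```

The fixed seed makes the split repeatable, which helps when you come back to tweak the model and want to compare results on the same holdout.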

2. Finding the key variables
Run all sorts of statistical analyses against the existing employee data and the 10,000 records of employees who left. Since the data already shows who stayed and who left, it is a matter of figuring out which variables are most closely tied to the departures. It is here that the math/statistics game has gone major league. There are newer techniques (Random Forest, for instance, or K-means clustering to group similar employees together) that develop very, very complex math formulas to determine which variables were key. Keep in mind that these methods find variables that predict departures, which is not the same as proving those variables cause them. So you might end up with something that looks like:

If employee age < 30 and mobile device is not a smartphone, then attrition risk = 95%; whereas if employee age > 45 and parking space is 0.25 miles from the office, then attrition risk = 85%.
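Random Forest itself is too involved for a short sketch, but it is built from decision trees (many of them, each trained on a random sample of the records and variables, with their votes combined), and the simplest decision tree asks just one yes/no question. This hypothetical Python sketch, using made-up yes/no fields and only the standard library, tries every field and keeps the one that best separates leavers from stayers on its own: the "find the key variable" idea in miniature.

```python
def best_single_variable(records, fields, label="left"):
    """Return (field, accuracy) for the yes/no field that best predicts the label on its own."""
    best_field, best_accuracy = None, 0.0
    for field in fields:
        # Count records where "yes" lines up with "left"; the flipped rule
        # ("yes means stays") gets the remaining records right.
        agree = sum(r[field] == r[label] for r in records)
        accuracy = max(agree, len(records) - agree) / len(records)
        if accuracy > best_accuracy:
            best_field, best_accuracy = field, accuracy
    return best_field, best_accuracy

# Hypothetical training data: True/False fields, "left" is the known outcome.
training = [
    {"under_30": True,  "no_smartphone": True,  "far_parking": False, "left": True},
    {"under_30": True,  "no_smartphone": True,  "far_parking": True,  "left": True},
    {"under_30": True,  "no_smartphone": False, "far_parking": False, "left": False},
    {"under_30": False, "no_smartphone": True,  "far_parking": True,  "left": True},
    {"under_30": False, "no_smartphone": False, "far_parking": False, "left": False},
    {"under_30": False, "no_smartphone": False, "far_parking": True,  "left": False},
]

field, accuracy = best_single_variable(training, ["under_30", "no_smartphone", "far_parking"])
print(field, f"{accuracy:.0%}")  # no_smartphone 100%
```

Real data is never this tidy; the point is only that the search is mechanical, and the heavier methods repeat it many times over combinations of variables.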

3. Test the Fit
This is where the other 10,000 records come in. Now you run those records through the same math formula and see what the results look like. Ideally, the formula flags all 10,000 of those employees as leaving; if it misses a lot of them, something in the formula needs to be tweaked. This is where predictive analytics moves from being a science to almost an art! The folks who do this can spend a ton of time getting the math right.
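A small Python sketch of testing the fit, assuming the model is any function that returns True when it predicts an employee will leave (the rule and records below are hypothetical). It compares predictions against what actually happened in the held-out records and reports two common measures: precision (of those flagged, how many left) and recall (of those who left, how many were flagged). If the holdout contains only leavers, as in the example above, recall is the number that matters: it is the share of those 10,000 the formula correctly flags.

```python
def evaluate(model, holdout, label="left"):
    """Score a model's yes/no predictions against what actually happened."""
    tp = fp = fn = 0
    for record in holdout:
        predicted, actual = model(record), record[label]
        if predicted and actual:
            tp += 1  # flagged, and they did leave
        elif predicted:
            fp += 1  # flagged, but they stayed
        elif actual:
            fn += 1  # not flagged, but they left
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def hypothetical_model(r):
    """Made-up rule standing in for whatever step 2 produced."""
    return r["under_30"] and r["no_smartphone"]

holdout = [
    {"under_30": True,  "no_smartphone": True,  "left": True},
    {"under_30": True,  "no_smartphone": True,  "left": False},
    {"under_30": False, "no_smartphone": True,  "left": True},
    {"under_30": True,  "no_smartphone": False, "left": False},
]

precision, recall = evaluate(hypothetical_model, holdout)
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=50% recall=50%
```

Numbers like these are the cue to go back to step 2 and tweak the formula.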

4. Building the predictive model 
Now that you have the math equation, how do you make it predictive? To get the real gains from the analysis, you have to build it into a computer program that watches real-time data flows and lets you know when something is about to happen. Using the above example, once you create the math equation for predicting turnover, you need a program that sits on top of your HR systems, monitors the data, and then pings the right Employee Relations person because Jane Doe just got moved from an office to a cubicle, has made 13 calls, and sent only one e-mail, and according to the math has moved from a 30% to a 95% chance of turnover.

Step 4 is critical! Being able to do the math makes for fun conversation at a cocktail party, but it won't do HR any good unless the risk is calculated and delivered to the right person so they can act on it.
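A minimal Python sketch of the monitoring idea, assuming records arrive as dicts whenever an HR system changes and assuming a hypothetical `notify` hook (here it just collects messages in a list). The scoring function reuses the made-up rules from step 2 with an assumed 30% baseline, so Jane Doe's change here is a parking move rather than the cubicle move; a real deployment would call whatever model the earlier steps produced.

```python
def attrition_risk(emp):
    """Hypothetical scoring rules from the step 2 example; 30% is an assumed baseline."""
    if emp["age"] < 30 and not emp["has_smartphone"]:
        return 0.95
    # Assumption: "0.25 miles from the office" is read as 0.25 miles or more.
    if emp["age"] > 45 and emp["parking_miles"] >= 0.25:
        return 0.85
    return 0.30

def on_record_change(old, new, notify, threshold=0.80):
    """Re-score an employee after an update and alert if risk crosses the threshold."""
    before, after = attrition_risk(old), attrition_risk(new)
    if before < threshold <= after:
        notify(f"{new['name']}: attrition risk rose from {before:.0%} to {after:.0%}")
    return after

alerts = []  # stand-in for pinging the right Employee Relations person
old = {"name": "Jane Doe", "age": 50, "has_smartphone": True, "parking_miles": 0.10}
new = {**old, "parking_miles": 0.30}  # her parking space just moved farther away
on_record_change(old, new, alerts.append)
print(alerts)  # ['Jane Doe: attrition risk rose from 30% to 85%']
```

Alerting only when the risk crosses the threshold, rather than every time it is above it, keeps the Employee Relations inbox from filling up with repeats.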

So what can Predictive analytics solve? 
It seems like every day, Big Data and predictive analytics are breaking amazing new ground. Here are some fun examples:
  • Google was able to predict flu outbreaks based on the search terms being entered into Google.com (e.g. headache).
  • Target is able to predict which of its customers are pregnant based on their purchase behavior.
  • A Japanese scientist is using data captured from how you sit in a car seat, along with your driving style, to determine whether the car is stolen.

Big Data is a Natural Resource:
Eric Siegel is one of the pioneers in this space, and he has a great expression: Big Data is a Natural Resource. If you start thinking about data in that framework, it helps reorient your decision process toward asking whether you are using the resource efficiently or wasting it.

So what does this mean for Human Resources? 
Any field or discipline that creates data can benefit from Big Data and predictive analytics, and Human Resources is perfectly positioned. The HR profession now has systems that allow us to capture the full employee life cycle, from their background, to their number of e-mails, to their performance ratings, their pay, their managers, and on, and on, and on. And companies are starting to leverage this. Here are some great examples:
The Math is Not Universal! 
Here is one of the key learnings: the math equation is different for every company! So there is no easy way out on this one. Google identified the 8 things that make its managers successful, but that math will probably not work for your company because of differences in culture, business model, employee population, etc.


The Resource Needed to Do This!
This new world of Big Data is creating a need for an interesting new role, which some are calling the Data Scientist. This individual has a combination of skills that makes them exceedingly rare:
1. They have an analytical mind that allows them to think of the right questions to ask.
It is easy for someone to take a set of Ginsu knives to data and slice it every way possible, but if there is no common-sense understanding of the question being asked, it is all useless.
2. They know advanced statistics and statistical software
To keep you the coolest HR kid on the bus: there is a community of Data Scientists called Kaggle, and they did a survey of the most-used tools, which are listed below:
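3. They know programming languages well enough to write the machine learning programs.
There are all sorts of programming languages and tools designed to work with huge data sets and the corresponding math. Python (a general-purpose language with strong data libraries) and Hadoop (a framework for storing and processing huge data sets across clusters of computers) are two examples.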

4. Data Visualization
Andrew Marritt, founder of Organization View, also brought to my attention the growing need for a 4th skill: being able to convert the data results into visualizations. Given the rapid success of Tableau, and of the HR analytics platform Visier, I think Andrew is right on target.

The Resource Scarcity:

So for the HR world, we need someone who knows Statistics, Programming, and HR! To give you an idea of how rare the role is:
  • There are only 5,695 Data Scientists on LinkedIn
  • There are 29,000 people on LinkedIn who have the phrase "Predictive Analytics" somewhere in their profile.
  • There are only 1,295 people on LinkedIn who have both "Predictive Analytics" and "HR" in their profile. And as you can guess, the big shops like IBM, Accenture, SAP, Oracle, and SAS account for about 200 of them.
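Putting those LinkedIn counts side by side (simple division on the figures above) shows just how thin the pool is:

```latex
\begin{aligned}
\frac{1{,}295 \text{ (Predictive Analytics and HR)}}{29{,}000 \text{ (Predictive Analytics)}} &\approx 4.5\% \\
\frac{200 \text{ (big shops)}}{1{,}295 \text{ (Predictive Analytics and HR)}} &\approx 15\%
\end{aligned}
```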

So what do we do? 

I got into a really interesting conversation about this on the HR Technology forum on LinkedIn. The overwhelming response was that the role is truly difficult to find and hire. Brad Hilbert, the CTO at Orca Eyes, explained that they have had good success taking folks with a strong programming and statistics background and teaching them the HR side.

In the meantime, some pretty innovative solutions are cropping up, such as Kaggle.com. Kaggle.com is an online community of Data Scientists that allows companies to post problems and prize money, and teams of data scientists compete to solve the problem and win the money. The prize money typically ranges from $3,000 to $25,000, but there have been some million-dollar prizes. So if you have the money and the data but not the people, this might be an option for you.

Or if you have a team of crackerjack folks that you think could figure this whole thing out - have them give Kaggle.com a try. The community has some "training" tests that allow people to see how they do. For instance, the current training puzzle is trying to predict what variables were key in people surviving the Titanic sinking.

Additional Resources:

The list below is courtesy of Andrew Marritt. If you want to dig further into the math behind predictive analytics, here are some free textbooks that get way into the details:
1. Advanced Data Analysis from an Elementary Point of View, by Cosma Rohilla Shalizi
2. The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

Additional Thoughts?

As always, if you have seen innovative uses of Big Data in the HR space or have any other feedback on this article - please let me know!
