Today, more data exists about how we work than ever before. If analyzed properly, this data provides valuable insights about the trends, forces, and inequalities shaping today’s workforce—from gender pay disparities to the remote work revolution hastened by COVID-19.
How these market movements affect companies and careers is central to Visier’s research into employee and workplace trends. It’s also the foundation of our Visier Benchmarks, which enables our customers to easily compare data about their own workforce to confidentially sourced benchmark data from Visier customers.
Visier also explores the use of customer data to generate predictions, specifically for customers whose workforce is too small to generate valuable training data on its own. In these instances, we combine data from multiple customers as training data to make predictions for individual customers.
But all this data we work with isn’t just abstract numbers: it describes real people’s lives, both at work and away from it. This is why data privacy is a high priority at Visier.
In my role as a Data Engineer, I work closely with customer data on a daily basis. Since my team is responsible for generating Visier Benchmarks and Insights reports, we have important responsibilities as the caretakers of each customer’s private data. In our daily work, we are always keeping confidentiality, privacy, and security at the forefront of our minds.
Privacy is simple, but protection is complex. Let me walk you through how we approach the complex and important task of protecting our customers’ identity and their employees’ personal information. First, know that we proactively embed privacy considerations and requirements in the design, development, and operation of our products and services.
Leading with privacy
At Visier, discussing how we anonymize our customers’ data forms the first step in building every new business relationship. Businesses meet with our customer privacy team before they share any of their data with our platform. This is also when they’re encouraged to ask any questions they have about our privacy and security measures.
Our goal is for customers to begin using Visier with a clear and unambiguous understanding of exactly how their data will be used and the steps that will be taken to ensure its security.
While workforce data may not seem as sensitive as, for example, financial or healthcare information, it does require the same level of caution and protection. Leaks of workforce data, especially salary information or demographics like age and ethnicity, can be damaging and embarrassing to companies and their employees. Visier takes these risks seriously.
Balancing anonymity and utility
Anonymity is the primary concern when working with datasets such as those Visier uses to generate insights. But because data’s value lies in its specific details, a balance must be struck between completely removing risk (by withholding the data entirely) and preserving maximum utility. Finding this balance is known as statistical disclosure control.
Methods for anonymizing datasets vary greatly depending on how the data was collected and how it will be used. For example, data analyzed by government agencies for research and publication is subject to different standards than information collected by a private company, like Visier, to be seen by a much smaller audience.
Visier follows a set of predetermined procedures for anonymizing our customer data without losing the features that make it so useful and valuable. Here are the general steps we follow to ensure that our customers’ data is anonymized, secure, and cannot identify individual employees:
1. Remove identifying information
The first step after information comes into the Visier research pipeline is to remove all personally identifiable information (PII). PII includes details like first or last names, email addresses, employee ID or social insurance numbers, and anything else meant to identify a specific individual.
Since this information would compromise the data’s anonymity — and doesn’t contribute anything to its value — we remove it from the dataset immediately.
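As a minimal sketch of this step (the field names here are hypothetical, chosen for illustration; a real pipeline maintains a vetted, exhaustive list of identifying fields), PII removal amounts to filtering each record against that list:

```python
# Hypothetical PII field names for illustration only.
PII_FIELDS = {"first_name", "last_name", "email", "employee_id", "social_insurance_number"}

def strip_pii(record: dict) -> dict:
    """Return a copy of the record with all known PII fields removed."""
    return {field: value for field, value in record.items() if field not in PII_FIELDS}

record = {
    "first_name": "Ada",
    "email": "ada@example.com",
    "salary": 90000,
    "department": "Engineering",
}
clean = strip_pii(record)
# clean retains only the non-identifying fields: salary and department
```

Because PII contributes nothing to the analysis, dropping it outright is strictly safer than masking or hashing it.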
2. Generalize numerical values
The next step is to generalize numerical data, such as dates, ages, or compensation figures, into broad ranges so they aren’t specific enough to be connected to individuals. For example, if an employee in a dataset started in a new role on March 17th, 2019, that could be generalized to March 2019, along with everyone else who started that month. Ages would also be condensed into buckets such as 18-31 or 32-41, as would salaries or wages.
While this step sacrifices a small amount of specificity, it retains enough detail to analyze trends within the population without making individuals identifiable.
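For illustration, here is one way this generalization could look in code. The helpers and bucket edges below are hypothetical, chosen only to match the examples above:

```python
from datetime import date

def generalize_date(d: date) -> str:
    """Truncate an exact date to month granularity, e.g. 2019-03-17 -> '2019-03'."""
    return d.strftime("%Y-%m")

def bucket(value: int, edges: list) -> str:
    """Map a value into the half-open range [lo, hi) defined by consecutive edges."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}-{hi - 1}"
    return "out-of-range"

AGE_EDGES = [18, 32, 42, 52, 62, 72]  # yields bands 18-31, 32-41, and so on

start_month = generalize_date(date(2019, 3, 17))  # "2019-03"
age_band = bucket(34, AGE_EDGES)                  # "32-41"
```

The same `bucket` helper works for salaries or wages; only the edge list changes.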
3. Establish k-anonymity
Finally, we examine the anonymized data and make sure it meets standards of k-anonymity, meaning that there are at least k individuals in every sub-demographic within the dataset. In this context, k is a numeric value that varies depending on the needs of the data.
To continue the above example, to attain k-anonymity there would need to be at least k individuals who joined the company in March 2019. If only a handful of people fell within that sub-group, they would be too easy to identify. In that instance, Visier discards the entire grouping of data to preserve those employees’ anonymity.
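The check itself can be sketched in a few lines (a simplified illustration, not Visier’s production logic): count how many records share each combination of quasi-identifying attributes, then drop every group smaller than k:

```python
from collections import Counter

def enforce_k_anonymity(records, quasi_ids, k):
    """Keep only records whose quasi-identifier combination appears at least k times."""
    key = lambda r: tuple(r[q] for q in quasi_ids)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

records = [
    {"start_month": "2019-03", "age_band": "32-41", "salary_band": "80k-100k"},
    {"start_month": "2019-03", "age_band": "32-41", "salary_band": "100k-120k"},
    {"start_month": "2019-04", "age_band": "18-31", "salary_band": "80k-100k"},
]

# With k=2 on (start_month, age_band), the lone April joiner is discarded.
safe = enforce_k_anonymity(records, ("start_month", "age_band"), k=2)
```

Note that the whole under-sized group is removed, not just trimmed: keeping even one member of a group of fewer than k would leave that person identifiable.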
Data privacy creates better workplaces
At Visier, we don’t see data privacy and workforce data analysis as opposing goals. Rather, by following rigorous measures to protect employee privacy, we can produce research that’s as valuable, specific, and useful as possible.
When procedures for statistical disclosure control are followed properly, data collection and analysis become beneficial for individuals and companies while minimizing risk. To create healthier, more productive workplaces and labor markets, we must invest time and resources in better understanding our people, using insights that are safe and grounded in current, accurate data.