Study of 800,000 job applications finds that even when algorithms are used to enforce gender-balanced shortlists, the impact on final hiring diversity is far less than expected
When it comes to hiring, Canadian HR leaders are increasingly turning to artificial intelligence in hopes of rooting out bias and building more diverse teams.
But new research warns that even the most sophisticated algorithms can’t overcome the deep-seated biases baked into hiring practices.
The study, “Algorithmic Hiring and Diversity: Reducing Human-Algorithm Similarity for Better Outcomes,” looked at nearly 800,000 job applications at technology firms, finding that even when algorithms are used to enforce gender-balanced shortlists, the impact on final hiring diversity is far less than many HR leaders expect.
Understanding where bias comes from
As Sarah Stockdale, CEO of AI educator Growclass, explains, the core of the problem is not that AI is inherently biased, but that it reflects and amplifies the biases already present in human-led processes and historical data.
This explains why, even with sophisticated selection algorithms, any gains can be cancelled out by a biased human decision later in the process.
“It's not that the AI has the bias. It's the humans that have trained the AI,” Stockdale says.
“The LLMs that these tools have been trained on have the bias baked into it, and then we layer on our biases as hiring managers on top of that. So we will train these tools on what we are looking for, and AI is just a mirror of our bias.”
Wendy Cukier, professor of strategy at Toronto Metropolitan University, agrees, adding that HR itself contributes its own bias to the process.
“HR has embedded bias in terms of gender, in terms of race, in terms of everything,” she says.
“So until you grapple with the fundamental biases that are baked into HR, it's hard to say, ‘Well, we're going to fix this one tool, and it will make everything good again.’ Because there's bias at every stage, whether it's human or machine-based.”
The limits of algorithmic shortlists and correlation
The authors of the study found that even when algorithms are used to enforce a 50/50 gender split in shortlists, the effect on final hiring outcomes is modest when there is close correlation between the hiring manager’s criteria and the algorithm’s: “Equal representation at the shortlist stage does not necessarily translate into more diverse final hires.”
The impact of algorithmic shortlisting on diversity is smallest in technical roles, the authors point out, where the alignment between algorithm and human criteria is often highest.
Stockdale explains how an algorithm that enforces a gender split can convince hiring managers that diversity has been ensured, leading them to stumble at the final stage.
“A lot of the logic feels good. It feels like there's more women being interviewed, and that means that more women are potentially going to get these roles,” Stockdale says.
“That actually doesn't pan out … it does help more women get hired, but the impact is a lot smaller than you would think it was.”
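The correlation effect the study describes can be illustrated with a small simulation. This is a hypothetical sketch, not the paper’s model: the scoring formulas, the 0.8 “historical bias” penalty, and the assumption that the part of a manager’s judgment uncorrelated with the algorithm tracks true quality are all illustrative choices.

```python
import math
import random

def simulate(rho, trials=400, pool=200, shortlist_n=10, hires_n=2,
             penalty=0.8, seed=0):
    """Share of women among final hires when the algorithm enforces a
    50/50 shortlist but the hiring manager makes the final pick.
    rho controls how closely the manager's criteria track the algorithm's."""
    rng = random.Random(seed)
    female_hires = 0
    for _ in range(trials):
        cands = []
        for i in range(pool):
            female = i % 2 == 0                 # 50/50 applicant pool
            q = rng.gauss(0, 1)                 # latent quality
            # algorithm score: trained on biased history, penalises women
            a = q - (penalty if female else 0) + rng.gauss(0, 1)
            # manager score: correlated with the algorithm by rho; the
            # uncorrelated part reads latent quality (a modeling assumption)
            m = rho * a + math.sqrt(1 - rho**2) * (q + rng.gauss(0, 1))
            cands.append((female, a, m))
        women = sorted((c for c in cands if c[0]), key=lambda c: -c[1])
        men = sorted((c for c in cands if not c[0]), key=lambda c: -c[1])
        # enforced gender-balanced shortlist, ranked by the algorithm
        shortlist = women[:shortlist_n // 2] + men[:shortlist_n // 2]
        # manager hires the top candidates by their own score
        hires = sorted(shortlist, key=lambda c: -c[2])[:hires_n]
        female_hires += sum(1 for c in hires if c[0])
    return female_hires / (trials * hires_n)

print(simulate(rho=0.95))  # high human-algorithm similarity
print(simulate(rho=0.2))   # low similarity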
Bias in interviews and job design
The research also highlights that bias doesn’t end at the screening stage: interviews, in particular, remain a major source of subjectivity and discrimination. Cukier calls traditional interviews “inherently biased,” as they tend to favour individuals who are similar to the interviewers.
They also favour candidates who present in traditionally professional ways, she adds.
“What interview processes essentially evaluate is a performance, and how good an actor you are,” Cukier says.
“Very often, interviews are ways of covertly assessing whether you like someone, whether you think someone will fit, whether you think someone is like you or not. So the use of interviews, as opposed to competency-based assessments, for instance … is a problem, in my view.”
She adds that before technology is even applied, the very structure of job postings and requirements can reflect and perpetuate systemic bias: “The way in which jobs are designed and the qualifications that are defined in those jobs often are a result of historic practices.”
The funnel effect: bias at every stage
The research paper describes hiring as a funnel, with bias potentially entering at every stage, from resumé screening to interviews to final selection.
Stockdale explains, “You are still going to funnel out really diverse candidates … there's different stages, where the combination of your human bias and the tool’s bias can weed out people who would be qualified and probably great for your organization, but you're not training it to surface those folks.”
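The funnel metaphor can be made concrete with simple arithmetic: modest, individually unremarkable pass-rate gaps at each stage compound multiplicatively into a much larger gap at the end. The stage names and rates below are purely hypothetical.

```python
# Hypothetical per-stage relative pass rates for an underrepresented group
# (1.0 = parity with the majority group at that stage).
stages = {
    "resume screen": 0.92,
    "algorithmic shortlist": 0.95,
    "interview": 0.88,
    "final selection": 0.90,
}

ratio = 1.0
for stage, rel_rate in stages.items():
    ratio *= rel_rate  # disparities multiply through the funnel
    print(f"after {stage}: {ratio:.2f}x parity")
```

Even though no single stage falls below 0.88x parity in this sketch, the end-to-end result is roughly 0.69x, a far larger gap than any one stage suggests.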
Cukier points out the persistence of bias even in the most advanced organizations, citing Google’s infamously failed attempt at diverse hiring through AI as an example of how easily these strategies can go wrong.
“If one of the most sophisticated AI companies in the world can't figure out how to address bias in the algorithms in its hiring processes, there's not much hope for others,” she says.
“You need much more aggressive strategies to detect and fight bias, because there's just so much evidence now that just reinforces the fact that most AI tools reinforce bias, whether it's gender bias, whether it's bias against newcomers.”
Despite these challenges, both experts see a role for technology in making hiring fairer, provided it is used as a tool, not a replacement for human judgment.
Cukier warns against over-reliance on AI to address inherent human bias, but says it can be useful in moderate applications, such as longlisting applicants: “I can see where the tools might be used to take you from 2,000 applicants down to 50 or 100, but it would be a big mistake to rely on the tools to take you down to the top 10 or the top three.”
Don’t ditch DEI: training and policy matter
Both experts stress that technology is not a substitute for strong diversity, equity and inclusion (DEI) policies and training. Stockdale points to recent backpedalling on DEI training in Canadian organizations as a “huge mistake,” urging employers to continue DEI strategies to ensure fair and equitable hiring practices.
“Maintaining DEI training, not just when it comes to equity in the hiring processes, but broadly throughout your organization, you are more likely going to have people who can pick up on where there might be gaps, or where there might be blind spots in your process,” says Stockdale, adding that training around the tools themselves is essential.
“The vast majority of folks on your team who are using these tools do not understand how they were made and how they were trained,” she adds.
“What you need to do as an employer is make sure that a) there's broad diversity, equity and inclusion training at your company, and b) that you have AI use policy and AI education throughout if you are going to be using these tools, especially when it comes to sensitive things like hiring.”
Cukier highlights how certain algorithmic criteria can disproportionately harm women, immigrants and refugees, explaining how factors such as employment gaps, common among these groups for economic and societal reasons, are often treated as disqualifiers.
For this reason, Cukier stresses, human oversight is essential.
“The key is, they're not using it to do their hiring, they're using it to support their hiring processes,” she says.
“It's important to think of AI as an assistant, and this is true whether you're doing research, whether you're doing screening. You also always have to have a human in the loop.”
Building better hiring processes through third parties and intentionality
So what can HR leaders do to reduce bias in hiring, with technology as part of the solution? Stockdale recommends intentionality as a guiding principle.
“[It's about] the more thoughtful and intentional you can be when you are putting together these tools and putting together your criteria for what a great employee at your organization looks like,” she says.
“Not only looking at that from an objective lens to see if your own bias is seeping into there, but getting a third party, having someone check your bias, on your team, having someone in HR take a look at it, and then really understanding how these tools work.”
Designing algorithms that intentionally counter the correlation effect can be useful, Cukier says, but she strongly recommends they be tested internally before being implemented.
“Basically doing test cases …'Here's a pile of resumes. This is the result that we get if we go through them the old-fashioned way,'” she explains.
“'Here's a pile of the same resumes. What happens when we feed them into the system?' And then look at the variance.”
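Cukier’s test-case approach can be sketched as a small audit script. Everything here, the resume fields, the two selection rules and the comparison metrics, is a hypothetical placeholder for an organization’s real process and tool.

```python
# Toy audit: compare who the "old-fashioned" process selects vs. the
# algorithm on the same resume pile, then look at the variance in outcomes.
resumes = [
    {"id": i,
     "gender": "F" if i % 2 else "M",
     "years_exp": (i * 13) % 20,       # placeholder attribute
     "keyword_hits": (i * 7) % 15}     # placeholder attribute
    for i in range(100)
]

def human_pick(pile, k=10):
    # stand-in for the manual shortlist (here: most years of experience)
    return {r["id"] for r in sorted(pile, key=lambda r: -r["years_exp"])[:k]}

def algo_pick(pile, k=10):
    # stand-in for the tool's shortlist (here: a different, arbitrary rule)
    return {r["id"] for r in sorted(pile, key=lambda r: -r["keyword_hits"])[:k]}

def female_share(pile, ids):
    chosen = [r for r in pile if r["id"] in ids]
    return sum(r["gender"] == "F" for r in chosen) / len(chosen)

human = human_pick(resumes)
algo = algo_pick(resumes)
overlap = len(human & algo) / len(human | algo)   # Jaccard similarity
print(f"overlap between the two shortlists: {overlap:.2f}")
print(f"female share (human process): {female_share(resumes, human):.2f}")
print(f"female share (algorithm):     {female_share(resumes, algo):.2f}")
```

The point of such a dry run is the comparison itself: a low overlap or a divergent demographic composition between the two shortlists flags exactly the variance Cukier suggests examining before the tool goes live.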