Software firm sees 14-per-cent jump in productivity using ChatGPT

Study provides 'first empirical evidence' of generative AI's impact in customer service setting

While there’s been plenty of conjecture about the power of generative AI when it comes to workplace productivity, a new study provides empirical evidence.

In carrying out a test in an actual workplace, researchers from Stanford University and the Massachusetts Institute of Technology found that customer service workers given access to the tools became 14% more productive, on average, than those who were not.

The findings “demonstrate that generative AI working alongside humans can have a significant positive impact on the productivity and retention of individual workers,” say the researchers.

“Our paper provides the first empirical evidence on the effects of a generative AI tool in a real-world workplace.”

When the study first began, generative AI was far less well known, predating the meteoric rise of ChatGPT, says Lindsay Raymond, a PhD candidate at MIT in Cambridge, Mass.

And while there are many software products and machine learning (ML) technologies, they “don't actually work that well when you take them outside very standardized experimental settings,” she says.

Plus, in a business environment, productivity improvements of just three to five per cent are considered large, says Raymond.

“Doing anything where people are involved, a 14% increase is… pretty meaningful.”

Background on study using ChatGPT

The AI system in the study combined a recent version of ChatGPT with additional machine learning algorithms fine-tuned specifically for customer service interactions. The system was trained on a large set of customer-agent conversations, and this data was used to identify the conversational patterns most predictive of call resolution and handle time.

The model also prioritized agent responses that expressed empathy, surfaced appropriate technical documentation, and limited unprofessional language. Once deployed, the AI system generated two main types of outputs: real-time suggestions for how agents should respond to customers, and links to the firm’s internal documentation for relevant technical issues. In both cases, recommendations were based on the history of the conversation.
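The suggestion flow described above can be sketched in code. This is a hypothetical illustration only: the study's actual system was a fine-tuned generative model, not a keyword scorer, and every name and scoring rule below is invented for the sketch.

```python
# Hypothetical sketch of a real-time suggestion system like the one
# described: score candidate replies, reward empathy, penalize
# unprofessional language, and surface the best one to the agent.
# All terms and weights here are illustrative assumptions.

def score_response(text: str) -> float:
    """Toy scoring: reward empathy markers, penalize unprofessional language."""
    empathy_terms = ("sorry", "understand", "happy to help")
    banned_terms = ("whatever", "calm down")
    lowered = text.lower()
    score = sum(1.0 for t in empathy_terms if t in lowered)
    score -= sum(2.0 for t in banned_terms if t in lowered)
    return score

def suggest(candidates: list[str]) -> str:
    """Return the highest-scoring candidate as the real-time suggestion."""
    return max(candidates, key=score_response)

candidates = [
    "Calm down, it's not a big deal.",
    "I'm sorry for the trouble -- I understand, and I'm happy to help.",
]
print(suggest(candidates))
```

In the real system, the scoring was learned from conversations associated with good outcomes (call resolution, handle time) rather than hand-written keyword lists.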

The study drew on data from 5,179 customer support agents who used the AI-based conversational assistant.

“We could compare with each individual, before they had access to the tool, and then after they had access to the tool, and then also between people who were not yet onboarded on the tool, versus those that had access to the tool,” says Raymond.

“That enables us to get at something that we believe is more causal, even though it's an observational study.”
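The comparison Raymond describes, each worker before and after getting the tool, and onboarded workers against not-yet-onboarded ones, is the standard difference-in-differences design. A minimal sketch, with made-up numbers that are not from the study:

```python
# Difference-in-differences: the treatment effect is the change in the
# treated group minus the change in the control group over the same period.
# The productivity figures below (resolutions per hour) are invented
# purely to illustrate the arithmetic.

def did_estimate(treated_before, treated_after, control_before, control_after):
    """DiD effect = (treated after - before) - (control after - before)."""
    def mean(xs):
        return sum(xs) / len(xs)
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

treated_before = [2.0, 2.2, 1.8]   # agents before getting the tool
treated_after  = [2.5, 2.7, 2.3]   # the same agents after onboarding
control_before = [2.1, 1.9, 2.0]   # not-yet-onboarded agents, same periods
control_after  = [2.2, 2.0, 2.1]

print(round(did_estimate(treated_before, treated_after,
                         control_before, control_after), 2))
```

Subtracting the control group's change nets out trends that affected everyone, which is what lets the researchers argue the estimate is "more causal" despite being observational.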

Better results for low-skill workers

In the end, the greatest impact was on novice and low-skilled workers — with a 35-per-cent increase in productivity — while there was minimal impact for highly skilled workers.

“The AI tool is designed to pick up on what are successful things that are associated with helping a customer resolve any specific problems and those are the things that the AI turns into suggestions that it's providing for the workers,” says Raymond.

“Now, if you're really new or you're maybe not that great at the job, these suggestions are really helpful to you, because they provide guidance on what you should be doing,” she says, such as suggesting the documentation needed or how to move the problem toward resolution.

But people who are really experienced and already really good at their jobs are probably following those suggestions as best practice anyway, she says.

“That's why we think the tool is really helping those who are both less experienced and lack practice.”

Do employers need a workplace policy for generative AI like ChatGPT? Canadian HR Reporter recently spoke with legal experts.

Improving chat quality, customer sentiment

The researchers also found a decline in the time it took an agent to handle an individual chat and a small increase in the share of chats that were successfully resolved.

“Basically, we see a worker with access to the AI takes about two months to get to the same place that a worker without access to the AI takes six months to reach,” says Raymond.

That could be because, while call center workers have one-on-one coaching for several hours a week, managers don’t have time to review all of their conversations.

“Having this AI generate real-time suggestions might be a more effective way of learning things. And also, the AI can be present for every conversation,” she says.

“And then lastly, maybe the AI will notice things like ‘Oh, well, this seems to be a common problem. This is a document linked to the documentation that's most associated with a positive outcome. Let me surface that,’ while a manager might not because they're not reviewing all conversations all the time and might not necessarily pick up on that as quickly as an AI would.”

In addition, AI assistance improved customer sentiment: customers treated agents “markedly” better, as measured by the sentiment of their chat messages.

“This change may be associated with other organizational changes: turnover decreases, particularly for newer workers, and customers are less likely to escalate a call by asking to speak to an agent’s supervisor,” say the researchers in the study “Generative AI at Work.”

And the implications of the study could go beyond customer service agents, says Raymond.

“Call centers are an area that has high adoption relative to the rest of the economy, because there's so much language data available. But you could imagine using similar types of tools to train people to drive or fly planes, or other sorts of things where there's image or audio or language and video data, where suggestion-based training could be used to make people better, or make people who are less experienced better faster.

“And there's a lot of jobs where that could apply.”

Will tools such as ChatGPT or Google’s Bard replace human resources professionals? Experts recently gave their take in talking to Canadian HR Reporter.

How does generative AI handle management scenarios?

The Stanford/MIT study raises further areas for exploration, says Raymond, such as AI substituting for some of the training and coaching normally done by a manager, how many people a manager can then handle, and what managers can do with the time that frees up.

On that note, employer review site JobSage recently tested ChatGPT when it came to responses to several management scenarios. The tool was asked to write emails addressing issues such as sexual harassment, employee terminations and dress code — and each answer was rated by an HR expert, a legal expert and a management expert.

For sensitive management scenarios, nine of the 15 responses (60%) were found to be acceptable while six responses (40%) were given a net failure rating, based on three points for “outstanding,” two points for “acceptable” and zero points for “unacceptable.”
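The point values in the rubric are as reported; how JobSage combined the three expert ratings into a "net failure" verdict is not specified in the article, so the threshold in this sketch, a total below three "acceptable" marks, is purely an assumption.

```python
# Point values as reported: outstanding = 3, acceptable = 2, unacceptable = 0.
POINTS = {"outstanding": 3, "acceptable": 2, "unacceptable": 0}

def net_score(ratings):
    """Total points across the three expert raters (HR, legal, management)."""
    return sum(POINTS[r] for r in ratings)

def is_net_failure(ratings):
    # Assumed cutoff: a total below three "acceptable" marks (6 points)
    # counts as a net failure. The article does not define the cutoff.
    return net_score(ratings) < 6
```

Under this assumed cutoff, a response rated outstanding, acceptable, and unacceptable (5 points) would count as a net failure, while three acceptable ratings (6 points) would pass.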

On average, ChatGPT performed better when addressing diversity and performed worse addressing compensation and underperforming employees. 

The tool earned its strongest marks in handling an employee being investigated for sexual harassment and a company switching healthcare providers to cut costs but performed weakest when asked to respond to an employee concerned about pay equity, a company that needs people to work harder than ever, and a company’s freeze of raises despite record payout to the CEO.

Very few HR departments in Canada are using generative AI, according to a recent study.

Lack of empathy an issue for ChatGPT

Overall, the results were mixed, though the tool seemed to do better when a straightforward action was required, says Katie Duncan, content manager at JobSage in Austin, TX.

“The biggest issue was the lack of empathy. And in those responses, ChatGPT kind of came off as harsh and uncaring and so… where you really want the more human elements, ChatGPT seems to be lacking.”

However, she admits that more prompting might have helped fine-tune the responses.

“We just gave it the scenario and didn't really prompt it on how to write it,” she says. “You can ask it to rewrite it in a more empathetic manner, that sort of thing. But we were just going for baseline if someone typed in the prompt.”

Overall, a tool like ChatGPT can be a good jumping-off point to get ideas, says Duncan, so “giving it feedback and having it rework that definitely seems to provide better results overall.”

“It's not worth the risk to put your full trust in it and just send out an email without looking it over and making adjustments where needed, or adding personalization, that sort of thing. On a base level, it's a good tool, but it definitely will always need some degree of refinement.”

In the last few months, it’s been “crazy” how fast ChatGPT has advanced, says Duncan, and the results are getting better by the week.

“To me, it really doesn't sound robotic; instead, it just lacks the empathy that sets it apart from human communication. As a manager, you're working with employees and teammates day in and day out, and ChatGPT will never have the human connection and relationships that co-workers form with one another.”
