AI systems for Ontario doctors hallucinate: auditor general

Evaluation of 20 platforms highlights inaccuracies, fabrications that could result in 'inadequate or harmful treatment plans'

AI systems used by Ontario doctors hallucinate, mis‑record prescriptions and miss key mental‑health details, Ontario’s auditor general has found in a new report that raises governance, training and bias questions.

In a special report released May 12, Shelley Spence concludes that the Ministry of Public and Business Service Delivery and Procurement “did not have consistently effective processes and procedures in place” to manage AI across the Ontario Public Service (OPS).

The performance audit of AI use in government found gaps ranging from unsecured use of public generative AI (GenAI) tools, to low take‑up of approved systems, to limited attention to bias and environmental impacts.

Hallucinations and errors in AI Scribe tools

As part of an AI Scribe vendor‑of‑record process for health‑care providers, Supply Ontario and partners evaluated 20 transcription systems that generate clinical notes from recorded patient visits.

Auditors report that evaluators’ comments “highlighted inaccuracies” in the system‑generated notes. Nine of 20 systems produced hallucinations, fabricating clinical information such as referrals and test orders that had not occurred. In one example, notes stated there were “no masses found” or that a patient had anxiety, even though these issues were not mentioned in the recording.

Twelve of 20 systems captured “a different drug than what was prescribed by the doctor,” while 17 of 20 missed key mental‑health details in at least one test, and six missed mental‑health issues across both tests.

OntarioMD has issued guidance directing doctors to manually review all AI‑generated notes, but AI Scribe products are not required to include a sign‑off feature to confirm that review.

Spence warns that inaccuracies in AI‑generated medical notes “could potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes.”

OntarioMD (OMD), a subsidiary of the Ontario Medical Association (OMA), has previously reported that AI scribe technology reduces the time doctors and nurse practitioners spend on paperwork by 70 to 90 per cent.

Unapproved AI dominates staff usage

Within the OPS more broadly, the only approved GenAI tool is Microsoft Copilot Chat, deployed in a secure environment under an agreement that data will remain in Canada and not be used to train external models.

However, usage data examined by the auditor show that from April 22 to Aug. 18, 2025, “other popular GenAI websites made up 94 per cent of OPS staff’s usage and Microsoft Copilot Chat made up approximately 6 per cent.”

At the same time, Microsoft Defender logs show that between April and August 2025, 12,000 staff accessed about 400 AI‑related websites from government devices. Of those, “244, or about 60%, were deemed unsafe or unsecured” based on their security score. Some 15 per cent of those high‑risk sites hosted inappropriate, non‑work content.

The Ministry “had not implemented security controls to prevent OPS staff from inadvertently uploading Ontarians’ personal information” or sensitive business data onto those public AI sites, the report says.

A previous report noted that an Ontario hospital’s privacy breach involving an AI transcription tool revealed how organizational oversights can undermine even the strongest data‑protection intentions. According to an investigation into the incident, the breach resulted from “two critical security gaps.”

Training gaps and browser work‑arounds

The OPS introduced a Responsible Use of AI course in January 2024, but by August 2025 only 1,800 of 55,000 staff – about 3 per cent – had completed it. The course covers GenAI basics, safe use of AI sites and “the risks of misinformation, bias and security issues,” yet it is not mandatory.

The audit also finds that protections built into Copilot Chat can be bypassed when staff use non‑default browsers. Devices are configured so that logging into the OPS network automatically signs users into Microsoft Edge, enabling Copilot’s Enterprise Data Protection settings. But when employees access Copilot through other browsers without their OPS credentials, “this feature can be bypassed” and data may be retained and used to train external large language models.

The report flags bias‑risk gaps in the Document Verification Service, a facial‑recognition system that will let Ontarians validate identity online to access services. Vendor testing was based on small, non‑representative samples, prompting concern that some demographic groups could face “higher rejection rates or delays” when using the system.

More broadly, the OPS AI Strategy, launched in November 2024, lacks detailed initiatives, timelines, measurable outcomes, explicit bans on high‑risk AI uses and environmental considerations, according to the audit.

The Ministry has agreed to all five recommendations directed to it, including making AI training more robust, validating vendor testing for bias, and strengthening the AI Strategy. Supply Ontario has agreed or partly agreed to all five recommendations related to AI procurement.

Meanwhile, Minister of Public and Business Service Delivery and Procurement Stephen Crawford insisted the hallucinations had taken place in testing and training — not during medical appointments.

“That’s essentially when we’re undergoing the training mode to see whether we’re going to use the scribe or not,” he told reporters, according to a Global News report. “Let’s be very clear about that, that’s not actually in operational use with doctors, that’s in the optional stage where we’re reviewing the various scribes.”

Here are the recommendations from the auditor general’s report:

1. Ministry — Cyber Security Division: Review and block OPS staff access to unsafe and unsecured AI websites, and ensure all staff complete AI‑risk training, with refresher training as AI use expands.

2. Ministry: Set KPI targets for Microsoft Copilot Chat adoption, take actions to drive usage to those targets, and report the KPIs to management on a monthly basis.

3. Ministry — Cyber Security Division: Block use of Microsoft Copilot Chat on non‑default browsers (for example, Chrome and Firefox), and train staff on the risks of accessing AI websites through non‑Microsoft browsers.

4. Ministry: Validate vendor‑provided AI test results to confirm the sample size is sufficient and demographically representative of Ontario, and that testing meets all AI Directive requirements.

5. Supply Ontario: For future AI procurements, increase the weighting of criteria covering security and privacy controls (TRAs, PIAs, SOC reports), bias and accuracy, and set minimum passing thresholds for these criteria.

6. Supply Ontario: Review AI Scribe standards and guidelines from other jurisdictions and adopt best practices; require vendors to build in an IT control that forces users to attest they have reviewed the generated notes.

7. Supply Ontario (with the Ministry): Obtain SOC 2 Type 2 and other third‑party security reports annually from all AI Scribe vendors; require such reports in all software procurements; and ensure evaluators verify that the reports contain the expected controls and assess any exceptions.

8. Supply Ontario: For future AI procurements, follow AI Directive principles by requiring vendors to provide evidence of bias testing — or by conducting independent bias testing — before selecting a system.

9. Supply Ontario: Require mandatory live demonstrations from vendors as part of evaluating future AI system procurements.

10. Ministry: Research AI strategies and standards from other jurisdictions, assess which foundational elements apply to the OPS, and incorporate them to strengthen Ontario’s AI Strategy.
