Is Microsoft scraping Word and Excel data to train AI models?

"This can’t be real. It's unbelievable that they quietly enabled this."

Microsoft watchers will "Recall" a controversial AI feature introduced this year (Photo by Resume Genius on Unsplash)

Microsoft has quietly introduced a new default feature that scrapes and analyses data from Office 365 applications including Word and Excel, sparking a social media storm among tech industry figures who fear the information may be used to train AI models.

However, The Stack has learned that these concerns are inaccurate, and Microsoft has denied claims that it is feeding private information to large language models.

Data from Microsoft 365 can now be automatically downloaded and processed to deliver "connected experiences", which Microsoft said will "enable you to create, communicate, and collaborate more effectively."

Microsoft published a blog post detailing connected experiences on October 21, which appears to have evoked memories of Recall, a tool that takes snapshots of users' screens every few seconds and was described as a "privacy nightmare".

Buried deep in the privacy section of the Office 365 settings is a connected experience option that is switched on by default for US customers (we could not immediately confirm for other countries). This menu also boldly states: "Your privacy matters."

If users do not expressly choose to disable this feature, document data is harvested and processed - igniting panic that it could be used to train Copilot and other AI models.

Claims about Microsoft's data scraping circulated on social media over the weekend.

"This can’t be real," wrote Florian Roth, Head of Research at Nextron Systems. "While organizations are busy enforcing AI policies to protect confidential data, Microsoft quietly enables this by default and labels it ‘Your privacy matters.’"

"It's unbelievable that they quietly enabled this while everyone was focused on their 'Recall' AI feature," Roth continued.

Microsoft is vague about what it does with the data it collects - although there is no doubt that it is gathering the data.

"Connected experiences that analyze your content are experiences that use your Office content to provide you with design recommendations, editing suggestions, data insights, and similar features," Microsoft explained.

What are connected experiences?

We reviewed Microsoft's "experiences" and could not find one that explicitly mentioned training AI models (but please get in touch if you know otherwise). The use cases listed among those that "analyze your content" include the automatic application of sensitivity labels and a feature that scans business cards to extract information.

One concerned user asked a question about Microsoft's AI training policy on its support forum, where an agent admitted its stance wasn't entirely clear.

They wrote: "As of my knowledge, Microsoft has not provided extensive publicly available details about the specific types of personal information used to train their AI models. The information surrounding data usage for AI training can often be vague due to the nature of privacy policies and terms of service.

"However, we can only find general guidance on how Microsoft handles personal data. But due to privacy it may not provide granular details on data specifically used for AI training."

In its privacy statement, Microsoft says: "As part of our efforts to improve and develop our products, we may use your data to develop and train our AI models."

On a page explaining its policy on AI training, it promises it does not train Copilot AI models on data from "our commercial customers, or any data from users logged into an organizational M365/EntraID account" or "users logged in with M365 personal or family subscriptions".

A Microsoft spokesperson explicitly denied claims that it was using M365 data to train AI models and said: “In Microsoft 365 consumer and commercial applications, Microsoft does not use customer data to train large language models without your permission.”

This story was updated at 18:30 on November 25 to reflect Microsoft's denial of claims that it was training models on data from Word and Excel.
