EU’s New AI Regulations Spark Debate on Data Transparency

Brussels: New regulations governing artificial intelligence (AI) in the European Union will compel companies to disclose more about the data used to train their AI systems, exposing a typically secretive corner of the industry.

Since Microsoft-backed OpenAI introduced ChatGPT to the public 18 months ago, interest and investment in generative AI, which spans applications that produce text, images, and audio, have surged. As the sector expands, however, questions have mounted about the data used to train these models, in particular whether feeding copyrighted material such as bestselling books and films into them without permission amounts to copyright infringement.

The EU’s recently passed AI Act will be implemented in phases over the next two years, giving regulators and businesses time to adjust to the new obligations. A contentious provision requires organizations deploying general-purpose AI models, such as ChatGPT, to provide “detailed summaries” of the data used for training. The AI Office plans to release a compliance template by early 2025, following consultations with stakeholders.

AI companies, however, are reluctant to disclose what their models are trained on, arguing that the information is a trade secret whose release would hand competitors an unfair advantage. Matthieu Riouf, CEO of AI-powered image-editing firm Photoroom, likened it to safeguarding a secret recipe known only to top chefs.

How granular these transparency reports must be will matter greatly both to smaller AI startups and to tech giants such as Google and Meta, which have staked their futures on AI technologies.

Sharing Trade Secrets

In the past year, tech giants including Google, OpenAI, and Stability AI have faced legal challenges from content creators alleging unauthorized use of their work to train AI models. While U.S. President Joe Biden has issued executive orders addressing AI’s security risks, copyright issues remain largely untested, with bipartisan calls in Congress for tech firms to compensate rights holders for data usage.

Amid mounting legal scrutiny, tech companies have struck content-licensing agreements with media outlets and websites. OpenAI, for instance, has partnered with the Financial Times and The Atlantic, while Google has deals with News Corp and the social platform Reddit.

OpenAI has itself drawn criticism over data transparency, including its refusal to disclose whether YouTube videos were used to train its video-generation tool Sora, citing terms and conditions.

Thomas Wolf, co-founder of AI startup Hugging Face, supports greater transparency but acknowledges industry divisions on the matter. “There’s still much to be decided,” he remarked.

Political and Regulatory Landscape

European lawmakers remain divided over AI regulation. Dragos Tudorache, one of the lawmakers who drafted the AI Act, advocates public disclosure of training datasets, arguing that transparency is essential to protecting copyright holders’ interests.

However, there’s opposition within Europe. Under President Emmanuel Macron, the French government has expressed reservations about regulations that could impede the competitiveness of European AI startups. French Finance Minister Bruno Le Maire stressed the importance of Europe leading in AI innovation, cautioning against premature or ill-informed regulation.
