The Worrying Trend in AI We Need to Watch Out For

One of the most critical concerns emerging in AI development isn't hallucinations or misalignment — it's something quieter and more structural: the quest for new sources of training data. As AI systems advance, their data requirements expand significantly, creating opportunities for tech corporations to consolidate information across their subsidiaries in ways that should make us uncomfortable.

Three Examples Worth Watching

X (formerly Twitter). By integrating Tesla and Starlink data with Twitter's social graph, X could build models that combine personal communications with satellite communication patterns and real-world driving behavior. Users of each individual service never explicitly consented to this combination, and the combined dataset is more powerful, and more invasive, than any single one alone.

OpenAI. Integration with Apple's ecosystem opens a window onto user habits, preferences, and even biometric data across more than a billion active devices. This enables powerful AI systems, but the potential for misuse is substantial. Users of Apple's services didn't sign up to train OpenAI's models, and the line between what's permitted and what's exploitative is blurring.

Google. YouTube's video repository offers extraordinarily rich training material — billions of hours of human speech, behavior, and expression. Consolidating such data under a single corporate umbrella can lead to monopolistic practices that no individual competitor can match, entrenching advantages that compound over time.

Centralizing data within tech conglomerates concentrates corporate power while reducing transparency about how information is actually being used.

The Core Problem

The core problem isn't that any single data source is inappropriate; it's that consolidating multiple sources under one corporate roof happens informally, without adequate oversight or public understanding. Swain argues that regulators must implement preventative frameworks now, before structural lock-in makes intervention practically impossible.

What Needs to Happen

We need balanced innovation policies: ones that pair data privacy and ethical use with regulatory oversight, robust consent mechanisms, and transparency requirements. The goal isn't to stop AI development. It's to ensure the fuel powering that development doesn't come at the cost of the people whose data makes it possible.

The window for building these frameworks is narrowing. The time to act on this is now, not after the consolidation is complete.
