What We Mean — and What We Don't
Our definitions are drawn from the AI Incident Database (AIID) Editor's Guide, a widely used international reference for documenting AI harms. We adopt them here so that our database is comparable with global repositories and academically replicable. The harm categories covered are:
- Physical health or safety harm
- Psychological harm
- Financial harm
- Damage to physical or intangible property (IP theft, reputational damage)
- Harm to social or political systems (election interference, erosion of trust in authorities)
- Civil liberties violations (unjustified punishment, censorship, unlawful surveillance)
This is a repository of AI-related incidents sourced from public media reporting. It documents only what surfaces in the public record — primarily events covered as potential crime or illegal activity by news organisations. It is not designed to capture:
- AI harms that go undetected by state institutions
- Systemic harms made invisible by inadequate legal frameworks
- Incidents reported only in regional or non-English-language media
All incident counts should be read as a lower bound on actual AI-related harm in India.
Project Background & Motivation
This project was undertaken as a B.Tech Research Project (BTP) at IIIT Delhi, beginning in December 2024. The incidents in this database date from 2021 onwards, which is the period we researched and documented, but the project itself was built and published in 2024–2025. The project arose from the recognition that no centralised documentation of AI-related harms in India existed. While global conversations about AI risk were accelerating, the lived experience of Indian citizens facing deepfakes, voice-cloning fraud, and algorithmic bias was going largely unrecorded in any systematic way.
The launch of ChatGPT in November 2022 was a turning point. It dramatically lowered the technical barrier to AI misuse, and within months deepfakes, voice cloning, and AI-generated disinformation became tools readily available to bad actors. Financial fraud cases involving AI voice impersonation began appearing in state police records across India.
COVID-19 lockdowns had simultaneously pushed India online at unprecedented speed — creating large populations that were connected but not protected. First-time internet users in rural districts encountered AI-enabled scams before developing any digital safety literacy.
We drew inspiration from the AI Incident Database (AIID, Partnership on AI), which has been tracking AI incidents internationally since 2021. However, we found that it systematically under-represents non-Western contexts: incidents from India rarely appear, and when they do, they lack the state-level specificity needed for governance analysis. This database aims to fill that gap by providing granular, India-specific documentation that can inform both research and policy.
Scope of Study
The database covers India's 28 states and its Union Territories, wherever publicly documented AI misuse incidents could be identified. The time range is 2021 through 2026, capturing the pre-GenAI period, the rapid escalation that followed the generative AI wave, and the governance responses through the AI Impact Summit.
Our focus is on publicly reported incidents, primarily those covered by news media. We do not claim to capture the full universe of AI-related harm in India — only what surfaces in the public record.
This database captures only what gets reported. Incidents that go unreported due to shame, lack of awareness, or absence of legal frameworks are not counted. Our numbers are a floor, not a ceiling.
How We Collected Data
Incidents were identified through systematic keyword searches across Indian and international news archives, court records, parliamentary questions, and civil society reports. The primary search terms used:
For each incident recorded, we captured the following fields:
What Qualified as an Incident
Not every AI-adjacent news story was included. Three criteria had to be satisfied for an incident to enter the database:
Source Triangulation Strategy
Indian media operates within a sharply polarised political landscape, where outlets at different ends of the spectrum may report or ignore the same incident depending on its political valence. To counter this bias, we deliberately sourced across the political spectrum.
Our working assumption: if a genuine AI-harm incident occurred, it is likely to get reported across media of different political alignments, because AI misuse against ordinary citizens does not map cleanly onto ideological divides. An elderly person defrauded by a voice-cloning scam is neither a left-wing nor a right-wing story.
- OpIndia
- Republic TV
- Zee News
- The Wire
- Scroll.in
- NewsClick
- The Hindu
- NDTV
- Times of India
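The cross-spectrum check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's actual tooling: the neutral labels group_a, group_b and group_c are an assumption that simply splits the outlet list above into three clusters in the order given, and the function names and the two-group threshold are hypothetical.

```python
# Hypothetical grouping: three clusters taken in the order the outlets are
# listed above. The labels are neutral placeholders, not the authors' taxonomy.
OUTLET_GROUPS = {
    "OpIndia": "group_a",
    "Republic TV": "group_a",
    "Zee News": "group_a",
    "The Wire": "group_b",
    "Scroll.in": "group_b",
    "NewsClick": "group_b",
    "The Hindu": "group_c",
    "NDTV": "group_c",
    "Times of India": "group_c",
}


def groups_covering(outlets):
    """Return the set of outlet groups that reported an incident.

    Outlets missing from OUTLET_GROUPS are ignored rather than guessed at.
    """
    return {OUTLET_GROUPS[name] for name in outlets if name in OUTLET_GROUPS}


def is_triangulated(outlets, min_groups=2):
    """True if reports come from at least `min_groups` distinct groups.

    The default threshold of 2 is an assumed value for illustration.
    """
    return len(groups_covering(outlets)) >= min_groups
```

Under these assumptions, an incident reported by both OpIndia and The Wire counts as triangulated, while one reported only by OpIndia and Republic TV does not, because both outlets fall in the same group.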
Ethical Principles in Data Handling
Documenting harm requires care. Five principles guided our approach throughout the project. Victims' personal details have been anonymised where identification was not essential to understanding the incident, and explicit content has not been reproduced in any form.
What This Database Cannot Tell You
Honest research requires clear-eyed acknowledgement of its limits. Four structural limitations bear directly on how this database should be read and cited:
Where This Research Goes Next
Several directions would meaningfully extend the value of this dataset:
- Incorporate regional-language sources through partnerships with language-specific fact-checkers and regional journalism networks
- Build a self-reported incident pathway — a form that allows victims to submit incidents directly, with appropriate safeguards
- Partner with civil society organisations (iCall, iSafe, CyberPeace Foundation) to capture cases that reach counsellors before they reach police
- Conduct quantitative analysis: response rates by state, FIR-to-incident gap analysis, and regulatory timeline correlations between incidents and policy responses
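As a rough illustration of the FIR-to-incident gap analysis named in the last item, the sketch below computes, for each state, the share of recorded incidents with no FIR filed. It rests on assumptions: the record fields "state" and "fir_filed" are hypothetical and may not match the database's real schema, and defining the gap as a per-state share is one possible reading rather than the project's stated method.

```python
from collections import defaultdict


def fir_gap_by_state(incidents):
    """Share of incidents per state for which no FIR was filed.

    `incidents` is an iterable of dicts with the hypothetical keys
    "state" (str) and "fir_filed" (bool).
    """
    totals = defaultdict(int)
    without_fir = defaultdict(int)
    for incident in incidents:
        state = incident["state"]
        totals[state] += 1
        if not incident["fir_filed"]:
            without_fir[state] += 1
    # Every state in `totals` has at least one incident, so no division by zero.
    return {state: without_fir[state] / totals[state] for state in totals}


# Placeholder records for illustration only; these are not real data.
example = [
    {"state": "State A", "fir_filed": True},
    {"state": "State A", "fir_filed": False},
    {"state": "State B", "fir_filed": False},
]
gaps = fir_gap_by_state(example)
```

Because the database records only publicly reported incidents, any such ratio describes the reported cases and should not be read as an estimate of the true gap.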