Research Methodology

How We Built This Database

A transparent account of our data collection, classification, and ethical approach — and what the database does not capture.

01 / Definitions & Scope

What We Mean — and What We Don't

Our definitions are drawn from the AI Incident Database (AIID) Editor's Guide — the international standard for AI harm documentation. We adopt them here to ensure our database is comparable with global repositories and academically replicable.

AI Incident
An alleged harm or near-harm event to people, property, or the environment where an AI system is implicated. We make no distinction between accidental and deliberate harm — what matters is that harm occurred, not whether it was intended.
AI Issue
An alleged harm by an AI system that has yet to occur or be detected: a risk rather than a realised event. The database includes both incidents (harm or near-harm events that have occurred) and issues (harm anticipated but not yet realised).
Artificial Intelligence (AI)
The capability of machines to perform functions typically thought of as requiring human intelligence: reasoning, recognising patterns, or understanding natural language. Includes but is not limited to machine learning. An algorithm not traditionally considered AI may qualify when a human transfers decision-making authority to the system — e.g. a hospital selecting vaccine candidates via a black-box rule set.
AI System
Technologies and processes in which AI plays a meaningful role. May include non-AI components (mechanical, manual, or rule-based). Examples: a deepfake generator; facial-recognition software; a credit-scoring algorithm; a voice-cloning service; an AI-powered CCTV camera.
Implicated
A system is implicated in an incident if it played an important role in the chain of events leading to harm — at minimum a "but-for" cause: if the AI system hadn't acted as it did, the specific harm would not have occurred. The AI system does not need to be the sole or primary factor; it includes cases where AI had the potential to prevent harm but did not.
Real-World Harm
Includes but is not limited to:
  • Physical health or safety harm
  • Psychological harm
  • Financial harm
  • Damage to physical or intangible property (IP theft, reputational damage)
  • Harm to social or political systems (election interference, erosion of trust in authorities)
  • Civil liberties violations (unjustified punishment, censorship, unlawful surveillance)
Harms do not need to be severe — minor, easily remedied expense or inconvenience still qualifies.
Scope Limitation

This is a repository of AI-related incidents sourced from public media reporting. It documents only what surfaces in the public record, primarily events that news organisations covered as potential crime or illegal activity. It is not designed to capture:

  • AI harms that go undetected by state institutions
  • Systemic harms made invisible by inadequate legal frameworks
  • Incidents reported only in regional or non-English language media

All incident counts should be read as a lower bound on actual AI-related harm in India.

02 / Background

Project Background & Motivation

This project was undertaken as a B.Tech Research Project (BTP) at IIIT Delhi, beginning in December 2024. The incidents in this database are dated from 2021 onwards — that is the time range we researched and documented — but the project itself was built and published in 2024–2025. It arose out of a recognition that no centralised documentation of AI-related harms in India existed. While global conversations about AI risk were accelerating, the lived experience of Indian citizens — facing deepfakes, voice-cloning fraud, and algorithmic bias — was going largely unrecorded in any systematic way.

The launch of ChatGPT in November 2022 was a turning point. It dramatically lowered the technical barrier to AI misuse, and within months deepfakes, voice cloning, and AI-generated disinformation became tools readily available to bad actors. Financial fraud cases involving AI voice impersonation began appearing in state police records across India.

COVID-19 lockdowns had simultaneously pushed India online at unprecedented speed — creating large populations that were connected but not protected. First-time internet users in rural districts encountered AI-enabled scams before developing any digital safety literacy.

We drew inspiration from the AI Incident Database (AIID), launched by the Partnership on AI, which has been tracking AI incidents internationally since late 2020. However, we found that it systematically under-represents non-Western contexts: incidents from India rarely appear, and when they do, they lack the state-level specificity needed for governance analysis. This database aims to fill that gap by providing granular, India-specific documentation that can inform both research and policy.

03 / Scope

Scope of Study

The database covers those of India's 28 states and 8 Union Territories for which publicly documented AI misuse incidents could be identified. The time range is 2021 through 2026, capturing the pre-GenAI period, the rapid escalation that followed the generative AI wave, and the governance responses through the AI Impact Summit.

Our focus is on publicly reported incidents, primarily those covered by news media. We do not claim to capture the full universe of AI-related harm in India — only what surfaces in the public record.

This database captures only what gets reported. Incidents that go unreported due to shame, lack of awareness, or absence of legal frameworks are not counted. Our numbers are a floor, not a ceiling.

04 / Data Collection

How We Collected Data

Incidents were identified through systematic keyword searches across Indian and international news archives, court records, parliamentary questions, and civil society reports. The primary search terms used:

  • AI-assisted financial fraud
  • Obscene deepfake / morphed image
  • Surveillance technologies and biometric risks
  • Algorithmic bias and unequal treatment
  • Psychological harm from AI
  • Digital arrest / impersonation scam
  • Voice cloning fraud
  • AI-generated political content / electoral deepfake

For each incident recorded, we captured the following fields:

State and date of incident
Plain-language description
AI technology involved
Type of harm caused
Who was affected
Financial impact (where reported)
Authority response
Source link to original reporting
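
The fields listed above could be represented as a simple record type. What follows is a minimal sketch, not the project's actual schema: the class name `IncidentRecord`, every field name, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IncidentRecord:
    """One database entry. Field names are illustrative assumptions."""
    state: str                # state or Union Territory
    incident_date: date       # date of incident
    description: str          # plain-language description
    ai_technology: str        # e.g. "voice cloning"
    harm_type: str            # e.g. "financial"
    affected: str             # who was affected (anonymised)
    authority_response: str   # e.g. "complaint filed"
    source_url: str           # link to original reporting
    financial_impact_inr: Optional[int] = None  # only where reported


# Hypothetical entry, invented for illustration; not a real incident.
example = IncidentRecord(
    state="Example State",
    incident_date=date(2023, 1, 1),
    description="Caller used a cloned voice to request an urgent transfer.",
    ai_technology="voice cloning",
    harm_type="financial",
    affected="an elderly resident",
    authority_response="complaint filed",
    source_url="https://example.com/report",
)

# Convert to a plain dict, e.g. for export to CSV or JSON.
row = asdict(example)
```

Keeping the financial impact optional mirrors the "where reported" qualifier: a missing figure stays empty rather than being recorded as zero.
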
05 / Filtering Criteria

What Qualified as an Incident

Not every AI-adjacent news story was included. Three criteria had to be satisfied for an incident to enter the database:

01
Direct AI Involvement
The incident had to involve AI in a direct and meaningful way — not tangentially. A scammer who happened to use a smartphone did not qualify; a scammer who used voice-cloning software to impersonate a relative did.
02
Documented Harm
Actual financial loss, reputational damage, psychological impact, or credible risk to an identifiable person or group had to be present. Speculative future risks without a concrete incident were excluded.
03
Credible Source
Anonymous blogs, unverifiable social media claims, and clearly partisan sites with no corroboration from independent outlets were excluded. Where possible, incidents were verified across two or more independent sources.
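
The three criteria above act as a conjunctive filter: a story enters the database only if all of them hold. A minimal sketch of that logic follows, assuming each candidate story has already been assessed by hand. The names `Candidate`, `qualifies`, and `min_sources` are hypothetical, and the default of one credible source is an assumption (the text treats two or more as preferred "where possible", not mandatory).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate news story. All field names are hypothetical."""
    ai_directly_involved: bool  # criterion 01: direct AI involvement
    harm_documented: bool       # criterion 02: documented harm
    independent_sources: int    # criterion 03: count of credible outlets


def qualifies(story: Candidate, min_sources: int = 1) -> bool:
    """Return True only when all three criteria are satisfied."""
    return (
        story.ai_directly_involved
        and story.harm_documented
        and story.independent_sources >= min_sources
    )
```

For example, `qualifies(Candidate(True, True, 2))` passes, while a story with no direct AI involvement fails regardless of how widely it was reported. Passing `min_sources=2` would enforce the stricter cross-verification standard.
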
06 / Source Strategy

Source Triangulation Strategy

Indian media operates within a polarised political landscape, where outlets at different ends of the spectrum may report, or ignore, the same incident depending on its political valence. To counter this bias, we deliberately sourced across the political spectrum.

Our working assumption: if a genuine AI-harm incident occurred, it is likely to get reported across media of different political alignments, because AI misuse against ordinary citizens does not map cleanly onto ideological divides. An elderly person defrauded by a voice-cloning scam is neither a left-wing nor a right-wing story.

Right-leaning
  • OpIndia
  • Republic TV
  • Zee News
Left-leaning
  • The Wire
  • Scroll.in
  • NewsClick
Centrist / Paper-of-record
  • The Hindu
  • NDTV
  • Times of India
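
One way to make the triangulation idea concrete is to count how many alignment groups reported a given incident. The sketch below is illustrative only: the outlet groupings are copied from the list above, but the function names and the threshold of two groups are assumptions, not a rule stated in this methodology.

```python
# Outlet groupings copied from the list above.
ALIGNMENT = {
    "OpIndia": "right", "Republic TV": "right", "Zee News": "right",
    "The Wire": "left", "Scroll.in": "left", "NewsClick": "left",
    "The Hindu": "centre", "NDTV": "centre", "Times of India": "centre",
}


def alignments_covered(outlets):
    """Return the set of alignment groups among the reporting outlets.

    Outlets not in ALIGNMENT are ignored rather than guessed at.
    """
    return {ALIGNMENT[name] for name in outlets if name in ALIGNMENT}


def is_triangulated(outlets, min_groups=2):
    """True when reports span at least `min_groups` alignment groups.

    The default threshold of 2 is a hypothetical choice for illustration.
    """
    return len(alignments_covered(outlets)) >= min_groups
```

Under these assumptions, an incident reported by both OpIndia and The Wire counts as triangulated, while one reported only by OpIndia and Zee News does not, because both sit in the same group.
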
07 / Ethics

Ethical Principles in Data Handling

Documenting harm requires care. Five principles guided our approach throughout the project. Victims' personal details have been anonymised where identification was not essential to understanding the incident, and explicit content has not been reproduced in any form.

Accountability
Incidents are documented with enough specificity to hold actors — individuals, platforms, or institutions — accountable for the harm caused, without embellishment.
Fairness
We have not documented incidents selectively by political affiliation of the accused or the victim. AI harm affects everyone; our database should reflect that.
Respect for Consent and Autonomy
Victims documented in this database did not consent to the original harm. We minimise secondary exposure by omitting identifying personal details wherever the incident can be understood without them.
Transparency About Decision-Making
This methodology page exists precisely to make our inclusion criteria, source strategy, and limitations visible. Readers should be able to interrogate our choices.
Do No Secondary Harm
Careless documentation can re-traumatise victims or provide a how-to guide for bad actors. We have been deliberate about the level of technical detail included in incident descriptions.
08 / Limitations

What This Database Cannot Tell You

Honest research requires clear-eyed acknowledgement of its limits. Four structural limitations bear directly on how this database should be read and cited:

01 Built almost entirely on English-language sources. Incidents reported only in Hindi, Tamil, Telugu, Bengali, Marathi, or other regional-language outlets are systematically absent — which likely means underrepresentation of incidents in non-metro areas where regional press dominates.
02 Only publicly reported cases are captured. The vast majority of AI-enabled harm never reaches a reporter, a police station, or a civil court. Every figure in this database should be understood as a lower bound on actual incidence.
03 Does not capture slow, diffuse harms: the quiet algorithmic bias baked into employment screening tools, credit-scoring systems, or medical diagnostics used at scale across India. These harms are real but do not manifest as discrete incidents that make the news.
04 The incident-based approach cannot measure the gap between documented incidents and registered First Information Reports (FIRs), or between FIRs and convictions. How the state responds to AI harm once it is reported is a separate, and equally important, research question that this database cannot answer alone.
09 / Future Work

Where This Research Goes Next

Several directions would meaningfully extend the value of this dataset:

Research Team

Team & Affiliation

Anshul Jain
2023102
Dhruv Kantroo
2022167
Siddhant Gautam
2021100
BTP Advisor: Prof. Dr. Manohar Kumar — Department of Social Sciences and Humanities, IIIT Delhi