Exploring the Ethics Gap of AI-Assisted Data Workflows in the Humanitarian and Development Sectors

AI-driven digital transformation in humanitarian work is outpacing the ethical frameworks that govern it. This post examines the gap between the sector's assumptions about anonymisation and consent and the realities of processing vulnerable populations' data through third-party AI systems.


The integration of artificial intelligence and machine learning into international development and humanitarian work is outpacing the evolution of the ethical frameworks and legal safeguards that are supposed to govern it. AI-driven digital transformation promises real gains in operational efficiency and programme effectiveness, but it simultaneously opens up a set of questions that existing governance structures were not designed to answer. These questions sit at the intersection of data protection law, humanitarian principles, power asymmetry, and the practical realities of working in contexts where the people whose data is being processed are among the most vulnerable in the world.

This deep dive explores that gap — not to issue definitive answers, but to surface the questions that organisations in the sector need to be sitting with. The gap is characterised by a divergence between the traditional humanitarian principles of humanity, neutrality, impartiality, and independence, and the opaque, data-intensive mechanisms of modern AI systems. The governance of personal data in the sector remains anchored in a pre-algorithmic era. The transition from human-centric analysis to AI-assisted processing — often by third-party cloud services — introduces risks to privacy, autonomy, and justice that the sector's existing frameworks were never designed to address.

The Anonymisation Threshold: The Question Everything Else Depends On

Before we can ask whether consent forms need updating, whether purpose limitation applies, or whether cross-border transfers are lawful, we need to answer a prior question: is the data that reaches the AI tool still personal data?

Most organisations in the sector are not uploading raw, fully identified interview transcripts to AI tools. Good practice — and common sense — dictates that names, ID numbers, and other direct identifiers are removed before data is submitted to any external service. Many organisations go further, generalising locations, rounding ages, and removing other details that could point to a specific individual.

If this de-identification is robust enough to constitute true anonymisation, then a great deal of the legal and ethical complexity falls away. Truly anonymised data is not personal data under the GDPR. Recital 26 is explicit: the principles of data protection "should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable." If the data is genuinely anonymous, it could be processed on servers on the moon and there would be no consent issue, no purpose limitation issue, and no cross-border transfer issue — because the regulation simply does not apply.

Some organisations have developed rigorous approaches to meeting this bar. The OCHA Centre for Humanitarian Data, for example, employs Statistical Disclosure Control (SDC) as a technical framework to evaluate and mitigate re-identification risks in survey and needs assessment data. This process is applied to all survey datasets hosted on the Humanitarian Data Exchange (HDX) platform, assessing how the combination of specific variables — age, gender, location — can create re-identification probabilities ranging from 1% to 100%. By applying SDC, organisations can reduce these vulnerabilities to an "acceptable level" of risk before sharing sensitive information. However, successful execution of SDC typically requires specialised expertise and secure technical infrastructure — resources that many organisations in the sector do not have. The existence of rigorous approaches like SDC underscores both that the problem is taken seriously and that meeting the threshold is genuinely difficult.
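
To make this concrete, here is a minimal sketch, in Python with pandas, of the kind of check that SDC formalises: for each record, count how many others share the same combination of quasi-identifiers (the record's equivalence class, of size k), treat 1/k as a rough re-identification probability, and see how generalising a variable changes the picture. The column names, toy data, and thresholds below are illustrative assumptions, not the Centre's actual methodology or tooling.

```python
# Illustrative sketch only. Column names, toy values, and the k threshold
# are assumptions for this example, not OCHA's actual SDC workflow.
import pandas as pd

def equivalence_class_sizes(df: pd.DataFrame, quasi_identifiers: list[str]) -> pd.Series:
    """For each record, the number of records (k) sharing the same
    quasi-identifier values. Re-identification risk is roughly 1/k."""
    return df.groupby(quasi_identifiers, observed=True)[quasi_identifiers[0]].transform("size")

def share_below_k(df: pd.DataFrame, quasi_identifiers: list[str], k_min: int) -> float:
    """Fraction of records whose equivalence class is smaller than k_min."""
    return float((equivalence_class_sizes(df, quasi_identifiers) < k_min).mean())

# A toy survey extract with direct identifiers already removed.
survey = pd.DataFrame({
    "age":      [31, 34, 29, 61, 34, 29],
    "gender":   ["F", "F", "F", "M", "F", "F"],
    "district": ["A", "A", "B", "A", "A", "B"],
})

qi = ["age", "gender", "district"]
print(share_below_k(survey, qi, k_min=2))  # ~0.33: two records are unique on raw quasi-identifiers

# A simple SDC-style mitigation: generalise exact ages into bands before sharing.
survey["age_band"] = pd.cut(survey["age"], bins=[0, 17, 34, 49, 64, 120])
print(share_below_k(survey, ["age_band", "gender", "district"], k_min=2))  # ~0.17: risk falls, but does not vanish
```

Real disclosure control work layers further techniques on top of this basic counting exercise (suppression, recoding, perturbation, and measures of how much analytical value is lost in the process), but the core question is the same: how small are the groups that a combination of innocuous-looking variables carves out?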

The question is whether what most organisations are doing actually meets that bar.

The gap between de-identification and true anonymisation

The GDPR draws a sharp distinction between anonymisation and pseudonymisation, and conflating the two is what the Article 29 Working Party (predecessor to the European Data Protection Board) called a "specific pitfall" (WP29, Opinion 05/2014 on Anonymisation Techniques).

Anonymisation renders data permanently unidentifiable. No one — not the organisation that collected it, not a third party with additional datasets, not a motivated attacker — can link it back to an individual. When this standard is met, the GDPR does not apply.

Pseudonymisation removes direct identifiers (names, ID numbers) but leaves the data attributable to a specific person if additional information is available (Article 4(5), GDPR). Pseudonymised data is still personal data. The GDPR still applies in full.

The bar for true anonymisation is extremely high. The IAPP has observed that organisations routinely fall short of it. The European Data Protection Supervisor and the Spanish DPA have jointly acknowledged that significant confusion persists about what anonymisation actually requires in practice. And the Article 29 Working Party's 2014 opinion established that pseudonymisation "allows for identifiability" and "therefore stays inside the scope of the legal regime of data protection."

Why development data is particularly hard to anonymise: the small-n problem

The challenge is not just technical — it is contextual. A well-known study demonstrated that 87% of the U.S. population can be uniquely identified from just three data points: ZIP code, gender, and date of birth (Sweeney, 2000). That finding applied to a national population in the hundreds of millions. In a development programme serving a bounded community, the re-identification risk is dramatically higher.

Consider a maternal health programme operating in a specific district of Jordan, serving Syrian refugee women. An interview transcript has been "anonymised" — the participant's name has been removed, her camp has been generalised to "northern Jordan." But the transcript still contains:

  • Her approximate age
  • The number of children she has
  • The village she fled from in Syria
  • The health facility she attends
  • Her husband's occupation
  • The specific complication she experienced during her last pregnancy

In a programme serving 500 families, this combination of details — none of which is a direct identifier on its own — may be sufficient to identify her. This is the "small-n problem": the smaller and more specific the community, the more likely it is that contextual details function as quasi-identifiers, even after direct identifiers have been removed.
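
A rough way to see why scale matters: simulate two populations with the same attribute distributions, one the size of a bounded programme and one the size of a large national dataset, and count how many records are unique on the same handful of contextual details. The sketch below uses invented attribute names and ranges (nothing here models a real caseload); it is an illustration of the small-n effect, not a risk assessment.

```python
# Illustrative simulation of the small-n problem. Attribute names and ranges
# are invented; only the contrast between population sizes matters here.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def synthetic_population(n: int) -> pd.DataFrame:
    """Draw n records described by contextual details, none of which is a
    direct identifier on its own."""
    return pd.DataFrame({
        "age_band":          rng.integers(0, 8, n),
        "children":          rng.integers(0, 7, n),
        "village_of_origin": rng.integers(0, 40, n),
        "health_facility":   rng.integers(0, 6, n),
        "spouse_occupation": rng.integers(0, 12, n),
    })

def share_unique(df: pd.DataFrame) -> float:
    """Fraction of records whose combination of details appears exactly once."""
    combo_counts = df.value_counts()  # one count per distinct combination of values
    return float((combo_counts == 1).sum() / len(df))

print(share_unique(synthetic_population(500)))      # ~0.99: nearly every record is unique
print(share_unique(synthetic_population(500_000)))  # a few per cent: combinations repeat at scale
```

The exact figures depend entirely on the assumed distributions, but the direction of the effect is robust: the smaller the population relative to the number of plausible attribute combinations, the more records are unique, and the less protection the removal of names actually provides.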

The Journal of International Humanitarian Action has named this directly: "it is often difficult to determine whether data has been sufficiently anonymised, since individuals may be re-identified, by combining other datasets or contextual understanding, especially when the group of data subjects is not sufficiently large" (Kalafatidis et al., "Data to the Rescue," 2020).

Development programmes, by their nature, work with specific, bounded communities. The populations are defined, the contexts are known, and the details that make qualitative data analytically valuable — the richness, the specificity, the texture — are precisely the details that make true anonymisation difficult. There is an inherent tension between data that is useful and data that is truly anonymous.

What does this mean in practice?

The implication is uncomfortable but important: much of what the sector describes as "anonymised" data may, under the GDPR's framework, be more accurately described as pseudonymised data. And pseudonymised data is still personal data.

This does not mean that de-identification is pointless — far from it. Removing names and direct identifiers substantially reduces risk, and pseudonymisation is recognised by the GDPR as a valuable safeguard (Articles 25 and 89). But it does mean that de-identification alone may not be sufficient to place the data outside the scope of data protection law.

And if the data that reaches the AI tool is still personal data, then every subsequent question — about consent, about purpose limitation, about cross-border transfers, about the rights of data subjects — comes back into play.

This is the threshold that determines everything else.

The Consent Gap: When Anonymisation Falls Short

If the data submitted to an AI tool is genuinely anonymised, consent for AI processing is not required — the data is not personal data, and the GDPR does not apply. The ethical questions remain (and we will return to them), but the legal consent framework is not engaged.

But if the de-identified data falls short of true anonymisation — if it is pseudonymised rather than anonymised, still personal data in the eyes of the regulation — then the consent question becomes pressing. And it is here that the sector faces a significant gap.

What existing consent covers

Most data collection consent forms in the development sector were drafted before the widespread availability of generative AI. They typically authorise the collection and use of personal data for specific programme purposes — monitoring, evaluation, reporting, programme improvement. They contemplate human analysis: a member of staff, or a partner organisation, reading and interpreting the data.

They most likely do not contemplate — and do not explicitly permit — the submission of that data to a third-party commercial AI platform for automated processing.

Under the GDPR, consent must be "freely given, specific, informed, and unambiguous" (Article 4(11)). The UK Information Commissioner's Office (ICO) has noted that for consent to be valid in the context of AI, individuals must have a genuine choice and must understand how their data is being used (ICO, "How Do We Ensure Lawfulness in AI?"). In France, the Commission nationale de l'informatique et des libertés (CNIL) has reinforced that data gathered for one purpose should not be repurposed without additional consent or a compatible legal basis (CNIL, "AI System Development: Recommendations to Comply with GDPR," January 2026).

A consent form drafted in 2019 that says "your data will be used for programme monitoring and improvement" was not written with AI processing in mind. It could not have been. The question is whether the consent it secured — for human analysis, within the organisation, for a defined purpose — extends to a categorically different kind of processing.

Three ways the processing has changed

The "who" has changed. Traditional consent forms focus on the organisation and its partners as the entities that will access the data. When data is submitted to a third-party AI platform — Anthropic, OpenAI, Google — a new actor enters the picture, one that the data subject was never told about and has no relationship with. Under GDPR Article 28, would this typically require a Data Processing Agreement?

The "what" has changed. A human analyst reads, interprets, and summarises. An AI system tokenises text, embeds it in a vector space, and processes it through billions of parameters. Whether or not the output is similar, the question is whether consent for human analysis extends to a categorically different mechanism.

The "where" has changed. Data that was collected in Jordan, or the Sahel, or Cox's Bazar may now be transmitted to servers in the United States or Europe for AI processing. If the data is still personal data (because anonymisation was insufficient), then cross-border transfer provisions under GDPR Article 45 would apply — provisions that the original consent form almost certainly did not address. These transfers are strictly governed: GDPR Article 45 requires that the recipient jurisdiction provide an "adequate" level of data protection, or that robust safeguards like Standard Contractual Clauses (SCCs) are in place. And because the use of AI on data from highly vulnerable populations often constitutes "extensive processing" of sensitive information using new technologies, organisations may be required under GDPR Article 35 to conduct a formal Data Protection Impact Assessment (DPIA) — a mandatory risk-management tool designed to evaluate the necessity and proportionality of the processing while ensuring that specific safeguards are implemented to protect the dignity and safety of data subjects.

The question of whether consent can ever be truly "free" in a humanitarian setting

The power imbalance inherent in humanitarian contexts has always complicated consent. This is not a new problem — but AI processing sharpens it in specific ways.

When individuals are fleeing conflict or disaster, the provision of personal data is frequently perceived as a prerequisite for receiving life-saving assistance. The "choice" to consent is shaped by the understanding — sometimes explicit, sometimes implied — that services depend on it.

The most documented illustration of this dynamic is the UNHCR's biometric registration of Rohingya refugees in Bangladesh. Human Rights Watch reported in 2021 that refugees were told to register to receive aid, but the risks of data sharing were not meaningfully discussed. One refugee told HRW: "I could not say no because I needed the Smart Card and I did not think that I could say no to the data-sharing question and still get the card" (Human Rights Watch, "UN Shared Rohingya Data Without Informed Consent," June 2021). The Rohingya case involved biometrics rather than AI, but the structural dynamic — consent given under conditions that make genuine refusal functionally impossible — is the same dynamic that shapes consent for data collection in most humanitarian programmes.

If the original consent was compromised by power asymmetry, extending that data's journey to AI processing does not resolve the problem. It compounds it, by adding a layer of processing that the data subject did not anticipate and cannot meaningfully evaluate.

In addition, under the GDPR, re-using data for new processing activities requires a "compatibility test" (Article 6(4)) to assess whether the new purpose aligns with the original "reasonable expectations" of the individual. Consider a survivor of gender-based violence (GBV) sharing a narrative in order to access care. Processing that data through a third-party LLM to "identify trends" is a significant departure from those original expectations — even if the intent is programmatic improvement.

The explanation gap

There is also a question about whether meaningful informed consent for AI-assisted processing is achievable — for anyone, in any context.

It is tempting to frame this as a problem specific to data subjects in low-resource settings — a question of literacy or technical familiarity. But the honest version of this problem is more uncomfortable. A data scientist in Geneva could not fully explain, in a consent form, what happens to a transcript inside a transformer model's attention layers. The gap is not in the audience's capacity to understand — it is in the inherent difficulty of explaining a process that is, in important respects, opaque even to the people who build it.

This matters because "informed" consent requires that the person can form a reasonable understanding of what they are agreeing to. If the processing itself is too complex to explain meaningfully — to anyone — then the "informed" element of consent is structurally compromised, regardless of how carefully the consent form is drafted.

The question this raises is not easily resolved: if truly informed consent for AI-assisted processing is difficult to achieve even in ideal conditions, what does it mean to seek it in contexts where power imbalances, language barriers, and urgency are already straining the foundations of consent?

The question of historical data

Many organisations in the sector are sitting on substantial repositories of data collected under pre-AI consent forms — years of interview transcripts, survey responses, case notes, and programme records. As AI tools become available, the temptation to apply them retrospectively to this historical data is understandable: the analytical potential is significant, and the data has already been collected.

But the question of whether historical data can be lawfully and ethically processed through AI tools is not straightforward. The IASC Operational Guidance on Data Responsibility in Humanitarian Action (2nd ed., April 2023) emphasises that data management must be "safe, ethical, and effective" across all phases of the data lifecycle — including secondary use. The ICRC/Brussels Privacy Hub Handbook on Data Protection in Humanitarian Action (3rd ed., 2024) provides specific guidance on purpose limitation and secondary processing, including the use of DPIAs as a tool for evaluating whether new processing activities are compatible with the original basis for collection.

In practice, this means that before historical data is submitted to an AI tool, organisations would need to consider several questions. Was the original consent broad enough to encompass this type of processing? If not, is there another lawful basis — such as legitimate interest — that could apply? Has the data been sufficiently anonymised to fall outside the scope of data protection law? And is there a risk of what data protection practitioners call "function creep" — the gradual expansion of data use beyond its original purpose, where data initially collected for emergency relief or programme monitoring is incrementally repurposed for algorithmic analysis, trend modelling, or donor reporting in ways that no one anticipated at the point of collection?

These are not hypothetical concerns. The sector's historical data repositories represent both a significant analytical resource and a significant ethical liability — and the difference between the two depends on whether organisations are willing to examine the basis on which that data was collected before putting it to new uses.

How the landscape has shifted

The following comparison illustrates how the assumptions embedded in traditional consent models diverge from the realities of AI-integrated processing — specifically in cases where the data has not been fully anonymised.

  • Primary processor. Pre-AI: Humanitarian staff and local partners. AI-integrated: Third-party cloud service providers and their systems.
  • Processing method. Pre-AI: Manual, supported by non-AI tools. AI-integrated: Automated pattern recognition, LLM ingestion, predictive modelling.
  • Jurisdictional scope. Pre-AI: Typically local or regional. AI-integrated: Often global, with cross-border transfers to US or EU servers.
  • Data retention. Pre-AI: Linked to project cycle duration. AI-integrated: Varies by provider; may include retention for safety monitoring; some providers do not use API inputs for model training by default, others may.
  • Risk focus. Pre-AI: Physical privacy and local confidentiality. AI-integrated: Algorithmic inference, re-identification via quasi-identifiers, jurisdictional exposure.
  • Transparency to data subject. Pre-AI: Participant can broadly understand who sees their data. AI-integrated: Participant cannot meaningfully evaluate what happens during AI processing.
  • Accountability and redress. Pre-AI: Direct human accountability; participants can challenge a caseworker. AI-integrated: Often "black-box" opacity; the logic is often proprietary, making it difficult for a participant to exercise their "right to an explanation" or to challenge an automated decision.

What this comparison makes visible is that the shift from pre-AI to AI-integrated processing is not incremental — it is a change in kind, not degree. The consent frameworks that worked (imperfectly) for the pre-AI model were not designed for AI-integrated processing. And in cases where the data is still personal data — where anonymisation has not fully succeeded — the same consent forms are being relied upon for a fundamentally different type of processing.

The consent gap is real — but it is conditional. It exists specifically in the space where anonymisation falls short of the GDPR's high bar and where the data that reaches the AI tool remains, in the eyes of the law, personal data. The question for organisations is not just "are we anonymising?" but "is our anonymisation sufficient?" — and, if not, whether the consent they hold covers what they are actually doing.