Ethical Review — The Six Risks That Must Be Assessed When Using AI

Why Ethics Is Not Optional When Using AI

Ethical considerations matter in every use of AI, and they are fundamental when working with under-represented or marginalised stakeholder groups (both traditionally and systematically marginalised). This article addresses them under two headings: data classification and ethical review. Data classification tells you whether you can use AI on a piece of data. Ethical review tells you whether you should.

A task can pass every data classification check (the data is PUBLIC, the tool is approved by your organisation, the output is reviewed) and still cause harm. An AI-generated summary of stakeholder feedback that strips away dissenting voices causes harm. An AI translation that flattens cultural nuance in a mediation brief causes harm. An AI system that decides which beneficiaries receive priority support causes harm. None of these involve a data breach. All of them involve an ethical failure.

The core distinction: Data classification protects information. Ethical review protects people. You need both.

This article covers six ethical risks that arise whenever AI is used in the workplace. Before using AI in your workflows, you should be able to identify which of these risks apply and explain what is in place to mitigate them.

1. Consent

The risk

When you use AI to process information that someone provided to your organisation, did that person know their information would be processed by an AI system? Did they have a genuine choice to say no?

Consent can be structurally compromised. Employees may believe that refusing consent means losing access to opportunities, which is unfortunately sometimes the case. Communities in crisis rarely have the time, information, or power to make fully informed decisions about how their data is used. Language barriers, literacy constraints, and unfamiliar technology concepts make "informed" consent a difficult standard to meet honestly.

When AI enters this picture, the consent problem compounds. A stakeholder consented to participate in a needs assessment survey. They did not consent to their responses being processed through a commercial AI system operated by a company in another country. They did not consent to their words being used as input to a model that may retain, log, or learn from their data. They almost certainly were not told any of this would happen, because at the time of data collection, it probably was not planned.

What to ask yourself

Did the individuals whose information I am about to process through AI know that AI would be involved? If not, is there a way to inform them, or is retrospective consent impractical?
Could any individual reasonably feel that refusing AI processing would affect the support they receive from my organisation? If yes, their consent is not freely given.
Am I processing data for the same purpose it was originally collected for, or has the purpose shifted? Consent given for "monitoring programme outcomes" does not automatically extend to "training an AI model" or "generating donor reports through a third-party AI provider."
If I cannot obtain meaningful consent, am I confident that the use of AI in this context serves the interests of the individuals whose data is involved, not just my organisation’s operational convenience?

The standard to aim for

Consent for AI processing should be specific (to the AI tool and purpose), informed (in accessible language, explaining what happens to the data), and genuinely voluntary (refusal does not affect access to services). Where this standard cannot be met — and unfortunately in some contexts, it cannot — the ethical burden shifts to the organisation to demonstrate that AI use serves the interests of the affected population and that safeguards are in place to prevent harm.

2. Misrepresentation

The risk

AI generates text that sounds authoritative, fluent, and confident. It does not know whether what it generates is true. When AI is used to summarise customer voices, translate stakeholder positions, or draft programme narratives, there is a constant risk that the AI output misrepresents what people actually said, meant, or experienced.

This is not hallucination in the technical sense; it is something more subtle. The AI may produce a grammatically perfect summary that is factually accurate but contextually misleading. It may emphasise themes that appear frequently in its training data while downplaying perspectives that are locally significant but globally uncommon. It may impose analytical frameworks from Western development practice onto contexts where they don’t apply. It may smooth over contradictions, dissent, and complexity in community feedback because coherent summaries are what it was trained to produce.

Misrepresentation is not just an accuracy problem. It is a power problem. When an AI summary of employee consultations is presented to a board or a decision-making body, it carries the authority of the organisation that produced it. If that summary distorts what people actually said (even subtly, even unintentionally), it substitutes an algorithmic interpretation for the authentic human voice. The people who participated in the consultation have no way to know their input was filtered through AI, no way to review how it was represented, and no mechanism to correct errors.

What to ask yourself

If the people whose views this AI output claims to represent could read it, would they recognise their own perspectives? Would they agree this is what they said and meant?
Has the AI simplified, flattened, or homogenised views that were actually diverse, contested, or contradictory?
Am I presenting AI-generated text as though it came directly from community members, partners, or stakeholders? If so, is that honest?
Who reviewed this output for contextual accuracy? Someone who was present for the original data collection, or someone reading the AI summary in isolation?

The standard to aim for

Any AI-generated content that represents the views, experiences, or voices of programme participants, communities, or partners must be reviewed by someone with direct knowledge of the original source material. AI summaries of qualitative data should be treated as first drafts requiring substantive human revision, not as finished analytical products. When AI-generated content is used in external communications, the organisation should be transparent about the role AI played in producing it.

3. Differential Impact

The risk

AI systems do not affect everyone equally. They perform differently across languages, dialects, cultural contexts, genders, age groups, and levels of digital literacy. When your organisation adopts AI tools, the benefits and risks are distributed unevenly across the people you serve and the people you employ.

Language bias is the most immediate example. Most commercial AI models were trained predominantly on English-language text from high-income countries. Their performance degrades for dialects, indigenous languages, local vernaculars, and code-switched communication patterns that are common in multilingual development contexts. An AI tool that produces excellent English summaries may, for instance, produce misleading Arabic translations. A beneficiary feedback system powered by AI may work well for literate, digitally connected urban populations and fail entirely for rural communities communicating through oral traditions.

Gender bias is equally significant. AI models trained on historical data encode historical patterns of exclusion. A model that has learned from decades of published literature in which women’s contributions are systematically underrepresented may reproduce that underrepresentation in its outputs. An AI tool used to identify "key stakeholders" may systematically underweight women’s organisations, informal women’s networks, and female leaders because they appear less frequently in the data the model was trained on.

Within your own organisation, differential impact also applies to staff. AI tools that require strong English literacy, digital confidence, and familiarity with prompt engineering favour staff who already have those skills, who are often younger, urban, internationally educated team members. Staff who speak local languages and carry decades of contextual knowledge may find themselves marginalised by tools that don’t accommodate how they work.

What to ask yourself

Who benefits most from this AI tool, and who benefits least? Are the people who benefit least also the people with the least power in this context?
Does this AI tool perform equally well across the languages, dialects, and communication styles used by the communities I serve? Have I tested this, or am I assuming?
Could AI-generated analysis or recommendations systematically disadvantage specific groups (women, ethnic minorities, displaced populations, people with disabilities) because of gaps or biases in the model’s training data?
Within my own team, does AI adoption create a new hierarchy between those who can use the tools effectively and those who cannot? What am I doing to prevent that?

The standard to aim for

AI tools should be evaluated for differential performance across the specific populations, languages, and contexts relevant to your programme before they are relied upon for programme decisions. When AI outputs inform resource allocation, beneficiary selection, or programme design, the organisation must verify that the outputs do not systematically disadvantage already-marginalised groups. AI adoption within the organisation should include equitable access to training and tools for all staff, not just those who are already digitally confident.
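
One way to move from assuming to testing is to have reviewers who know each language judge a small sample of AI outputs, then compare the results by group. The following is a minimal sketch under stated assumptions: the review records are invented for illustration, and the function names and the `max_gap` threshold are hypothetical choices that your organisation would need to set for itself.

```python
from collections import defaultdict

# Hypothetical review records: (group, reviewer_judged_acceptable).
# In practice these would come from human reviewers who know the language.
reviews = [
    ("English", True), ("English", True), ("English", True), ("English", False),
    ("Arabic", True), ("Arabic", False), ("Arabic", False), ("Arabic", False),
]

def pass_rates(records):
    """Return the share of outputs judged acceptable for each group."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for group, acceptable in records:
        totals[group] += 1
        if acceptable:
            passes[group] += 1
    return {group: passes[group] / totals[group] for group in totals}

def flag_gaps(rates, max_gap=0.10):
    """List groups whose pass rate trails the best group by more than max_gap.

    max_gap is an illustrative threshold, not a recommended value.
    """
    best = max(rates.values())
    return sorted(group for group, rate in rates.items() if best - rate > max_gap)

rates = pass_rates(reviews)
flagged = flag_gaps(rates)
print(rates)
print(flagged)
```

A flagged group is a prompt for closer human review, not a verdict. A sample this small can only show that a gap may exist, and the judgment about whether the tool is fit for that population stays with people who know the context.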

4. Automation of Power

The risk

Every AI system makes choices. It decides what to include in a summary and what to leave out. It decides how to translate a phrase and which meaning to prioritise when multiple valid translations exist. It decides which patterns in your data are "significant" and which are noise. These choices are invisible because they happen inside the model, but they are choices nonetheless, and they carry power.

When your organisation uses AI to process community feedback, the AI decides which themes to surface. When your organisation uses AI to prioritise needs assessment data, the AI decides which needs rank highest. When your organisation uses AI to draft programme recommendations, the AI decides what the data "says." In each case, a decision that previously required human expertise, contextual judgment, and accountability has been delegated to a system that has none of these things.

This is the automation of power: the transfer of interpretive and decision-making authority from people who can be questioned, challenged, and held accountable, to systems that cannot. The risk is not that the AI makes bad decisions (though it may). The risk is that nobody notices a decision was made at all, because the AI presents its output as though it were neutral, objective, and inevitable.

In some sectors, this risk has a specific dimension: the power dynamics are already imbalanced. Decisions made in headquarters affect communities in the field. Decisions made by international staff affect local partners. Decisions made by donors affect implementing organisations. When AI is inserted into these chains, it can amplify existing power imbalances, concentrating decision-making authority in the hands of those who control the AI tools while further distancing affected communities from the decisions that shape their lives.

What to ask yourself

What decision is this AI output informing? Who used to make this decision, and what expertise and contextual knowledge did they bring? Is the AI an adequate substitute?
If I adopt this AI output as my recommendation or conclusion, can I explain why the AI reached this output? If I cannot, I am delegating a decision I don’t understand to a system that can’t explain itself.
Does this AI use reduce the role of local staff, local partners, or affected individuals in decisions that affect them? If so, have I justified that reduction?
If this AI output turned out to be wrong or biased, who would be accountable? If the answer is "nobody, because the AI did it," I have an accountability gap.

The standard to aim for

AI should inform human decisions, not replace them. The person who signs off on a programme recommendation, a resource allocation, or a new job position must be able to explain the reasoning behind that decision in their own words, not simply point to an AI output. When AI is used to analyse data, prioritise needs, or generate recommendations, the human decision-maker must review the output critically, apply contextual judgment, and take personal responsibility for the final decision. The phrase "the AI recommended this" is never a sufficient justification.

5. Do No Harm

The risk

The Do No Harm principle requires that your interventions do not create new risks for the stakeholder groups you serve. This principle applies to AI use just as it applies to every other aspect of your work.

AI can cause harm in ways that are not immediately visible. A chatbot deployed to provide information may give incorrect guidance about what procedures to follow, and the person who follows that guidance may lose opportunities or, worse, a legal claim. An AI system that flags "high-risk" households for targeted support may also flag those same households to actors who would exploit that information. An AI translation that subtly distorts a mediation position may derail a negotiation that took months to build.

The Do No Harm assessment for AI requires you to think beyond the intended use of the tool and consider what happens when the tool fails. AI systems do not fail gracefully. Rather than saying "I don’t know", they may generate a confident, plausible response that is entirely wrong. They do not flag edge cases; they treat unusual situations with the same confidence as routine ones.

What to ask yourself — the three Do No Harm questions

Question 1: Could this AI use, if the system produces an incorrect output, cause direct or indirect harm to any individual or stakeholder group my organisation serves?

Consider: an AI-generated needs assessment summary that underestimates job insecurity in a specific regional office. A translated protection referral that contains a critical error. An AI-drafted report that overstates programme results, leading to inappropriate programme adjustments. In each case, the harm flows not from the AI itself but from the decisions made based on its output.

Question 2: Could this AI use, if the data is intercepted, leaked, or compelled by a government authority, expose any individual to persecution, detention, violence, or discrimination?

Consider: data transiting through AI infrastructure in a jurisdiction where government agencies can compel disclosure. Conversation logs retained by AI providers that contain sensitive information. AI-generated outputs stored on unsecured devices that could be seized at a checkpoint.

Question 3: Could this AI use, even if technically successful, undermine the agency, dignity, or authentic voice of the stakeholder groups my organisation represents?

Consider: an AI system that summarises employee consultations more efficiently than participatory analysis, but in doing so, removes employees from the process of interpreting their own experience. A chatbot that replaces human interaction for individuals who need to be heard, not processed.

The standard to aim for

If the answer to any of the three questions is "yes" or "possibly," the AI use should not proceed without documented risk mitigation measures approved by senior management. The Do No Harm assessment is not a one-time checklist. It should be revisited whenever the context changes, the tool is updated, or the use case evolves.
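
The rule above can be written down as a simple gate, which may help when recording assessments consistently. This is a minimal sketch, not a prescribed implementation: the class name, field names, and record-keeping flags are hypothetical, and treating any answer other than a clear "no" as a risk is a conservative assumption that goes slightly beyond the "yes" or "possibly" wording.

```python
from dataclasses import dataclass

@dataclass
class DoNoHarmAssessment:
    # Answers to the three questions: "yes", "possibly", or "no".
    harm_if_output_wrong: str
    exposure_if_data_disclosed: str
    undermines_agency_or_dignity: str
    # Hypothetical record-keeping flags, not taken from the article.
    mitigation_documented: bool = False
    approved_by_senior_management: bool = False

    def risk_identified(self) -> bool:
        """True unless all three questions were answered with a clear "no"."""
        answers = (
            self.harm_if_output_wrong,
            self.exposure_if_data_disclosed,
            self.undermines_agency_or_dignity,
        )
        # Assumption: anything other than "no" (including blanks) counts as a risk.
        return any(answer.strip().lower() != "no" for answer in answers)

    def may_proceed(self) -> bool:
        """Apply the rule: a flagged risk needs documented, approved mitigation."""
        if not self.risk_identified():
            return True
        return self.mitigation_documented and self.approved_by_senior_management

assessment = DoNoHarmAssessment("no", "possibly", "no")
print(assessment.may_proceed())  # False: mitigation is neither documented nor approved
```

Because the assessment is not a one-time checklist, a record like this would need to be created again whenever the context changes, the tool is updated, or the use case evolves.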

6. Quality Assurance

The risk

AI output quality is inconsistent. The same tool can produce an excellent summary of one document and a dangerously misleading summary of the next. It can translate a paragraph flawlessly and hallucinate a statistic in the following sentence. There is no reliable way to predict when AI will fail, because the failure mode — generating confident, plausible, wrong output — looks identical to success.

In some sectors, the consequences of quality failure are more severe than in others. A hallucinated statistic in a corporate marketing email is embarrassing. A hallucinated statistic in a donor report triggers audit findings, damages organisational credibility, and may result in the return of funds. A hallucinated statistic in an advocacy brief submitted to a UN body undermines the credibility of the communities the brief was meant to represent.

Quality assurance for AI is different from quality assurance for human-produced work. When a colleague writes a report, you can ask them where a number came from. When AI writes a report, the number may come from nowhere: it was generated because it sounded plausible in context. You cannot ask the AI to show its source, because it may not have one. You cannot ask the AI whether it is confident, because it always sounds confident.

What to ask yourself

Have I verified every factual claim, statistic, citation, and proper name in this AI output against a primary source? If I have not, I am trusting the AI to be accurate, and the AI does not know whether it is accurate.
Am I relying on AI output for a document that will be submitted to a donor, a government, an international body, or a public audience? If so, the verification standard must be higher, not lower, than for internal documents.
Does this AI output contain analysis or interpretation that I am treating as my own? If I present an AI-generated insight in a programme report, I am accountable for that insight: its accuracy, its appropriateness, and its implications.
Have I checked whether the AI has fabricated sources? AI systems routinely generate citations to documents, reports, and academic papers that do not exist. If the output includes references, verify that they are real.

The standard to aim for

Every AI output used in programme work must be reviewed by a qualified human before it is acted upon, shared externally, or incorporated into official documents. The reviewer must have sufficient subject-matter knowledge to identify errors that would not be obvious to a general reader. "I ran it through AI" is not a quality assurance step; it is the step that creates the need for quality assurance.

Organisations should establish a clear rule: AI outputs are drafts, not products. They are starting points for human work, not substitutes for it. The moment an AI draft is accepted without substantive human review, quality assurance has failed, regardless of whether the specific output happened to be accurate.

The Ethical Review in Practice

These six risks do not operate in isolation. A single AI use case can trigger multiple risks simultaneously. Consider an AI system that processes beneficiary feedback without consent (consent risk), generates a summary that misrepresents minority views (misrepresentation risk), performs poorly on a local dialect spoken primarily by marginalised groups (differential impact risk), replaces a participatory analysis process that previously gave communities a voice in programme decisions (automation of power risk), could cause harm if its output is wrong and that error is not caught (do no harm risk), and produces outputs that are treated as authoritative without verification (quality assurance risk).

The ethical review is not a form to fill in. It is a discipline, a habit of asking whenever you use AI: who could be affected by this, and have I done enough to protect them?

Quick Reference: The Six Questions

Before using AI on any programme task, answer these six questions:

Consent: Did the people whose data I am processing know AI would be involved, and did they have a genuine choice?
Misrepresentation: Would the people this output claims to represent recognise and agree with how their views are presented?
Differential Impact: Does this AI tool perform equally well for all the populations, languages, and contexts in my programme?
Automation of Power: Am I delegating a decision to AI that requires human judgment, and is a human still accountable for the outcome?
Do No Harm: Could this AI use cause harm — through error, data exposure, or loss of individual/stakeholder group agency — even if it works as intended?
Quality Assurance: Have I verified this output against primary sources, and is a qualified person reviewing it before it is used?

If there is any question you cannot answer confidently, stop and consult your organisation’s AI governance lead before proceeding. The cost of pausing is measured in minutes. The cost of getting it wrong is measured in trust. Trust, once lost in the communities we serve, is not easily rebuilt.
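
For teams that log their reviews, the stop-and-consult rule can be sketched as a short check that names the risks still open. This is an illustration only: the function name and the dictionary structure are hypothetical, and a missing answer is assumed to count as unanswered.

```python
SIX_RISKS = (
    "Consent",
    "Misrepresentation",
    "Differential Impact",
    "Automation of Power",
    "Do No Harm",
    "Quality Assurance",
)

def unresolved_risks(confident_answers):
    """Return the risks that still need to be raised before proceeding.

    confident_answers maps a risk name to True when its question has been
    answered confidently. A missing entry counts as unanswered.
    """
    return [risk for risk in SIX_RISKS if not confident_answers.get(risk, False)]

answers = {risk: True for risk in SIX_RISKS}
# Example: the tool was never tested on a local dialect.
answers["Differential Impact"] = False

to_escalate = unresolved_risks(answers)
if to_escalate:
    print("Stop and consult the AI governance lead about:", ", ".join(to_escalate))
else:
    print("All six questions answered confidently.")
```

The check records whether each question was answered, not whether the answer was honest or well informed. That part remains a human responsibility.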
