Sonia Ingram

Sexist recruitment tools, racist recidivism tools, both sexist and racist facial recognition systems: Artificial Intelligence (AI) has been getting quite a bad rap in the last few years. Is this the fault of the AI itself, or the predictable consequence of model developers treating ethics as an afterthought?

AI is booming. Almost every aspect of our lives is being condensed down into numbers crunched by machine learning models: from recruitment to finding your next beau, and from shopping to healthcare. Now more than ever, it is clear that ethics should be considered front and centre when designing AI systems.

So, what exactly does ethics mean when it comes to Data Science and AI? It’s a huge topic and one that leads to a plethora of perspectives, tangents and unanswerable questions. A brilliant and concise overview of the most important areas for consideration is provided by the Coursera course Data Science Ethics. Topics include consent, data ownership, privacy, anonymity, fairness and societal consequences, each a multifaceted and complex problem in itself, but each an essential consideration when planning a model.

Following the fallout from the Cambridge Analytica scandal, many companies and organisations have developed ethical frameworks they strive to adhere to, but these give little in the way of practical advice on developing ethical AI. The Machine Intelligence Garage, an EU-funded Digital Catapult programme helping start-ups adopt AI, has developed one of the more comprehensive ethical frameworks, covering a multitude of ethical questions to address when building a model, including the ethical impact of your business model. A guide for using AI in the public sector, along with an accompanying ethical framework, has also recently been published. That guide is public-sector specific, directing readers to the appropriate government approval bodies, but what many of these frameworks and guides lack is clear, concise and practical advice for actually implementing ethics when designing and building AI systems. How should you assess whether a model is ethical? What key performance indicators exist for ethics?

At the highest level of ethics in AI is the overall purpose of the model: for example, would it be ethical to build a model to assist terrorism? This is where the sage words of Jeff Goldblum’s character in Jurassic Park apply: “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.” The ‘should’ is the first and most important question to ask.

The next level concerns the data required to build the model: is it collected in an ethical way? Do the people concerned know about and consent to this use of their data? Is there a way to remove subjects’ data if they request it? The General Data Protection Regulation (GDPR) aims to ensure that companies in the European Union approach data collection and storage responsibly, but what about people from countries with less stringent data regulations? Should we use their data differently? At the most granular level are the ethical considerations involved in actually building the model, but we’ll save those for another blog post.

Data is one of the most critical considerations when it comes to ethics, and not only for the legislative reasons mentioned above: the data used to train your model defines your model. Garbage in, garbage out. Biased data has been the cause of many of the AI failures we have seen in recent years. This is not just embarrassing for the company responsible but potentially life-changing, as with the racist recidivism model developed for the US judicial system. Another example of biased data came when Amazon developed a model to help screen job applicants. Because historically many successful applicants had been male, the model used the male gender as a predictive feature, ultimately producing a sexist model. It may make business sense to develop an AI-powered recruitment tool to drive a more efficient process, but Amazon should have taken the time to consider the data available and its possible implications to determine whether this was an ethical use of AI.
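
The “garbage in, garbage out” effect is easy to reproduce. Below is a minimal, entirely synthetic sketch (not Amazon’s actual data or model — the features, numbers and decision rule are all invented for illustration) in which a simple logistic regression, trained on hiring decisions that historically favoured men, learns a large positive weight on the gender feature:

```python
import math
import random

random.seed(0)

def make_biased_dataset(n=2000):
    """Synthetic stand-in for biased historical hiring data.

    Feature 0 = years of experience (genuinely relevant, scaled 0-1).
    Feature 1 = is_male (should be irrelevant, but past decisions favoured men).
    """
    rows, labels = [], []
    for _ in range(n):
        experience = random.uniform(0.0, 1.0)
        is_male = random.random() < 0.5
        # Historical decision rule: experience matters, but men got a large boost.
        hired = (experience + (0.8 if is_male else 0.0) + random.gauss(0, 0.2)) > 0.9
        rows.append([experience, 1.0 if is_male else 0.0])
        labels.append(1.0 if hired else 0.0)
    return rows, labels

def train_logistic(rows, labels, lr=0.5, epochs=300):
    """Plain batch gradient descent on logistic loss (no libraries needed)."""
    w, b, n = [0.0, 0.0], 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in zip(rows, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b

rows, labels = make_biased_dataset()
w, b = train_logistic(rows, labels)
print(f"weight on experience: {w[0]:.2f}")
print(f"weight on is_male:    {w[1]:.2f}")  # clearly positive: the model has absorbed the historical bias
```

The model faithfully reproduces the bias baked into its training labels. Note that simply deleting the gender column would not be a complete fix either: other features can act as proxies for gender, as was reportedly the case with Amazon’s tool penalising CVs containing the word “women’s”.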

People have unconscious bias. Everyone does; it’s a fact of life, and most people don’t realise when they are being biased. This is why it’s so important to ensure that diverse teams are used for AI development (and really the development of anything!). If Amazon’s facial recognition development team had been more diverse, perhaps they would have spotted its inability to recognise dark-skinned and female faces. Likewise, Google’s sexist word associations (e.g. doctor is to man as nurse is to woman) could have been picked up on. There’s a great quote in Invisible Women which applies here and should be extended to include ethnicity: “when we are designing a world that is meant for everyone we need women in the room…failing to include the perspective of women is a huge driver of unintended male bias that attempts (often in good faith) to pass itself off as ‘gender neutral’”.

Examples like these lead me back to my initial point: ethics cannot be an afterthought when adopting AI. The impact of AI on our everyday lives is growing by the minute, so ethics needs to be a key consideration throughout the life cycle of Data Science models. It should feature in timescale, budget and resource discussions rather than being tacked on at the end. Ethical governance of AI models is coming, and we need to be prepared.