Machine learning, and the related but broader concept of ‘artificial intelligence’, sound magical on first encounter.
Machine learning uses sophisticated algorithms and computing power to detect patterns in large amounts of data. This pattern detection is done in a continual, iterative way that allows the machine to ‘learn’ and make predictions about the future.
Machine learning has the potential to enable more intelligent and impartial public policy. The possibilities are widespread: bail decisions, immigration decisions and the detection of tax avoidance are just some of the emerging areas of opportunity. Recognising the potential of applying machine learning across government, Carnegie Mellon now offers a joint PhD program in machine learning and public policy.
Unfortunately, using machine learning in a public policy context is a high-stakes game that comes with three main challenges:
- The consequences of errors are high. A poor recommendation by Amazon or Netflix may cause reputational damage, but an improper decision by government may affect individual liberty. The recent debt recovery errors by Centrelink show the ‘high-stakes’ consequences of poor implementation of automated administrative decision-making.
- Machine learning is particularly susceptible to inheriting human bias. Commentator Maciej Cegłowski recently described machine learning as ‘money laundering for bias’. It relies on historical data that is likely to include traces of existing bias from human decision-making and culture. For example, a recent ProPublica study of a machine learning algorithm used to produce risk assessments for bail and sentencing in the US highlighted a racial bias against black defendants. Likewise, a recent study of a natural language processing algorithm found that it had inherited a gender bias that led it to associate the word ‘doctor’ more frequently with male pronouns.
- Machine learning can be inexplicable. More advanced algorithms, such as deep learning, are notoriously hard to interpret and are often depicted as a ‘black box’. They provide outputs based on a vast array of input data without being able to explain how the outputs were derived or which inputs were important. This might be appropriate for a self-driving car (although potentially unsettling), but is challenging for policy-makers who will be held accountable for decisions.
To mitigate these challenges, we recommend policy-makers apply the following five steps:
1. Determine if the decision is appropriate for machine learning
Machine learning is not suitable for all types of decisions. The decisions that are amenable to machine learning generally require a prediction about the future. Researchers from Harvard Business School, who studied the application of machine learning to bail decisions, noted subtle differences between bail decisions and sentencing that may make bail decisions the more suitable of the two. Bail explicitly requires a decision about an individual’s likelihood of reoffending, whereas sentencing “also depends on things like society’s sense of retribution, mercy, and redemption, which cannot be directly measured”.
2. Interrogate data used to develop a machine learning model
Machine learning models are ‘trained’ using what is referred to as ‘labelled data’. This is a historical data set that the model learns from in order to start making future predictions. Training data can contain implicit or explicit human bias. For example, in the context of bail decisions, historical data may reflect racial biases held by human decision-makers. To overcome this, policy-makers should use their nuanced understanding of the labelled data to highlight the potential pitfalls of using it as machine learning training data.
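One simple way to begin interrogating labelled data is to compare how often the historical label is positive for each group. The sketch below is illustrative only: the records, the field names `group` and `denied_bail`, and the helper `label_rate_by_group` are all hypothetical and are not drawn from any real data set. A gap between groups does not prove bias on its own, but it shows where policy-makers should ask questions before the data is used for training.

```python
from collections import defaultdict


def label_rate_by_group(records, group_key, label_key):
    """Return the share of positive labels for each group in the data."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        totals[row[group_key]] += 1
        if row[label_key]:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical historical decisions (made-up values, for illustration only).
history = [
    {"group": "A", "denied_bail": True},
    {"group": "A", "denied_bail": False},
    {"group": "A", "denied_bail": False},
    {"group": "A", "denied_bail": False},
    {"group": "B", "denied_bail": True},
    {"group": "B", "denied_bail": True},
    {"group": "B", "denied_bail": False},
    {"group": "B", "denied_bail": False},
]

rates = label_rate_by_group(history, "group", "denied_bail")
for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} of past decisions denied bail")
```

In this made-up data, group B was denied bail twice as often as group A. A model trained on such labels would be likely to reproduce that pattern unless the difference can be explained and addressed.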
3. Monitor the outputs of models for bias and accuracy
The development of a model represents the beginning of the application of machine learning, not the end. Policy-makers must closely monitor outputs to determine whether the machine learning model is working as intended.
This monitoring process should include testing the model with ‘unseen’ data. A practical way to do this is to withhold a ‘test set’ of data during the build phase so it can be used later for evaluation. When testing, policy-makers should be clear on the relative importance of false negatives versus false positives. The computer that developed the model, and potentially the project team, will be indifferent to error type. But in reality, the consequences of a false negative (such as someone released on bail who goes on to reoffend) may be much more significant than those of a false positive (such as someone held on bail who would not have reoffended).
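The following minimal sketch shows both ideas: withholding a test set and counting the two error types separately. The helper names and the cost weights are assumptions made for illustration. In particular, treating a false negative as five times as costly as a false positive is an arbitrary placeholder; the real weighting is a policy judgement, not a technical one.

```python
import random


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and withhold a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def count_errors(actual, predicted):
    """Count (false positives, false negatives); True means 'reoffended'."""
    false_pos = sum(1 for a, p in zip(actual, predicted) if p and not a)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return false_pos, false_neg


def weighted_cost(false_pos, false_neg, fp_cost=1.0, fn_cost=5.0):
    """Combine error counts using assumed (hypothetical) cost weights."""
    return false_pos * fp_cost + false_neg * fn_cost


# Withhold 25% of 20 hypothetical case IDs for later evaluation.
train, test = split_train_test(range(20))

# Hypothetical outcomes and model predictions for six test cases.
actual = [True, False, True, False, False, True]
predicted = [True, True, False, False, False, False]

false_pos, false_neg = count_errors(actual, predicted)
print(f"false positives: {false_pos}, false negatives: {false_neg}")
print(f"weighted cost: {weighted_cost(false_pos, false_neg)}")
```

Reporting the two counts separately, rather than a single accuracy figure, keeps the choice about which error matters more in the hands of policy-makers.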
Once operationalised, policy-makers should create a system for ongoing monitoring. A model may be accurate initially but decline in accuracy over time. Ongoing monitoring will allow the team to adjust the model to reflect changes in the external environment and human behaviour over time.
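Ongoing monitoring can be as simple as recomputing accuracy for each reporting period and flagging any period that falls too far below the level measured at launch. The sketch below assumes that actual outcomes eventually become known for each period. The period labels, the baseline of 0.8 and the tolerance of 0.1 are hypothetical values chosen only for illustration.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the actual outcome (non-empty lists)."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def flag_declining_periods(periods, baseline, tolerance):
    """Return labels of periods whose accuracy is below baseline - tolerance."""
    flagged = []
    for label, actual, predicted in periods:
        if accuracy(actual, predicted) < baseline - tolerance:
            flagged.append(label)
    return flagged


# Hypothetical (label, actual outcomes, model predictions) per quarter.
periods = [
    ("Q1", [1, 0, 1, 0, 1], [1, 0, 1, 0, 1]),  # 5 of 5 correct
    ("Q2", [1, 0, 0, 0, 1], [1, 0, 1, 0, 1]),  # 4 of 5 correct
    ("Q3", [0, 0, 1, 1, 0], [1, 0, 1, 0, 1]),  # 2 of 5 correct
]

flagged = flag_declining_periods(periods, baseline=0.8, tolerance=0.1)
print("periods needing review:", flagged)
```

A flagged period is a prompt for the team to investigate what has changed, not an automatic trigger to retrain the model.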
4. Assess the trade-off between explicability and prediction power
The best model for prediction may not be the easiest to understand or explain. There can be an inverse relationship between the explicability of a machine learning model and its predictive power. A simple linear regression model is easy to understand, because a change in a predictor has a linear impact on the predicted outcome, but real-world relationships are rarely linear. In contrast, a more complex ‘deep learning’ model uses ‘artificial neural networks’ to make predictions that can be more accurate but may not be explainable. Sophisticated users of machine learning may be willing to accept lower, but still adequate, accuracy in exchange for increased explicability to stakeholders.
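To illustrate why a linear model is easy to explain, the sketch below breaks a prediction into the contribution of each predictor. The predictor names, coefficients and intercept are hypothetical values invented for this example; they do not come from any real risk model.

```python
def predict(intercept, coefficients, features):
    """Linear prediction: intercept plus coefficient times value per predictor."""
    return intercept + sum(coefficients[name] * value
                           for name, value in features.items())


def contributions(coefficients, features):
    """How much each predictor adds to (or subtracts from) the prediction."""
    return {name: coefficients[name] * value
            for name, value in features.items()}


# Hypothetical fitted model (made-up numbers, for illustration only).
intercept = 1.0
coefficients = {"prior_offences": 0.5, "years_since_last_offence": -0.25}

person = {"prior_offences": 2, "years_since_last_offence": 4}
score = predict(intercept, coefficients, person)
print("score:", score)
print("contributions:", contributions(coefficients, person))

# One more prior offence moves the score by exactly that coefficient.
one_more = dict(person, prior_offences=3)
print("change in score:", predict(intercept, coefficients, one_more) - score)
```

Every part of the score can be traced to a named input, which is exactly what a deep learning model generally cannot offer.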
5. Design ways for human decision-makers to make better decisions
Finally, the best decisions are often made through a combination of machine learning and human decision-making. The analogy of ‘freestyle chess’ is directly relevant to the use of machine learning in a policy setting. Freestyle chess is a variant of chess where human players can use computer assistants. Grandmaster Garry Kasparov (famously defeated by Deep Blue) observed that “weak human + machine + better process was superior to a strong computer alone”. The thinking of ‘freestyle chess’ should inform policy-makers using machine learning. The challenge should be to design systems and processes and develop the capability of staff to make better decisions with the aid of machine learning and other forms of artificial intelligence.
Nous’ practice uniquely combines analytics, design and public policy to improve outcomes for citizens. Get in touch to discuss how you can effectively apply machine learning to implement public policy.