On the Ethical Limits of Natural Language Processing on Legal Text: A Reality Check

A blog post on academic freedom, integrity and ethical reviews in natural language processing (NLP) was recently published and circulated on social media. The post, authored by Prof. Emily M. Bender, criticizes our paper “On the Ethical Limits of Natural Language Processing on Legal Text” (to appear in Findings of ACL 2021). Prof. Bender’s post makes an important contribution to an ongoing discussion about the scope, procedures and role of ethical considerations in (legal, but not solely) NLP. In this post, we dispel some misconceptions about our paper and then move on to address substantive criticisms.

Our paper focuses on certain ethical considerations that we think should be pondered when deciding whether to engage in legal NLP research (from the point of view of the researcher) or when thinking about how to ethically evaluate research (from the point of view of, say, ethics committees). The paper is not comprehensive; not all relevant issues are tackled. Instead, we zero in on three normative parameters that, to the best of our knowledge, have not been extensively discussed so far: a) the importance of academic freedom; b) the existence of a wide diversity of legal and ethical norms both domestically and internationally; c) the threat of moralism in research related to legal NLP. In the remainder of the post, we focus on the first two parameters, which have been taken up by Prof. Bender. But let us begin by making clear what we do not claim.

Dispelling Misconceptions
First, the parameters we focus on are not the only relevant ones, nor are they always the most weighty. To take a rather obvious example, the avoidance of foreseeable and specific harm to others (i.e., a violation of their rights) trumps academic freedom. Hence, despite what Prof. Bender implies, we do not hold that ‘academic freedom entails the freedom to do whatever, with no regards for the rights of others’.

Second, none of the parameters we zero in on should be understood as ‘calling the shots’ in any absolute way. On the contrary, they are merely pro tanto reasons, i.e., factors that should be considered in ethical evaluation along with all other applicable reasons and always within specific contexts. So, contrary to what Prof. Bender claims, we do not hold ethical judgments to be ‘binary’. Indeed, ethical judgments are much more complex than that.

Academic Freedom and the allocation of the burden of proof
In the paper, our recommendation about academic freedom is this: in decisions about the ethics of legal NLP research, deliberation should begin with a (rebuttable) presumption that academic freedom should not be curtailed unless there are compelling reasons to decide otherwise. With respect to this recommendation, we wish to clarify two points.

First, academic freedom is not an absolute right and researchers do not get ‘carte blanche’ to do ‘mad science’ simply by invoking their supposedly overriding academic freedom. Academic freedom should always be reconciled with other ethical considerations, including (other) people’s rights. Moreover, ‘mad science’ cases can readily be caught at the institutional level by Institutional Review Boards.

Second, and more substantively, our recommendation sets out a rule about the allocation of the burden of proof. According to that rule, when someone claims that some research project could, for example, have harmful effects, the burden of proof lies with the person making the claim. Crucially, successfully meeting the burden requires more than just armchair speculation about potential harmful effects. The evidence that such harms could occur must be real and specific. Thus, we believe: (a) that critics of a research project should always bear the burden of presenting specific, as opposed to generic, evidence about potential risks; (b) that, at the very least, risks that are known to be speculative should not be considered.

To make things concrete, we discuss in the paper a specific example involving legal NLP research on the prediction of case outcomes of the Chinese Supreme Court and the risks, real or imagined, that such research poses. Potential risks had been extensively discussed in a previous paper by Leins et al., which helpfully sparked the discussion. One of the issues concerned the dangers that the non-anonymization of the dataset posed to individuals. This included the risk that, say, convicted individuals could be identified through the dataset long after they had served their sentence. In her post, Prof. Bender claims that we minimize these risks. But we do nothing of the sort. Instead, what we do is apply the proposed burden-of-proof rule to this scenario to suggest how evidence should be adduced when discussing such cases. First, we said, Leins et al. had not presented sufficiently specific evidence. Second, we speculated, contra Leins et al., that the independent risk that the dataset presented was not significant, since individuals were in any case identifiable mainly through the Supreme Court cases that are already available in the public domain. Third, though, we also openly admitted that we might well be wrong in our assessment; this is because we do not know enough about the specific social context to be in a position to make a serious and confident all-things-considered ethical assessment. However, we claimed, neither did the critics of the project.

The bottom line, then, is this. What is needed to meet our proposed burden-of-proof rule in such cases is detailed and serious engagement with the pertinent technological and social context. In the specific example under discussion, this involves answering questions such as the following: how real is the possibility that convicted people could be identified through a dataset that does not conform to (mostly Western European) data privacy norms? Is the dataset generally available to the public? If not, is it available specifically to Chinese authorities? Under which conditions? If one’s goal is the identification of a specific individual, is it not simpler to just retrieve the cases from the public domain (i.e., from the Chinese Supreme Court itself)? If so, does the dataset exacerbate that risk? Given our recommendation about the allocation of the burden of proof, these are the real questions to answer.

Norms Diversity
The above discussion leads us directly to the second point, i.e., norms diversity. Our recommendation was that forging genuine universal ethical standards requires a global conversation between researchers engaged from a plurality of standpoints and traditions. When such standards do not exist or exist only to a minimal degree, ethical assessment for global conferences, journals and reviews should be appropriately flexible and respectful of differences and reasonable disagreements. It should not be assumed that a default ‘one-size-fits-all’ model is sufficient.

With regard to norms diversity, we wish to make two clarificatory points in addition to the issues we discuss in the paper. First, we illustrated the mechanisms of diversity by using the specific example of data privacy for the scenario we mention above. In this context, we made an obvious point: that both synchronically (across states) and diachronically (in the history of Western democracies) the high level of protection afforded today to data privacy in the EU and Australia is an outlier. This, of course, does not mean that that level of protection is normatively indefensible. However, it does invite serious reflection in cases of ethical evaluation of research conducted outside the societal context where that level of protection is taken for granted. And this reflection does not entail, as Prof. Bender claims, that the more permissive norm should win out by default. But neither does it entail the opposite.

Second, what does it entail? Our response in the paper is: at the very least, an awareness that a diversity of norms on data privacy exists, along with a diversity of justifications provided to ground these norms. We illustrated this point in specific reference to the paper by Leins et al. Their argument could be interpreted as suggesting that the high level of protection of data privacy afforded in the EU and Australia is (or should be) the default ethical position without considering different substantive positions on the matter, prevalent in many states of the Global South. Serious ethical reflection on these differences is the only way to make genuine progress. Incidentally, we did not insinuate, as Prof. Bender thinks we did, that ACL (or any other institutional body for that matter) fails to take norms diversity seriously.

Substantive Disagreement About Academic Freedom?
So much for misunderstandings of our paper. Apart from those, however, we also believe that Prof. Bender raises a deeper point in her post. It has to do with different ways of conceptualizing academic freedom. The understanding of academic freedom that we favor in the paper could be described as broadly deontological, even though we also speculate that, even under an alternative consequentialist rendering of academic freedom, outcomes for most specific cases would probably overlap significantly. Deontological conceptions, of which Kantian ethics is the best-known example, place emphasis on the importance of rules whose validity and bindingness are independent of the consequences of the actions permitted under the rules. In their Kantian version, these rules are ultimately grounded in the dignity and rational freedom of persons. One important aspect of deontological approaches is thus that they are anti-perfectionist: they leave it to free persons to choose their ends and preclude interference by other persons or institutions if certain constraints are satisfied, irrespective of the ethical optimality of actions. Persons are neither ethically required to do the best they can, nor to conform to ethical ideals; of course, they might well freely choose to, but they need not.

On that understanding, academic freedom entails that researchers are free to choose research projects that are neither ‘the best they could think of’, nor in conformity with any substantive ideal, such as challenging power. Now, this interpretation of academic freedom is contestable. But it is also in line with an old and venerable liberal tradition of conceiving of persons as self-directing agents responsive to reasons and free as such to set their own chosen ends without external interference, even if these ends do not appeal to others, as long as people’s rights are not violated. Researchers must of course abide by scientific norms, including methodological norms; their work is constantly assessed by their peers. An important feature of scientific norms, though, is that they are content-independent: they do not dictate the choice of any specific research topic nor, a fortiori, a specific interpretation of the value of scientific research.

Now, a conception of academic freedom such as the one put forth by Prof. Bender in her post (‘the freedom to pursue research in ways that challenge power’) is perfectionist. It stipulates an ideal (challenging power) and measures the ethical appropriateness of research against that ideal. It could be debated whether the pursuit of that ideal is a realistic prospect in contemporary academia for real flesh-and-blood researchers. For example, it is well known that a lot of extant academic research in fact strengthens existing structures of power, not least because it is overwhelmingly funded by institutions wielding power, such as governments and corporations. But irrespective of that, perfectionist understandings tie academic freedom to a specific and substantive conception of the value of scientific research. They thus prevent researchers from forming their own conception of what good research should serve, such as, for example (as we say in the paper), a disinterested pursuit of truth for its own sake, or some other value, even if these conceptions ultimately turn out to be ‘mere ideologies’ (i.e., false).

Adjudicating between deontological and perfectionist conceptions of academic freedom is a complex matter, which falls outside the scope of the present post. What we wish to do here is, rather, to identify as best we can what we think is a substantive disagreement about the nature of academic freedom and bring it to the wider attention of other members of the scientific community. Still, the differences between the two conceptions should not be overstated. Prof. Bender is right to point out that academic freedom should mainly be directed against governments and other holders of public (and perhaps private) power. She is also right to insist that academic freedom does not in any way entail a specific right of any individual researcher to have their research published. Thus, the ‘controversial’ part of our proposal is the claim that research should not be denied publication on ethical grounds as long as it conforms to scientific norms and, realistically understood, does not present risks to people’s rights. In particular, controversial ideals of the good or speculation about potential harms are not adequate grounds for ethical rejection of such research. We very much look forward to further discussion on these important issues.

Dimitrios Tsarapatsanis, Lecturer in Law, York Law School, University of York (UK)

Nikolaos Aletras, Lecturer in Natural Language Processing (NLP), Department of Computer Science, University of Sheffield (UK)
