The Ethics of Analytics: Answering the ‘Should We’ Question

Perhaps the time has come to define a set of ethics associated not with how the data is accessed or manipulated but with the purpose for which it is analyzed.

A great deal has been written about data and the issues surrounding its storage, security, meaning, governance, and usage. Regulations have been defined to protect the rights of individuals and to govern how their data is used from a privacy perspective. While these topics are of concern to organizations and individuals, there is another topic of great importance about which very little has been written – ethics. The root question can be stated as follows: even though we have the ability to determine cause and effect in a certain circumstance and possibly divine the future, should we?

Ethics has always been at the forefront of what we consider to be ‘good business.’ Should we exploit workers in sweatshops? Is it responsible to promote misleading product information? Can we manipulate accounts? The obvious ethical answer is “no.” However, while we have regulations that support ethical business practices and protect the rights of the individual and society as a whole, at this point there are no regulations associated with the ethical use of data.

Data is not something that we manufacture or gather from the planet. In most cases, it is something that we create or gather with permission. As with any other asset, it is how we use this resource that has become the point of contention. New privacy regulations govern how an organization can use data from a marketing perspective, and much has been written and opined on who owns the data and whether a group or individual should have access to it. Not much, however, has been said about what we are actually using the data for.

To be clear, some work has been published on a code of ethics for data analysts, including:

Protect the client (again, from a privacy perspective).
Let the data tell the story. If the data provides an answer that is not what the framer was expecting – the data is what the data is.
Don’t intentionally misinterpret the data.
Don’t embellish when not applicable.
Don’t lie.
Don’t change the data.

While I agree wholeheartedly that these are important ethical standards that must be upheld, the real ethical conundrum is, “Should some questions be asked?” When we look at it through that lens, we see that data can be used for discriminatory purposes in many areas, including medicine, law, hiring, and education.

The medical field as an example

The medical field provides one of the best uses of analytics. Both payers and providers can analyze claims history to understand what is going wrong and why. They can also combine this information with additional health data to form patient safety protocols, putting processes and procedures into place that reduce injuries and malpractice suits (keeping insurance, and therefore healthcare, costs down) and increase patient satisfaction. Other opportunities include searching through histories of health data to determine the best intervention for a specific injury or illness, based not only on how this particular patient presents but also on the medical histories of millions of patients. Those histories can include not only the standard factors of age, gender, weight, and so on, but also such items as demographics, more extensive family histories, ancestry, and more. These are all positive uses of analytics.

But even in the medical profession, there are ethical questions associated with the use of data. We all understand the concerns associated with acquiring patient data and how privacy must be maintained. That risk aside, what if, through the use of analytics, decisions on patient care were based not on the patient's actual medical needs but on the perceived ability to pay (which should not enter the equation), the predicted ability to recover (which may be based on quality of life) or, more concerning, a predicted outcome that could lead to an insurance claim against the hospital or doctor? Should a procedure not be undertaken, and a patient allowed to suffer or die, because of a financial decision to avoid paying an insurance claim?

Ethical issues in education

Another example is in education. At the outset, a university could use analytics to determine the viability of a professor based on their history. Do the students who have this particular professor for economics drop it as a major or continue on? Is that professor better suited for other areas of instruction? By using historical data to ensure that the right professors are teaching the right courses, everyone benefits. The reputations of the professor and the school improve as the students who learn and graduate attribute their success to the school. The students benefit by getting the best education available to them.

So, where is the harm, and what ethical questions are there? There are any number, including what data on student capabilities should be made available to professors, whether students should be recruited based on their predisposition not to transfer (a transfer causes additional costs to replace that student), and even questions about the admissions process itself.

Should a university select students based not on academics but on the student’s predisposition to donate to the university itself? Is it not possible to analyze donations to the school against donors' high school records (academics, courses taken, participation in sports or extracurricular activities, and more) and include additional information on anticipated areas of study, demographics, etc.? Should a university select one student over another based on that predisposition to donate? If that decision overrides the decision based on academics, then the university may have turned down a more promising student based on financials rather than on the potential to better serve the community at large.

Keeping data use ethical

Questions in ethical analytics span a range of industries. The director of a shared data science group for a US state indicated that on numerous occasions, once data had been retrieved for study, the requesting state agency noticed that other implications could be derived, especially when correlated data needed to be pulled from multiple agencies. The director was concerned that these additional questions were outside the original purview of the approved analysis, had not been vetted, and could result in an unethical use of the data itself. She was so concerned with this possibility that she added an ethics clause to the group’s charter and has considered developing an ethics board to review such requests and remove the politics from the list of responsibilities of the data scientists themselves.

While these questions are being raised, they are seldom being answered. No single body has yet defined a true code of ethics for use in analytics. Some courses are being offered at major universities on the ethical use of data from the philosophical point of view, but at the time of this writing, there were no organizations that had included this type of ethos within their standards of behavior. The ADaSci (Association of Data Scientists) does have a handbook, but its ethics focus more on issues associated with transparency as described above, ensuring that the data is not manipulated or misinterpreted.

An old comic stated, “With great power comes great responsibility,” and we have given a great deal of power to a great number of people. But can we find a way to manage that power? Many professions have organizations that work to ensure both ethics and standards within their communities, and membership within these organizations is considered to be a recognition not only of the skill attained but of those standards. Perhaps the time has come to define a set of ethics associated not with how the data is accessed or manipulated but with the purpose for which it is analyzed. It may be as simple as borrowing from another profession – data shall not be used or analyzed in any way that causes harm to or bias against any individual or group.