Fruit of the poisonous tree
Our modern tools of statistical analysis were developed by white men who directly exploited Black and Brown bodies during colonial rule. Validating embodied data from below could be a form of repair.
So far in this series, I’ve talked about the need for data rights as a framework for individual and collective control over community data, one that can help us deconstruct oppressive systems that rely on unlimited access to that data.
I’ve also talked about how the current power dynamics affecting the politics of data favor the right to privacy for people with money and influence, and surveillance by default for everyone else.
Understanding these dynamics makes one thing clear: a framework for data rights has to flip the paradigm so that transparency flows in the direction of systemic power.
But what would it look like to actually flip that paradigm? What would the world look like if the value and validity of data were defined from below?
Obviously, we’re still figuring that out. Western technologies that have enabled mass data collection and surveillance have only existed under the hegemonic rule of capitalism, white supremacy, and principles of individual gain over collective well-being.
But there are other ways of knowing that undermine these systems. Projects like Our Data Bodies have brought attention to the need for embodied ways of knowing that fight back against top-down data extraction and surveillance. And other data justice hubs and collectives are cropping up in response to the global advance of AI and other invasive technologies, embracing dignity, autonomy, and power in the face of data exploitation.
Many data justice experiments at some point touch on the same foundational sources of knowledge — our bodies and our immediate environments. Our body is our first source of contextualized knowledge, so it makes sense to center our earthly form as we start to explore what bottom-up data justice paradigms can look like.
According to research by Jennifer Frank Tantia in The Art and Science of Embodied Research Design, we need better language to capture the complexity of our embodied experiences. Tantia’s research defines “embodied data” in terms of how we describe our bodies’ movements and experiences. In other words, embodied data can be something that emanates from us and that we try to capture through various words and measures.
If embodied data can be so personal, then why does data so often make us feel like insignificant dots?
A major criticism of AI is that it harvests information indiscriminately. On an embodied level, it just feels gross to know that every email you’ve sent and every search you’ve made has helped build a machine that isn’t accountable and has no relation to you. It matters that those experiences are now totally disembodied from the way AI represents your data.
The “ick” you feel when seeing yourself in data can happen on any level. Imagine you’re at a community event where data is being presented about drainage repairs in your neighborhood that will prevent flooding from climate disasters.
You see a chart on the wall about flooding in different parts of your city. You look for your street, to see if the flooding you experienced last year is represented in the data. It says flooding was minimal. But you called 311 more than five times trying to report blocked drains all down your street! Your neighbor had water up to their doorstep! Sure, it didn’t technically count as more than a foot of water, but the blocked drains made all the difference.
What defines that experience for you? How does it feel in your body? How does it feel to see yourself represented in the map? Does it feel bad?
Now imagine it feels good. Imagine that when you see the map, you’re like, “Yes, this is exactly what I’ve been saying!” Imagine you feel happy that policymakers and decision-makers will see this information, because maybe it means they’ll finally install better drainage on your street.
Ask yourself: in the scenario where you feel good about what’s represented, what data is added to the account? Is there more context about what data might be missing? Is there space where you can write in additional parts of the story in person? Are there other layers of information that you can add as a viewer of the map?
Whether this information is included or not is decided by the person who created the dataset or visualization. That data carries their bias. If this data is held up as truth, but the person who collected the data doesn’t share your point of view, it could change how you yourself feel about the representation.
That’s important to understand because anyone who has ever used statistics is biased by major systemic influences. There is bias that’s built into the fundamentals of how we collect or analyze data in the first place.
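To make that concrete, here’s a toy sketch in Python. Every number and the one-foot rule are hypothetical, my invention rather than any real flood dataset, but they show how a single threshold chosen by whoever builds the dataset decides whose flooding “counts”:

```python
# Toy example: a dataset builder's threshold decides whose flooding "counts".
# All numbers and the 1-foot rule are hypothetical, for illustration only.

streets = [
    # (street, peak water depth in feet, 311 calls about blocked drains)
    ("Elm St", 1.4, 1),
    ("Your St", 0.9, 5),   # under the threshold, despite five 311 calls
    ("Oak St", 2.1, 0),
]

SEVERE_THRESHOLD_FT = 1.0  # chosen by the dataset's creator, not by residents

for street, depth_ft, calls in streets:
    label = "severe" if depth_ft >= SEVERE_THRESHOLD_FT else "minimal"
    print(f"{street}: {label} flooding ({depth_ft} ft, {calls} drain complaints)")

# "Your St" comes out labeled "minimal" -- the five 311 calls never enter the
# classification, so the map a policymaker sees erases that context entirely.
```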
Statistical analysis was invented and championed by eugenicists who measured and compared humans in order to construct scientific standards for white racial superiority.
The development of statistics, like any advancement in technology, happened in fits and starts across hundreds of years and likely hundreds of scholars. Al-Kindi, a Muslim mathematician in the Abbasid caliphate, was the first documented person to study statistical frequencies, which he used to decipher encrypted messages in the 9th century.
But in the West, many traditions of statistical analysis were developed by a cadre of European scientists in the 19th century. At the start of the 1800s, the British Empire had already existed for about 200 years, with colonies in Africa, South Asia, and the Pacific. From the middle of that century into the early decades of the 20th, Adolphe Quetelet and the eugenicists Francis Galton, Karl Pearson, and Ronald Fisher constructed some of the fundamental building blocks of statistics as we know it today.
Galton, the so-called “father of eugenics,” coined the phrase “nature vs. nurture.” He felt that successful people got their traits only by nature, and he set about using statistics to prove it. Pearson, his protégé, and later Fisher built on that work to develop the concept of “statistical significance,” which is still the standard we use today to decide whether data is rigorous enough to be considered accurate.
Galton, Pearson, and Fisher looked up to a Belgian scientist named Adolphe Quetelet, whose core motivation was to find the ideal “average man” by measuring and comparing human bodies, by height and chest size, for example. Because Quetelet sought out “statistical modalities” in the data, common trends that would let him smooth out the data and find an ideal average, he decided that only bodies of “like races” could be meaningfully compared to each other. As a result, his research centered on building up the ideal standards of the white body, and in the process he contributed to the foundations of how we currently use demography and statistics to compare and average populations.
This research arrived on the back of previous abuses of science under European empires, including the dehumanizing treatment and posthumous violation of the body of Sarah Baartman. Across the board, the men who developed statistics in the West believed the white race was superior by birth, and the rules they created to reinforce that belief are still the ones we use to decide which data count as valid today.
In order to unpack and disentangle statistics from eugenics, we need to understand what that legacy is actually doing to our ways of knowing, seeing, and understanding. These eugenicists viewed those whom they saw as “other” as if from above. They sought to flatten, smooth, average, and extract a single source of truth from the “data” that was people’s bodies.
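Here’s a tiny numerical sketch of what that flattening looks like in practice (the numbers are invented for illustration): the average collapses very different experiences into one tidy figure, and the person at the edge disappears.

```python
# Toy example of how averaging "smooths out" the people at the edges.
# Numbers are invented purely to illustrate the mechanism.

flood_depths_ft = [0.1, 0.2, 0.1, 0.3, 0.2, 4.5]  # one block flooded badly

mean_depth = sum(flood_depths_ft) / len(flood_depths_ft)
print(f"mean depth: {mean_depth:.2f} ft")  # 0.90 ft: "minimal" citywide

# The single summary number hides the one block that was underwater.
# A view "from above" reports 0.90 ft and moves on, while the person
# on that block is living a completely different dataset.
```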
That vantage point, from above, still characterizes the way that people in academia, government, and especially the corporate world are taught to see and know things. It also encourages us to use data to seek a certain type of truth, one we could call an ideal, the same way Quetelet sought an ideal average for the human body.
Because racial capitalism spread through colonialism and empire, these ways of knowing have now proliferated too. Business practices around the world that engage with Western capitalism also have to engage with its key performance indicators, quarterly metrics, real-time insights, predictive algorithms, and the rest.
These tools are an evolution of the original intent of statistical thought. They exist to flatten and synthesize information for those at the top of the food chain. Embracing embodied data means unpacking and resisting all of that, something too huge to capture in one newsletter. But I believe a starting point can involve allowing for knowledge to be more fuzzy, viewing it more as a tool for mutual understanding than as a single source of truth.
Data is not the truth; data is a medium for communication.
If data when viewed from above is understood as truth, reality, evidence, or outcomes, then data when viewed from the ground can be understood simply as conversation or an exchange of perspectives.
When viewed this way, it becomes a lot less scary that all data is biased. Of course it is! It’s our perspective. And we should be able to put it out there with the right context, the same way we do with our ideas in conversations.
When someone tells a story that catches you by surprise, sometimes you need to ask for more detail. If you don’t understand what’s being shared or it doesn’t align with your worldview, you might want to ask, “When did this happen? Who was involved? How did you find out?” This is the same work that needs to happen to contextualize data.
Because even when a story has one true version of events, there can always be different perspectives. And it’s possible that all of the perspectives are valid, because they are different interpretations of the same actual events.
You’re not coming in to extract the story and get out. You’re comparing people’s perspectives because you want to actually understand what’s going on, not just logging the event itself, but also getting to know the people. And isn’t that the point of everything? Isn’t that life?
So in order to get at a real truth, you need to collect details not just from one person but from many, to understand their various contexts and the ways they understand the story of the data. Only then can you start to understand who is in the room, or who is in the data. The dataset itself is just the starting point for the conversation.
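One way to imagine a dataset that stays open to conversation, sketched in code. The structure and field names below are hypothetical, not any real system’s schema; the point is that each data point carries layers of context that viewers can keep adding to, instead of a single frozen value.

```python
# Minimal sketch of a data point that stays open to conversation.
# The structure and field names are hypothetical, not a real schema.

from dataclasses import dataclass, field

@dataclass
class Annotation:
    author: str
    note: str

@dataclass
class DataPoint:
    location: str
    recorded_value: str                      # what the dataset's creator logged
    annotations: list[Annotation] = field(default_factory=list)

    def add_context(self, author: str, note: str) -> None:
        """Let a viewer layer their perspective onto the record."""
        self.annotations.append(Annotation(author, note))

point = DataPoint("Your St", "minimal flooding")
point.add_context("resident", "Called 311 five times about blocked drains.")
point.add_context("neighbor", "Water reached my doorstep.")

# The recorded value is the start of the story, not the end of it.
for a in point.annotations:
    print(f"{a.author}: {a.note}")
```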
This has real-world implications. If we favor statistical ways of knowing over embodied ways of knowing, we will miss really important stories that are out there in the world.
In Houston, people living in Kashmere Gardens have known for decades that the Union Pacific Railroad site has caused fatal health effects and toxic contamination in the soil. One person who grew up in the area told me that in her lifetime, she watched the frogs that used to swim in the drainage ditches in front of her house slowly disappear because of the contamination of the water, until there were no more to be seen. But the area was only validated by institutions as a “cancer cluster” in 2019. Only when these results were “validated” did government agencies and others start to take more significant action.
Why don’t we believe a person’s decades of observation and insight? Why have we internalized the idea that statistical ways of knowing are more important to take as truth? We’ve all done it. It’s a core belief instilled in us through our systems of education and research.
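To see how the significance gate works mechanically, here’s a toy sketch using a standard binomial test. All of the numbers are invented; this is not the Kashmere Gardens data. Even a neighborhood with two and a half times the expected rate can fail the conventional p < 0.05 cutoff:

```python
# Toy example of "statistical significance" as gatekeeper.
# The neighborhood size, case counts, and baseline rate are all invented.
from scipy.stats import binomtest

expected_rate = 0.01   # hypothetical citywide illness rate: 1%
population = 200       # hypothetical neighborhood
observed_cases = 5     # 2.5%, two and a half times the expected rate

result = binomtest(observed_cases, population, expected_rate,
                   alternative="greater")
print(f"p-value: {result.pvalue:.3f}")  # ~0.052, just above the 0.05 cutoff

# By the conventional rule (p < 0.05), this cluster is "not significant":
# decades of residents' observations don't count until the numbers clear
# a threshold that Fisher popularized a century ago.
```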
Data as communication might mean that no single perspective is the ultimate truth, but it also means that all perspectives are valid, even with incomplete data, even with insufficient tools to conduct analysis. All perspectives are valid. That is the lesson of embodied data, data from below.
What would it look like to actually trust people’s stories of their movements and experiences, and their own ways of summarizing them, and to look at them through a new lens? How enchanting is it when you get to see that happen? To see reality through someone else’s eyes.