Data Soup Ingredients (feelings, methods and next steps)

I’m a big fan of soup. Really, it is magic: you add all kinds of ingredients to a pot and hopefully get a “worthy soup”. Sometimes I think we should take our cooking smarts and apply them to our work. What are the key ingredients and tactics to activate Open Data? Surely, you’ve had a sad pot of soup before (e.g. too much salt, not enough spice, etc.). This is fairly new ground, but I think there are some common sense practices that need to be discussed. So, instead of ‘nuts and bolts’, let’s talk about how we get the ingredients to make a decent plan (a pot of great soup) to feed our brains and guide our decisions.

Tonight I’m participating in the ICT4D London Panel:
Eliza Anyangwe of ICT4D London Meetup/Guardian Development asked that we prepare the following:

…go beyond gushing about how great open data is and stimulate thought about areas in need of improvement and strategies to get us there. The topic: what would happen if data quality is not addressed?

What do you see as the challenges of open data for development? How can we as a community get around them?

I decided to focus on challenges and things that we can do to build community:


It might seem really ‘off’ to talk about ‘Feelings’ when we talk about open data, but we should. Do we trust the data? Do we trust the sources and collection methods? Do the consumers/users understand the data and, most of all, can we ‘believe’ that we can base our decisions on it for big issues like nutrition planning or humanitarian responses? If we don’t talk about our assumptions, or about the ‘data corrupts humans and humans corrupt data’ feelings, then we might be setting ourselves up for data fail. How do we build ‘trust’?

At the Open Government Partnership Summit, I had the chance to meet Chuks Ojidoh of the ReclaimNaija project from Nigeria. He challenged assumptions about data collection and about how communities ‘on the ground’, who were supposed to be served by ‘data’, felt about it. To sum up: people don’t trust the data and won’t trust it unless they are involved in the data collection. And the concept of central ‘firm’ datasets on, for example, budgets needs to be community sourced, council by council. Plus, people want to give feedback on the data and interact with it. This is a good reason to share the datasets in an open fashion whenever possible. It helps us be accountable and transparent, and it gives us the ability to remix and use data.

So, there we have it: feelings! Until we talk more about how we will help people collect basic datasets and involve citizens (including digital literacy), we won’t really get an accurate open data picture. I firmly believe that the ‘open data movement’ as it stands today will be greatly reshaped by folks like BudgIT from Nigeria and ReclaimNaija.

(Chuks shared with me this great article from Dr. Steven Livingston on Africa’s Information Revolution.)

Method to Madness

From data collection methods (SMS, maps, spreadsheets, analog paper) to data ethics, privacy/security and protection, we need to grapple more with the “methods to our madness”. This also includes the heavy topic: how can we build feedback loops to iterate on and improve data? Of course, I fall into the camp that says “open data” allows people to use and remix data, but this does not come without a heavy filter of common sense.


What is a clean dataset?

After working on the Uchaguzi Kenyan Election project, I was on a mission to find a way to open up the Ushahidi datasets. I started to research how to create a Clean DataSet Guidelines list (created with help from Okfn, Datakind and Groundtruth Initiative). I think that there are many people trying to figure this out. If we had some standards and means to clean data, then people might be more comfortable sharing it. This does not remove the risk of human error. A quick summary of how to get to a cleaner dataset: remove names, phone numbers, geolocation, and sensitive information. Challenging, isn’t it? What happens when you remove this information and it affects a decision negatively? (See some resources on this topic.)
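To make that quick summary concrete, here is a minimal sketch of stripping those fields from a single report before sharing it. It is not the Clean DataSet Guidelines themselves: the column names (`name`, `phone`, `notes`, `latitude`, `longitude`) and the `clean_row` helper are hypothetical, and the option to round coordinates instead of dropping them is my own assumption about one way to soften the "what if removing this hurts a decision" problem.

```python
# Hypothetical column names; a real export will have its own schema.
DIRECT_IDENTIFIERS = {"name", "phone"}
SENSITIVE_FIELDS = {"notes"}
GEO_FIELDS = {"latitude", "longitude"}


def clean_row(row, geo_precision=None):
    """Return a copy of `row` that is safer to share.

    geo_precision=None drops coordinates entirely (the summary above).
    geo_precision=1 keeps them rounded to one decimal place, which is
    roughly 11 km of latitude (an assumption, not part of the guidelines).
    """
    cleaned = {}
    for key, value in row.items():
        if key in DIRECT_IDENTIFIERS or key in SENSITIVE_FIELDS:
            continue  # never publish these fields
        if key in GEO_FIELDS:
            if geo_precision is None or value in ("", None):
                continue  # drop the location altogether
            value = round(float(value), geo_precision)
        cleaned[key] = value
    return cleaned


report = {
    "name": "A. Reporter",
    "phone": "+254700000000",
    "latitude": "-1.28634",
    "longitude": "36.81722",
    "category": "polling station delay",
    "notes": "free text that may mention people",
}

print(clean_row(report))
print(clean_row(report, geo_precision=1))
```

Dropping everything is the safest default; keeping a coarse location is a trade-off that each project would need to weigh with its own community.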

One important thing that we did during the election project was to have a Data quality team reviewing the data in real-time. This team involved researchers, software developers and subject matter experts. Most of these people were community participants. While this might not be possible for all projects, it is a road forward on short sprint ones until organizations reorganize to do it themselves.

To the machines

Along with the Data Science for Social Good Fellows and my (then) Ushahidi colleagues, I spent the summer assisting with a data cleaning project. How can we use machine learning to help us build clean datasets? How can we clean datasets so that they do not include personally identifiable information? What are some of the data ethics that we can infuse in our methods?

Details on that project:

Project background

While machines cannot replace the Human API (your eyes), it is a start. There is much improvement to be done on this.
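As a small illustration of machines assisting the Human API rather than replacing it, the sketch below flags free-text reports that may contain a phone number or email address so that a person can review them. This is a simple rule-based stand-in, not machine learning and not the method used in the project above; the patterns and the `flag_for_review` name are hypothetical and deliberately loose, so they will over-flag (dates, for example).

```python
import re

# Deliberately loose patterns: the aim is to over-flag for human review,
# not to decide automatically what counts as personal information.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def flag_for_review(text):
    """Return a list of reasons a human should look at this text."""
    reasons = []
    if PHONE_PATTERN.search(text):
        reasons.append("possible phone number")
    if EMAIL_PATTERN.search(text):
        reasons.append("possible email address")
    return reasons


messages = [
    "Call me on +254 700 000 000 about the queue",
    "Long queue at the polling station",
    "Email jane@example.org for details",
]

for message in messages:
    print(message, "->", flag_for_review(message))
```

Anything flagged still goes to a reviewer, and anything not flagged may still contain names or other sensitive details that simple patterns cannot catch.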


Improving the Recipe

The biggest ingredient that I think needs to be addressed is the “how”. A widespread group of people from various disciplines and regions needs to have the capacity to be data makers. This means that we face digital literacy barriers combined with data literacy barriers. At OKFN, we have the School of Data to help with this.

Last week, together with my colleague, Michael Bauer, and a few community members, we held an experimental training workshop. Our goal was to give people a menu of data skills: data cleaning, how to use spreadsheets, data visualization 101 and how to geo-code. These are really tough skills to integrate into all our workflows. For early adopters, I am sure that these are easy, but how do we get the next 1000 datamakers in civil society and governments? I was actually approached by someone thanking us for giving them a safe and equal space to learn these important things. Datamaking should not be some magical thing that is inaccessible to average people doing great work. It should be everyday soup making: given the ingredients, some basic skills, and time to learn/test/iterate, there is a chance that we can fill this gap.

Steve and SCODA
(Steve of Devint training folks at the International Crisismappers Conference)

When we talk about challenges and opportunities, there are some important next steps to consider: Spaces to learn for free, mentors to help us on our data journey and checks/balances on data curation/quality.



One comment

  1. Totally agree on the biggest ingredient, Heather. I think with communities that foster educational values for learning how to use spreadsheet, geo-based, visualization, and statistics-based tools (among others), as well as understand how to use application-programmable interfaces (APIs), its members can realize that they can make big impacts with data.


© Copyright 2016, All Rights Reserved