Searching big data for ‘digital smoke signals’

The office has the look and feel of an Internet start-up. The workers are young, the dress is casual and the computer of choice is an Apple notebook. They inhabit a single open room. The walls have whiteboards for scribbling ideas when inspiration strikes.

But the office in Manhattan is not dedicated to the latest app. It is the base camp of the United Nations Global Pulse team — a tiny unit inside an institution known for its sprawling bureaucracy, not its entrepreneurial hustle. Still, the focus is on harnessing technology in new ways — using data from social networks, blogs, cellphones and online commerce to transform economic development and humanitarian aid in poorer nations.

“We work hard, play hard and tend to stay well-caffeinated,” said Robert Kirkpatrick, who leads the group. “This is an exercise in entrepreneurship.”

The efforts by Global Pulse and a growing collection of scientists at universities, companies and nonprofit groups have been given the label “Big Data for development.” It is a field of great opportunity and challenge. The goal, the scientists involved agree, is to bring real-time monitoring and prediction to development and aid programs. Projects and policies, they say, can move faster, adapt to changing circumstances and be more effective, helping to lift more communities out of poverty and even save lives.

Research by Global Pulse and other groups, for example, has found that analyzing Twitter messages can give an early warning of a spike in unemployment, price rises and disease. Such “digital smoke signals of distress,” Mr. Kirkpatrick said, usually come months before official statistics — and in many developing countries today, there are no reliable statistics.

Finding the signals requires data, though, and much of the most valuable data is held by private companies, especially mobile phone operators, whose networks carry text messages, digital-cash transactions and location data. So persuading telecommunications operators, and the governments that regulate and sometimes own them, to release some of the data is a top task for the group. To analyze the data, the groups apply tools now most widely used for pinpointing customers with online advertising.

“We’re trying to track unemployment and disease as if it were a brand,” Mr. Kirkpatrick said.

Global Pulse is small, employing 11 people in New York. Seven more people work at a laboratory in Jakarta, Indonesia, that opened last fall. And Global Pulse is hiring for another lab in Kampala, Uganda, to open this fall.

The research labs are initially working on demonstration projects to show the potential of the technology. “But the larger role of Global Pulse is as a catalyst to foster a data ecosystem for development, bringing together the private sector, universities and governments,” said William Hoffman, an associate director who leads the data-driven development program at the World Economic Forum, which has worked with Global Pulse.

Its United Nations pedigree helps Global Pulse serve as an impresario for data-driven development efforts. “Global Pulse has been central in raising awareness,” said Alex Pentland, a data scientist and director of the Human Dynamics Lab at the Massachusetts Institute of Technology. “And it is a trusted party in an area that is sensitive for many governments and companies.”

The group traces its origins to the 2008 financial crisis and concerns about how the economic pain would sweep through the developing world. But as Secretary General Ban Ki-moon of the United Nations said in a speech, “Our traditional 20th-century tools for tracking international development cannot keep up.”

Global Pulse is intended as a 21st-century answer to that problem. It was set up in 2009 as an innovation arm in the office of the secretary general. Mr. Kirkpatrick joined in early 2010, began assembling a team and emphasized tightly focused projects and rapid experimentation, while traveling the world to spread the data-for-development gospel at conferences and in private meetings.

There are several nonprofit organizations dedicated to using Internet technology and data for humanitarian ends, including DataKind, Ushahidi, Crisis Mappers and InSTEDD. But those groups typically respond after natural disasters and emergencies. Global Pulse, by contrast, is also focused on re-engineering traditional development projects in transportation, water supplies and food distribution. Its deputy director is Makena Walker, a 15-year veteran of the United Nations’ World Food Program.

For all of its goals, Global Pulse needs corporate partners. In addition to working for nonprofits, Mr. Kirkpatrick spent years in the corporate world, having been a founder of the humanitarian systems teams at both Microsoft and Groove Networks, a software company bought by Microsoft in 2005.

In Indonesia, for example, Global Pulse has worked with both Crimson Hexagon, a start-up, and SAS Institute, a large data analytics software company, to mine Twitter messages and other online media for clues to price trends. The smart algorithms must identify not just words, but context and often sentiment. “I had rice for breakfast” is not a signal. “The price of rice is getting scary” is. The research found that surges in online mentions accurately capture price increases a month or two before official statistics.
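The distinction between the two rice messages can be illustrated with a deliberately simple sketch. It is not the method used by Global Pulse, Crimson Hexagon or SAS, whose algorithms are not described here. The word lists and function names below are hypothetical, chosen only to show why a message needs a commodity, a price reference and a note of worry together before it counts as a signal.

```python
import re

# Hypothetical word lists, invented for illustration only.
COMMODITY_TERMS = {"rice", "fuel", "bread"}
PRICE_TERMS = {"price", "prices", "cost", "costs", "expensive"}
WORRY_TERMS = {"scary", "worried", "afford", "rising", "soaring"}


def tokenize(message):
    """Lowercase the message and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", message.lower()))


def is_price_signal(message):
    """Flag a message only when it mentions a commodity, a price term
    and a worry term together: a crude stand-in for context and sentiment."""
    words = tokenize(message)
    return (
        bool(words & COMMODITY_TERMS)
        and bool(words & PRICE_TERMS)
        and bool(words & WORRY_TERMS)
    )


def count_signals(messages):
    """Count how many messages in a batch are flagged as price signals."""
    return sum(1 for m in messages if is_price_signal(m))


if __name__ == "__main__":
    sample = [
        "I had rice for breakfast",
        "The price of rice is getting scary",
    ]
    for text in sample:
        print(text, "->", is_price_signal(text))
```

Counting flagged messages per week, rather than reading any single one, is what would let a surge in mentions stand out. A real system would need far more than keyword matching to handle slang, sarcasm and multiple languages.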

“Sentiment analysis of social media is where our technology is headed,” said I-Sah Hsieh, global manager for international development at SAS. “We certainly never expected that the U.N. would be our partner for cutting-edge research.”

Cellphones are mobile sensors of human behavior. So the data collected by mobile carriers is often particularly useful for development programs. But the collection and sharing of that data often raises questions about privacy.

Mr. Kirkpatrick has been an advocate of “data philanthropy” and the creation of a public “data commons,” in which companies contribute large customer data sets, stripped of personally identifying information, for research on development and public health. For companies, Mr. Kirkpatrick insists, it should be a matter of self-interest, since economically healthy communities are more attractive markets.

Orange, formerly France Telecom, took a significant step last year when it released a data set containing 2.5 billion records of calls and text messages exchanged between five million anonymous cellphone users in Ivory Coast. It was done for research purposes and with the cooperation of the Ivory Coast government.

The result was a global contest of ideas, with hundreds of university and corporate scientists participating. The research projects were presented, and winners were announced in May at a conference at M.I.T.; Mr. Kirkpatrick was on a jury selecting the winners.

The winner in the development category was an I.B.M. team of scientists who analyzed travel patterns derived from call location data. Minor changes to the bus network, they concluded, could cut the average commute time in Abidjan, Ivory Coast’s largest city, by 10 percent, making it easier for children to travel to school and for rural residents to seek work in the city, while also reducing pollution.

Before submitting the call records, Orange executives sent the data set to three European universities, where computer experts probed the anonymous data and made suggestions to improve security.

“It is a gray zone, and there are risks, but we think it’s really worth it,” said Nicolas de Cordes, vice president for marketing vision at Orange. “We hope this stimulates the desire of other mobile operators to work on best practices for sharing their data.”