
People as Sensors – mining social media for meaningful information

I gave a talk at our recent ThingMonk Internet of Things conference in London, which I titled People as Sensors – mining Social Media for Good. The talk was principally about the many use cases where the firehose that is social media can now be analysed in real time, and real, meaningful information can be extracted from it.

Feedback on the talk was extremely positive, so I said I’d post the video here.

Here’s the transcript of my talk:

Thanks very much! People As Sensors, it’s the idea of mining social media for useful information.

Obviously we have heard about the difference between data and information this morning, so we are just going to power through a little bit about that.

This slide deck is already up on SlideShare, so if anyone wants to have a look at it, it’s there. I have my notes for the slides published with the slides on SlideShare, so if you want to download it, you will get the notes there as well.

So mobile data; every one of us has got one of these little devices, and it’s publishing, not just the information that we publish ourselves, but also a lot of other information as well.

And this was brought home to us very clearly in 2009, when a German politician called Malte Spitz sued Deutsche Telekom over the data retention laws that had just been legislated in Germany. He asked them for his data; he wanted the six months of data that they had on retention for him.

Can I get a show of hands here for anyone who has not heard the story already? Okay, a good few people haven’t.

So I will just break out of the presentation for a second, because — if I can; apparently it doesn’t want to. Okay, I will just — no, it doesn’t want to. What he did was he published the information in ZEIT ONLINE, and the link is at the bottom there, and all these screens that I have, all these slides that I have, they have a link at the bottom, it’s a clickable link; it’s a clickable link in the PDF on the SlideShare as well, so you can go and you can view this data.

There is a Play button in the bottom left there. You can hit Play on that button on the site and you can go through the six months of his life and it plays where he goes.

So when he gets on the train, the little dot there moves along the map, so you can see where he was for almost all the time of that six months. It lights up a little mobile phone icon when he is on the phone, when he is making a phone call or sending texts.

You can see where he sleeps, you can see when he sleeps, you can see when he gets up, it’s all there, and it’s all beautifully visualized. And when you see something as stark as that you suddenly realize, Jesus, we are really publishing a lot of information, aren’t we?

And it’s not just that kind of information; we are publishing a load of stuff in social media as well. So you just take a quick look at some of the numbers in social media and you realize how big it is. Facebook have announced now that they have got 1.2 billion users and the latest numbers that they published in August, they talk about 4.5 billion likes per month, 4.75 billion items published, oh no, that’s per day. 4.5 billion likes per day, 4.75 billion items published per day, and I have forgotten how many billion photographs. It’s just insane.

Twitter, this is a typical diurnal graph of Twitter tweets per second. So you are starting at kind of midnight on the left, you are going across through the morning. It peaks at around — okay, over there it peaks at around 8,000, a little over 8,000, dips again mid-afternoon, picks up, and then drops off at nighttime. That’s daily.

The average, they say, is around 6,000 tweets per second, and this is tweets per day over a 365 day period. You can see 400 million going up to around 600 million tweets per day now.

And Twitter are actually selling this data. They announced in their filing for the IPO that they have made about $47.5 million, which is quite modest I would have thought, selling direct access to their data. So people who buy their data from them house their servers in the same complex as the Twitter servers and get direct access to all the tweets as they are published, instantaneously, so they can mine it there and then.

So it’s not just Twitter, it’s not just Facebook, you have got Google+ talking about 500 million users, 300 million in the stream.

Sina Weibo; we are talking about 500 million users and growing. And you have got other networks as well; Waze, which was recently bought by Google, is a GPS application, which is great, but it’s a community one as well. So you go in and you join it and you publish where you are, you plot routes.

If there are accidents en route, or if there are police checkpoints en route, or speed cameras, or hazards, you can click to publish those as well. It’s a very simple interface, so that it doesn’t interfere with your driving, or it’s minimal interference with your driving. And I will come back to why that’s interesting in a few minutes.

And I am rushing through this because I have got 50 something slides and James wants me to do it in 15 minutes. So here are some of the use cases from all that data, and there are some nice ones out there. A lot of you are probably familiar with this one; it’s the UK snow meteorology example. It was one that was put up a couple of years back and it has been used every year every time there is snow in the UK.

There is a little dash of snow over London there in this screenshot, because there wasn’t one when I went to the site, so I tweeted about it, and got a bit of snow to fall on EC2 there.

Utility companies are increasingly starting to use social media for outage management. So GE have got this Grid Insight Application, and what they do is if a utility company has an outage in their area, they can look for mentions of the outage on social media channels. And in this picture here you see someone has tweeted a photograph of a tree, which is after taking down an electricity line, so now they have a good idea of what the issue is.

This is in real time. So instead of having to send out an investigatory truck roll, they just send out the vegetation truck roll, and that cuts down massively on the time to get the outage fixed and get people back live again.

And this is another one, you can see here there is a fire in the substation, and it’s right beside a road, and you can see a cluster of Twitter — maybe not, you would have to look closely, but those are the blue dots there, those are little clusters of tweets and Facebook posts, and you have got a Facebook video posted of the fire in the substation.

Other things; the United Nations Development Programme are analyzing social media in real time. This is the project they ran to analyze social media, because they want to know when there are likely risks to their people on the ground.

This is one they did in Georgia around the time of the upset between Georgia and South Ossetia in 2008-2009. So they looked at the mentions there and they graphed it versus when the trouble actually happened. So now they are building a model so they can call their people and say, okay, look, it has gotten to the point where it’s getting risky for you guys to be in there, we need to get you out now.

Automotive; the automotive industry are starting to use this. There was an application developed by the Pamplin College of Business at Virginia Tech where they started mining social media for mentions of particular, what they call, smoke terms. These are terms which are important for the automobile industry, so they can identify quickly when faults occur in cars.

This is a much faster way of reporting faults back to the manufacturer than going back up through the dealer network, which can take weeks and months. If they are getting it directly from the consumers, they get it faster, they do the recall faster, and you have got safety issues there, you are saving people’s lives. Plus, you are also having to recall fewer cars because fewer of them have been sold by the time the issue comes to light.
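
For anyone who wants to try the smoke-term idea, here is a minimal Python sketch. The term list, the sample posts and the function name are all made up for illustration; the Virginia Tech work will have used its own vocabulary and methods, which this does not attempt to reproduce.

```python
import re
from collections import Counter

# Hypothetical smoke terms, invented for this example.
SMOKE_TERMS = ["brake failure", "stalls", "recall", "airbag", "smoke"]


def count_smoke_terms(posts, terms=SMOKE_TERMS):
    """Count how many posts mention each smoke term (case-insensitive)."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for term in terms:
            # Word boundaries so a term only matches as a whole word or phrase.
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                counts[term] += 1
    return counts


# Invented sample posts standing in for a social media feed.
posts = [
    "My car stalls every time I stop at a light",
    "Airbag warning light came on again today",
    "Loving the new paint job!",
    "Smoke coming from under the hood, and the engine stalls",
]

counts = count_smoke_terms(posts)
for term, n in counts.most_common():
    print(term, n)
```

A real system would track these counts per model and per week, and flag a term when its count jumps well above its usual level.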

In the finance industry; this is a paper that was published. It was published in, I think it was 2009, and it said that Twitter can predict the stock market with 87% accuracy, and again, the link is at the bottom, you can click through and read the paper.

So on the back of that this UK crowd called Derwent Capital Markets licensed the technology and set up a fund, and it has now become Cayman Atlantic, and they are doing quite well apparently. And there are several other companies who are doing similar now as well, using Twitter to predict the stock market.

In law enforcement social media is huge, it’s absolutely huge. A lot of the police forces now are actively mining Facebook and Twitter for different things. Like some of them are doing it for gang structures, using people’s social graph to determine gang structures. They also do it for alibis. All my tweets are geo-stamped, or almost all, I turned it off this morning because I was running out of battery, but almost all my tweets are geo-stamped. So that’s a nice alibi for me if I am not doing anything wrong.

But similarly, it’s a way for authorities to know where you were if there is an issue that you might be involved in, or not. So that’s one.

They also use it for interacting with people. They set up fake profiles and interact with suspects as well and try and get them to admit and all that kind of stuff.

I have a few extra slides hidden here, because James asked me to crunch this down. If you do download it, you will get all the slides there, and there are some very interesting ones. If you have an interest in the law enforcement angle, there are some great case studies that you can look into there.

Obviously the law enforcement one is one you have got to be very careful of, because you have issues there around the whole Minority Report and Precrime, and it’s more of a dodgy one than many of the other ones I have been talking about.

Smart cities; we heard people talking about smart cities this morning. This is the City of Boston, and they have got their Citizens Connect application for people with a smartphone, and it’s agnostic; it can be Android, iOS, I am not sure if they do BlackBerry, but Android and iOS are covered anyway. You can report potholes, street lights, graffiti, sidewalk patches, whatever those are, and damaged signs and others.

You get reports back when you report something to the City of Boston, and a couple of other cities are rolling these out as well, but in this particular one, when you report an issue to the City of Boston, you get a communication back from the city telling you who is assigned to fix that particular item you have reported. And then that person contacts you to say when they have done it, and often they will photograph it and you get a photograph of the item you have reported having been fixed by the named person who has done it. So very smart.

Healthcare; healthcare is a big one as well. You are probably familiar with Google Trends and Google Flu Trends, so Google Flu Trends, they take the search data to predict when there are likely flu outbreaks.

Well, they went a step further and they funded this paper, which was published in the American Journal of Tropical Medicine and Hygiene, and what they did was they looked at the data, the social media data for mentions of cholera and cholera symptoms in Haiti in 2010 after the earthquake there. And they found that the mentions of cholera and cholera symptoms on social media tracked exactly with the governmental data, so it was an exact match. The only difference being it was two weeks ahead of the government data.

So you can imagine two weeks on a cholera outbreak, the number of lives you could save, so really important stuff.
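
One simple way to measure that kind of head start is to slide one series against the other and see which shift gives the best match. The Python sketch below does that with a plain Pearson correlation. The daily counts are synthetic, built so that the official series trails the mentions by 14 days, and the function names are my own; it is not the method used in the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lead(mentions, official, max_lead):
    """Return (lead, correlation) for the shift where mentions best match official counts."""
    best = (0, -1.0)
    for lead in range(max_lead + 1):
        # Pair mentions on day t with official counts on day t + lead.
        xs = mentions[: len(mentions) - lead]
        ys = official[lead:]
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic daily counts: an outbreak-shaped bump in social media mentions...
mentions = [0, 1, 3, 7, 12, 18, 22, 18, 12, 7, 3, 1] + [0] * 18
# ...and the same bump showing up in official figures 14 days later.
official = [0] * 14 + mentions[:16]

lead, r = best_lead(mentions, official, max_lead=21)
print(lead, round(r, 3))
```

Real data would be much noisier, so you would want to smooth both series first and check that the best lead is stable over different time windows.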

There is also this fantastic application which was called Asthmapolis and is now called Propeller Health. And what that is, it’s a little device that sits on top of an inhaler, so when you give a puff on your inhaler, it reports it with GPS and timestamp.

So when you go to your doctor, your doctor then can see a map of where and when you puffed on your inhaler, and you get to see it as well. So you start to see patterns in when you used your inhaler.

So you might say every time I visit my friend’s house, I use the inhaler more. They are a smoker. Okay, so now I need to be aware.

Or every time I am on my way to work, when I pass this particular place I use the inhaler, maybe I should take a different route.

But it goes a step beyond that as well. They have gotten the City of Louisville, in Kentucky to roll this out to all their asthma people. And they have a particular issue with pollution in Louisville, because there is a 13 year lifespan difference in people’s expected lifespan depending on where they live in Louisville.

So you live in one place, you live 13 years less than your neighbors. So they are using this application to try and help them identify and to try and help them clean up the City of Louisville, so a really interesting application there.

In CRM, Customer Relationship Management, it was T-Mobile in the U.S. who went through the millions of customer records they had, they went through their billing records, they went through mentions in social media. They had, I think it was 33 million customers, and they were losing customers all over the place.

When they started analyzing the social media mentions, matched it up with the billing records, etcetera, and they started taking preventative action for people they identified as likely to defect, they halved their defections in three months.

So they cut down on their customer defections, in three months they cut them down by 50%. Amazing!

Brand management; a couple of years ago Nestlé got targeted by Greenpeace. They were sourcing palm oil for making their confectionery from unsustainable sources. Sinar Mas was the name of the company, and they were deforesting Indonesia to make the palm oil.

So Greenpeace put up a very effective viral video campaign to highlight this, and this actually had an impact on Nestlé’s stock price, short-term, small impact, but it had an impact on their stock price, as well as the reputational issues.

Nestlé put in place a Digital Acceleration Team who now monitor mentions of Nestlé online very closely, and as a result of that this year, for the first time ever, Nestlé are in the top ten companies in the world in the Reputation Institute’s RepTrak metric. So they are now considered globally as one of the more reputable companies, at least partly as a result of this.

In transportation; I mentioned Waze earlier. So Google Maps have now started to incorporate data from Waze. So right here you can see a screenshot of someone’s Google Maps and it’s highlighting that there was an accident reported on this particular road via Waze, via the Waze App. So that’s really impressive, you are on your Google Maps and now you are notified ahead of time that there has been an accident up the road, you have a chance to reroute.

Also in transportation, this is a lovely little example; Orange in the Ivory Coast, they took, I think it was — I have it noted here somewhere, 5 million Orange users, 2.5 billion anonymized records from their data.

They released it and said, okay, let’s see what you can do with this anonymized data from our customers. It was a competition. The best use was where they remapped the country’s public transport, because they could see, looking at people’s mobile phone records, where people were going during the day.

So they said, okay, people are going from here to here, but our bus route goes from here to here, to here, to here, let’s redraw the bus route this way where people actually want to go. Simple! Beautiful application of data, the data that we all publish all the time, to make people’s lives easier. They reckon they saved 10% of people’s commute times.
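
The core of that kind of analysis is counting where people move from and to. Here is a minimal Python sketch of that step, assuming each anonymized record is a (user, hour, zone) tuple. The record layout, the zone names and the function name are assumptions for illustration, not the format of the Orange dataset or the winning team's method.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def od_counts(records):
    """Count origin -> destination moves from (user, hour, zone) records."""
    trips = Counter()
    # Sort by user, then by time, so each user's records are in order.
    ordered = sorted(records, key=itemgetter(0, 1))
    for _, user_rows in groupby(ordered, key=itemgetter(0)):
        zones = [zone for _, _, zone in user_rows]
        # Each change of zone between consecutive records counts as one trip.
        for origin, dest in zip(zones, zones[1:]):
            if origin != dest:
                trips[(origin, dest)] += 1
    return trips


# Invented sample records: (anonymized user id, hour of day, zone).
records = [
    ("u1", 7, "A"), ("u1", 9, "B"), ("u1", 18, "A"),
    ("u2", 8, "A"), ("u2", 9, "B"),
    ("u3", 8, "C"), ("u3", 10, "C"), ("u3", 11, "B"),
]

trips = od_counts(records)
for (origin, dest), n in trips.most_common():
    print(origin, "->", dest, n)
```

The busiest origin and destination pairs are the candidates for a direct bus route; comparing them with the existing route map shows where the network sends people the long way round.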

Looking ahead, and I am wrapping up here James, wherever he is, you have got things like Google Glass, which will now be publishing people’s data as well.

You have got this thing called Instabeat, and what it is, it’s like Google Glass for swimmers. So it has got a little display inside people’s goggles as they are swimming, so they can see how fast their heart rate is; they can see several of the kind of things that you want when you are a competitive swimmer and you are trying to up your game.

And you have got all the usual stuff that we are all aware of, the Jawbones and all these other things that people are using to track their fitness.

More and more we are being quantified, we are generating more and more data, and it’s going to be really interesting to see the applications that come from this data.

So the conclusion from all of this very quickly, data and the data sources are increasing exponentially, let’s go hack that data for good.

Thank you!