Collect 150 GB/day of compressed log files from MSN Instant Messenger over thirty days (June 2006) and you find that on a typical day there are 1 billion conversations and 93 million users logged in, 65 million of whom actually engage in a conversation. This is the basis of research by Jure Leskovec and Eric Horvitz.
The demographics of MSN users show that, not surprisingly, they're far younger than the general population. The probability of any two users having a conversation with each other isn't strongly affected by their respective ages or genders. One interesting finding is that as people get older, they become more likely to converse with people across all age groups. Older people have the longest conversations, but the trend isn't monotonic: middle-aged people have shorter conversations than both older and younger users. Young people type faster (more messages per unit time).
There's little gender bias. A randomly chosen user is as likely to be talking to someone of the opposite gender as to someone of the same gender. Cross-gender conversations are longer and contain more messages.
Generally, people talk less the further apart they live. You can see peaks in the data that correspond to continental distances. Most conversations happen between people within 50 to 100 km of each other.
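Reproducing a conversations-versus-distance plot like this requires a distance for each conversation. Here is a minimal sketch, assuming each user has already been mapped to a hypothetical `(lat, lon)` location (the talk doesn't say how locations were derived): compute the great-circle distance with the haversine formula and bin conversations into fixed-width distance buckets. The function names and the 50 km bin width are illustrative choices, not the study's.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_histogram(conversations, locations, bin_km=50):
    """Count conversations per distance bin; keys are bin lower edges in km.

    conversations: iterable of (user_a, user_b) pairs
    locations: hypothetical mapping user -> (lat, lon) in degrees
    """
    counts = Counter()
    for a, b in conversations:
        d = haversine_km(*locations[a], *locations[b])
        counts[int(d // bin_km) * bin_km] += 1
    return dict(sorted(counts.items()))

# Usage with made-up users and coordinates.
locations = {"alice": (47.61, -122.33), "bob": (47.67, -122.12), "carol": (40.71, -74.01)}
print(distance_histogram([("alice", "bob"), ("alice", "carol")], locations))
```

Plotting these counts on a log scale against distance is one way to look for the continental-distance peaks described above.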
Only 8% of the US population uses MSN IM; in Iceland, it's 35%. When you map users per capita, the outline of the world map pops out, but there are some bare regions. Interestingly, the western US is very high, though the talk offered no analysis of why. You can also plot the conversation links between world regions and see heavy connectivity between the US and Europe and less between other areas of the world.
Over the course of June 2006, 180 million people exchanged messages, producing a graph with 1.3 billion edges (each user has 6 to 7 conversations on average) and over 30 billion conversations in total. The number of buddies per user follows a power-law distribution.
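To check a power-law claim on your own edge list, you can count each user's distinct buddies, tabulate how many users have each degree, and estimate the exponent. The sketch below uses the approximate discrete maximum-likelihood estimator from Clauset, Shalizi and Newman, a swapped-in technique; the talk doesn't say how the distribution was fitted. The function names and the `k_min` cutoff are assumptions for illustration.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Return {user: number of distinct buddies} for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {u: len(n) for u, n in neighbors.items()}

def degree_distribution(degrees):
    """Map each degree k to the number of users with exactly k buddies."""
    return dict(sorted(Counter(degrees.values()).items()))

def powerlaw_alpha(degrees, k_min=1):
    """Approximate discrete MLE of alpha in P(k) ~ k^-alpha for k >= k_min."""
    tail = [k for k in degrees.values() if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

On a log-log plot of `degree_distribution(...)`, a power law shows up as a roughly straight line with slope close to `-alpha`. In practice the choice of `k_min` matters a lot, since real degree data usually follows a power law only in its tail.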
Analyzing connectivity supports the six degrees of separation idea. The average path length between any two users in the buddy network is 6.2, and 90% of people can be reached in fewer than 8 hops. People are very close together.
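Exact all-pairs shortest paths are infeasible on a 180-million-node graph, so figures like these are typically estimated by running breadth-first search from a sample of source nodes. Here is a minimal sketch of that approach, assuming an adjacency dict mapping each user to a set of buddies; the sample size, seed, and function names are hypothetical, and only reachable pairs are counted.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj, num_sources=100, hop_limit=8, seed=0):
    """Estimate (mean path length, fraction of pairs closer than hop_limit hops).

    BFS runs from up to num_sources randomly sampled sources; unreachable
    pairs are ignored.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    sources = rng.sample(nodes, min(num_sources, len(nodes)))
    lengths = []
    for s in sources:
        lengths.extend(d for node, d in bfs_distances(adj, s).items() if node != s)
    if not lengths:
        return 0.0, 0.0
    mean = sum(lengths) / len(lengths)
    below = sum(1 for d in lengths if d < hop_limit) / len(lengths)
    return mean, below
```

For example, on the four-node path `0-1-2-3` with every node used as a source, the mean distance is 20/12 (about 1.67) and 10 of the 12 ordered pairs are fewer than 3 hops apart.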
When you remove nodes in some order (by number of links, total conversations, total duration, etc.), you can see how robust the network is. Not surprisingly, removing the people with the most links makes the network fall apart fastest. But removing people in order of average conversation length makes the network fall apart even more slowly than removing them at random.
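A node-removal experiment like this can be sketched as follows: remove the first `k` nodes of a given ordering and measure what fraction of the original nodes remain in the largest connected component. This assumes a symmetric adjacency dict, and the helper names are hypothetical. Orderings by total conversations or duration would simply sort by a different, hypothetical per-user statistic. Recomputing components from scratch at each step is simple but slow, so a real analysis at this scale would need something more efficient.

```python
import random
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def removal_curve(adj, order, steps):
    """For each k in steps, the largest-component fraction after removing order[:k]."""
    n = len(adj)
    return [largest_component_size(adj, set(order[:k])) / n for k in steps]

def by_degree(adj):
    """Ordering that removes the best-connected users first."""
    return sorted(adj, key=lambda u: len(adj[u]), reverse=True)

def at_random(adj, seed=0):
    """Random removal ordering, the baseline the talk compares against."""
    order = list(adj)
    random.Random(seed).shuffle(order)
    return order
```

On a star graph, removing the single highest-degree node (the hub) shatters the graph into isolated users, while removing a random node usually leaves most of it connected. This illustrates why degree-ordered removal is the most damaging ordering.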
So, in conclusion,
- People who communicate are similar (except gender)
- The world is well connected (small world theory)
- The network is very robust: many randomly chosen people can be removed and the network stays connected.