Prabhakar Raghavan is giving the morning keynote. He's the head of Yahoo! Research. The title of the talk is "What sciences will Web N.0 take?" But, more accurately, I'd call it "Science for Engaging and Monetizing Audience."
Yahoo! takes in editorial, free (including blogs, Twitter, pictures, etc.), and commercial "content." The audience "consumes the content" but also enriches the content. Finally, the audience transacts (commerce) with the content. Yahoo! isn't the only one in this business. Google, AOL, MSN, and even NewsCorp are in the business of matching content to audience (see this Bear Stearns presentation for a detailed look at that; note that Raghavan didn't reference this).
Raghavan's premise: people don't want to search. People want to get tasks done. Search engines spend very little time servicing you compared to the time you spend doing queries, evaluating results, and so on. This is backwards. The machines should be working harder than we are.
Search engines need to extract and exploit information in the query. But extracting semantic structure isn't easy. It's easy to build a demo that shows the right hotels when you search on "hotel near leicester square," for example, but it's much harder to do it in the general case, for query topics you don't know about a priori. For example, there is a town in Washington state named Cheney. So, when someone searches on "cheney pizza washington," do they want to order pizza in WA state or know about the vice president and pizza?
The grand challenge is to devise general platforms for semantic searches.
There is no scale-based differentiation around web text content because the cost of storage is dropping. The price of storing everything everyone on the Internet will produce in 5 years is about the same as employing 10 people. Small companies can afford to store content at Web scale.
User-generated metadata is growing. Anchortext and tags are growing at the rate of 100 Mb/day. Pageviews are around 50-100 Gb/day. Reviews and ratings are small. All of these are important, but only anchors are central to how people work on the Web.
START metadata:
- Star: I like this
- Tag: creating tags on pictures, etc.
- Access: you view a page (in a way I can see)
- Routing: forwarding things to friends
- Text: write a review, blog article, etc.
These are in order of increasing engagement. Flickr is an example of tagging providing real value. The effort of millions of people is used to give better results. The principle isn't too different from using anchor text to determine the relevant keywords for what's linked to.
Challenges: How do we use tags better? How do we cope with spam? What is the ratings and reputation system? More important, how can this be used better? He mentions the ESP Game. I heard about this a few weeks ago at Jeanette Wing's lecture. The game uses game results to contribute tags to image search.
He mentions Yahoo! Answers. People are intrinsically motivated to help other people, show off, and contribute. It helps if you have a game attached. For example, Yahoo! Answers has a "leader board" that allows people to show off their acumen. Part of the design of a community system is determining what assignment of incentives leads to good user behavior. Whom do you trust and why?
Incentivized chaos retains and enriches participation. What is the science behind online community? It's not just about human-computer interaction, but people-to-people interaction mediated by the computer. This can be an intrinsically data-driven society. We don't have to get 200 people to fill out surveys. We can watch what 200 million people do and study that.
Some questions:
- Why do people lurk or participate?
- Why do people create new online personae?
- Why are YouTube, Flickr, and MySpace successful and others not?
- What new genres of audience experience are emerging and what can we provoke?
Some dimensions:
- Duration (short/long)
- Ephemerality (forgotten/remembered)
- Social context (alone/with others)
How do we measure audience engagement? Page views, hours? Who cares besides advertisers? Investors certainly (but mostly because advertisers do). He mentions the redesign of the Yahoo! Finance pages so that they don't require refreshes to see updated stock prices. The audience was happy and still engaged, but advertisers were NOT happy because they couldn't see that the audience was engaged and their ads were being seen.
Raghavan shows a formula that takes into account not only repetition and time spent, but also a measure of influence (log(user_neighborhood)). The grand challenges: devise standardized, defensible metrics of online engagement. Use these to predictively devise online experiences. Not a substitute for creativity, but provides a scientific basis that informs design.
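I didn't copy down the exact formula, so what follows is only a rough sketch of the idea, not what was on the slide. It assumes the three ingredients (repetition, time spent, and the log of the user's neighborhood size) are combined as a weighted sum. The function name, the weights, and the additive form are all my own assumptions.

```python
import math


def engagement_score(visits, minutes, neighborhood_size,
                     w_repeat=1.0, w_time=1.0, w_influence=1.0):
    """Hypothetical engagement score.

    The additive form and the weights are assumptions for illustration;
    only the three ingredients come from the talk.
    """
    if visits < 0 or minutes < 0 or neighborhood_size < 0:
        raise ValueError("inputs must be non-negative")
    # log1p(n) = log(1 + n), used here so that a user with no neighbors
    # contributes zero influence instead of hitting log(0).
    influence = math.log1p(neighborhood_size)
    return w_repeat * visits + w_time * minutes + w_influence * influence


# Two users with the same visits and time spent, but different reach.
well_connected = engagement_score(visits=10, minutes=30, neighborhood_size=1000)
isolated = engagement_score(visits=10, minutes=30, neighborhood_size=5)
print(well_connected > isolated)
```

The logarithm means influence grows slowly: going from 5 to 1,000 neighbors nudges the score up rather than swamping the repetition and time terms.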
Raghavan gave an in-depth discussion of sponsored search as a combination of information retrieval and microeconomics. He calls this "computational microeconomics." This includes reputation and incentive mechanisms, and marketplace matching (he references the stable marriage problem). People talk about "network effects," but what does this mean from a value standpoint? Are 500 million users 500 times as valuable as a million users? Or 5000 times more valuable? What of Metcalfe's Law?
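To put numbers on the network-effects question, here is a small sketch comparing how much more valuable 500 million users are than 1 million under three growth models. Linear value and Metcalfe's Law (value proportional to n squared) follow from the question as posed. The n log n model is a proposed alternative to Metcalfe's Law (from Odlyzko, Briscoe, and Tilly) that I'm adding for comparison; it wasn't part of the talk. The function names are mine.

```python
import math


def linear_value(n):
    """Value proportional to the number of users."""
    return float(n)


def metcalfe_value(n):
    """Metcalfe's Law: value proportional to n squared."""
    return float(n) ** 2


def nlogn_value(n):
    """Alternative model (not from the talk): value proportional to n log n."""
    return n * math.log(n)


def value_ratio(model, big, small):
    """How many times more valuable a network of `big` users is than `small`."""
    return model(big) / model(small)


big, small = 500_000_000, 1_000_000
for name, model in [("linear", linear_value),
                    ("Metcalfe (n^2)", metcalfe_value),
                    ("n log n", nlogn_value)]:
    print(f"{name}: {value_ratio(model, big, small):,.0f}x")
```

Under the linear model the answer is 500 times; under Metcalfe's Law it is 250,000 times; under n log n it comes out to roughly 725 times. The spread between those answers is the point: the choice of model matters far more than the user count.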
Monetization and economics should not be an afterthought in design. They should be intrinsic.
This was a good talk and it led to several new lines of thought for me. In particular, I'm reminded of Britt Blaser's theory of "stepping stones" in bringing people into more and more interaction with a site and, more importantly, with each other. Britt's OrgWare (disclosure: I'm an advisor) is a systematic attempt to build infrastructure that supports and encourages audience engagement.