June 16, 2009

Social recommendation reaching long-promised potential

Content on the Internet has been exploding. YouTube, the largest video site on the Internet, adds ten hours of new video every single minute. By some measures, 10M new pages are added to the World Wide Web every day. The world of blogs has also grown tremendously, with Technorati, the popular blog search engine, tracking over 120M blogs worldwide (excluding millions that are not tracked in China, the country with the world's largest Internet population).

This makes discovery of new content a huge challenge for both users (how to discover) and publishers (how to get discovered). I'm going to focus only on the user challenge in this post.

As content on the Internet continues to multiply, helping users find content that they would like, and therefore want to consume, has become one of the most significant problems on the Web. Search engines have become a popular tool for finding what you're looking for, but search requires users to express their intent first. Since you don't know what you don't know, discovering new content that you are unaware of, but may like, has necessitated the development of recommendation engines, which proactively suggest content users may like with little, if any, explicit action on their part.

Recommendation engines broadly fall into three categories:

1) Editorial recommendation: This is the most basic form of recommendation, whereby the editor/publisher picks what to highlight on its site from all available content - e.g., "featured video of the day," "featured story of the week," etc. This is a one-size-fits-all approach, where the recommendation reflects the publisher's goals and/or the average consumption behavior/demand amongst all visitors to the site, and is not customized for each individual visitor.

2) Customized recommendation based on an individual's behavior: This method involves analyzing each user's content consumption pattern and using technology to recommend similar content to that user. This is the most difficult approach technically. Complex algorithms are needed to (a) "read" and categorize the repository of all available content, (b) track, capture and analyze each user's consumption patterns, and (c) match those patterns against the categorized repository to make recommendations that are uniquely customized for each user.
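To make the three steps above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny hand-tagged catalog, the trait names, and the use of cosine similarity for matching are all hypothetical, and this is not how Pandora or Netflix actually work. Real systems face far larger catalogs and far messier data.

```python
from collections import Counter
from math import sqrt

# Step 1 (assumed already done): every item in the repository has been
# "read" and tagged with descriptive traits. This catalog is hypothetical.
CATALOG = {
    "song_a": {"acoustic", "folk", "female_vocals"},
    "song_b": {"acoustic", "folk", "male_vocals"},
    "song_c": {"electronic", "dance"},
    "song_d": {"electronic", "dance", "female_vocals"},
}

def build_profile(history):
    """Step 2: summarize a user's consumption as trait counts."""
    profile = Counter()
    for item in history:
        profile.update(CATALOG[item])
    return profile

def similarity(profile, traits):
    """Cosine similarity between a trait-count profile and an item's traits."""
    dot = sum(profile[t] for t in traits)
    norm = sqrt(sum(v * v for v in profile.values())) * sqrt(len(traits))
    return dot / norm if norm else 0.0

def recommend(history, top_n=2):
    """Step 3: rank items the user has not consumed by similarity to the profile."""
    profile = build_profile(history)
    candidates = [item for item in CATALOG if item not in history]
    candidates.sort(key=lambda item: similarity(profile, CATALOG[item]), reverse=True)
    return candidates[:top_n]

print(recommend(["song_a"]))  # ['song_b', 'song_d']
```

Even in this toy version, the quality of the output depends almost entirely on how well the catalog was tagged in the first place.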

Examples of this approach include Pandora, the Internet radio service that provides customized radio channels based on the songs users initially select to hear. Pandora's Music Genome Project utilizes 400 different characteristics to classify all available songs into categories, which Pandora then leverages to understand a user's music taste and recommend relevant songs to him/her. Cinematch, Netflix's movie rental recommendation system, is another such example.

"Reading" the content repository and categorizing it, say, into logical genres, is the most difficult aspect of this approach. Getting the recommendation correct with a high level of accuracy on a consistent basis is extremely hard. Since the recommendation engine works in the background, employing complex algorithms and other behavioral science factors, users expect the "black box" to make a correct recommendation every single time. Getting it 90% correct, though impressive, may still not win users' loyalty and trust. If I hate chick flicks, recommend one such movie to me, and that's the last time I'd trust the technology behind the recommendation engine.

Netflix, recognizing the complexity of this recommendation approach, has an ongoing contest, launched in October 2006, that will award $1M to whoever can improve the accuracy of its already-impressive Cinematch movie recommendations by 10%.

3) Social recommendation: This approach recommends content to a user based on the usage patterns of other users instead of his/her own consumption pattern. It's essentially a popularity contest, where popularity amongst a set of users is measured to make recommendations.

Social recommendation comes in two flavors. The set of users used to measure popularity can be either a limited set - the user's friends (social graph), members belonging to a particular group or affiliation, etc. - or the general set of all users consuming content on the site (e.g., most viewed, highest rated, etc.).
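The two flavors differ only in whose activity gets counted, which a short Python sketch can show. The viewing data, user names, and function names below are hypothetical, and counting views is just one assumed way to measure popularity (ratings or shares would work the same way).

```python
from collections import Counter

# Hypothetical viewing history per user.
VIEWS = {
    "alice": ["video_1", "video_2"],
    "bob": ["video_2", "video_3"],
    "carol": ["video_3"],
    "dave": ["video_3", "video_4"],
    "me": ["video_1"],
}

# Hypothetical social graph: who each user counts as a friend.
FRIENDS = {"me": {"alice", "bob"}}

def most_popular(user, audience, top_n=3):
    """Rank items by how many people in `audience` viewed them,
    skipping anything `user` has already seen."""
    seen = set(VIEWS.get(user, []))
    counts = Counter()
    for other in audience:
        if other == user:
            continue
        counts.update(item for item in set(VIEWS.get(other, [])) if item not in seen)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

def recommend_from_friends(user, top_n=3):
    """Flavor 1: popularity within the user's social graph."""
    return most_popular(user, FRIENDS.get(user, set()), top_n)

def recommend_from_everyone(user, top_n=3):
    """Flavor 2: popularity amongst all users of the site."""
    return most_popular(user, VIEWS.keys(), top_n)

print(recommend_from_friends("me"))   # ['video_2', 'video_3']
print(recommend_from_everyone("me"))  # ['video_3', 'video_2', 'video_4']
```

Note that the same user gets a different top pick depending on which set of users is consulted.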

Of the two, social recommendation based on the preferences of my social graph - people I trust - holds the biggest promise amongst all approaches currently being utilized to make content recommendations. Reasons: 1) the approach requires rather simple technology, 2) users' expectations are managed - not all recommendations need to be on the mark because everyone has friends whose tastes are different from their own - and 3) the approach simply works, because it follows a long and well-established norm in the real world. By some measures, more than 30% of new content consumed by people is based on a recommendation from someone they know and trust. Web 2.0 tools and changing user behavior, where more and more people are sharing and contributing more and more stuff on the Internet, are making social recommendation based on a user's social graph a mainstream reality.

The phenomenal growth of Twitter (32M users in May vs. 1.6M a year ago) and the status update feature (copied from Twitter) used by Facebook's 200M+ users have contributed immensely to the social discovery of new content on the ever-growing World Wide Web. Given the relevancy that comes from the context provided by my social graph, I check out most of the links, photos, videos, and stories suggested to me on Facebook.

Publishers will very soon look at social recommendation engines as a major source of traffic to their sites, maybe more important than Web search engines, which on average contribute as much as a third of all traffic to a publisher's site today. The impact on Google's business as a leading Web search engine remains to be seen. But users are not complaining. Why should they, if their trustworthy social graph is at work, 24/7/365, opening the doors to exciting new content on the Web for them?


Anonymous said...

This is a great article, and you have also mentioned lots of statistics. As a user I would be interested in the sources. I think you should list the source for each statistic.

Natalie said...

Interesting article -- where does this stat come from?: "more than 30% of new content consumed by people is based on recommendation from someone they know and trust."

Sab said...

Natalie, the 30% data point is a well known industry figure. Here is one reference by thePlatform: http://theplatform.com/blog/entry/relevance_increases_video_consumption_and_engagement/