Activity Streams

The web is rapidly changing from a source of information to be consumed into a place where people produce, consume, share, and interact. The social web is drawing people away from a web of clicks toward a web of online activities and real-world interactions carried out in a conversational style.

With services such as Facebook and Twitter, a new world of online activities became observable, promising richer signals than the web of pages and links, queries and clicks. Google has shown that any observable web activity by online users can be turned into economic value through targeted online advertising.

Davai (Давай, Russian for "Let's go!"), a startup, was founded to explore the commercial use of network effects on social media, under the assumption that the following signals are statistically significant:

  • Influence – the more of a user’s friends perform an action, the more likely the user is to perform it as well,
  • Repetition – a user who has performed an action is more likely to perform it again than a user who has never performed it, and
  • Correlation – two users who are friends are more likely to perform an action at the same time than two users chosen at random from the network.
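
As a rough illustration of how the influence signal could be checked, the sketch below groups users by the number of their friends who performed an action and reports the share of each group that performed it too. The function name, the toy network, and the adopter set are all hypothetical, and the sketch ignores timing: a real test would also require that the friends acted before the user did.

```python
from collections import defaultdict

def adoption_rate_by_friend_count(friends, adopters):
    """Return {k: share of users with k adopting friends who also adopted}."""
    totals = defaultdict(int)  # users seen with exactly k adopting friends
    hits = defaultdict(int)    # of those, users who adopted themselves
    for user, user_friends in friends.items():
        k = sum(1 for f in user_friends if f in adopters)
        totals[k] += 1
        if user in adopters:
            hits[k] += 1
    return {k: hits[k] / totals[k] for k in sorted(totals)}

# Hypothetical toy network: user -> set of friends (symmetric friendship).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat", "eve"},
    "eve": {"dan"},
}
adopters = {"ann", "bob", "cat"}  # hypothetical users who performed the action

rates = adoption_rate_by_friend_count(friends, adopters)
for k, rate in rates.items():
    print(f"{k} adopting friends: adoption rate {rate:.2f}")
```

If the influence assumption holds, the rate should tend to rise with the number of adopting friends; on real data the comparison would need enough users per group to be statistically meaningful.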

[Figure: analytic flow]

The basic model we developed associates users who share connections with objects (people, places, and things), much as Facebook’s Open Graph does today. Connections through objects allow information to flow from user to user; each shared object thus creates a channel for information flow.
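
A minimal sketch of this idea, assuming connections are available as (user, object) pairs: two users have a channel whenever they connect to the same object. The function name and the sample data are hypothetical and are not taken from our actual pipeline.

```python
from collections import defaultdict
from itertools import combinations

def shared_object_channels(connections):
    """Map each unordered user pair to the set of objects both connect to."""
    users_by_object = defaultdict(set)
    for user, obj in connections:
        users_by_object[obj].add(user)

    channels = defaultdict(set)
    for obj, users in users_by_object.items():
        # Every pair of users attached to the same object shares a channel.
        for a, b in combinations(sorted(users), 2):
            channels[(a, b)].add(obj)
    return dict(channels)

# Hypothetical (user, object) connections: people, places, and things.
connections = [
    ("ann", "cafe_luna"),
    ("bob", "cafe_luna"),
    ("bob", "jazz_band"),
    ("cat", "jazz_band"),
    ("dan", "hiking_club"),
]

channels = shared_object_channels(connections)
for pair, objects in sorted(channels.items()):
    print(pair, sorted(objects))
```

Here ann and bob are linked through the cafe, bob and cat through the band, and dan has no channel to anyone because nobody else connects to the hiking club.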

As information flows, influence spreads through the network (graph) and can be quantified by scoring graph elements based on their topological properties. Scores apply to sub-graphs as much as to individual nodes, and elements with similar scores are taken to be similar.
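
One simple way to score nodes from topology alone is to let influence decay with hop distance from a set of seed users. The sketch below is an illustrative assumption, not the scoring we actually used: the function name, the decay factor of 0.5, and the toy graph are all hypothetical.

```python
from collections import deque

def influence_scores(graph, seeds, decay=0.5):
    """Score each node as the sum over seeds of decay ** (hop distance)."""
    scores = {node: 0.0 for node in graph}
    for seed in seeds:
        dist = {seed: 0}
        queue = deque([seed])
        # Breadth-first search yields shortest hop distances from the seed.
        while queue:
            node = queue.popleft()
            scores[node] += decay ** dist[node]
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
    return scores

# Hypothetical undirected friendship graph (a simple chain of four users).
graph = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat"},
}

scores = influence_scores(graph, seeds=["ann"])
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, score)
```

Nodes close to the seed receive high scores and distant or unreachable nodes receive low or zero scores, so ranking by score gives a neighbor-first ordering of the audience.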

Connectivity breeds similarity, so targeting social network neighbors is a more efficient audience-identification strategy for advertisers than, say, targeting a set of random nodes.

The following presentation details the approach: Marketing to networked customers

Our infrastructure made heavy use of the open source stack and Amazon EC2. Hadoop, HBase, Mahout, and Lucene/Solr were integrated into a 24/7 pipeline with a Flash/browser-based workbench. The depth of the open source stack and the dedication of contributors to its components are amazing. Lucene and Weka are my favorite communities.