Today’s social web platforms, such as Facebook, Twitter, Google+, and LinkedIn, increasingly have to process large volumes of user-generated data on the fly. As the role of such platforms shifts from portals for largely historical data towards platforms for real-time data analytics, we observe that their architectures are incrementally moving from storage-centric designs, based on distributed data management technologies, towards event-based models that exploit queueing and stream processing systems.
We believe that it is time to fundamentally rethink the software architecture of social web platforms and to base them on a content-based communication model that is explicitly designed to disseminate and partition incoming request flows across a cluster of servers. A content-based publish/subscribe system thus acts as a scalable, elastic, and highly responsive data distribution backbone. By focusing on fresh data, such an architecture can optimize the routing of data to match the topology of the data center, dynamically adapt data flows to alleviate hot spots, and elastically scale out to more servers when required by computationally expensive on-the-fly data analytics applications.
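To make the content-based model concrete, the following is a minimal sketch (not the paper's implementation; all class and attribute names are hypothetical) of the core idea: subscribers register predicates over event content, and a broker partitions an incoming stream by forwarding each event only to the servers whose subscriptions match it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


@dataclass
class Server:
    """A cluster node running an on-the-fly analytics task."""
    name: str
    received: List[Event] = field(default_factory=list)


class ContentBroker:
    """Routes each published event to the servers whose content
    predicates match it, thereby partitioning the request flow."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Predicate, Server]] = []

    def subscribe(self, predicate: Predicate, server: Server) -> None:
        self._subs.append((predicate, server))

    def publish(self, event: Event) -> None:
        for predicate, server in self._subs:
            if predicate(event):
                server.received.append(event)


# Usage: partition a stream of user posts across two analytics servers.
trending = Server("trending-analytics")
geo = Server("geo-analytics")
broker = ContentBroker()
broker.subscribe(lambda e: e.get("likes", 0) > 100, trending)   # popular posts
broker.subscribe(lambda e: e.get("region") == "EU", geo)        # EU posts

broker.publish({"user": "alice", "likes": 250, "region": "EU"})
broker.publish({"user": "bob", "likes": 5, "region": "US"})
```

Because routing decisions depend only on event content and the current subscription table, subscriptions can be re-registered at runtime, which is what allows such a backbone to adapt data flows around hot spots and scale out by adding matching servers.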