Managed Graph Analytical Database
Manager and Tech Leads: Annjana R, Piotr G, Jillian Crossley
Product teams such as Recommendations, Search, Timelines, and Flygraph (the internal knowledge graph), along with other critical features, depend on the Graph Storage team. Almost all the data on Twitter is graph data (who follows whom, who mutes whom, who likes what, etc.), and most analytical and Machine Learning work is performed on these graph datasets. At Twitter, data teams had to do a lot of infrastructure setup and maintenance just to interact with graph data. It was therefore natural to relieve these product teams of that burden by providing the infrastructure as code.

The Graph Storage team runs two distributed database services: Flock and Tflock. Flock stores user-user relationships such as “follows”, “blocks”, and “mutes”. Tflock stores tweet-entity relationships such as likes and retweets. These services work well for real-time use cases, but querying them for analytical purposes involves a lot of work because they don’t provide rich analytical APIs the way software such as Neo4j or JanusGraph does.

I was therefore assigned as the subject-matter expert (SME) to develop a managed offering for these teams and use cases. The problem was fairly open-ended: develop an offering that relieves product teams while leveraging Twitter's existing infrastructure. I drove this project from proof of concept (POC) to Production Readiness Review. The analytical platform I designed reduced operation time for data teams from days to minutes.