Real-Time Backup Verification of 300 TB of Data
Manager and Tech Leads: Annjana R, Piotr G, Jillian Crossley
My team is Graph Storage, which sits under Platform -> Real Time Storage. Almost all the data on Twitter is graph data: any relationship within Twitter (who follows whom, who mutes whom, who likes what, and so on), including follows, blocks, mutes, replies, retweets, and favourites. We store this data and serve reads on it at 200M QPS. Graph data powers recommendations, search, timelines, and other critical features on Twitter.

The current stack is more than a decade old. My team is revamping the write pipeline, reducing manual operational work, increasing write scale by 10x, and improving data consistency. The new service is Flock v1.5. While building it, I worked on the multi-DC setup and the Backup Verification service.

The new write pipeline uses backups in real time when spinning up new DB replicas: Flock v1.5 relies on backups to seed the Storage Nodes that serve data to customers. My work is to build a service that verifies these backups and alerts on invalid ones. We already use Comparator to detect live mismatches between v1 and v1.5, but we also need a mechanism that proactively verifies that backups contain the correct data before they can be used in production. That mechanism must not rely on v1 data for comparison, because we are building v1.5 to replace v1.
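
To make the idea of verifying a backup without comparing against v1 more concrete, here is a minimal Python sketch of one possible approach: checking a backup against a checksum manifest written at backup time. It is an illustration under stated assumptions, not the Flock v1.5 implementation. The `manifest.json` layout, the shard file names, and the `verify_backup` and `check_and_alert` helpers are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir: Path) -> list[str]:
    """Return a list of problems found in one backup (empty list means valid).

    Assumes a hypothetical manifest.json of the form
    {"files": {"<file name>": "<sha256 hex digest>", ...}}
    written when the backup was taken. No v1 data is consulted.
    """
    manifest = json.loads((backup_dir / "manifest.json").read_text())
    problems = []
    for name, expected in manifest["files"].items():
        path = backup_dir / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems


def check_and_alert(backup_dir: Path, alert=print) -> bool:
    """Verify a backup and send one alert per problem; True means safe to use.

    `alert` is a placeholder callback; a real service would page or emit a metric.
    """
    problems = verify_backup(backup_dir)
    for problem in problems:
        alert(f"INVALID BACKUP {backup_dir}: {problem}")
    return not problems
```

In a setup like this, a replica would be seeded from a backup only if `check_and_alert` returns `True`. Checksums catch only corruption and missing files; confirming that the backup holds the correct graph data would need further checks (for example, record counts or sampled reads), which this sketch does not attempt.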