Nov 5, 2014

Jay Kreps Shares Lessons He Learned at LinkedIn


Jay Kreps, Principal Staff Engineer, shared lessons he learned during his past seven years at LinkedIn.

Scaling the Site: LinkedIn started with a standard Oracle DB and separate systems for search and the social graph. It then moved to a key-value store (Voldemort) and Hadoop, which augmented the Oracle DBs. Kafka has since been added on top of Hadoop, and the newsfeed and Espresso have been separated out.

  • Few simple, cheap primitives: Just as OpenGL uses triangles as its basic primitive, think about high-performance basic primitives such as a key-value store. He picked on one anti-pattern from alums: the centralized user table.
  • Ops first: Teams with most operational focus spent the least amount of time on operations. Keep production designs simple and do them well—maybe at the expense of newer, fancier designs.
  • Do hard things later: Use asynchronous and offline/batch processing where possible instead of request-response patterns.
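
To make the "do hard things later" point concrete, here is a minimal sketch of my own, not something shown in the talk. The request path only records an event and returns; the expensive aggregation happens later in a batch step. The names `handle_profile_view` and `run_batch` are hypothetical, and the in-process `queue.Queue` is just an assumed stand-in for a real log or message queue.

```python
import queue

# Assumption: an in-process queue stands in for a durable log or message queue.
events = queue.Queue()


def handle_profile_view(viewer_id, profile_id):
    """Request path: record the event and return without doing the hard work."""
    events.put((viewer_id, profile_id))
    return "accepted"


def run_batch(max_events=1000):
    """Offline path: drain up to max_events queued events and count views per profile."""
    counts = {}
    for _ in range(max_events):
        try:
            _viewer_id, profile_id = events.get_nowait()
        except queue.Empty:
            break  # nothing left to process in this batch
        counts[profile_id] = counts.get(profile_id, 0) + 1
    return counts
```

The request handler stays cheap no matter how costly the aggregation becomes, which is the trade the bullet describes: the answer arrives later, but the serving path stays simple.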

Scaling the code base: How do you scale as you grow the size of the Engineering team? LinkedIn started with a monolithic application and decomposed it into services. The decomposition was originally not very disciplined.
  • Services (may) scale development. Bad services are worse than no services. LinkedIn ended up with 300-400 services without appropriate tools to diagnose and debug them, and with dependencies that were not well understood. Treat the service as a product, with a team that owns the service and provides appropriate documentation and artifacts. Service layer evolution: Moved from Spring-RPC to REST + JSON.
  • The service contract is binary: Instead of just exposing existing code as a service, develop the service contract first and then write the appropriate code.
  • Isolation vs Utilization: Develop services with the right granularity. How many services should you have? As a rule of thumb, he recommends having a team of 5 develop each service to ensure that services have the right granularity. He recommends against a model where a single engineer is responsible for several services.
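
As an illustration of developing the contract first, here is a small sketch of my own (the talk did not show code). The request and response types and the `ProfileService` interface are written before any implementation exists. All names here are hypothetical, and `InMemoryProfileService` is only an assumed toy implementation used to exercise the contract.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ProfileRequest:
    """Contract: what a caller must send."""
    member_id: int


@dataclass(frozen=True)
class ProfileResponse:
    """Contract: what a caller gets back."""
    member_id: int
    display_name: str


class ProfileService(Protocol):
    """The service contract, defined before any implementation is written."""

    def get_profile(self, request: ProfileRequest) -> ProfileResponse:
        ...


class InMemoryProfileService:
    """Hypothetical implementation written afterwards to satisfy the contract."""

    def __init__(self, names: Dict[int, str]) -> None:
        self._names = names

    def get_profile(self, request: ProfileRequest) -> ProfileResponse:
        if request.member_id not in self._names:
            raise LookupError(f"unknown member {request.member_id}")
        return ProfileResponse(request.member_id, self._names[request.member_id])
```

Callers depend only on `ProfileService` and the two data types, so the implementation behind it can change without touching them.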

Scaling software engineering: Cutting up the codebase.
  • Build your process: They moved from wiki pages that tracked the lifecycle to tools and services that manage it: from review, to check-in, to build, to rollout.
  • Governance: You need a balance between central planning (communist) and a free-for-all in which every team does its own thing (capitalist).
  • Treat code as property. Someone is responsible for the code: its functionality and hygiene. That eliminated a lot of bureaucracy.
  • But you need effective government. Don’t duplicate integration layers. Standardize where you can, for example on monitoring.

Great stuff! I always enjoy learning about best practices and lessons from other companies. It's a great way to compare notes and also accelerate your own Engineering practices and technology.


