Jay Kreps, Principal Staff Engineer, shared lessons he learned during his past seven years at LinkedIn.
Scaling the Site: Started with a standard Oracle DB and separate systems for search and the social graph. Moved to a key-value store (Voldemort) and Hadoop, which augmented the Oracle databases. They have since added Kafka on top of Hadoop and separated out the newsfeed and Espresso.
- Few simple, cheap primitives: Just as OpenGL has triangles as its basic primitive, think in terms of high-performance basic primitives such as a key-value store. He picked on one anti-pattern seen from alums: a centralized user table.
- Ops first: The teams with the most operational focus spent the least amount of time on operations. Keep production designs simple and do them well, maybe at the expense of newer, fancier designs.
- Do hard things later: Use asynchronous and offline/batch when possible instead of request-response patterns.
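As a rough illustration of "do hard things later", the sketch below keeps the request path cheap by only recording an event, and leaves the expensive work to a background consumer. It is a minimal sketch, not LinkedIn's implementation: the in-process `queue.Queue` stands in for a durable log or message queue, and `handle_post_update`, `feed_worker`, `feed_index`, and `run_demo` are hypothetical names chosen for the example.

```python
import queue
import threading

# Hypothetical in-process stand-in for a durable log or message queue.
events = queue.Queue()
# Hypothetical derived data: member_id -> list of updates, built off the request path.
feed_index = {}


def handle_post_update(member_id, text):
    """Request path: record the event and return immediately."""
    events.put({"member_id": member_id, "text": text})
    return {"status": "accepted"}


def feed_worker():
    """Background path: do the expensive work later, outside the request."""
    while True:
        event = events.get()
        if event is None:  # sentinel value: stop the worker
            events.task_done()
            break
        feed_index.setdefault(event["member_id"], []).append(event["text"])
        events.task_done()


def run_demo():
    """Post one update, drain the queue, and return the response and the index."""
    feed_index.clear()
    worker = threading.Thread(target=feed_worker)
    worker.start()
    response = handle_post_update(42, "hello")
    events.put(None)   # ask the worker to stop once earlier events are processed
    events.join()      # wait until every queued item has been marked done
    worker.join()
    return response, dict(feed_index)
```

The caller gets its response before the feed is updated, so the slow step can be batched, retried, or moved offline without affecting request latency.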
Scaling the code base: How do you scale as you grow the size of the Engineering team? Started with a monolithic application and decomposed it into services. Originally, the decomposition was not very disciplined.
- Services (may) scale development. Bad services are worse than no services. They had 300-400 services without appropriate tools to diagnose and debug them, and dependencies were not well understood. Treat the service as a product, with a team that owns the service and provides appropriate documentation and artifacts. Service layer evolution: moved from Spring-RPC to REST + JSON.
- The service contract is binary: Instead of just exposing existing code as a service, develop the service contract first and then write the appropriate code.
- Isolation vs Utilization: Develop services with the right granularity. How many services should you have? He recommends having a team of 5 develop each service as a rule-of-thumb to ensure that services have the right granularity. He recommends against a model where a single engineer is responsible for several services.
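One way to picture the contract-first idea is sketched below: the request and response types and the service interface are written down first, and an implementation is added only afterwards. This is an illustrative sketch under assumptions, not the approach described in the talk; `ProfileService`, `ProfileRequest`, `ProfileResponse`, `InMemoryProfileService`, and `handle_json` are all hypothetical names, and the JSON handling only gestures at the REST + JSON style mentioned above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ProfileRequest:
    """Hypothetical request type, part of the contract."""
    member_id: int


@dataclass(frozen=True)
class ProfileResponse:
    """Hypothetical response type, part of the contract."""
    member_id: int
    display_name: str


class ProfileService(Protocol):
    """The contract: written and agreed on before any implementation exists."""

    def get_profile(self, request: ProfileRequest) -> ProfileResponse: ...


class InMemoryProfileService:
    """One implementation, written after the contract was fixed."""

    def __init__(self, names: Dict[int, str]) -> None:
        self._names = names

    def get_profile(self, request: ProfileRequest) -> ProfileResponse:
        return ProfileResponse(request.member_id, self._names[request.member_id])


def handle_json(service: ProfileService, body: str) -> str:
    """Decode a JSON request, call the service, and encode the JSON response."""
    request = ProfileRequest(**json.loads(body))
    return json.dumps(asdict(service.get_profile(request)))
```

Because callers depend only on `ProfileService` and the two message types, the implementation behind the contract can be replaced without changing them.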
Scaling software engineering: Cutting up the codebase.
- Build your process: They moved from wiki pages that track the lifecycle to tools and services that manage it, from review to check-in to build to rollout.
- Governance: Need a balance between central planning (communist) and a free-for-all where every team does its own thing (capitalist).
- Treat code as property. Someone is responsible for the code: its functionality and hygiene. That eliminated a lot of bureaucracy.
- But you need effective government. Don’t duplicate integration layers. Standardize where you can, for example monitoring.
Great stuff! I always enjoy learning about best practices and lessons from other companies. It's a great way to compare notes and also accelerate your own Engineering practices and technology.