DOI: 10.1145/2785956.2787508
research-article
Open Access

Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network

Published: 17 August 2015

ABSTRACT

We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale networks. Second, much of the general, but complex, decentralized network routing and management protocols supporting arbitrary deployment scenarios were overkill for single-operator, pre-planned datacenter networks. We built a centralized control mechanism based on a global configuration pushed to all datacenter switches. Third, modular hardware design coupled with simple, robust software allowed our design to also support inter-cluster and wide-area networks. Our datacenter networks run at dozens of sites across the planet, scaling in capacity by 100x over ten years to more than 1Pbps of bisection bandwidth.
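
As a rough illustration of the first theme and of the closing scaling claim, the sketch below sizes a generic three-tier k-ary fat-tree (a folded Clos built from identical k-port switches) and converts the 100x-over-ten-years figure into an implied yearly growth factor. The fat-tree formulas are the standard textbook ones, not the topology of any specific network generation described in the paper, and the function names `fat_tree_capacity` and `annual_growth_factor` are hypothetical.

```python
def fat_tree_capacity(k: int) -> dict:
    """Component counts for a generic 3-tier k-ary fat-tree (folded Clos).

    Assumption: every switch has k ports and the fabric is fully
    provisioned. This is an illustrative model, not the paper's design.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k^3 / 4 hosts at full bisection
    }


def annual_growth_factor(total_factor: float, years: int) -> float:
    """Constant yearly multiplier that compounds to total_factor over years."""
    if years <= 0 or total_factor <= 0:
        raise ValueError("years and total_factor must be positive")
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    # 48-port commodity switches in a full 3-tier fat-tree.
    print(fat_tree_capacity(48))
    # 100x capacity growth over ten years, as stated in the abstract.
    print(round(annual_growth_factor(100, 10), 3))
```

Under these assumptions, host count grows with the cube of the switch port count, which is why small commodity switches can be composed into a building-scale fabric; the 100x figure corresponds to capacity growing by roughly 1.58x per year if growth were uniform.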

Supplemental Material

p183-singh.webm (webm, 151 MB)

    Published in

      SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
      August 2015
      684 pages
      ISBN: 9781450335423
      DOI: 10.1145/2785956

      Copyright © 2015 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SIGCOMM '15 Paper Acceptance Rate: 40 of 242 submissions, 17%
      Overall Acceptance Rate: 554 of 3,547 submissions, 16%
