Project andromeda google

9/13/2023

Here at Google Cloud, we’ve always aimed to provide great network bandwidth for Compute Engine VMs, thanks in large part to our custom Jupiter network fabric and Andromeda virtual network stack. During Google Cloud Next ’19, we improved that bandwidth even further by doubling the maximum network egress data rate to 32 Gbps for common VM types. We also announced VMs with up to 100 Gbps bandwidth on NVIDIA V100 and T4 GPU accelerator platforms, all without raising prices or requiring you to use premium VMs.

Specifically, for any VM on Intel Skylake or a newer CPU platform with at least 16 vCPUs, we raised the egress bandwidth cap to 32 Gbps for same-zone VM-to-VM traffic; this capability is now generally available. This includes n1-ultramem VMs, which provide more compute resources and memory than any other Compute Engine VM instance type. No additional configuration is needed to get that 32 Gbps throughput.

Meanwhile, 100 Gbps Accelerator VMs are in alpha, soon in beta. Any VM with eight NVIDIA V100 or four T4 GPUs attached will have its bandwidth cap raised to 100 Gbps.

These high-throughput VMs are ideal for running compute-intensive workloads that also need a lot of networking bandwidth. Some key applications and workloads that can leverage these high-throughput VMs are:

- High-performance computing applications, batch processing, scientific modeling
- Virtual network appliances (firewalls, load balancers)

In addition, services built on top of Compute Engine, like CloudSQL, Cloud Filestore and some partner solutions, can already leverage 32 Gbps throughput.

One use case that is particularly network- and compute-intensive is distributed machine learning (ML). To train on large datasets or to train large models, ML workloads use a distributed ML framework, e.g., TensorFlow. The dataset is divided among separate workers, each of which trains on its share and exchanges model parameters with the others. These ML jobs consume substantial network bandwidth due to large model sizes and frequent data exchanges among workers. Likewise, the compute instances that run the worker nodes create high throughput requirements for the VMs and for the fabric serving them. One customer, a large chip manufacturer, leverages 100 Gbps GPU-based VMs to run these massively parallel ML jobs, while another customer uses our 100 Gbps GPU machines to test a massively parallel seismic analysis application.

Making it all possible: Jupiter and Andromeda

Our highly scalable Jupiter network fabric and high-performance, flexible Andromeda virtual network stack are the same technologies that power Google’s internal infrastructure and services.

Jupiter provides Google with tremendous bandwidth and scale. For example, Jupiter fabrics can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, that is enough capacity for 100,000 servers to exchange information at a rate of 10 Gbps each, or enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.

Andromeda, meanwhile, is a Software Defined Networking (SDN) substrate for our network virtualization platform, acting as the orchestration point for provisioning, configuring, and managing virtual networks and in-network packet processing. Andromeda lets us share Jupiter networks for many different uses, including Compute Engine and bandwidth-intensive products like Cloud BigQuery and Cloud Bigtable. Since we last blogged about Andromeda, we launched Andromeda 2.2.
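The Jupiter capacity comparison in this post is straightforward arithmetic: 100,000 servers × 10 Gbps = 1,000,000 Gbps = 1 Petabit/sec. The short Python sketch below checks that figure. Using the same simplified framing, it then shows how many hosts the quoted 1 Petabit/sec lower bound would cover at the 32 Gbps and 100 Gbps per-VM caps described here. The function and constant names are hypothetical. This back-of-the-envelope view deliberately ignores real-world factors such as traffic patterns, oversubscription, and how VMs are packed onto physical hosts, so treat it as an illustration, not a capacity statement.

```python
# Back-of-the-envelope check of the Jupiter bisection-bandwidth comparison.
# Function and constant names are hypothetical, chosen for this illustration.

GBPS = 10**9   # bits per second in one gigabit per second
PBPS = 10**15  # bits per second in one petabit per second

# The post quotes "more than 1 Petabit/sec"; use 1 Pb/s as a lower bound.
JUPITER_BISECTION_BPS = 1 * PBPS


def aggregate_bps(num_hosts: int, per_host_gbps: int) -> int:
    """Total bits/sec if every host sends at the given rate simultaneously."""
    return num_hosts * per_host_gbps * GBPS


def hosts_at_rate(total_bps: int, per_host_gbps: int) -> int:
    """How many hosts a total capacity covers at a fixed per-host rate."""
    return total_bps // (per_host_gbps * GBPS)


if __name__ == "__main__":
    # 100,000 servers at 10 Gbps each adds up to 1 Pb/s.
    print(f"{aggregate_bps(100_000, 10) / PBPS:.1f} Pb/s")
    # Same simplified framing, applied to the per-VM caps mentioned in the post.
    for rate in (10, 32, 100):
        hosts = hosts_at_rate(JUPITER_BISECTION_BPS, rate)
        print(f"{rate:>3} Gbps per host -> {hosts:,} hosts")
```

Running it prints 1.0 Pb/s, followed by 100,000, 31,250 and 10,000 hosts for 10, 32 and 100 Gbps respectively. Integer arithmetic is used throughout so the bit-rate products stay exact rather than picking up floating-point rounding.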
