Sandeep: Hi. My name is Sandeep, a developer advocate on the Google Cloud Platform. Welcome to the Google Data Center at The Dalles, Oregon. Take a look around. Before we go inside, we need to make sure that we have the appropriate security clearance. Most Google employees can’t even get in here. So let’s go on a special behind-the-scenes tour. I’m here with Noah from the Site Reliability Engineering Team. Noah, can you tell us a little bit more about the SRE role at Google?
Noah: Yeah, SREs write and maintain the software systems designed to keep our services running.
Sandeep: So what happens if one of these systems goes down?
Noah: We’ve designed our systems from the ground up to be able to handle any unexpected failures that might occur.
We have highly redundant power, networking, and serving domains so that even if we do lose an entire cluster, we’re able to redirect those workloads and live migrate data in order to minimize any impact. In addition, we have a team of SREs on call 24/7 that can tackle any problems that might arise.
Sandeep: Thanks, Noah. Now that we’ve learned more about the systems that manage our fleet at Google, let’s take a deeper look at the data center infrastructure itself. Before we can continue further, we need to go through the biometric iris scan and circle lock. These only allow one person in at a time and require dual authentication to continue further. I’ll see you on the other side.
Computer voice: Please situate your eyes to begin the procedure. Please come a little closer to the camera.
Sandeep: Welcome to the data center floor. As you can tell, we have a lot of servers, and this is a single cluster on a single floor in a single building.
Managing all of these servers on a global scale is quite a challenge. To utilize our fleet, we use tools such as Borg, Colossus, and Spanner. You may be familiar with similar tools, such as Kubernetes, Google Cloud Storage, and BigQuery. These tools allow Google engineers and Cloud customers to more easily manage infrastructure, allowing everyone to build innovative and scalable applications. Here at Google, a lot of our infrastructure is custom-made. This gives us the flexibility and performance we need to run all of our services at scale. Oh, hey, it’s Virginia, one of our network engineers.
Virginia: Hey, Sandeep.
Sandeep: Virginia, what are you working on today?
Virginia: Today I’m working with Hardware Ops to expand this data center network to deploy additional machines in this building. Our fleet is constantly growing to support new capacity for Google products and our Cloud customers.
Sandeep: That sounds like a lot of work, to be constantly adding capacity around the globe.
Virginia: Well, we designed our network so that this kind of capacity growth isn’t very hard. Jupiter, our current data center network technology, is a hierarchical design using software-defined networking principles. So just like with our servers, we abstracted away the specific details of our network and can manage them like they’re software programs and data.
Sandeep: Abstracting seems to be a common theme here at Google. I’ve also noticed there’s a lot of fiber running in our data centers.
Virginia: That’s right.
A single building can support 75,000 machines, and carry over one petabit per second of bandwidth, which is actually more than on the entire Internet.
Sandeep: Wow.
Virginia: This allows us to reliably access storage and compute resources with low latency and high throughput.
Sandeep: So how is this data center connected to all our other data centers around the globe?
Virginia: Google runs B4, our own private, highly efficient backbone network, which is actually growing faster than our Internet-facing network.
It connects all our data centers together and allows services to efficiently access resources in any location.
Sandeep: Nice. I finally know what all this Google fiber is really used for. Thanks, Virginia.
Virginia: No problem.
Sandeep: So now you’ve seen all the compute and networking horsepower required to run your workloads in the Cloud, let’s take a look at where your data is safely and securely stored. Let’s go. Whether you’re querying terabytes of data on BigQuery or storing petabytes in Google Cloud Storage, all of your data needs to be stored on a physical device. Our data center infrastructure allows us to access our storage quickly and securely. At our scale, we need to handle hard drive and SSD failure on a daily basis.
While your data is replicated and safe, we need to destroy or recycle used hard drives so no one can access your data. From the time a disk is removed from the server to the time it’s decommissioned, we maintain a very strict chain of custody. The disks are completely wiped and then destroyed in a huge shredder. Let’s go shred some hard drives. We’ve looked at a lot of the hardware that runs in our data centers, but it doesn’t end there. We need to cool and power our infrastructure in an environmentally sustainable and reliable way. Let’s take a look at how we cool our servers. Welcome to the mechanical equipment room. Looks pretty cool, doesn’t it? Oh, hey, it’s Brian, one of our data center facilities technicians!
Brian: Hey, Sandeep.
Sandeep: Hey, Brian. Brian, can you tell us a little bit more about this room?
Brian: Sure. This is a cooling plant for one of the data centers that we have on site. So a lot of heat is generated on the server floor, and it all has to be removed, and that starts right here in the cooling plant.
So it’s basically two loops. We have the condenser water loop and we have the process water loop. The process water loop are these blue and red pipes over here. So they take the heat off the server floor and they transfer it to these heat exchangers here. The condenser water loop are these green and yellow pipes here. They take the cold water from the basin underneath us, they transfer it to these heat exchangers here, and they send it up to the cooling towers up on the roof.
Sandeep: I notice our pipes are Google colors. It’s pretty cool. So how efficient is our data center?
Brian: Well, Google has some of the most efficient data centers in the world. In fact, when we started reporting our power usage effectiveness, or PUE, in 2008, most data centers were around 100% overhead. At that point in time, Google was 20% overhead, but since then, we’ve reduced it to just 12%, and that even includes our cafeterias.
Sandeep: Whoa! That is so low! Also, what’s this big green machine for?
Brian: Oh, well, this is a chiller. We very rarely use them, but it helps keep the process water temperature in the desired temperature range when it gets really hot outside, basically helping the cooling tower do its job, and some of our newer data centers, they have no chillers at all.
Sandeep: I love how our new data centers are even more efficient. By the way, can we go up and take a look at a cooling tower?
Brian: Sure. Let’s go.
Sandeep: Wow, what a view up here!
Brian: So, Sandeep, this is a cooling tower.
It uses evaporation to rapidly cool the water from the condenser loop and sends it back down to the basin. You could say we’re making actual clouds with the Cloud.
Sandeep: Clouds making actual clouds. Welcome to Google! So, Brian, how do we power the Cloud?
Brian: Well, that all starts at Google’s power substation. Let’s go take a look. So this is the Google-owned power substation. This is where the high voltage power enters the site. It’s reduced and then sent to multiple power distribution centers such as this one right here.
Sandeep: What happens if a power distribution center loses power?
Brian: If it loses power, we have multiple generator and utility backup sources available to maintain power to those servers.
Sandeep: And where does all the power come from?
Brian: It actually comes from multiple hydroelectric power plants that are nearby.
Sandeep: I love how Google uses reliable green energy whenever possible.
Brian: We are 100% carbon neutral, actually.
Sandeep: That’s pretty cool. You know, it seems like Google builds reliability from the ground up, from the power and cooling all the way to the software systems that manage our fleet.
Thanks for showing me around, Brian.
Brian: No problem. Have a great day.
Sandeep: Thank you for joining me on this special behind-the-scenes tour. Please check out cloud.google.com to learn how you can build what’s next.
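A note on the efficiency figures Brian quotes: power usage effectiveness (PUE) is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so an "overhead" percentage corresponds to a PUE of 1 plus the overhead fraction. The sketch below only illustrates that arithmetic for the three figures mentioned in the tour; the function names and the 1,000 kWh example load are assumptions made for illustration, not anything stated by the speakers.

```python
def pue_from_overhead(overhead_fraction: float) -> float:
    """Convert overhead (non-IT energy as a fraction of IT energy) to PUE."""
    if overhead_fraction < 0:
        raise ValueError("overhead cannot be negative")
    return 1.0 + overhead_fraction


def facility_energy_kwh(it_energy_kwh: float, overhead_fraction: float) -> float:
    """Total facility energy needed to deliver a given amount of IT energy."""
    return it_energy_kwh * pue_from_overhead(overhead_fraction)


# Overhead figures quoted in the tour.
quoted_overheads = {
    "typical data center, 2008": 1.00,
    "Google, 2008": 0.20,
    "Google, at time of tour": 0.12,
}

# Hypothetical IT load, chosen only to make the comparison concrete.
EXAMPLE_IT_ENERGY_KWH = 1000.0

for label, overhead in quoted_overheads.items():
    pue = pue_from_overhead(overhead)
    total = facility_energy_kwh(EXAMPLE_IT_ENERGY_KWH, overhead)
    print(f"{label}: PUE {pue:.2f}, {total:.0f} kWh total for 1000 kWh of IT load")
```

Read this way, 100% overhead is a PUE of 2.00, 20% is 1.20, and 12% is 1.12.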