Google's tips for reducing AI training emissions
David Patterson wrote a blog post for Google’s The Keyword blog on how the company is minimizing AI’s carbon footprint, mostly covering his new paper on the topic: Carbon Emissions and Large Neural Network Training (Patterson et al. 2021).
The paper went live on arXiv just half a week ago, but coming in at 22 data-dense pages, I think it’ll become a key piece of literature for sustainable AI.
My two main takeaways from the paper were: (1) retroactively estimating AI training emissions is difficult, so researchers should measure it during model development; and (2) where, when and on what hardware models are trained can make an enormous difference in emissions.
Emissions estimates
Patterson et al. calculate the carbon footprint of several recent gargantuan models (T5, Meena, GPT-3, etc.) more precisely than previous work, which they found to be off by up to two orders of magnitude in some cases: the previous estimate for The Evolved Transformer‘s Neural Architecture Search (NAS), for example, was 88 times too high (see Appendix D).
This shows that, without knowing the exact datacenter, hardware, search algorithm choices, etc., it’s pretty much impossible to accurately estimate how much CO2 was emitted while training a model.
Because of this, one of the authors’ recommendations is for the machine learning community to include CO2 emissions estimates as a standard metric in papers: a measurement by the people training the models, who have much better access to all relevant information (see e.g. Table 4 in the paper and the Google Cloud page on their different datacenters’ carbon intensities), will always be more accurate than a retroactive estimate by another researcher.
If conferences and journals start requiring emissions metrics in paper submissions and include them in acceptance criteria, it’ll encourage individual researchers and AI labs to take steps to reduce their emissions.
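As a rough sketch of what such an in-paper estimate boils down to: measured energy use, times the datacenter’s PUE, times the local grid’s carbon intensity. All the numbers below are made-up placeholders, not values from the paper.

```python
# Back-of-the-envelope training-emissions estimate: energy use x PUE x grid
# carbon intensity. All numbers below are hypothetical placeholders.

accelerator_hours = 10_000       # total accelerator-hours for the training run
avg_power_kw = 0.3               # measured average power draw per accelerator (kW)
pue = 1.1                        # datacenter Power Usage Effectiveness
grid_kg_co2e_per_kwh = 0.08      # carbon intensity of the local grid

energy_kwh = accelerator_hours * avg_power_kw * pue
tons_co2e = energy_kwh * grid_kg_co2e_per_kwh / 1000  # kg -> metric tons

print(f"{energy_kwh:,.0f} kWh -> {tons_co2e:.1f} tCO2e")
```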
(As an aside, this is an interesting comparison that makes “tons of CO2-equivalent greenhouse gas emissions” a bit easier to think about: a whole passenger jet round trip flight between San Francisco and New York emits about 180 tons of CO2e; relative to that, “T5 training emissions are ~26%, Meena is 53%, Gshard-600B is ~2%, Switch Transformer is 32%, and GPT-3 is ~305% of such a round trip.” Puts it all in perspective quite well.)
Emissions reductions
Patterson et al. also have some specific recommendations for reducing the CO2 emissions caused by training AI models:
- Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters.
- Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization.
- Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems.
Adding all these up, “remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X.” Two to three orders of magnitude! Since this research happened inside Google, its teams are already optimizing where and when large models are trained to take advantage of these ideas.
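For the curious, here’s roughly how the individual factors from the list above compound to that range, assuming they multiply independently (which is how I read the paper’s framing):

```python
# Compounding the paper's reduction factors (low and high ends of each range).
factors_low  = [10, 5, 1.4, 2]    # sparse DNN, location, datacenter, accelerator
factors_high = [10, 10, 2, 5]

low = high = 1
for f_lo, f_hi in zip(factors_low, factors_high):
    low *= f_lo
    high *= f_hi

print(f"~{low:.0f}x to ~{high:.0f}x")   # ~140x to ~1000x
```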
Another cool aspect of the paper is that each of the four specific focus points for reducing emissions (improvements in algorithms, processors, datacenters, and the grid’s energy mix) is accompanied by a business rationale for implementing it as a cloud provider — I’m guessing the researchers also used some of these arguments to push for change internally at Google.
(Maybe, as a next step, they can also look into ramping model training based on signals from the intraday electricity market?)
It’s great to see a paper on AI sustainability with so much measured data and actionable advice.
I haven’t seen it going around much yet on Twitter, but I hope it’s read widely — here’s the PDF link again; the fallacy debunks in section 4.5 (page 12) are an interesting bit I haven’t summarized above, so give it a click!
I also hope that the paper’s recommendations are implemented: even just the relatively low-effort change of shifting our training workloads to different datacenters can already make a big difference.
And of course it’ll be interesting to see if there are any specific critiques of the paper’s emissions measurement methodology, since this is all still just a preprint and half the paper’s authors work at Google.
The climate opportunity of gargantuan AI models
Climate change and the energy transition
Climate change is our generation’s biggest challenge, and the transitions needed to reduce emissions and prevent it from becoming catastrophic will affect almost every part of society in the coming decades.
On their excellent Our World in Data page on CO2 and Greenhouse Gas Emissions, Hannah Ritchie and Max Roser write:
To make progress in reducing greenhouse gas emissions, there are two fundamental areas we need to focus on: energy (this encapsulates electricity, heat, transport, and industrial activities) and food and agriculture (which includes agriculture and land use change, since agriculture dominates global land use).
The biggest of these is energy: it’s responsible for almost three quarters of global greenhouse gas emissions.
Decarbonizing energy involves two parallel transitions: (1) electrifying sectors powered by fossil fuels, and (2) shifting our electricity generation to low-emissions sources like solar, wind, hydro and nuclear.
Take road transport for example: cars need to be powered by electricity, and that electricity needs to be green.
Replacing all internal combustion engine cars with electric ones will take at least two decades, as will replacing all gas- and coal-fired electricity plants with low-carbon power generation — so if we want to be climate-neutral by 2050, neither transition can wait for the other.
Road transport is responsible for about 12% of overall emissions, but the same dual transition applies to other energy-intensive sectors like iron and steel production (7%) or lighting and heating in buildings (17.5%).
That’s the energy transition in a nutshell: we need to move energy demand towards electricity, and the electricity supply toward low-carbon sources.
But there’s one less-discussed thing linking these two: how do we move this low-carbon electricity from the supply to the demand?
That’s where electrical grids — the focus of this post — come in.
A quick primer on electrical grids
Let’s start with a short, hopefully not too technical, primer on electrical grids — they play a huge role in all our lives, but I personally didn’t really know how they worked until I started working at a renewables optimization software company in January.
On the most basic level, grids are very large systems — all of Europe is a single grid, and North America is divided into an eastern and a western grid (plus the smaller Texas and Quebec grids) — consisting of power lines at different voltages (high for long-distance transmission, low for local distribution), electrical substations which step voltage up or down, and electricity producers and consumers.
As opposed to the direct current (DC) in, for example, a battery-LED system — where electrons flow from one pole of the battery through the LED to the other pole — electricity in grids is in the form of alternating current (AC), where electrons oscillate back and forth on the power line tens of times a second: at 50 Hz in Europe and 60 Hz in North America.
One of the main jobs of a grid operator is to ensure that this frequency remains constant, because lots of stuff breaks if it is too far from nominal, which can cause grid-wide blackouts in the worst case.
An oversupply of electricity (more generation than consumption) causes the frequency to increase, while an undersupply causes it to decrease; so the operator has to make sure that generation and consumption are equal at all times.
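If it helps to see the mechanism, here’s a toy model of that relationship (purely illustrative: real grids have inertia, droop control, reserves and much more, and the sensitivity number is made up):

```python
# Toy model: grid frequency drifts with the generation/consumption imbalance.
# Purely illustrative; real grids have inertia, droop control, reserves, etc.

nominal_hz = 50.0
freq = nominal_hz
sensitivity = 0.01   # made-up: Hz of drift per GW of imbalance per time step

for generation_gw, consumption_gw in [(100, 100), (102, 100), (98, 100)]:
    imbalance = generation_gw - consumption_gw          # >0: oversupply
    freq += sensitivity * imbalance
    print(f"imbalance {imbalance:+d} GW -> frequency {freq:.2f} Hz")
```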
Electricity markets and renewable generation
Grid operators keep electricity generation and consumption equal by creating time-based markets for electricity.
For this discussion, the day-ahead and intraday markets are the most important.
On the day-ahead market, producers and consumers place bids to sell and buy the amount of electricity they want to produce or consume during each hour of the following day.
Once all the bids are in, the operator settles them optimally so that, for each hour, the amount of electricity sold matches the amount bought.
Problem solved, right?
Sadly, as anyone who has ever been outside knows, the weather (and other factors affecting generation and consumption) can’t be perfectly predicted down to the hour a whole day ahead.
It could happen, for example, that the afternoon is less sunny than expected, which means that a solar farm will produce less electricity than it sold the previous day.
This is where the intraday market — operating on 15-minute intervals instead of hour-long ones — comes in.
In this scenario of expected underproduction, the solar farm can go to the intraday market and bid to buy the shortfall (the difference between what it sold on the day-ahead market and what it’ll actually produce) from someone who is willing to either consume less power than they bought or produce more power than they already sold.
In practice, it’s usually the latter: someone will jump in and produce the extra electricity.
This is big business for coal and gas plants, because they can ramp their production up (or down if the scenario is reversed) on-demand, and very quickly.
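To make the mechanics concrete, here’s a tiny sketch of the solar farm’s position in that scenario, with made-up volumes and prices: per 15-minute interval, the shortfall between what it sold day-ahead and what it now expects to produce is what it has to buy back on the intraday market.

```python
# Hypothetical afternoon for a solar farm: day-ahead commitments vs. updated
# forecast, per 15-minute interval (MWh). All numbers are made up.

sold_day_ahead = [12.0, 12.0, 12.0, 12.0]   # what was sold for 13:00-14:00
updated_forecast = [10.5, 9.0, 8.5, 11.0]   # what it now expects to produce
intraday_price_eur_mwh = [55, 70, 80, 60]   # current intraday prices

total_cost = 0.0
for sold, forecast, price in zip(sold_day_ahead, updated_forecast, intraday_price_eur_mwh):
    shortfall = max(sold - forecast, 0.0)   # MWh it must buy back
    total_cost += shortfall * price
    print(f"buy back {shortfall:.1f} MWh at €{price}/MWh")

print(f"total intraday cost: €{total_cost:.0f}")
```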
As a larger percentage of electricity on the grid is generated using weather-dependent renewables, this intraday market becomes more valuable — and coal and gas-burning plants can be operated profitably for longer, even as learning effects make wind and solar power cheaper and CO2 emission prices rise.
Beyond fossil fuel-burning power plants ramping their generation up and down to meet consumption, another obvious supplier of flexibility is large batteries.
These can be paid to charge when there is an oversupply, and paid again to discharge when there is an undersupply.
Another plausible demand-side response comes from climate-controlled (food) distribution centers that need to run their cooling units a number of hours a day, but can be a bit flexible about exactly when those hours are.
These are both useful, but they’re not happening at scale (yet).
So it’d be great for the planet if these coal and gas plants had some more competition on the intraday electricity balancing market.
(Any imbalance that is not solved on the day-ahead and intraday markets is handled by the grid operator’s balancing reserves; I won’t go into the details of these FCRs and FRRs here.)
Datacenters and flexible AI training for demand-side response
This is — finally — where datacenters and AI models come in.
Here in The Netherlands, there has been some controversy in recent months about how many datacenters are being built (I bike by this imposing-looking one in Amsterdam several times a week) and how much energy they use.
But given the above, I actually think datacenters have the potential to play a positive role in the intraday electricity market.
Although many tasks of a datacenter, like serving websites, facilitating video calls, or powering Netflix streams, can’t really be shifted around in time at will, AI-related tasks often can be — both in research and production.
In a research setting, gargantuan AI models like DeepMind’s AlphaFold 2 can often take several days or weeks to train on dozens, hundreds or thousands of powerful machines.
And labs like OpenAI already use highly-customized versions of tools like Kubernetes to orchestrate these machines.
It’s not a stretch to imagine that these tools can be extended to ramp training up or down (in terms of the number of active machines, for example), along with the intraday electricity market.
(In fact, I tried building a little tool similar to this myself last year!)
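As a sketch of what such an extension could look like (not how OpenAI or anyone else actually does this): poll an intraday price or carbon-intensity signal and scale the number of active training workers accordingly. The price feed, thresholds, and scale_training_workers call below are all hypothetical placeholders; real orchestration, Kubernetes or otherwise, would be far more involved.

```python
import time

# Hypothetical thresholds and worker counts; tune to the actual workload.
CHEAP_PRICE_EUR_MWH = 30
EXPENSIVE_PRICE_EUR_MWH = 80
MIN_WORKERS, MAX_WORKERS = 8, 256


def get_intraday_price_eur_mwh() -> float:
    """Placeholder for a real intraday price (or carbon intensity) feed."""
    return 45.0


def scale_training_workers(n: int) -> None:
    """Placeholder for the cluster orchestrator call (e.g. resizing a job)."""
    print(f"scaling training job to {n} workers")


def control_loop(poll_seconds: int = 900) -> None:
    """Every 15 minutes, ramp the training job up or down with the price."""
    workers = MIN_WORKERS
    while True:
        price = get_intraday_price_eur_mwh()
        if price <= CHEAP_PRICE_EUR_MWH:
            workers = MAX_WORKERS       # electricity is cheap/clean: ramp up
        elif price >= EXPENSIVE_PRICE_EUR_MWH:
            workers = MIN_WORKERS       # expensive/dirty: ramp down
        scale_training_workers(workers)
        time.sleep(poll_seconds)
```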
In production settings, machine learning models are often retrained periodically, once for a whole service or even many times for individual (groups of) users.
This doesn’t happen exactly when the user queries or interacts with the model, but rather in an “offline” way: training happens on some schedule, and the model is saved to be retrieved for inference whenever the user wants to query it — so there’s potential for flexibility there.
Even inference can happen offline: things like tagging photo libraries with the objects present in the photos are not too time-sensitive, and can probably happen flexibly within some period after the photos are uploaded without impacting user experience too much.
It’s also not too crazy to imagine syncing this up to the electricity market.
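Here’s a minimal sketch of what that flexibility could look like in practice, with a made-up carbon-intensity forecast: given a deadline a few hours out, run the batch job in the cleanest hour within the window.

```python
# Pick the lowest-carbon hour within a deadline to run an offline batch job
# (e.g. tagging newly uploaded photos). Forecast values are made up.

carbon_forecast = {      # hour of day -> forecasted gCO2e per kWh
    22: 320, 23: 290, 0: 210, 1: 180, 2: 175, 3: 190,
}
deadline_hours = [22, 23, 0, 1, 2]   # job must run in one of these hours

best_hour = min(deadline_hours, key=lambda h: carbon_forecast[h])
print(f"schedule batch inference at {best_hour:02d}:00 "
      f"({carbon_forecast[best_hour]} gCO2e/kWh)")
```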
Luckily, I’m not the first person to come up with this idea — see the Boden Tech datacenter in Sweden and Google’s partnership with Electricity Map, for example — but I do think that it’s under-appreciated, and often missed in discussions about Green AI and the climate risks of large AI models.
Perhaps counterintuitively, since these big models can often be scheduled to be trained at any time, the more power they use, the more flexibility they can offer to the grid — and the more they can out-compete fossil fuel plants on the intraday electricity market!
I think we have a better shot at getting big tech companies and AI labs to implement ideas like this at scale, than we do at getting them to stop training big AI models.
So instead of looking at gargantuan AI (language) models only as a climate problem, let’s give some more attention to their potential as a climate solution.
Al Gore launches Climate TRACE
Former vice president Al Gore and Gavin McCormick of WattTime launched Climate TRACE, a project for Tracking Real-time Atmospheric Carbon Emissions.
From the coalition’s launch post:
Our first-of-its-kind global coalition will leverage advanced AI, satellite image processing, machine learning, and land- and sea-based sensors to do what was previously thought to be nearly impossible: monitor GHG emissions from every sector and in every part of the world.
Our work will be extremely granular in focus — down to specific power plants, ships, factories, and more.
Our goal is to actively track and verify all significant human-caused GHG emissions worldwide with unprecedented levels of detail and speed.
Extracting information from satellite imagery is shaping up to be the killer app for climate change AI: we’ve previously seen it used for predicting electrical grid resilience (see DT #14), locating solar panels (#29), tracking deforestation (#25, #28, #39), and classifying farming land use (#41).
At the NeurIPS 2019 panel on AI for climate change research (#30), former head of Google Brain Andrew Ng also mentioned that the ability to train models on small satellite datasets is one of the machine learning advances he was most excited about for climate projects.
All this is to say: I’m extremely excited to see such a broad coalition—its founding members include “Blue Sky Analytics, CarbonPlan, Carbon Tracker, Earthrise Alliance, Hudson Carbon, Hypervine, OceanMind, and Rocky Mountain Institute”—launch as an independent observer of greenhouse gas emissions.
Their goals are certainly ambitious:
Through Climate TRACE, we will equip business leaders and investors, NGOs and climate activists, as well as international, domestic, and local policy leaders with an essential tool to fully realize the economic and societal benefits of a clean energy future, while ensuring that no one — corporation, country, or otherwise — will ever again have the ability to hide or fake their emissions data.
Next year, every country in the world will gather in Glasgow, Scotland, to enhance their commitments to the Paris Agreement and raise collective ambition in line with what the world’s scientists tell us is necessary.
We at the Climate TRACE coalition hope to support these COP26 climate talks with the most thorough and reliable data on emissions the world has ever seen.
The rest of the launch post goes a bit into how their GHG emissions observation will work, but beyond mentioning that they’ll do sensor fusion on visible + infrared imagery and satellite + radar measurements, Gore and McCormick don’t go into much technical detail yet.
They mention that this will follow in future posts, which I’ll be sure to link to here when they come out.
For now, they’ve set up a number of online profiles to follow for updates (website, Twitter, GitHub, LinkedIn); David Roberts at Vox also wrote a nice feature about how the coalition came to be.
Greenpeace report: oil in the cloud
Overview of Greenpeace’s findings in their Oil in the Cloud report.
Greenpeace released their Oil in the Cloud report.
Focusing on Google’s GCP, Amazon’s AWS, and Microsoft’s Azure, the report covers in what ways these cloud companies are working with oil and gas companies.
We’ve already heard a lot about this: it’s been highlighted in a viral Vox video, on the CCAI forums, and in the Tech Won’t Drill It pledge (see DT #33).
This report adds an exhaustive overview of how cloud services—and sometimes machine learning—are involved in the different phases of oil and gas extraction:
- Upstream: finding and extracting oil and gas, using ML to fill in missing data and manage datasets.
- Midstream: transporting and storing oil and gas, using ML to monitor pipelines (see DT #39) and “optimize pipelines, inventory and workforce.”
- Downstream: refining, marketing and selling oil and gas; this seems more focused on other cloud services, not ML specifically.
Greenpeace found specific examples of contracts that all three companies had in at least one of these phases.
The report also notes that, because of public outrage over the past few months, all three companies have deemphasized their oil and gas products on their marketing websites.
So far, though, it looks like only Google has actually committed to no longer taking on new oil and gas contracts (but still continuing with its existing contracts).
Overall, Amazon and Microsoft, the largest players in western cloud computing at 33% and 18% market share respectively, come out of this report looking pretty bad.
Google, the smallest at 8%, is taking the biggest steps in the right direction.
Google is also the only one of the three that’s already matching its datacenter energy use with renewable power purchases, and doing some very cool work to shift its workloads to happen when electricity grids are cleanest.
If you’re working in ML and training your models in the cloud, encouraging your company or group to switch to GCP—away from AWS and Azure—is probably one of the highest-impact actions you can take for climate change right now.
Radiant Earth Crop Detection in Africa challenge
“Sample fields (color coded with their crop class) overlayed on Google basemap from Western Kenya.” (Radiant Earth)
The Radiant Earth Foundation announced the winners of their Crop Detection in Africa challenge.
The competition was hosted on Zindi, a platform that connects African data scientists to organizations with “the world’s most pressing challenges”—similar to Kaggle.
Detecting crops from satellite imagery comes with extra challenges in Africa due to limited training data and the small size of farms.
A total of 440 data scientists across the world participated in building a machine learning model for classifying crop types in farms across Western Kenya using training data hosted on Radiant MLHub.
The training data contained crop types for a total of more than 4,000 fields (3,286 in the training and 1,402 in the testing datasets).
The dataset contained seven crop classes: 1) Maize, 2) Cassava, 3) Common Bean, 4) Maize & Common Bean (intercropping), 5) Maize & Cassava (intercropping), 6) Maize & Soybean (intercropping), and 7) Cassava & Common Bean (intercropping).
Two major challenges with this dataset were class imbalance and the intercropping classes that are a common pattern in smallholder farms in Africa.
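As an aside on the class imbalance point: a common first step (not necessarily what the winning teams did) is to weight the loss inversely to class frequency. Here’s a minimal sketch with made-up per-class counts for the seven classes above:

```python
import numpy as np

# Made-up per-class field counts for the seven crop classes above.
classes = [
    "Maize", "Cassava", "Common Bean", "Maize & Common Bean",
    "Maize & Cassava", "Maize & Soybean", "Cassava & Common Bean",
]
counts = np.array([1500, 600, 300, 250, 180, 120, 50])

# Inverse-frequency weights, normalized so that they average to 1.
weights = counts.sum() / (len(classes) * counts)
for name, w in zip(classes, weights):
    print(f"{name:25s} weight {w:.2f}")
```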
As climate change will make farming more difficult in many regions across the world, this type of work is vital for protecting food production capacities.
Knowing what is being planted where is an important first step in this process.
Last year I covered the AI Sowing App from India (DT #20), another climate resilience project that helps farmers decide when to plant which crop using weather and climate data; better data on crop types and locations can certainly help initiatives like that as well.
NVIDIA's Inception climate AI startups
For the 50th anniversary of Earth Day, Isha Salian wrote about three startups using deep learning for environmental monitoring, which are all part of NVIDIA’s Inception program for startups.
Here’s what they do.
Orbital Insight maps deforestation to aid Global Forest Watch, similar to the work being done by 20tree.ai (DT #25) and David Dao’s lab at ETH Zurich (DT #28):
The tool can also help companies assess the risk of deforestation in their supply chains.
Commodities like palm oil have driven widespread deforestation in Southeast Asia, leading several producers to pledge to achieve zero net deforestation in their supply chains this year.
3vGeomatics monitors the thawing of permafrost in the Canadian Arctic in a project for the Canadian Space Agency.
Why it matters:
As much as 70 percent of permafrost could melt by 2100, releasing massive amounts of carbon into the atmosphere.
Climate change-induced permafrost thaw also causes landslides and erosion that threaten communities and critical infrastructure.
Azavea monitors construction around oil and gas pipelines, detecting activities that may damage the pipes and cause leaks:
The U.S. oil and gas industry leaks an estimated 13 million metric tons of methane into the atmosphere each year — much of which is preventable.
One of the leading sources is excavation damage caused by third parties, unaware that they’re digging over a natural gas pipeline.
I’m always a bit hesitant to cover ML startups that work with oil and gas companies, but I think in this case their work is a net benefit.
For details about the GPU tech being used by all these projects, see Salian’s full post.