Dynamically Typed

#39: Cloudflare's ML to block bad bots, 3x satellite-based environmental monitoring, and AR Face Doodles

Hey everyone, welcome to Dynamically Typed #39! Today in productized AI, I’m covering Cloudflare’s ML system to block bad bots trying to access their customers’ websites, and I have links to a user study of Google’s AI disease screening tool and an article on sidewalk food delivery robots. For ML research and climate change AI, I also have a whole host of quick links, as well as a write-up of environmental monitoring projects in NVIDIA’s Inception startup program. And finally for cool stuff, I found a website that lets you draw on your face in augmented reality. Let’s dive in!

Productized Artificial Intelligence 🔌

Cloudflare’s overview of good and bad bots.

Web infrastructure company Cloudflare is using machine learning to block “bad bots” from visiting their customers’ websites. Across the internet, malicious bots are used for content scraping, spam posting, credit card stuffing, inventory hoarding, and much more. Bad bots account for an astounding 37% of internet traffic visible to Cloudflare (humans are responsible for 60%).

To block these bots, Cloudflare built a scoring system based on five detection mechanisms: machine learning, a heuristics engine, behavior analysis, verified bots lists, and JavaScript fingerprinting. Based on these mechanisms, the system assigns a score of 0 (probably a bot) to 100 (probably a human) to each request passing through Cloudflare—about 11 million requests per second, that is. These scores are exposed as fields for Firewall Rules, where site admins can use them in conjunction with other properties to decide whether the request should pass through to their web servers or be blocked.
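To make the scoring idea concrete, here is a minimal Python sketch of how a site admin's rule might combine a bot score with other request properties. The `Request` class, the `decide` function, the threshold of 30, and the `/login` path are all illustrative assumptions; this is not Cloudflare's actual Firewall Rules syntax or API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; the field names are made up for this sketch.
    path: str
    bot_score: int      # 0 (probably a bot) to 100 (probably a human)
    verified_bot: bool  # e.g. a known search engine crawler


def decide(request: Request, threshold: int = 30) -> str:
    """Return "allow" or "block" for a request, using an assumed threshold."""
    if request.verified_bot:
        return "allow"  # good bots pass regardless of their score
    if request.path.startswith("/login") and request.bot_score < threshold:
        return "block"  # likely-automated traffic on a sensitive path
    return "allow"
```

The point of exposing the score as a field, rather than blocking outright, is that each site can pick its own threshold and combine it with whatever else it knows about a request.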

Machine learning is responsible for 83% of detections. Because support for categorical features and inference speed were key requirements, Cloudflare went with gradient-boosted decision trees as their model of choice (implemented using CatBoost). The models run at about 50 microseconds per inference, which is fast enough to enable some cool extras. For example, multiple models can run in shadow mode (logging their results but not influencing blocking decisions), so that Cloudflare engineers can evaluate their performance on real-world data before deploying them into the Bot Management System.
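Shadow mode is easy to picture in code. The sketch below is a rough illustration in Python, not Cloudflare's implementation: the `score_request` function, the log format, and the two toy models are assumptions, and plain functions stand in for trained gradient-boosted models.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function mapping request features to a score from 0 to 100.
Model = Callable[[Dict[str, float]], float]


def score_request(
    features: Dict[str, float],
    active: Model,
    shadows: Dict[str, Model],
    shadow_log: List[Tuple[str, float, float]],
) -> float:
    """Score with the active model; run shadow models only for logging."""
    active_score = active(features)
    for name, model in shadows.items():
        # Shadow scores are recorded next to the active score for later
        # comparison, but they never change the value returned to the caller.
        shadow_log.append((name, model(features), active_score))
    return active_score


# Toy stand-ins for trained models (assumptions, not real classifiers).
def current_model(features: Dict[str, float]) -> float:
    return 100.0 - 10.0 * features["requests_per_second"]


def candidate_model(features: Dict[str, float]) -> float:
    return 100.0 - 20.0 * features["requests_per_second"]


log: List[Tuple[str, float, float]] = []
result = score_request(
    {"requests_per_second": 3.0},
    current_model,
    {"candidate": candidate_model},
    log,
)
```

Here only `current_model` determines `result`; the candidate's score ends up in `log`, where it can be compared against the active model before anyone decides to promote it.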

Alex Bocharov wrote about the development of this system for the Cloudflare blog. It’s a great read on adding an AI-powered feature to a larger product offering, with good coverage of all the tradeoffs involved in that process.

Quick productized AI links 🔌

Machine Learning Research 🎛

Quick ML research + resource links 🎛

Artificial Intelligence for the Climate Crisis 🌍

For the 50th anniversary of Earth Day, Isha Salian wrote about three startups using deep learning for environmental monitoring, which are all part of NVIDIA’s Inception program for startups. Here’s what they do.

Orbital Insight maps deforestation to aid the Global Forest Watch, similar to the work being done by 20tree.ai (DT #25) and David Dao’s lab at ETH Zurich (DT #28):

The tool can also help companies assess the risk of deforestation in their supply chains. Commodities like palm oil have driven widespread deforestation in Southeast Asia, leading several producers to pledge to achieve zero net deforestation in their supply chains this year.

3vGeomatics monitors the thawing of permafrost in the Canadian Arctic in a project for the Canadian Space Agency. Why it matters:

As much as 70 percent of permafrost could melt by 2100, releasing massive amounts of carbon into the atmosphere. Climate change-induced permafrost thaw also causes landslides and erosion that threaten communities and critical infrastructure.

Azavea is monitoring construction around oil and gas pipelines to detect activities that may damage the pipes and cause leaks:

The U.S. oil and gas industry leaks an estimated 13 million metric tons of methane into the atmosphere each year — much of which is preventable. One of the leading sources is excavation damage caused by third parties, unaware that they’re digging over a natural gas pipeline.

I’m always a bit hesitant to cover ML startups that work with oil and gas companies, but I think in this case their work is a net benefit. For details about the GPU tech being used by all these projects, see Salian’s full post.

Quick climate AI links 🌍

Cool Things ✨

Yours truly, now with mustache, beard, and brows.

Cyril Diagne, resident artist/designer/programmer at Google Arts & Culture, built AR Face Doodle, a website that lets you draw on your face in 3D. It’s powered by MediaPipe Facemesh, “a lightweight machine learning pipeline predicting 486 3D facial landmarks to infer the approximate surface geometry of a human face,” which can run in real time in browsers using TensorFlow.js. The site lets you draw squiggles on top of your selfie camera feed and then locks them to the closest point on your face. As you move your face around, or even scrunch it up, the doodles stick to their places and move around in 3D remarkably well. AR Face Doodle should work on any modern browser; you can also check out the site’s code on GitHub: cyrildiagne/ar-facedoodle.
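For a feel of what "locking a doodle to the closest point on your face" could mean, here is a small Python sketch. It is my own guess at the general idea, not the site's actual code: the `anchor` and `reproject` functions are hypothetical, and the two-landmark "face" is toy data rather than Facemesh output.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def anchor(point: Point, landmarks: List[Point]) -> Tuple[int, Point]:
    """Attach a drawn point to its nearest landmark as (landmark index, offset)."""
    index = min(range(len(landmarks)), key=lambda i: math.dist(point, landmarks[i]))
    nearest = landmarks[index]
    offset = (point[0] - nearest[0], point[1] - nearest[1], point[2] - nearest[2])
    return index, offset


def reproject(anchored: Tuple[int, Point], landmarks: List[Point]) -> Point:
    """Recompute the drawn point's position from the landmarks of a new frame."""
    index, offset = anchored
    base = landmarks[index]
    return (base[0] + offset[0], base[1] + offset[1], base[2] + offset[2])


# Frame 1: the user draws a point near the second landmark.
frame_1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
doodle = anchor((9.0, 1.0, 0.0), frame_1)

# Frame 2: the face has moved, so the second landmark is somewhere else.
frame_2 = [(0.0, 0.0, 0.0), (12.0, 2.0, 1.0)]
moved = reproject(doodle, frame_2)
```

In this toy version the doodle point simply follows its landmark by a fixed offset, so it handles translation but not rotation of the face; a real implementation would likely also account for the orientation of the face surface.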

Quick cool things links ✨

Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.

If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. ⛱