#39: Cloudflare's ML to block bad bots, 3x satellite-based environmental monitoring, and AR Face Doodles
Hey everyone, welcome to Dynamically Typed #39! Today in productized AI, I’m covering Cloudflare’s ML system for blocking bad bots that try to access their customers’ websites; and I have links to a user study of Google’s AI disease screening tool and an article on sidewalk food delivery robots. For ML research and climate change AI, I also have a whole host of quick links, as well as a write-up of three environmental monitoring projects from NVIDIA’s Inception startup program. And finally, for cool stuff, I found a website that lets you draw on your face in augmented reality. Let’s dive in!
Productized Artificial Intelligence 🔌
Cloudflare’s overview of good and bad bots.
Web infrastructure company Cloudflare is using machine learning to block “bad bots” from visiting their customers’ websites. Across the internet, malicious bots are used for content scraping, spam posting, credit card stuffing, inventory hoarding, and much more. Bad bots account for an astounding 37% of internet traffic visible to Cloudflare (humans are responsible for 60%).
To block these bots, Cloudflare built a scoring system based on five detection mechanisms: machine learning, a heuristics engine, behavior analysis, verified bots lists, and JavaScript fingerprinting. Based on these mechanisms, the system assigns a score of 0 (probably a bot) to 100 (probably a human) to each request passing through Cloudflare—about 11 million requests per second, that is. These scores are exposed as fields for Firewall Rules, where site admins can use them in conjunction with other properties to decide whether the request should pass through to their web servers or be blocked.
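In practice, a site admin combines the bot score with other request properties in a rule expression. This is an illustrative fragment in the style of Cloudflare’s Firewall Rules expression language; the field names are as I recall them from Cloudflare’s docs, so treat this as a sketch rather than copy-paste config:

```
# Challenge or block requests that score as likely bots (score < 30),
# unless they come from a known good crawler on the verified bots list:
(cf.bot_management.score lt 30 and not cf.bot_management.verified_bot)
# Action: Block (or a challenge, for a gentler response)
```

The nice part of exposing the score as a plain field is that blocking policy stays in the admin’s hands: an e-commerce site might only challenge low scores on checkout pages, while a content site blocks them everywhere.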
Machine learning is responsible for 83% of all detections. Because support for categorical features and inference speed were key requirements, Cloudflare went with gradient-boosted decision trees as their model of choice (implemented using CatBoost). They run at about 50 microseconds per inference, which is fast enough to enable some cool extras. For example, multiple models can run in shadow mode (logging their results but not influencing blocking decisions), so that Cloudflare engineers can evaluate their performance on real-world data before deploying them into the Bot Management system.
Alex Bocharov wrote about the development of this system for the Cloudflare blog. It’s a great read on adding an AI-powered feature to a larger product offering, with good coverage of all the tradeoffs involved in that process.
Quick productized AI links 🔌
- 🏥 Emma Beede conducted a user study on how nurses in Thailand are using Google’s AI screening tool to help diagnose diabetic retinopathy. “[The] study found that the AI system could empower nurses to confidently and immediately identify a positive screening, resulting in quicker referrals to an ophthalmologist.” Beede emphasizes, though, that it’s important to engage with clinicians and patients before widely deploying such systems, to ensure they don’t inadvertently hinder diagnosis.
- 🍔 Writing for Ars Technica, Timothy B. Lee shared his experience of getting a burger delivered by a robot. Part self-driving and part remotely piloted, these box-on-wheels sidewalk robots by startups like Starship and Kiwibot are getting pretty clever. “If, like, a group of people surrounded the robot and blocked it,” said Starship executive Ryan Tuohy, “the robot would identify the situation and say ‘Hello I’m a Starship delivery robot. Can you please let me pass.’” The whole story is a fun read, as is this comment. Also check out Joan Lääne’s post about their mapping and navigation tech for Starship’s blog.
- 📝 Google Lens now lets you copy text from handwritten notes by pointing your phone at them.
Machine Learning Research 🎛
Quick ML research + resource links 🎛 (see all 62)
- 🧮 Papers With Code, the site that has benchmarked the performance of over 20,000 ML models on 2,500 standard tasks, now links results in plots directly back to the tables in the papers they came from. Ross Taylor wrote up their automated results extraction method, which is open-source on GitHub: paperswithcode/axcell.
- ⚡️ PyTorch Serve is an open-source tool by Facebook and Amazon to easily turn ML models into API endpoints accessible from the web: pytorch/serve.
- 📉 OpenAI released an analysis showing that “since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months. Compared to 2012, it now takes 44 times less compute to train a neural network to the level of AlexNet” (Hernandez and Brown, 2020).
Artificial Intelligence for the Climate Crisis 🌍
For the 50th anniversary of Earth Day, Isha Salin wrote about three startups using deep learning for environmental monitoring, which are all part of NVIDIA’s Inception program for startups. Here’s what they do.
Orbital Insight maps deforestation to aid the Global Forest Watch, similar to the work being done by 20tree.ai (DT #25) and David Dao’s lab at ETH Zurich (DT #28):
The tool can also help companies assess the risk of deforestation in their supply chains. Commodities like palm oil have driven widespread deforestation in Southeast Asia, leading several producers to pledge to achieve zero net deforestation in their supply chains this year.
3vGeomatics monitors the thawing of permafrost in the Canadian Arctic in a project for the Canadian Space Agency. Why it matters:
As much as 70 percent of permafrost could melt by 2100, releasing massive amounts of carbon into the atmosphere. Climate change-induced permafrost thaw also causes landslides and erosion that threaten communities and critical infrastructure.
Azavea monitors construction around oil and gas pipelines to detect activity that may damage the pipes and cause leaks:
The U.S. oil and gas industry leaks an estimated 13 million metric tons of methane into the atmosphere each year — much of which is preventable. One of the leading sources is excavation damage caused by third parties, unaware that they’re digging over a natural gas pipeline.
I’m always a bit hesitant to cover ML startups that work with oil and gas companies, but I think in this case their work is a net benefit. For details about the GPU tech being used by all these projects, see Salin’s full post.
Quick climate AI links 🌍
- 🔌 The US Department of Energy has announced that it’s investing $30 million in ML/AI research on energy systems. Two specific areas of interest are ML for predictive modeling and simulation (presumably stuff like DeepMind’s wind farm power output predictions, see DT #8) and AI for “decision support” in managing complex systems in general.
- 📺 Climate Change 101 is CCAI’s 50-slide deck on the basics of climate science, aimed at ML/AI researchers.
- 📄 Cool climate-adjacent paper by Biermann et al. (2020): Finding Plastic Patches in Coastal Waters using Optical Satellite Data
- 💼 Job alert: Ryan Orbuch’s team at Stripe is hiring two fullstack engineers and a product designer, “to make it easy for users to have a real impact on climate.” Stripe’s execution is always top-tier, so if you’re a designer or software engineer looking for a change of employment, this is the most sure-fire way you can help the climate. (Not sponsored.)
Cool Things ✨
Yours truly, now with mustache, beard, and brows.
Cyril Diagne, resident artist/designer/programmer at Google Arts & Culture, built AR Face Doodle—a website that lets you draw on your face in 3D. It’s powered by MediaPipe Facemesh, “a lightweight machine learning pipeline predicting 486 3D facial landmarks to infer the approximate surface geometry of a human face,” which can run in real time in browsers using TensorFlow.js. The site lets you draw squiggles on top of your selfie camera feed and then locks them to the closest point on your face. As you move your face around—or even scrunch it up—the doodles stick to their places and move around in 3D remarkably well. AR Face Doodle should work on any modern browser; you can also check out the site’s code on GitHub: cyrildiagne/ar-facedoodle.
Quick cool things links ✨
- 📱 Also by Cyril Diagne: AR cut & paste—take a photo of something with your phone and paste it into a document on your laptop. One of the coolest 30-second UI demos I’ve seen in a while—you don’t want to miss this one.
- 🎶 OpenAI Jukebox is “a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.” Come for the audio samples, stay for the t-SNE cluster of artists and genres the model learns without supervision. In one fun application, the model is shown the first 12 seconds of a song and then tries to realistically generate the rest of the track—my favorite is Jukebox’s continuation of Adele’s Rolling in the Deep. Also check out this thoughtful critique from musician and Google Brain researcher Jesse Engel, and Janelle Shane’s thread of silly samples.
- 🤡 Dylan Wenzlau built an end-to-end system for meme text generation with a deep convolutional network in Keras & TensorFlow, supporting dozens of meme formats. You can try it on imgflip.
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. ⛱