#47: Facebook's AI Red Team, predictions of future AI crimes, and TensorFlow's new TF-Coder tool
Hey everyone, welcome to Dynamically Typed #47! I was on holiday the past two weeks—hanging out with my family for the first time since the start of the pandemic—so today’s newsletter is a bit shorter than usual. That means no feature stories, but lots of quick links! For productized AI, I’m covering Facebook’s AI Red Team, future AI-enabled crime, and Voyage’s new G3 car. For ML research, I’ve got TensorFlow’s new TF-Coder tool, Google’s new efficient language model for question answering, and an analysis of datasets available on Google’s Dataset Search. Finally, for cool stuff I have a link to Mozilla’s new Black AI art fund.
Productized Artificial Intelligence 🔌
- ⚔️ There has always been a cat-and-mouse game between ever-updating automated content filters and users who think of clever new ways to circumvent them: from email spam filters decades ago to blockers for explicit, violent or fake viral content on social media today. A new filter evasion trick falls through the cracks every once in a while, becomes popular and widely used, and is then eventually added to the automated filters. Depending on the severity of the bypass, this process sometimes has to be completed in mere hours or days. In light of, well, the state of the world, the stakes here are obviously very high—I don’t envy the pressure these ML teams must be under. Tom Simonite at Wired wrote a feature on Facebook’s internal AI Red Team, which is the company’s response to this problem. The team tries to hack the company’s own AI-powered filtering systems before users do, to always stay one step ahead of them. It’s a good read that covers the company’s “risk-a-thons”, their deepfakes detection challenge (DT #23), automated testing, and much more.
- 👮‍♀️ Related: Caldwell et al. wrote a paper on AI-enabled future crime for Crime Science, a journal associated with University College London. They think the highest-risk possibilities are: audio/video impersonation (e.g. deepfakes, again see DT #23), driverless vehicles as weapons, tailored phishing, disrupting AI-controlled systems (like the Facebook stuff above), large-scale blackmail, and AI-authored fake news. Burglar bots rank as low-risk and killer robots rank as medium-risk; personally, I’d rank killer drones (bad title, good 7-minute sci-fi) above those two.
- 🚗 Voyage has put up a detailed blog post announcing the G3, its next-generation robotaxi aimed at senior citizens. Although the company is not quite as far along as Waymo, which has had customers riding its driverless taxis for over a year now, Voyage’s service should be live in San Jose, California, next year. I’ve been following this company for a while now and I thought I had featured them on DT at least once before, but my archive appears to disagree with me there. To rectify that, here are some more high-quality technical blog posts from Voyage that I’ve read but never got around to covering: one on their automatic emergency braking system, one on their active learning data curation, and one on their Telessist remote operations solution.
Machine Learning Research 🎛
- ❓ Although increasingly enormous do-it-all language models like T5 and GPT-3 (DT #42, #44) have been getting a lot of attention (haha) lately, smaller and more parameter-efficient models are still improving a lot as well. A recent interesting one is REALM by Guu et al. (2020) at Google AI, which, unlike these larger models, separates the encoding of language from the encoding of knowledge. Instead of implicitly storing information about the world in the language model’s weights, it introduces a neural retriever that learns to find relevant snippets of text from Wikipedia to be fed into the language model as context alongside the original query. As a result, it achieves a score of 40.4 on Natural Questions with just 300 million parameters, compared to T5’s score of 36.6 with 11 billion parameters. That is about 10% better results with over 35x fewer parameters.
- ⚡️ TF-Coder is TensorFlow’s new tensor manipulation utility. Given a few examples of input and output tensors, it generates TF2 code that transforms the input into the output. Check out the code on GitHub, try it out in a Colab notebook, or read about how it works in Shi et al. (2020).
- 🔎 Google’s Dataset Search (DT #15) now contains over 31 million datasets, a tripling since its initial launch two years ago. Natasha Noy and Omar Benjelloun wrote up an analysis for the Google AI blog of the types of datasets that are now available. Social science plus geoscience make up almost half the datasets, and together with biology, agriculture, and medicine, they comprise three quarters. The post also includes some best practices for publishing datasets so that Dataset Search can properly index them for other researchers to find.
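A quick aside on the REALM vs. T5 comparison above: here is a small sanity check of the arithmetic, using only the scores and parameter counts quoted in that item. The parameter counts are rounded figures, so treat the ratio as a ballpark rather than an exact number.

```python
# Figures as quoted in the REALM item above (Natural Questions scores, rounded parameter counts).
realm_score, realm_params = 40.4, 300e6
t5_score, t5_params = 36.6, 11e9

# Fraction by which REALM's score exceeds T5's.
relative_gain = realm_score / t5_score - 1

# How many times more parameters T5 uses than REALM.
param_ratio = t5_params / realm_params

print(f"relative gain: {relative_gain:.1%}")
print(f"parameter ratio: {param_ratio:.1f}x")
```

With these inputs the gain works out to a bit over 10% and the parameter ratio to somewhere between 36x and 37x.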
I’ve also collected all 70 ML research tools previously featured in Dynamically Typed on a Notion page for quick reference. ⚡️
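To give a feel for the "code from input/output examples" idea behind TF-Coder, here is a toy sketch in plain Python. It is not TF-Coder's API or its actual search algorithm: the operation set `OPS`, the helpers `run` and `synthesize`, and the brute-force enumeration are all made up for illustration, and they work on flat lists rather than tensors.

```python
from itertools import product

# Hypothetical toy operation set on flat lists of ints (not TF-Coder's real operation list).
OPS = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "cumsum": lambda xs: [sum(xs[: i + 1]) for i in range(len(xs))],
    "drop_first": lambda xs: xs[1:],
}


def run(names, value):
    """Apply the named operations to `value`, left to right."""
    for name in names:
        value = OPS[name](value)
    return value


def synthesize(examples, max_depth=3):
    """Return the first shortest op sequence consistent with all examples, or None."""
    for depth in range(1, max_depth + 1):
        # Try every sequence of `depth` operations, shortest sequences first.
        for names in product(OPS, repeat=depth):
            if all(run(names, inp) == out for inp, out in examples):
                return list(names)
    return None


# Two input/output examples describing "sort descending, then double".
examples = [([3, 1, 2], [6, 4, 2]), ([5, 4], [10, 8])]
program = synthesize(examples)
print(program)
print(run(program, [9, 7, 8]))
```

Brute-force enumeration like this blows up quickly as the operation set grows, which is why a real tool needs a much smarter search; the paper linked above covers how TF-Coder handles that.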
Cool Things ✨
- 🎨 Funding alert: Mozilla is launching a new $245,000 round of its Creative Media Awards for Black artists who are exploring the effects of AI on racial justice. I’m excited to see the projects that come out of this.
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get new issues in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🏰