#51: Photoshop's new AI features, Descript for video, and AR copy & paste with ClipDrop
Hey everyone, welcome to Dynamically Typed #51! Today in productized AI, I’m covering Adobe’s big new Photoshop release full of AI-powered features, and I have quick links on Descript’s new video editor, Cyril Diagne’s ClipDrop app, and a Google deep dive on the engineering of big machine learning systems. Beyond that, I’ve got links to two open-source projects for ML research: Facebook’s M2M-100 translation model and an African MNIST dataset. Finally, for climate AI, I found a new paper on finding defects in solar panels by segmenting thermal images from drone-mounted cameras.
Productized Artificial Intelligence 🔌
Light direction is one of many new AI-powered features in Photoshop; in the middle picture, the light source is on the left; in the right picture, it’s moved to the right.
Adobe’s latest Photoshop release is jam-packed with AI-powered features. The pitch, by product manager Pam Clark:
You already rely on artificial intelligence features in Photoshop to speed your work every day like Select Subject, Object Selection Tool, Content-Aware Fill, Curvature Pen Tool, many of the font features, and more. Our goal is to systematically replace time-intensive steps with smart, automated technology wherever possible. With the addition of these five major new breakthroughs, you can free yourself from the mundane, non-creative tasks and focus on what matters most – your creativity.
Adobe is branding the most exciting of these new features as Neural Filters: neural-network-powered image manipulations that are parameterized by sliders in the Photoshop UI. Some of them automate tasks that were previously very labor-intensive, while others enable changes that were previously impossible. Here are a few of both (with a sketch of how those sliders might work after the list):
- Style transfer: apply one photo’s style to another, like the classic “make this look like a Picasso / Van Gogh / Monet.”
- Smart portraits: subtly change a photo subject’s age, expression, gaze direction, pose, hair thickness, etc.
- Colorize: infer colors for black-and-white photos based on their contents.
- JPEG Artifacts Removal: smooth out the blocky artifacts that occur on patches of JPEG-compressed photos.
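Adobe hasn’t published how its sliders map to model inputs, but one plausible guess is that a slider simply blends the network’s full-strength output with the original image. A minimal sketch in Python, where the blending scheme and the stand-in “model” are my assumptions, not Adobe’s implementation:

```python
import numpy as np

def apply_neural_filter(image: np.ndarray, model, strength: float) -> np.ndarray:
    """Blend a filter's full-strength output with the original image
    according to a UI slider value in [0, 1]. Purely illustrative:
    Adobe hasn't published how its sliders map to model inputs."""
    filtered = model(image)  # full-strength network output
    # Linear interpolation: 0 keeps the original, 1 applies the filter fully.
    return (1.0 - strength) * image + strength * filtered

# Stand-in "model" (a pixel inversion) just to make the sketch runnable.
rng = np.random.default_rng(0)
portrait = rng.random((256, 256, 3))
half_strength = apply_neural_filter(portrait, lambda img: 1.0 - img, strength=0.5)
```

Of course, for filters like smart portraits the slider more likely conditions the model itself (e.g. as a latent-space direction), but the interpolation above is the simplest way a continuous UI control can sit on top of a discrete network output.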
These all run on-device and came out of a collaboration between Adobe Research and NVIDIA, implying they’re best suited to machines with beefy GPUs — not surprising. However, the blog post is a little vague about the specifics here (“performance is particularly fast on desktops and notebooks with graphics acceleration”), so I wonder whether Neural Filters are also optimized for other AI accelerator chips that Adobe can’t mention yet. In particular, Apple recently showed off its new A14 chip, which features a much faster Neural Engine. The chip launched in the latest iPhones and iPads, but Apple Silicon will also power a new line of non-Intel Macs, rumored to be announced next month — what are the chances that Apple will boast about the performance of Neural Filters on the Neural Engine during the presentation? I’d say pretty high. (Maybe worthy of a Ricky, even?)
Anyway, this Photoshop release is exactly the kind of productized AI that I started DT to cover: advanced machine learning models — that only a few years ago were just cool demos at conferences — wrapped up in intuitive UIs that fit into users’ existing workflows. It’s now just as easy to tweak the intensity of a smile or the direction of a gaze in a portrait photo as it is to manipulate its hue or brightness. That’s pretty amazing.
Quick productized AI links 🔌
- 📼 Descript has launched their new video editor. This is another DT favorite: Descript originally built an app that lets you edit the transcribed text of an audio file and reflects those changes back into the audio (see DT #18), followed by a version of the product optimized for podcast editing (#24). The newest release turns the app into a fully-fledged video editor, including support for Descript’s core transcript-based editing feature: it can delete sections, auto-remove “uhm”s, and even generate new audio (in the speaker’s voice!) for small corrections. And it comes with a great launch video (by Sandwich, of course).
- 📱 Cyril Diagne’s AR cut & paste demo (#39) is now an app: ClipDrop lets you take photos of objects on your phone, uses a background removal model to cut them out, and then lets you paste them onto your laptop screen in augmented reality. I’ve tried it on a few objects I had lying around my apartment, and capturing objects (the “clip” bit) works super reliably; sending the photo to my laptop (the “drop” bit) was a bit less robust.
- 📖 Long (technical) deep-dive from Google on their lessons learned in a decade of software engineering for machine learning systems: Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX). A recurring theme for those of you who have been reading DT for a while: “We also recommend that before focusing on cutting-edge ML modeling techniques, product leaders should invest more time in adopting interoperable ML platforms for their organizations.”
Machine Learning Research 🎛
- 💱 Kyle Wiggers wrote a feature for VentureBeat on Facebook’s new M2M-100 model — a machine translation model that, unlike e.g. Google Translate for many language pairs, does not use English as a go-between. Instead of translating from A to English and then from English to B, it translates from A directly to B — which for 100 languages means nearly 10,000 directed pairs (100 × 99 = 9,900). The model was trained on 2,200 of these pairs and sets a new state of the art (in terms of BLEU) for many non-English language pairs. At 15 billion parameters, it continues the trend that strength really is in numbers for NLP and MT models. FAIR has open-sourced M2M-100 at pytorch/fairseq; the first sketch below this list shows what direct many-to-many translation looks like in code.
- 🔢 Wu et al. (2020) present four variations of the popular MNIST digit recognition dataset for “four orthographies used in Afro-Asiatic and Niger-Congo languages: Geʽez (Ethiopic), Vai, Osmanya, and N’Ko.” They’re formatted so that they can be used as drop-in replacements for any existing MNIST model, and the authors show that LeNet achieves classification accuracies similar to classic MNIST on each of the new datasets. The data is open-source at Daniel-Wu/AfroMNIST; the second sketch below shows how a drop-in swap might look.
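For the M2M-100 item: here’s a minimal sketch of direct many-to-many translation, assuming the smaller 418M-parameter checkpoint ported to Hugging Face transformers rather than FAIR’s full 15-billion-parameter fairseq release linked above:

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Assumption: the 418M-parameter checkpoint ported to Hugging Face
# transformers, not FAIR's full 15B fairseq release.
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# French to Chinese directly: no English pivot step in between.
tokenizer.src_lang = "fr"
encoded = tokenizer("La vie est comme une boîte de chocolat.", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("zh"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```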
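And for AfroMNIST: since the variants keep MNIST’s 28×28 grayscale format, swapping one into an existing training pipeline should amount to changing the data-loading line. A sketch, where the .npz file name and array keys are my assumptions (check Daniel-Wu/AfroMNIST for the actual layout):

```python
import numpy as np

# Hypothetical file name and keys; see Daniel-Wu/AfroMNIST for the
# real distribution format.
data = np.load("vai_mnist.npz")
x_train, y_train = data["x_train"], data["y_train"]

# Same shape as classic MNIST, so an existing model (e.g. LeNet)
# can train on it unchanged.
assert x_train.shape[1:] == (28, 28)
```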
Artificial Intelligence for the Climate Crisis 🌍
- ☀️ Montañez et al. (2020) have developed a method and GUI-enabled application for automatically segmenting defects in solar PV modules from thermal images captured by drone-mounted cameras. With over 100 gigawatts of new solar power installations being deployed across the world every year, AI-enabled products like this will only become more useful; a toy sketch of the underlying hotspot idea follows below.
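The paper’s actual pipeline isn’t reproduced here, but the core intuition is that defective PV cells dissipate power as heat and show up as hotspots in thermal imagery. A toy threshold-based segmentation with OpenCV, purely illustrative (the file name and cutoff are made up, and the paper uses a proper segmentation method rather than a fixed threshold):

```python
import cv2
import numpy as np

# Toy illustration only: the paper uses a learned segmentation method,
# not a fixed threshold. File name and cutoff are hypothetical.
thermal = cv2.imread("pv_module_thermal.png", cv2.IMREAD_GRAYSCALE)

# Defective cells dissipate power as heat, so they appear as pixels
# significantly hotter (brighter) than the module's typical temperature.
cutoff = thermal.mean() + 2 * thermal.std()
mask = (thermal > cutoff).astype(np.uint8) * 255

# Remove speckle noise, keeping contiguous hotspot regions.
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
cv2.imwrite("defect_mask.png", mask)
```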
Thanks for reading! As usual, you can let me know what you thought of today’s issue using the buttons below or by replying to this email. If you’re new here, check out the Dynamically Typed archives or subscribe below to get a new issue in your inbox every second Sunday.
If you enjoyed this issue of Dynamically Typed, why not forward it to a friend? It’s by far the best thing you can do to help me grow this newsletter. 🎃