One AI model, four competing services
Melody ML, Acapella Extractor, Vocals Remover, and Moises.ai are all services that use AI to separate music into different tracks by instrument. Like many of these single-use AI products, they wrap machine learning models into easy-to-use UIs and APIs, and sell access to them as a service (after users exceed their free tier credits). Here are a few examples of their outputs:
- Bill Withers - Lean On Me: original vs. vocals extracted using Acapella Extractor.
- The Beatles - Yellow Submarine: original vs. instrumental extracted using Vocals Remover.
- Etnia - Estrella Síria: original and isolated tracks on the Moises.ai landing page.
As you can tell, these services all have pretty similar-quality results. That’s no accident: all four are in fact built on top of Spleeter, an open-source AI model by French music service Deezer—but none of them are actually by Deezer. So these services are basically just reselling Amazon’s or Google’s GPU credits at a markup—not bad for what I imagine to be about a weekend’s worth of tying everything together with a bit of code. There’s a lot of low-hanging fruit in this space, too: even just within the audio domain, there are 22 different tasks on Papers with Code for which you can find pretrained, state-of-the-art models that are just waiting to be wrapped into a service. (And for computer vision, there are 807 tasks.)
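To give a sense of how little code that weekend involves, here’s a minimal sketch of wrapping Spleeter in a web service. The Separator calls follow Spleeter’s documented Python API; the Flask endpoint, file paths, and (absent) error handling are my own illustration, not any of these products’ actual code:

```python
# Minimal sketch: Spleeter behind a web endpoint. The Separator API
# is Spleeter's documented interface; everything else is illustrative.
from flask import Flask, request, send_file
from spleeter.separator import Separator

app = Flask(__name__)
separator = Separator('spleeter:2stems')  # pretrained vocals/accompaniment model

@app.route('/separate', methods=['POST'])
def separate():
    # Save the uploaded song, run the model, and return the vocals stem.
    request.files['song'].save('/tmp/song.mp3')
    separator.separate_to_file('/tmp/song.mp3', '/tmp/stems')
    # Spleeter writes <destination>/<track name>/vocals.wav by default.
    return send_file('/tmp/stems/song/vocals.wav')

if __name__ == '__main__':
    app.run()
```

Add a credit counter and a Stripe integration, and you have roughly the product these four services are selling.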
I actually quite like the idea of this. You need a whole different skillset to turn a trained model into a useful product that people are willing to pay for: from building out a thoughtful UI and the relevant platform/API integrations, to finding a product/market fit and the right promotional channels for your audience. As long as the models are open-source and licensed to allow commercial use, I think building products like this and charging money for them is completely fair game.
Since the core technology is commoditized by the very nature of the underlying models being open-source, the competition shifts to who has the best execution around those same models.
For example, the Melody ML service restricts both free and paid users to a maximum length of 5 minutes per song. Moises.ai saw that and thought they could do better: for $4/month, they’ll process songs up to 20 minutes long. Similarly, the person who built both Vocals Remover and Acapella Extractor figured the pitch worked better in the form of those two separate, specialized websites. They even set up namesake YouTube channels that respectively post instrumentals-only and vocals-only versions of popular songs—some with many thousands of views—and of course link those back to the websites. Clever!
It’s really cool to see how the open-source nature of the AI community, along with how easy it is to build websites that integrate with cloud GPUs and payments services nowadays, is enabling these projects to pop up more and more. So who’s picking up something like this as their next weekend project? Let me know if you do!
(Thanks for the link to Acapella Extractor, Daniël! Update: I previously thought the Melody ML service was by Deezer, but someone at Deezer pointed out it was built by a third party.)
Greenpeace report: oil in the cloud
Overview of Greenpeace’s findings in their Oil in the Cloud report.
Greenpeace released their Oil in the Cloud report. Focusing on Google’s GCP, Amazon’s AWS, and Microsoft’s Azure, the report covers the ways in which these cloud companies are working with oil and gas companies. We’ve already heard a lot about this: it’s been highlighted in a viral Vox video, on the CCAI forums, and in the Tech Won’t Drill It pledge (see DT #33). This report adds an exhaustive overview of how cloud services—and sometimes machine learning—are involved in the different phases of oil and gas extraction:
- Upstream: finding and extracting oil and gas, using ML to fill in missing data and manage datasets.
- Midstream: transporting and storing oil and gas, using ML to monitor pipelines (see DT #39) and “optimize pipelines, inventory and workforce.”
- Downstream: refining, marketing and selling oil and gas; this seems more focused on other cloud services, not ML specifically.
Greenpeace found specific examples of contracts that all three companies had in at least one of these phases. It also notes that because of public outrage over the past few months, all three companies have deemphasized their oil and gas products on marketing websites. So far, though, it looks like only Google has actually committed to no longer taking on new oil and gas contracts (but still continuing with its existing contracts).
Overall, Amazon and Microsoft, the largest players in western cloud computing at 33% and 18% market share respectively, come out of this report looking pretty bad. Google, the smallest at 8%, is taking the biggest steps in the right direction.
Google is also the only one of the three that’s already matching its datacenter energy use with renewable power purchases, and doing some very cool work to shift its workloads to happen when electricity grids are cleanest. If you’re working in ML and training your models in the cloud, encouraging your company or group to switch to GCP, away from AWS and Azure, is probably one of the highest-impact actions you can take for climate change right now.
Is it enough for only big tech to pull out of facial recognition?
Big tech companies are putting an end to their facial recognition APIs. Besides their obvious privacy problems, commercial face recognition APIs have long been criticized for their inconsistent recognition accuracies for people of different backgrounds. Put bluntly: these APIs are better at identifying light-skinned faces than dark-skinned ones. Joy Buolamwini and Timnit Gebru first documented a form of this in their 2018 Gender Shades paper, and there have been many calls to block facial recognition APIs from being offered ever since; see Jay Peters’s article in The Verge for some more historical context.
It took two years, and the recent reckoning with discrimination and police violence in the United States (see DT #41), for IBM to finally write a letter to the US Congress announcing they’re done with the technology:
IBM no longer offers general purpose IBM facial recognition or analysis software. IBM firmly opposes and will not condone uses of any technology, including facial recognition technology offered by other vendors, for mass surveillance, racial profiling, violations of basic human rights and freedoms, or any purpose which is not consistent with our values and Principles of Trust and Transparency.
Amazon and Microsoft followed soon after, pausing police use of their equivalent APIs. Notably, Google, where Gebru works, has never offered a facial recognition API. Now that these big-name tech companies are no longer providing facial-recognition-as-a-service, however, their exit does expose a new risk. Benedict Evans, in his latest newsletter:
The catch is that this tech is now mostly a commodity (and very widely deployed in China) - Google can say “wait”, but a third-tier bucketshop outsourcer can bolt something together from parts it half-understands and sell it to a police department that says ‘it’s AI - it can’t be wrong!’.
This is a real risk, and that’s why the second half of these announcements is equally—if not more—important. Also from IBM’s letter to Congress:
We believe now is the time to begin a national dialogue on whether and how facial recognition technology should be employed by domestic law enforcement agencies.
The real solution here is not for individual big tech companies to be publicly shamed into stopping their facial recognition APIs, but for the technology to be regulated by law—so that a “third-tier bucketshop outsourcer” can’t do the same thing, but out of the public eye. So: these are good steps, but this week’s news is far from the last chapter in the story of face recognition.
OpenAI's GPT-3: a language model that doesn't need finetuning
OpenAI announced GPT-3, the next generation of its language model. As we’re used to by now, it’s another order of magnitude bigger than previous models, at 175 billion parameters—compared to 1.5 billion for GPT-2 and 17 billion for Microsoft’s Turing NLG (DT #33). It’s not the model’s size that’s interesting, though, but what this enables. From the abstract of the 74-page paper by Brown et al. (2020) detailing GPT-3:
Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. … For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
This is super cool! Where GPT-2 could only complete a passage from a given input in a natural-sounding way, GPT-3 can now do several tasks just from being shown examples. Instead of fine-tuning the model for specific tasks like translation, question-answering, or generating podcast episode titles that do not exist (👀), the model can do everything out of the box. For example, if you feed it several questions and answers prefixed with “Q:” and “A:” respectively, followed by a new question and “A:”, it’ll continue the passage by answering the question—without ever having to update its weights! Other examples include parsing unstructured text data into tables, improving English-language text, and even turning natural language into Bash terminal commands (but can it do git?).
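To make that concrete, here’s roughly what such a few-shot prompt looks like. The API itself is still limited to vetted partners, so treat the client call below as an assumption about its shape rather than documented usage:

```python
import openai  # assumed client; the API is currently in private beta

openai.api_key = "..."  # key as provided to vetted partners

# Few-shot prompt: the model infers the Q&A task purely from the
# examples in its context window; no gradient updates happen.
prompt = """Q: What is the capital of France?
A: Paris

Q: Who wrote Pride and Prejudice?
A: Jane Austen

Q: What year did the Berlin Wall fall?
A:"""

response = openai.Completion.create(
    engine="davinci",  # assumed engine name
    prompt=prompt,
    max_tokens=16,
    stop="\n",         # cut the completion off after one answer line
)
print(response.choices[0].text.strip())  # expected: "1989"
```

Swap the Q:/A: examples for English-to-French pairs or text-to-table pairs, and the same frozen model does a completely different task.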
OpenAI rolled out its previous model in stages, starting with a 117-million parameter version (“117M”) in February 2019 (DT #8), followed by 345M in May of that year (DT #13), 774M in September with a six-month follow up blog post (DT #22), and finally the full 1.5-billion parameter version in November (DT #27). The lab is doing the same for GPT-3, which is also the first model that it’s making commercially available in the form of an API. Just a few vetted organizations have had access to the API so far. Ashlee Vance for Bloomberg:
To date, Casetext has been using the technology to improve its legal research search service, MessageBird has tapped it for customer service, and education software maker Quizlet has used it to make study materials.
Janelle Shane also has access to GPT-3, and she has used the API to make some “spookily good Twitter bots” on her AI Weirdness blog.
I’m glad OpenAI is staging the release of its API this way again, since valid criticism has already started popping up: Anima Anandkumar pointed out on Twitter that GPT-2 has “produced shockingly racist and sexist paragraphs without any cherry picking.” (Also see this follow-up discussion with OpenAI policy director Jack Clark.) These types of bias problems have to be worked out before the model can responsibly be released beyond a few trusted partners, which OpenAI CEO Sam Altman also acknowledged in Vance’s piece:
As time goes on, more organizations will gain access, and then the API will be public. “I don’t know exactly how long that will take,” Altman said. “We would rather be on the too-slow than the too-fast side. We will make mistakes here, and we will learn.”
As the OpenAI API gets released more broadly and integrated into more products, I’ll keep following its progress.
Datasheets for datasets and Model Cards for model reporting
Google’s model card for their face detection model. (Google)
Datasheets for Datasets and Model Cards for Model Reporting. These two papers aim to improve transparency and accountability in machine learning models and the datasets used to create them.
From the abstract of the first paper by Gebru et al. (2018):
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on.
The paper goes on to provide a set of questions and a workflow to properly think through and document each of these aspects of a dataset in a datasheet. It also has example datasheets for two standard datasets: Labeled Faces in the Wild and the Movie Review Data.
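For a sense of that structure, here’s a minimal skeleton. The seven section names come from the paper; pairing each with a single guiding question (paraphrased; the paper has many more per section) and the Python representation are my own:

```python
# The seven datasheet sections per Gebru et al. (2018), each paired
# with one paraphrased guiding question; a real datasheet answers
# many questions per section.
datasheet_sections = {
    "Motivation": "For what purpose was the dataset created?",
    "Composition": "What do the instances in the dataset represent?",
    "Collection process": "How was the data for each instance acquired?",
    "Preprocessing/cleaning/labeling": "What preprocessing was done, and is the raw data available?",
    "Uses": "What (other) tasks could the dataset be used for?",
    "Distribution": "How will the dataset be distributed, and under what license?",
    "Maintenance": "Who is supporting/hosting/maintaining the dataset?",
}
```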
From the abstract of the second paper by Mitchell et al. (2019):
Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information.
This is essentially the same principle, but now applied to a trained model instead of a dataset. The paper also includes details on how to fill in each part of a model card, as well as two examples: a smile detection model and a text toxicity classifier. I’ve also seen some model cards in the wild recently: Google has them for their face detection and object detection APIs and OpenAI has one for their GPT-2 language model (but not yet for GPT-3, as far as I can tell).
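If you want to try making one yourself, the paper’s nine sections map naturally onto a simple template. Here’s a sketch with section names from Mitchell et al.; the field types and comments are my own illustration:

```python
from dataclasses import dataclass
from typing import Dict, List

# Skeleton of the nine model card sections from Mitchell et al. (2019);
# field types and example comments are illustrative only.
@dataclass
class ModelCard:
    model_details: str                       # developer, date, version, model type, license
    intended_use: str                        # primary use cases and out-of-scope uses
    factors: List[str]                       # groups to disaggregate evaluation by
    metrics: List[str]                       # e.g. false positive rate at a fixed threshold
    evaluation_data: str                     # which datasets, and why they were chosen
    training_data: str                       # or a pointer to the dataset's datasheet
    quantitative_analyses: Dict[str, float]  # metric results per factor and intersection
    ethical_considerations: str
    caveats_and_recommendations: str
```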
I’m excited to try creating a dataset datasheet and a model card at work—which also makes me think: practicing making these should really have been part of my AI degree. I’ve also added both papers to my machine learning resources list.