Distill: Branch Specialization
Distill #2: Branch Specialization by Voss et al. (2021), a chapter in the Circuits thread, which includes previous work like Zoom In on Circuits, Early Vision in CNNs, Curve Detectors, Equivariance, High-Low Frequency Detectors, and Curve Circuits. In this article, the authors find that similar circuit-level functions tend to group themselves into network branches, which are “sequences of layers which temporarily don’t have access to ‘parallel’ information which is still passed to later layers” (a minimal code sketch of such a branched block follows below). For example, all 30 curve-related features in InceptionV1’s mixed3b layer are concentrated in just one of the layer’s four branches, the 5x5 branch (mixed3b_5x5). The authors hypothesize that this happens because of a positive feedback loop during training: once the later layer of a branch starts to favor one family of features, the earlier layer in that branch is incentivized to form the low-level features the later layer uses as primitives for its higher-level features, which further reinforces the specialization.

One cool thing about Distill is that it also invites researchers from outside AI to provide commentary on articles. In this case, Matthew Nolan and Ian Hawes, neuroscientists at the University of Edinburgh, see a “striking parallel” with the separation of cortical pathways in the human brain.
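To make the notion of a branch concrete, here is a minimal sketch of an Inception-style block with four parallel branches. This is an illustration under assumptions (PyTorch, made-up channel counts), not the exact InceptionV1 mixed3b configuration: each branch is a short sequence of layers that only sees its own intermediate activations, and the “parallel” information is merged again only at the final concatenation.

```python
import torch
import torch.nn as nn

class InceptionStyleBlock(nn.Module):
    """Illustrative four-branch block; channel counts are arbitrary."""

    def __init__(self, in_ch):
        super().__init__()
        # Branch 1: a single 1x1 convolution.
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        # Branch 2: 1x1 "reduce" followed by 3x3; the 3x3 conv only sees
        # the output of its own 1x1 reduce, not the other branches.
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 48, kernel_size=1), nn.ReLU(),
            nn.Conv2d(48, 96, kernel_size=3, padding=1),
        )
        # Branch 3: 1x1 reduce followed by 5x5 (the analogue of a "_5x5"
        # branch, where the curve-related features were found to cluster).
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        # Branch 4: 3x3 max-pool followed by a 1x1 projection.
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Each branch runs independently on the same input; only layers
        # downstream of the concatenation see all branches again.
        return torch.cat(
            [self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1
        )

# Example: a batch of feature maps with 192 input channels.
x = torch.randn(1, 192, 28, 28)
out = InceptionStyleBlock(192)(x)  # shape: (1, 64 + 96 + 32 + 32, 28, 28)
```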