FAVOR+, or Fast Attention Via Positive Orthogonal Random Features, is an efficient attention mechanism used in the Performer architecture. It uses kernel methods and random feature approximation to approximate softmax and Gaussian kernels.
FAVOR+ works for attention blocks using matrices $\mathbf{A} \in \mathbb{R}^{L \times L}$ of the form $\mathbf{A}(i, j) = K(\mathbf{q}_{i}^{T}, \mathbf{k}_{j}^{T})$, with $\mathbf{q}_{i}/\mathbf{k}_{j}$ standing for the $i^{th}/j^{th}$ query/key row vector in $\mathbf{Q}/\mathbf{K}$ and kernel $K : \mathbb{R}^{d} \times \mathbb{R}^{d} \rightarrow \mathbb{R}_{+}$ defined for the (usually randomized) mapping $\phi : \mathbb{R}^{d} \rightarrow \mathbb{R}^{r}_{+}$ (for some $r > 0$) as:
$$K(\mathbf{x}, \mathbf{y}) = \mathbb{E}[\phi(\mathbf{x})^{T}\phi(\mathbf{y})] $$
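As an illustration of this expectation, the following minimal Python sketch estimates the softmax kernel $\exp(\mathbf{x}^{T}\mathbf{y})$ by Monte Carlo. It assumes the positive feature map $\phi(\mathbf{u}) = \frac{\exp(-\|\mathbf{u}\|^{2}/2)}{\sqrt{r}}\left(\exp(\omega_{1}^{T}\mathbf{u}), \ldots, \exp(\omega_{r}^{T}\mathbf{u})\right)$ with $\omega_{i} \sim \mathcal{N}(0, \mathbf{I}_{d})$, which is one choice described in the Performer paper. The function names, the test vectors and the value of $r$ are hypothetical and chosen only for illustration.

```python
import math
import random


def draw_projections(r, d, rng):
    """Draw r independent Gaussian vectors omega_i ~ N(0, I_d)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)]


def positive_random_features(u, omegas):
    """Assumed feature map: exp(-|u|^2 / 2) / sqrt(r) * [exp(omega_i . u)]."""
    r = len(omegas)
    scale = math.exp(-0.5 * sum(c * c for c in u)) / math.sqrt(r)
    return [
        scale * math.exp(sum(w_c * u_c for w_c, u_c in zip(w, u)))
        for w in omegas
    ]


rng = random.Random(0)
x = [0.3, -0.2, 0.1, 0.4]
y = [0.1, 0.5, -0.3, 0.2]

# The same projections must be shared by both arguments of the kernel.
omegas = draw_projections(r=20000, d=len(x), rng=rng)
phi_x = positive_random_features(x, omegas)
phi_y = positive_random_features(y, omegas)

estimate = sum(a * b for a, b in zip(phi_x, phi_y))   # phi(x)^T phi(y)
exact = math.exp(sum(a * b for a, b in zip(x, y)))    # exp(x^T y)
print(f"estimate={estimate:.4f} exact={exact:.4f}")
```

Every entry of the feature vectors is strictly positive, so the estimated kernel values are positive as well. The estimate approaches the exact value as $r$ grows.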
We call $\phi(\mathbf{u})$ a random feature map for $\mathbf{u} \in \mathbb{R}^{d}$. For $\mathbf{Q}^{'}, \mathbf{K}^{'} \in \mathbb{R}^{L \times r}$ with rows given as $\phi(\mathbf{q}_{i}^{T})^{T}$ and $\phi(\mathbf{k}_{i}^{T})^{T}$ respectively, this leads directly to the efficient attention mechanism of the form:
$$ \widehat{Att_{\leftrightarrow}}\left(\mathbf{Q}, \mathbf{K}, \mathbf{V}\right) = \hat{\mathbf{D}}^{-1}(\mathbf{Q^{'}}((\mathbf{K^{'}})^{T}\mathbf{V}))$$
where
$$\hat{\mathbf{D}} = \text{diag}(\mathbf{Q^{'}}((\mathbf{K^{'}})^{T}\mathbf{1}_{L})) $$
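The bracketing in these formulas is what makes the mechanism efficient: $(\mathbf{K^{'}})^{T}\mathbf{V}$ is an $r \times d$ matrix, so the $L \times L$ attention matrix is never formed. A minimal plain-Python sketch of this ordering is given below, assuming $\mathbf{Q^{'}}$ and $\mathbf{K^{'}}$ have already been computed and have positive entries. The helper names and matrix sizes are hypothetical, and a practical implementation would use an array library rather than nested lists.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def favor_attention(q_prime, k_prime, v):
    """Compute D^-1 (Q' ((K')^T V)) without building the L x L matrix."""
    k_t = transpose(k_prime)                 # r x L
    kv = matmul(k_t, v)                      # (K')^T V, shape r x d
    k_sum = [sum(row) for row in k_t]        # (K')^T 1_L, length r
    numer = matmul(q_prime, kv)              # Q' ((K')^T V), shape L x d
    # Diagonal of D: Q' ((K')^T 1_L), one normaliser per query row.
    denom = [sum(q * s for q, s in zip(row, k_sum)) for row in q_prime]
    return [[val / dn for val in row] for row, dn in zip(numer, denom)]


def quadratic_attention(q_prime, k_prime, v):
    """Reference version that forms A = Q' (K')^T explicitly."""
    a = matmul(q_prime, transpose(k_prime))  # L x L
    normalised = []
    for row in a:
        total = sum(row)
        normalised.append([x / total for x in row])
    return matmul(normalised, v)


rng = random.Random(0)
L, r, d = 6, 4, 3  # illustrative sizes only
q_prime = [[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(L)]
k_prime = [[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(L)]
v = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(L)]

fast = favor_attention(q_prime, k_prime, v)
slow = quadratic_attention(q_prime, k_prime, v)
max_diff = max(abs(f - s) for fr, sr in zip(fast, slow) for f, s in zip(fr, sr))
print(max_diff)
```

Both functions return the same result up to floating-point rounding. The first needs on the order of $Lrd$ operations, while the second needs on the order of $L^{2}(r + d)$.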
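The above scheme constitutes the FA part of the FAVOR+ mechanism. The other parts, as the name indicates, are achieved by using positive random features (the + part) and orthogonal random features (the OR part).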
The details are quite technical, so it is recommended you read the paper for further information on these steps.
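The name of the mechanism indicates that the random features are orthogonal. A minimal sketch of one common way to obtain orthogonal Gaussian projections is given below. It assumes Gram-Schmidt orthogonalisation of a $d \times d$ Gaussian matrix followed by rescaling each row to the norm of an independent Gaussian vector, so that each row keeps a Gaussian marginal distribution. The function names and the dimension are hypothetical, and the paper should be consulted for the exact procedure it uses.

```python
import math
import random


def gram_schmidt(rows):
    """Return an orthonormal basis spanning the given row vectors."""
    basis = []
    for v in rows:
        w = list(v)
        for b in basis:
            # Remove the component of w along each earlier direction.
            proj = sum(wc * bc for wc, bc in zip(w, b))
            w = [wc - proj * bc for wc, bc in zip(w, b)]
        norm = math.sqrt(sum(c * c for c in w))
        basis.append([c / norm for c in w])
    return basis


def orthogonal_gaussian_block(d, rng):
    """Draw d mutually orthogonal projection vectors in R^d."""
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    directions = gram_schmidt(gaussian)
    # Redraw the lengths as norms of independent Gaussian vectors.
    norms = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        for _ in range(d)
    ]
    return [[n * c for c in row] for n, row in zip(norms, directions)]


rng = random.Random(0)
block = orthogonal_gaussian_block(d=8, rng=rng)

# Largest absolute dot product between two distinct rows.
max_off_diagonal = max(
    abs(sum(a * b for a, b in zip(block[i], block[j])))
    for i in range(len(block))
    for j in range(i + 1, len(block))
)
print(max_off_diagonal)
```

When more than $d$ features are needed, one option under the same assumptions is to stack several independently drawn blocks, with orthogonality holding within each block.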
Source: Rethinking Attention with Performers

Task  Papers  Share 

Crop Yield Prediction  1  10.00% 
Keyword Spotting  1  10.00% 
Language Modelling  1  10.00% 
Dynamic Time Warping  1  10.00% 
Time Series  1  10.00% 
Time Series Clustering  1  10.00% 
Image Classification  1  10.00% 
Knowledge Distillation  1  10.00% 
Novel View Synthesis  1  10.00% 