| Category | Representative Works | Key Idea | Limitations |
|---|---|---|---|
| Video‑only | SlowFast [1], ViViT [2] | Spatiotemporal convolutions / transformers | No audio information |
| Audio‑only | WaveNet [3], PANNs [4] | Raw waveform / spectrogram modeling | No visual context |
| Early Fusion | AVFusion [5] | Concatenate raw frames + spectrograms | Temporal misalignment |
| Late Fusion | Two‑Stream LSTM [6] | Separate predictions + averaging | Ignores cross‑modal dynamics |
| Intermediate Fusion | Cross‑modal Transformers [7] | Shared self‑attention | High memory/computation |
| Hybrid | MMT [8] | Modality‑specific backbones + cross‑attention | Still computationally heavy |
\[
w_k = \text{softmax}\big(\text{Conv1D}(F)_k\big), \qquad p_k = \sum_{t=1}^{T} w_{k,t}\, f_t.
\]
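The pooling formula above can be sketched in NumPy to make the shapes concrete. This is a minimal illustration under stated assumptions, not the original implementation: it assumes `F` stacks the `T` per-step features `f_t` as rows of a `(T, D)` array, that the Conv1D has `K` output channels (one score sequence per `k`), and that the softmax runs over the time axis. The kernel size, the zero "same" padding, and the helper names `conv1d_same` and `attention_pool` are hypothetical choices made only for this example.

```python
import numpy as np


def conv1d_same(F, W, b):
    """1D cross-correlation over time with zero 'same' padding.

    F: (T, D) features, W: (K, D, ksz) kernels with odd ksz, b: (K,) biases.
    Returns a (K, T) array of scores, one sequence per output channel k.
    """
    T, _ = F.shape
    K, _, ksz = W.shape
    pad = ksz // 2
    Fp = np.pad(F, ((pad, pad), (0, 0)))  # pad the time axis only
    scores = np.empty((K, T))
    for t in range(T):
        window = Fp[t:t + ksz]  # (ksz, D) slice centred on step t
        scores[:, t] = np.einsum("kdj,jd->k", W, window) + b
    return scores


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(F, W, b):
    """Return weights w of shape (K, T) and pooled vectors P of shape (K, D).

    Row k of P is p_k = sum_t w[k, t] * F[t].
    """
    w = softmax(conv1d_same(F, W, b), axis=1)  # normalise over time
    P = w @ F
    return w, P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.normal(size=(6, 4))     # T=6 steps, D=4 feature dims (arbitrary)
    W = rng.normal(size=(3, 4, 3))  # K=3 pooling heads, kernel size 3
    w, P = attention_pool(F, W, np.zeros(3))
    print(w.shape, P.shape)
```

Under these assumptions each `p_k` is a convex combination of the `f_t`, so with all-zero kernels and biases the weights become uniform and every `p_k` reduces to the temporal mean of `F`.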
