Paper Reading: Multimodal Sentiment Analysis (Part 1)

MFN, TFN

朱传波

Jul 16, 2020. Tags: Paper reading, Multimodal intelligence, Multimodal sentiment analysis

Paper 1: Memory Fusion Network for Multi-view Sequential Learning [@Zadeh2018]

An AAAI 2018 paper. Its main contribution is the Memory Fusion Network (MFN), a model built for multimodal sequential data.

Basic concepts

Multimodal sequential learning involves two kinds of interactions:

view-specific interactions that handle only one view
cross-view interactions which are defined across different views and need to handle multi-views

So, modeling view-specific interactions and cross-view interactions is the core problem of multi-view sequential learning [@Zadeh2018].

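The paper groups earlier multimodal learning models into three categories. The common idea is to map the feature vectors of the different modalities into a shared feature subspace.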

The first category of models relies on concatenating all views into a single view to simplify the learning setting.

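Approach: concatenate the modalities at the input level so that they form a single view, then feed the result into a neural network to learn a feature vector.
**Problem:** Simple concatenation ignores that each modality's data has its own view-specific characteristics.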
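A minimal sketch of this input-level (early) concatenation, assuming made-up per-timestep features for one utterance; the function name `early_fusion` and all values are hypothetical. The fused sequence would then go to a single sequence model, which is not shown.

```python
# Made-up per-timestep features for one utterance, T = 2 timesteps.
# Dimensions are illustrative: visual 2-d, acoustic 1-d, language 3-d.
visual   = [[0.5, 1.0], [1.5, 2.0]]
acoustic = [[1.0], [3.0]]
language = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]

def early_fusion(*views):
    """Hypothetical helper: concatenate all views at every timestep,
    turning several views into one single view."""
    return [[x for frame in frames for x in frame] for frames in zip(*views)]

fused = early_fusion(visual, acoustic, language)
print(fused)  # [[0.5, 1.0, 1.0, 0.0, 1.0, 2.0], [1.5, 2.0, 3.0, 2.0, 3.0, 4.0]]
```

After concatenation, nothing marks which dimensions belong to which modality, so the downstream network has to rediscover the view-specific structure on its own.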

The second category of models introduces multi-view variants of the structured learning approaches used in the first category.

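Approach: an improvement on the first category. A feature vector is learned for each modality separately, and the vectors are then concatenated at the feature level.
**Problem:** Concatenating feature vectors, e.g. $\{v\}, \{a\}, \{l\} \rightarrow \{v, a, l\}$, does not achieve genuine fusion of the data.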
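A small sketch of feature-level concatenation, assuming a hypothetical per-modality `encode` function that merely mean-pools over time as a stand-in for a learned encoder such as an LSTM; the data are made up.

```python
# Made-up sequences for one utterance, T = 2 timesteps.
visual   = [[0.5, 1.0], [1.5, 2.0]]
acoustic = [[1.0], [3.0]]
language = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]

def encode(seq):
    """Hypothetical stand-in for a learned per-modality encoder:
    here it simply mean-pools the sequence over time."""
    T = len(seq)
    return [sum(frame[i] for frame in seq) / T for i in range(len(seq[0]))]

# Each modality is encoded separately ...
v, a, l = encode(visual), encode(acoustic), encode(language)
# ... and the vectors are concatenated: {v}, {a}, {l} -> {v, a, l}
h = v + a + l
print(h)  # [1.0, 1.5, 2.0, 1.0, 2.0, 3.0]
```

One way to read the claim that this is not genuine fusion: every entry of `h` comes from exactly one modality, so a linear layer on top computes $w^\top h = w_v^\top v + w_a^\top a + w_l^\top l$, a sum of per-modality terms with no cross-modal interaction term.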

The third category of models relies on collapsing the time dimension of sequences by learning a temporal representation for each view.

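Approach: collapse each modality's sequence along the time dimension, then fuse the modalities. The fused feature vector is the average of the per-modality vectors (modality fusion at the feature level); alternatively, the final result is obtained by voting.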

My question: I don't think combining models by voting is a fusion method, since fusion should happen before a decision is generated; yet the authors place voting in the third category.

**Problem:** Averaging obscures the differences between modalities.
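A sketch contrasting averaging with voting, assuming per-modality temporal representations that are already projected into the same 2-d subspace; all names and values are hypothetical.

```python
from collections import Counter

def average_fusion(vectors):
    """Element-wise mean of equally sized per-modality vectors."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

# Hypothetical per-modality representations (time dimension already collapsed).
reps  = {"visual": [0.5, -1.0], "acoustic": [1.5, 0.0], "language": [1.0, 1.0]}
# A very different input where all three modalities agree exactly.
other = {"visual": [1.0, 0.0], "acoustic": [1.0, 0.0], "language": [1.0, 0.0]}

fused = average_fusion(list(reps.values()))
print(fused)                                  # [1.0, 0.0]
print(average_fusion(list(other.values())))   # [1.0, 0.0]: the disagreement is gone

# Voting: each modality's own classifier decides first, then the majority wins.
votes = {"visual": "negative", "acoustic": "positive", "language": "positive"}
decision = Counter(votes.values()).most_common(1)[0][0]
print(decision)  # positive
```

The two inputs average to the same vector, which illustrates how averaging hides inter-modal differences. The voting part also shows why it sits uneasily in a "fusion" category: the modalities are only combined after each has already produced a decision.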

Motivation

Innovation

Datasets

The Memory Fusion Network (MFN) consists of three components:

a System of LSTMs for view-specific interactions
a special attention mechanism for cross-view interactions, the Delta-memory Attention Network (DMAN)
a Multi-view Gated Memory for summarization
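To make the data flow between the three components concrete, here is a simplified pure-Python sketch of one MFN forward pass. It rests on assumptions not taken from this note: the networks D_a, D_u, D_γ1 and D_γ2 are single linear layers with random, untrained weights; all biases are dropped; and both gates read only the attended memory ĉ. The names `LSTMCell`, `mfn_forward`, `hid` and `mem` are hypothetical. It sketches the structure, not the authors' implementation.

```python
import math
import random

random.seed(0)

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

class LSTMCell:
    """Plain LSTM cell without biases (simplification)."""
    def __init__(self, in_dim, hid):
        # One weight matrix per gate, acting on the concatenation [x; h].
        self.W = {g: rand_matrix(hid, in_dim + hid) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h
        i = sigmoid(matvec(self.W["i"], z))
        f = sigmoid(matvec(self.W["f"], z))
        o = sigmoid(matvec(self.W["o"], z))
        g = tanh(matvec(self.W["c"], z))
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        return mul(o, tanh(c_new)), c_new

def mfn_forward(views, hid=4, mem=5):
    """views: one sequence per modality, all aligned to the same length T."""
    n, T = len(views), len(views[0])
    lstms = [LSTMCell(len(v[0]), hid) for v in views]
    cat_dim = 2 * n * hid  # size of [c^{t-1}; c^t] over all views
    D_a = rand_matrix(cat_dim, cat_dim)
    D_u, D_g1, D_g2 = (rand_matrix(mem, cat_dim) for _ in range(3))
    h = [[0.0] * hid for _ in range(n)]
    c = [[0.0] * hid for _ in range(n)]
    u = [0.0] * mem
    for t in range(T):
        c_prev = [x for cn in c for x in cn]
        # 1) System of LSTMs: each view is updated on its own (view-specific).
        for k in range(n):
            h[k], c[k] = lstms[k].step(views[k][t], h[k], c[k])
        c_cur = [x for cn in c for x in cn]
        # 2) DMAN: softmax attention over the memories of all views at t-1 and t.
        c_cat = c_prev + c_cur
        c_hat = mul(c_cat, softmax(matvec(D_a, c_cat)))
        # 3) Multi-view Gated Memory: u_t = g1 * u_{t-1} + g2 * tanh(D_u(c_hat)).
        u_hat = tanh(matvec(D_u, c_hat))
        g1 = sigmoid(matvec(D_g1, c_hat))
        g2 = sigmoid(matvec(D_g2, c_hat))
        u = [a * uo + b * uh for a, uo, b, uh in zip(g1, u, g2, u_hat)]
    # Output: final hidden state of every LSTM concatenated with the memory u.
    return [x for hn in h for x in hn] + u

visual   = [[0.5, 1.0], [1.5, 2.0], [0.0, 0.5]]
acoustic = [[1.0], [3.0], [2.0]]
language = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [1.0, 1.0, 1.0]]
out = mfn_forward([visual, acoustic, language])
print(len(out))  # 3 views * 4 hidden units + 5 memory units = 17
```

In this sketch the LSTMs never see each other's inputs; cross-view information enters only through the attention over the concatenated memories, and the gated memory `u` carries it across time steps.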

Paper 2: Tensor Fusion Network for Multimodal Sentiment Analysis.


Prequel to the paper notes

LSTM cell structure diagram

Structure 1

Structure 2

Tags: TFN, MFN, Multimodal sentiment analysis