Paper Reading: Multimodal Sentiment Analysis (Part 1)

MFN, TFN

Paper 1: Memory Fusion Network for Multi-view Sequential Learning[@Zadeh2018]

An AAAI 2018 conference paper. Its main contribution is the Memory Fusion Network (MFN), a model built for multimodal data.

Basic concepts

Multi-view sequential learning involves two kinds of interactions:

  1. view-specific interactions that handle only one view
  2. cross-view interactions which are defined across different views and need to handle multi-views

So modeling view-specific interactions and cross-view interactions is the core question of multi-view sequential learning[@Zadeh2018].

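The paper groups multi-view learning methods into three categories. The basic idea is to map the feature vectors of the different modalities into the same feature subspace.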

  1. the first category of models have relied on concatenation of all multiple views into a single view to simplify the learning setting

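Idea: map the different modalities into a single view, which gives an input-level concatenation, then feed it into a neural network to learn a feature vector.
**Problem:** simple concatenation ignores the fact that each modality has its own specific characteristics.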

  2. the second category of models introduce multi-view variants to the structured learning approaches of the first category

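Idea: an improvement on the first category. The feature vector of each modality is learned separately, so this is concatenation at the feature level.
Problem: concatenating feature vectors, e.g. $$\{v\}, \{a\}, \{l\} \rightarrow \{v, a, l\}$$, does not achieve real data fusion.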

  3. the third category of models rely on collapsing the time dimension from sequences by learning a temporal representation for each of the different views.

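Idea: fuse the multimodal data along the time dimension. The fused feature vector is the average of the modalities, which is again concatenation at the feature level (modality fusion); alternatively, the result is obtained by voting.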

My question: I don't think combining models by voting counts as a fusion method, since fusion should happen before a decision is generated, but the authors place it in the third category.

Problem: averaging hides the differences between modalities.
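
The three categories above can be contrasted on toy data. The following is a minimal sketch, not code from the paper: the function names, the made-up numbers, and the use of a plain time average as a stand-in for a learned per-view encoder are all assumptions made for illustration.

```python
from collections import Counter


def concat_views(*views):
    """Join the vectors of all views at each time step into one long vector."""
    if len({len(view) for view in views}) != 1:
        raise ValueError("all views need the same number of time steps")
    return [[x for step in steps for x in step] for steps in zip(*views)]


def mean_over_time(view):
    """Collapse the time dimension of one view by averaging its vectors."""
    steps = len(view)
    return [sum(column) / steps for column in zip(*view)]


def average_fusion(*summaries):
    """Element-wise mean of per-view summaries (they must share a size)."""
    if len({len(summary) for summary in summaries}) != 1:
        raise ValueError("all summaries need the same dimension")
    return [sum(values) / len(summaries) for values in zip(*summaries)]


def majority_vote(labels):
    """Decision-level combination: the most frequent label wins."""
    return Counter(labels).most_common(1)[0][0]


# Toy sequences with T = 2 time steps (made-up numbers).
language = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5], [0.5]]
acoustic = [[2.0, 4.0], [4.0, 2.0]]

# Category 1: input-level concatenation, one merged view per time step.
early = concat_views(language, visual, acoustic)

# Category 2 (as described in these notes): per-view features first, then
# concatenation. mean_over_time is only a stand-in for a learned encoder.
feature_level = (
    mean_over_time(language) + mean_over_time(visual) + mean_over_time(acoustic)
)

# Category 3: collapse time per view, then average the summaries or vote.
# Averaging needs equal dimensions, so only two views are used here.
averaged = average_fusion(mean_over_time(language), mean_over_time(acoustic))
voted = majority_vote(["positive", "negative", "positive"])
```

In this toy case `averaged` mixes the language and acoustic summaries into a single vector, so which modality contributed what can no longer be recovered. That is the loss of modality differences noted above.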

Motivation

Innovation

DataSets

The Memory Fusion Network (MFN) consists of three components:

  1. the System of LSTMs for view-specific interactions
  2. a special attention mechanism for cross-view interactions, the Delta-memory Attention Network (DMAN)
  3. Multi-view Gated Memory for summarization
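
To show how the three components fit together over time, here is a structural sketch only. In the paper each component is a learned neural network; here every one of them is replaced by a fixed, hand-written stand-in, and all function names, the leaky update, the change-based attention scores, and the bucket projection are assumptions made for illustration, not the authors' equations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def view_memory_step(prev_memory, x, decay=0.5):
    """Stand-in for one LSTM step of a single view (assumption: a leaky
    update on the input instead of real LSTM gates and hidden state)."""
    return [decay * m + (1.0 - decay) * math.tanh(xi)
            for m, xi in zip(prev_memory, x)]


def delta_memory_attention(prev_memories, new_memories):
    """Stand-in for DMAN: weight the memories of all views at t-1 and t.
    Assumption: scores are the absolute change of each entry; the real
    model learns the scores with a network."""
    before = [v for memory in prev_memories for v in memory]
    after = [v for memory in new_memories for v in memory]
    change = [abs(a - b) for a, b in zip(after, before)]
    weights = softmax(change + change)
    return [w * v for w, v in zip(weights, before + after)]


def gated_memory_update(shared, attended):
    """Stand-in for the Multi-view Gated Memory: a retain gate and an
    update gate blend the old memory with a proposed update."""
    proposal = [0.0] * len(shared)
    for i, value in enumerate(attended):
        proposal[i % len(shared)] += value  # hypothetical fixed projection
    signal = sum(abs(v) for v in attended)
    retain, update = sigmoid(-signal), sigmoid(signal)
    return [retain * m + update * math.tanh(p)
            for m, p in zip(shared, proposal)]


def run_mfn_sketch(views, memory_size=2):
    """views maps a view name to a list of T input vectors."""
    names = sorted(views)
    steps = len(views[names[0]])
    memories = {name: [0.0] * len(views[name][0]) for name in names}
    shared = [0.0] * memory_size
    for t in range(steps):
        previous = [memories[name] for name in names]
        for name in names:  # 1. view-specific updates, one per view
            memories[name] = view_memory_step(memories[name], views[name][t])
        current = [memories[name] for name in names]
        attended = delta_memory_attention(previous, current)  # 2. cross-view
        shared = gated_memory_update(shared, attended)  # 3. summarization
    return memories, shared


toy_views = {
    "language": [[0.2, -0.1], [0.4, 0.3]],
    "acoustic": [[1.0], [0.5]],
    "visual": [[-0.3, 0.8, 0.1], [0.0, 0.6, -0.2]],
}
view_memories, shared_memory = run_mfn_sketch(toy_views)
```

The point of the sketch is the data flow: each view keeps its own memory, the attention step sees the memories of all views at two consecutive time steps, and only the attended result is written into the shared memory.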

Paper 2: Tensor Fusion Network for Multimodal Sentiment Analysis.


Background for the paper notes

Diagram of the LSTM cell

Structure 1

Structure 2

朱传波
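PhD student in Computer Science and Technology

My research interests include multimodal intelligence, sentiment analysis, emotion recognition, and sarcasm detection.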
