DropEdge (:red_flag: Motivation: over-fitting and over-smoothing)
DropEdge
-
:star: Idea: randomly remove a fraction of edges at each training epoch => [data augmenter] [message passing reducer] (see the sketch below). #ε-smoothing#: d_M(H) = distance between the hidden vectors H and the subspace M; whether over-smoothing has occurred is judged by computing the difference between the outputs of the one-hop and two-hop networks.
:silhouettes: DropNode: GraphSAGE, FastGCN, ASGCN
Exp. Backbones: GCN, ResGCN, JKNet, IncepGCN, GraphSAGE
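A minimal sketch of the per-epoch edge dropping described above (not the authors' implementation); the [2, num_edges] COO edge_index layout and the drop rate p are illustrative assumptions:

```python
import torch

def drop_edge(edge_index: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    """Keep each edge with probability (1 - p); re-sampled at every epoch."""
    keep_mask = torch.rand(edge_index.size(1)) >= p  # Bernoulli keep/drop per edge
    return edge_index[:, keep_mask]

# Per-epoch usage: perturb the graph before each forward pass.
# for epoch in range(num_epochs):
#     out = model(x, drop_edge(edge_index, p=0.2))
```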
-
DropNode
ASGCN 2018-NIPS
[Adaptive Sampling Towards Fast Graph Representation Learning] The layer-wise sampling proposed by Tencent AI Lab is adaptive and explicitly reduces sampling variance. Experiments show it outperforms other sampling-based methods, GraphSAGE and FastGCN, in both effectiveness and accuracy.
:star: Idea: 1. Layer-wise sampling: all neighborhoods are shared by the nodes in the parent layer, so every inter-layer connection is utilized (in node-wise sampling, each parent node's neighborhood is not visible to the other parent nodes). In this way information is shared between layers and the number of sampled nodes is controllable. 2. The layer-wise sampling is adaptive and is explicitly driven by variance reduction during training. 3. A skip connection is built between two layers to preserve second-order proximity.
:pencil2: Node-wise sampling: write the convolution as an expectation -> approximate the expectation with Monte Carlo sampling to speed up computation -> recursively sample the neighbors of each node in the current layer (top-down). Cons: high computation and memory cost, since the number of sampled nodes grows exponentially with the number of layers. :pencil2: Layer-wise sampling: rewrite the expectation via importance sampling, weighting each sample by p(u_j|u_i)/q(u_j|u_1,...,u_n) -> approximate the expectation with Monte Carlo sampling to speed up computation -> all sampled nodes {u_j} are shared by all nodes in the current layer, which maximizes message passing (a rough sketch follows after the related works). More importantly, the size of each layer is fixed to n, so the total number of sampled nodes grows only linearly with the network depth.
Related works: :silhouette:1. GraphSAGE learns node embeddings by randomly sampling a fixed number of neighbors and then aggregating their information; :silhouette:2. FastGCN treats every layer independently: it interprets graph convolution as an integral transform of embedding functions and samples the nodes of each layer independently; :silhouette:3. In layer-wise sampling, the sampling of the lower layer is conditioned on the upper layer, which captures the correlations between layers. The samplers of GraphSAGE and FastGCN involve no parameters and cannot adaptively minimize the variance. In contrast, this adaptive sampler uses a self-dependent function to approximate the optimal importance-sampling distribution; by jointly fine-tuning the network and the sampler, the resulting variance can be explicitly reduced. [https://blog.csdn.net/yyl424525/article/details/102493007]
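A rough sketch of the layer-wise importance-sampling estimate, assuming a dense toy adjacency and a fixed proposal q; the real ASGCN sampler instead parameterizes q with a self-dependent function and trains it for variance reduction:

```python
import numpy as np

def layerwise_sample(adj, parent_nodes, q, n_samples):
    """Draw one node set shared by the whole layer; return importance weights."""
    sampled = np.random.choice(adj.shape[0], size=n_samples, replace=True, p=q)
    # p(u_j | u_i): row-normalized adjacency restricted to the parent nodes
    p = adj[parent_nodes] / np.maximum(adj[parent_nodes].sum(1, keepdims=True), 1e-12)
    weights = p[:, sampled] / q[sampled]        # importance ratios, [parents, n_samples]
    return sampled, weights / n_samples         # include the 1/n Monte Carlo factor

def aggregate(h, sampled, weights):
    """Monte Carlo estimate of sum_j p(u_j|u_i) h(u_j) for every parent node."""
    return weights @ h[sampled]                 # [num_parents, feature_dim]
```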
GraphSAGE
-
:star: Idea: sampling and aggregating features. 1. Neighborhood sampling: uniformly sample a fixed-size set of neighbors. 2. Aggregator: [symmetric] Mean aggregator / LSTM aggregator / Pooling aggregator. 3. Unsupervised loss: encourage nearby nodes to have similar representations [nodes that co-occur on fixed-length random walks] (a minimal layer sketch follows).
Exp. Citation dataset, Reddit, PPI. Tasks: supervised/unsupervised NC. Theoretical analysis: Clustering coefficient of a node
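A minimal sketch of one GraphSAGE layer with uniform neighbor sampling and the mean aggregator; the neighbors dict, class name, and sizes are illustrative assumptions, not the reference implementation:

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanSAGELayer(nn.Module):
    """One GraphSAGE-style layer: sample neighbors, mean-aggregate, transform."""

    def __init__(self, in_dim, out_dim, num_samples=10):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)    # concat(self, neighbor mean)
        self.num_samples = num_samples

    def forward(self, h, neighbors):
        # neighbors: dict node_id -> list of neighbor ids (illustrative structure)
        agg = torch.zeros_like(h)
        for v, nbrs in neighbors.items():
            if nbrs:
                sampled = random.choices(nbrs, k=self.num_samples)  # fixed-size, uniform
                agg[v] = h[sampled].mean(dim=0)
        out = self.lin(torch.cat([h, agg], dim=1))
        return F.normalize(F.relu(out), p=2, dim=1)  # l2-normalize the output embeddings
```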
FastGCN
:star: Idea: interpret the graph convolution operation as an integral transform of embedding functions under a probability measure, which provides theoretical support for inductive learning. Concretely, the paper views the graph vertices as i.i.d. samples of some probability distribution, and writes the loss and each convolution layer as integrals of the vertex embedding functions. The integrals are then computed via Monte Carlo approximation over the sample loss and sample gradients, and the sampling distribution can further be changed (as in importance sampling) to reduce the approximation variance (see the sketch below).
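A sketch of the layer-wise Monte Carlo estimate this describes, assuming a dense normalized adjacency A_hat and the importance distribution q(u) ∝ ||A_hat[:, u]||^2; names and shapes are illustrative:

```python
import numpy as np

def fastgcn_layer(A_hat, H, W, n_samples):
    """Monte Carlo estimate of relu(A_hat @ H @ W) from n_samples sampled nodes."""
    q = np.square(A_hat).sum(axis=0)
    q = q / q.sum()                                   # q(u) proportional to ||A_hat[:, u]||^2
    idx = np.random.choice(A_hat.shape[0], size=n_samples, replace=True, p=q)
    # Reweight each sampled node u_j by 1 / (n_samples * q(u_j)).
    est = (A_hat[:, idx] / (n_samples * q[idx])) @ H[idx]
    return np.maximum(est @ W, 0.0)                   # ReLU
```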
-
-
over-smoothing
:red_flag: phenomenon: all nodes' representations will converge to a stationary point
:no_entry: result: making them unrelated to the input features and leading to vanishing gradients
:silhouettes: first pointed out: #Deeper insights into gcn for semi-supervised learning# shows that node features will converge to a fixed point as the network depth increases. Studies: :silhouette: personalized PageRank: incorporates the root node into message passing. :silhouette: JKNet: employs dense connections for multi-hop message passing. :silhouette: Oono 2019 #On asymptotic behaviors of graph cnns from dynamical systems perspective# theoretically proves that node features of deep GCNs will converge to a subspace and incur information loss (a toy numerical illustration follows).
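A toy numerical illustration (mine, not from the cited papers): on a small path graph, repeatedly applying the symmetrically normalized adjacency makes all row-normalized node representations collapse toward the same point:

```python
import numpy as np

N = 6
A = np.zeros((N, N))
for i in range(N - 1):                                # 6-node path graph
    A[i, i + 1] = A[i + 1, i] = 1.0
A += np.eye(N)                                        # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt                   # symmetric normalization

H = np.random.default_rng(0).random((N, 4))           # random node features
for k in (1, 2, 4, 8, 16, 32):
    Hk = np.linalg.matrix_power(A_hat, k) @ H         # k rounds of propagation
    Hk = Hk / np.linalg.norm(Hk, axis=1, keepdims=True)
    print(f"{k:>2}-hop: max spread across nodes = {np.ptp(Hk, axis=0).max():.4f}")
```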
MixHop
-
:star:Idea: 1. Delta Operators 2. Neighborhood Mixing 3. MixHop: mixes powers of the adjacency matrix by concatenating A^j H^(i−1) W_j^(i−1) over different hop powers j, letting the model learn a separate W_j for each hop count, i.e. a filter with distinct parameters per hop, which achieves the 'mix' effect. During training, MixHop uses a relatively simple procedure of repeatedly dropping (pruning) output columns, which does not add much computational burden (a minimal layer sketch follows). MixHop in fact offers some insights about multi-hop information: 1. the outputs at different hop counts carry quite rich information and can provide strong feedback for graph learning; 2. part of the graph structure actually leaves room for adaptation, or even reinforcement learning, which leads to work such as GraphNAS.
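A minimal sketch of a layer that mixes adjacency powers as described above; the class name, chosen powers, and dimensions are illustrative assumptions, and the column-dropping step is omitted:

```python
import torch
import torch.nn as nn

class MixHopLayer(nn.Module):
    """Concatenate relu(A_hat^j @ H @ W_j) over a set of hop powers j."""

    def __init__(self, in_dim, out_dim, powers=(0, 1, 2)):
        super().__init__()
        self.powers = powers
        self.lins = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in powers)

    def forward(self, A_hat, H):
        # A_hat: [N, N] normalized adjacency, H: [N, in_dim]
        outs = []
        for j, lin in zip(self.powers, self.lins):
            Aj_H = H
            for _ in range(j):                    # compute A_hat^j @ H iteratively
                Aj_H = A_hat @ Aj_H
            outs.append(torch.relu(lin(Aj_H)))    # a distinct W_j per hop power
        return torch.cat(outs, dim=1)             # output width = out_dim * len(powers)
```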
over-fitting
:red_flag: cause: using an over-parameterized model to fit a distribution with limited training data.
:no_entry: result: fit training data well but generalize poorly to testing data