[Phase 3] Expand Multimodal AI Beyond CLIP and BLIP-2
Problem
Has CLIP (issue #272) and BLIP-2 (issue #273) but missing advanced multimodal models.
Existing
- Issue #272: CLIP
- Issue #273: BLIP-2 (VQA, captioning)
- src/RetrievalAugmentedGeneration/MultiModalEmbeddingModel.cs
Missing Implementations
Visual Language Models (CRITICAL):
- Flamingo (in-context learning)
- GPT-4V-equivalent (vision + language LLM)
- LLaVA (large language and vision assistant)
Audio-Visual (HIGH):
- Audio-visual correspondence learning
- Cross-modal retrieval (audio-image-text)
- Audio-visual event localization
Video-Language (HIGH):
- VideoCLIP
- Video captioning (beyond frame-level)
- Video question answering
- Action recognition with language
Advanced (MEDIUM):
- DALL-E 3 text-to-image (complementing diffusion models)
- ImageBind (6+ modality binding)
- Unified multimodal models
Architecture
- Expand src/Multimodal/
- Cross-attention mechanisms
- Unified embedding space
Success Criteria
- Zero-shot cross-modal retrieval
- Visual question answering benchmarks
- Multi-modal few-shot learning
- Video understanding datasets
Issue #405: Junior Developer Implementation Guide
Graph Neural Networks (GCN, GAT, GraphSAGE)
Table of Contents
- Understanding Graph Neural Networks
- Graph Convolutional Networks (GCN)
- Graph Attention Networks (GAT)
- GraphSAGE: Inductive Learning
- Graph Representations and Operations
- Implementation Strategy
- Testing Strategy
- Step-by-Step Implementation Guide
Understanding Graph Neural Networks
What are Graph Neural Networks?
Graph Neural Networks (GNNs) are neural networks designed to operate on graph-structured data.
Graph: G = (V, E) where:
-
V= Set of nodes/vertices (e.g., users, molecules, documents) -
E= Set of edges (e.g., friendships, chemical bonds, citations)
Examples:
- Social Networks: Nodes = users, Edges = friendships
- Molecules: Nodes = atoms, Edges = chemical bonds
- Citation Networks: Nodes = papers, Edges = citations
- Knowledge Graphs: Nodes = entities, Edges = relationships
Why Graph Neural Networks?
Traditional Neural Networks (CNNs, RNNs) work on:
- Grids: Images (2D grids of pixels)
- Sequences: Text (1D sequences of words)
GNNs work on:
- Irregular structures: Graphs with variable numbers of neighbors
- Non-Euclidean data: No fixed coordinate system
The Core Idea: Message Passing
GNNs learn node representations by aggregating information from neighbors:
For each node v:
1. Collect messages from neighbors N(v)
2. Aggregate messages (sum, mean, max, attention)
3. Update node representation h_v
Mathematical Form:
h_v^(l+1) = σ(W * AGG({h_u^(l) : u ∈ N(v)}))
Where:
-
h_v^(l)= Node v's representation at layer l -
N(v)= Neighbors of node v -
AGG= Aggregation function (sum, mean, attention) -
W= Learnable weight matrix -
σ= Activation function
Visual Example
Initial node features:
[A]--[B]
| X |
[C]--[D]
After 1 layer:
- Node A learns from neighbors: B, C
- Node B learns from neighbors: A, D
- Node C learns from neighbors: A, D
- Node D learns from neighbors: B, C
After 2 layers:
- Node A knows about all nodes (through 2-hop paths)
Graph Convolutional Networks (GCN)
Core Idea
Apply spectral graph theory to define convolutions on graphs.
Simplification: Use symmetric normalized aggregation for message passing.
Mathematical Foundation
1. Adjacency Matrix
Represent graph structure as matrix A:
A[i,j] = 1 if edge (i,j) exists
A[i,j] = 0 otherwise
Example:
Graph: 0---1
| |
2---3
Adjacency matrix A:
0 1 2 3
0 [ 0 1 1 0 ]
1 [ 1 0 0 1 ]
2 [ 1 0 0 1 ]
3 [ 0 1 1 0 ]
2. Degree Matrix
Diagonal matrix D where D[i,i] = degree of node i (number of neighbors):
D[i,i] = Σ_j A[i,j]
For the example above:
D = diag([2, 2, 2, 2])
3. Normalized Adjacency
Add self-loops and normalize:
A_hat = A + I (add self-loops)
D_hat = degree matrix of A_hat
A_norm = D_hat^(-1/2) * A_hat * D_hat^(-1/2)
Why normalize?
- Prevents numerical instabilities
- Makes aggregation independent of node degree
4. GCN Layer Formula
H^(l+1) = σ(A_norm * H^(l) * W^(l))
Where:
-
H^(l)= Node feature matrix at layer l (N × d_l) -
W^(l)= Learnable weight matrix (d_l × d_{l+1}) -
A_norm= Normalized adjacency matrix (N × N) -
σ= Activation function (ReLU, etc.)
Interpretation:
-
H^(l) * W^(l): Transform features -
A_norm * (...): Aggregate from neighbors -
σ(...): Non-linearity
5. Multi-Layer GCN
Stack multiple GCN layers to capture multi-hop neighborhoods:
H^(0) = X (input features)
H^(1) = σ(A_norm * H^(0) * W^(0))
H^(2) = σ(A_norm * H^(1) * W^(1))
...
Z = H^(L) (final embeddings)
Implementation
public class GCNLayer<T> : ILayer<T> where T : struct
{
private readonly int _inputDim;
private readonly int _outputDim;
private Tensor<T> _weights; // W matrix (inputDim × outputDim)
private Tensor<T> _bias; // bias vector (outputDim)
private readonly IActivationFunction<T> _activation;
// Cached values for backpropagation
private Tensor<T> _lastInput = null!;
private Tensor<T> _lastAdjacency = null!;
public GCNLayer(int inputDim, int outputDim, IActivationFunction<T> activation)
{
_inputDim = inputDim;
_outputDim = outputDim;
_activation = activation;
// Initialize weights (Xavier/Glorot initialization)
_weights = InitializeWeights(inputDim, outputDim);
_bias = new Tensor<T>(new[] { outputDim });
}
/// <summary>
/// Forward pass: H' = σ(A_norm * H * W + b)
/// </summary>
public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> normalizedAdjacency)
{
// nodeFeatures: (numNodes × inputDim)
// normalizedAdjacency: (numNodes × numNodes)
_lastInput = nodeFeatures;
_lastAdjacency = normalizedAdjacency;
// Step 1: Transform features: H * W
var transformed = nodeFeatures.MatMul(_weights); // (numNodes × outputDim)
// Step 2: Aggregate from neighbors: A_norm * (H * W)
var aggregated = normalizedAdjacency.MatMul(transformed); // (numNodes × outputDim)
// Step 3: Add bias
var withBias = aggregated.Add(_bias);
// Step 4: Apply activation
var output = _activation.Activate(withBias);
return output;
}
/// <summary>
/// Backward pass: compute gradients
/// </summary>
public Tensor<T> Backward(Tensor<T> gradOutput)
{
// Gradient through activation
var gradActivation = _activation.Derivative(_lastInput);
var gradPreActivation = gradOutput.Multiply(gradActivation);
// Gradient w.r.t. weights: H^T * A_norm^T * gradPreActivation
var gradWeights = _lastInput.Transpose()
.MatMul(_lastAdjacency.Transpose())
.MatMul(gradPreActivation);
// Gradient w.r.t. bias: sum over nodes
var gradBias = gradPreActivation.Sum(axis: 0);
// Gradient w.r.t. input: A_norm^T * gradPreActivation * W^T
var gradInput = _lastAdjacency.Transpose()
.MatMul(gradPreActivation)
.MatMul(_weights.Transpose());
// Update weights
UpdateWeights(gradWeights, gradBias);
return gradInput;
}
private Tensor<T> InitializeWeights(int fanIn, int fanOut)
{
// Xavier initialization: U(-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out)))
double scale = Math.Sqrt(6.0 / (fanIn + fanOut));
var random = new Random();
var weights = new T[fanIn * fanOut];
for (int i = 0; i < weights.Length; i++)
{
double value = (random.NextDouble() * 2 - 1) * scale;
weights[i] = NumOps<T>.FromDouble(value);
}
return new Tensor<T>(weights, new[] { fanIn, fanOut });
}
}
Graph Normalization Helper
public static class GraphNormalization
{
/// <summary>
/// Compute normalized adjacency: D^(-1/2) * (A + I) * D^(-1/2)
/// </summary>
public static Tensor<T> ComputeNormalizedAdjacency<T>(Tensor<T> adjacency) where T : struct
{
int numNodes = adjacency.Shape[0];
// Step 1: Add self-loops: A_hat = A + I
var aHat = adjacency.Clone();
for (int i = 0; i < numNodes; i++)
{
aHat[i, i] = NumOps<T>.Add(aHat[i, i], NumOps<T>.One);
}
// Step 2: Compute degree matrix D_hat
var degrees = new T[numNodes];
for (int i = 0; i < numNodes; i++)
{
T sum = NumOps<T>.Zero;
for (int j = 0; j < numNodes; j++)
{
sum = NumOps<T>.Add(sum, aHat[i, j]);
}
degrees[i] = sum;
}
// Step 3: Compute D^(-1/2)
var degreeInvSqrt = new T[numNodes];
for (int i = 0; i < numNodes; i++)
{
double d = Convert.ToDouble(degrees[i]);
degreeInvSqrt[i] = NumOps<T>.FromDouble(1.0 / Math.Sqrt(d));
}
// Step 4: Compute D^(-1/2) * A_hat * D^(-1/2)
var normalized = new Tensor<T>(new[] { numNodes, numNodes });
for (int i = 0; i < numNodes; i++)
{
for (int j = 0; j < numNodes; j++)
{
var value = NumOps<T>.Multiply(
NumOps<T>.Multiply(degreeInvSqrt[i], aHat[i, j]),
degreeInvSqrt[j]
);
normalized[i, j] = value;
}
}
return normalized;
}
}
Graph Attention Networks (GAT)
Core Idea
Use attention mechanism to learn importance weights for neighbors.
Problem with GCN: All neighbors have equal importance (after normalization).
GAT Solution: Learn attention weights α_{ij} for each edge (i → j).
Mathematical Foundation
1. Attention Mechanism
For each edge (i → j), compute attention coefficient:
e_{ij} = LeakyReLU(a^T [W*h_i || W*h_j])
Where:
-
h_i,h_j= Node features for nodes i and j -
W= Shared weight matrix -
a= Attention vector (learnable) -
||= Concatenation -
LeakyReLU= Activation function
2. Normalize with Softmax
α_{ij} = softmax_j(e_{ij}) = exp(e_{ij}) / Σ_{k∈N(i)} exp(e_{ik})
Result: α_{ij} represents the importance of node j to node i.
3. Aggregate with Attention
h_i' = σ(Σ_{j∈N(i)} α_{ij} * W * h_j)
4. Multi-Head Attention
Use multiple attention heads (like Transformer):
h_i' = ||_{k=1}^K σ(Σ_{j∈N(i)} α_{ij}^k * W^k * h_j)
Where:
-
K= Number of attention heads -
||= Concatenation or averaging
Implementation
public class GATLayer<T> : ILayer<T> where T : struct
{
private readonly int _inputDim;
private readonly int _outputDim;
private readonly int _numHeads;
private readonly List<AttentionHead<T>> _heads;
private readonly bool _concat; // Concatenate or average heads
public GATLayer(int inputDim, int outputDim, int numHeads = 8, bool concat = true)
{
_inputDim = inputDim;
_outputDim = outputDim;
_numHeads = numHeads;
_concat = concat;
// Create multiple attention heads
_heads = new List<AttentionHead<T>>();
for (int i = 0; i < numHeads; i++)
{
_heads.Add(new AttentionHead<T>(inputDim, outputDim / numHeads));
}
}
public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> adjacency)
{
// Compute attention for each head
var headOutputs = _heads.Select(head =>
head.Forward(nodeFeatures, adjacency)
).ToList();
// Combine heads
if (_concat)
{
// Concatenate along feature dimension
return Tensor<T>.Concatenate(headOutputs, axis: 1);
}
else
{
// Average across heads
var sum = headOutputs[0];
for (int i = 1; i < headOutputs.Count; i++)
{
sum = sum.Add(headOutputs[i]);
}
return sum.Divide(NumOps<T>.FromDouble(_numHeads));
}
}
}
public class AttentionHead<T> where T : struct
{
private Tensor<T> _W; // Weight matrix
private Tensor<T> _a; // Attention vector
private readonly double _leakyReluSlope = 0.2;
private Tensor<T> _lastFeatures = null!;
private Tensor<T> _lastAdjacency = null!;
private Tensor<T> _lastAlphas = null!;
public AttentionHead(int inputDim, int outputDim)
{
_W = InitializeWeights(inputDim, outputDim);
_a = InitializeWeights(2 * outputDim, 1); // [W*h_i || W*h_j] has dim 2*outputDim
}
public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> adjacency)
{
// nodeFeatures: (numNodes × inputDim)
// adjacency: (numNodes × numNodes)
_lastFeatures = nodeFeatures;
_lastAdjacency = adjacency;
int numNodes = nodeFeatures.Shape[0];
// Step 1: Transform features: h' = W * h
var transformed = nodeFeatures.MatMul(_W); // (numNodes × outputDim)
// Step 2: Compute attention coefficients
var attentionScores = ComputeAttentionScores(transformed, adjacency);
// Step 3: Apply softmax to get attention weights
var alphas = ComputeAttentionWeights(attentionScores, adjacency);
_lastAlphas = alphas;
// Step 4: Aggregate features using attention
var aggregated = alphas.MatMul(transformed); // (numNodes × outputDim)
// Step 5: Apply activation (ELU is common for GAT)
return ApplyELU(aggregated);
}
private Tensor<T> ComputeAttentionScores(Tensor<T> features, Tensor<T> adjacency)
{
int numNodes = features.Shape[0];
var scores = new Tensor<T>(new[] { numNodes, numNodes });
for (int i = 0; i < numNodes; i++)
{
for (int j = 0; j < numNodes; j++)
{
// Only compute attention for existing edges
if (Convert.ToDouble(adjacency[i, j]) == 0 && i != j)
{
scores[i, j] = NumOps<T>.FromDouble(double.NegativeInfinity);
continue;
}
// Concatenate [h_i || h_j]
var concat = ConcatenateFeatures(features, i, j);
// e_ij = LeakyReLU(a^T * [W*h_i || W*h_j])
var score = concat.MatMul(_a).ToScalar();
scores[i, j] = ApplyLeakyReLU(score, _leakyReluSlope);
}
}
return scores;
}
private Tensor<T> ComputeAttentionWeights(Tensor<T> scores, Tensor<T> adjacency)
{
int numNodes = scores.Shape[0];
var alphas = new Tensor<T>(new[] { numNodes, numNodes });
// Softmax for each node's neighbors
for (int i = 0; i < numNodes; i++)
{
// Collect scores for node i's neighbors
var neighborScores = new List<double>();
var neighborIndices = new List<int>();
for (int j = 0; j < numNodes; j++)
{
if (Convert.ToDouble(adjacency[i, j]) > 0 || i == j)
{
neighborScores.Add(Convert.ToDouble(scores[i, j]));
neighborIndices.Add(j);
}
}
// Apply softmax
var softmaxValues = Softmax(neighborScores);
// Assign attention weights
for (int k = 0; k < neighborIndices.Count; k++)
{
int j = neighborIndices[k];
alphas[i, j] = NumOps<T>.FromDouble(softmaxValues[k]);
}
}
return alphas;
}
private double[] Softmax(List<double> scores)
{
double max = scores.Max();
var exp = scores.Select(s => Math.Exp(s - max)).ToArray();
double sum = exp.Sum();
return exp.Select(e => e / sum).ToArray();
}
private T ApplyLeakyReLU(T x, double slope)
{
double value = Convert.ToDouble(x);
return NumOps<T>.FromDouble(value > 0 ? value : slope * value);
}
}
GraphSAGE: Inductive Learning
Core Idea
Problem with GCN/GAT: They are transductive - cannot generalize to unseen nodes.
- Require the full graph during training
- Cannot handle new nodes at test time
GraphSAGE Solution: Learn aggregation functions that generalize to new nodes.
Key Innovation: Sampling and Aggregating
Instead of using all neighbors:
- Sample a fixed number of neighbors
- Aggregate their features using a learned function
- Combine with node's own features
Mathematical Foundation
1. Neighbor Sampling
N_k(v) = Sample k neighbors from N(v)
Why sample?
- Computational efficiency (avoid exponential blowup)
- Fixed mini-batch sizes
- Regularization effect
2. Aggregation Functions
Mean Aggregator:
h_{N(v)}^k = MEAN({h_u^{k-1} : u ∈ N_k(v)})
LSTM Aggregator:
h_{N(v)}^k = LSTM({h_u^{k-1} : u ∈ N_k(v)})
Pooling Aggregator:
h_{N(v)}^k = MAX({σ(W * h_u^{k-1} + b) : u ∈ N_k(v)})
3. Combining with Self
h_v^k = σ(W^k * CONCAT(h_v^{k-1}, h_{N(v)}^k))
Or with skip connections:
h_v^k = σ(W^k * h_{N(v)}^k + h_v^{k-1})
4. Normalization
L2-normalize embeddings:
h_v^k ← h_v^k / ||h_v^k||_2
GraphSAGE Algorithm
Algorithm: GraphSAGE Forward
Input: Graph G(V, E), node features {x_v : v ∈ V}, depth K
Output: Node embeddings {z_v : v ∈ V}
1. Initialize: h_v^0 ← x_v for all v
2. For k = 1 to K:
For each node v in V:
a. Sample neighbors: N_k(v) ← Sample(N(v), k)
b. Aggregate neighbor features:
h_{N(v)}^k ← AGGREGATE({h_u^{k-1} : u ∈ N_k(v)})
c. Combine with self:
h_v^k ← σ(W^k * CONCAT(h_v^{k-1}, h_{N(v)}^k))
d. Normalize:
h_v^k ← h_v^k / ||h_v^k||_2
3. Return: z_v = h_v^K for all v
Implementation
public class GraphSAGELayer<T> : ILayer<T> where T : struct
{
private readonly int _inputDim;
private readonly int _outputDim;
private readonly int _numSamples; // Number of neighbors to sample
private readonly AggregatorType _aggregatorType;
private Tensor<T> _weights;
private readonly IActivationFunction<T> _activation;
public enum AggregatorType
{
Mean,
LSTM,
Pool,
GCN
}
public GraphSAGELayer(
int inputDim,
int outputDim,
int numSamples = 25,
AggregatorType aggregatorType = AggregatorType.Mean)
{
_inputDim = inputDim;
_outputDim = outputDim;
_numSamples = numSamples;
_aggregatorType = aggregatorType;
_activation = new ReLU<T>();
// Weight matrix for combining self + neighbor features
_weights = InitializeWeights(2 * inputDim, outputDim);
}
public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> adjacency)
{
int numNodes = nodeFeatures.Shape[0];
var output = new Tensor<T>(new[] { numNodes, _outputDim });
for (int nodeIdx = 0; nodeIdx < numNodes; nodeIdx++)
{
// Step 1: Sample neighbors
var neighborIndices = SampleNeighbors(adjacency, nodeIdx, _numSamples);
// Step 2: Aggregate neighbor features
var aggregated = AggregateNeighbors(nodeFeatures, neighborIndices);
// Step 3: Concatenate with self features
var selfFeatures = nodeFeatures.GetRow(nodeIdx);
var combined = Concatenate(selfFeatures, aggregated);
// Step 4: Transform and activate
var transformed = combined.MatMul(_weights);
var activated = _activation.Activate(transformed);
// Step 5: L2 normalize
var normalized = L2Normalize(activated);
output.SetRow(nodeIdx, normalized);
}
return output;
}
private List<int> SampleNeighbors(Tensor<T> adjacency, int nodeIdx, int k)
{
// Find all neighbors
var neighbors = new List<int>();
int numNodes = adjacency.Shape[0];
for (int j = 0; j < numNodes; j++)
{
if (Convert.ToDouble(adjacency[nodeIdx, j]) > 0 && j != nodeIdx)
{
neighbors.Add(j);
}
}
// Sample k neighbors (with replacement if needed)
if (neighbors.Count == 0)
return new List<int> { nodeIdx }; // Use self if no neighbors
var sampled = new List<int>();
var random = new Random();
for (int i = 0; i < Math.Min(k, neighbors.Count); i++)
{
int idx = random.Next(neighbors.Count);
sampled.Add(neighbors[idx]);
}
return sampled;
}
private Tensor<T> AggregateNeighbors(Tensor<T> nodeFeatures, List<int> neighborIndices)
{
return _aggregatorType switch
{
AggregatorType.Mean => AggregateMean(nodeFeatures, neighborIndices),
AggregatorType.Pool => AggregatePool(nodeFeatures, neighborIndices),
AggregatorType.LSTM => AggregateLSTM(nodeFeatures, neighborIndices),
_ => AggregateMean(nodeFeatures, neighborIndices)
};
}
private Tensor<T> AggregateMean(Tensor<T> nodeFeatures, List<int> indices)
{
if (indices.Count == 0)
return new Tensor<T>(new[] { _inputDim });
// Compute mean of neighbor features
var sum = new Tensor<T>(new[] { _inputDim });
foreach (var idx in indices)
{
var features = nodeFeatures.GetRow(idx);
sum = sum.Add(features);
}
return sum.Divide(NumOps<T>.FromDouble(indices.Count));
}
private Tensor<T> AggregatePool(Tensor<T> nodeFeatures, List<int> indices)
{
if (indices.Count == 0)
return new Tensor<T>(new[] { _inputDim });
// Transform each neighbor then take element-wise max
var pooled = new T[_inputDim];
for (int dim = 0; dim < _inputDim; dim++)
{
T maxValue = NumOps<T>.FromDouble(double.NegativeInfinity);
foreach (var idx in indices)
{
var value = nodeFeatures[idx, dim];
if (NumOps<T>.GreaterThan(value, maxValue))
{
maxValue = value;
}
}
pooled[dim] = maxValue;
}
return new Tensor<T>(pooled, new[] { _inputDim });
}
private Tensor<T> L2Normalize(Tensor<T> vector)
{
// Compute L2 norm
double sumSquares = 0;
var data = vector.ToArray();
foreach (var value in data)
{
double v = Convert.ToDouble(value);
sumSquares += v * v;
}
double norm = Math.Sqrt(sumSquares);
if (norm < 1e-12)
return vector; // Avoid division by zero
// Normalize
var normalized = new T[data.Length];
for (int i = 0; i < data.Length; i++)
{
normalized[i] = NumOps<T>.FromDouble(Convert.ToDouble(data[i]) / norm);
}
return new Tensor<T>(normalized, vector.Shape);
}
}
Graph Representations and Operations
Graph Data Structure
public class Graph<T> where T : struct
{
public Tensor<T> NodeFeatures { get; set; } // (numNodes × featureDim)
public Tensor<T> EdgeIndex { get; set; } // (2 × numEdges) - edge list format
public Tensor<T> EdgeWeights { get; set; } // (numEdges) - optional edge weights
public int NumNodes { get; set; }
public int NumEdges { get; set; }
/// <summary>
/// Convert edge list to adjacency matrix.
/// </summary>
public Tensor<T> ToAdjacencyMatrix()
{
var adjacency = new Tensor<T>(new[] { NumNodes, NumNodes });
for (int e = 0; e < NumEdges; e++)
{
int src = Convert.ToInt32(EdgeIndex[0, e]);
int dst = Convert.ToInt32(EdgeIndex[1, e]);
T weight = EdgeWeights != null ? EdgeWeights[e] : NumOps<T>.One;
adjacency[src, dst] = weight;
// For undirected graphs:
// adjacency[dst, src] = weight;
}
return adjacency;
}
/// <summary>
/// Get neighbors of a node.
/// </summary>
public List<int> GetNeighbors(int nodeIdx)
{
var neighbors = new List<int>();
for (int e = 0; e < NumEdges; e++)
{
int src = Convert.ToInt32(EdgeIndex[0, e]);
int dst = Convert.ToInt32(EdgeIndex[1, e]);
if (src == nodeIdx)
neighbors.Add(dst);
}
return neighbors;
}
}
Graph Batch Processing
public class GraphBatch<T> where T : struct
{
public List<Graph<T>> Graphs { get; set; } = new List<Graph<T>>();
/// <summary>
/// Combine multiple graphs into a single disconnected graph.
/// </summary>
public Graph<T> Collate()
{
int totalNodes = 0;
int totalEdges = 0;
// Count totals
foreach (var graph in Graphs)
{
totalNodes += graph.NumNodes;
totalEdges += graph.NumEdges;
}
// Allocate combined tensors
int featureDim = Graphs[0].NodeFeatures.Shape[1];
var nodeFeatures = new List<T[]>();
var edgeList = new List<(int, int)>();
int nodeOffset = 0;
// Concatenate graphs
foreach (var graph in Graphs)
{
// Add node features
for (int i = 0; i < graph.NumNodes; i++)
{
nodeFeatures.Add(graph.NodeFeatures.GetRow(i).ToArray());
}
// Add edges (with offset)
for (int e = 0; e < graph.NumEdges; e++)
{
int src = Convert.ToInt32(graph.EdgeIndex[0, e]) + nodeOffset;
int dst = Convert.ToInt32(graph.EdgeIndex[1, e]) + nodeOffset;
edgeList.Add((src, dst));
}
nodeOffset += graph.NumNodes;
}
// Create combined graph
var combinedFeatures = new Tensor<T>(
nodeFeatures.SelectMany(f => f).ToArray(),
new[] { totalNodes, featureDim }
);
var combinedEdgeIndex = new Tensor<T>(new[] { 2, totalEdges });
for (int e = 0; e < totalEdges; e++)
{
combinedEdgeIndex[0, e] = NumOps<T>.FromDouble(edgeList[e].Item1);
combinedEdgeIndex[1, e] = NumOps<T>.FromDouble(edgeList[e].Item2);
}
return new Graph<T>
{
NodeFeatures = combinedFeatures,
EdgeIndex = combinedEdgeIndex,
NumNodes = totalNodes,
NumEdges = totalEdges
};
}
}
Testing Strategy
Unit Tests
[TestClass]
public class GNNTests
{
[TestMethod]
public void GCNLayer_ForwardPass_CorrectShape()
{
// Arrange: Create simple graph (4 nodes, fully connected)
var adjacency = CreateFullyConnectedGraph(numNodes: 4);
var normalizedAdj = GraphNormalization.ComputeNormalizedAdjacency(adjacency);
var nodeFeatures = CreateRandomTensor(4, 8); // 4 nodes, 8 features
var gcnLayer = new GCNLayer<double>(inputDim: 8, outputDim: 16, new ReLU<double>());
// Act
var output = gcnLayer.Forward(nodeFeatures, normalizedAdj);
// Assert: Output shape is (4 × 16)
Assert.AreEqual(4, output.Shape[0]);
Assert.AreEqual(16, output.Shape[1]);
}
[TestMethod]
public void GAT_AttentionWeights_SumToOne()
{
// Arrange
var graph = CreateSimpleGraph(); // 3 nodes: 0-1, 1-2
var gatLayer = new GATLayer<double>(inputDim: 4, outputDim: 8, numHeads: 2);
// Act
var output = gatLayer.Forward(graph.NodeFeatures, graph.ToAdjacencyMatrix());
// Get attention weights for node 1 (has 2 neighbors)
var alphas = gatLayer.GetLastAttentionWeights();
// Assert: Attention weights for each node sum to 1
for (int node = 0; node < 3; node++)
{
double sum = 0;
for (int neighbor = 0; neighbor < 3; neighbor++)
{
sum += Convert.ToDouble(alphas[node, neighbor]);
}
Assert.AreEqual(1.0, sum, 1e-6);
}
}
[TestMethod]
public void GraphSAGE_Sampling_ReturnsCorrectCount()
{
// Arrange: Node with 10 neighbors
var graph = CreateStarGraph(numNeighbors: 10); // Center node + 10 neighbors
var sageLayer = new GraphSAGELayer<double>(
inputDim: 4, outputDim: 8, numSamples: 5
);
// Act
var output = sageLayer.Forward(graph.NodeFeatures, graph.ToAdjacencyMatrix());
// Assert: Should sample exactly 5 neighbors
var sampledIndices = sageLayer.GetLastSampledNeighbors(centerNode: 0);
Assert.AreEqual(5, sampledIndices.Count);
}
[TestMethod]
public void GraphSAGE_Inductive_WorksOnNewNodes()
{
// Arrange: Train on graph A
var trainGraph = CreateGraph(numNodes: 100);
var sageModel = new GraphSAGE<double>(layerDims: new[] { 16, 32, 64 });
sageModel.Train(trainGraph);
// Act: Test on graph B (completely new nodes)
var testGraph = CreateGraph(numNodes: 50); // Different nodes!
var embeddings = sageModel.Forward(testGraph);
// Assert: Should produce embeddings for new nodes
Assert.AreEqual(50, embeddings.Shape[0]);
Assert.AreEqual(64, embeddings.Shape[1]); // Output dimension
}
}
Integration Tests
[TestClass]
public class GNNIntegrationTests
{
[TestMethod]
public void GCN_Cora_NodeClassification()
{
// Arrange: Load Cora citation network
var (graph, labels, trainMask, testMask) = LoadCoraDataset();
// Build 2-layer GCN
var gcn = new GCN<double>(
inputDim: graph.NodeFeatures.Shape[1],
hiddenDim: 16,
outputDim: 7, // 7 classes in Cora
numLayers: 2
);
// Act: Train
gcn.Train(graph, labels, trainMask, epochs: 200);
// Evaluate
var predictions = gcn.Forward(graph);
var accuracy = ComputeAccuracy(predictions, labels, testMask);
// Assert: Should achieve > 70% accuracy
Assert.IsTrue(accuracy > 0.70, $"Expected > 70%, got {accuracy:F2}%");
}
[TestMethod]
public void GAT_TransductiveLearning_ConvergesCorrectly()
{
// Arrange
var (graph, labels) = LoadPubMedDataset();
var gat = new GAT<double>(
inputDim: graph.NodeFeatures.Shape[1],
hiddenDim: 8,
outputDim: 3,
numHeads: 8
);
// Act
var losses = gat.TrainWithLogging(graph, labels, epochs: 100);
// Assert: Loss should decrease
Assert.IsTrue(losses.Last() < losses.First());
Assert.IsTrue(losses.Last() < 0.5);
}
[TestMethod]
public void GraphSAGE_ProteinProteinInteraction_InductiveLearning()
{
// Arrange: Load PPI dataset (inductive setting)
var (trainGraphs, trainLabels) = LoadPPITrain();
var (testGraphs, testLabels) = LoadPPITest(); // Completely new graphs!
var sage = new GraphSAGE<double>(
inputDim: 50,
hiddenDims: new[] { 256, 256, 128 },
numSamples: new[] { 25, 10 }
);
// Act: Train on training graphs
sage.Train(trainGraphs, trainLabels, epochs: 100);
// Test on completely new graphs (inductive)
var predictions = sage.PredictBatch(testGraphs);
var f1Score = ComputeF1Score(predictions, testLabels);
// Assert: Should achieve > 0.5 F1 score
Assert.IsTrue(f1Score > 0.5);
}
}
Step-by-Step Implementation Guide
Phase 1: Graph Infrastructure (Week 1)
Step 1: Create Graph Data Structure
File: src/GraphNeuralNetworks/Graph.cs
(See Graph Representations section for full implementation)
Step 2: Implement Graph Utilities
File: src/GraphNeuralNetworks/GraphUtils.cs
public static class GraphUtils
{
public static Tensor<T> ComputeNormalizedAdjacency<T>(Tensor<T> adjacency)
{
// See GraphNormalization.ComputeNormalizedAdjacency implementation
}
public static Tensor<T> EdgeListToAdjacencyMatrix<T>(Tensor<T> edgeIndex, int numNodes)
{
// Convert edge list format to adjacency matrix
}
public static List<int> GetKHopNeighbors<T>(Graph<T> graph, int nodeIdx, int k)
{
// Get all neighbors within k hops
}
}
Phase 2: GCN Implementation (Week 2)
Step 3: Implement GCN Layer
File: src/GraphNeuralNetworks/Layers/GCNLayer.cs
(See GCN section for full implementation)
Step 4: Build Multi-Layer GCN
File: src/GraphNeuralNetworks/Models/GCN.cs
public class GCN<T> : NeuralNetwork<T> where T : struct
{
private readonly List<GCNLayer<T>> _layers;
private readonly Tensor<T> _normalizedAdjacency;
public GCN(int inputDim, int hiddenDim, int outputDim, int numLayers = 2)
{
_layers = new List<GCNLayer<T>>();
// Input layer
_layers.Add(new GCNLayer<T>(inputDim, hiddenDim, new ReLU<T>()));
// Hidden layers
for (int i = 1; i < numLayers - 1; i++)
{
_layers.Add(new GCNLayer<T>(hiddenDim, hiddenDim, new ReLU<T>()));
}
// Output layer
_layers.Add(new GCNLayer<T>(hiddenDim, outputDim, new Softmax<T>()));
}
public Tensor<T> Forward(Graph<T> graph)
{
var adjacency = graph.ToAdjacencyMatrix();
var normalizedAdj = GraphUtils.ComputeNormalizedAdjacency(adjacency);
var h = graph.NodeFeatures;
foreach (var layer in _layers)
{
h = layer.Forward(h, normalizedAdj);
}
return h;
}
}
Phase 3: GAT Implementation (Week 3)
Step 5: Implement Attention Head
File: src/GraphNeuralNetworks/Layers/AttentionHead.cs
(See GAT section for full implementation)
Step 6: Implement GAT Layer
File: src/GraphNeuralNetworks/Layers/GATLayer.cs
(See GAT section for full implementation)
Phase 4: GraphSAGE Implementation (Week 4)
Step 7: Implement Aggregators
File: src/GraphNeuralNetworks/Aggregators/IAggregator.cs
public interface IAggregator<T> where T : struct
{
Tensor<T> Aggregate(Tensor<T> nodeFeatures, List<int> neighborIndices);
}
public class MeanAggregator<T> : IAggregator<T> where T : struct
{
// See AggregateMean in GraphSAGE section
}
public class PoolAggregator<T> : IAggregator<T> where T : struct
{
// See AggregatePool in GraphSAGE section
}
Step 8: Implement GraphSAGE Layer
File: src/GraphNeuralNetworks/Layers/GraphSAGELayer.cs
(See GraphSAGE section for full implementation)
Phase 5: Testing and Validation (Week 5-6)
Step 9: Create Test Datasets
File: tests/GraphNeuralNetworks/TestData/GraphDatasets.cs
public static class GraphDatasets
{
public static (Graph<double>, Tensor<double>, bool[], bool[]) LoadCora()
{
// Load Cora citation network
// Returns: (graph, labels, trainMask, testMask)
}
public static Graph<double> CreateKarateClub()
{
// Create Zachary's Karate Club graph (classic test case)
}
}
Step 10: Run Comprehensive Tests
(See Testing Strategy section for complete test suite)
Summary
This guide provides:
- GCN: Spectral graph convolutions with normalized aggregation
- GAT: Attention-based neighbor aggregation for adaptive weighting
- GraphSAGE: Inductive learning via sampling and aggregation
- Graph Operations: Efficient representations and batch processing
- Testing: Comprehensive unit and integration tests on real datasets
Key Differences:
- GCN: Fast, simple, works well on homophilic graphs (similar neighbors)
- GAT: Learns attention weights, better for heterophilic graphs
- GraphSAGE: Inductive (generalizes to new nodes), sampling-based
Expected Timeline: 6 weeks for full implementation with extensive testing