[Phase 3] Expand Multimodal AI Beyond CLIP and BLIP-2

Open ooples opened this issue 4 months ago • 1 comments

Problem

Has CLIP (issue #272) and BLIP-2 (issue #273) but missing advanced multimodal models.

Existing

Issue #272: CLIP
Issue #273: BLIP-2 (VQA, captioning)
src/RetrievalAugmentedGeneration/MultiModalEmbeddingModel.cs

Missing Implementations

Visual Language Models (CRITICAL):

Flamingo (in-context learning)
GPT-4V-equivalent (vision + language LLM)
LLaVA (large language and vision assistant)

Audio-Visual (HIGH):

Audio-visual correspondence learning
Cross-modal retrieval (audio-image-text)
Audio-visual event localization

Video-Language (HIGH):

VideoCLIP
Video captioning (beyond frame-level)
Video question answering
Action recognition with language

Advanced (MEDIUM):

DALL-E 3 text-to-image (complementing diffusion models)
ImageBind (6+ modality binding)
Unified multimodal models

Architecture

Expand src/Multimodal/
Cross-attention mechanisms
Unified embedding space

Success Criteria

Zero-shot cross-modal retrieval
Visual question answering benchmarks
Multi-modal few-shot learning
Video understanding datasets

Nov 07 '25 03:11 ooples

Issue #405: Junior Developer Implementation Guide

Graph Neural Networks (GCN, GAT, GraphSAGE)

Understanding Graph Neural Networks
Graph Convolutional Networks (GCN)
Graph Attention Networks (GAT)
GraphSAGE: Inductive Learning
Graph Representations and Operations
Implementation Strategy
Testing Strategy
Step-by-Step Implementation Guide

Understanding Graph Neural Networks

What are Graph Neural Networks?

Graph Neural Networks (GNNs) are neural networks designed to operate on graph-structured data.

Graph: G = (V, E) where:

V = Set of nodes/vertices (e.g., users, molecules, documents)
E = Set of edges (e.g., friendships, chemical bonds, citations)

Examples:

Social Networks: Nodes = users, Edges = friendships
Molecules: Nodes = atoms, Edges = chemical bonds
Citation Networks: Nodes = papers, Edges = citations
Knowledge Graphs: Nodes = entities, Edges = relationships

Why Graph Neural Networks?

Traditional Neural Networks (CNNs, RNNs) work on:

Grids: Images (2D grids of pixels)
Sequences: Text (1D sequences of words)

GNNs work on:

Irregular structures: Graphs with variable numbers of neighbors
Non-Euclidean data: No fixed coordinate system

The Core Idea: Message Passing

GNNs learn node representations by aggregating information from neighbors:

For each node v:
    1. Collect messages from neighbors N(v)
    2. Aggregate messages (sum, mean, max, attention)
    3. Update node representation h_v

Mathematical Form:

h_v^(l+1) = σ(W * AGG({h_u^(l) : u ∈ N(v)}))

Where:

h_v^(l) = Node v's representation at layer l
N(v) = Neighbors of node v
AGG = Aggregation function (sum, mean, attention)
W = Learnable weight matrix
σ = Activation function

Visual Example

Initial node features:
   [A]--[B]
    |  X  |
   [C]--[D]

After 1 layer:
- Node A learns from neighbors: B, C
- Node B learns from neighbors: A, D
- Node C learns from neighbors: A, D
- Node D learns from neighbors: B, C

After 2 layers:
- Node A knows about all nodes (through 2-hop paths)

Graph Convolutional Networks (GCN)

Core Idea

Apply spectral graph theory to define convolutions on graphs.

Simplification: Use symmetric normalized aggregation for message passing.

Mathematical Foundation

1. Adjacency Matrix

Represent graph structure as matrix A:

A[i,j] = 1 if edge (i,j) exists
A[i,j] = 0 otherwise

Example:

Graph:  0---1
        |   |
        2---3

Adjacency matrix A:
    0  1  2  3
0 [ 0  1  1  0 ]
1 [ 1  0  0  1 ]
2 [ 1  0  0  1 ]
3 [ 0  1  1  0 ]

2. Degree Matrix

Diagonal matrix D where D[i,i] = degree of node i (number of neighbors):

D[i,i] = Σ_j A[i,j]

For the example above:

D = diag([2, 2, 2, 2])

3. Normalized Adjacency

Add self-loops and normalize:

A_hat = A + I  (add self-loops)
D_hat = degree matrix of A_hat
A_norm = D_hat^(-1/2) * A_hat * D_hat^(-1/2)

Why normalize?

Prevents numerical instabilities
Makes aggregation independent of node degree

4. GCN Layer Formula

H^(l+1) = σ(A_norm * H^(l) * W^(l))

Where:

H^(l) = Node feature matrix at layer l (N × d_l)
W^(l) = Learnable weight matrix (d_l × d_{l+1})
A_norm = Normalized adjacency matrix (N × N)
σ = Activation function (ReLU, etc.)

Interpretation:

H^(l) * W^(l): Transform features
A_norm * (...): Aggregate from neighbors
σ(...): Non-linearity

5. Multi-Layer GCN

Stack multiple GCN layers to capture multi-hop neighborhoods:

H^(0) = X  (input features)
H^(1) = σ(A_norm * H^(0) * W^(0))
H^(2) = σ(A_norm * H^(1) * W^(1))
...
Z = H^(L)  (final embeddings)

Implementation

public class GCNLayer<T> : ILayer<T> where T : struct
{
    private readonly int _inputDim;
    private readonly int _outputDim;
    private Tensor<T> _weights;  // W matrix (inputDim × outputDim)
    private Tensor<T> _bias;     // bias vector (outputDim)
    private readonly IActivationFunction<T> _activation;

    // Cached values for backpropagation
    private Tensor<T> _lastInput = null!;
    private Tensor<T> _lastAdjacency = null!;

    public GCNLayer(int inputDim, int outputDim, IActivationFunction<T> activation)
    {
        _inputDim = inputDim;
        _outputDim = outputDim;
        _activation = activation;

        // Initialize weights (Xavier/Glorot initialization)
        _weights = InitializeWeights(inputDim, outputDim);
        _bias = new Tensor<T>(new[] { outputDim });
    }

    /// <summary>
    /// Forward pass: H' = σ(A_norm * H * W + b)
    /// </summary>
    public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> normalizedAdjacency)
    {
        // nodeFeatures: (numNodes × inputDim)
        // normalizedAdjacency: (numNodes × numNodes)

        _lastInput = nodeFeatures;
        _lastAdjacency = normalizedAdjacency;

        // Step 1: Transform features: H * W
        var transformed = nodeFeatures.MatMul(_weights);  // (numNodes × outputDim)

        // Step 2: Aggregate from neighbors: A_norm * (H * W)
        var aggregated = normalizedAdjacency.MatMul(transformed);  // (numNodes × outputDim)

        // Step 3: Add bias
        var withBias = aggregated.Add(_bias);

        // Step 4: Apply activation
        var output = _activation.Activate(withBias);

        return output;
    }

    /// <summary>
    /// Backward pass: compute gradients
    /// </summary>
    public Tensor<T> Backward(Tensor<T> gradOutput)
    {
        // Gradient through activation
        var gradActivation = _activation.Derivative(_lastInput);
        var gradPreActivation = gradOutput.Multiply(gradActivation);

        // Gradient w.r.t. weights: H^T * A_norm^T * gradPreActivation
        var gradWeights = _lastInput.Transpose()
            .MatMul(_lastAdjacency.Transpose())
            .MatMul(gradPreActivation);

        // Gradient w.r.t. bias: sum over nodes
        var gradBias = gradPreActivation.Sum(axis: 0);

        // Gradient w.r.t. input: A_norm^T * gradPreActivation * W^T
        var gradInput = _lastAdjacency.Transpose()
            .MatMul(gradPreActivation)
            .MatMul(_weights.Transpose());

        // Update weights
        UpdateWeights(gradWeights, gradBias);

        return gradInput;
    }

    private Tensor<T> InitializeWeights(int fanIn, int fanOut)
    {
        // Xavier initialization: U(-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out)))
        double scale = Math.Sqrt(6.0 / (fanIn + fanOut));
        var random = new Random();

        var weights = new T[fanIn * fanOut];
        for (int i = 0; i < weights.Length; i++)
        {
            double value = (random.NextDouble() * 2 - 1) * scale;
            weights[i] = NumOps<T>.FromDouble(value);
        }

        return new Tensor<T>(weights, new[] { fanIn, fanOut });
    }
}

Graph Normalization Helper

public static class GraphNormalization
{
    /// <summary>
    /// Compute normalized adjacency: D^(-1/2) * (A + I) * D^(-1/2)
    /// </summary>
    public static Tensor<T> ComputeNormalizedAdjacency<T>(Tensor<T> adjacency) where T : struct
    {
        int numNodes = adjacency.Shape[0];

        // Step 1: Add self-loops: A_hat = A + I
        var aHat = adjacency.Clone();
        for (int i = 0; i < numNodes; i++)
        {
            aHat[i, i] = NumOps<T>.Add(aHat[i, i], NumOps<T>.One);
        }

        // Step 2: Compute degree matrix D_hat
        var degrees = new T[numNodes];
        for (int i = 0; i < numNodes; i++)
        {
            T sum = NumOps<T>.Zero;
            for (int j = 0; j < numNodes; j++)
            {
                sum = NumOps<T>.Add(sum, aHat[i, j]);
            }
            degrees[i] = sum;
        }

        // Step 3: Compute D^(-1/2)
        var degreeInvSqrt = new T[numNodes];
        for (int i = 0; i < numNodes; i++)
        {
            double d = Convert.ToDouble(degrees[i]);
            degreeInvSqrt[i] = NumOps<T>.FromDouble(1.0 / Math.Sqrt(d));
        }

        // Step 4: Compute D^(-1/2) * A_hat * D^(-1/2)
        var normalized = new Tensor<T>(new[] { numNodes, numNodes });
        for (int i = 0; i < numNodes; i++)
        {
            for (int j = 0; j < numNodes; j++)
            {
                var value = NumOps<T>.Multiply(
                    NumOps<T>.Multiply(degreeInvSqrt[i], aHat[i, j]),
                    degreeInvSqrt[j]
                );
                normalized[i, j] = value;
            }
        }

        return normalized;
    }
}

Graph Attention Networks (GAT)

Core Idea

Use attention mechanism to learn importance weights for neighbors.

Problem with GCN: All neighbors have equal importance (after normalization).

GAT Solution: Learn attention weights α_{ij} for each edge (i → j).

Mathematical Foundation

1. Attention Mechanism

For each edge (i → j), compute attention coefficient:

e_{ij} = LeakyReLU(a^T [W*h_i || W*h_j])

Where:

h_i, h_j = Node features for nodes i and j
W = Shared weight matrix
a = Attention vector (learnable)
|| = Concatenation
LeakyReLU = Activation function

2. Normalize with Softmax

α_{ij} = softmax_j(e_{ij}) = exp(e_{ij}) / Σ_{k∈N(i)} exp(e_{ik})

Result: α_{ij} represents the importance of node j to node i.

3. Aggregate with Attention

h_i' = σ(Σ_{j∈N(i)} α_{ij} * W * h_j)

4. Multi-Head Attention

Use multiple attention heads (like Transformer):

h_i' = ||_{k=1}^K σ(Σ_{j∈N(i)} α_{ij}^k * W^k * h_j)

Where:

K = Number of attention heads
|| = Concatenation or averaging

Implementation

public class GATLayer<T> : ILayer<T> where T : struct
{
    private readonly int _inputDim;
    private readonly int _outputDim;
    private readonly int _numHeads;
    private readonly List<AttentionHead<T>> _heads;
    private readonly bool _concat;  // Concatenate or average heads

    public GATLayer(int inputDim, int outputDim, int numHeads = 8, bool concat = true)
    {
        _inputDim = inputDim;
        _outputDim = outputDim;
        _numHeads = numHeads;
        _concat = concat;

        // Create multiple attention heads
        _heads = new List<AttentionHead<T>>();
        for (int i = 0; i < numHeads; i++)
        {
            _heads.Add(new AttentionHead<T>(inputDim, outputDim / numHeads));
        }
    }

    public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> adjacency)
    {
        // Compute attention for each head
        var headOutputs = _heads.Select(head =>
            head.Forward(nodeFeatures, adjacency)
        ).ToList();

        // Combine heads
        if (_concat)
        {
            // Concatenate along feature dimension
            return Tensor<T>.Concatenate(headOutputs, axis: 1);
        }
        else
        {
            // Average across heads
            var sum = headOutputs[0];
            for (int i = 1; i < headOutputs.Count; i++)
            {
                sum = sum.Add(headOutputs[i]);
            }
            return sum.Divide(NumOps<T>.FromDouble(_numHeads));
        }
    }
}

public class AttentionHead<T> where T : struct
{
    private Tensor<T> _W;  // Weight matrix
    private Tensor<T> _a;  // Attention vector
    private readonly double _leakyReluSlope = 0.2;

    private Tensor<T> _lastFeatures = null!;
    private Tensor<T> _lastAdjacency = null!;
    private Tensor<T> _lastAlphas = null!;

    public AttentionHead(int inputDim, int outputDim)
    {
        _W = InitializeWeights(inputDim, outputDim);
        _a = InitializeWeights(2 * outputDim, 1);  // [W*h_i || W*h_j] has dim 2*outputDim
    }

    public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> adjacency)
    {
        // nodeFeatures: (numNodes × inputDim)
        // adjacency: (numNodes × numNodes)

        _lastFeatures = nodeFeatures;
        _lastAdjacency = adjacency;

        int numNodes = nodeFeatures.Shape[0];

        // Step 1: Transform features: h' = W * h
        var transformed = nodeFeatures.MatMul(_W);  // (numNodes × outputDim)

        // Step 2: Compute attention coefficients
        var attentionScores = ComputeAttentionScores(transformed, adjacency);

        // Step 3: Apply softmax to get attention weights
        var alphas = ComputeAttentionWeights(attentionScores, adjacency);
        _lastAlphas = alphas;

        // Step 4: Aggregate features using attention
        var aggregated = alphas.MatMul(transformed);  // (numNodes × outputDim)

        // Step 5: Apply activation (ELU is common for GAT)
        return ApplyELU(aggregated);
    }

    private Tensor<T> ComputeAttentionScores(Tensor<T> features, Tensor<T> adjacency)
    {
        int numNodes = features.Shape[0];
        var scores = new Tensor<T>(new[] { numNodes, numNodes });

        for (int i = 0; i < numNodes; i++)
        {
            for (int j = 0; j < numNodes; j++)
            {
                // Only compute attention for existing edges
                if (Convert.ToDouble(adjacency[i, j]) == 0 && i != j)
                {
                    scores[i, j] = NumOps<T>.FromDouble(double.NegativeInfinity);
                    continue;
                }

                // Concatenate [h_i || h_j]
                var concat = ConcatenateFeatures(features, i, j);

                // e_ij = LeakyReLU(a^T * [W*h_i || W*h_j])
                var score = concat.MatMul(_a).ToScalar();
                scores[i, j] = ApplyLeakyReLU(score, _leakyReluSlope);
            }
        }

        return scores;
    }

    private Tensor<T> ComputeAttentionWeights(Tensor<T> scores, Tensor<T> adjacency)
    {
        int numNodes = scores.Shape[0];
        var alphas = new Tensor<T>(new[] { numNodes, numNodes });

        // Softmax for each node's neighbors
        for (int i = 0; i < numNodes; i++)
        {
            // Collect scores for node i's neighbors
            var neighborScores = new List<double>();
            var neighborIndices = new List<int>();

            for (int j = 0; j < numNodes; j++)
            {
                if (Convert.ToDouble(adjacency[i, j]) > 0 || i == j)
                {
                    neighborScores.Add(Convert.ToDouble(scores[i, j]));
                    neighborIndices.Add(j);
                }
            }

            // Apply softmax
            var softmaxValues = Softmax(neighborScores);

            // Assign attention weights
            for (int k = 0; k < neighborIndices.Count; k++)
            {
                int j = neighborIndices[k];
                alphas[i, j] = NumOps<T>.FromDouble(softmaxValues[k]);
            }
        }

        return alphas;
    }

    private double[] Softmax(List<double> scores)
    {
        double max = scores.Max();
        var exp = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exp.Sum();
        return exp.Select(e => e / sum).ToArray();
    }

    private T ApplyLeakyReLU(T x, double slope)
    {
        double value = Convert.ToDouble(x);
        return NumOps<T>.FromDouble(value > 0 ? value : slope * value);
    }
}

GraphSAGE: Inductive Learning

Core Idea

Problem with GCN/GAT: They are transductive - cannot generalize to unseen nodes.

Require the full graph during training
Cannot handle new nodes at test time

GraphSAGE Solution: Learn aggregation functions that generalize to new nodes.

Key Innovation: Sampling and Aggregating

Instead of using all neighbors:

Sample a fixed number of neighbors
Aggregate their features using a learned function
Combine with node's own features

Mathematical Foundation

1. Neighbor Sampling

N_k(v) = Sample k neighbors from N(v)

Why sample?

Computational efficiency (avoid exponential blowup)
Fixed mini-batch sizes
Regularization effect

2. Aggregation Functions

Mean Aggregator:

h_{N(v)}^k = MEAN({h_u^{k-1} : u ∈ N_k(v)})

LSTM Aggregator:

h_{N(v)}^k = LSTM({h_u^{k-1} : u ∈ N_k(v)})

Pooling Aggregator:

h_{N(v)}^k = MAX({σ(W * h_u^{k-1} + b) : u ∈ N_k(v)})

3. Combining with Self

h_v^k = σ(W^k * CONCAT(h_v^{k-1}, h_{N(v)}^k))

Or with skip connections:

h_v^k = σ(W^k * h_{N(v)}^k + h_v^{k-1})

4. Normalization

L2-normalize embeddings:

h_v^k ← h_v^k / ||h_v^k||_2

GraphSAGE Algorithm

Algorithm: GraphSAGE Forward
Input: Graph G(V, E), node features {x_v : v ∈ V}, depth K
Output: Node embeddings {z_v : v ∈ V}

1. Initialize: h_v^0 ← x_v for all v

2. For k = 1 to K:
   For each node v in V:
      a. Sample neighbors: N_k(v) ← Sample(N(v), k)

      b. Aggregate neighbor features:
         h_{N(v)}^k ← AGGREGATE({h_u^{k-1} : u ∈ N_k(v)})

      c. Combine with self:
         h_v^k ← σ(W^k * CONCAT(h_v^{k-1}, h_{N(v)}^k))

      d. Normalize:
         h_v^k ← h_v^k / ||h_v^k||_2

3. Return: z_v = h_v^K for all v

Implementation

public class GraphSAGELayer<T> : ILayer<T> where T : struct
{
    private readonly int _inputDim;
    private readonly int _outputDim;
    private readonly int _numSamples;  // Number of neighbors to sample
    private readonly AggregatorType _aggregatorType;
    private Tensor<T> _weights;
    private readonly IActivationFunction<T> _activation;

    public enum AggregatorType
    {
        Mean,
        LSTM,
        Pool,
        GCN
    }

    public GraphSAGELayer(
        int inputDim,
        int outputDim,
        int numSamples = 25,
        AggregatorType aggregatorType = AggregatorType.Mean)
    {
        _inputDim = inputDim;
        _outputDim = outputDim;
        _numSamples = numSamples;
        _aggregatorType = aggregatorType;
        _activation = new ReLU<T>();

        // Weight matrix for combining self + neighbor features
        _weights = InitializeWeights(2 * inputDim, outputDim);
    }

    public Tensor<T> Forward(Tensor<T> nodeFeatures, Tensor<T> adjacency)
    {
        int numNodes = nodeFeatures.Shape[0];
        var output = new Tensor<T>(new[] { numNodes, _outputDim });

        for (int nodeIdx = 0; nodeIdx < numNodes; nodeIdx++)
        {
            // Step 1: Sample neighbors
            var neighborIndices = SampleNeighbors(adjacency, nodeIdx, _numSamples);

            // Step 2: Aggregate neighbor features
            var aggregated = AggregateNeighbors(nodeFeatures, neighborIndices);

            // Step 3: Concatenate with self features
            var selfFeatures = nodeFeatures.GetRow(nodeIdx);
            var combined = Concatenate(selfFeatures, aggregated);

            // Step 4: Transform and activate
            var transformed = combined.MatMul(_weights);
            var activated = _activation.Activate(transformed);

            // Step 5: L2 normalize
            var normalized = L2Normalize(activated);

            output.SetRow(nodeIdx, normalized);
        }

        return output;
    }

    private List<int> SampleNeighbors(Tensor<T> adjacency, int nodeIdx, int k)
    {
        // Find all neighbors
        var neighbors = new List<int>();
        int numNodes = adjacency.Shape[0];

        for (int j = 0; j < numNodes; j++)
        {
            if (Convert.ToDouble(adjacency[nodeIdx, j]) > 0 && j != nodeIdx)
            {
                neighbors.Add(j);
            }
        }

        // Sample k neighbors (with replacement if needed)
        if (neighbors.Count == 0)
            return new List<int> { nodeIdx };  // Use self if no neighbors

        var sampled = new List<int>();
        var random = new Random();

        for (int i = 0; i < Math.Min(k, neighbors.Count); i++)
        {
            int idx = random.Next(neighbors.Count);
            sampled.Add(neighbors[idx]);
        }

        return sampled;
    }

    private Tensor<T> AggregateNeighbors(Tensor<T> nodeFeatures, List<int> neighborIndices)
    {
        return _aggregatorType switch
        {
            AggregatorType.Mean => AggregateMean(nodeFeatures, neighborIndices),
            AggregatorType.Pool => AggregatePool(nodeFeatures, neighborIndices),
            AggregatorType.LSTM => AggregateLSTM(nodeFeatures, neighborIndices),
            _ => AggregateMean(nodeFeatures, neighborIndices)
        };
    }

    private Tensor<T> AggregateMean(Tensor<T> nodeFeatures, List<int> indices)
    {
        if (indices.Count == 0)
            return new Tensor<T>(new[] { _inputDim });

        // Compute mean of neighbor features
        var sum = new Tensor<T>(new[] { _inputDim });

        foreach (var idx in indices)
        {
            var features = nodeFeatures.GetRow(idx);
            sum = sum.Add(features);
        }

        return sum.Divide(NumOps<T>.FromDouble(indices.Count));
    }

    private Tensor<T> AggregatePool(Tensor<T> nodeFeatures, List<int> indices)
    {
        if (indices.Count == 0)
            return new Tensor<T>(new[] { _inputDim });

        // Transform each neighbor then take element-wise max
        var pooled = new T[_inputDim];

        for (int dim = 0; dim < _inputDim; dim++)
        {
            T maxValue = NumOps<T>.FromDouble(double.NegativeInfinity);

            foreach (var idx in indices)
            {
                var value = nodeFeatures[idx, dim];
                if (NumOps<T>.GreaterThan(value, maxValue))
                {
                    maxValue = value;
                }
            }

            pooled[dim] = maxValue;
        }

        return new Tensor<T>(pooled, new[] { _inputDim });
    }

    private Tensor<T> L2Normalize(Tensor<T> vector)
    {
        // Compute L2 norm
        double sumSquares = 0;
        var data = vector.ToArray();

        foreach (var value in data)
        {
            double v = Convert.ToDouble(value);
            sumSquares += v * v;
        }

        double norm = Math.Sqrt(sumSquares);
        if (norm < 1e-12)
            return vector;  // Avoid division by zero

        // Normalize
        var normalized = new T[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            normalized[i] = NumOps<T>.FromDouble(Convert.ToDouble(data[i]) / norm);
        }

        return new Tensor<T>(normalized, vector.Shape);
    }
}

Graph Representations and Operations

Graph Data Structure

public class Graph<T> where T : struct
{
    public Tensor<T> NodeFeatures { get; set; }  // (numNodes × featureDim)
    public Tensor<T> EdgeIndex { get; set; }     // (2 × numEdges) - edge list format
    public Tensor<T> EdgeWeights { get; set; }   // (numEdges) - optional edge weights
    public int NumNodes { get; set; }
    public int NumEdges { get; set; }

    /// <summary>
    /// Convert edge list to adjacency matrix.
    /// </summary>
    public Tensor<T> ToAdjacencyMatrix()
    {
        var adjacency = new Tensor<T>(new[] { NumNodes, NumNodes });

        for (int e = 0; e < NumEdges; e++)
        {
            int src = Convert.ToInt32(EdgeIndex[0, e]);
            int dst = Convert.ToInt32(EdgeIndex[1, e]);
            T weight = EdgeWeights != null ? EdgeWeights[e] : NumOps<T>.One;

            adjacency[src, dst] = weight;
            // For undirected graphs:
            // adjacency[dst, src] = weight;
        }

        return adjacency;
    }

    /// <summary>
    /// Get neighbors of a node.
    /// </summary>
    public List<int> GetNeighbors(int nodeIdx)
    {
        var neighbors = new List<int>();

        for (int e = 0; e < NumEdges; e++)
        {
            int src = Convert.ToInt32(EdgeIndex[0, e]);
            int dst = Convert.ToInt32(EdgeIndex[1, e]);

            if (src == nodeIdx)
                neighbors.Add(dst);
        }

        return neighbors;
    }
}

Graph Batch Processing

public class GraphBatch<T> where T : struct
{
    public List<Graph<T>> Graphs { get; set; } = new List<Graph<T>>();

    /// <summary>
    /// Combine multiple graphs into a single disconnected graph.
    /// </summary>
    public Graph<T> Collate()
    {
        int totalNodes = 0;
        int totalEdges = 0;

        // Count totals
        foreach (var graph in Graphs)
        {
            totalNodes += graph.NumNodes;
            totalEdges += graph.NumEdges;
        }

        // Allocate combined tensors
        int featureDim = Graphs[0].NodeFeatures.Shape[1];
        var nodeFeatures = new List<T[]>();
        var edgeList = new List<(int, int)>();

        int nodeOffset = 0;

        // Concatenate graphs
        foreach (var graph in Graphs)
        {
            // Add node features
            for (int i = 0; i < graph.NumNodes; i++)
            {
                nodeFeatures.Add(graph.NodeFeatures.GetRow(i).ToArray());
            }

            // Add edges (with offset)
            for (int e = 0; e < graph.NumEdges; e++)
            {
                int src = Convert.ToInt32(graph.EdgeIndex[0, e]) + nodeOffset;
                int dst = Convert.ToInt32(graph.EdgeIndex[1, e]) + nodeOffset;
                edgeList.Add((src, dst));
            }

            nodeOffset += graph.NumNodes;
        }

        // Create combined graph
        var combinedFeatures = new Tensor<T>(
            nodeFeatures.SelectMany(f => f).ToArray(),
            new[] { totalNodes, featureDim }
        );

        var combinedEdgeIndex = new Tensor<T>(new[] { 2, totalEdges });
        for (int e = 0; e < totalEdges; e++)
        {
            combinedEdgeIndex[0, e] = NumOps<T>.FromDouble(edgeList[e].Item1);
            combinedEdgeIndex[1, e] = NumOps<T>.FromDouble(edgeList[e].Item2);
        }

        return new Graph<T>
        {
            NodeFeatures = combinedFeatures,
            EdgeIndex = combinedEdgeIndex,
            NumNodes = totalNodes,
            NumEdges = totalEdges
        };
    }
}

Testing Strategy

Unit Tests

[TestClass]
public class GNNTests
{
    [TestMethod]
    public void GCNLayer_ForwardPass_CorrectShape()
    {
        // Arrange: Create simple graph (4 nodes, fully connected)
        var adjacency = CreateFullyConnectedGraph(numNodes: 4);
        var normalizedAdj = GraphNormalization.ComputeNormalizedAdjacency(adjacency);

        var nodeFeatures = CreateRandomTensor(4, 8);  // 4 nodes, 8 features
        var gcnLayer = new GCNLayer<double>(inputDim: 8, outputDim: 16, new ReLU<double>());

        // Act
        var output = gcnLayer.Forward(nodeFeatures, normalizedAdj);

        // Assert: Output shape is (4 × 16)
        Assert.AreEqual(4, output.Shape[0]);
        Assert.AreEqual(16, output.Shape[1]);
    }

    [TestMethod]
    public void GAT_AttentionWeights_SumToOne()
    {
        // Arrange
        var graph = CreateSimpleGraph();  // 3 nodes: 0-1, 1-2
        var gatLayer = new GATLayer<double>(inputDim: 4, outputDim: 8, numHeads: 2);

        // Act
        var output = gatLayer.Forward(graph.NodeFeatures, graph.ToAdjacencyMatrix());

        // Get attention weights for node 1 (has 2 neighbors)
        var alphas = gatLayer.GetLastAttentionWeights();

        // Assert: Attention weights for each node sum to 1
        for (int node = 0; node < 3; node++)
        {
            double sum = 0;
            for (int neighbor = 0; neighbor < 3; neighbor++)
            {
                sum += Convert.ToDouble(alphas[node, neighbor]);
            }
            Assert.AreEqual(1.0, sum, 1e-6);
        }
    }

    [TestMethod]
    public void GraphSAGE_Sampling_ReturnsCorrectCount()
    {
        // Arrange: Node with 10 neighbors
        var graph = CreateStarGraph(numNeighbors: 10);  // Center node + 10 neighbors
        var sageLayer = new GraphSAGELayer<double>(
            inputDim: 4, outputDim: 8, numSamples: 5
        );

        // Act
        var output = sageLayer.Forward(graph.NodeFeatures, graph.ToAdjacencyMatrix());

        // Assert: Should sample exactly 5 neighbors
        var sampledIndices = sageLayer.GetLastSampledNeighbors(centerNode: 0);
        Assert.AreEqual(5, sampledIndices.Count);
    }

    [TestMethod]
    public void GraphSAGE_Inductive_WorksOnNewNodes()
    {
        // Arrange: Train on graph A
        var trainGraph = CreateGraph(numNodes: 100);
        var sageModel = new GraphSAGE<double>(layerDims: new[] { 16, 32, 64 });
        sageModel.Train(trainGraph);

        // Act: Test on graph B (completely new nodes)
        var testGraph = CreateGraph(numNodes: 50);  // Different nodes!
        var embeddings = sageModel.Forward(testGraph);

        // Assert: Should produce embeddings for new nodes
        Assert.AreEqual(50, embeddings.Shape[0]);
        Assert.AreEqual(64, embeddings.Shape[1]);  // Output dimension
    }
}

Integration Tests

[TestClass]
public class GNNIntegrationTests
{
    [TestMethod]
    public void GCN_Cora_NodeClassification()
    {
        // Arrange: Load Cora citation network
        var (graph, labels, trainMask, testMask) = LoadCoraDataset();

        // Build 2-layer GCN
        var gcn = new GCN<double>(
            inputDim: graph.NodeFeatures.Shape[1],
            hiddenDim: 16,
            outputDim: 7,  // 7 classes in Cora
            numLayers: 2
        );

        // Act: Train
        gcn.Train(graph, labels, trainMask, epochs: 200);

        // Evaluate
        var predictions = gcn.Forward(graph);
        var accuracy = ComputeAccuracy(predictions, labels, testMask);

        // Assert: Should achieve > 70% accuracy
        Assert.IsTrue(accuracy > 0.70, $"Expected > 70%, got {accuracy:F2}%");
    }

    [TestMethod]
    public void GAT_TransductiveLearning_ConvergesCorrectly()
    {
        // Arrange
        var (graph, labels) = LoadPubMedDataset();
        var gat = new GAT<double>(
            inputDim: graph.NodeFeatures.Shape[1],
            hiddenDim: 8,
            outputDim: 3,
            numHeads: 8
        );

        // Act
        var losses = gat.TrainWithLogging(graph, labels, epochs: 100);

        // Assert: Loss should decrease
        Assert.IsTrue(losses.Last() < losses.First());
        Assert.IsTrue(losses.Last() < 0.5);
    }

    [TestMethod]
    public void GraphSAGE_ProteinProteinInteraction_InductiveLearning()
    {
        // Arrange: Load PPI dataset (inductive setting)
        var (trainGraphs, trainLabels) = LoadPPITrain();
        var (testGraphs, testLabels) = LoadPPITest();  // Completely new graphs!

        var sage = new GraphSAGE<double>(
            inputDim: 50,
            hiddenDims: new[] { 256, 256, 128 },
            numSamples: new[] { 25, 10 }
        );

        // Act: Train on training graphs
        sage.Train(trainGraphs, trainLabels, epochs: 100);

        // Test on completely new graphs (inductive)
        var predictions = sage.PredictBatch(testGraphs);
        var f1Score = ComputeF1Score(predictions, testLabels);

        // Assert: Should achieve > 0.5 F1 score
        Assert.IsTrue(f1Score > 0.5);
    }
}

Step-by-Step Implementation Guide

Phase 1: Graph Infrastructure (Week 1)

Step 1: Create Graph Data Structure

File: src/GraphNeuralNetworks/Graph.cs

(See Graph Representations section for full implementation)

Step 2: Implement Graph Utilities

File: src/GraphNeuralNetworks/GraphUtils.cs

public static class GraphUtils
{
    public static Tensor<T> ComputeNormalizedAdjacency<T>(Tensor<T> adjacency)
    {
        // See GraphNormalization.ComputeNormalizedAdjacency implementation
    }

    public static Tensor<T> EdgeListToAdjacencyMatrix<T>(Tensor<T> edgeIndex, int numNodes)
    {
        // Convert edge list format to adjacency matrix
    }

    public static List<int> GetKHopNeighbors<T>(Graph<T> graph, int nodeIdx, int k)
    {
        // Get all neighbors within k hops
    }
}

Phase 2: GCN Implementation (Week 2)

Step 3: Implement GCN Layer

File: src/GraphNeuralNetworks/Layers/GCNLayer.cs

(See GCN section for full implementation)

Step 4: Build Multi-Layer GCN

File: src/GraphNeuralNetworks/Models/GCN.cs

public class GCN<T> : NeuralNetwork<T> where T : struct
{
    private readonly List<GCNLayer<T>> _layers;
    private readonly Tensor<T> _normalizedAdjacency;

    public GCN(int inputDim, int hiddenDim, int outputDim, int numLayers = 2)
    {
        _layers = new List<GCNLayer<T>>();

        // Input layer
        _layers.Add(new GCNLayer<T>(inputDim, hiddenDim, new ReLU<T>()));

        // Hidden layers
        for (int i = 1; i < numLayers - 1; i++)
        {
            _layers.Add(new GCNLayer<T>(hiddenDim, hiddenDim, new ReLU<T>()));
        }

        // Output layer
        _layers.Add(new GCNLayer<T>(hiddenDim, outputDim, new Softmax<T>()));
    }

    public Tensor<T> Forward(Graph<T> graph)
    {
        var adjacency = graph.ToAdjacencyMatrix();
        var normalizedAdj = GraphUtils.ComputeNormalizedAdjacency(adjacency);

        var h = graph.NodeFeatures;

        foreach (var layer in _layers)
        {
            h = layer.Forward(h, normalizedAdj);
        }

        return h;
    }
}

Phase 3: GAT Implementation (Week 3)

Step 5: Implement Attention Head

File: src/GraphNeuralNetworks/Layers/AttentionHead.cs

(See GAT section for full implementation)

Step 6: Implement GAT Layer

File: src/GraphNeuralNetworks/Layers/GATLayer.cs

(See GAT section for full implementation)

Phase 4: GraphSAGE Implementation (Week 4)

Step 7: Implement Aggregators

File: src/GraphNeuralNetworks/Aggregators/IAggregator.cs

public interface IAggregator<T> where T : struct
{
    Tensor<T> Aggregate(Tensor<T> nodeFeatures, List<int> neighborIndices);
}

public class MeanAggregator<T> : IAggregator<T> where T : struct
{
    // See AggregateMean in GraphSAGE section
}

public class PoolAggregator<T> : IAggregator<T> where T : struct
{
    // See AggregatePool in GraphSAGE section
}

Step 8: Implement GraphSAGE Layer

File: src/GraphNeuralNetworks/Layers/GraphSAGELayer.cs

(See GraphSAGE section for full implementation)

Phase 5: Testing and Validation (Week 5-6)

Step 9: Create Test Datasets

File: tests/GraphNeuralNetworks/TestData/GraphDatasets.cs

public static class GraphDatasets
{
    public static (Graph<double>, Tensor<double>, bool[], bool[]) LoadCora()
    {
        // Load Cora citation network
        // Returns: (graph, labels, trainMask, testMask)
    }

    public static Graph<double> CreateKarateClub()
    {
        // Create Zachary's Karate Club graph (classic test case)
    }
}

Step 10: Run Comprehensive Tests

(See Testing Strategy section for complete test suite)

Summary

This guide provides:

GCN: Spectral graph convolutions with normalized aggregation
GAT: Attention-based neighbor aggregation for adaptive weighting
GraphSAGE: Inductive learning via sampling and aggregation
Graph Operations: Efficient representations and batch processing
Testing: Comprehensive unit and integration tests on real datasets

Key Differences:

GCN: Fast, simple, works well on homophilic graphs (similar neighbors)
GAT: Learns attention weights, better for heterophilic graphs
GraphSAGE: Inductive (generalizes to new nodes), sampling-based

Expected Timeline: 6 weeks for full implementation with extensive testing

Nov 07 '25 04:11 ooples

[Phase 3] Expand Multimodal AI Beyond CLIP and BLIP-2

Problem

Existing

Missing Implementations

Architecture

Success Criteria

Issue #405: Junior Developer Implementation Guide

Graph Neural Networks (GCN, GAT, GraphSAGE)

Table of Contents

Understanding Graph Neural Networks

What are Graph Neural Networks?

Why Graph Neural Networks?

The Core Idea: Message Passing

Visual Example

Graph Convolutional Networks (GCN)

Core Idea

Mathematical Foundation

1. Adjacency Matrix

2. Degree Matrix

3. Normalized Adjacency

4. GCN Layer Formula

5. Multi-Layer GCN

Implementation

Graph Normalization Helper

Graph Attention Networks (GAT)

Core Idea

Mathematical Foundation

1. Attention Mechanism

2. Normalize with Softmax

3. Aggregate with Attention

4. Multi-Head Attention

Implementation

GraphSAGE: Inductive Learning

Core Idea

Key Innovation: Sampling and Aggregating

Mathematical Foundation

1. Neighbor Sampling

2. Aggregation Functions

3. Combining with Self

4. Normalization

GraphSAGE Algorithm

Implementation

Graph Representations and Operations

Graph Data Structure

Graph Batch Processing

Testing Strategy

Unit Tests

Integration Tests

Step-by-Step Implementation Guide

Phase 1: Graph Infrastructure (Week 1)

Phase 2: GCN Implementation (Week 2)

Phase 3: GAT Implementation (Week 3)

Phase 4: GraphSAGE Implementation (Week 4)

Phase 5: Testing and Validation (Week 5-6)

Summary