Developing a Neural Network in Golang from Scratch: From Basics to XOR Solution
"To know is to build". (c) -Anonymous
In the world of machine learning, neural networks are powerful tools for solving a wide range of tasks, from image recognition to natural language processing and games. While there are many high-level libraries that simplify the creation and training of neural networks (e.g., TensorFlow, PyTorch, Keras), understanding how these networks work "under the hood" is invaluable.
This article is dedicated to creating a simple yet functional feedforward neural network in Go, using only the standard library. This approach will allow us to delve deeply into the mechanisms of forward and backward propagation, weight initialization, and training.
Why Golang and "From Scratch"?
Go is a compiled language known for its performance, ease of concurrency, and strong typing. These qualities make it an excellent choice for low-level implementations where control over performance is crucial. Developing a neural network from scratch in Go allows you to:
Deeply understand algorithms: You implement each mathematical step, which strengthens your understanding of neural network principles.
Learn Go: This is excellent practice for learning matrix operations, data structures, and working with pointers in Go.
Control and optimization: Full control over every aspect of the implementation allows for deep optimization when necessary.
Alternatives: Existing ML Libraries in Go
Before diving into the "from scratch" implementation, it's worth mentioning that for more complex and "production-ready" projects, there are specialized libraries that provide pre-built and optimized components:
Gorgonia: A powerful library for building computational graphs, very similar to TensorFlow or PyTorch. It allows you to define and train complex deep learning models.
GoMind: A simpler library for creating and training basic neural networks.
GoLearn: While primarily a general machine learning library (including clustering, classification algorithms, etc.), it also offers some capabilities for working with neural networks.
These libraries significantly speed up development, but for understanding the fundamentals, it's best to start "from scratch."
Architecture of Our Neural Network
We will build a Feedforward Neural Network (FFNN). This means that neurons in one layer are fully connected to all neurons in the next layer, and information flows in only one direction – from input to output.
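Concretely, each layer computes output = f(W * input + b), where W is the layer's weight matrix, b is its bias vector, and f is an activation function applied element-wise; this is exactly the computation the layer structure below implements.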
Layer Structure: NeuralNetworkLayer
In our implementation, we've combined the logic of the linear transformation (weighted sums and biases) and the activation function into a single NeuralNetworkLayer structure. This aligns with the conceptual understanding of a "layer" in most modern frameworks.
// NeuralNetworkLayer represents one fully connected layer with an activation function.
type NeuralNetworkLayer struct {
    InputSize  int
    OutputSize int
    Weights    [][]float64 // Weights[output_neuron_idx][input_neuron_idx]
    Biases     []float64

    ActivationFunc func(float64) float64
    DerivativeFunc func(float64) float64

    // Temporary values saved for backpropagation
    InputVector  []float64 // Input values to the layer (from the previous layer)
    WeightedSums []float64 // Values after the linear transformation (before activation)
    OutputVector []float64 // Output values after activation

    // Gradients for updating weights and biases
    WeightGradients [][]float64
    BiasGradients   []float64
    InputGradient   []float64 // Gradient passed to the previous layer
}
Weights and Biases: These are the trainable parameters of the layer. Weights is a matrix that multiplies the input vector, and Biases is a vector added to the result.
ActivationFunc and DerivativeFunc: Pointers to the activation function (e.g., ReLU, Sigmoid) and its derivative, which are applied to the output of the linear transformation.
Weight Initialization: He Initialization
One of the most critical aspects is proper weight initialization. It helps prevent vanishing or exploding gradient problems that can slow down or halt training. We use He initialization for layers with ReLU activation, while for other types of activations (e.g., Sigmoid), a standard initialization with a smaller scale is used:
// In the NewNeuralNetworkLayer function
for i := range weights {
    weights[i] = make([]float64, inputSize)
    for j := range weights[i] {
        // Initialize weights based on the activation function
        if activationName == "relu" {
            weights[i][j] = rand.NormFloat64() * math.Sqrt(2.0/float64(inputSize)) // He initialization
        } else {
            weights[i][j] = rand.NormFloat64() * math.Sqrt(1.0/float64(inputSize)) // More general approach
        }
    }
    biases[i] = 0.0
}
Here, rand.NormFloat64() generates a random number from a standard normal distribution, and the multiplier math.Sqrt(2.0/float64(inputSize)) scales it to maintain a stable variance of activations when using ReLU. For other activations, a multiplier of math.Sqrt(1.0/float64(inputSize)) is used.
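For context, the fragment above lives inside a constructor that also allocates the bias vector and wires the struct fields. Here is a minimal sketch of how the surrounding NewNeuralNetworkLayer could look; the allocation and field wiring are assumptions consistent with the struct above (it assumes the "math" and "math/rand" packages are imported), and the repository's actual constructor may differ:
// NewNeuralNetworkLayer (sketch): allocates weights and biases and initializes them.
func NewNeuralNetworkLayer(inputSize, outputSize int, activationName string) *NeuralNetworkLayer {
    weights := make([][]float64, outputSize)
    biases := make([]float64, outputSize)
    for i := range weights {
        weights[i] = make([]float64, inputSize)
        for j := range weights[i] {
            if activationName == "relu" {
                weights[i][j] = rand.NormFloat64() * math.Sqrt(2.0/float64(inputSize)) // He initialization
            } else {
                weights[i][j] = rand.NormFloat64() * math.Sqrt(1.0/float64(inputSize))
            }
        }
        biases[i] = 0.0
    }
    layer := &NeuralNetworkLayer{
        InputSize:  inputSize,
        OutputSize: outputSize,
        Weights:    weights,
        Biases:     biases,
    }
    // ActivationFunc and DerivativeFunc are selected from activationName here
    // (ReLU and Sigmoid are shown in the next section; "none" maps to a linear function).
    return layer
}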
Activation Functions: ReLU and Sigmoid
Our implementation supports both ReLU and Sigmoid.
ReLU (f(x)=max(0,x)) is widely used in hidden layers to introduce non-linearity and combat the vanishing gradient problem.
Sigmoid (f(x)=1/(1+e^{−x})) squashes values into the range of 0 to 1, often used in output layers for classification tasks where probabilities are needed.
// Example activation functions from tools.go
func Sigmoid(x float64) float64 {
    return 1.0 / (1.0 + math.Exp(-x))
}

func SigmoidDerivative(x float64) float64 {
    s := Sigmoid(x)
    return s * (1 - s)
}

func ReLU(x float64) float64 {
    return math.Max(0, x)
}

func ReLUDerivative(x float64) float64 {
    if x > 0 {
        return 1.0
    }
    return 0.0
}
For the output layer, a linear activation (denoted as "none" in our code) is typically used to obtain raw numerical values, such as Q-values in DQN or regression outputs.
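In the same function-pointer form as above, the linear activation and its derivative are trivial. A minimal sketch (the names Linear and LinearDerivative are illustrative; in the code this case is simply registered as "none"):
// Linear activation (identity), used for the output layer ("none" in our code).
func Linear(x float64) float64 {
    return x
}

// LinearDerivative is the derivative of the identity function: always 1.
func LinearDerivative(x float64) float64 {
    return 1.0
}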
Implementation Details: Forward and Backward Pass
Let's examine the key Forward and Backward methods of NeuralNetworkLayer, which form the core of the neural network computations.
Forward Method (Forward Pass)
The Forward method is responsible for computing the output of a layer based on given input data.
// Forward performs a forward pass through the layer (linear part + activation).
func (item *NeuralNetworkLayer) Forward(input []float64) []float64 {
    item.InputVector = input // Save input for use in Backward

    // 1. Linear transformation: multiply by weights and add biases
    item.WeightedSums = MultiplyMatrixVector(item.Weights, input)
    item.WeightedSums = AddVectors(item.WeightedSums, item.Biases)

    // 2. Activation: apply the activation function to the result of the linear transformation
    item.OutputVector = make([]float64, len(item.WeightedSums))
    for i := range item.WeightedSums {
        item.OutputVector[i] = item.ActivationFunc(item.WeightedSums[i])
    }
    return item.OutputVector
}
At each step:
The input vector (input) is multiplied by the layer's weight matrix (item.Weights).
The bias vector (item.Biases) is added to the result. This gives WeightedSums (the z-values).
The activation function (item.ActivationFunc) is applied to each element of WeightedSums, yielding the layer's final OutputVector.
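Forward relies on two small helpers, MultiplyMatrixVector and AddVectors, which live in tools.go but are not shown above. A minimal sketch of what they could look like, consistent with how they are called here (the actual implementations in the repository may differ):
// MultiplyMatrixVector computes result[i] = sum over j of matrix[i][j] * vector[j].
func MultiplyMatrixVector(matrix [][]float64, vector []float64) []float64 {
    result := make([]float64, len(matrix))
    for i, row := range matrix {
        sum := 0.0
        for j, w := range row {
            sum += w * vector[j]
        }
        result[i] = sum
    }
    return result
}

// AddVectors returns the element-wise sum of two vectors of equal length.
func AddVectors(a, b []float64) []float64 {
    result := make([]float64, len(a))
    for i := range a {
        result[i] = a[i] + b[i]
    }
    return result
}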
Backward Method (Backward Pass)
The Backward method computes the gradients of the loss with respect to the layer's weights, biases, and inputs. This is the foundation of the backpropagation algorithm.
// Backward performs a backward pass through the layer.
func (item *NeuralNetworkLayer) Backward(outputGradient []float64) []float64 {
    // 1. Gradient through the activation function:
    // apply the derivative of the activation function to the saved WeightedSums
    activationGradient := make([]float64, len(item.WeightedSums))
    for i := range item.WeightedSums {
        activationGradient[i] = item.DerivativeFunc(item.WeightedSums[i])
    }
    // Combine the gradient from the next layer with the activation gradient (element-wise multiplication)
    gradientAfterActivation := MultiplyVectors(outputGradient, activationGradient)

    // 2. Gradient for biases: equals the gradient after activation
    item.BiasGradients = gradientAfterActivation

    // 3. Gradient for weights: computed as the outer product
    // (gradientAfterActivation x InputVector)
    item.WeightGradients = OuterProduct(gradientAfterActivation, item.InputVector)

    // 4. Gradient for the input: the product of the transposed weight matrix
    // and the gradient after activation. This is the gradient passed to the previous layer.
    transposedWeights := TransposeMatrix(item.Weights)
    item.InputGradient = MultiplyMatrixVector(transposedWeights, gradientAfterActivation)
    return item.InputGradient
}
The main steps of Backward for NeuralNetworkLayer:
The gradient passing through the activation function is calculated: outputGradient (the gradient received from the next layer) is element-wise multiplied by the derivative of the activation function, evaluated at the WeightedSums.
The gradients for Biases are equal to this gradientAfterActivation.
The gradients for Weights are computed as the outer product of gradientAfterActivation and the layer's InputVector.
The gradient to be passed to the previous layer (InputGradient) is calculated as the product of the layer's transposed weight matrix and gradientAfterActivation.
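Backward uses three more helpers from tools.go that are not listed in the article: MultiplyVectors (element-wise product), OuterProduct, and TransposeMatrix. A minimal sketch of possible implementations, consistent with how they are called above (the repository versions may differ):
// MultiplyVectors returns the element-wise (Hadamard) product of two vectors.
func MultiplyVectors(a, b []float64) []float64 {
    result := make([]float64, len(a))
    for i := range a {
        result[i] = a[i] * b[i]
    }
    return result
}

// OuterProduct returns the matrix result[i][j] = a[i] * b[j].
func OuterProduct(a, b []float64) [][]float64 {
    result := make([][]float64, len(a))
    for i := range a {
        result[i] = make([]float64, len(b))
        for j := range b {
            result[i][j] = a[i] * b[j]
        }
    }
    return result
}

// TransposeMatrix returns the transpose of a rectangular matrix.
func TransposeMatrix(m [][]float64) [][]float64 {
    if len(m) == 0 {
        return nil
    }
    result := make([][]float64, len(m[0]))
    for j := range result {
        result[j] = make([]float64, len(m))
        for i := range m {
            result[j][i] = m[i][j]
        }
    }
    return result
}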
Overall Network Structure: NeuralNetwork
The neural network itself (NeuralNetwork
) is a collection of these layers:
// network.go
type NeuralNetwork struct {
    Layers []*NeuralNetworkLayer
}

...

// Predict performs a forward pass to obtain network predictions.
func (item *NeuralNetwork) Predict(input []float64) []float64 {
    output := input
    for _, layer := range item.Layers {
        output = layer.Forward(output)
    }
    return output
}

// Train performs one step of training the network.
// input: input data
// targetOutput: target output data (Q-values for training)
// learningRate: learning rate
func (item *NeuralNetwork) Train(input []float64, targetOutput []float64, learningRate float64) {
    // Forward pass (saving intermediate values)
    predictedOutput := item.Predict(input)

    // Calculate the gradient of the output (MSE loss derivative)
    // dLoss/dOutput = 2 * (predicted - target)
    outputGradient := make([]float64, len(predictedOutput))
    for i := range predictedOutput {
        outputGradient[i] = 2 * (predictedOutput[i] - targetOutput[i])
    }

    // Backward pass
    currentGradient := outputGradient
    for i := len(item.Layers) - 1; i >= 0; i-- {
        currentGradient = item.Layers[i].Backward(currentGradient)
    }

    // Updating weights
    for _, layer := range item.Layers {
        layer.Update(learningRate)
    }
}
Predict: The method for the forward pass. It simply passes the input data sequentially through each layer to get the final output.
Train: The method for training. It performs a forward pass, calculates the gradient of the loss function (MSE), then performs backpropagation by iterating through the layers in reverse order and computing the gradients for each layer. Finally, it updates the weights and biases of the layers.
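Train finishes by calling layer.Update(learningRate), a method not shown above. A minimal sketch of a plain gradient-descent update, assuming it simply applies the gradients stored by the most recent Backward call:
// Update applies one gradient-descent step to the layer's weights and biases
// using the gradients stored during the last Backward call.
func (item *NeuralNetworkLayer) Update(learningRate float64) {
    for i := range item.Weights {
        for j := range item.Weights[i] {
            item.Weights[i][j] -= learningRate * item.WeightGradients[i][j]
        }
        item.Biases[i] -= learningRate * item.BiasGradients[i]
    }
}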
Application: Solving the XOR Problem
The XOR (exclusive OR) problem is a classic test for neural networks because it is not linearly separable. A simple network without hidden layers (a linear model) cannot solve it. However, a network with one or more hidden layers can successfully tackle XOR.
XOR Data
The XOR problem takes two binary inputs (0 or 1) and outputs 1 if the inputs are different, and 0 if they are the same.
Inputs:
[0,0], [0,1], [1,0], [1,1]
Outputs:
[0], [1], [1], [0]
Architecture for XOR
To solve XOR, according to the provided source code (main.go), we use a network with 2 input neurons, 1 hidden layer with 2 neurons (with Sigmoid activation), and 1 output neuron (with linear activation):
NeuralNetwork := NewNeuralNetwork(2, []int{2}, 1, "sigmoid")
This minimal architecture with one hidden layer is necessary because XOR is not linearly separable; using a sigmoid in the hidden layer is a traditional choice for this task.
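For reference, here is a minimal sketch of what the NewNeuralNetwork constructor could look like for this call. The signature is taken from the call above, while the body (hidden layers with the requested activation, a linear "none" output layer) is an assumption consistent with the rest of the article:
// NewNeuralNetwork (sketch): builds hidden layers with the given activation
// and appends a linear ("none") output layer.
func NewNeuralNetwork(inputSize int, hiddenSizes []int, outputSize int, activationName string) *NeuralNetwork {
    network := &NeuralNetwork{}
    prevSize := inputSize
    for _, size := range hiddenSizes {
        network.Layers = append(network.Layers, NewNeuralNetworkLayer(prevSize, size, activationName))
        prevSize = size
    }
    network.Layers = append(network.Layers, NewNeuralNetworkLayer(prevSize, outputSize, "none"))
    return network
}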
Training Process
Training occurs iteratively:
We define the learningRate and the number of epochs.
In each epoch, we iterate through all XOR examples.
For each example (input, target), the NeuralNetwork.Train method is called, which adjusts the network's weights.
We track the total loss for each epoch to ensure it decreases, indicating that the network is learning.
// main.go
func main() {
    // Define XOR data
    // Inputs: [0,0], [0,1], [1,0], [1,1]
    // Outputs: [0], [1], [1], [0]
    xorInputs := [][]float64{
        {0.0, 0.0},
        {0.0, 1.0},
        {1.0, 0.0},
        {1.0, 1.0},
    }
    xorOutputs := [][]float64{
        {0.0},
        {1.0},
        {1.0},
        {0.0},
    }

    // Neural Network architecture for XOR:
    // 2 inputs, 1 hidden layer with 2 neurons, 1 output, sigmoid activation
    NeuralNetwork := neural.NewNeuralNetwork(2, []int{2}, 1, "sigmoid")

    // Test the non-trained network
    fmt.Println("Testing the non-trained network:")
    for i, input := range xorInputs {
        predictedOutput := NeuralNetwork.Predict(input)
        fmt.Printf("Input: %v, Expected: %v, Predicted: %.4f\n", input, xorOutputs[i], predictedOutput)
    }

    fmt.Println("Starting XOR training...")
    learningRate := 0.01
    epochs := 20000 // Number of training epochs

    for i := range epochs { // ranging over an int requires Go 1.22+
        totalLoss := 0.0
        for j := range xorInputs {
            input := xorInputs[j]
            target := xorOutputs[j]

            // Train the network on one XOR example
            NeuralNetwork.Train(input, target, learningRate)

            // Calculate current loss for monitoring (optional, but good practice)
            predicted := NeuralNetwork.Predict(input)
            loss := 0.0
            for k := range predicted {
                diff := predicted[k] - target[k]
                loss += diff * diff // MSE
            }
            totalLoss += loss
        }
        if (i+1)%1000 == 0 {
            fmt.Printf("Epoch %d, Average Loss: %.6f\n", i+1, totalLoss/float64(len(xorInputs)))
        }
    }
    fmt.Println("XOR training finished.")

    fmt.Println("Testing the trained network:")
    // Test the trained network
    for i, input := range xorInputs {
        predictedOutput := NeuralNetwork.Predict(input)
        fmt.Printf("Input: %v, Expected: %v, Predicted: %.4f\n", input, xorOutputs[i], predictedOutput)
    }
}
Console Output Example
During training, you will see the average loss gradually decrease. Note that when using sigmoid and the specified training parameters, the results might not be perfectly 0.0 or 1.0, but they will be close enough for logical classification:
Testing the non-trained network:
Input: [0 0], Expected: [0], Predicted: [-0.6342]
Input: [0 1], Expected: [1], Predicted: [-0.6208]
Input: [1 0], Expected: [1], Predicted: [-0.0719]
Input: [1 1], Expected: [0], Predicted: [-0.0617]
Starting XOR training...
Epoch 1000, Average Loss: 0.210077
Epoch 2000, Average Loss: 0.153441
Epoch 3000, Average Loss: 0.056236
Epoch 4000, Average Loss: 0.003917
Epoch 5000, Average Loss: 0.000094
Epoch 6000, Average Loss: 0.000002
Epoch 7000, Average Loss: 0.000000
...
Epoch 19000, Average Loss: 0.000000
Epoch 20000, Average Loss: 0.000000
XOR training finished.
Testing the trained network:
Input: [0 0], Expected: [0], Predicted: [0.0000]
Input: [0 1], Expected: [1], Predicted: [1.0000]
Input: [1 0], Expected: [1], Predicted: [1.0000]
Input: [1 1], Expected: [0], Predicted: [0.0000]
As the example shows, the predicted values are very close to the expected 0 or 1, indicating successful training.
If you are interested in the weights and biases before and after training, here they are:
// Before training
Biases 0 - [0 0]
Weights 0 - [[-1.5470222607067037 -0.01696838240061016] [0.24909017310045511 0.06488626663290908]]
Biases 1 - [0]
Weights 1 - [[-1.6581899752969296 0.389885131323495]]
// Biases and Weights after training:
Biases 0 - [0.3944580734523731 2.1331071307836806]
Weights 0 - [[-3.3592486686297476 -3.275219318130251] [-1.6500618206834206 -1.631091932264212]]
Biases 1 - [-0.7323529446320001]
Weights 1 - [[-3.3658966295356247 3.0679477637291073]]
Conclusion
We've explored how to build a basic feedforward neural network in Golang from scratch, including its architecture, activation functions, and weight initialization principles. By applying this network to solve the classic XOR problem, we've confirmed the functionality of our implementation.
In the next part of this article series, we will use this architecture as a foundation for creating a more complex system – a Deep Q-Network (DQN) based agent that can learn to play Tic-Tac-Toe independently, tackling the challenge of delayed rewards. Stay tuned for updates!
The full source code is available at the link:
https://github.com/andrey-matveyev/go-sample-neural