Ball and Penalty Mark Detection

Dataset

Our dataset consists of 182229 images from various games. The dataset is split into subfolders:

dataset
├─ classification
│     ├─ ball
│     ├─ penalty
│     └─ none
└─ regression

In the dataset directory, there are subdirectories for classification and regression. The classification subdirectory is further divided into ball, penalty, and none, one directory for each class of the neural network. The file names within these three directories are irrelevant; what matters is that each sample is placed in the directory of its class. The samples that provide the regression values for the ball's center and radius are stored in the regression folder. Here, the file name is important: the last three numbers in the file name encode the X-coordinate and the Y-coordinate of the ball's center and the radius of the ball in the image. Ideally, a copy of the regression directory should be placed in the classification/ball folder, as the samples from regression can also be used for classification.
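A minimal sketch of how the regression values might be read from a file name. The exact naming scheme is not given here, so the pattern below (three underscore-separated integers at the end of the name, as in `patch_14_17_6.png`) and the helper name `parse_regression_name` are assumptions made purely for illustration.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: <anything>_<x>_<y>_<radius>.<extension>
_TRAILING_NUMBERS = re.compile(r"(\d+)_(\d+)_(\d+)$")


def parse_regression_name(path):
    """Return (center_x, center_y, radius) read from the end of the file stem."""
    stem = Path(path).stem  # file name without directory and extension
    match = _TRAILING_NUMBERS.search(stem)
    if match is None:
        raise ValueError(f"no regression values found in file name: {path}")
    center_x, center_y, radius = (int(group) for group in match.groups())
    return center_x, center_y, radius
```

If the real file names use a different separator or order, only the regular expression needs to change.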

Data Pipeline

The dataset is generated from the directory structure, and image brightness is normalized. A sample consists of a 32 × 32 pixel grayscale image and a label with the following structure:

\[ (ClassNone, ClassPenalty, ClassBall, BallCenterX, BallCenterY, BallRadius, LossFunction) \]

\(ClassNone\), \(ClassPenalty\), and \(ClassBall\) are one-hot encoded, representing the respective class of the image. If the sample is a ball with a given center and radius, or if it was extracted from the regression directory, then \(BallCenterX\), \(BallCenterY\), and \(BallRadius\) contain the regression values. The \(LossFunction\) determines whether the sample will be treated as a classification problem (set to 0) or a regression problem (set to 1) during training. If the sample does not contain a center point or does not come from the regression directory, then the center and radius are set to 0.
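The label layout can be sketched as a small helper. The function name `make_label` and the rule that any sample with known ball geometry gets `LossFunction = 1` are assumptions for illustration; the text only states the meaning of the individual fields.

```python
# Order matches (ClassNone, ClassPenalty, ClassBall).
CLASSES = ("none", "penalty", "ball")


def make_label(class_name, ball=None):
    """Build the 7-element label for one sample.

    ball is an optional (center_x, center_y, radius) tuple. When it is given,
    the sample is flagged for regression (assumed behavior).
    """
    if class_name not in CLASSES:
        raise ValueError(f"unknown class: {class_name}")
    one_hot = [1.0 if name == class_name else 0.0 for name in CLASSES]
    if ball is None:
        # No geometry available: center and radius are 0, flag is 0.
        return one_hot + [0.0, 0.0, 0.0, 0.0]
    center_x, center_y, radius = ball
    return one_hot + [float(center_x), float(center_y), float(radius), 1.0]
```

For example, `make_label("penalty")` yields a pure classification label, while `make_label("ball", (14, 17, 6))` carries the regression values as well.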

Before training, data augmentation will be applied. In real-world usage, the network will encounter images that differ significantly from the training dataset. Data augmentation modifies the training dataset so that the network is better adapted to variations. For each sample in the training dataset, Gaussian noise is added, followed by one of several possible additional augmentations applied at random. These augmentations can include multiplicative noise, random brightness adjustments, and motion blur. An example looks like:

[Figure: a patch and its augmented version]
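The augmentation step described above could look roughly like the following NumPy sketch. All magnitudes (noise levels, brightness range, blur length), the horizontal blur direction, and the assumption that pixel values lie in [0, 1] are illustrative choices, not values taken from the actual pipeline.

```python
import numpy as np


def augment(patch, rng):
    """Augment one 32 x 32 grayscale patch with values in [0, 1]."""
    # Gaussian noise is always applied first.
    out = patch + rng.normal(0.0, 0.02, size=patch.shape)

    # Then one additional augmentation is chosen at random.
    choice = rng.integers(3)
    if choice == 0:
        # Multiplicative noise: scale every pixel by a factor close to 1.
        out = out * rng.normal(1.0, 0.1, size=patch.shape)
    elif choice == 1:
        # Random brightness: add a single offset to the whole patch.
        out = out + rng.uniform(-0.2, 0.2)
    else:
        # Motion blur: average each pixel with its horizontal neighbors.
        kernel = np.ones(5) / 5.0
        out = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, out
        )
    return np.clip(out, 0.0, 1.0).astype(np.float32)
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the augmentation reproducible, which helps when comparing training runs.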

Loss Function

The loss is calculated as follows:

\[ Loss = Loss_{classifier}(y_{true}, y_{pred}) + Loss_{regression}(y_{true}, y_{pred}) \]

Since both classification and regression are performed, the total loss is the sum of the classification loss and the regression loss.

\[ Loss_{classifier} = \mbox{Categorical Cross Entropy}(y_{true}, y_{pred}) \]

The classification loss is determined using categorical cross-entropy. It is important to note that \(y_{true}\) and \(y_{pred}\) refer to the first three elements of the label and prediction, respectively, which have the following structure: \((ClassNone, ClassPenalty, ClassBall)\).

\[ Loss_{regression} = \mbox{Mean Squared Error}(y_{true}, y_{pred}) \]

The regression loss is calculated using the mean squared error. In this case, \(y_{true}\) and \(y_{pred}\) refer to the regression values of the label and the last three elements of the prediction, structured as: \((BallCenterX, BallCenterY, BallRadius)\).
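As a worked illustration, the combined loss for a single sample can be written in NumPy as below. How the \(LossFunction\) flag gates the two terms during training is not spelled out here, so this sketch simply sums both terms as in the equation; the function name `combined_loss` and the clipping constant `eps` are assumptions.

```python
import numpy as np


def combined_loss(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy plus mean squared error for one sample.

    y_true: 7-element label (the trailing LossFunction flag is ignored here).
    y_pred: 6-element output (3 class probabilities, 3 regression values).
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)

    # Classification part: (ClassNone, ClassPenalty, ClassBall).
    probabilities = np.clip(y_pred[:3], eps, 1.0)
    loss_classifier = -np.sum(y_true[:3] * np.log(probabilities))

    # Regression part: (BallCenterX, BallCenterY, BallRadius).
    loss_regression = np.mean((y_true[3:6] - y_pred[3:6]) ** 2)

    return loss_classifier + loss_regression
```

For a ball label `[0, 0, 1, 14, 17, 6, 1]` and a prediction `[0.1, 0.1, 0.8, 13, 17, 7]`, the classification term is \(-\ln 0.8\) and the regression term is \((1 + 0 + 1) / 3\).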

Structure of the Neural Network

The network is based on an encoder-decoder structure. The encoder is shared, and its output is fed into two different decoders. The first decoder classifies the patch, while the second predicts the ball's center coordinates and radius. Currently, the coordinates and the radius are only used if the patch is classified as a ball.

| Layer | Output Shape | Number of Parameters (layer / BatchNorm) |
|---|---|---|
| Input Layer (x) | 32 × 32 × 1 | 0 |
| Conv2D + BatchNorm + ReLU | 32 × 32 × 8 | 72/32 |
| Max Pooling | 16 × 16 × 8 | 0 |
| Dropout (0.1) | 16 × 16 × 8 | 0 |
| Conv2D + BatchNorm + ReLU | 16 × 16 × 16 | 1152/64 |
| Max Pooling | 8 × 8 × 16 | 0 |
| Dropout (0.1) | 8 × 8 × 16 | 0 |
| Conv2D + BatchNorm + ReLU | 8 × 8 × 16 | 2304/64 |
| Max Pooling | 4 × 4 × 16 | 0 |
| Dropout (0.1) | 4 × 4 × 16 | 0 |
| Conv2D + BatchNorm + ReLU | 4 × 4 × 32 | 4608/128 |
| Max Pooling | 2 × 2 × 32 | 0 |
| Dropout (0.1) | 2 × 2 × 32 | 0 |
| Flatten (y) | 128 | 0 |
| Dense + BatchNorm + ReLU | 32 | 4096/128 |
| Dropout (0.25) | 32 | 0 |
| Dense + BatchNorm + ReLU | 64 | 2048/256 |
| Dropout (0.25) | 64 | 0 |
| Dense + BatchNorm + ReLU | 16 | 1024/64 |
| Dense + Softmax | 3 | 51 |
| Flatten (z) | 128 | 0 |
| Dense + BatchNorm + ReLU | 32 | 4096/128 |
| Dropout (0.25) | 32 | 0 |
| Dense + BatchNorm + ReLU | 64 | 2048/256 |
| Dropout (0.25) | 64 | 0 |
| Dense + BatchNorm | 3 | 192 |
| Concatenate (y, z) | 6 | 0 |
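The paired parameter counts in the table (for example 72/32) can be reproduced with a little arithmetic. The sketch below assumes 3 × 3 convolution kernels, no bias term in layers that are followed by batch normalization, and four parameters per channel for batch normalization (scale, offset, moving mean, moving variance). These assumptions are inferred from the listed numbers and are not stated in the text.

```python
def conv_params(in_channels, out_channels, kernel=3):
    # Weights of a kernel x kernel convolution without bias.
    return kernel * kernel * in_channels * out_channels


def dense_params(in_units, out_units, bias=False):
    # Weights of a fully connected layer, optionally with one bias per unit.
    return in_units * out_units + (out_units if bias else 0)


def batchnorm_params(channels):
    # Scale, offset, moving mean, and moving variance per channel.
    return 4 * channels


# (input channels, output channels) of the four encoder blocks.
encoder = [(1, 8), (8, 16), (16, 16), (16, 32)]
encoder_counts = [(conv_params(i, o), batchnorm_params(o)) for i, o in encoder]

# (input units, output units) of the hidden dense layers in the classifier.
classifier = [(128, 32), (32, 64), (64, 16)]
classifier_counts = [(dense_params(i, o), batchnorm_params(o)) for i, o in classifier]

# Final layers of the two decoders.
softmax_count = dense_params(16, 3, bias=True)
regression_output_count = dense_params(64, 3)
```

Under these assumptions, the first number of each pair is the convolution or dense weight count and the second is the batch normalization count.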

An algorithmic approach is initially used to calculate potential candidate positions for balls and penalty marks. However, since this method is unreliable, the neural network is used to improve the accuracy of detection. For each spot identified by the algorithm, a 32 × 32 pixel image patch is extracted from the area surrounding the spot. This patch is then passed through the neural network.
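Extracting the patch around a candidate spot might look like the sketch below. It assumes a 2-D grayscale image as a NumPy array, a fixed 32 × 32 crop at the original resolution, and edge replication for spots near the image border; the function name `extract_patch` is hypothetical, and the real system may scale or sample the patch differently.

```python
import numpy as np

PATCH_SIZE = 32


def extract_patch(image, center_x, center_y, size=PATCH_SIZE):
    """Cut a size x size patch centered on (center_x, center_y)."""
    half = size // 2
    # Replicate edge pixels so spots near the border still give a full patch.
    padded = np.pad(image, half, mode="edge")
    # After padding, image pixel (x, y) sits at (x + half, y + half), so the
    # patch that starts half a patch before the spot begins at (x, y).
    top, left = center_y, center_x
    return padded[top:top + size, left:left + size]
```

Each extracted patch would then be brightness-normalized in the same way as the training data before it is passed to the network.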


Last update: October 14, 2024