Core Concept Mathematics and Economics Published: December 19, 2025

Opening the Black Box: Neural Networks Explained

Abstract

You have probably heard of ChatGPT, or even used it already. ChatGPT is an interactive chatbot that can generate detailed answers to all kinds of questions. It can even write your school papers for you. But how does it know how to answer your questions in such detail? ChatGPT is built on neural networks, which are what allow the chatbot to answer your questions. We will explore these neural networks in this article.

What are Neural Networks?

Neural networks are used everywhere these days. They take an input and transform it, using an algorithm built on some fancy math, into a useful output. Chatbots like ChatGPT are based on neural networks. Netflix also uses a neural network to recommend new movies and series for you to watch: the input is the movies and series you have already watched, and the output is the unwatched movies and series that pop up in your recommendations. Detecting skin cancer is another example of how neural networks can be used. Some phone applications let you take a picture of a skin mole, and a neural network then estimates the risk that the mole is cancerous.

A Flowery Example

Imagine you see a flower of the iris plant, which has many subtypes. If you want to know which subtype it is, you could take a picture with Google Lens to get the answer. This app uses a kind of neural network. But how does such a network decide on the correct subtype of iris in your picture? In this article we focus on three very similar-looking subtypes: setosa, versicolor, and virginica. Their flowers have sepals and petals, as you can see in Figure 1A. We can measure the length and the width of these sepals and petals (in centimeters) and collect this information in a dataset. Figure 1B shows measurements from six flowers chosen randomly, two for each subtype, from the entire dataset of 150 observations. As you can see, the lengths and widths are not the same for each subtype. When working with neural networks, we call the characteristics of the flower (such as sepal length) features, while the subtypes are called categories. The idea is to use these features to figure out the category of an iris flower. We call this “predicting”.
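To make the words "features" and "categories" concrete, here is a minimal Python sketch of how such a dataset is often stored: one row of four feature values per flower, plus a separate list with the category of each flower. The six rows are taken from the widely used iris dataset (rows 1, 2, 51, 52, 101, and 102 of its standard version); we assume, but cannot confirm, that they are the flowers shown in Figure 1B. The variable names are our own.

```python
# Names of the four features (columns), all measured in centimeters.
FEATURE_NAMES = ["sepal length", "sepal width", "petal length", "petal width"]

# One row per flower (observation) with its four feature values.
# Assumed values: rows 1, 2, 51, 52, 101 and 102 of the classic iris dataset.
features = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]

# The category (subtype) of each flower, in the same order as the rows above.
categories = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

# Show how each row of features lines up with its category.
for row, label in zip(features, categories):
    described = ", ".join(f"{name} = {value} cm" for name, value in zip(FEATURE_NAMES, row))
    print(f"{label}: {described}")
```

"Predicting" then means: given only one row of `features`, produce the matching entry of `categories`.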

Image divided into three sections. a. Three photos of iris flowers labeled Virginica, Versicolor, and Setosa, showing petal and sepal parts. b. Data table with six observations detailing sepal and petal measurements for each species: Setosa, Versicolor, Virginica. c. Neural network diagram with input layer (sepal length, sepal width, petal length, petal width), hidden layer with four nodes, and output layer classifying Setosa, Versicolor, Virginica.

(Now try Exercise 1 in Box 1.)

Box 1 - Exercise 1:

(Throughout the article you will encounter exercises. You can try to answer these questions to check your understanding.) Try to discover some patterns in the flower dataset in Figure 1B. Focus on the petal and sepal widths. How are these features different for each subtype?

Creating a Neural Network

Neurons

To help you understand neural networks, we first need to talk about what a neural network consists of [1, 2]. Its most important components are called neurons (hence the name “neural network”), because their function was inspired by the nerve cells in the brain, which are also called neurons [3]. Figure 1C shows an overview of a neural network that could be used to analyze the flower data. The neurons are represented by circles. Every neural network has an input layer and an output layer. In the input layer (green), the number of neurons is always equal to the number of features, so in our case there are four neurons (N1, N2, N3, and N4). In the output layer (orange), the number of neurons always equals the number of categories: three in our case (N9, N10, and N11), corresponding to the three flower subtypes (now try Exercise 2 in Box 2).
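The two counting rules above can be written down directly. In this small sketch (the function and variable names are ours, and the hidden layer of four neurons is simply the choice made in Figure 1C), the sizes of the input and output layers are read off the data instead of being chosen by hand.

```python
FEATURE_NAMES = ["sepal length", "sepal width", "petal length", "petal width"]
CATEGORY_NAMES = ["setosa", "versicolor", "virginica"]


def layer_sizes(feature_names, category_names, hidden_neurons):
    """Return the number of neurons per layer for a network with one hidden layer."""
    n_input = len(feature_names)     # one input neuron per feature
    n_output = len(category_names)   # one output neuron per category
    return [n_input, hidden_neurons, n_output]


sizes = layer_sizes(FEATURE_NAMES, CATEGORY_NAMES, hidden_neurons=4)
print(sizes)  # [4, 4, 3]
```

Only the hidden layer is a free design choice; the other two sizes follow from the dataset.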

Box 2 - Exercise 2:

Another well-known example is to let a neural network predict handwritten digits from 0 to 9. How many output neurons would such a network have?

Weights and Biases

Neural networks also contain middle layers (purple), called hidden layers, with a certain number of neurons. These layers process the input to generate the output. A network can have as many hidden layers as you want, and each hidden layer can contain as many neurons as you like. For our data, one hidden layer with a relatively small number of neurons (four) works well. Each neuron in a layer is connected to every neuron in the next layer (shown by the arrows). For example, neuron N5 in the hidden layer receives connections from all four neurons in the input layer.

In practice, each of these connections is a number, and we call these numbers weights (w1, w2, etc.). So, the number of weights equals the number of connections. The purpose of these weights is to transform the input values (sepal length, sepal width, etc.) into different numbers, which helps the network decide which category the flower belongs to. You can also think of a weight as the strength of the connection between two neurons: a weight far from zero means that one neuron has a lot of influence on the neuron it is connected to. Each neuron, except the input neurons, also has another number called a bias (b1, b5, etc.) that further helps to determine the category. The bias tweaks the result from the weights to make the network more flexible. For example, a large positive bias pushes the neuron's result up, whatever the input values are (now try Exercise 3 in Box 3).
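To see what a neuron does with its weights and bias, here is a minimal sketch of a single hidden neuron such as N5: it multiplies each incoming value by the weight of that connection, adds everything up, and then adds its bias. Only the weight -0.09 (from N1 to N5) comes from Figure 2A; the other three weights are made-up numbers, the bias is zero as in an untrained network, and the flower measurements are assumed values for one setosa flower. Real networks usually also pass this sum through a so-called activation function, which this simplified sketch leaves out.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (no activation function)."""
    total = 0.0
    for value, weight in zip(inputs, weights):
        total += value * weight  # each connection scales its input by its weight
    return total + bias          # the bias shifts the result up or down


# Assumed measurements (cm): sepal length, sepal width, petal length, petal width.
flower = [5.1, 3.5, 1.4, 0.2]

# Weights from N1..N4 into N5. Only -0.09 is taken from Figure 2A;
# the other three values are hypothetical.
weights_into_n5 = [-0.09, 0.2, 0.1, -0.3]
bias_n5 = 0.0  # untrained networks in the article start with zero biases

print(round(neuron_output(flower, weights_into_n5, bias_n5), 3))  # 0.321
```

Changing the bias to 1.0 would raise N5's result by exactly 1.0 for every flower, which is what it means for the bias to shift a neuron's result regardless of its inputs.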

Box 3 - Exercise 3:

Try to figure out the total number of weights and biases for the network in Figure 1C.

Training a Network

How does the neural network determine the category from the features of the flower? For example, if we plugged in the width and length values from the first row in Figure 1B, how would the network know this is a setosa flower? To help you understand this, we need to talk about training and testing a neural network.

To start, we randomly separate the dataset into a training set and a test set. In general, the training set should be larger than the test set. More training data tend to lead to more accurate predictions. But there needs to be enough data to test the network on, so it is a bit of a trade-off. In our case, we can use 2/3 of the flowers (100) for the training set and the rest (50) for the test set (If you would like to read more about training a network, see this article).
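A random 100/50 split like the one described above could be sketched in plain Python as follows: shuffle the positions (indices) of the 150 flowers and cut the shuffled list after the first 100. The function name and the fixed seed are our own choices (the seed just makes the split repeatable); this is an illustration, not the code used for the article.

```python
import random


def train_test_split_indices(n_observations, n_train, seed=0):
    """Randomly split the indices 0..n_observations-1 into a training part and a test part."""
    indices = list(range(n_observations))
    rng = random.Random(seed)  # fixed seed: the same split on every run
    rng.shuffle(indices)       # random order, so each part is likely to mix all subtypes
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(150, 100)
print(len(train_idx), len(test_idx))  # 100 50
```

Every flower ends up in exactly one of the two sets, so the network is never tested on a flower it was trained on.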

When we first give the network the training set, it does not know the correct category for each flower, so it basically guesses. At this point we say that the network is untrained. The weights are random numbers, while the biases are zero (Figure 2A). We feed the network all the features of the 100 flowers from the training set and ask it to predict which type of flower each set of measurements comes from. In the beginning, the output will often be incorrect, and we tell the network when it is wrong. We can express how wrong the network is as a number, which is called the error. At the start of training the error will be large, but after each round the network adjusts its weights and biases to make the error smaller. Step by step, the network learns patterns in the data and starts to predict the correct category for each flower it is shown. For our network, the training stage took about 5 seconds, but for large datasets it can take hours or longer (now try Exercise 4 in Box 4).
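Real networks adjust all of their weights and biases at once, but the idea of "making the error smaller step by step" can be sketched with a single neuron that has one weight and one bias. In this hypothetical example the neuron looks only at petal width and should output 0 for setosa and 1 for virginica; the four training flowers are assumed values. We measure the error with the mean squared error and lower it with gradient descent, a standard method that the article itself does not name, so treat both choices as our assumptions.

```python
import random

# Hypothetical mini training set: petal width (cm) -> target output
# (0 = setosa, 1 = virginica). Values assumed for illustration.
petal_widths = [0.2, 0.2, 2.5, 1.9]
targets = [0.0, 0.0, 1.0, 1.0]
n = len(targets)


def mean_squared_error(w, b):
    """Average squared difference between the neuron's outputs and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(petal_widths, targets)) / n


rng = random.Random(1)
w = rng.uniform(-1.0, 1.0)  # random starting weight
b = 0.0                     # bias starts at zero
learning_rate = 0.1         # size of each adjustment step

errors = [mean_squared_error(w, b)]
for _ in range(200):
    # Slopes of the error with respect to w and b (gradient of the mean squared error).
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(petal_widths, targets)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(petal_widths, targets)) / n
    # Move w and b a small step in the direction that lowers the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    errors.append(mean_squared_error(w, b))

print(f"error before training: {errors[0]:.3f}")
print(f"error after training:  {errors[-1]:.3f}")
```

Each pass through the loop is one "round" of training: compute how the error would change, then nudge the weight and bias so that it goes down.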

Diagram showing two neural network architectures labeled a and b. Both have three layers with nodes: input (N1-N4), hidden (N5-N8), and output (N9-N11). Weights and biases differ between a and b, with N9 labeled Setosa, N10 Versicolor, and N11 Virginica. Each network processes four features: sepal length, sepal width, petal length, and petal width.
  • Figure 2 - (A) The network with the untrained weights and biases, shown in the cells of the table. The numbers describe the specific connections between two neurons. For example, N1 in the input layer has a weight of -0.09 that connects to N5 in the next layer. The final column has the biases belonging to the neurons. (B) The final weights and biases after training the network.

Box 4 - Exercise 4:

How often do you think an untrained network would guess the correct flower subtype at the start?

Testing a Network and Accuracy

After enough training, the error will become really small, and the network will settle on certain weights and biases (Figure 2B).

However, those numbers are pretty meaningless at first glance, which is why neural networks are often called black boxes. Still, the trained network will likely make the right predictions. There is one catch, though: the network has settled on weights and biases that work best for the training set. What would happen if we showed the network new iris flowers that it has never seen before? This is the ultimate test, so we call this “testing the network.” We do this using the test set of flower data that we created earlier. For this new data, we often calculate the accuracy of the network to measure its performance. The accuracy is the percentage of flowers that are predicted correctly. Often, the accuracy will not be 100%. For a simple problem like this one, values between 90 and 100% are considered pretty good (now try Exercise 5 in Box 5).
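The definition of accuracy translates directly into a few lines of Python. The function name and the short example lists below are hypothetical; multiply the result by 100 to get a percentage.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of observations whose predicted category matches the true category."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


true = ["setosa", "versicolor", "virginica", "virginica"]
pred = ["setosa", "versicolor", "versicolor", "virginica"]
print(accuracy(true, pred))  # 0.75
```

Here three of the four predictions are correct, so the accuracy is 3/4 = 0.75, or 75%.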

Box 5 - Exercise 5:

Imagine we have trained a network and want to determine its accuracy. We feed it 60 new flower observations, and the network correctly predicts the category of 54 observations. Calculate the accuracy for this network. (see Box 6 for the answers to the exercises).

It is finally time to use our network (from Figure 1C) to predict the categories for our test set. (You can play around with this yourself on the website listed under Additional Materials.) Figure 3 shows a snippet of the code used to create and run our network (you do not need to understand it). The bottom of the figure shows the results: a confusion matrix and the accuracy of the network. Do not get confused by the confusion matrix! It merely shows the correct and incorrect predictions. The rows refer to the true flower categories, while the columns represent the categories predicted by the network. So, all the correct predictions end up along the diagonal, while incorrect predictions appear in the other cells. Our network mistakenly classified two virginica flowers as versicolor, so it got 48 of the 50 test flowers right: an accuracy of 48/50 = 0.96, or 96%. Remember, our network had never seen these 50 flowers before. Pretty impressive, right?
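A confusion matrix can be built by counting how often each pair of (true category, predicted category) occurs. The minimal sketch below follows the article's convention of rows for true categories and columns for predictions. The function name and the short label lists are made up for illustration; they are not the article's test results.

```python
CATEGORIES = ["setosa", "versicolor", "virginica"]


def confusion_matrix(true_labels, predicted_labels, categories):
    """Rows = true category, columns = predicted category, cells = counts."""
    index = {name: i for i, name in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


true = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
pred = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

matrix = confusion_matrix(true, pred, CATEGORIES)
for name, row in zip(CATEGORIES, matrix):
    print(f"{name:>10}: {row}")

# Correct predictions sit on the diagonal; dividing by the total gives the accuracy.
correct = sum(matrix[i][i] for i in range(len(CATEGORIES)))
print("accuracy:", correct / len(true))
```

In this toy example one virginica flower lands in the versicolor column, the same kind of mistake the article's network made twice.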

Code snippet showing Python implementation for neural network classification using the Iris dataset. The code splits the dataset into training and test sets, defines a three-layer neural network, and predicts results on the test set. A confusion matrix displays predictions, with true and predicted labels for three classes: Setosa, Versicolor, and Virginica. The accuracy is 0.96, highlighted with a star.
  • Figure 3 - A snippet of the Python code used to run the neural network. It is okay if you do not understand the actual code snippet. The output is shown in the lower part of the figure: the confusion matrix and, in the lower left near the yellow star, the accuracy of the network.

Limitations of Neural Networks

As cool as neural networks may seem, they do have certain shortcomings. For one, we often say that these networks are black boxes. That means we cannot easily see how the features are used to predict the categories of the flowers. For example, it could be that a single feature, such as petal width, is very important for predicting the flower category, but the network would not be able to tell us this. Some other algorithms can explain how they reach their predictions. An example is multiple linear regression, which you can read about in this article.

Another issue is that the network’s performance heavily depends on the training data. The final weights and biases are based on the training data, and these weights and biases are then used to predict categories for the new test data. Imagine that the training data are not representative of real iris flowers and are therefore very different from the test data. For example, maybe for some reason the training data only came from flowers with very small sepals and petals in each category. If we train our network on this unusual training data, the weights and biases will be tuned to it. So, when we feed the trained network new, randomly chosen flowers (the test set), its performance will likely be poor.

What to Remember

Neural networks use input information about an object (the petal and sepal widths of a flower, for example) to predict something about that object (such as the flower subtype). The network has weights and biases that are updated during the training stage. During the test stage, the network can often predict the category of new observations that it has never seen before with good accuracy. So, getting back to our initial example, ChatGPT has learned patterns from a huge amount of text, and it can now take your message and predict a useful response!

Glossary

Neural Network: A complex function that creates an output based on the input.

Algorithm: A process that solves a specific problem based on a set of mathematical rules.

Dataset: A structured collection of data with the variables in the columns and the observations in the rows.

Bias: In the context of neural networks, a number attached to each neuron (except the input neurons) that helps the network to determine the correct category.

Error: A number to represent how wrong the network is when it predicts the categories.

Confusion Matrix: A table that shows the number of correct predictions on the diagonal while the incorrect predictions are in the other cells.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank several of our colleagues at Leiden University and at Karolinska Institutet as well as an anonymous child for their helpful comments on previous versions of this manuscript.

AI Tool Statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Additional Materials

https://python-kr7xiht74wpxndrfffbxzz.streamlit.app

Box 6 - Exercise answers

Exercise 1: To discover patterns, we calculate the averages for the two flowers of each subtype in Figure 1B. Setosa has the lowest petal width (0.2 cm on average) and the highest sepal width (3.25 cm on average). Versicolor has the second-highest petal width (1.45 cm on average) and the second-highest sepal width (3.2 cm on average). Virginica has the highest petal width (2.2 cm on average) and the lowest sepal width (3.0 cm on average).
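These averages can be checked with a short Python sketch. The six rows are assumed values (rows 1, 2, 51, 52, 101, and 102 of the standard iris dataset) that reproduce the averages above; we cannot confirm they are exactly the flowers in Figure 1B.

```python
from statistics import mean

# (subtype, sepal width, petal width) in cm; assumed values, see the lead-in.
flowers = [
    ("setosa", 3.5, 0.2),
    ("setosa", 3.0, 0.2),
    ("versicolor", 3.2, 1.4),
    ("versicolor", 3.2, 1.5),
    ("virginica", 3.3, 2.5),
    ("virginica", 2.7, 1.9),
]

averages = {}
for subtype in ["setosa", "versicolor", "virginica"]:
    sepal = [sw for name, sw, _ in flowers if name == subtype]
    petal = [pw for name, _, pw in flowers if name == subtype]
    averages[subtype] = (mean(sepal), mean(petal))
    print(f"{subtype}: sepal width {mean(sepal):.2f} cm, petal width {mean(petal):.2f} cm")
```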

Exercise 2: The number of output neurons always equals the number of categories. In the case of handwritten digits, these range from 0 to 9, so we end up with 10 categories.

Exercise 3: Focusing on the weights first, remember that every neuron in each layer is connected to every neuron in the next layer. So, we first multiply the number of input neurons by the number of neurons in the hidden layer: 4 × 4 = 16. Then we multiply the number of neurons in the hidden layer by the number of neurons in the output layer: 4 × 3 = 12. We add 16 and 12 to get a total of 28 weights. For the biases, recall that every neuron has a bias except the input neurons. The number of neurons in the hidden layer is 4 and the number of neurons in the output layer is 3: 4 + 3 = 7 biases.
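The same counting works for any network in which every neuron is connected to every neuron in the next layer. This sketch (the function name is ours) takes the number of neurons in each layer, input layer first.

```python
def count_weights_and_biases(layer_sizes):
    """Count connections (weights) and biases in a fully connected network."""
    weights = 0
    for n_from, n_to in zip(layer_sizes, layer_sizes[1:]):
        weights += n_from * n_to   # every neuron connects to every neuron in the next layer
    biases = sum(layer_sizes[1:])  # every neuron except the input neurons has a bias
    return weights, biases


print(count_weights_and_biases([4, 4, 3]))  # (28, 7)
```

Adding a second hidden layer of four neurons, [4, 4, 4, 3], would give 16 + 16 + 12 = 44 weights and 4 + 4 + 3 = 11 biases.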

Exercise 4: At the start, the weights are random numbers (and the biases are zero), so the network is basically "guessing" the categories. There are 3 categories in total, but for each flower only 1 category is correct. So, if the network guessed the category for 100 flowers, it would on average get the right category 1/3 of the time, which is for about 33 flowers.
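The 1/3 figure can be checked with a quick simulation in which a hypothetical "network" picks one of the three categories completely at random for each flower. The labels are made up: many flowers, split equally over the three subtypes.

```python
import random

CATEGORIES = ["setosa", "versicolor", "virginica"]


def random_guess_accuracy(true_labels, seed=0):
    """Accuracy of a 'network' that picks one of the three categories at random."""
    rng = random.Random(seed)
    guesses = [rng.choice(CATEGORIES) for _ in true_labels]
    correct = sum(1 for t, g in zip(true_labels, guesses) if t == g)
    return correct / len(true_labels)


# Hypothetical labels: 30,000 flowers, 10,000 of each subtype.
true_labels = CATEGORIES * 10_000
print(random_guess_accuracy(true_labels))  # a value close to 1/3
```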

Exercise 5: For the accuracy, we are interested in the number of flowers assigned to the correct category, out of the total number of flowers. This means that we should divide 54 by 60. Both numbers are multiples of 6 (6 × 9 = 54 and 6 × 10 = 60), so 54/60 = 9/10. The accuracy is therefore 0.9, or 90%.


References

[1] Menczer, F., Fortunato, S., and Davis, C. A. 2020. A First Course in Network Science. Cambridge, UK: Cambridge University Press.

[2] Newman, M. 2018. Networks, 2nd edn. Oxford, UK: Oxford University Press.

[3] Hebb, D. 1949. The Organization of Behavior. New York, NY: Wiley.