How does an artificial intelligence actually learn?
The transmission of knowledge has always helped mankind become more efficient from generation to generation. The same holds in the world of artificial intelligence: knowledge learned in one place can be reused in another context. In this guest article, our AI engineer Manuel Hoffmann outlines how to turn a general AI into a specialized AI that answers very specific questions, in three simple steps.
How can an AI transfer what it has already learned to new use cases?
With neural networks, this happens quite simply in three steps. First, an already trained neural network is taken as the starting point for the knowledge transfer. Then a part of this model is frozen: its weights remain unchanged during further training, and it serves as the “knowledge base” for the rest of the model. In the last step, the remainder of the model is adapted to the new application with the help of the new data.
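The three steps above can be sketched in a few lines of PyTorch. The tiny two-layer network here is a hypothetical stand-in for a real pre-trained model; the point is the pattern of freezing one part and training only the new part:

```python
import torch
from torch import nn

# Stand-in for an already trained model (hypothetical, for illustration):
# a "backbone" whose learned weights we want to reuse.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())

# Step 2: freeze the backbone so its weights stay unchanged from here on.
for param in backbone.parameters():
    param.requires_grad = False

# Step 3: attach a fresh output layer for the new task (here: 3 classes)
# and optimize only its parameters on the new data.
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x, y = torch.randn(4, 8), torch.randint(0, 3, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```

After the backward pass, only the head has gradients; the frozen backbone receives none, so its “knowledge” is preserved.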
And what does knowledge transfer in the AI world look like in concrete terms?
Just as simple as it sounds! For this example, let’s assume that some images have been recorded for quality assurance in the production of metal plates. Defects in production take different forms and can be, for example, inclusions, scratches, and holes. The aim of the application is to classify the production defects into their appropriate category based on the images. Since this is an image processing task, we first search (e.g., on huggingface.co or in the torchvision package) for a suitable image processing model. We come across the VGG-19 model and decide to use it because of its straightforward architecture and high accuracy. The model has already been trained on the ImageNet dataset with over 14 million annotated images. In the second step, the model is divided into the so-called feature-extraction part and the output part. The feature extractor prepares the data and transfers it into a highly abstracted representation. Since feature extraction is independent of the specific task at the output of the neural network, this part of the model can remain unchanged. The model output, however, must be adapted to both the new task and the changed data. Since there are exactly six different production defects in the dataset, we adjust the output to exactly six classes. Now we can further train the adapted model on the images from quality assurance, and after a short time it already recognizes the correct defect class with convincing certainty.
Because we build on an already trained model, we achieve initial results quickly and with minimal effort. We also reduce the complexity of model development enormously, which makes our program code much leaner and easier to understand, and therefore much easier to maintain and develop further. And since we don’t have to train our model from scratch, we save much of the energy cost that training would otherwise incur.
Got it! But when does that make sense?
Always. Or at least whenever we can find a pre-trained model. Since there are now a great many pre-trained models from a wide range of domains, available under suitable licenses, this requirement is met in most use cases. Especially when we have only a few training examples or only partially annotated datasets available, it makes sense to build on what a pre-trained model has already learned.
To see the outlined example in action, we have attached the program code here for you to try out for yourself. Have fun diving into the world of AI.