At Mosquito Alert we make our data accessible to other researchers and developers. Now we are going a step further and opening our collection of tagged photographs, which have been validated by our team of expert entomologists. In total, the collection holds almost 20,000 photographs of mosquitoes and breeding places, all of them taken by citizens and sent through the Mosquito Alert app since 2014. If you are interested, they are available on the GitLab platform. The database will be updated annually with new images.
You can download all the images together with an accompanying file that lists, for each image, its code, the date, the location where it was taken, how the participant classified it, and the validation categories assigned by the experts.
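As a minimal sketch of how such a file could be explored, the Python snippet below parses a small inline sample and compares the participants' labels with the experts' labels. The column names (`image_code`, `date`, `lat`, `lon`, `participant_label`, `expert_label`) and the sample rows are assumptions made for illustration only; the real file may be named and structured differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: the column names and values are illustrative
# assumptions, not the real layout of the Mosquito Alert metadata file.
SAMPLE = """image_code,date,lat,lon,participant_label,expert_label
img_001,2016-07-14,41.39,2.17,tiger mosquito,tiger mosquito
img_002,2017-08-02,41.98,2.82,tiger mosquito,other species
img_003,2018-06-21,39.47,-0.38,breeding place,breeding place
"""


def load_records(text):
    """Parse CSV text into a list of dictionaries, one per image."""
    return list(csv.DictReader(io.StringIO(text)))


def count_expert_labels(records):
    """Count how many images the experts assigned to each category."""
    return Counter(row["expert_label"] for row in records)


def agreement_rate(records):
    """Fraction of images where participant and expert labels match."""
    matches = sum(r["participant_label"] == r["expert_label"] for r in records)
    return matches / len(records)


records = load_records(SAMPLE)
label_counts = count_expert_labels(records)
print(label_counts)
print(f"participant/expert agreement: {agreement_rate(records):.2f}")
```

With a real download, `SAMPLE` would be replaced by the contents of the metadata file, and the column names adjusted to whatever that file actually uses.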
These correctly classified images are a great resource for image recognition programs that explore the problems of identifying and classifying objects in an image. Mosquito Alert provides a large amount of data with which an artificial intelligence algorithm can be “trained”, adjusted and improved, in what is known as machine learning.
How is a machine taught to recognize objects?
Today there is widespread demand for advanced artificial intelligence systems capable of processing the large amounts of data that are generated continuously. This is happening in all sectors, since many activities can benefit from intelligent, automated analysis of data.
One of the keys to artificial intelligence is learning. Increasingly, programmers ask machines to learn by themselves, since it is practically impossible to pre-program instructions that account for the countless combinations of situations that occur in the real world. Machines therefore need to learn from their own experience, a strategy called machine learning.
Pattern recognition and image interpretation are among the most promising tasks of machine learning. This technology can be used to verify a user's identity by recognizing their face or fingerprint, to help diagnose diseases by identifying pathologies in medical images, to recognize handwritten or printed characters, and to interpret photographs of all kinds.
However, it is not easy for a computer to identify an object visually. The process involves reproducing skills that humans have. If somebody shows us a photo of a dog, whatever the breed, we can tell that there is a dog in the picture, whether the photo is taken from one angle or another, whether the animal is seen from the side or from behind, from above or from below, and even if the animal does not appear whole in the photo. From a detail of its face, its silhouette or any other feature, we are able to recognize that it is a dog. Computer scientists want to make a computer capable of doing the same.
To do this, they develop algorithms that allow the machine to learn from its experience. This learning process is very reminiscent of how we ourselves learn as children, based on positive and negative reinforcement. In machine learning systems, as with children, behaviors that are rewarded become more likely to recur, while behaviors that are punished are repeated less and tend to disappear.
This is what is called supervised learning, which requires people to keep telling the machine whether the identification it has made is right or wrong. In all these cases, it is humans who know, before the machine does, what the properties of the object to be classified are, and so they must label many images for the machine to learn from.
In order for algorithms to improve their ability to correctly identify and interpret images, they must have a huge set of training images provided by humans. From this training experience, the algorithms will be able to generalize and begin to classify the images without human intervention.
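The right-or-wrong feedback loop described above can be sketched with a perceptron, one of the simplest supervised learning algorithms. It is used here purely as an illustration, not as the method of any particular image recognition system. The two features (`stripe contrast` and `dark body fraction`, both scaled to the range 0 to 1) and the tiny labeled data set are invented for the example.

```python
# Each example is (features, label). The features are hypothetical numbers
# standing in for measurements a person extracted from a photo:
# (stripe contrast, dark body fraction). Label 1 = "tiger mosquito", 0 = not.
TRAINING_SET = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.7), 1),
    ((0.2, 0.3), 0),
    ((0.1, 0.2), 0),
    ((0.3, 0.1), 0),
]


def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=100, learning_rate=0.1):
    """Adjust the weights only when the human label shows the guess was wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 when correct
            if error != 0:
                mistakes += 1
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(TRAINING_SET)
training_accuracy = sum(
    predict(weights, bias, features) == label
    for features, label in TRAINING_SET
) / len(TRAINING_SET)
print(f"training accuracy: {training_accuracy:.2f}")

# Generalization: classify an example that was not in the training set.
print(predict(weights, bias, (0.85, 0.75)))
```

Once training stops producing mistakes, the learned weights can be applied to new, unlabeled examples, which is the step the text calls classifying without human intervention. Note that in this sketch a person still had to decide which features to measure.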
Deep learning or deep neural learning
The most promising current approach is what is known as deep learning or deep neural learning, a process inspired by human thought. In more traditional approaches, humans must extract the characteristics of the object that the computer is to recognize, so the result depends entirely on the programmer's ability to define a set of characteristics that captures, for example, what a “tiger mosquito” is. The advantage of deep learning is that the program itself builds the feature set without human supervision, which is faster and generally more accurate (Fig. 1).
For it to work, the program needs a lot of training data: many previously labeled images that say whether each one shows a “tiger mosquito” or “no tiger mosquito”. With these first labels the program creates a set of characteristics for “tiger mosquito” and builds a predictive model. In this first step the computer might predict that anything with two wings and black and white spots should be labeled “tiger mosquito”. Without being aware of it, it has generated labels such as “wings”, “black body” and “white spots”, and it looks for patterns of pixels in the digital data that have those properties. With each new image, the model adds new “categories”, building its own “concept” of a tiger mosquito, which becomes more complex and precise and improves the predictive model.
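The idea of a program working out its own features can be sketched with a very small neural network that has a single hidden layer. This is only a toy under stated assumptions: the four-pixel “images”, their labels, the network size and the training settings are all invented, and real deep learning systems for photographs use far larger networks and dedicated libraries. The hidden units play the role of the features the program builds for itself from raw pixels.

```python
import math
import random

# Toy "images": four pixel intensities in [0, 1]. Label 1 marks a striped
# (alternating dark/light) pattern, label 0 a roughly uniform one.
# Both the data and the labels are invented for illustration.
IMAGES = [
    ([0.9, 0.1, 0.9, 0.1], 1),
    ([0.1, 0.9, 0.1, 0.9], 1),
    ([0.8, 0.2, 0.9, 0.2], 1),
    ([0.5, 0.5, 0.5, 0.5], 0),
    ([0.9, 0.9, 0.8, 0.9], 0),
    ([0.1, 0.2, 0.1, 0.1], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, pixels):
    """Return (hidden activations, predicted probability of label 1)."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [
        sigmoid(b + sum(w * x for w, x in zip(row, pixels)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    prob = sigmoid(b_out + sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, prob


def mean_loss(params, data):
    """Average cross-entropy between predictions and the human labels."""
    total = 0.0
    for pixels, label in data:
        _, p = forward(params, pixels)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(data)


def train(data, n_hidden=3, epochs=2000, learning_rate=0.3, seed=0):
    """Full-batch gradient descent on a network with one hidden layer."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    params = (w_hidden, b_hidden, w_out, b_out)
    initial_loss = mean_loss(params, data)
    for _ in range(epochs):
        # Accumulate gradients over the whole data set before updating.
        g_wh = [[0.0] * n_in for _ in range(n_hidden)]
        g_bh = [0.0] * n_hidden
        g_wo = [0.0] * n_hidden
        g_bo = 0.0
        for pixels, label in data:
            hidden, p = forward(params, pixels)
            d_out = p - label  # cross-entropy gradient at the output unit
            g_bo += d_out
            for j in range(n_hidden):
                g_wo[j] += d_out * hidden[j]
                d_hid = d_out * w_out[j] * hidden[j] * (1 - hidden[j])
                g_bh[j] += d_hid
                for i in range(n_in):
                    g_wh[j][i] += d_hid * pixels[i]
        scale = learning_rate / len(data)
        for j in range(n_hidden):
            w_out[j] -= scale * g_wo[j]
            b_hidden[j] -= scale * g_bh[j]
            for i in range(n_in):
                w_hidden[j][i] -= scale * g_wh[j][i]
        b_out -= scale * g_bo
        params = (w_hidden, b_hidden, w_out, b_out)
    return params, initial_loss, mean_loss(params, data)


params, initial_loss, final_loss = train(IMAGES)
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Nobody tells the network what a “stripe” is. It only receives pixels and labels, and the hidden layer adjusts itself to whatever combinations of pixels help reduce the error on the labeled examples.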
Currently, the main brake on the power of these algorithms is the training process itself, because it requires a good database of images that have been previously labeled by people. Without this, the algorithms cannot learn effectively.
This is where the data offered by Mosquito Alert become valuable, since they comprise thousands of images of mosquitoes and breeding places that have been validated and labeled by experts (Fig. 2). They are valuable material for training algorithms.
Mosquito Alert, a 100% Open Science project
Mosquito Alert wants the science it produces to be reused by other researchers and citizens, and so it offers all its data openly and accessibly. The software used in the Mosquito Alert application is free and open source, distributed under a license that allows anyone to use, change and improve the software and to redistribute it, in either its modified or its original form.
All the data collected through the application are made public, both on an interactive online map, from which you can download the data, and on the Zenodo platform, where they are distributed under the CC0 (Creative Commons 1.0 Universal) license.
Now it is opening its data further, offering its entire set of tagged photos so that they can be used in other lines of research and thus contribute to artificial intelligence and machine learning projects.
Mosquito Alert shares all its data and software in keeping with its Open Science philosophy. Open science is a paradigm shift in the way of doing science. The motivations and goals of science do not change, but its methods do: what changes is not what is done but how it is done. The main objective of this movement is to make science open, collaborative and carried out “with and for” society.
In the context of open science, what is opened is every result of the research: the results published in scientific papers, the original data, and the tools and instruments used. This implies that the programming code used is also made public.
Offering all this information makes the whole process more transparent, since it allows other researchers to review it in full, as well as to reuse the data and code to carry out their own investigations.
In this way, data and research can be given a new use, to the benefit of the whole scientific community and, with it, the whole of society. Mosquito Alert wants its science to be reused by anyone, offering its data openly and allowing their reuse, while also making it possible to verify and reproduce its results.