Top of its class for automated plankton identification
8 October 2020
There are tens of thousands of marine and freshwater plankton species around the world. Together, they represent a staggeringly diverse collection of organisms that differ in their size, shape and ecological function. It is important to monitor which species are present where and in what numbers, as this can give a good indication of how an ecosystem is functioning. Monitoring information can also be used to highlight subtle changes that may be occurring, such as shifts in plankton distributions. A plankton species that appears or disappears in a given area can alter the associated food web, and organisms further up the food chain may need to adapt their foraging behaviour or find a new source of food. Furthermore, some species of plankton can produce so-called Harmful Algal Blooms (HABs) with potentially wide-ranging implications, including fish die-offs. In turn, these events can impact activities such as commercial fishing and recreation.
Traditionally, microscope analysis has been used to identify the different plankton species within water samples. This is a highly skilled role that is time-consuming and labour-intensive. Automated imaging techniques speed up this process but can also lead to the accumulation of huge numbers of plankton images that, if not processed quickly, can result in a bottleneck in analysis. Given these restrictions, and recent advancements in artificial intelligence and deep learning, considerable effort has been applied to automating the various steps involved in species classification. However, previous attempts, which have used a variety of techniques including the application of single Deep Learning models, have suffered from accuracy issues, especially for rarer species.
A group of scientists from Plymouth Marine Laboratory (PML) and the Universities of Exeter and Glasgow has demonstrated that bringing multiple Deep Learning models together to work as a team within a single classification system significantly improves classification accuracy compared with the performance of individual models, again especially for rarer species.
They began the work by collecting live plankton from Station L4 in the Western English Channel, which were analysed using the automated plankton imaging system FlowCam (see below for more detail on FlowCam). Digital image libraries were compiled containing 11,371 particles covering 104 taxonomic groups. However, the image set contained a severe class imbalance, with some species groups (taxa) represented by more than 600 images while other, rarer taxa were represented by just 14.
Thomas Kerr, lead author of the paper and PhD student at PML, commented: "Sampling plankton from the natural environment can result in a severe (class) imbalance, where rare species are sampled infrequently. Consequently, attempting to construct an automated plankton classification system remains a challenging problem since there is not enough training data from which to learn. In this work we introduce the idea of using collaborative Deep Learning models to help address this issue. By training a function that teaches different unique deep learning models to work collaboratively, the final classification model was able to significantly improve results in difficult classes."
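The idea of several models working as a team can be illustrated with a small sketch. This is not the method from the paper: it shows weighted soft voting, a simpler and well-known way of combining classifiers, and every number in it (the three models' outputs and the per-class weights) is invented for illustration. It assumes each model returns a probability per taxon and that the weights reflect how much each model is trusted for each taxon.

```python
# Hypothetical sketch of weighted soft voting across several classifiers.
# NOT the paper's method; all model outputs and weights are made-up values.

def combine(probabilities, weights):
    """Combine per-model class probabilities using per-model, per-class weights.

    probabilities[m][c]: probability that model m assigns to class c
    weights[m][c]:       trust placed in model m when it votes for class c
    Returns (index of the winning class, normalised combined scores).
    """
    n_classes = len(probabilities[0])
    scores = [0.0] * n_classes
    for model_probs, model_weights in zip(probabilities, weights):
        for c in range(n_classes):
            scores[c] += model_weights[c] * model_probs[c]
    total = sum(scores)
    combined = [s / total for s in scores]
    best = max(range(n_classes), key=lambda c: combined[c])
    return best, combined


# Classes: 0 = abundant taxon, 1 = another abundant taxon, 2 = rare taxon
model_outputs = [
    [0.6, 0.3, 0.1],  # model A: strong on abundant taxa, weak on the rare one
    [0.5, 0.1, 0.4],  # model B
    [0.2, 0.2, 0.6],  # model C: the only model confident in the rare taxon
]
class_weights = [
    [0.9, 0.9, 0.2],  # model A is barely trusted on the rare taxon
    [0.6, 0.6, 0.9],
    [0.4, 0.4, 1.0],  # model C is fully trusted on the rare taxon
]

# Equal trust in every model: the abundant taxon (class 0) wins.
unweighted_class, _ = combine(model_outputs, [[1.0, 1.0, 1.0]] * 3)

# Per-class trust: the rare taxon (class 2) wins instead.
predicted_class, combined_scores = combine(model_outputs, class_weights)
```

In this toy case, plain averaging favours the abundant taxon, while weighting each model by how reliable it is for each taxon lets the rare taxon win. In a real system those weights would be learned from data rather than set by hand.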
Class imbalance is an issue common to many automated classification tasks, yet often rarer objects are scientifically interesting and living with poor classification accuracy for them is not an option. In the case of plankton identification, it is important to quantify the presence of rare taxa in order to establish an accurate measure of plankton biodiversity. Furthermore, rare taxa may become numerically dominant or disappear completely at some point in the future, as environmental conditions change. Being able to monitor changes in species abundance is important to understanding their growth dynamics and ultimate impact on the food web. For these reasons, it is important to build tools that work equally well for both the rare and abundant taxa.
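A short sketch shows why a single overall accuracy figure can hide poor performance on rare taxa. The data are a toy example, not results from the study: it assumes one abundant taxon with 600 images and one rare taxon with 14, and a deliberately naive classifier that labels every image as the abundant taxon.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all images that were labelled correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_recall(true_labels, predicted_labels):
    """For each taxon, the fraction of its images that were labelled correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy, made-up data set with a severe class imbalance.
true_labels = ["abundant"] * 600 + ["rare"] * 14

# A naive classifier that always predicts the abundant taxon.
predicted_labels = ["abundant"] * len(true_labels)

accuracy = overall_accuracy(true_labels, predicted_labels)
recall = per_class_recall(true_labels, predicted_labels)

print(f"overall accuracy: {accuracy:.3f}")
print(f"per-class recall: {recall}")
```

Here the overall accuracy is close to 98% even though not a single rare image is identified, which is why per-taxon measures matter when rare taxa are of scientific interest.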
In the study, the top collaborative model achieved a 3.2% improvement in overall accuracy, resulting in a 97.4% overall accuracy score. It is estimated that, once the new tool is fully trained, a plankton net haul that would traditionally have taken about an hour to classify could be classified in seconds.
Dr James Clark, co-author and Marine Ecosystem Modeller at PML, said: “Automated imaging technologies are now able to generate a tremendous number of images of marine plankton. Yet without a rapid, high efficacy means of automatically identifying which organisms are present within a given image, the data becomes difficult to handle. Here we have used recent advances in computer vision research to build new models that automatically classify plankton in image data, with promising results. It is hoped such models will eventually be used to support the creation of a fully automated plankton imaging and classification system that will allow us to efficiently and effectively monitor which marine planktonic organisms are present where, and how their abundances change in time.”
Dr Nicolas Pugeault, supervising co-author and Principal Lecturer in the School of Computer Science at the University of Glasgow, stated: “The field of machine learning has seen ground-breaking advances over the last decade, especially in the domain of image analysis and pattern recognition. The development of powerful algorithms, such as deep neural networks, and the availability of vast computational power has allowed the development of robust solutions to the tasks of visual recognition. But more importantly, this work demonstrates that such algorithms can now be used effectively across scientific fields of study to automate time consuming tasks.”
Elaine Fileman, Plankton Ecologist at PML, commented: “Using FlowCam on a regular basis for numerous projects at PML generates hundreds to thousands of images per single sample analysis. Automating the process of classifying these images will undoubtedly save a lot of time, but at the end of the day, we will always need taxonomic knowledge to verify plankton identification.”
Claire Widdicombe, Plankton Ecologist at PML, continued: “Identifying individual plankton species is time-consuming and requires a high level of taxonomic expertise, which can take years to obtain and can be easily lost when analysts retire. Novel machine learning techniques provide the opportunity to capture this wealth of knowledge and greatly increase the speed of identification in a fraction of the time, which will allow the capacity for plankton monitoring programmes to be significantly increased.”
There is still more to do, in particular testing the system on different datasets from around the world and testing the tools on smaller phytoplankton. However, it is believed that the method implemented is a significant step forward in bringing an automated, high efficacy plankton imaging and classification system closer to fruition.
The paper, “Collaborative Deep Learning Models to Handle Class Imbalance in FlowCam Plankton Imagery”, is published in the Institute of Electrical and Electronics Engineers (IEEE) Access journal.
FlowCam: Since its purchase in 2012, the PML FlowCam has supported the lab work of 7 PhD students, 6 master's students and 5 visiting scientists. FlowCam is currently used to analyse live plankton net hauls collected at Station L4, averaging about 40 net hauls a year, and has generated approximately 1.8 million images over the past 5 years. FlowCam images not only plankton but also anything else present in a water sample, such as silt, sand grains, faecal pellets, pollen, fibres and plastics.