Applying cluster analysis and Google Maps in the study of large-scale species occurrence data
biodiversity informatics, cluster analysis, species occurrence data, visualization
Primary species occurrence data comprise records of animal and plant specimens held in museums and herbaria, as well as species observations. The TaiBIF (Taiwan Biodiversity Information Facility) data portal has so far integrated 26 datasets, yielding more than 1.5 million species occurrence records, 85% of which are geo-referenced. This study uses more than 8,800 Cyprinidae occurrence records from 11 datasets and applies three types of clustering algorithms (grid-based, partition-based, and density-based) to produce different spatial visualizations. It aims to resolve the problems of low efficiency and poor visualization that arise when large volumes of species occurrence data are presented in Google Maps. The study also compares the results of the three clustering algorithms with expert-opinion range maps of Cyprinidae. The goal is to identify a quick and efficient way to present species distribution data and, in turn, to help researchers extract knowledge from large amounts of data so that it can serve as an important reference for ecological conservation efforts.
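As an illustration of the simplest of the three approaches, grid-based clustering can be sketched as follows. This is a minimal sketch and not the study's implementation: the function name `grid_cluster`, the `cell_deg` parameter, and the sample coordinates are assumptions made for the example. Each point is assigned to a square latitude/longitude cell, and each non-empty cell is reduced to one centroid with a count, so a map needs to draw one marker per cell rather than one per record.

```python
from collections import defaultdict
from math import floor


def grid_cluster(points, cell_deg=0.5):
    """Group (lat, lon) points into square cells of `cell_deg` degrees.

    Returns a list of (centroid_lat, centroid_lon, count) tuples, one per
    non-empty cell, sorted by descending count.
    `cell_deg` is a hypothetical tuning parameter, not a value from the study.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        # Integer cell index along each axis identifies the grid cell.
        key = (floor(lat / cell_deg), floor(lon / cell_deg))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        n = len(members)
        centroid_lat = sum(p[0] for p in members) / n
        centroid_lon = sum(p[1] for p in members) / n
        clusters.append((centroid_lat, centroid_lon, n))

    # Largest clusters first, so the densest cells can be drawn or labeled first.
    return sorted(clusters, key=lambda c: -c[2])


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    sample = [(23.1, 120.1), (23.2, 120.3), (25.0, 121.5)]
    for lat, lon, count in grid_cluster(sample):
        print(f"{lat:.2f}, {lon:.2f}: {count} record(s)")
```

A smaller `cell_deg` keeps more spatial detail at the cost of more markers. Partition-based and density-based methods replace the fixed grid with groupings derived from the data themselves.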