A survey and classification of publicly available COVID-19 datasets

Dutta, Biswanath; Das, Puranjani; Mitra, Sushmita


The current study curates a list of authentic, open-access sources of alphanumeric COVID-19 pandemic data. We have gathered 74 datasets from 42 sources spanning 18 countries. The datasets were located by searching the Kaggle and GitHub repositories as well as Google, and they represent a variety of pandemic-related data. The datasets are categorized by source (primary or secondary) and by geographical distribution. While analyzing the datasets, we identified several further classes into which they can be grouped. We present this categorization as a taxonomy and highlight current challenges in collecting and using COVID-19 data. The study will help researchers and data curators identify and classify pandemic data.


COVID-19; Classification; Curation; Datasets; Metadata

