COVID-19 data sources are volatile - schemas, methods, and errors change often. We need coherent summaries that capture this volatility and smooth it out.
A few high-level composite datasets are released on a regular basis, have a consistent schema and process, and are careful about describing their uncertainties, error margins, and any historical revisions. But these have limited scope: they are generally not granular enough, and do not capture enough context, for effective modeling of local spread or of the impact of policy changes.
As a result, there is duplicated effort in constructing local data silos, and a proliferation of poorly-grounded or -trained models, often applied outside of the context for which they were developed.
We need a more thorough solution that provides, among other things:
* a catalog of primary and secondary/aggregate sources of COVID-19 data, in a shared pool, which captures uncertainty and changes
* a shared set of time-series aggregates which note what methods or functions they use to infer a consistent aggregate from those variable sources
* a way for users of data to leave corrections, error assessments, or other annotations on those datasets
* hosted programmable notebooks that anyone can use to generate visuals from public data (making explicit exactly which data was used to create them, and helping readers understand and catch errors or conflicts)
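The catalog and annotation ideas above could be sketched as a simple data model. This is a minimal, hypothetical sketch: all names (`SourceRecord`, `Revision`, etc.) and fields are illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Revision:
    """One historical revision to a source (hypothetical structure)."""
    revised_on: date
    note: str  # e.g. "backfilled 3 days of case counts"

@dataclass
class SourceRecord:
    """A catalog entry for one primary or aggregate source (sketch only)."""
    source_id: str                            # illustrative identifier
    schema_version: str                       # schemas change often; track explicitly
    margin_of_error: Optional[float] = None   # fraction, if the source publishes one
    revisions: list = field(default_factory=list)
    annotations: list = field(default_factory=list)  # user corrections / assessments

    def annotate(self, user: str, text: str) -> None:
        """Attach a user-supplied correction or error assessment."""
        self.annotations.append({"user": user, "text": text})

# Usage: record a schema change and a user correction
rec = SourceRecord(source_id="example-county-feed", schema_version="2.1")
rec.revisions.append(Revision(date(2020, 7, 1), "county FIPS codes corrected"))
rec.annotate("analyst-a", "deaths on 2020-06-28 appear double-counted")
print(len(rec.annotations))  # 1
```

The point of the sketch is that uncertainty, revisions, and user annotations live alongside the data reference itself, so downstream aggregates and notebooks can surface them.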
Jul 7, 2020
Possibly in partnership with someone who is already doing this; c3.ai is believed to maintain such a dataset.
Jul 11, 2020
There is a need for properly validated data sources of this type; this could be a one-stop shop for all available data points.
Once these and other potential data sources are brought into a single unit, the same database can be used not only to build epidemiological models but also to create a policy framework that helps governments take necessary actions.