COVID-19 data sources are volatile - schemas, methods, and errors change often. We need coherent summaries that capture this volatility and smooth it out.
A few high-level composite datasets are released on a regular basis, have a consistent schema and process, and are careful about describing their uncertainties, error margins, and any historical revisions. But these have limited scope: they are generally not granular enough, and do not capture enough context, for effective modeling of local spread or of the impact of policy changes.
As a result, there is duplicated effort in constructing local data silos, and a proliferation of poorly-grounded or -trained models, often applied outside of the context for which they were developed.
We need a more thorough solution that provides, among other things:
* a catalog of primary and secondary/aggregate sources of COVID-19 data, in a shared pool, which captures uncertainty and changes
* a shared set of time-series aggregates which note what methods or functions they use to infer a consistent aggregate from those variable sources
* a way for users of data to leave corrections, error assessments, or other annotations on those datasets
* hosted programmable notebooks that anyone can use to generate visuals from public data (making explicit exactly which data was used to create them, and helping readers understand and catch errors or conflicts)
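The catalog and annotation ideas above could be sketched as a simple data model. This is a minimal, hypothetical sketch: all names (`SourceRecord`, `Revision`, etc.) and fields are illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Revision:
    """One historical revision to a source (hypothetical structure)."""
    revised_on: date
    note: str  # e.g. "backfilled 3 days of case counts"

@dataclass
class SourceRecord:
    """A catalog entry for one primary or aggregate source (sketch only)."""
    source_id: str                            # illustrative identifier
    schema_version: str                       # schemas change often; track explicitly
    margin_of_error: Optional[float] = None   # fraction, if the source publishes one
    revisions: list = field(default_factory=list)
    annotations: list = field(default_factory=list)  # user corrections / assessments

    def annotate(self, user: str, text: str) -> None:
        """Attach a user-supplied correction or error assessment."""
        self.annotations.append({"user": user, "text": text})

# Usage: record a schema change and a user correction
rec = SourceRecord(source_id="example-county-feed", schema_version="2.1")
rec.revisions.append(Revision(date(2020, 7, 1), "county FIPS codes corrected"))
rec.annotate("analyst-a", "deaths on 2020-06-28 appear double-counted")
print(len(rec.annotations))  # 1
```

The point of the sketch is that uncertainty, revisions, and user annotations live alongside the data reference itself, so downstream aggregates and notebooks can surface them.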
Jul 7, 2020
Possibly in partnership with someone who is already doing this; c3.ai is believed to maintain such a dataset.
Jul 11, 2020
There is a need for properly validated data sources of this type; this could be a one-stop shop for all available data points.
Once these and other potential data sources are brought into a single unit, the same database can be used not only to build epidemiological models but also to create a policy framework that helps governments take necessary actions.