Human-Centered Data Discovery: Applications for Data Curation

Making data available – even well-managed data – does not mean that data can or will be reused. We argue that data must first be discovered, evaluated, and understood before they can be reused for particular purposes.

These practices of data discovery require different types of work. Data creators must document and share their data; data curators and repository managers must clean, enrich and select data; and ‘data seekers’ must find and make sense of data for reuse.

In other words, data discovery involves both making data discoverable and discovering data. Both perspectives need to be taken into account to facilitate the reuse of data.

We recently conducted a workshop for the Data Curation Network (DCN), a community of data curators from various academic and non-profit data repositories in the United States, to explore how understanding data discovery practices can inform curatorial work. Taking our recent short book as a starting point, we presented theoretical and empirical work exploring the concepts of ‘data needs’, data-centric sensemaking, and different conceptions of data quality. We discussed how these are situated within different types of data communities and hence not always easy to define.

In interactive group sessions, we explored how data curators and repositories choose where to focus their curation and design efforts. Themes included meaningful metrics for measuring data reuse, required curation skills, and how repository design facilitates reuse. Participants highlighted the human side of managing data, as well as barriers to capturing feedback from data reusers.

Two topics emerged as especially important: building relationships with the data communities who both produce and reuse data, and communicating the value of data curation and reuse to different stakeholders. We also speculated about the promises and risks of AI solutions for data description, monitoring, and curation.

Three main takeaways

The workshop gave data curators an opportunity to learn more about our research, and it gave us insight into curation practice, which we will take forward in our work.

  • Data curation is a translation activity. Data curators and researchers often view the value of data and data management differently. While researchers may view data management as a box-ticking exercise, data curators are motivated to preserve and archive data in ways that serve a community. The translation work between these two perspectives is vital, but it cannot be taken for granted and is not easily achieved.
  • Institutional repositories, operated by individual universities, seem to occupy a unique position in the landscape of data platforms and portals. While other data platforms foreground the potential reuse of data (e.g. Kaggle or governmental data portals), institutional repositories seem to be more archival in nature or are used as a means to share data in order to meet policy or funder requirements. Despite this perception, there is interest and potential in bringing a ‘reuse’ perspective to institutional repositories.
  • Collaboration within the DCN is a positive example of inter-organizational support, which can save resources across curators and add value by improving the quality of data published within the network.

Interested in learning more?

Gregory, K., & Koesten, L. (2023). Human-centered data discovery. Springer Nature.

Contact

Research Group Visualization & Data Analysis
University of Vienna
Sensengasse 6, 1090 Vienna