This week, search engine giant Google announced a new search engine for the scientific community that will help researchers search the millions of datasets available online.
The service, called Dataset Search, trawls the millions of open data repositories on the web for the datasets a user wants. It looks on publisher sites, in digital libraries, and on authors’ personal web pages, among other places. But it relies on dataset publishers to correctly label their datasets with the appropriate descriptive information, otherwise known as metadata tags. The results will help scientists, data journalists and geeks find the data they need for their work and their stories, or simply satisfy their intellectual curiosity. The new search engine will work like Google Scholar, the company’s popular search engine for academic studies and reports.
Natasha Noy, a research scientist at Google AI, announced the service in a blog post.
“Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page,” Noy said. “To create Dataset Search, Google developed guidelines for dataset providers to describe their data in a way that the company (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc.
“We then collect and link this information, analyse where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem,” Noy writes.
How it works
Suppose you want datasets on car accidents, or on earthquakes near where you live. Type the query and Google lists the sources it has found. Click any source in the list on the left to see more detail about it.
To make sure that datasets are accessible via Google’s tool, the company recommends that institutions adopt the open schema.org markup standard. It allows publishers to include machine-readable metadata such as the date of publication, how the data was collected, the terms of usage, etc.
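As a rough illustration of what such a machine-readable description can look like, the sketch below builds a schema.org `Dataset` record as JSON-LD in Python and wraps it in the script tag a publisher would embed in a dataset's web page. The property names (`name`, `creator`, `datePublished`, `license`, `measurementTechnique`) come from the schema.org vocabulary; the helper functions, the dataset, the organisation and the example.org URL are all hypothetical, made up for this example. It is not Google's own tooling, and the fields a given publisher needs may differ.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         license_url, collection_method, url):
    """Return a schema.org Dataset description serialised as JSON-LD.

    Hypothetical helper for illustration only; it covers the kinds of
    information the article lists (creator, publication date, collection
    method, terms of use).
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date)
        "datePublished": date_published,
        # Terms for using the data
        "license": license_url,
        # How the data was collected
        "measurementTechnique": collection_method,
    }
    return json.dumps(record, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# All values below are invented placeholders, not a real dataset.
example_jsonld = build_dataset_jsonld(
    name="Road accident reports (example)",
    description="Hypothetical yearly counts of reported road accidents.",
    creator="Example Transport Agency",
    date_published="2018-09-05",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    collection_method="Aggregated from police incident reports",
    url="https://example.org/datasets/road-accidents",
)

print(wrap_in_script_tag(example_jsonld))
```

A publisher would place the printed script element in the HTML of the page that hosts the dataset, where a crawler that understands schema.org markup can read it.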
Datasets from organisations such as NOAA, NASA, Harvard Dataverse and ProPublica are already accessible via the tool, and more data providers are expected to add support.
The Google Dataset Search beta website is now available in multiple languages.