Google is now indexing CSV files, although it previously used CSV data through structured data to enhance its search interface
Google has quietly updated the Google Search Central documentation to note that .csv files are now indexed.
This opens up a new route to discovery. Publishers who don't want their .csv files crawled may need to update their robots.txt file to exclude them.
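As a rough illustration of what such an exclusion could look like, the Python sketch below embeds a hypothetical robots.txt rule and checks which URL paths it would cover. The rule `Disallow: /*.csv$` relies on the `*` and `$` wildcards that Google documents for robots.txt. The helper names (`disallow_patterns`, `pattern_to_regex`, `is_blocked`) and the sample paths are assumptions made for this example, and the matcher is deliberately simplified: it ignores user-agent groups and Allow rules.

```python
import re

# Hypothetical robots.txt content. The "/*.csv$" path uses the "*" and "$"
# wildcards that Google documents for robots.txt rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.csv$
"""


def disallow_patterns(robots_txt):
    """Collect every Disallow path (simplification: user-agent groups are ignored)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern):
    """Translate a wildcard path pattern into an anchored regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters, then let each "*" match any run of characters.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url_path, robots_txt=ROBOTS_TXT):
    """Return True if any Disallow pattern matches the given URL path."""
    return any(pattern_to_regex(p).match(url_path) for p in disallow_patterns(robots_txt))


if __name__ == "__main__":
    for path in ("/data/report.csv", "/index.html", "/data/report.csv?download=1"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Because the pattern ends in `$`, it only covers paths that end in `.csv`. A URL with a query string after the extension would need a broader pattern such as `/*.csv`.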
Comma Separated Values (CSV)
Comma Separated Values (CSV) files are text files that store data in a tabular format that can be viewed as a spreadsheet.
A CSV file contains only plain text, meaning it does not contain style elements such as fonts, images, or active links.
They are useful for tasks such as uploading a list of URLs to crawl into software like Screaming Frog.
They are also useful for organizing data in spreadsheets.
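To make the plain-text, tabular nature of the format concrete, here is a minimal sketch that parses a small CSV string with Python's standard `csv` module. The sample data (a header row and two URL rows) and the `read_rows` helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by two data rows.
CSV_TEXT = "url,status\nhttps://example.com/,200\nhttps://example.com/missing,404\n"


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


if __name__ == "__main__":
    for row in read_rows(CSV_TEXT):
        # Every value is a plain string; CSV carries no types or styling.
        print(row["url"], row["status"])
```

Every value comes back as a string, which reflects the point above: the file holds text only, and any typing or formatting is applied by whatever program opens it.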
CSV file indexing is new
Google's ability to index CSV files appears to be a new feature, because a Google search using the filetype: operator for CSV files currently doesn't return any CSV files.
Searches like these currently don't return a CSV file:
- filetype:csv site:.com
- filetype:csv site:.gov
- filetype:csv site:.edu
Google has indirectly used CSV files before
The curious thing about Google indexing CSV files is that Google's dataset search feature already used CSV files, but apparently only when they were described with structured data. The Dataset structured data documentation in the old Google Developers docs (available on Archive.org) names CSV files as an accepted format for appearing in the dataset search feature.
The use of tabular data in a search feature dates back to 2018, when Google announced that it would show this type of data in search when it is accompanied by structured data.
According to the original documentation:
“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats as structured data…
Here are some examples of what can qualify as a dataset:
- A table or a CSV file with some data
- An organized collection of tables
- A file in a proprietary format that contains data
- A collection of files that together constitute some meaningful dataset
- A structured object with data in some other format that you might want to load into a special tool for processing
- Images capturing data
- Files relating to machine learning, such as trained parameters or neural network structure definitions
- Anything that looks like a dataset to you”
Google updated the above document in 2022 and redirected it to the new Search Central document.
The updated documentation makes it clearer that Google relies on structured data to use CSV files in its dataset search feature. But does this change mean that Google will now crawl CSV files and use them for search (in addition to the tabular data described with structured data)?
This is what the current documentation says:
“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats as structured data. Google’s approach to dataset discovery makes use of schema.org and other metadata standards that can be added to pages that describe datasets… Here are some examples of what can qualify as a dataset: A table or a CSV file with some data…”
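For readers who have not seen this kind of markup, the sketch below shows one way the supporting information named in the quote (name, description, creator, distribution format) could be expressed as schema.org Dataset structured data pointing at a CSV file. It is an illustration under stated assumptions, not Google's reference example: the `dataset_jsonld` helper, the sample values, and the example.com URL are all hypothetical.

```python
import json


def dataset_jsonld(name, description, creator, csv_url):
    """Build schema.org Dataset markup that points at a CSV download."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Organization", "name": creator},
        # The distribution entry is where the CSV file itself is described.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "CSV",
                "contentUrl": csv_url,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, used only to show the shape of the output.
    markup = dataset_jsonld(
        name="Example crawl results",
        description="Status codes for a sample list of URLs.",
        creator="Example Publisher",
        csv_url="https://example.com/data/crawl-results.csv",
    )
    # JSON-LD is typically embedded in the page that describes the dataset.
    print('<script type="application/ld+json">')
    print(json.dumps(markup, indent=2))
    print("</script>")
```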
Is Google's CSV indexing related to the recent update?
Google defines a core algorithm update as one in which it makes “significant” and “broad” changes to its core algorithm.
It could be a coincidence that the indexing of CSV files and the core algorithm update happened at roughly the same time.
But it might be worth asking whether Google has improved its crawler to be able to index CSV files, or whether this capability already existed.
Read an updated list of indexable file types:
Read the Google Search Central Dataset documentation: