It used to be that, if you hoped Google would index a PDF file, you had to create a PDF that was text-based, not image-based; Googlebot couldn’t recognize the content of scanned or image-based documents. According to an announcement today, that’s no longer the case.
Google says it’s now using OCR (Optical Character Recognition) technology to read any scanned documents that it finds in PDF format:
This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found.
Google’s announcement includes a few examples where you can see the results of OCR scanning in action. On a search for “repairing aluminum wiring,” the first result is a Consumer Product Safety Commission PDF that was clearly scanned as an image. You can now get the text of that image thanks to Google’s OCR scanning and the “View as HTML” link on the search results page. As with any use of OCR, the results probably won’t be perfect, but the examples Google provides do look quite accurate.
Countless scanned documents are now searchable for the first time. On the other hand, if you’ve been scanning and uploading image-based PDFs on the assumption that searchers would never find them (and I know people who have), you may want to rethink that strategy.
Google Using OCR To Index Scanned Documents – Search Engine Land