What’s changing
Cloud Search now supports Optical Character Recognition (OCR) based text extraction for PDFs that contain images, such as:
- Physical contract documents
- Engineering documents that contain annotations or labels
- Physical customer invoices, and more
This makes PDFs with images containing text, such as scanned documents, easily searchable by users and improves the discoverability of such PDFs.
Who’s impacted
Admins and end users
Why it’s important
Many critical business documents exist either in physical form or as scanned versions of those physical documents. With OCR support, admins can now index these documents for Cloud Search, making it easier for users to quickly find relevant scanned documents.
In addition, this feature eliminates the need to extract text offline from PDFs containing images before indexing them in Cloud Search.
Getting started
- Admins: The feature is ON by default. Use this guide to learn more about how to use enhanced search for PDFs containing images. Important Note: PDFs must be submitted using the Asynchronous Indexing mode and must contain only images.
- End Users: No user action is required
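For admins who index content through a custom connector, the asynchronous requirement above could look roughly like the sketch below. It only builds the JSON body for an item index request and sends nothing. The field names (`item`, `itemType`, `version`, `content`, `contentFormat`, `inlineContent`, `mode`) reflect our reading of the Cloud Search Indexing API and should be checked against the guide; the helper name `build_index_request` and the sample IDs are hypothetical. Access control lists and metadata are omitted for brevity, and larger PDFs would likely need to be uploaded separately rather than sent inline.

```python
import base64


def build_index_request(datasource_id: str, item_id: str,
                        pdf_bytes: bytes, version: str) -> dict:
    """Build an illustrative request body for indexing one scanned PDF.

    Hypothetical helper: field names are assumed from the Cloud Search
    Indexing API reference and should be verified before use.
    """
    return {
        "item": {
            # Resource name of the item inside the data source.
            "name": f"datasources/{datasource_id}/items/{item_id}",
            "itemType": "CONTENT_ITEM",
            # Bytes fields are sent base64-encoded in JSON.
            "version": base64.b64encode(version.encode("utf-8")).decode("ascii"),
            "content": {
                # RAW is assumed here for binary content such as PDFs.
                "contentFormat": "RAW",
                "inlineContent": base64.b64encode(pdf_bytes).decode("ascii"),
            },
        },
        # The announcement requires asynchronous indexing for OCR.
        "mode": "ASYNCHRONOUS",
    }


if __name__ == "__main__":
    body = build_index_request("my-source", "invoice-001", b"%PDF-1.7", "1")
    print(body["mode"])
```

The key detail is the `mode` value: per the note above, PDFs submitted in any other indexing mode would not be expected to go through OCR.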
Rollout pace
- Rapid Release and Scheduled Release domains: This feature is available now for all users.
Availability
- Available to Google Workspace Enterprise Plus and Google Cloud Search customers
- Not available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Education Fundamentals, Education Plus, Frontline, and Nonprofits, as well as G Suite Basic and Business customers