Overview
According to npm statistics (and GitHub Issues), many users are still using Tesseract.js v2. Version 2 was released in 2019 and contains many bugs, memory leaks, and performance issues that have been fixed in subsequent versions (in some cases v2 is 20x slower than the current version), so updating is strongly recommended. Additionally, v2 is no longer supported, so updating is required to receive support in GitHub Issues.
While the changes made in each release are fully documented, to make upgrading as easy as possible, below is a guide describing all changes that v2 users may need to make to use the latest version. This guide describes the process of upgrading from v2 to v5. If (for whatever reason) you wish to update from v2 to v4, see the comment below.
Changes Impacting Most Users
- `createWorker` is now async
  - In most code this means `worker = Tesseract.createWorker()` should be replaced with `worker = await Tesseract.createWorker()`
- The arguments to `createWorker` have changed: the first two arguments are now language and `oem`
  - E.g. `createWorker('eng', 1, { logger: m => console.log(m) })`
- `worker.load`, `worker.loadLanguage`, and `worker.initialize` are no longer needed
  - Simply delete these functions from existing code
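Taken together, the three changes above might look like the following in practice. This is a minimal sketch rather than an official example: `recognizeText` is a hypothetical helper name, and the `Tesseract` parameter is assumed to be the tesseract.js v5 module (for instance, the result of `require('tesseract.js')`), passed in so the helper does not depend on how the module is loaded.

```javascript
// Hypothetical helper showing a v2 -> v5 migration of a basic OCR call.
// `Tesseract` is assumed to be the tesseract.js v5 module.
async function recognizeText(Tesseract, image) {
  // v2 pattern (no longer needed):
  //   const worker = Tesseract.createWorker({ logger: m => console.log(m) });
  //   await worker.load();
  //   await worker.loadLanguage('eng');
  //   await worker.initialize('eng');

  // v5 pattern: one awaited call; language and oem are the first two arguments.
  const worker = await Tesseract.createWorker('eng', 1, {
    logger: m => console.log(m),
  });

  try {
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

A caller would then use something like `const text = await recognizeText(require('tesseract.js'), 'image.png')`, where the image path is a placeholder.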
Changes Impacting Fewer Users
- Electron users
  - Use the browser version of Tesseract.js
  - In v2, many users used the Node.js version
- Users of `getPDF` function
  - This function has been replaced by the `pdf` recognize option (see issue #488, "GetPDF() with Scheduler returns the same PDF file")
  - See browser and node examples for usage
- Users who set `cacheMethod: 'none'` or `cacheMethod: 'refresh'` as a workaround for the caching bug
  - This workaround can be removed; the underlying bug has been fixed (see this comment)
- Users who set the optional `corePath` argument
  - `corePath` must point to a directory containing all 4 of the following files from Tesseract.js-core v5:
    - `tesseract-core.wasm.js`
    - `tesseract-core-simd.wasm.js`
    - `tesseract-core-lstm.wasm.js`
    - `tesseract-core-simd-lstm.wasm.js`
- Node.js <14 users
  - Node.js v14 is now the earliest version supported
- Users of `worker.detect` function
  - This function is now disabled by default
  - To enable, set arguments `legacyCore: true` and `legacyLang: true` in `createWorker` options
    - E.g. `Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true})`
- Users who implemented progress bars using log messages
  - The language used in logs was standardized, so any scripts that parse logs may need to be updated
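Several of the less common changes above can be sketched in code. The helpers below are hypothetical illustrations, not part of Tesseract.js: `recognizeToPdf` assumes the `pdf` output option returns the PDF as an array of bytes on `data.pdf`, `detectWithLegacy` assumes `Tesseract` is the v5 module, and `makeProgressLogger` takes the status string to match as a parameter because the exact log wording should be confirmed by printing the logger messages first.

```javascript
// Replacement for getPDF: request PDF output through the third
// (output) argument of worker.recognize. `worker` is assumed to be a v5 worker.
async function recognizeToPdf(worker, image) {
  const { data } = await worker.recognize(image, {}, { pdf: true });
  // Assumption: data.pdf is an array of byte values; wrap it so it can be
  // written to disk (Node.js) or placed in a Blob (browser).
  return { text: data.text, pdfBytes: Uint8Array.from(data.pdf) };
}

// worker.detect is disabled by default; enable it with the legacy options.
async function detectWithLegacy(Tesseract, image) {
  const worker = await Tesseract.createWorker('eng', 1, {
    legacyCore: true,
    legacyLang: true,
  });
  try {
    const { data } = await worker.detect(image);
    return data;
  } finally {
    await worker.terminate();
  }
}

// Progress bars: match on a status string supplied by the caller instead of
// hard-coding v2 wording. `expectedStatus` is an assumption to verify.
function makeProgressLogger(expectedStatus, onProgress) {
  return (m) => {
    if (m.status === expectedStatus) {
      onProgress(m.progress); // fraction between 0 and 1
    }
  };
}
```

The logger returned by `makeProgressLogger` would be passed as the `logger` option to `createWorker`.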