Why can I not find specific content when searching the HTML export?
When searching in a generated HTML export, some search terms in certain languages may not return any results. This is because of how the search library within Scroll HTML Exporter splits words during indexing and searching.
Scroll HTML Exporter uses a generic word-splitting algorithm that works for most of our supported languages. However, because the app cannot know the language of the exported content, search issues can sometimes arise. For instance, Japanese text usually does not contain spaces between words, so the export may not always split Japanese content into words correctly.
Enhancing the search results
You can improve the search results by adding special characters to your search terms. These advanced search options include:
- Using wildcards: adding the * character to a search term, such as exp*, matches all words that start with exp, such as "exporter", "expression", etc.
- Searching only in the title or body: searching for title:exporter looks for exporter only in the exported page titles. Alternatively, searching for body:exporter looks for exporter only in the page content.
- Boosting search terms: using the ^ character followed by a number, such as pdf^10 exporter, searches for pages with the terms pdf or exporter but scores pages containing pdf 10 times higher, so they are displayed higher up in the search results.
- Fuzzy searching: using the ~ character followed by a number also matches terms that differ from the search term by up to that number of changed, added, or removed characters. For instance, export~2 would also yield results for import (2 characters changed), exporter (2 characters added), and expo (2 characters removed).
- Term presence: by default, searching for pdf export yields pages containing the terms pdf OR export. Prefixing each term with the + character, such as +pdf +export, displays only pages that contain both pdf AND export. Additionally, prefixing a term with the - character, such as -pdf export, searches for pages containing export but not pdf.
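To make the fuzzy searching examples concrete, the following Python sketch counts the character changes between two terms. It assumes the count corresponds to the standard Levenshtein edit distance (single-character insertions, deletions, and substitutions). The helper names edit_distance and fuzzy_match are hypothetical and are not part of Scroll HTML Exporter.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # remove char_a
                current[j - 1] + 1,      # add char_b
                previous[j - 1] + cost,  # change char_a into char_b (or keep it)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, candidate: str, max_edits: int) -> bool:
    """Return True if candidate is within max_edits changes of term."""
    return edit_distance(term, candidate) <= max_edits


# The examples from the fuzzy searching option, i.e. export~2
for word in ["import", "exporter", "expo", "transport"]:
    print(word, edit_distance("export", word), fuzzy_match("export", word, 2))
```

With a limit of 2, import, exporter, and expo all count as matches for export, while a word such as transport needs more than two changes and does not match.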