Apply a Similarity Search in Custom Analysis
Analyze and Evaluate Results of Similarity Search
Introduction
PatentSight’s Similarity Search Workflow
Based on a homogeneous set of patents (seed patents or seed tag) defined by the user, the Similarity Search finds patents that belong to the same technology field as the seed patents.
The Similarity Search is a useful tool for patent researchers as well as for users who have at least some expertise in a technology field. It supports the following tasks:
- Definition of a technology field (Prior art search, FTO search, patent monitoring etc.)
- Verification of a defined technology field (Cross-check, sensitivity analysis etc.)
Please be aware that the quality of the results strongly depends on the homogeneity of the seed patents, the accuracy of the classification systems and the citation coverage in the respective technology fields. The quality of the results may therefore differ depending on the technology field.
The Similarity Search is not designed to search for several technology fields at once. For example, it is not designed to find enterprises active in the same set of technology fields. Therefore, it is not recommended to use the whole portfolio of a company as a seed tag; in most cases this will not lead to satisfactory results.
To define several technology fields, the Similarity Search needs to be run several times: individually for each technology field.
Important: Make sure to always review the results you obtain from a Similarity Search!
The Similarity Search supports you in defining a technology field by searching for similar patents. The search builds on technology classes (IPC, CPC and F-Terms) and patent citations.
Depending on the use case, the defined technology field(s) can be used as the final result set or as a starting point for further analyses using the Custom Analysis.
To find similar patents, the Similarity Search first calculates a theoretical “ideal” patent based on the seed tag. This “ideal” patent serves as the center around which other patents lie at various distances. The distance from this center is expressed by the Similarity Score, with 1 being most similar (close to the center) and 0 not being similar at all.
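The idea of a central “ideal” patent can be illustrated with a small sketch. The exact calculation used by PatentSight is not described here, so the following is only an assumed model: each patent is reduced to a set of technology classes, the “ideal” patent is the average class vector of the seed patents, and the score is the cosine similarity to that average. The class codes and the function names `centroid` and `similarity_score` are hypothetical and serve only as illustration.

```python
import math
from collections import Counter

def centroid(seed_patents):
    """Average class-weight vector of the seed patents (assumed model of the 'ideal' patent)."""
    counts = Counter()
    for classes in seed_patents:
        counts.update(set(classes))
    n = len(seed_patents)
    return {cls: k / n for cls, k in counts.items()}

def similarity_score(classes, ideal):
    """Cosine similarity between a patent's binary class vector and the ideal vector (0 to 1)."""
    classes = set(classes)
    dot = sum(ideal.get(cls, 0.0) for cls in classes)
    norm_patent = math.sqrt(len(classes))
    norm_ideal = math.sqrt(sum(w * w for w in ideal.values()))
    if norm_patent == 0 or norm_ideal == 0:
        return 0.0
    return dot / (norm_patent * norm_ideal)

# Illustrative seed tag: three patents described only by their class codes
seed = [{"F03D1/06", "F03D80/00"}, {"F03D1/06"}, {"F03D1/06", "B29C70/00"}]
ideal = centroid(seed)

close = similarity_score({"F03D1/06"}, ideal)   # shares the dominant seed class
far = similarity_score({"A61K31/00"}, ideal)    # shares no class with the seed
print(f"close: {close:.2f}, far: {far:.2f}")
```

In this toy model, a patent that shares the dominant class of the seed tag scores close to 1, while a patent without any shared class scores 0. The real Similarity Search also uses patent citations, which this sketch leaves out.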
If the seed tag is not homogeneous and includes, e.g., patents belonging to technology A and technology B, the Similarity Search will calculate this central “ideal” patent as a technology between these technologies. As a result, the search may miss both technology A and technology B and instead find patents belonging to a field it interprets as lying in between.
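The effect of a mixed seed tag can be shown with a two-dimensional toy example. This is a sketch under simplifying assumptions, not the actual PatentSight calculation: each patent is a vector with one weight for technology A and one for technology B, the “ideal” patent is the mean of the seed vectors, and similarity is the cosine between vectors. All names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_vector(vectors):
    """Component-wise mean, used here as the assumed 'ideal' patent."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Axis 0 = weight in technology A, axis 1 = weight in technology B
tech_a = [1.0, 0.0]
tech_b = [0.0, 1.0]
mixed_seed = [tech_a] * 5 + [tech_b] * 5   # non-homogeneous seed tag
ideal = mean_vector(mixed_seed)            # lies halfway between A and B

score_a = cosine(tech_a, ideal)            # a pure technology A patent
score_between = cosine([1.0, 1.0], ideal)  # a patent mixing A and B equally
print(round(score_a, 2), round(score_between, 2))
```

In this toy model a pure technology A patent scores only about 0.71, whereas a patent lying between A and B reaches the maximum score. That is the reason for running the search once per technology field with a homogeneous seed tag.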
Seed Tag
You need to create a seed tag of relevant patents, i.e. patents that you know are relevant in the technology field in which you want to search for similar patents. The tag should contain a minimum of 10 patents. The patents should be homogeneous in technology. We do not recommend tagging the entire portfolio of an owner, as this may lead to a tag containing patents of various technologies and therefore to poor Similarity Search results.
Irrelevant Tag(s)
Optionally, you can create one or more tags of irrelevant patents, i.e. patents that you know are irrelevant in the technology field in which you want to perform the Similarity Search. These may be, e.g., patents that belong to the technology field of your Similarity Search (e.g., “wind turbine blades”) but cover a specific detail you are not interested in (e.g., “wind turbine blade transportation equipment”). The tag should contain a minimum of 10 patents. If you want to tag several irrelevant technology fields, we recommend that you create a separate tag for each technology field.
Using Similarity Search
The Similarity Search Assistant, which guides you step by step through the tool, can be activated and deactivated.
Similarity Search Start Menu
First Stage: Choose seed patents
Second Stage (optional): Choose irrelevant patents
Third Stage (optional): Review sample patents
Fourth Stage: Determine technology field scope and finalize Similarity Search
Apply a Similarity Search in Custom Analysis
After a successful Similarity Search, your search area will contain the tag field(s) generated by the Similarity Search.
By default, the filter includes inactive patents and other IP rights.
Analysis Example 1: Seed Tag Review
Analysis Example 2: Macro Level
Analysis Example 3: Micro Level
Analyze and Evaluate Results of Similarity Search
Determining the technology field size is the fourth stage of the Similarity Search and is presented in this overview.
The Recall-Precision-Graph
PatentSight suggests a division into three fields:
“Narrow” has, in this case, a default cutoff value of 0.80 and focuses strongly on the relevant technology field. It includes very few irrelevant patents but also a smaller share of the relevant patents than the other fields.
“Medium” has, in this case, a default cutoff value of 0.70 and includes a large share of the relevant patents but also more irrelevant patents.
“Broad” has, in this case, a default cutoff value of 0.40 and includes even more irrelevant patents. It may also include patents from other, though related, technology fields.
However, you can adjust these thresholds to suit your needs.
By default, PatentSight suggests a division into three fields: Narrow (0.80), Medium (0.70) and Broad (0.40).
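The relationship between Similarity Score and field can be sketched as a simple threshold check. The sketch assumes that the fields are nested, i.e. a patent that reaches the Narrow cutoff also belongs to Medium and Broad. The patent IDs, scores and the function name `fields_for_score` are hypothetical.

```python
# Default cutoff values from the example above; adjust them to your needs
DEFAULT_CUTOFFS = {"Narrow": 0.80, "Medium": 0.70, "Broad": 0.40}

def fields_for_score(score, cutoffs=DEFAULT_CUTOFFS):
    """Return every field whose cutoff the score reaches (fields assumed to be nested)."""
    return [name for name, cutoff in cutoffs.items() if score >= cutoff]

# Hypothetical Similarity Scores for four patents
scores = {"P1": 0.91, "P2": 0.74, "P3": 0.45, "P4": 0.12}
membership = {pid: fields_for_score(score) for pid, score in scores.items()}

for pid, fields in membership.items():
    print(pid, fields or "outside all fields")
```

Lowering a cutoff enlarges the corresponding field, so more relevant patents are captured at the cost of more irrelevant ones.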
In general, the choice of cutoff value depends entirely on the analyzed technology field. As a rough guideline, precision and recall should both be above 80% for the narrow field. However, this may differ for another technology field or set of seed patents.
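Precision and recall at a given cutoff can be computed as follows. This is a minimal sketch that assumes you have a set of patents you have manually judged as relevant. The scores, patent IDs and the function name `precision_recall` are hypothetical and not taken from the tool.

```python
def precision_recall(scores, relevant, cutoff):
    """Precision and recall of the patents whose score reaches the cutoff."""
    selected = {pid for pid, score in scores.items() if score >= cutoff}
    hits = selected & relevant
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical Similarity Scores and manual relevance judgments
scores = {"P1": 0.95, "P2": 0.88, "P3": 0.83, "P4": 0.81,
          "P5": 0.76, "P6": 0.55, "P7": 0.42}
relevant = {"P1", "P2", "P3", "P5", "P6"}

results = {cutoff: precision_recall(scores, relevant, cutoff)
           for cutoff in (0.80, 0.70, 0.40)}

for cutoff, (precision, recall) in results.items():
    print(f"cutoff {cutoff:.2f}: precision {precision:.0%}, recall {recall:.0%}")
```

In this made-up data set only the 0.70 cutoff reaches 80% for both measures, which illustrates why the default cutoffs should be checked against the Recall-Precision-Graph for each technology field.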