Truncation and Stemming
Enhanced performance
The English language is a strange and wonderful thing. However, most
of what makes it great also makes it difficult to search effectively.
This is particularly the case if you are searching on uncontrolled
vocabulary terms in All Fields, Title, Abstract or Subject. That
is why Engineering Information offers a variety of tools to help
find the information you need. The examples given are from Engineering Village 2
but ChemVillage and Paper Village 2 will work the same way.
Let’s assume that you have been using one of the Village
products for some time now, but have been curious about the results
you have been getting, or want to take your searching to the next
level of expertise. Think of this as the difference between someone using
a point-and-shoot camera for family vacation photos and an experienced
photographer using the best equipment to get the perfect portrait.
Stemming
Stemming uses an algorithm that determines the suffixes of words and allows
you to search for the term as entered, the root word and other words
formed with other possible suffixes. For example, if you enter the
term controllers, you will get results for:
• controllers
• control
• controlling
• controlled
• controls, etc.
Stemming will provide you with much broader search results automatically,
so you do not need to search for every variation of a word yourself.
Stemming will not, however, find variants between British and American spellings.
For example, color will not find colour or coloured,
but colour will find colourful or colours.
To be inclusive in this case, you would need to search for color
or colour.
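To make the behaviour above concrete, here is a minimal Python sketch of suffix-based stemming. It is a toy, not the algorithm Engineering Village uses (that algorithm is not described here). The suffix list, the toy_stem and stem_matches names, and the sample vocabulary are all assumptions made for illustration.

```python
# Toy suffix stripper for illustration only; NOT Engineering Village's algorithm.
# The suffix list below is an assumption chosen to fit the examples in the text.
SUFFIXES = ("ers", "ing", "ful", "ed", "er", "s")


def toy_stem(word):
    """Reduce a word to a rough root by stripping one common suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling: "controll" -> "control".
            if w[-1] == w[-2] and w[-1] not in "aeiou":
                w = w[:-1]
            break
    return w


def stem_matches(term, vocabulary):
    """Return the vocabulary words that share a root with the search term."""
    root = toy_stem(term)
    return [word for word in vocabulary if toy_stem(word) == root]


vocabulary = ["control", "controls", "controlled", "controlling",
              "controllers", "color", "colour", "colours", "colourful"]

print(stem_matches("controllers", vocabulary))
print(stem_matches("color", vocabulary))
print(stem_matches("colour", vocabulary))
```

In this sketch color and colour reduce to different roots, so neither query finds the other's records, which is why both spellings have to be searched.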
When using Quick Search, terms are automatically stemmed by default,
with the following exceptions:
• Author names
• Terms entered as an exact phrase: in quotes or braces, e.g. “solar energy” or {solar energy}
• Terms being truncated

When using Expert Search, terms are not stemmed automatically as they
are in Quick Search. The stemming command is a dollar sign placed directly before
each word you want to stem, e.g. $diode. Unlike Quick Search, where stemming
is either “active” or “inactive,” in Expert
Search you can select individual terms to stem.
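If you assemble Expert Search queries in a script, the per-term choice might be sketched as follows. The build_expert_query helper is hypothetical, and joining terms with OR is an assumption about operator syntax. Only the dollar-sign prefix comes from the description above.

```python
def build_expert_query(terms, stem=()):
    """Prefix '$' to the terms chosen for stemming; leave the rest as entered.

    Hypothetical helper: joining with ' OR ' is an assumption about the
    operator syntax, not something stated in the article.
    """
    to_stem = {t.lower() for t in stem}
    marked = [f"${t}" if t.lower() in to_stem else t for t in terms]
    return " OR ".join(marked)


print(build_expert_query(["color", "colour"], stem=["color", "colour"]))
print(build_expert_query(["diode", "laser"], stem=["diode"]))
```

The second call stems only diode and leaves laser exactly as entered, which is the kind of term-by-term control Quick Search does not offer.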

Truncation
Truncation is a function that allows you to search for all words that start
with the same set of letters. The truncation symbol is an asterisk.
Place the asterisk next to the last letter of the term you want
to truncate. All terms starting with the same letters as the term
you entered will be found. For example, color* will retrieve:
• color
• colors
• Colorado
• colorimeters
• colorimetry
• coloring, etc.
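Truncation is simple enough to sketch in a few lines of Python: strip the asterisk and keep every word that starts with what is left. The truncation_matches name and the sample vocabulary are assumptions for illustration. Matching is made case-insensitive here because the example above shows color* retrieving Colorado.

```python
def truncation_matches(pattern, vocabulary):
    """Return every word that starts with the letters before the asterisk."""
    if not pattern.endswith("*"):
        raise ValueError("expected a trailing asterisk, e.g. 'color*'")
    prefix = pattern[:-1].lower()
    return [word for word in vocabulary if word.lower().startswith(prefix)]


vocabulary = ["color", "colors", "Colorado", "colorimeters",
              "colorimetry", "coloring", "colour", "collar"]

print(truncation_matches("color*", vocabulary))
```

In this sketch colour is not matched, because the comparison is character by character, while Colorado is matched, because nothing about the comparison is linguistic.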
Truncation and Stemming
Truncation is a function similar to stemming but there are key
differences. Knowing how each function works will help you to
formulate the best possible search and control your output.
Truncation is a manual operation in both Quick and Expert Search
and is based on an exact match of characters. Stemming is based
on a linguistic formula and is automatic in Quick Search and manual
in Expert Search.
Here’s where things get interesting. When should you use
stemming and when should you use truncation and when should you
use neither? The definitive answer is “It depends on the
situation.”
If the word you are searching for is fairly distinctive, stemming works
very well. For example, if you search for diode in Quick
Search, chances are you are only going to find diode
and diodes. In Quick Search you do not have to consider
the singular or plural of a word; both will be retrieved. In
Expert Search, it will not matter much whether you search for diode*
or $diode, although some very minor exceptions may occur: $diode
will not retrieve diodenlasergepumpte (from the title
of a German language article) or diodecyldimethylammonium
bromide, but diode* will.
Stemming usually works well with words that describe a concept.
Search for managing and you will also get manage, managed, managers,
etc. Manag*, on the other hand, will also retrieve managanese (a misspelling of manganese).
Truncation can lead to some strange results if you are not careful.
Color* will find Colorado. This is a case where
you should probably use $color or $colour.
The best advice I can leave you with is this: when you get your search
results, look at a few records in the abstract or detailed record
formats. The search terms will be highlighted, so you will be
able to see what the system did to your search. You should find
that you now have the tools to modify your search as needed.
As with all of the services offered by Engineering Information,
if you have any questions you can contact me at librarian@ei.org.
Until next time,
Karen Berryman
Staff Librarian
Engineering Information