Truncation and Stemming
Enhanced performance

 



The English language is a strange and wonderful thing. However, much of what makes it great also makes it difficult to search effectively. This is particularly true when you search uncontrolled vocabulary terms in All Fields, Title, Abstract or Subject. That is why Engineering Information offers a variety of tools to help you find the information you need. The examples given here are from Engineering Village 2, but ChemVillage and Paper Village 2 work the same way.

Let’s assume that you have been using one of the Village products for some time now, but have been curious about the results you have been getting, or want to take your searching to the next level of expertise. Think of this as the difference between using a point-and-shoot camera for family vacation photos and the best equipment an experienced photographer uses to get the perfect portrait.


Stemming

Stemming uses an algorithm that identifies the suffix of a word and reduces the word to its root. The search then retrieves the term as entered, the root word, and other words formed from that root with different suffixes. For example, if you enter the term controllers, you will get results for:
controllers
control
controlling
controlled
controls, etc.
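To make the idea concrete, here is a minimal sketch of a suffix-stripping stemmer in Python. It is a toy, not the algorithm Engineering Village actually uses: the suffix list, the three-letter minimum root and the consonant-undoubling rule are all assumptions chosen so that the controllers example above works. The hypothetical stem_search function returns every word in a small vocabulary whose root matches the root of the query.

```python
# Toy suffix-stripping stemmer: an illustrative assumption,
# NOT the algorithm Engineering Village actually uses.
SUFFIXES = ("ful", "ing", "ers", "ed", "er", "es", "e", "s")  # longest first
VOWELS = "aeiou"


def stem(word: str) -> str:
    """Reduce a word to a crude root by stripping one known suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three letters of root remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            root = w[: -len(suffix)]
            # Undo consonant doubling, e.g. "controll" -> "control".
            if root[-1] == root[-2] and root[-1] not in VOWELS:
                root = root[:-1]
            return root
    return w  # no suffix matched: the word is its own root


def stem_search(query: str, vocabulary: list[str]) -> list[str]:
    """Return every vocabulary word that shares the query's root."""
    root = stem(query)
    return [w for w in vocabulary if stem(w) == root]


vocab = ["controllers", "control", "controlling", "controlled",
         "controls", "contrast", "color", "colour"]
print(stem_search("controllers", vocab))
# ['controllers', 'control', 'controlling', 'controlled', 'controls']
```

Like the British/American spelling behavior described in this article, this toy stemmer reduces color and colour to different roots, so a search on one does not find the other.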

Stemming automatically broadens your search results, so you do not need to search for every variation of a word yourself.

Stemming will not find variants between British and American spellings. For example, color will not find colour or coloured, although colour will find colourful and colours. To retrieve both spellings in this case, search for color or colour.

When using Quick Search, terms are automatically stemmed by default, with the following exceptions:
Author names
Terms entered as an exact phrase: in quotes or braces, e.g. “solar energy” or {solar energy}
Terms being truncated
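The Quick Search exceptions can be read as a simple decision rule. The sketch below encodes them in a hypothetical function, quick_search_stems; the field label "Author" and the idea of passing the search field as a string are assumptions for illustration, not part of any Engineering Village interface.

```python
# Hypothetical helper encoding the Quick Search stemming exceptions.
# The "Author" field label is an assumption for illustration.
EXACT_PHRASE_DELIMITERS = [('"', '"'), ("\u201c", "\u201d"), ("{", "}")]


def quick_search_stems(term: str, field: str = "All Fields") -> bool:
    """Return True if Quick Search would stem this term automatically."""
    t = term.strip()
    if field == "Author":
        return False  # author names are never stemmed
    for opening, closing in EXACT_PHRASE_DELIMITERS:
        if len(t) >= 2 and t.startswith(opening) and t.endswith(closing):
            return False  # exact phrase in quotes or braces
    if t.endswith("*"):
        return False  # truncated terms are not stemmed
    return True


for term, field in [("controllers", "All Fields"),
                    ('"solar energy"', "All Fields"),
                    ("{solar energy}", "All Fields"),
                    ("color*", "Title"),
                    ("Smith", "Author")]:
    print(term, field, quick_search_stems(term, field))
```

Straight quotes and typographic quotes are both treated as exact-phrase delimiters here, since either may appear when a phrase is pasted into the search box.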



When using Expert Search, terms are not stemmed automatically as they are in Quick Search. To stem a term, type a dollar sign ($) immediately before it, for example $controllers. Unlike Quick Search, where stemming is either “active” or “inactive” for the whole search, Expert Search lets you choose which individual terms to stem.
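The sketch below shows one way a search interface might read the dollar-sign convention: it splits a query into terms and flags those prefixed with $ for stemming. The function name parse_expert_terms, the tokenizing regular expression, and the treatment of AND, OR and NOT as operators to skip are assumptions; only the $ prefix comes from Expert Search itself.

```python
import re

# Hypothetical sketch: only the "$" prefix reflects Expert Search syntax;
# the regex and the operator list are assumptions.
OPERATORS = {"AND", "OR", "NOT"}
TOKEN = re.compile(r"\$?[\w*]+")


def parse_expert_terms(query: str) -> list[tuple[str, bool]]:
    """Split a query into (term, stemmed?) pairs, skipping Boolean operators."""
    pairs = []
    for token in TOKEN.findall(query):
        if token in OPERATORS:
            continue
        if token.startswith("$"):
            pairs.append((token[1:], True))   # $term -> stem this term
        else:
            pairs.append((token, False))      # plain term -> search as entered
    return pairs


print(parse_expert_terms("($color OR $colour) AND sensor*"))
# [('color', True), ('colour', True), ('sensor*', False)]
```

Because the flag is stored per term, one query can mix stemmed terms, truncated terms and terms searched exactly as entered.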




Truncation

Truncation lets you search for all words that begin with the same set of letters. The truncation symbol is an asterisk (*): place it immediately after the last letter of the term you want to truncate, and every term beginning with those letters will be found. For example, color* will retrieve:
color
colors
Colorado
colorimeters
colorimetry
coloring, etc.
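Because truncation is an exact match on leading characters, it reduces to a prefix comparison. This minimal sketch assumes matching is case-insensitive (so Colorado is found) and that the asterisk appears only at the end of the term; truncation_search is a hypothetical name.

```python
def truncation_search(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return words that begin with the letters before the trailing '*'.

    Assumes case-insensitive matching and a single trailing asterisk.
    """
    if not pattern.endswith("*"):
        raise ValueError("truncation pattern must end with '*'")
    prefix = pattern[:-1].lower()
    # Exact character match on the leading letters; no linguistic analysis.
    return [w for w in vocabulary if w.lower().startswith(prefix)]


vocab = ["color", "colors", "Colorado", "colorimeters", "colorimetry",
         "coloring", "colour", "discolor"]
print(truncation_search("color*", vocab))
# ['color', 'colors', 'Colorado', 'colorimeters', 'colorimetry', 'coloring']
```

Note that colour is not retrieved: after "colo" it continues with "u" rather than "r", so the prefix match fails.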


Truncation and Stemming

Truncation is similar to stemming, but there are key differences. Knowing how each works will help you formulate the best possible search and control your results.

Truncation is a manual operation in both Quick and Expert Search and is based on an exact match of characters. Stemming is based on a linguistic formula and is automatic in Quick Search and manual in Expert Search.

Here’s where things get interesting. When should you use stemming, when should you use truncation, and when should you use neither? The definitive answer is “It depends on the situation.”

If the word you are searching for is fairly distinctive, stemming works very well. For example, if you search for diode in Quick Search, you are likely to find only diode and diodes. In Quick Search you do not have to consider whether to enter the singular or plural form of a word; both will be retrieved. In Expert Search, it makes little difference whether you search for diode* or $diode, with some very minor exceptions: $diode will not retrieve diodenlasergepumpte (from the title of a German-language article) or diodecyldimethylammonium bromide, but diode* will.

Stemming usually works well with words that describe a concept. Search for managing and you will also get manage, managed, managers, etc., whereas manag* will also retrieve managanese.

Truncation can lead to some strange results if you are not careful. For example, color* will find Colorado. In this case you should probably use $color or $colour instead.
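The examples above can be reproduced with a small, self-contained sketch that runs the same queries both ways over a handful of words taken from this article. The stemmer is a toy suffix stripper, an assumption rather than Engineering Village's real algorithm, and extra_hits is a hypothetical helper that lists the words truncation retrieves but stemming does not.

```python
# Toy suffix-stripping stemmer (an assumption, NOT Engineering Village's algorithm).
SUFFIXES = ("ful", "ing", "ers", "ed", "er", "es", "e", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix that leaves a root of 3+ letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            root = w[: -len(suffix)]
            if root[-1] == root[-2] and root[-1] not in "aeiou":
                root = root[:-1]  # "controll" -> "control"
            return root
    return w


def stem_search(term: str, vocabulary: list[str]) -> list[str]:
    """Emulate $term: words sharing the term's root."""
    root = stem(term.lstrip("$"))
    return [w for w in vocabulary if stem(w) == root]


def truncation_search(pattern: str, vocabulary: list[str]) -> list[str]:
    """Emulate prefix*: case-insensitive match on leading characters."""
    prefix = pattern.rstrip("*").lower()
    return [w for w in vocabulary if w.lower().startswith(prefix)]


def extra_hits(pattern: str, term: str, vocabulary: list[str]) -> list[str]:
    """Words retrieved by truncation but missed by stemming (hypothetical helper)."""
    by_stem = set(stem_search(term, vocabulary))
    return [w for w in truncation_search(pattern, vocabulary) if w not in by_stem]


VOCAB = [
    "diode", "diodes", "diodenlasergepumpte", "diodecyldimethylammonium",
    "manage", "managed", "managers", "managing", "managanese",
    "color", "colors", "Colorado", "coloring",
]

print(extra_hits("diode*", "$diode", VOCAB))
# ['diodenlasergepumpte', 'diodecyldimethylammonium']
print(extra_hits("manag*", "$managing", VOCAB))
# ['managanese']
print(extra_hits("color*", "$color", VOCAB))
# ['Colorado']
```

On this sample vocabulary, diode* adds only the two long compound terms, while manag* adds managanese and color* adds Colorado, mirroring the trade-off described above: the more distinctive the word, the less it matters which method you choose.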

The best advice I can leave you with is this: when you get your search results, look at a few records in the abstract or detailed record format. Your search terms will be highlighted, so you can see exactly what the system did with your search. With that information, you should have the tools to modify your search as needed.

As with all of the services offered by Engineering Information, if you have any questions, you can contact me at librarian@ei.org.

Until next time,

Karen Berryman
Staff Librarian
Engineering Information


Ei UPDATE Issues Archive

Visit Engineering Information at
www.ei.org

Copyright Elsevier Engineering Information Inc. 2004

You are receiving this newsletter because you are an Engineering Information customer or because you have requested information.