Wednesday, August 15, 2012

Echo King: Result Ranking Unveiled

Echo King presented “Searching” at the 2012 BYU Conference on Family History and Genealogy. King is director content product manager for Ancestry.

Relevance Ranking

“The way that we rank things [determines] the order in which we display things,” said King about how Ancestry decides the order of search results. Ancestry uses a system called relevance ranking to guess which results are most relevant to your search and display them first. As you proceed down the list, less and less relevant results are displayed.

“Behind the scenes we are doing all this scoring of different fields,” she said. “Different fields contribute differently.” A match on the last name is considered more important and is given a higher score than a match on given name. Date and place matches come next. Matches in other fields come last.
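To make the idea concrete, here is a minimal sketch of field-weighted scoring in Python. The weights are invented for illustration, since Ancestry has not published its actual values. Only their relative order (surname above given name, given name above date and place, everything else last) comes from King’s talk. The names `FIELD_WEIGHTS`, `relevance_score`, and `rank` are hypothetical.

```python
# Hypothetical weights: Ancestry has not published its real values.
# Only the ordering (surname > given name > date/place > other) follows the talk.
FIELD_WEIGHTS = {
    "surname": 10.0,
    "given_name": 6.0,
    "birth_year": 4.0,
    "birth_place": 4.0,
}
OTHER_FIELD_WEIGHT = 1.0  # any field not listed above


def relevance_score(query: dict[str, str], record: dict[str, str]) -> float:
    """Sum the weight of every query field that the record matches exactly."""
    score = 0.0
    for field, wanted in query.items():
        if record.get(field) == wanted:
            score += FIELD_WEIGHTS.get(field, OTHER_FIELD_WEIGHT)
    return score


def rank(query: dict[str, str], records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return records ordered from most to least relevant."""
    return sorted(records, key=lambda r: relevance_score(query, r), reverse=True)
```

With a query of John Smith, a record that matches only the surname scores higher than one that matches only the given name, so it is listed first.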

Scores are affected by partial matches. A misspelled name has a lower score. Estimated dates score lower. Anything inside a date range scores the same. For example, 1966 +/-1 gives the same score to 1965, 1966, and 1967.
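Partial matches can be sketched the same way, as scores between 0 and 1 instead of all-or-nothing. This is my own illustration, not Ancestry’s method: `SequenceMatcher` from Python’s standard library stands in for whatever name matching Ancestry really uses, and the 0.5 penalty for estimated dates is a made-up number.

```python
from difflib import SequenceMatcher

# Hypothetical penalty for a record whose year is an estimate
# (for example, a birth year calculated from an age in a census).
ESTIMATED_DATE_FACTOR = 0.5


def name_score(wanted: str, found: str) -> float:
    """Return 1.0 for an exact (case-insensitive) match, less for a near miss.

    SequenceMatcher is a stand-in similarity measure; Ancestry's actual
    name matching is not public.
    """
    return SequenceMatcher(None, wanted.lower(), found.lower()).ratio()


def year_score(found: int, target: int, plus_minus: int = 0,
               estimated: bool = False) -> float:
    """Score every year inside target +/- plus_minus the same; others get 0."""
    if abs(found - target) > plus_minus:
        return 0.0
    return ESTIMATED_DATE_FACTOR if estimated else 1.0
```

With a target of 1966 and `plus_minus=1`, `year_score` gives 1965, 1966, and 1967 identical scores, matching King’s example, while a misspelling such as Smyth for Smith scores below an exact match.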

There is a weakness in Ancestry’s ability to guess which results you are interested in. “The more fields that are indexed in a collection, the higher the score. That is not always a good thing,” said King. “That is why censuses are almost always at the top of the list.” City directories have very few fields, sometimes just first name, last name, and location. As a result, many partial matches in collections with lots of fields can rank higher than an exact match in a city directory.
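The bias King describes falls out naturally from any score that simply adds up matching fields. In this hypothetical sketch (invented weights, names, and records), a census entry with the wrong given name still outscores a city directory entry that matches on every field it has, because the census has more fields to match.

```python
# Hypothetical weights; only their relative order follows King's talk.
WEIGHTS = {"surname": 10, "given_name": 6, "birth_year": 4,
           "birth_place": 4, "residence": 4}
OTHER_WEIGHT = 1  # any field not listed above


def score(query: dict[str, str], record: dict[str, str]) -> int:
    """Add up the weights of matching fields. Fields a collection does not
    index simply contribute nothing, which is the source of the bias."""
    return sum(WEIGHTS.get(field, OTHER_WEIGHT)
               for field, wanted in query.items()
               if record.get(field) == wanted)


query = {"given_name": "Peter", "surname": "Jensen", "birth_year": "1850",
         "birth_place": "Denmark", "residence": "Provo, Utah", "spouse": "Anna"}

# Exact match on every field a (hypothetical) city directory indexes.
directory = {"given_name": "Peter", "surname": "Jensen",
             "residence": "Provo, Utah"}

# Census entry with the WRONG given name but many other indexed fields.
census = {"given_name": "Hans", "surname": "Jensen", "birth_year": "1850",
          "birth_place": "Denmark", "residence": "Provo, Utah",
          "spouse": "Anna"}

print(score(query, directory), score(query, census))  # prints 20 23
```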

Consequently, “don’t assume that you can do one search and find every record about a person,” King said. Consider two different search strategies. One is to start by specifying just basic information about an ancestor. Then iteratively add more information and examine the results. The other strategy is to start with everything you know about an ancestor and iteratively remove information. In either case, search specific collections where you feel your ancestor should appear.

There is another weakness you may encounter. “The drop down list [of locations] is not perfect. We’re working on improving that,” she said. If the location does not appear, she suggested putting the name in the keywords field. I always use the location field, since it allows entry of locations not in the list.

If you don’t want Ancestry ranking the results of your searches, the feature can be disabled by changing the view setting above the first result. Change the view from “Sorted by relevance” to “Summarized by Category.”

Ancestry has corrected a weakness you may have experienced in the past: results outside a person’s lifespan are now filtered out. Results dated more than five years before birth or more than two years after death are removed. If only the birth or only the death is specified, the person is assumed to have lived for 100 years.
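The lifespan rule is simple enough to write down directly. This is a minimal sketch, not Ancestry’s code: the 5-year, 2-year, and 100-year figures come from the talk, while the function names, the `year` key on each result, and the choice to skip filtering when neither date is known are my assumptions. Boundaries are inclusive, reading “more than five years” literally.

```python
from typing import Optional

ASSUMED_LIFESPAN = 100   # years, used when only birth or only death is given
YEARS_BEFORE_BIRTH = 5   # results earlier than birth minus this are dropped
YEARS_AFTER_DEATH = 2    # results later than death plus this are dropped


def within_lifespan(event_year: int, birth: Optional[int],
                    death: Optional[int]) -> bool:
    """Return True if a result's year survives the lifespan filter."""
    if birth is None and death is None:
        return True  # assumption: nothing to filter on
    if birth is None:
        birth = death - ASSUMED_LIFESPAN
    if death is None:
        death = birth + ASSUMED_LIFESPAN
    return birth - YEARS_BEFORE_BIRTH <= event_year <= death + YEARS_AFTER_DEATH


def filter_results(results: list[dict], birth: Optional[int],
                   death: Optional[int]) -> list[dict]:
    """Keep only results whose (hypothetical) 'year' key passes the filter."""
    return [r for r in results if within_lifespan(r["year"], birth, death)]
```

For someone born in 1850 with no death date, the sketch assumes death in 1950, so a record from 1952 is kept and one from 1953 is dropped.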

Ancestry has fixed another weakness you may encounter: receiving results for the wrong country. Use the Collection Priority feature to limit results to a particular country. This setting is remembered until you explicitly change it.

Search Types

“There are lots of different ways to search the records,” said King.

Global search searches all collections. Global search is performed from the home page or from the search page.

Category search searches a group of collections that have a common theme, such as all immigration records or all U.S. census records. “Why would you want to search on a category?” she asked. “Because you get fields that are specific to a category of records.” For example, a category search of immigration records allows searching by the name of a ship. If the field you need is not present, use the keyword field. I’ve used this to get all the names on a single page of a census. There are two ways to locate a search category: hover over the Search menu, or click Search and pick from the list of categories along the right-hand side of the page.

Collection search searches a single record collection. “I often use collection specific searches as well,” said King.

Other Points

  • You can use a wildcard anywhere in a name, even at the beginning, but you must have at least three letters.
  • “It can take up to 45 days for [a correction] to show up in the search [index].”
  • If a census household has a mother-in-law with a different surname than the household head, Ancestry might add an alternate name for the wife because they think they know the wife’s maiden name.
  • To print a record, don’t use the browser print. Click on print just above the image.
  • Searching newspapers takes a different strategy because newspapers are indexed using OCR.
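The wildcard rule in the first bullet can be sketched as a small pattern validator. As reported, King’s rule is only that a name must contain at least three letters. The wildcard meanings used here (`*` for any run of characters, `?` for exactly one) follow the common convention and are an assumption, as are the function names.

```python
import re

# Assumed wildcard meanings (common convention, not confirmed by the talk).
WILDCARDS = {"*": ".*", "?": "."}
MIN_LETTERS = 3


def compile_name_pattern(pattern: str) -> re.Pattern:
    """Turn a wildcard name like 'Sm?th' or '*sen' into a regular expression."""
    letters = sum(1 for ch in pattern if ch.isalpha())
    if letters < MIN_LETTERS:
        raise ValueError(f"{pattern!r} needs at least {MIN_LETTERS} letters")
    # Translate wildcards; escape everything else so it matches literally.
    regex = "".join(WILDCARDS.get(ch, re.escape(ch)) for ch in pattern)
    return re.compile(regex, re.IGNORECASE)


def name_matches(pattern: str, name: str) -> bool:
    """True if the whole name fits the pattern (case-insensitive)."""
    return compile_name_pattern(pattern).fullmatch(name) is not None
```

A leading wildcard such as `*sen` is accepted and matches both Jensen and Hansen, while `*en` is rejected because it has only two letters.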

1 comment:

  1. "Ancestry uses a system called relevance ranking to guess which results are most relevant to your search and display them first. As you proceed down the list, less and less relevant results are displayed."

    Usually the most relevant results I get are on the 2nd or 3rd page of results: exact spelling of the name as entered, time period matching the parameters entered, relevant place. They come after the results with wrong names, wrong time frames, and wrong places.

    The search prefers names containing my target person’s first initial (even as the middle initial of someone of the wrong gender) over the exact spelling of the first name (every Mary J. and Willard J. before my Jonathan) . . . and other wonkiness.

    NewSearch also makes it quite difficult to locate particular collections, except for US Census items, which have a one-page list of the enumeration years.