có một mô hình chung của các mã zip tăng từ Đông sang Tây. Các mã mà bắt đầu với 0 là ở New England và Puerto Rico, những người bắt đầu với 9 là trên bờ biển phía tây. Điều này cho thấy một chức năng khoảng cách xấp xỉ khoảng cách địa lý bằng cách nhìn vào các chữ số thứ tự cao của mã zip. | 278 Chapter 8 Furthermore there is a general pattern of zip codes increasing from East to West. Codes that start with 0 are in New England and Puerto Rico those beginning with 9 are on the west coast. This suggests a distance function that approximates geographic distance by looking at the high order digits of the zip code. dzip A B if the zip codes are identical dzip A B if the first three digits are identical . 20008 and 20015 dzip A B if the first digits are identical . 95050 and 98125 dzip A B if the first digits are not identical . 02138 and 94704 Of course if geographic distance were truly of interest a better approach would be to look up the latitude and longitude of each zip code in a table and calculate the distances that way it is possible to get this information for the United States from . For many purposes however geographic proximity is not nearly as important as some other measure of similarity. 10011 and 10031 are both in Manhattan but from a marketing point of view they don t have much else in common because one is an upscale downtown neighborhood and the other is a working class Harlem neighborhood. On the other hand 02138 and 94704 are on opposite coasts but are likely to respond very similarly to direct mail from a political action committee since they are for Cambridge MA and Berkeley CA respectively. This is just one example of how the choice of a distance metric depends on the data mining context. There are additional examples of distance and similarity measures in Chapter 11 where they are applied to clustering. When a Distance Metric Already Exists There are some situations where a distance metric already exists but is difficult to spot. These situations generally arise in one of two forms. Sometimes a function already exists that provides a distance measure that can be adapted for use in MBR. The news story case study provides a good example of adapting an existing function the relevance feedback .