
Inside the HRSA Datawarehouse, address records are run through a routine called “geocoding”, which matches street addresses to geographic coordinate points. This is how a table of address records from the HRSA Datawarehouse can be shown on a map. The process depends on two things: the quality of the input street addresses and the reference data used to create the geographic coordinate points.
Records are matched in a hierarchy. The street address is the “best match”. If there is no exact street address match, USPS ZIP codes are used to match at the ZIP code level. If there is still no match, the process falls back as a last resort to the USGS Populated Places data to try to match at the City and State level (or Country for US Territories and Minor Islands). Once matching moves past the street level, the geographic coordinate point becomes an approximate location. ZIP code and City, State matches are a last-resort way to show the record on the map, and the resulting approximate location is for display purposes only.
This happens when the input data does not match the reference data. For example, P.O. Boxes are very poor addresses for geocoding. No reference geographic data set matches P.O. Boxes to physical street addresses, so these records are generally matched at the ZIP code level, which makes the location approximate. A bad street address has the same effect: it falls through to ZIP or City, State matching because it does not match the address-level data set. In the HRSA Datawarehouse, data geocoded at the USGS Populated Places level usually either had bad addresses and ZIP codes, or there is no better reference data for us to use to improve its spatial location.
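The matching hierarchy described above can be illustrated with a minimal Python sketch. It is not the routine the HRSA Datawarehouse actually runs. The three lookup tables, the `GeocodeResult` type, and the `geocode` function are hypothetical names, and the address, ZIP code, place, and coordinates are made-up placeholders standing in for real street, USPS ZIP code, and USGS Populated Places reference data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference tables with made-up placeholder values.
# A real geocoder would query street, ZIP code, and populated-place
# reference data sets instead of these in-memory dictionaries.
STREET_REFERENCE = {("100 EXAMPLE ST", "99999"): (10.001, 20.001)}
ZIP_REFERENCE = {"99999": (10.0, 20.0)}
PLACE_REFERENCE = {("SAMPLETOWN", "ZZ"): (10.5, 20.5)}


@dataclass
class GeocodeResult:
    latitude: float
    longitude: float
    match_level: str   # "street", "zip", or "place"
    approximate: bool  # True for anything below street level


def _normalize(text: str) -> str:
    """Upper-case the text and collapse runs of whitespace."""
    return " ".join(text.upper().split())


def geocode(street: str, zip_code: str, city: str, state: str) -> Optional[GeocodeResult]:
    """Try street level first, then ZIP code, then City and State."""
    zip_key = zip_code.strip()

    # 1. Best match: exact street address.
    street_key = (_normalize(street), zip_key)
    if street_key in STREET_REFERENCE:
        lat, lon = STREET_REFERENCE[street_key]
        return GeocodeResult(lat, lon, "street", approximate=False)

    # 2. Fallback: ZIP code level (approximate location).
    if zip_key in ZIP_REFERENCE:
        lat, lon = ZIP_REFERENCE[zip_key]
        return GeocodeResult(lat, lon, "zip", approximate=True)

    # 3. Last resort: City and State level (approximate location).
    place_key = (_normalize(city), _normalize(state))
    if place_key in PLACE_REFERENCE:
        lat, lon = PLACE_REFERENCE[place_key]
        return GeocodeResult(lat, lon, "place", approximate=True)

    # No level matched, so the record cannot be placed on the map.
    return None
```

In this sketch a P.O. Box such as `geocode("PO Box 12", "99999", "Sampletown", "ZZ")` misses the street table and falls through to the ZIP code level, so the result is flagged as approximate.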
These are the top reasons why an approximate location is shown:
Address is a mail delivery address instead of a physical street address (e.g. P.O. Box, Rural Route)
Address is an intersection or is too vague to locate (e.g. I-65 at 21st Street; 200 Stantonsburg Highway; 301 West Expressway 83 4 East)
Address or some part of the address does not exist:
  House number out of range;
  No such street;
  Too much abbreviation (e.g. “MLK” instead of “Martin Luther King”);
  Unrecognizable alternate name (e.g. “1 Rockville Pike, Gaithersburg”)
Too many internal conflicts in address parts:
  Building number disagrees with direction, ZIP code, city, etc.;
  City name disagrees with ZIP code;
  ZIP code does not exist;
  Direction does not match;
  Street type does not match;
  Spelling errors
Address is newer than reference data
Address match level or confidence level is below accepted threshold
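
Some of the reasons above can be caught before geocoding. As a rough sketch only, the check below flags mail delivery addresses such as P.O. Boxes and Rural Routes with a regular expression. The pattern and the function name `is_mail_delivery_address` are illustrative assumptions, not part of the HRSA Datawarehouse process, and the pattern covers only a few common spellings.

```python
import re

# Hypothetical pattern covering a few common spellings of P.O. Box and
# Rural Route addresses; real address data will contain more variants.
MAIL_DELIVERY_PATTERN = re.compile(
    r"\b(?:P\.?\s*O\.?\s*BOX|POST\s+OFFICE\s+BOX|RURAL\s+(?:ROUTE|RTE)|RR\s*\d+)\b",
    re.IGNORECASE,
)


def is_mail_delivery_address(address: str) -> bool:
    """Return True if the address looks like a P.O. Box or Rural Route."""
    return MAIL_DELIVERY_PATTERN.search(address) is not None
```

Records flagged this way are likely to be matched at the ZIP code level at best, so they are candidates for asking the data provider for a physical street address.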