Improve patron address parser to better map data to the patron_record_address table
Sierra has a mechanism for parsing the free text entered into patron address fields in the client into separate fields, such as city, state, and postal code, that exist in the patron_record_address table. Those distinct fields can then be searched in Create Lists or used in SQL queries. However, the algorithm that parses out those data points is extremely rigid about the data it expects to see, and it is quite prone to mapping data incorrectly when an address entered into a patron record does not conform to that expected format.
One common example in our system: if an extended ZIP code is entered in any format other than #####-#### (say, using a space instead of the hyphen, or just including a space before or after the hyphen), then the city, region, and postal code are all lumped together in the city field, leaving region and postal_code as NULL. Similar issues can occur if extra lines are used to separate these elements in the client.
This leads to issues with searching patron records in Create Lists and can make SQL queries against the address data quite difficult to write (not to mention causing an enormous amount of confusion for any third-party vendors that are expected to work with the query results).
The parser should be improved to at least account for some of these more common data entry scenarios.
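To make the request concrete, here is a rough sketch in Python of the kind of tolerance being asked for. It is not Sierra's parsing code, and the names normalize_zip and split_last_line are made up for illustration. The only assumptions are that the last address line holds city, a two-letter region, and an optional ZIP, and that the output keys mirror the city/region/postal_code fields described above.

```python
import re

# Hypothetical helpers for illustration only; this is not Sierra's parser.

# Five digits, then optionally a +4 part. The separator may be spaces,
# a hyphen, an en dash, or an em dash, with or without surrounding spaces.
ZIP_RE = re.compile(r"^\s*(\d{5})(?:\s*[-\u2013\u2014]?\s*(\d{4}))?\s*$")

LAST_LINE_RE = re.compile(
    r"^\s*(?P<city>.+?)"          # city: shortest text that lets the rest match
    r"(?:\s*,\s*|\s+)"            # a comma (spaces optional) or plain whitespace
    r"(?P<region>[A-Za-z]{2})"    # two-letter region in any casing
    r"(?:\s+(?P<zip>\d{5}(?:\s*[-\u2013\u2014]?\s*\d{4})?))?"  # optional ZIP / ZIP+4
    r"\s*$"
)


def normalize_zip(raw):
    """Return '#####' or '#####-####', or None if raw is not a US ZIP."""
    m = ZIP_RE.match(raw)
    if m is None:
        return None
    zip5, plus4 = m.groups()
    return f"{zip5}-{plus4}" if plus4 else zip5


def split_last_line(line):
    """Split 'City, ST 12345-6789' style text into separate fields.

    Returns None when the line does not look like city + region (+ ZIP).
    """
    m = LAST_LINE_RE.match(line)
    if m is None:
        return None
    zip_raw = m.group("zip")
    return {
        "city": m.group("city").strip(),
        "region": m.group("region").upper(),
        "postal_code": normalize_zip(zip_raw) if zip_raw else None,
    }


print(split_last_line("Boston ma 02134 1234"))
```

With this sketch, "Boston ma 02134 1234" and "Boston, MA 02134-1234" both come out as city "Boston", region "MA", postal code "02134-1234". One known limitation: a two-word city entered with no state (for example "Santa Fe") would be misread as city plus region, so a real implementation would presumably check the region against a list of valid abbreviations.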
-
MEEP candidate for the Sierra 6.7 release*
* This idea will require implementation across multiple Sierra releases.
Functional Requirements (what does it need to do?)
● Accepts ZIP+4 formats: #####-####, ##### ####, #####—####, ##### - ####, ##### - ####.
● Normalizes to #####-####.
● Ignores whitespace variations.
● City and region/state fields correctly populated.
● Identifies address_line1 and address_line2.
● Parses city, region/state, ZIP even if placed on separate lines.
● Supports no-comma entries using token pattern detection.
● Trims whitespace.
● Normalizes punctuation.
● Standardizes casing on city and region/state.
● Country detection logic configurable.
● Normalizes valid patterns.
● Parser generates confidence scores.
● Prompts display when below threshold.
● Quick-edit preview provided.
● Existing records unchanged.
● New parsing applies only on add/edit.
● Create Lists and SQL behaviors remain intact.
● Accepts Canadian postal codes (A1A 1A1).
● Accepts UK postal formats.
● Accepts Australian postal formats.
● Accepts New Zealand postal formats.
● Support plug-in pattern libraries for more countries.
● Bulk reparse tool for legacy addresses.
(Edited by admin)
-
Jeremy Goldstein
commented
After a bit more experimenting I found another common occurrence that can lead to similar behavior. The parser seems to want the city and state to share a line of the address, and to either be separated by a comma, or for the state abbreviation to have both letters capitalized.
"Boston, Ma" works as does "Boston MA". However "Boston ma" or "Boston Ma" does not.
-
Jennifer Nicolotti
commented
I agree with all of the points Jeremy makes, and more. We allow customers to self-register, and we also have over 300 staff members who create customer records manually. One simple typo in the address fields can throw off the entire address and make it unusable, both for searching within Sierra and for sharing this data with 3rd party applications that can't parse our data because of the errors in the format.
-
Philip McNulty
commented
Having a sound foundation for our patron data is important in a variety of our functions, from reporting, to 3rd party integrations, to patron communications. Sierra's weirdly flexible patron input forms require strong back-end tools to make sure this data is correct, and this idea will improve those tools.
-
Ruth Souto
commented
The normalization of zip codes is important to ensure accuracy in the patron record and, in the cases where notices are still physically mailed (we have quite a few that are), to ensure that the notice gets delivered by the USPS. Staff typos happen; if we could prevent those from slipping through, our patron database would be more useful and efficient.