Data Exchange should automatically identify and fix MARC-8 records.
Data Exchange should determine if a file being loaded is in UTF-8 or MARC-8 format before loading. If it is MARC-8, it should be converted to UTF-8 before loading.
I want to be clear that I'm not talking about relying on the MARC leader flag (position 09) that supposedly identifies whether the character set is UTF-8. That cannot be trusted. I want a system like the character detection feature in MarcEdit, one that will actually look at the characters in the incoming records and determine what character set they really use.
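To illustrate the kind of check I mean, here is a rough sketch in Python. It is only my guess at a workable heuristic, not how MarcEdit or Data Exchange actually does it, and the function name and return labels are made up for this example. The idea is that MARC-8 puts combining diacritics (high bytes) before the base letter, which almost never forms valid UTF-8, so a strict UTF-8 decode of the raw bytes is a reasonable first test.

```python
ESC = 0x1B  # MARC-8 uses ESC sequences to switch to alternate character sets


def guess_marc_encoding(data: bytes) -> str:
    """Guess the encoding of raw record bytes by inspecting the bytes themselves.

    Hypothetical helper: returns 'ascii', 'utf-8', or 'marc-8-likely'.
    This is a heuristic and can be wrong in rare cases.
    """
    if all(b < 0x80 for b in data):
        # Only 7-bit bytes: identical in both encodings unless an ESC
        # sequence is present, which suggests a MARC-8 character set switch.
        return "marc-8-likely" if ESC in data else "ascii"
    try:
        # Strict decoding rejects any byte sequence that is not valid UTF-8.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "marc-8-likely"
    return "utf-8"


if __name__ == "__main__":
    print(guess_marc_encoding("café".encode("utf-8")))
    print(guess_marc_encoding(b"caf\xe2e"))
```

In the second call, 0xE2 is the MARC-8 combining acute placed before the letter "e". That pair of bytes is not valid UTF-8, so the record would be flagged for conversion before loading, no matter what the leader says.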
There are still a lot of MARC-8 records out there, and it is not easy for libraries to determine which encoding any given record file uses. When MARC-8 records are loaded, diacritics and special characters come out completely garbled, making those terms unsearchable and unreadable. Our Sierra server has a lot of these records, and they are still being added because libraries can't get vendors to provide UTF-8 and don't know how to fix the files before loading.
-
Kimberly Allen
commented
If possible, this would be great. We have recently had a horrendous time with a certain vendor's records, which have nonsensical diacritics. We've yelled, but have gotten little to no help from them. Updating locally would take time, and it's disconcerting that the next overlay we get will completely undo that work. This would be a huge help.