Seems I always write one of these when I need it and make little tweaks.
Looking for a better function for splitting a name into its components.
Any suggestions?
You’re not talking about a filename, I presume? (otherwise, FNSplit is good for that)
Sorry I wasn’t clear…
Correct - People Names
Hi Paul
This is not as simple as it first seems.
E.g. if you split on spaces and take the last word as the family name, what about names like:
Mc Donald
von Trapp
and people who put Snr or Jnr at the end, or Roman numerals as with “Loudon Wainwright III”, or MD or PhD or DDS and so on. Less common these days is “Esq” or “Esquire”, but it still might crop up.
And what about some Asian countries where the FIRST name is the family name?
And sometimes there are “double-barrelled” surnames. It is easy if they have a hyphen between them (as they stay together as one word when splitting on space), but this is not always the case.
If you look at Force Capitalized String - #7 by vitesse you will see links to a Clarion Magazine article by Mike Hanson that is “related” in that it is trying to capitalize names. Sure, that is not what you are doing, but it is related in the sense of breaking a string into components and handling some common exceptions.
Don’t forget a name may have a title at the front: Mr, Mrs, Ms, Miss, Mstr, Mdm, Dr, Sir, Lord and so on.
Once you work out a “spec”, the mechanics of implementation are the easy part, especially if you use something like StringTheory split(). st.split() puts the component parts into a queue, and you can then group them into your separate fields.
Just to get you going here is something I knocked up. I’m sure it can be improved but it can serve as a starting point. Add extra prefixes, suffixes, and “joining words” in the strings where indicated, or tailor to your local requirements.
usage would be something like:
inName          string(150),auto
outPrefix       string(10),auto
outFirstName    string(50),auto
outMiddleNames  string(100),auto
outFamilyName   string(50),auto
outSuffix       string(10),auto
  code
  inName = 'Sir Paul Elvis Mac Farlane PhD'
  splitName(inName, outPrefix, outFirstName, outMiddleNames, outFamilyName, outSuffix)
  message(outPrefix & '|' & outFirstName & '|' & outMiddleNames & '|' & outFamilyName & '|' & outSuffix)
(your field names would likely be in your record buffer)
anyway see how you go
cheers for now…
SplitName PROCEDURE (*string inName,*string outPrefix,*string outFirstName,*string outMiddleNames,*string outFamilyName,*string outSuffix) ! Declare Procedure
st           stringTheory
x            long,auto
prefix       string('|mr|miss|mrs|ms|mdm|madam|mister|master|sir|captain| and so on....')
suffix       string('|jr|jnr|sr|snr|ii|iii|iv|v|vi|vii|viii|ix|x|md|phd|esq| and so on....')
joinedNames  string('|von|van|de|da|la|le|los|las|o|mc|mac| and so on....')
  CODE
  st.setValue(inName,st:clip)
  st.split(' ')
  st.removeLines()

  ! look for prefix
  st.setValue(st.getLine(1))
  if st.endsWith('.') then st.tail(1). ! remove dot at end
  if instring('|' & lower(st.getValue()) & '|',prefix,1,1) ! is it a prefix?
    outPrefix = st.getLine(1)
    st.deleteLine(1)
  else
    outPrefix = ''
  end

  ! look for suffix
  st.setValue(st.getLine(st.records()))
  if st.endsWith('.') then st.tail(1). ! remove dot at end
  if instring('|' & lower(st.getValue()) & '|',suffix,1,1) ! is it a suffix?
    outSuffix = st.getLine(st.records())
    st.deleteLine(st.records())
  else
    outSuffix = ''
  end

  ! look for "joined" names - combine these with following word
  loop x = st.records()-1 to 1 by -1
    st.setValue(st.getLine(x))
    if st.endsWith('.') then st.tail(1). ! remove dot at end
    if instring('|' & lower(st.getValue()) & '|',joinedNames,1,1)
      st.setLine(x,st.getLine(x) & ' ' & st.getLine(x+1))
      st.deleteLine(x+1)
    end
  end

  ! if first name ends in comma, take it as format: FamilyName, Given Names
  st.setValue(st.getLine(1))
  if st.endsWith(',')
    st.tail(1) ! remove last char
    outFamilyName = st.getValue()
    st.deleteLine(1)
  else
    outFamilyName = st.getLine(st.records()) ! use last token as family name
    st.deleteLine(st.records())
  end

  ! first remaining token is the first name; whatever is left becomes the middle names
  outFirstName = st.getLine(1)
  st.deleteLine(1)
  st.join(' ')
  outMiddleNames = st.getValue()
I also found this an interesting short article:
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
Thank you - that’s great !
…and don’t forget to consider how single names should be handled. This has come up a couple of times for me, even though the people were not really good enough to go by a single name.
In addition to what Vitesse said, don’t forget
In most situations I deal with, a single name would have to be ‘passed-off’ as a business name as it would be invalid as an individual’s name which requires a first and last at minimum. Not my rules.
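On the single-name case: as far as I can tell, the SplitName posted earlier would leave a lone word in outFamilyName with outFirstName blank, so a check straight after the call could catch it. The following is only a rough sketch of the "pass it off as a business name" rule. It is untested, and isBusiness and businessName are made-up fields for illustration, not part of the original procedure.

```clarion
inName          string(150),auto
outPrefix       string(10),auto
outFirstName    string(50),auto
outMiddleNames  string(100),auto
outFamilyName   string(50),auto
outSuffix       string(10),auto
isBusiness      byte,auto        ! hypothetical flag, not in the original
businessName    string(50),auto  ! hypothetical field, not in the original
  code
  inName = 'Madonna'
  splitName(inName, outPrefix, outFirstName, outMiddleNames, outFamilyName, outSuffix)
  if outFirstName = '' and outFamilyName <> ''  ! only one name token was found
    businessName  = outFamilyName               ! store it as a business name instead
    outFamilyName = ''
    isBusiness    = true
  else
    businessName  = ''
    isBusiness    = false
  end
```

Whether a single name should be rejected, flagged for review, or stored as a business name depends on your own rules, so treat the branch above as a placeholder.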