Person Name Split Function?

Seems I always write one of these when I need it and make little tweaks.

Looking for a better function for splitting a name into its components.

Any suggestions?

You’re not talking about a filename, I presume? (otherwise, FNSplit is good for that)

Sorry I wasn’t clear…

Correct - People Names

Hi Paul

This is not as simple as it first seems.

E.g. if you split on spaces and treat the last word as the family name, then what about names like:

Mc Donald
von Trapp

And people who put Snr or Jnr at the end, or Roman numerals as with “Loudon Wainwright III”, or MD or PhD or DDS and so on. Less common these days is “Esq” or “Esquire”, but it still might crop up.

And what about some Asian countries where the family name comes FIRST?

And sometimes there are “double-barrelled” surnames. It is easy if they have a hyphen between them (they stay together as one word when splitting on spaces), but this is not always the case.

If you look at Force Capitalized String - #7 by vitesse you will see links to a Clarion Magazine article by Mike Hanson that is “related” in that it is trying to capitalize names. Sure, that is not what you are doing, but it is related in the sense of breaking a string into components and handling some common exceptions.

Don’t forget a name may have a title at the front: Mr, Mrs, Ms, Miss, Mstr, Mdm, Dr, Sir, Lord and so on.

Once you work out a “spec”, the mechanics of implementation are the easy part, especially if you use something like StringTheory split(). st.split() puts the component parts into a queue and you can group them accordingly into your separate fields.
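
To make the “spec first” idea concrete, here is a rough sketch in Python (not Clarion, purely so it is self-contained and easy to try). It splits on whitespace and tags each word as a title, suffix, or plain name part. The lookup sets and the names `TITLES`, `SUFFIXES`, `normalise` and `classify_tokens` are made up for illustration and are deliberately incomplete.

```python
# Hypothetical, deliberately incomplete lookup sets; extend for local needs.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "sir", "lord"}
SUFFIXES = {"jr", "jnr", "sr", "snr", "ii", "iii", "md", "phd", "dds", "esq"}


def normalise(token):
    """Lower-case a token and drop trailing dots/commas for lookup purposes."""
    return token.rstrip(".,").lower()


def classify_tokens(name):
    """Split a name on whitespace and tag each word as title, suffix or name."""
    tokens = name.split()  # splits on runs of whitespace, no empty pieces
    last = len(tokens) - 1
    result = []
    for i, tok in enumerate(tokens):
        key = normalise(tok)
        if i == 0 and key in TITLES:
            kind = "title"      # only the first word may be a title
        elif i == last and i > 0 and key in SUFFIXES:
            kind = "suffix"     # only the last word may be a suffix
        else:
            kind = "name"
        result.append((tok, kind))
    return result


print(classify_tokens("Dr Loudon Wainwright III"))
```

Grouping the "name" words into first, middle and family fields is where the real decisions live (joined surnames, family-name-first ordering), so that part is left out here.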

Just to get you going here is something I knocked up. I’m sure it can be improved but it can serve as a starting point. Add extra prefixes, suffixes, and “joining words” in the strings where indicated, or tailor to your local requirements.

usage would be something like:

inName         string(150),auto
outPrefix      string(10), auto
outFirstName   string(50), auto
outMiddleNames string(100),auto
outFamilyName  string(50), auto
outSuffix      string(10), auto

  code
  inName = 'Sir Paul Elvis Mac Farlane PhD'
  splitName(inName, outPrefix, outFirstName, outMiddleNames, outFamilyName, outSuffix)
  message(outPrefix & '|' & outFirstName & '|' & outMiddleNames & '|' & outFamilyName & '|' & outSuffix)

(your field names would likely be in your record buffer)

anyway see how you go

cheers for now…

SplitName            PROCEDURE  (*string inName,*string outPrefix,*string outFirstName,*string outMiddleNames,*string outFamilyName,*string outSuffix) ! Declare Procedure
st  stringTheory
x   long,auto

prefix      string('|mr|miss|mrs|ms|mdm|madam|mister|master|sir|captain|  and so on....')
suffix      string('|jr|jnr|sr|snr|ii|iii|iv|v|vi|vii|viii|ix|x|md|phd|esq| and so on....')
joinedNames string('|von|van|de|da|la|le|los|las|o|mc|mac| and so on....')

  CODE

  st.setValue(inName,st:clip)
  st.split(' ')
  st.removeLines()

  ! look for prefix
  st.setValue(st.getLine(1))
  if st.endsWith('.') then st.tail(1).  ! remove dot at end
  if instring('|' & lower(st.getValue()) & '|',prefix,1,1)  ! is it a prefix?
    outPrefix = st.getLine(1)
    st.deleteLine(1)
  else
    outPrefix = ''
  end

  ! look for suffix
  st.setValue(st.getLine(st.records()))
  if st.endsWith('.') then st.tail(1).  ! remove dot at end
  if instring('|' & lower(st.getValue()) & '|',suffix,1,1)  ! is it a suffix?
    outSuffix = st.getLine(st.records())
    st.deleteLine(st.records())
  else
    outSuffix = ''
  end

  ! look for "joined" names - combine these with following word
  loop x = st.records()-1 to 1 by -1
    st.setValue(st.getLine(x))
    if st.endsWith('.') then st.tail(1).  ! remove dot at end
    if instring('|' & lower(st.getValue()) & '|',joinedNames,1,1)
      st.setLine(x,st.getLine(x) & ' ' & st.getLine(x+1))
      st.deleteLine(x+1)
    end
  end

  ! if first name ends in comma, take it as format:  FamilyName, Given Names
  st.setValue(st.getLine(1))
  if st.endsWith(',')
    st.tail(1)  ! remove last char
    outFamilyName = st.getValue()
    st.deleteLine(1)
  else
    outFamilyName = st.getLine(st.records()) ! use last token as family name
    st.deleteLine(st.records())
  end
  
  outFirstName = st.getLine(1)
  st.deleteLine(1)
  st.join(' ')
  outMiddleNames = st.getValue()
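
For anyone who wants to try the rules without a Clarion build, below is a rough Python port of the same steps (prefix, suffix, joined names, then the “FamilyName, Given Names” check). It is only a sketch of the logic above, not a replacement for it. The word lists are the same samples, and the names `split_name` and `_key` are invented for this example.

```python
# Sample word lists copied from the Clarion version; extend as needed.
PREFIXES = {"mr", "miss", "mrs", "ms", "mdm", "madam", "mister", "master",
            "sir", "captain"}
SUFFIXES = {"jr", "jnr", "sr", "snr", "ii", "iii", "iv", "v", "vi", "vii",
            "viii", "ix", "x", "md", "phd", "esq"}
JOINERS = {"von", "van", "de", "da", "la", "le", "los", "las", "o", "mc", "mac"}


def _key(token):
    """Lookup key: lower-case, with one trailing dot removed if present."""
    return token[:-1].lower() if token.endswith(".") else token.lower()


def split_name(name):
    """Return (prefix, first, middle, family, suffix) for a personal name."""
    words = name.split()
    prefix = suffix = ""
    if words and _key(words[0]) in PREFIXES:
        prefix = words.pop(0)
    if words and _key(words[-1]) in SUFFIXES:
        suffix = words.pop()
    # Join words like "von" or "Mac" to the word that follows, right to left.
    for i in range(len(words) - 2, -1, -1):
        if _key(words[i]) in JOINERS:
            words[i] = words[i] + " " + words.pop(i + 1)
    if words and words[0].endswith(","):
        family = words.pop(0)[:-1]   # "FamilyName, Given Names" format
    elif words:
        family = words.pop()         # otherwise the last word is the family name
    else:
        family = ""
    first = words.pop(0) if words else ""
    middle = " ".join(words)
    return prefix, first, middle, family, suffix


print("|".join(split_name("Sir Paul Elvis Mac Farlane PhD")))
# prints: Sir|Paul|Elvis|Mac Farlane|PhD
```

Note that under these rules a single word such as "Cher" ends up in the family name with the first name left blank, which may or may not be what you want.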

I also found this an interesting short article
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Thank you - that’s great !

…and don’t forget to consider how single names should be handled. This has come up a couple of times for me, even though the people were not really good enough to go by a single name.
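
One way to keep that decision explicit rather than accidental is to make it a named policy. A small Python sketch follows (Python only so it can be run standalone; `SingleNamePolicy` and `place_single_name` are hypothetical names, and which policy is right depends on your own rules):

```python
from enum import Enum


class SingleNamePolicy(Enum):
    AS_FAMILY = "family"   # store the word as the family name
    AS_GIVEN = "given"     # store the word as the first name
    REJECT = "reject"      # refuse one-word names outright


def place_single_name(name, policy):
    """Return (first_name, family_name) for a one-word name under a policy."""
    word = name.strip()
    if len(word.split()) != 1:
        raise ValueError("expected exactly one word")
    if policy is SingleNamePolicy.AS_FAMILY:
        return "", word
    if policy is SingleNamePolicy.AS_GIVEN:
        return word, ""
    raise ValueError("single-word name not accepted: %r" % word)


print(place_single_name("Cher", SingleNamePolicy.AS_GIVEN))
```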

In addition to what Vitesse said, don’t forget:

  1. some countries don’t have surnames
  2. people don’t always go by a first name (e.g. H. Norman Schwarzkopf)
  3. people can have > 1 middle name

In most situations I deal with, a single name would have to be ‘passed-off’ as a business name as it would be invalid as an individual’s name which requires a first and last at minimum. Not my rules.