Clean email address display name "Niels <[email protected]>" to extract just [email protected]

Hi

I’m looking for an easy/smart way to clean an email address like this “Niels Larsen <[email protected]>”.
The result should end up being: “[email protected]”
Of course I have StringTheory :wink:

Any cool ideas?

Regards Niels

Search leftward for the first space, then cut all the characters before it.

Sorry, the <> was missing (question edited).
Some of the different forms:
“Niels Larsen <[email protected]>”.
“Niels Larsen<[email protected]>”.
“[email protected]”.

What is the problem with finding the < and > characters?

That’s not the problem.
There can be many different characters around the address itself (or none).
In my head, it is about finding the clean address by working outward from the @, to the left and to the right:
finding the first illegal email character on each side of the @.

What about str.Between('<','>')?

StringTheory won’t help you much here. I just threw together this class that does the trick. I’m sure there are examples that it wouldn’t handle, but it does the three that you provided.

Mike Hanson

             PROGRAM
 
EmailParser  CLASS,TYPE
Name           CSTRING(200),PROTECTED
User           CSTRING(100),PROTECTED
Domain         CSTRING(100),PROTECTED
Parse          PROCEDURE(STRING E)
GetName        PROCEDURE,STRING
GetUser        PROCEDURE,STRING
GetDomain      PROCEDURE,STRING
GetEmail       PROCEDURE,STRING
GetComplete    PROCEDURE,STRING
             END

             MAP
Test           PROCEDURE(STRING E)
             END

  CODE
  Test('Niels Larsen <[email protected]>')
  Test('Niels Larsen<[email protected]>')
  Test('[email protected]')
  Test('@justdomain.com')
  Test('Name <@justdomain.com>')
  Test('Life begins @ the hop! <life*[email protected]>')
  
Test PROCEDURE(STRING E)
EP     EmailParser
  CODE
  EP.Parse(E)
  MESSAGE('Input:<9><9>'& E &'|'           |
      &  '|Name: <9><9>'& EP.GetName()     |
      &  '|User: <9><9>'& EP.GetUser()     |
      &  '|Domain:  <9>'& EP.GetDomain()   |
      &  '|Email:<9><9>'& EP.GetEmail()    |
      &  '|Complete:<9>'& EP.GetComplete() |
      )


EmailParser.Parse PROCEDURE(STRING E)
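DomainChars         EQUATE('abcdefghijklmnopqrstuvwxyz0123456789.-')
UserChars           EQUATE(DomainChars &'!#$%&''*+/=?^_`{{|}~')
A                   LONG,AUTO   ! Position of the last '@' (LONG so inputs over 255 characters do not wrap)
L                   LONG,AUTO   ! Length of the trimmed input
X                   LONG,AUTO   ! Scan position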
  CODE
  CLEAR(SELF.Name)
  CLEAR(SELF.User)
  CLEAR(SELF.Domain)

  E = LEFT(E)
  L = LEN(CLIP(E))               ! E is already left-justified by the line above

  A = INSTRING('@', E, -1, L)
  IF A = 0 THEN RETURN.

  LOOP X = A+1 TO L+1
    IF X > L OR NOT INSTRING(LOWER(E[X]), DomainChars)
      SELF.Domain = CLIP(E[A+1 : X-1])
      BREAK
    END
  END

  LOOP X = A-1 TO 0 BY -1
    IF X = 0 OR NOT INSTRING(LOWER(E[X]), UserChars)
      SELF.User = CLIP(LEFT(E[X+1 : A-1]))
      BREAK
    END
  END
  
  IF X > 0 AND E[X] = '<'
    X -= 1
  END

  IF X > 0
    SELF.Name = CLIP(E[1:X])
  END

EmailParser.GetName PROCEDURE!,STRING
  CODE
  RETURN SELF.Name
  
EmailParser.GetUser PROCEDURE!,STRING
  CODE
  RETURN SELF.User
  
EmailParser.GetDomain PROCEDURE!,STRING
  CODE
  RETURN SELF.Domain
  
EmailParser.GetEmail PROCEDURE!,STRING
  CODE
  RETURN SELF.User &'@'& SELF.Domain
  
EmailParser.GetComplete PROCEDURE!,STRING
  CODE
  IF SELF.Name <> ''
    RETURN SELF.Name &' <'& SELF.GetEmail() &'>'
  ELSE
    RETURN SELF.GetEmail()
  END

this can be done in two lines of StringTheory code - one to strip off the quotes and one to get the value between the angle brackets (if they are present)

st.setBetween('"','"')
st.setBetween('<<','>')

note that by default setBetween will not change the value if the start/end parameters are not found.

I would probably add a third line at the end in case there are leading or trailing spaces:

st.trim()

an alternative instead might be:

if st.setBetween('<<','>') = st:notFound
  st.setBetween('"','"')
end
st.trim()

hth

cheers

Geoff R

them’s fighting words :slightly_smiling_face:

Hi Mike, see my other message.

cheers

Geoff R

Hi Mike

That’s exactly what I’m looking for. Had hoped I could use ST to do it in one line, but I couldn’t figure it out.
Your way of doing it is very universal and can be used in all sorts of contexts - even one like this “C <.> mpany <[email protected]>”

THANKS!

/Niels

Hi Geoff

I agree that it’s the easiest way, but I have seen all sorts of combinations for which it is not as safe as I need, for example: “C<.>mpany <[email protected]>”
Thanks for your thoughts and ideas.

/Niels

hey Niels - you are changing the spec! :grimacing: :slightly_smiling_face:

I agree that Mike’s way is definitely more robust and likely to handle more cases other than the initial three formats presented - but obviously at the price of some complexity.

I like the way Mike first finds the @ then gets the Domain to the right and the User (and perhaps Name) to the left.

I am so used to doing everything in ST these days it does feel a bit strange to see “old school” coding like Mike’s here (and also sometimes some of Carl’s stuff).

one suggestion - look at the lines:

L = LEN(CLIP(LEFT(E)))
LOOP X = A+1 TO L+1

here you could lose characters off the end of the domain if there are leading spaces before the email address…

probably better to

E=LEFT(E)

right at the start before searching for ‘@’

then just

L = LEN(CLIP(E))

and later on where there is

SELF.Name = CLIP(LEFT(E[1:X]))

you would then just say

SELF.Name = CLIP(E[1:X])
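
To see why the original pair of lines can drop characters, here is a minimal sketch. It is not part of Mike's class: the input 'a@b.com' and the global variables are made up for illustration, and the slice E[A+1 : L] stands in for what the domain loop ends up taking when it runs to the end of the string.

```clarion
  PROGRAM
  MAP
  END

E   STRING(40)
A   LONG
L   LONG

  CODE
  E = '  a@b.com'                      ! hypothetical input with two leading spaces
  A = INSTRING('@', E, 1, 1)           ! 4: position of @ in the unshifted string
  L = LEN(CLIP(LEFT(E)))               ! 7: length measured after dropping the leading spaces
  MESSAGE('Before: ' & E[A+1 : L])     ! E[5 : 7] is 'b.c', so 'om' is lost

  E = LEFT(E)                          ! shift first, then search and measure
  A = INSTRING('@', E, 1, 1)           ! 2
  L = LEN(CLIP(E))                     ! 7
  MESSAGE('After: ' & E[A+1 : L])      ! E[3 : 7] is 'b.com'
```

The length and the positions only agree once both are taken from the same, already left-justified string.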

hth and cheers for now

Geoff R

Good catch! I added that LEFT thing just as a matter of course, and didn’t realize the implications. I’ll adjust my class accordingly.

FYI, I’ve edited my original post to reflect the errant use of LEFT.

Mike Hanson

This Wikipedia article on email addresses is quite good and covers validation. It also has links to all the RFCs. I wonder whether an @ could appear outside the <angle brackets>, e.g. "Carl @ Google <[email protected]>"?

Based on that Wiki I had more special characters for the local part:

EQ:Local     EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.')
EQ:Separator EQUATE('@')
EQ:Domain    EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.')
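
One way to check the "@ outside the brackets" case is a quick sketch like the one below. It assumes the address comes last in the string and that the parser starts from the rightmost @, as the class earlier in the thread does. The address carl@example.com and the variable names are invented for the test.

```clarion
  PROGRAM
  MAP
  END

E   STRING('Carl @ Google <<carl@example.com>')  ! hypothetical input; << is a literal <
A   LONG,AUTO

  CODE
  LOOP A = LEN(CLIP(E)) TO 1 BY -1     ! walk backwards from the last character
    IF E[A] = '@' THEN BREAK.          ! stop at the rightmost @
  END
  MESSAGE('Rightmost @ found at position ' & A)  ! 20 here: the @ inside the brackets
```

The @ in the display name sits at position 6 and is never reached, so scanning outward from position 20 still isolates the bracketed address. An @ that appeared after the closing bracket would defeat this, though.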

Hi

Thanks for all the great posts

This is my final approach:

CleanEmail          PROCEDURE  (STRING pEmail)            ! Declare Procedure
st                      StringTheory
A                       LONG
E                       LONG
S                       LONG
EQ:Name                 EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.')
EQ:At                   EQUATE('@')
EQ:Domain               EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.')
    CODE
        st.SetValue(pEmail)
        !Find the rightmost @
        A = st.Instring(EQ:At,-1)
        IF A
            !Find domain
            LOOP T# = A+1 TO st.Length()
                IF NOT st.ContainsA(EQ:Domain,st.Sub(T#,1))
                    BREAK
                END
                E = T#
            END
            !Find name
            LOOP T# = A-1 TO 1 by -1
                IF NOT st.ContainsA(EQ:Name,st.Sub(T#,1))
                    BREAK
                END
                S = T#
            END
            RETURN st.Sub(S,E-S+1)
        END
        RETURN('')

Hi Niels

nice to see a solution using ST :grinning:

I have taken the liberty of optimizing your code.

a couple of points

  • I generally don’t like implicits so I removed your T#

  • At one stage I thought of passing the email address by reference rather than by value, i.e. (*STRING pEmail). We can safely do that as we are not altering the pEmail value at all. Passing by reference takes only 8 bytes (a long for the address and a long for the length), which is particularly advantageous when you are dealing with large strings. But email addresses are not very large, so the gain would be minimal, and it does restrict how you can call the procedure, so I ended up leaving it passed by value.

  • I added “auto” to A, E and S as the first thing that they are used for assigns a value to them. Minor difference but every little bit adds up.

  • I added “static,thread” to the ST object so that it didn’t have to be initialized and later cleaned up on each call. This is a bit like a threaded global but limited in scope to this procedure so safer. Note when you enter the procedure st will have the value from the previous call - this is not a problem here as the first thing we do is st.setValue() - however if instead you did st.append then you should st.free() at the top of the procedure to clear the buffer. Also note you would NOT add this “static,thread” if a procedure was recursive as, in that case, each instance needs its own separate object. An alternative to this is to pass in a temporary “scratch” or “worker” ST object which is an approach Bruce is using a bit these days in some of his Capesoft stuff.

  • I used st.containsChar rather than st.containsA

  • on the loops I used the character values directly using st.valuePtr rather than st.sub. Sub is definitely safer in that it checks for “out of bounds” conditions but in these cases where you are constraining your loop bounds (to st.length() going forwards and to 1 going backwards) you are pretty safe using the direct approach.

  • I changed the EQUATEs to STRINGs. While using equates is generally considered good practice, I am much more comfortable using a STRING which I know will definitely be passed by reference not by value where that option is available.

  • the final return statement could either be by st.slice or again directly using valuePtr. As with the discussion above, slice is safer but going the direct route is faster. In this case it is not just the (tiny) overhead of the call to slice, but also how the compiler is able to efficiently return the string value by reference.

anyway an interesting exercise

cheers for now

Geoff R

CleanEmail           PROCEDURE  (STRING pEmail)
st                      StringTheory,static,thread
A                       LONG,auto  ! @ char pos
E                       LONG,auto  ! end char pos
S                       LONG,auto  ! start char pos
EQ:Name                 STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.')
EQ:At                   STRING('@')
EQ:Domain               STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.')
 CODE
 st.SetValue(pEmail)
 !Find the rightmost @
 A = st.Instring(EQ:At,-1)
 if ~A then return ''.
 !Find domain
 loop E = A+1 TO st.Length()
   if ~st.ContainsChar(st.valuePtr[E],EQ:Domain) then break.
 end
 !Find name
 loop S = A-1 to 1 by -1
   if ~st.ContainsChar(st.valuePtr[S],EQ:Name) then break.
 end
 return st.valuePtr[S+1 : E-1] ! or st.slice(S+1,E-1)

Now that you have renamed the topic, it is obvious that finding the rightmost pair of angle brackets is enough.

One other interesting thing to note about EQUATEs: Each time you use an equate in code, the compiler injects a copy of the EQUATE value, not a reference back to the original EQUATE. Therefore, if you have a largish EQUATE string and you’re mentioning it often in code, then the size of your DLL/EXE can be significantly impacted. In that situation, I sometimes choose to use a STRING instead.
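
As a rough illustration of the two styles side by side (the names EQ:Digits and Digits are hypothetical, and the comments only restate the behaviour described above):

```clarion
  PROGRAM
  MAP
  END

EQ:Digits   EQUATE('0123456789')       ! hypothetical; the literal is substituted wherever the equate is used
Digits      STRING('0123456789')       ! hypothetical; declared once, referenced by every use

  CODE
  IF INSTRING('7', EQ:Digits, 1, 1) THEN MESSAGE('Found via the EQUATE').
  IF INSTRING('7', Digits, 1, 1)    THEN MESSAGE('Found via the STRING').
```

Both lines behave the same at run time. The difference only starts to matter for size when the value is long and used in many places.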

For this reason, I wish Clarion supported a constant attribute on a variable:

AtChar STRING('@'),CONST

Similarly, I wish we could use that in procedure prototypes:

This PROCEDURE(CONST STRING Parm)
That PROCEDURE(CONST &STRING RefParm)

The compiler would automatically warn if you attempted to set the const variable to a new value. It would also complain if you tried to pass a constant variable as a reference parameter, if it didn’t also have a corresponding CONST attribute in the prototype.

This is present in C# (including both const and readonly), but Clarion# (when it was available) didn’t offer the feature.

Have you submitted this as a feature request?