Hi
I’m looking for an easy/smart way to clean an email address like this “Niels Larsen <[email protected]>”.
The result should end up being: “[email protected]”
Of course I have StringTheory.
Any cool ideas?
Regards Niels
Search to the left for the first space, then cut all characters before it.
Sorry the <> was missing (question edited).
Some different ways:
“Niels Larsen <[email protected]>”.
“Niels Larsen<[email protected]>”.
“[email protected]”.
What is the problem with finding the < and > characters?
That’s not the problem.
There can be many different characters around the address itself (or none).
In my head it is about finding the clean address by working outwards to the right and left of the @.
That means finding the first illegal email character on both sides of the @.
What about str.Between('<<','>')?
StringTheory won’t help you much here. I just threw together this class that does the trick. I’m sure there are examples that it wouldn’t handle, but it does the three that you provided.
Mike Hanson
  PROGRAM

EmailParser         CLASS,TYPE
Name                  CSTRING(200),PROTECTED
User                  CSTRING(100),PROTECTED
Domain                CSTRING(100),PROTECTED
Parse                 PROCEDURE(STRING E)
GetName               PROCEDURE,STRING
GetUser               PROCEDURE,STRING
GetDomain             PROCEDURE,STRING
GetEmail              PROCEDURE,STRING
GetComplete           PROCEDURE,STRING
                    END

  MAP
Test    PROCEDURE(STRING E)
  END

  CODE
  ! A literal < inside a Clarion string constant is written as <<
  Test('Niels Larsen <<[email protected]>')
  Test('Niels Larsen<<[email protected]>')
  Test('[email protected]')
  Test('@justdomain.com')
  Test('Name <<@justdomain.com>')
  Test('Life begins @ the hop! <<life*[email protected]>')

Test    PROCEDURE(STRING E)
EP        EmailParser
  CODE
  EP.Parse(E)
  MESSAGE('Input:<9><9>'& E &'|' |
        & '|Name:<9><9>'& EP.GetName() |
        & '|User: <9><9>'& EP.GetUser() |
        & '|Domain: <9>'& EP.GetDomain() |
        & '|Email:<9><9>'& EP.GetEmail() |
        & '|Complete:<9>'& EP.GetComplete() |
        )

EmailParser.Parse   PROCEDURE(STRING E)
DomainChars           EQUATE('abcdefghijklmnopqrstuvwxyz0123456789.-')
UserChars             EQUATE(DomainChars &'!#$%&''*+/=?^_`{{|}~')
A                     LONG,AUTO   ! position of the rightmost @ (LONG so inputs over 255 chars do not overflow)
L                     LONG,AUTO   ! length of the input without trailing spaces
X                     LONG,AUTO   ! scan position
  CODE
  CLEAR(SELF.Name)
  CLEAR(SELF.User)
  CLEAR(SELF.Domain)
  E = LEFT(E)                     ! shift out leading spaces before measuring
  L = LEN(CLIP(E))
  A = INSTRING('@', E, -1, L)     ! search backwards from the end
  IF A = 0 THEN RETURN.

  ! Domain: scan right from the @ until the first illegal character
  LOOP X = A+1 TO L+1
    IF X > L OR NOT INSTRING(LOWER(E[X]), DomainChars)
      SELF.Domain = CLIP(E[A+1 : X-1])
      BREAK
    END
  END

  ! User: scan left from the @ until the first illegal character
  LOOP X = A-1 TO 0 BY -1
    IF X = 0 OR NOT INSTRING(LOWER(E[X]), UserChars)
      SELF.User = CLIP(LEFT(E[X+1 : A-1]))
      BREAK
    END
  END

  ! Whatever is left of the address (minus an opening <) is the name
  IF X > 0 AND E[X] = '<<'
    X -= 1
  END
  IF X > 0
    SELF.Name = CLIP(E[1:X])
  END

EmailParser.GetName PROCEDURE!,STRING
  CODE
  RETURN SELF.Name

EmailParser.GetUser PROCEDURE!,STRING
  CODE
  RETURN SELF.User

EmailParser.GetDomain PROCEDURE!,STRING
  CODE
  RETURN SELF.Domain

EmailParser.GetEmail PROCEDURE!,STRING
  CODE
  RETURN SELF.User &'@'& SELF.Domain

EmailParser.GetComplete PROCEDURE!,STRING
  CODE
  IF SELF.Name <> ''
    RETURN SELF.Name &' <<'& SELF.GetEmail() &'>'
  ELSE
    RETURN SELF.GetEmail()
  END
this can be done in two lines of StringTheory code - one to strip off the quotes and one to get the value between the angle brackets (if they are present)
st.setBetween('"','"')
st.setBetween('<<','>')
note that by default setBetween will not change the value if the start/end parameters are not found.
I would probably add a third line at the end in case there are leading or trailing spaces:
st.trim()
an alternative instead might be:
if st.setBetween('<<','>') = st:notFound
  st.setBetween('"','"')
end
st.trim()
hth
cheers
Geoff R
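For readers who want to try the three-line version above, here is a minimal sketch that wraps it in a procedure. The name ExtractEmail and the address someone@example.com are placeholders invented for illustration, and the sketch assumes StringTheory is already set up in the project (include path and link defines). It only uses the SetValue, SetBetween, Trim and GetValue methods.

```clarion
  PROGRAM

  INCLUDE('StringTheory.inc'),ONCE

  MAP
ExtractEmail  PROCEDURE(STRING pText),STRING   ! hypothetical helper, not part of StringTheory
  END

  CODE
  MESSAGE(ExtractEmail('Some Name <<someone@example.com>'))

ExtractEmail  PROCEDURE(STRING pText)
st              StringTheory
  CODE
  st.SetValue(pText)
  st.SetBetween('"','"')     ! strip surrounding quotes; value unchanged if none are found
  st.SetBetween('<<','>')    ! keep what is inside the angle brackets, if present
  st.Trim()                  ! drop leading and trailing spaces
  RETURN st.GetValue()
```

As noted in the thread, this covers the three formats from the question but will be fooled by extra angle brackets earlier in the text.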
them’s fighting words
Hi Mike, see my other message.
cheers
Geoff R
Hi Mike
That’s exactly what I’m looking for. Had hoped I could use ST to do it in one line, but I couldn’t figure it out.
Your way of doing it is very universal and can be used in all sorts of contexts - even one like this “C <.> mpany <[email protected]>”
THANKS!
/Niels
Hi Geoff
I agree that it’s the easiest way, but I have seen all sorts of combinations where this one is not quite as safe as I need. For example like this: “C<.>mpany <[email protected]>”
Thanks for your thoughts and ideas.
/Niels
hey Niels - you are changing the spec!
I agree that Mike’s way is definitely more robust and likely to handle more cases other than the initial three formats presented - but obviously at the price of some complexity.
I like the way Mike first finds the @ then gets the Domain to the right and the User (and perhaps Name) to the left.
I am so used to doing everything in ST these days it does feel a bit strange to see “old school” coding like Mike’s here (and also sometimes some of Carl’s stuff).
one suggestion - look at the lines:
L = LEN(CLIP(LEFT(E)))
LOOP X = A+1 TO L+1
here you could lose characters off the end of the domain if there are leading spaces before the email address…
probably better to
E=LEFT(E)
right at the start before searching for ‘@’
then just
L = LEN(CLIP(E))
and later on where there is
SELF.Name = CLIP(LEFT(E[1:X]))
you would then just say
SELF.Name = CLIP(E[1:X])
hth and cheers for now
Geoff R
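A small worked example may make the LEFT problem described above easier to see. This is only a sketch with a made-up input ('  a@b.com', two leading spaces); it uses the same INSTRING call as the class but none of its other code.

```clarion
  PROGRAM

  MAP
  END

E   STRING('  a@b.com')   ! two leading spaces; the address sits in positions 3..9
L   LONG,AUTO
A   LONG,AUTO

  CODE
  ! Original order: the length is measured on a shifted copy, but E itself is not shifted
  L = LEN(CLIP(LEFT(E)))              ! 7
  A = INSTRING('@', E, -1, L)         ! 4
  MESSAGE('Before: '& E[A+1 : L])     ! a scan bounded by L only reaches 'b.c'

  ! Suggested order: shift E first, then measure it
  E = LEFT(E)
  L = LEN(CLIP(E))                    ! 7
  A = INSTRING('@', E, -1, L)         ! 2
  MESSAGE('After: '& E[A+1 : L])      ! the whole domain, 'b.com'
```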
Good catch! I added that LEFT thing just as a matter of course, and didn’t realize the implications. I’ll adjust my class accordingly.
FYI, I’ve edited my original post to reflect the errant use of LEFT.
Mike Hanson
This Wikipedia article on Email Addresses is quite good and covers validation. It also has links to all the RFCs. I wonder if an @ could be outside the <Angle Brackets>
e.g. "Carl @ Google <[email protected]>"
?
Based on that Wiki I had more special characters for the local part:
EQ:Local EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.')
EQ:Separator EQUATE('@')
EQ:Domain EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.')
Hi
Thanks for all the great posts
This is my final approach:
CleanEmail  PROCEDURE (STRING pEmail) ! Declare Procedure
st            StringTheory
A             LONG    ! position of the rightmost @
E             LONG    ! last character of the domain
S             LONG    ! first character of the name
EQ:Name       EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.')
EQ:At         EQUATE('@')
EQ:Domain     EQUATE('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.')
  CODE
  st.SetValue(pEmail)
  !Find the rightmost @
  A = st.Instring(EQ:At,-1)
  IF A
    !Start both ends at the @ so the slice stays valid if a loop exits immediately
    E = A
    S = A
    !Find domain
    LOOP T# = A+1 TO st.Length()
      IF NOT st.ContainsA(EQ:Domain,st.Sub(T#,1))
        BREAK
      END
      E = T#
    END
    !Find name
    LOOP T# = A-1 TO 1 BY -1
      IF NOT st.ContainsA(EQ:Name,st.Sub(T#,1))
        BREAK
      END
      S = T#
    END
    RETURN st.Sub(S,E-S+1)
  END
  RETURN('')
Hi Niels
nice to see a solution using ST
I have taken the liberty of optimizing your code.
a couple of points
I generally don’t like implicits so I removed your T#
At one stage I thought to pass the email address by reference rather than by value, i.e. (*STRING pEmail). We can safely do that as we are not altering the pEmail value at all. Passing by reference takes only 8 bytes (a long for the address and a long for the length), which is particularly advantageous when you are dealing with large strings. But in the case of email addresses the strings would not be very large, so it may be of minimal value, and it does restrict how you can call the procedure, so I ended up leaving it passed by value.
I added “auto” to A, E and S as the first thing that they are used for assigns a value to them. Minor difference but every little bit adds up.
I added “static,thread” to the ST object so that it didn’t have to be initialized and later cleaned up on each call. This is a bit like a threaded global but limited in scope to this procedure so safer. Note when you enter the procedure st will have the value from the previous call - this is not a problem here as the first thing we do is st.setValue() - however if instead you did st.append then you should st.free() at the top of the procedure to clear the buffer. Also note you would NOT add this “static,thread” if a procedure was recursive as, in that case, each instance needs its own separate object. An alternative to this is to pass in a temporary “scratch” or “worker” ST object which is an approach Bruce is using a bit these days in some of his Capesoft stuff.
I used st.containsChar rather than st.containsA
on the loops I used the character values directly using st.valuePtr rather than st.sub. Sub is definitely safer in that it checks for “out of bounds” conditions but in these cases where you are constraining your loop bounds (to st.length() going forwards and to 1 going backwards) you are pretty safe using the direct approach.
I changed the EQUATEs to STRINGs. While using equates is generally considered good practice, I am much more comfortable using a STRING which I know will definitely be passed by reference not by value where that option is available.
the final return statement could either be by st.slice or again directly using valuePtr. As with the discussion above, slice is safer but going the direct route is faster. In this case it is not just the (tiny) overhead of the call to slice, but also how the compiler is able to efficiently return the string value by reference.
anyway an interesting exercise
cheers for now
Geoff R
CleanEmail  PROCEDURE (STRING pEmail)
st            StringTheory,static,thread
A             LONG,auto   ! @ char pos
E             LONG,auto   ! end char pos
S             LONG,auto   ! start char pos
EQ:Name       STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.')
EQ:At         STRING('@')
EQ:Domain     STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.')
  CODE
  st.SetValue(pEmail)
  !Find most right @
  A = st.Instring(EQ:At,-1)
  if ~A then return ''.
  !Find domain
  loop E = A+1 TO st.Length()
    if ~st.ContainsChar(st.valuePtr[E],EQ:Domain) then break.
  end
  !Find name
  loop S = A-1 to 1 by -1
    if ~st.ContainsChar(st.valuePtr[S],EQ:Name) then break.
  end
  return st.valuePtr[S+1 : E-1] ! or st.slice(S+1,E-1)
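On the "static,thread" point above: the sketch below illustrates the case where the procedure appends to the object instead of calling SetValue, so the buffer left over from the previous call has to be cleared first. JoinParts is a made-up procedure for illustration only, and the sketch assumes StringTheory is already set up in the project.

```clarion
  PROGRAM

  INCLUDE('StringTheory.inc'),ONCE

  MAP
JoinParts   PROCEDURE(STRING pFirst, STRING pSecond),STRING   ! hypothetical procedure
  END

  CODE
  MESSAGE(JoinParts('abc','def'))
  MESSAGE(JoinParts('ghi','jkl'))   ! second call reuses the same st object

JoinParts   PROCEDURE(STRING pFirst, STRING pSecond)
st            StringTheory,STATIC,THREAD   ! kept between calls instead of being rebuilt each time
  CODE
  st.Free()                 ! clear what the previous call left in the buffer
  st.Append(pFirst)
  st.Append(pSecond)
  RETURN st.GetValue()
```

Without the st.Free() line the second call would start from the first call's leftover value, which is the situation the post warns about.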
As you renamed the topic, it is now obvious that finding the rightmost pair of angle brackets is enough.
One other interesting thing to note about EQUATEs: Each time you use an equate in code, the compiler injects a copy of the EQUATE value, not a reference back to the original EQUATE. Therefore, if you have a largish EQUATE string and you’re mentioning it often in code, then the size of your DLL/EXE can be significantly impacted. In that situation, I sometimes choose to use a STRING instead.
For this reason, I wish Clarion supported a constant attribute on a variable:
AtChar STRING('@'),CONST
Similarly, I wish we could use that in procedure prototypes:
This PROCEDURE(CONST STRING Parm)
That PROCEDURE(CONST &STRING RefParm)
The compiler would automatically warn if you attempted to set the const variable to a new value. It would also complain if you tried to pass a constant variable as a reference parameter, if it didn’t also have a corresponding CONST attribute in the prototype.
This is present in C# (including both const and readonly), but Clarion# (when it was available) didn’t offer the feature.
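To make the EQUATE versus STRING distinction above concrete, here is a minimal sketch. The labels Digits:EQ and DigitChars are invented for illustration, and the comments about code size simply restate the claim from the post rather than something measured here.

```clarion
  PROGRAM

  MAP
  END

! EQUATE: a compile-time substitution; per the post, every use compiles in its own copy
Digits:EQ     EQUATE('0123456789')
! STRING: one block of storage that every use refers to (but it can be overwritten)
DigitChars    STRING('0123456789')

  CODE
  ! At run time both forms are used in the same way
  IF INSTRING('7', Digits:EQ, 1, 1) AND INSTRING('7', DigitChars, 1, 1)
    MESSAGE('Found 7 in both')
  END
```

The trade-off is that nothing stops other code from assigning to DigitChars, which is what the wished-for CONST attribute would address.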
Submitted as a feature request?