Clean email address display name "Niels <[email protected]>" to extract just [email protected]

jslarve · November 20, 2021, 3:40pm

Never mind, I see you’re getting the rightmost one now. Need coffee.

Hi Geoff -

What about emails that contain an ampersand in the name?

CarlBarnes · November 20, 2021, 4:45pm

You could make those character-list EQUATE strings STATIC STRINGs to improve things. You would lose the CONST protection the EQUATE has, but that’s not an issue in this small amount of code.

EQ:Name                 STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.'),STATIC
EQ:Domain               STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.'),STATIC

STATIC is the same as Global (allocated once, keeping its value between calls) but local in scope, i.e. only this procedure can “see” them.
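A minimal sketch of what STATIC means for local data, using a hypothetical procedure named CallCounter (not from the thread): the variable lives as long as a global, so it keeps its value between calls, but only this procedure can see it.

```clarion
! Hypothetical demo (not from the thread): a STATIC local behaves like a global
! for lifetime (allocated once, zero-initialized, keeps its value between calls)
! but is only visible inside this procedure.
CallCounter PROCEDURE()!,LONG
Counter       LONG,STATIC
  CODE
  Counter = Counter + 1
  RETURN Counter
```

Assuming the usual MAP prototype `CallCounter(),LONG`, three successive calls would return 1, 2 and 3. The same lifetime is why the STATIC EQ:Name and EQ:Domain strings are set up once rather than on every call.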

DotNet OOP is the coolest, with what seems like every modifier and attribute possible. I find some of the VB.Net names more obvious than the C# ones, which copy C++, e.g. MustInherit versus Abstract or NotInheritable versus Sealed.

jslarve · November 20, 2021, 5:52pm

I’m not seeing a difference in size of the .EXE when using the string/equate twice, whether it’s a STRING or EQUATE, nor am I seeing a copy of the data being created. Do you have an example of this happening?
Here’s my test, with debug turned off. With debug turned on, it seemed that the EQUATE was actually a little smaller.
When I disassembled my EXE, I saw labels being referenced. Not copies of the data. Maybe my example is too simple? junk.zip (3.1 KB)

CarlBarnes · November 20, 2021, 6:46pm

This is simple code that could be done without StringTheory (ST), mainly by changing to INSTRING:

CleanEmail PROCEDURE(STRING pEmail)!,STRING
A  LONG,auto  ! @ char pos
E  LONG,auto  ! end char pos
S  LONG,auto  ! start char pos
EQ:Name   STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.'),STATIC
EQ:Domain STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.'),STATIC
 CODE
 A = Instring('@',pEmail,-1,SIZE(pEmail))  !Reverse (-1) Find right most @
 if ~A then return ''.
 !-- Find domain after name@ --
 loop E = A+1 TO SIZE(pEmail)
   if ~Instring(pEmail[E],EQ:Domain) then break.
 end
 !-- Find name before @domain --
 loop S = A-1 to 1 by -1
   if ~Instring(pEmail[S],EQ:Name) then break.
 end
 
 return pEmail[S+1 : E-1]        !'@' alone worst case is [0+1 : 2-1] which is valid
 !return SUB(pEmail,S+1,E-1-S)   !Alternative: safer should the code be changed and bugs added

You must be careful using a [slice], because an invalid slice can GPF. I checked the above (which repeats Geoff’s slice code) and see no possible bad slice. The safe bet would be to RETURN SUB(), mainly to protect against later code changes introducing a bug that results in a bad slice.

In the above “Clarion” code, one disadvantage is that INSTRING receives EQ:Name and EQ:Domain as a STRING passed by value, made worse by the fact that this happens once for every character in the email string. There is a faster library function, MemChr(), that takes the list by address as a *STRING. I checked, and that is what ST’s containsChar uses, so ST is likely faster than the above.

I’m sure I have used MemChr in ClarionMag articles. It’s not a hard function to use, but since it’s a C function it returns the address of the matching byte (or zero if it is not found) rather than the index that INSTRING returns. In this case we just want to know whether the character is present, so checking for non-zero works.
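If you do need the position rather than just a yes/no, the address memchr returns can be turned into an INSTRING-style 1-based index by subtracting the address of the list. A sketch, assuming the FindChar prototype from the MODULE('RTL') declaration in this post is in your MAP; CharIndex is a hypothetical name, not an ST or RTL function.

```clarion
! Hypothetical helper (not from the thread): 1-based position of pChar in pList,
! like INSTRING would return, using memchr via the FindChar prototype in this post.
CharIndex PROCEDURE(*STRING pList, STRING pChar)!,LONG
Ptr           LONG,AUTO
  CODE
  Ptr = FindChar(pList, VAL(pChar), SIZE(pList))  ! address of the matching byte, or 0
  IF ~Ptr THEN RETURN 0.                          ! not found
  RETURN Ptr - ADDRESS(pList) + 1                 ! offset from start of list, made 1-based
```

For example, CharIndex(EQ:Domain, 'b') should give 2, the same as INSTRING('b', EQ:Domain).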

  MAP
    module('RTL')
      FindChar(*STRING ListOfCharacters, BYTE ValOfChar2Find, LONG Size_ListOfCharacters),LONG,RAW,NAME('_memchr'),DLL(DLL_Mode) !Returns Address of Byte in List, 0 if not found
    end
  END

CleanEmail2 PROCEDURE(STRING pEmail)!,STRING
A  LONG,auto  ! @ char pos
E  LONG,auto  ! end char pos
S  LONG,auto  ! start char pos
EQ:Name   STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.'),STATIC
EQ:Domain STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.'),STATIC
 CODE                                                                     
 A = Instring('@',pEmail,-1,SIZE(pEmail))  !Reverse (-1) Find right most @
 if ~A then return ''.
 !-- Find domain after name@ --
 loop E = A+1 TO SIZE(pEmail)
   if ~FindChar(EQ:Domain,val(pEmail[E]),SIZE(EQ:Domain)) then break.
 end
 !-- Find name before @domain --
 loop S = A-1 to 1 by -1
   if ~FindChar(EQ:Name,val(pEmail[S]),SIZE(EQ:Name)) then break.
 end
 return pEmail[S+1 : E-1]        !'@' alone worst case is [0+1 : 2-1] which is valid
 !return SUB(pEmail,S+1,E-1-S)   !Alternative: safer should the code be changed and bugs added

To understand this code (above and Geoff’s), one thing you must know about Clarion LOOPs: in LOOP E = A+1 TO LEN(), if no invalid character BREAKs out of the LOOP, it will END with E = LEN()+1. Some languages end with the index variable equal to the TO limit, i.e. E = LEN().
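A tiny illustration of that LOOP behaviour, using a hypothetical procedure named LoopEndDemo: when the loop runs to completion without a BREAK, the index is left one past the TO limit.

```clarion
! Hypothetical demo (not from the thread): where a counted LOOP leaves its index.
LoopEndDemo PROCEDURE()
E             LONG,AUTO
  CODE
  LOOP E = 1 TO 5
    IF E = 99 THEN BREAK.              ! never true, so the LOOP runs to completion
  END
  MESSAGE('E after the LOOP = ' & E)   ! E is 6 (TO limit + 1), not 5
```

This is why the email code can use E-1 as the last domain character either way: after a BREAK, E points at the first invalid character; without one, it points just past the end of the string.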

BoxSoft · November 20, 2021, 6:47pm

Thanks for checking this again, Jeff. I last looked into this some years back, so perhaps they’ve optimized the compiler since then. Or perhaps I had set up some type of equate that it couldn’t easily handle like that.

At the very least, it will need to create one copy per EXE/DLL, but that’s not a major deal. I did work on a system once that had over 200 APPs in the solution, but with that behemoth, repeated EQUATEs weren’t their biggest problem.

CarlBarnes · November 20, 2021, 6:48pm

The 2 variables (Name and Domain) are only used once. Maybe repeat the Loop code so they are used more than once.

Also, the compiler may be optimizing. If the ('value') is longer than 6 or 8 bytes, it seems likely the compiler would make it static data and use the address.

jslarve · November 20, 2021, 7:17pm

Well, there are all kinds of scopes, and I was only checking one in my example. You could be right, but my example did not reproduce it.

Perhaps you had the equates declared (or included) in multiple modules, and you saw them in memory that way?

jslarve · November 20, 2021, 7:18pm

Thanks, Carl. I was just focusing on the one issue in my attached example, not the address cleaner.

vitesse · November 21, 2021, 11:40am

Carl:

I thought overnight about adding STATIC to the list of valid characters and woke up to find you had already suggested it. And as it is constant, there is no need for THREAD.

On your code:

 loop E = A+1 TO LEN(CLIP(pEmail))
   if ~Instring(pEmail[E],EQ:Domain) then break.
 end

I would suggest

loop E = A+1 TO SIZE(pEmail)

as you will break out as soon as you hit a space anyway.

“simple code that could be done without ST mainly by changing to INSTRING”

Sure, any code you can write using ST you could also write in “old school” Clarion. ST is, after all, written in Clarion (well, OK, the MD5 has some borrowed C code). But with ST you are often able to work at a higher level of abstraction without worrying about the lower-level implementation details.

One advantage of using ST is that it represents a tested and optimized code base. You are much more likely to add subtle bugs writing your own code.

“I’m sure I have used MemChr in ClarionMag articles.”

Yes, Carl, you and I both submitted entries to one of those Clarion Mag code challenges/competitions where we both had very similar code using MemChr. I remember thinking, “Who is this guy who knows my inner secrets?”

st.containsChar is a wrapper around memChr that returns true if the char is found or false if it is not. This caters for people who write long-winded code like

if st.containsChar(':') = true

rather than my preferred simpler

if st.containsChar(':')

And someone using st.containsChar() didn’t need to know about memchr, because I had already done that under the hood. Over time ST has been improved and optimized, so code you wrote using it 5 or 10 years ago will likely run faster today than it did back then, whereas your own code probably hasn’t changed one bit.

Talking of long-winded code, Bruce recently butchered all my ST code like:

if ~x

to be

if x = 0

which he considers easier to understand. Yuck!

What next?

if len(clip(myString)) = 0

rather than just

if ~myString

Don’t laugh, I have seen that out in the wild.

Mike/Jeff/Carl, the comments re equates just reinforce my preference for using strings instead.

Mike, I like your proposal re CONST. At present you CAN use CONST, but only when passing a parameter by reference. Having said that, I am not sure it really matters when passing by value:

This PROCEDURE(CONST STRING Parm)

as you can muck around with Parm without affecting the calling procedure (it is a copy that is thrown away when done).

And while I have your attention, Mike, and we are talking about changing a passed-by-value parameter:

E = LEFT(E)
L = LEN(CLIP(LEFT(E)))

you can get rid of the LEFT() on the second line now, since the first line has already left-justified E.

One final thought: rather than searching for characters in a list, it might be faster to do a lookup in an array.

Years ago, in another of those Clarion Mag competitions, Gordon Smith was checking for alpha characters using something like

AlphaCharMap_       string('<0>{64}<1>{26}<0>{6}<1>{26}<0>{133}'), static
AlphaCharMap        byte, dim(255), over(AlphaCharMap_)

then

if AlphaCharMap[val(myChar)]

Of course, that assumes there is no null char ('<0>'), whose VAL() of 0 would be outside a DIM(255) array. In this case that was fine, as he was using a CSTRING, where the null indicates the end of the string. Otherwise you would probably add 1 and have an array covering all 256 characters:

AlphaCharMap_       string('<0>{65}<1>{26}<0>{6}<1>{26}<0>{133}'), static
AlphaCharMap        byte, dim(256), over(AlphaCharMap_)

then

if AlphaCharMap[val(myChar)+1]

I did wonder at the time if it was faster than just

case val(myChar)
of   65 to  90 ! A-Z
orof 97 to 122 ! a-z
 ....

But where you have lots of different chars, as we have with EQ:Name, it is perhaps worth checking out.
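A sketch of that idea applied to the email name characters, building a 256-entry map from the EQ:Name list on first use instead of writing out the '<0>{n}<1>{n}' pattern by hand. IsNameChar, NameMap and MapBuilt are hypothetical names; the +1 offset follows the full 256-character version above. This is an untimed sketch, not a measured improvement.

```clarion
! Hypothetical sketch (not from the thread): lookup-table test for the name part.
IsNameChar PROCEDURE(STRING pChar)!,BYTE
EQ:Name       STRING('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&''*+-/=?^_`{{|}~.'),STATIC
NameMap       BYTE,DIM(256),STATIC   ! 1 = legal in the name part; index is VAL()+1
MapBuilt      BYTE,STATIC            ! STATIC data starts as 0
X             LONG,AUTO
  CODE
  IF ~MapBuilt                       ! build the map on the first call only
    LOOP X = 1 TO SIZE(EQ:Name)
      NameMap[VAL(EQ:Name[X]) + 1] = 1
    END
    MapBuilt = TRUE
  END
  RETURN NameMap[VAL(pChar) + 1]     ! one array lookup instead of a search
```

In the name loop the test would become `if ~IsNameChar(pEmail[S]) then break.`, or the NameMap lookup could be inlined to skip the call overhead. Because the map is STATIC without THREAD, if the first call could happen on several threads at once it may be safer to build the map at program start.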

Incidentally, more recently I saw Jeff Slarve doing something similar in his CSV stuff with LegalChars:

github.com/jslarve/CSVParseClass/blob/main/JSCSVParseClass.clw#L366

  RETURN SELF.ColumnCount

!------------------------------------------------------------------------------------------------------------------------------------------------------
!!! <summary>Retrieve the label of a column</summary>
!!! <param name="pColumn">Column where label is located</param>
!!! <returns>A STRING</returns>
!======================================================================================================================================================
JSCSVParseClass.GetColumnLabel        PROCEDURE(LONG pColumn,BYTE pForClarion=FALSE)!,STRING
ReturnName    CSTRING(61)
Ndx1          LONG
LegalChars    STRING('_{48}0123456789:_{6}ABCDEFGHIJKLMNOPQRSTUVWXYZ_{6}abcdefghijklmnopqrstuvwxyz_{5}E__f_{10}Z_{15}zY_{5}Y_{26}AAAAAA_CEEEEIIIIDNOOOOOx0UUUUYPBaaaaaa_ceeeeiiiionooooo_ouuuuypy')
  CODE

  CASE pColumn
  OF 1 TO SELF.ColumnCount
  ELSE
    RETURN ''
  END
  GET(SELF.ColumnDefQ,pColumn)
  ReturnName = CHOOSE(NOT ERRORCODE(),SELF.ColumnDefQ.Name,'')
  IF NOT pForClarion

And he has a nice KeepChars function along the same lines.

Anyway, enough for now. We’ve probably given Niels more than he bargained for.

Cheers

pinsard · November 25, 2021, 6:08pm

HECK! For a moment I experienced the joy of crossing paths with Niels Jensen.

Well… thanks for the brief moment of hope, though.

jslarve · November 25, 2021, 6:26pm

Good to see you around, Gustavo

pinsard · November 25, 2021, 6:42pm

Geez. It’s been such a long time, and I’ve already been greeted by a heavy name? Cool.

Thanks, Jeff. I will try my best not to be the prick I used to be.