For example, the common RegEx for validating email addresses doesn't always work.
Not every mail server conforms to the standards, so the easiest thing I did in the past was to put a button beside the email address entry field that used NetTalk to start sending an email. If the mail server hadn't rejected the email at that point, I backed out of sending the rest of it, because I had my answer.
It was also used to track whether staff were still working for a company, but the EU have since ruled that named personal company email addresses have to be treated like private email addresses, so GDPR rules apply to that kind of surveillance. I don't know how web trackers get around that…
So yes, I do agree there are times not to use RegExes, but I mainly use them as a replacement for INSTRING.
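Here is a minimal Python sketch of the same idea. It uses the standard `smtplib` module in place of NetTalk. It opens a session to the domain's mail server and issues MAIL FROM and RCPT TO. It reads the reply code, then resets the transaction before DATA so that nothing is delivered.

Some assumptions and caveats apply:
- The function names are hypothetical.
- The MX host is assumed to be known already, because Python's standard library has no MX lookup.
- Many servers accept every RCPT (catch-all domains).
- Many networks block outbound port 25 from client machines.

So treat "accepted" as a hint rather than proof.

```python
import smtplib

def classify_rcpt_reply(code: int) -> str:
    """Turn the server's reply code for RCPT TO into a rough verdict."""
    if code in (250, 251):
        return "accepted"    # server says it will take mail for this mailbox
    if 500 <= code <= 599:
        return "rejected"    # permanent failure, e.g. 550 mailbox unavailable
    if 400 <= code <= 499:
        return "try later"   # temporary failure, e.g. greylisting
    return "unknown"

def probe_mailbox(mx_host: str, sender: str, recipient: str, timeout: float = 10.0) -> str:
    """Start an SMTP transaction and abandon it before DATA, so nothing is sent."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.ehlo()
        code, _ = smtp.mail(sender)
        if code != 250:
            return "unknown"             # the server refused our sender address
        code, _ = smtp.rcpt(recipient)
        smtp.rset()                      # cancel the transaction; QUIT is sent on exit
        return classify_rcpt_reply(code)

# Usage (needs network access and the domain's MX host):
# probe_mailbox("mx.example.com", "check@example.org", "someone@example.com")
```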
You’re probably going to like what’s coming in vuMailKit.
We’ve built a new auto-detect setup flow where the user enters their email address and vuMailKit figures out the likely server settings automatically. It can recognize when OAuth is required, and for less obvious providers our deeper scan can usually detect the mail server, ports, and security settings too.
Because let’s be honest, when customers don’t know those settings, they don’t call their mail host first, they call you. And that usually turns into unpaid support.
We’re trying very hard to remove the pain, the guesswork, and a lot of that unnecessary support burden from email setup.
DNS holds a lot of information now, but it generally won't reveal specific client settings such as POP3/IMAP port numbers or SSL/TLS requirements. The exceptions are if you scan the registry settings of popular email programs already installed on the machine, or can get those settings from the machines running the mail servers.
Key DNS Records for Finding Mail Settings:
MX (Mail Exchange) Records: Identify the hostname(s) of the servers responsible for receiving incoming mail.
SPF (Sender Policy Framework) Records (TXT): Define which IP addresses or hosts are authorized to send email on behalf of a domain.
A Records: Link the mail server hostname (found in the MX record) to an actual IP address.
DKIM (DomainKeys Identified Mail) Records (TXT): Used to authenticate the domain's email sending identity.
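Here is a rough Python sketch of how those DNS clues might be turned into hints. It assumes the MX hostnames and SPF TXT record have already been fetched, since Python's standard library has no MX/TXT lookup.

It works in two steps:
- It recognizes two well-known hosted providers by hostname suffix.
- Otherwise, it falls back to conventional hostnames and the standard ports: IMAP 993/143, POP3 995/110, and SMTP submission 465/587.

The function names and the small provider table are illustrative. Each guess would still have to be confirmed by actually connecting.

```python
# Provider recognized from a hostname suffix (MX host or SPF include domain).
KNOWN_SUFFIXES = {
    "google.com": "Google Workspace",           # e.g. aspmx.l.google.com, _spf.google.com
    "googlemail.com": "Google Workspace",       # older aspmx2.googlemail.com style hosts
    "protection.outlook.com": "Microsoft 365",  # *.mail.protection.outlook.com
}

def provider_hint(mx_hosts, spf_record=""):
    """Return a provider name if an MX host or SPF include looks familiar."""
    names = [h.lower().rstrip(".") for h in mx_hosts]   # DNS names may end in "."
    for token in spf_record.split():
        token = token.lstrip("+-~?").lower()            # drop an SPF qualifier
        if token.startswith("include:"):
            names.append(token[len("include:"):].rstrip("."))
    for name in names:
        for suffix, provider in KNOWN_SUFFIXES.items():
            if name == suffix or name.endswith("." + suffix):
                return provider
    return None

def guess_settings(domain):
    """Conventional hostnames and standard ports to try when nothing is known."""
    services = (("imap", "imap", (993, 143)),
                ("pop3", "pop", (995, 110)),
                ("smtp", "smtp", (465, 587)))
    implicit_tls = {993, 995, 465}
    guesses = []
    for service, prefix, ports in services:
        for host in (f"{prefix}.{domain}", f"mail.{domain}"):
            for port in ports:
                security = "SSL/TLS" if port in implicit_tls else "STARTTLS"
                guesses.append((service, host, port, security))
    return guesses
```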
Exactly. DNS can provide useful clues, but it usually does not hand you the full client configuration.
That is why vuMailKit uses a fairly sophisticated six-stage detection process instead of depending on any one source. That layered approach is what gives us such a high success rate. Most accounts are detected in under a second, but in the occasional difficult case the deeper scan may take a bit longer, and in rare situations it can take two or three minutes to fully sort out the settings.
We built it that way because every time a customer cannot figure this stuff out, the developer ends up providing free support.
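vuMailKit's six stages aren't described here, so this is only a generic Python sketch of the layered idea. Detection stages run in order, from cheapest to most expensive. A network error is treated as "this stage found nothing", and the search stops at the first stage that returns settings. The stage names, the settings dictionary, and the example.com entry are hypothetical.

```python
from typing import Callable, Optional

Settings = dict  # e.g. {"imap": ("imap.example.com", 993), "oauth": False}
Stage = Callable[[str], Optional[Settings]]

def detect_settings(email: str, stages: list) -> tuple:
    """Run (name, stage) pairs in order; return (name, settings) of the first hit."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    for name, stage in stages:
        try:
            settings = stage(domain)
        except OSError:
            # Timeouts, refused connections, DNS failures: try the next stage.
            settings = None
        if settings:
            return name, settings
    return None, None

# Example stages (hypothetical): a cheap table lookup first, then a slower probe.
KNOWN_DOMAINS = {
    "example.com": {"imap": ("imap.example.com", 993), "oauth": False},
}

def lookup_known(domain: str) -> Optional[Settings]:
    return KNOWN_DOMAINS.get(domain)

def slow_probe(domain: str) -> Optional[Settings]:
    raise TimeoutError(f"no answer from {domain}")  # stand-in for a network scan

STAGES = [("known list", lookup_known), ("deep scan", slow_probe)]
```

Putting the cheap stages first is what keeps the common case fast. The slow scan only runs when everything before it has come up empty.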
ha ha - what’s wrong? - it is perfectly understandable at a glance
seriously though, that is a really extreme case, and regexes are usually smaller and more readable by humans.
as an aside, the VitRegex manual has this section which starts with emails:
===============================================================================
7. COMMON PATTERNS LIBRARY
===============================================================================
This section provides regex patterns for common use cases.
Remember to double < { ' in Clarion string literals (see Appendix H).
As well as the following examples, it is worth having a look at:
"a curated list of useful regular expressions for different programming languages."
https://uibakery.io/regex-library
-------------------------------------------------------------------------------
7.1 EMAIL VALIDATION
-------------------------------------------------------------------------------
Basic Email:
pattern = '\w+@\w+\.\w+'
Matches: 'user@example.com'
Limitations: Doesn't handle all valid email formats
Standard Email:
pattern = '[\w.%+-]+@[\w.-]+\.[a-zA-Z]{{2,}'
Matches: 'user.name+tag@example.co.uk'
More comprehensive
With Capture Groups:
pattern = '([\w.%+-]+)@([\w.-]+\.[a-zA-Z]{{2,})'
Group 1: Local part (before @)
Group 2: Domain
Named Groups:
pattern = '(?<<local>[\w.%+-]+)@(?<<domain>[\w.-]+\.[a-zA-Z]{{2,})'
Extract: regex.getNamedGroup('local')
regex.getNamedGroup('domain')
Example Usage:
text.setValue('Contact: john.doe@example.com')
if regex.match(text, '[\w.%+-]+@[\w.-]+\.[a-zA-Z]{{2,}', 1)
email = regex.getGroup(0)
Message('Email: ' & email)
end
Complex Version:
pattern = |
'^[a-z0-9!#$%&''*+/=?^_`{{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*@' & |
'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$'
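Gemini told me this pattern is the official W3C recommendation for
validating email addresses. It is more accurately a widely used simplified
form of the RFC 5322 specification. It leaves out quoted local parts and
IP-address domains. The pattern the W3C/WHATWG HTML specification uses for
<input type="email"> is a different one. It does handle special characters
in the local part and multi-level domains. However, its character classes
only list lowercase letters, so match case-insensitively or LOWER() the
address first.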
See "8.1 Email Address Validation" in Section 8 PRACTICAL USAGE EXAMPLES for
implementation discussion on this.
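To see why the pattern choice matters, here is a quick check of three of the 7.1 patterns using Python's `re` module. The patterns are written without Clarion's doubled braces, and Python spells named groups `(?P<name>...)`. Note that the Basic pattern, applied to the 'Contact: john.doe@example.com' text from the Example Usage, only finds 'doe@example.com', because `\w` does not match the dot.

```python
import re

text = "Contact: john.doe@example.com"
BASIC = r"\w+@\w+\.\w+"
STANDARD = r"[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}"
NAMED = r"(?P<local>[\w.%+-]+)@(?P<domain>[\w.-]+\.[a-zA-Z]{2,})"

print(re.search(BASIC, text).group(0))      # doe@example.com (the dot stops \w+)
print(re.search(STANDARD, text).group(0))   # john.doe@example.com
m = re.search(NAMED, text)
print(m.group("local"), m.group("domain"))  # john.doe example.com
```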
that then refers to section 8.1 in the manual:
===============================================================================
8. PRACTICAL USAGE EXAMPLES
===============================================================================
This section provides common real-world examples to demonstrate how to combine
different VitRegex features to solve everyday string parsing problems.
-------------------------------------------------------------------------------
8.1 Email Address Validation
-------------------------------------------------------------------------------
Validating an email address requires anchoring the pattern to both the start
and end of the string to ensure the entire string is evaluated, rather than
just a matching substring.
st.setValue('test@example.com')
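! ^ = Start of string
! [\w\.\-]+ = One or more word characters, dots, or hyphens (local part)
! @ = Literal @ symbol
! [\w\-]+ = One or more word characters or hyphens (domain name)
! \. = Literal dot
! [\w]{{2,4}$ = 2 to 4 word characters at the end of the string
! Note: this simple pattern rejects subdomains (user@mail.example.com)
! and top-level domains longer than 4 characters.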
if regex.findPos(st, '^[\w\.\-]+@[\w\-]+\.[\w]{{2,4}$', 1) > 0
LogIt('Valid Email Address')
else
LogIt('Invalid Email Address')
end
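A quick Python check of the 8.1 pattern (again without the doubled brace) shows two things. The anchoring works: a trailing word after the address is now rejected. But the simple domain part falls short: addresses on a subdomain and top-level domains longer than four characters are rejected too.

```python
import re

# The 8.1 pattern, written without Clarion's doubled brace.
SIMPLE = re.compile(r"^[\w.\-]+@[\w\-]+\.\w{2,4}$")

for address in ("test@example.com",        # accepted
                "test@example.com extra",  # rejected: $ forces the match to end here
                "user@mail.example.com",   # rejected: no room for a subdomain
                "user@example.museum"):    # rejected: TLD longer than 4 characters
    print(address, "->", bool(SIMPLE.match(address)))
```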
Earlier in Section 7 (Common Patterns Library) in 7.1 on email validation
we had a more complex pattern, a widely used simplified form of the RFC 5322
specification. It looked something like this:
pattern = |
'^[a-z0-9!#$%&''*+/=?^_`{{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*@' & |
'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$'
! The pattern is joined across two lines for readability.
! Remember that ' { and < are doubled up in Clarion strings so here '' is
! used for the literal single quote and {{ is used for the literal left
! brace. Do not confuse the backtick (or grave accent) ` with a single
! quote '. According to the official internet standard for email addresses
! (RFC 5322), the backtick is a perfectly valid character to use in the
! "local part" (the username part before the @ symbol) of an email address.
st.setValue('user.name+tag@subdomain.example.com')
if regex.findPos(st, pattern, 1)
message('Standard pattern matched: ' & regex.getGroup(0))
else
message('Standard pattern failed.')
end
Aside: As an alternative you could have used regex.match() that returns
either the matched text or '' if no match.
Anyway, that pattern is fairly complex and hard to read, and one way to make
it more readable (by humans) is to use extended mode (see Section 5.5 for
more information on extended mode).
pattern = '(?x) <13,10>' & |
'^ # Start of string <13,10>' & |
'[a-z0-9!#$%&''*+/=?^_`{{|}~-]+ # Local part: valid start chars <13,10>' & |
'(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)* # Local part: optional dot-separated blocks <13,10>' & |
'@ # The @ symbol <13,10>' & |
'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+ # Domain: dot-separated subdomains <13,10>' & |
'[a-z0-9](?:[a-z0-9-]*[a-z0-9])? # Domain: top-level domain <13,10>' & |
'$ # End of string'
st.setValue('user.name+tag@subdomain.example.com')
if regex.findPos(st, pattern, 1)
message('Extended pattern matched: ' & regex.getGroup(0))
else
message('Extended pattern failed.')
end
Of course if you are doing this in Clarion code, you could instead just put
comments after the continuation character and not worry about extended mode:
pattern = |
'^' & | Start of string
'[a-z0-9!#$%&''*+/=?^_`{{|}~-]+' & | Local part: valid start chars
'(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*' & | Local part: optional dot-separated blocks
'@' & | The @ symbol
'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+' & | Domain: dot-separated subdomains
'[a-z0-9](?:[a-z0-9-]*[a-z0-9])?' & | Domain: top-level domain
'$' ! End of string
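For comparison, Python's `re.VERBOSE` flag works like `(?x)`: whitespace and `#` comments outside character classes are ignored. This sketch also adds `re.IGNORECASE`. The character classes only list a-z, so without that flag a mixed-case address such as User@Example.COM would be rejected.

```python
import re

# Same pattern as the commented Clarion version, with no doubled '' or {{.
EMAIL = re.compile(r"""
    ^                                           # start of string
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+                # local part: valid start chars
    (?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*         # local part: optional dot-separated blocks
    @                                           # the @ symbol
    (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+      # domain: dot-separated subdomains
    [a-z0-9](?:[a-z0-9-]*[a-z0-9])?             # domain: top-level domain
    $                                           # end of string
""", re.VERBOSE | re.IGNORECASE)

for address in ("user.name+tag@subdomain.example.com",  # True
                "User@Example.COM",                     # True (only with IGNORECASE)
                "john..doe@example.com",                # False: empty dot-separated block
                "user@localhost"):                      # False: needs at least one dot
    print(address, "->", bool(EMAIL.match(address)))
```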
One email check that must be done is at data entry time, to catch the user entering an obviously wrong email address. I do NOT think the best way to do that is a single, complicated RegEx. All it can do is tell the user the address is invalid (as seen above). That leaves the user puzzling over why, which leads to a call to tech support. From there it may be kicked up to development (me) to explain why it is wrong.
This Email Validation topic covers some of what I am saying. The link goes to where I identify 4 specific things to tell the user, like there is no “@”, or two “@”s, or double periods, etc. There are more.
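Here is a small Python sketch of the "specific message" approach. It checks for:
- a missing or repeated @
- two periods in a row
- spaces
- an empty part before or after the @
- no period in the domain
- a leading or trailing period before the @

These checks are illustrative examples, not the exact list from the linked topic, and the function name is hypothetical.

```python
def email_problems(address: str) -> list:
    """Return specific, user-readable problems; an empty list means none found."""
    address = address.strip()
    if not address:
        return ["Please enter an email address."]
    problems = []
    at_count = address.count("@")
    if at_count == 0:
        problems.append('There is no "@" in the address.')
    elif at_count > 1:
        problems.append(f'There are {at_count} "@" characters; only one is allowed.')
    if ".." in address:
        problems.append("There are two periods in a row.")
    if " " in address:
        problems.append("The address contains a space.")
    if at_count == 1:
        local, domain = address.split("@")
        if not local:
            problems.append('Nothing comes before the "@".')
        elif local[0] == "." or local[-1] == ".":
            problems.append('The part before the "@" cannot start or end with a period.')
        if not domain:
            problems.append('Nothing comes after the "@".')
        elif "." not in domain:
            problems.append('The part after the "@" needs a period, as in "example.com".')
    return problems
```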
A better example is this topic, which asked for a single RegEx to check whether a password met these requirements:
Minimum 8 characters, at least one uppercase letter, one lowercase letter, one number and one special character
I think the better way is to step through each requirement and tell the user exactly what is missing. This is example code; it would be better to concatenate the errors into a single message.
PasswordBad PROCEDURE(STRING pwd),STRING
  CODE
  ! CLIP() everywhere: on a padded STRING a trailing space would
  ! otherwise satisfy the "special character" test
  IF LEN(CLIP(Pwd)) < 8 THEN RETURN 'Minimum eight characters'.
  IF ~MATCH(CLIP(Pwd),'[A-Z]',Match:Regular) THEN RETURN 'At least one uppercase letter'.
  IF ~MATCH(CLIP(Pwd),'[a-z]',Match:Regular) THEN RETURN 'At least one lowercase letter'.
  IF ~MATCH(CLIP(Pwd),'[0-9]',Match:Regular) THEN RETURN 'At least one number'.
  ! Only a leading ^ negates a class; any other ^ inside [...] is a literal caret
  IF ~MATCH(CLIP(Pwd),'[^A-Za-z0-9]',Match:Regular) THEN RETURN 'At least one special character'.
  IF INSTRING(CHR(32),CLIP(Pwd)) THEN RETURN 'No spaces'. ! Suggested
  RETURN '' ! OK
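The "concatenate the errors" variant might look like this in Python. Each rule is a (check, message) pair, every rule is evaluated, and the unmet ones are joined into one message. The rule list mirrors PasswordBad, and the names are hypothetical.

The class `[^A-Za-z0-9]` treats any non-alphanumeric character as special. That includes ^ and, as in the Clarion version, a space. The separate no-spaces rule then reports the space.

```python
import re

# (check, message) pairs; a check returns something truthy when the rule is met.
RULES = [
    (lambda p: len(p) >= 8,                   "Minimum eight characters"),
    (lambda p: re.search(r"[A-Z]", p),        "At least one uppercase letter"),
    (lambda p: re.search(r"[a-z]", p),        "At least one lowercase letter"),
    (lambda p: re.search(r"[0-9]", p),        "At least one number"),
    (lambda p: re.search(r"[^A-Za-z0-9]", p), "At least one special character"),
    (lambda p: " " not in p,                  "No spaces"),
]

def password_problems(pwd: str) -> str:
    """Return every unmet rule joined into one message, or '' if all pass."""
    missing = [message for check, message in RULES if not check(pwd)]
    return "; ".join(missing)
```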
In summary, if you're trying to communicate with the user, IMO it's better to split the checks into parts so you can provide detailed errors instead of just saying “something is invalid”. I worked on software used by thousands of users, with 10 people in tech support; you want error messages to be as helpful as possible.
yes, I certainly agree. Remember you can use Capture Groups and Named Groups to split it up for you, and named groups are self-documenting.
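So one possible strategy would be to use a general validation regex and, if the address is invalid, look at the named groups (i.e. the parts of the email address) individually. Note that a failed match captures nothing, so those parts would have to come from a second, looser regex with the same named groups.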
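Here is a Python sketch of that strategy. It uses the 7.1 Standard pattern with named groups, anchored, as the general check. When it fails, a looser pattern that only splits on the @ supplies the local and domain parts, so each can be tested and reported separately. The function name and messages are hypothetical.

```python
import re

# General check: the 7.1 Standard pattern, anchored, with named groups.
STANDARD = re.compile(r"^(?P<local>[\w.%+-]+)@(?P<domain>[\w.-]+\.[a-zA-Z]{2,})$")
# Loose split: matches any single-@ string, so the parts are always captured.
PARTS = re.compile(r"^(?P<local>[^@]*)@(?P<domain>[^@]*)$")
LOCAL_OK = re.compile(r"[\w.%+-]+")
DOMAIN_OK = re.compile(r"[\w.-]+\.[a-zA-Z]{2,}")

def diagnose(address: str) -> str:
    if STANDARD.match(address):
        return "ok"
    parts = PARTS.match(address)
    if parts is None:
        return 'The address must contain exactly one "@".'
    local, domain = parts.group("local"), parts.group("domain")
    if not LOCAL_OK.fullmatch(local):
        return f'The part before the "@" ({local!r}) is not valid.'
    if not DOMAIN_OK.fullmatch(domain):
        return f'The part after the "@" ({domain!r}) is not valid.'
    return "The address is not valid."
```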
but don’t get hung up on using regex for emails - there are many better use cases.