Validating email addresses

RchdR · March 26, 2026, 12:04am

There are times when you shouldn’t use regexes.

For example, the common RegEx for valid email addresses doesn’t work.

Not every mail server conforms to the standards, so the easiest thing I did in the past was to put a button beside the email address entry field that used NetTalk to start sending an email. If the mail server didn’t reject the recipient, I backed out of sending the rest of the email because I had my answer.
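
The button approach boils down to an SMTP conversation that stops after RCPT TO and is then reset. Here is a minimal sketch of that idea in Python's standard `smtplib`, not NetTalk. The function name `mailbox_accepted`, the host name, and the addresses are made up for illustration. Many servers accept every recipient and only bounce later, so treat a positive answer as a hint rather than proof.

```python
import smtplib


def mailbox_accepted(smtp, sender, recipient):
    """Return True if the server accepts RCPT TO for `recipient`.

    `smtp` is an already-connected smtplib.SMTP (or compatible) object.
    No message body is sent: the transaction is reset after RCPT.
    """
    smtp.ehlo_or_helo_if_needed()
    code, _ = smtp.mail(sender)
    if code != 250:
        return False
    code, _ = smtp.rcpt(recipient)
    smtp.rset()  # back out before DATA, nothing is delivered
    return code in (250, 251)


# Usage (needs network access; the host and addresses are placeholders):
# with smtplib.SMTP("mx.example.com", 25, timeout=10) as server:
#     print(mailbox_accepted(server, "probe@example.org", "user@example.com"))
```

Passing in the connection rather than opening it inside the function keeps the check easy to exercise against a stub.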

It was also used to track whether staff were still working for a company, but the EU have since ruled that named personal company email addresses have to be treated like private email addresses, so GDPR rules apply to that kind of surveillance. I don’t know how web trackers get around that…

So yes, I do agree there are times not to use RegEx, but I use it mainly to replace InString.

vitesse · March 26, 2026, 10:27am

check out this one!

RchdR · March 26, 2026, 11:28am

Like I said, sometimes it’s better to click a button and see if the mail server in question responds with an existence acknowledgement or not.

LANSRAD · March 26, 2026, 12:33pm

You’re probably going to like what’s coming in vuMailKit.

We’ve built a new auto-detect setup flow where the user enters their email address and vuMailKit figures out the likely server settings automatically. It can recognize when OAuth is required, and for less obvious providers our deeper scan can usually detect the mail server, ports, and security settings too.

Because let’s be honest, when customers don’t know those settings, they don’t call their mail host first, they call you. And that usually turns into unpaid support.

We’re trying very hard to remove the pain, the guesswork, and a lot of that unnecessary support burden from email setup.

RchdR · March 26, 2026, 6:42pm

DNS holds a lot of information now, but it generally won’t reveal specific client settings like POP3/IMAP port numbers or SSL/TLS requirements. For those you would have to scan the registry settings of popular existing email programs, or get the registry settings from the machines running the mail servers.

Key DNS Records for Finding Mail Settings:

MX (Mail Exchange) Records: Identify the hostname(s) of the servers responsible for receiving incoming mail.
SPF (Sender Policy Framework) Records (TXT): Define which IP addresses or hosts are authorized to send email on behalf of a domain.
A Records: Link the mail server hostname (found in the MX record) to an actual IP address.
DKIM (DomainKeys Identified Mail) Records (TXT): Used to authenticate the domain's email sending identity.

LANSRAD · March 26, 2026, 6:49pm

Exactly. DNS can provide useful clues, but it usually does not hand you the full client configuration.

That is why vuMailKit uses a fairly sophisticated six-stage detection process instead of depending on any one source. That layered approach is what gives us such a high success rate. Most accounts are detected in under a second, but in the occasional difficult case the deeper scan may take a bit longer, and in rare situations it can take two or three minutes to fully sort out the settings.

We built it that way because every time a customer cannot figure this stuff out, the developer ends up providing free support.

seanh · March 26, 2026, 10:56pm

See, THAT’S why I don’t do regex

vitesse · March 27, 2026, 2:00am

ha ha - what’s wrong? - it is perfectly understandable at a glance

seriously though, that is a really extreme case and usually regex are smaller and more readable by humans.

as an aside, the VitRegex manual has this section which starts with emails:

===============================================================================
7. COMMON PATTERNS LIBRARY
===============================================================================

This section provides regex patterns for common use cases.
Remember to double < { ' in Clarion string literals (see Appendix H).

As well as the following examples, it is worth having a look at:

"a curated list of useful regular expressions for different programming languages."

https://uibakery.io/regex-library

-------------------------------------------------------------------------------
7.1 EMAIL VALIDATION
-------------------------------------------------------------------------------

Basic Email:
  pattern = '\w+@\w+\.\w+'

  Matches: 'user@example.com'
  Limitations: Doesn't handle all valid email formats

Standard Email:
  pattern = '[\w.%+-]+@[\w.-]+\.[a-zA-Z]{{2,}'

  Matches: 'user.name+tag@example.co.uk'
  More comprehensive

With Capture Groups:
  pattern = '([\w.%+-]+)@([\w.-]+\.[a-zA-Z]{{2,})'

  Group 1: Local part (before @)
  Group 2: Domain

Named Groups:
  pattern = '(?<<local>[\w.%+-]+)@(?<<domain>[\w.-]+\.[a-zA-Z]{{2,})'

  Extract: regex.getNamedGroup('local')
           regex.getNamedGroup('domain')

Example Usage:
  text.setValue('Contact: john.doe@example.com')
  if regex.match(text, '\w+@\w+\.\w+', 1)
    email = regex.getGroup(0)
    Message('Email: ' & email)
  end

Complex Version:
  pattern = |
   '^[a-z0-9!#$%&''*+/=?^_`{{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*@' & |
   '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$'

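  Gemini tells me that this pattern is the official W3C recommendation for
  validating email addresses, but that is not quite right: it is a widely
  circulated simplification of the RFC 5322 address syntax, and the pattern
  in the HTML specification for email inputs is similar but not identical.
  It handles special characters in the local part and multi-label domains,
  but it lists only lowercase letters, so use case-insensitive matching.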

  See "8.1 Email Address Validation" in Section 8 PRACTICAL USAGE EXAMPLES for
  implementation discussion on this.

that then refers to section 8.1 in the manual:

===============================================================================
8. PRACTICAL USAGE EXAMPLES
===============================================================================

This section provides common real-world examples to demonstrate how to combine
different VitRegex features to solve everyday string parsing problems.

-------------------------------------------------------------------------------
8.1 Email Address Validation
-------------------------------------------------------------------------------

Validating an email address requires anchoring the pattern to both the start
and end of the string to ensure the entire string is evaluated, rather than
just a matching substring.

  st.setValue('test@example.com')

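  ! ^         = Start of string
  ! [\w\.\-]+ = One or more word characters, dots, or hyphens
  ! @         = Literal @ symbol
  ! [\w\-]+   = One or more word characters or hyphens (no dots, so
  !             subdomains such as mail.example.com are rejected)
  ! \.        = Literal dot
  ! [\w]{{2,4}$ = 2 to 4 word characters at the end of the string
  !             (longer top-level domains such as .museum are rejected)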

  if regex.findPos(st, '^[\w\.\-]+@[\w\-]+\.[\w]{{2,4}$', 1) > 0
    LogIt('Valid Email Address')
  else
    LogIt('Invalid Email Address')
  end

Earlier in Section 7 (Common Patterns Library) in 7.1 on email validation
we had a more complex pattern, a widely circulated simplification of the
RFC 5322 address syntax. It looked something like this:

  pattern = |
   '^[a-z0-9!#$%&''*+/=?^_`{{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*@' & |
   '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$'

  ! The pattern is joined across two lines for readability.
  ! Remember that ' { and < are doubled up in Clarion strings so here '' is
  ! used for the literal single quote and {{ is used for the literal left
  ! brace.  Do not confuse the backtick (or grave accent) ` with a single
  ! quote '.  According to the official internet standard for email addresses
  ! (RFC 5322), the backtick is a perfectly valid character to use in the
  ! "local part" (the username part before the @ symbol) of an email address.

  st.setValue('user.name+tag@subdomain.example.com')

  if regex.findPos(st, pattern, 1)
    message('Standard pattern matched: ' & regex.getGroup(0))
  else
    message('Standard pattern failed.')
  end

  Aside: As an alternative you could have used regex.match() that returns
         either the matched text or '' if no match.

  Anyway that pattern is fairly complex and hard to read/understand and one
  way to make this more readable (by humans) is to use extended mode.
  (see Section 5.5 for more information on extended mode)

  pattern = '(?x)                                                                        <13,10>' & |
    '^                                       # Start of string                           <13,10>' & |
    '[a-z0-9!#$%&''*+/=?^_`{{|}~-]+          # Local part: valid start chars             <13,10>' & |
    '(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*   # Local part: optional dot-separated blocks <13,10>' & |
    '@                                       # The @ symbol                              <13,10>' & |
    '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+  # Domain: dot-separated subdomains          <13,10>' & |
    '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?         # Domain: top-level domain                  <13,10>' & |
    '$                                       # End of string'

  st.setValue('user.name+tag@subdomain.example.com')

  if regex.findPos(st, pattern, 1)
    message('Extended pattern matched: ' & regex.getGroup(0))
  else
    message('Extended pattern failed.')
  end

  Of course if you are doing this in Clarion code, you could instead just put
  comments after the continuation character and not worry about extended mode:

  pattern = |
    '^'                                      & | Start of string
    '[a-z0-9!#$%&''*+/=?^_`{{|}~-]+'         & | Local part: valid start chars
    '(?:\.[a-z0-9!#$%&''*+/=?^_`{{|}~-]+)*'  & | Local part: optional dot-separated blocks
    '@'                                      & | The @ symbol
    '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+' & | Domain: dot-separated subdomains
    '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'        & | Domain: top-level domain
    '$'                                        ! End of string
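
For comparison outside Clarion, the same extended-mode idea can be sketched in Python, where the flag is called `re.VERBOSE`. This is an illustration only, not VitRegex code. `re.IGNORECASE` is an addition of this sketch, because the character classes list only lowercase letters and would otherwise reject addresses containing capitals.

```python
import re

# Same pattern as above; in verbose mode "#" starts a comment except
# inside a character class, where it stays a literal character.
EMAIL = re.compile(
    r"""
    ^                                       # Start of string
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+            # Local part: first block
    (?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*     # Local part: optional dot-separated blocks
    @                                       # The @ symbol
    (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+  # Domain: dot-separated labels
    [a-z0-9](?:[a-z0-9-]*[a-z0-9])?         # Domain: final label
    $                                       # End of string
    """,
    re.VERBOSE | re.IGNORECASE,
)

for candidate in ("user.name+tag@subdomain.example.com", "user..name@example.com"):
    print(candidate, "->", "matched" if EMAIL.match(candidate) else "failed")
```

The second candidate fails because every dot in the local part must be followed by at least one more allowed character.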

CarlBarnes · March 28, 2026, 4:36pm

One email check that must be done is at data entry time, to catch the user entering an obviously wrong email address. I do NOT think the best way to do that is to create a single complicated RegEx expression. All that can do is tell the user the address is invalid (as seen above). That leaves the user puzzling over what specifically is wrong, then tech support, and maybe it gets kicked up to development (me) to answer why it is wrong.

This Email Validation topic covers some of what I am saying. The link goes to where I identify 4 specific things to tell the user, like there is no “@”, or two “@”s, or double periods, etc. There are more.
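
A rough sketch of that kind of step-by-step email check, written in Python for illustration (the function name `email_problems` is made up). The first three checks are the ones named above. The space, empty-name, and missing-period checks are assumptions added to round out the example, not the list from the linked topic.

```python
def email_problems(address):
    """Return a list of plain-language problems; an empty list means none found."""
    problems = []
    address = address.strip()
    at_count = address.count("@")
    if at_count == 0:
        problems.append('There is no "@" in the address')
    elif at_count > 1:
        problems.append('There is more than one "@" in the address')
    if ".." in address:
        problems.append("There are two periods in a row")
    if " " in address:
        problems.append("There is a space in the address")
    if at_count == 1:
        local, domain = address.split("@")
        if not local:
            problems.append('Nothing comes before the "@"')
        if "." not in domain:
            problems.append('There is no period after the "@"')
    return problems


print(email_problems("john..doe@example"))
```

Each check is independent, so the user sees every problem at once instead of fixing them one at a time.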

A better example is this topic that requested a single RegEx expression to check if a password met these requirements:

Minimum 8 characters, at least one uppercase letter, one lowercase letter, one number and one special character

I think the better way is to step through each requirement and tell the user exactly what is missing. This is example code; better would be to concatenate the errors into a single message.

    PasswordBad     PROCEDURE(STRING pwd),STRING 
        CODE
        IF LEN(CLIP(Pwd)) < 8 THEN                RETURN 'Minimum eight characters'.
        IF ~MATCH(Pwd,'[A-Z]',Match:Regular) THEN RETURN 'at least one uppercase letter '.
        IF ~MATCH(Pwd,'[a-z]',Match:Regular) THEN RETURN 'one lowercase letter '.
        IF ~MATCH(Pwd,'[0-9]',Match:Regular) THEN RETURN 'one number '          .
        IF ~MATCH(Pwd,'[^A-Za-z0-9]',Match:Regular) THEN RETURN 'one special character '.

        IF INSTRING(CHR(32),CLIP(Pwd)) THEN  RETURN 'No Spaces'. !Suggested

        RETURN '' !Ok
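
The concatenated version could look something like this. It is sketched in Python rather than Clarion so it can be run as-is; the function name `password_problems` and the message wording are made up for illustration.

```python
import re


def password_problems(pwd):
    """Return one message listing every unmet requirement, or '' if OK."""
    pwd = pwd.rstrip()  # ignore trailing padding, like CLIP()
    checks = [
        (len(pwd) >= 8, "minimum eight characters"),
        (re.search(r"[A-Z]", pwd), "at least one uppercase letter"),
        (re.search(r"[a-z]", pwd), "at least one lowercase letter"),
        (re.search(r"[0-9]", pwd), "at least one number"),
        (re.search(r"[^A-Za-z0-9]", pwd), "at least one special character"),
        (" " not in pwd, "no spaces"),
    ]
    # Collect the message for every check that did not pass.
    missing = [message for passed, message in checks if not passed]
    return "Password needs: " + ", ".join(missing) if missing else ""


print(password_problems("abc"))
```

Keeping the rules in one list means adding a requirement is a one-line change, and the user gets the full list of what to fix in a single message.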

In summary, if you’re trying to communicate with the user, IMO it’s better to split the checks into parts so you can provide detailed errors instead of just saying “something is invalid”. I worked on software used by 1000’s of users, with 10 in Tech Support; you want error messages as helpful as possible.

vitesse · March 28, 2026, 8:16pm

yes I certainly agree. Remember you can use Capture Groups and Named Groups to easily split it up for you - and in the case of named groups they are self documenting.

So one possible strategy would be to use a general validation regex and if it was invalid then look at the named groups (ie. parts of the email address) individually.
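
A minimal sketch of that strategy, in Python for illustration rather than VitRegex. Python spells named groups `(?P<name>...)` instead of `(?<name>...)`. The strict pattern is the "Standard Email" one from 7.1 with anchors added; the lenient splitter pattern, the function name `diagnose`, and the messages are assumptions of this sketch.

```python
import re

STRICT = re.compile(r"^[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}$")
# Lenient splitter: everything before the last "@" and everything after it.
PARTS = re.compile(r"^(?P<local>.*)@(?P<domain>[^@]*)$")
LOCAL_OK = re.compile(r"^[\w.%+-]+$")
DOMAIN_OK = re.compile(r"^[\w.-]+\.[a-zA-Z]{2,}$")


def diagnose(address):
    """Return '' if the address passes, else a message naming the bad part."""
    if STRICT.match(address):
        return ""
    parts = PARTS.match(address)
    if parts is None:
        return 'Missing "@"'
    if not LOCAL_OK.match(parts.group("local")):
        return "Problem before the @: " + repr(parts.group("local"))
    if not DOMAIN_OK.match(parts.group("domain")):
        return "Problem after the @: " + repr(parts.group("domain"))
    return "Invalid address"  # defensive fallback


print(diagnose("bad name@example.com"))
```

The general pattern does the quick pass or fail, and the named groups are only consulted when something has to be explained to the user.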

but don’t get hung up on using regex for emails - there are many better use cases.