Old School Parsing of a file name and path for validation

Some of the UNC problems I’ve encountered in the past were related to whether a “null session” was supported by the server, so we made sure to use an authenticated account.


What’s wrong with the directory name '..' ?
What’s wrong with UNC names?

Hi also

I’ve really no idea.

If you go back up the thread and read it, you will see that in my first version (which was done very quickly) I first made a short "spec" to work from. I also stated that if the spec was wrong, the resulting code would obviously also be wrong.

The reason for doing this was to provide an example of an alternative approach to Owen's FSM using StringTheory (ST), since ST had been mentioned but there was a criticism that no code had been supplied.

I'm kind of regretting going down this rabbit hole. (I woke up in the middle of the night last night realising there was a bug in my code, and have now done an edit.)

As it turned out, my initial "spec" was flawed and I forgot some important pieces, such as Owen's reserved device names, which apparently cannot be used in a "segment".

My second version, posted the following day (yesterday, my time), aimed to correct that and actually took me far longer, as I effectively tried to "reverse engineer" Owen's code to get a more accurate "spec", finding differences along the way.

I mentioned yesterday:

My "specs" from yesterday did not allow for the all-dots names ('…') that your code accepts. I have fixed that in this new version, which now also checks for your ReservedDeviceNames.

It is possible that I have misunderstood what Owen’s code was doing in arriving at a “spec”. We all know one of the first requirements in developing a program is to have some kind of “spec” or idea of what you are trying to achieve to work from!

Anyway all of this was just to provide some context as to how and why my code was written.

TL;DR: the spec might be wrong.

I should add that perhaps one other reason to do the code this way was to demonstrate the expressiveness of ST.

Whorf’s hypothesis states, in essence, that your thought is shaped by your language. (Think: Eskimos and Snow)

One way to look at it is that StringTheory massively expands the vocabulary of your Clarion language, meaning that you start to think differently.

And just as people used to talk of RAD tools (like Clarion) giving 10x productivity gains, I suggest something similar might apply to using ST over the whole lifecycle, i.e. both reduced development time (including debugging) and reduced ongoing maintenance.

Of course it was always unlikely that Owen, on seeing my code, would have a "Road to Damascus" epiphany and decide to use ST, but just as Owen stated "One of the reasons for the original article is to teach", hopefully the same applies to my ST version.

Sorry, I forgot to respond to this, Owen. It is interesting that there has been a parallel discussion on "black boxes" (and the potential risks they present) over on Skype. I just wanted to point out that ST comes with the source code, so anyone who is interested can look and possibly learn. IOW, as with the ABC classes, there is no black box, unlike, for example, the Clarion RTL.

  • '..' is a valid directory name, but the IsValidFilePathGeoff function returns FALSE and sets none of the output variables.
  • If the passed file name includes a drive letter, the function sets the variable passed as the DeviceName parameter. But if the name has the UNC format, the server name and the share name are joined to the path instead.
  • One of the valid UNC layouts is \\?\UNC\server\share\..., but the ? character is in the list of invalid characters.
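For comparison, here is a minimal sketch in Python (not Clarion, and not either poster's code) of the behaviour the first two bullets ask for. It leans on the standard library's ntpath.splitdrive, which treats a \\server\share prefix as the "device" in the same way as a drive letter such as C:. The names split_device and path_segments are hypothetical, and the \\?\ forms are not handled here.

```python
import ntpath


def split_device(path: str) -> tuple[str, str]:
    # Returns (device, rest). The device is a drive letter such as "C:" or,
    # for a UNC name, the \\server\share prefix, so both forms fill the same
    # "device" output instead of the UNC parts being joined to the path.
    return ntpath.splitdrive(path)


def path_segments(rest: str) -> list[str]:
    # Split on either separator; "." and ".." are kept as ordinary
    # directory references rather than being rejected.
    return [s for s in rest.replace("/", "\\").split("\\") if s]


device, rest = split_device(r"\\server\share\dir\..\file.tps")
print(device, path_segments(rest))
```

In this split the answer to "what separates the device from the path" for UNC names is simply the backslash after the share name.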

The specification I used to design my parser comes from Microsoft Learn:

Naming Files, Paths, and Namespaces - Win32 apps | Microsoft Learn

My understanding of \\?\UNC\Server\share is that the path used may exceed the MAX_PATH limit of 260 characters; in fact, up to approximately 32,767 characters are allowed. But you are bypassing the Windows Shell, and it is possible to create a file name that the Windows Shell cannot interpret.

Although my parser does not explicitly check the length of the file path, I decided not to support the \\?\ prefix for this exercise.
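A rough Python sketch of segment validation along the lines of the Microsoft Learn page, assuming its rules are paraphrased correctly here: reserved characters and control characters are rejected, reserved device names are rejected with or without an extension, and the \\?\ prefix is refused as in Owen's design. Rejecting \\.\ and applying a MAX_PATH length check are additional assumptions of this sketch. It also follows the advice against names ending in a space or period, so it is likely stricter than either poster's rules about all-dots names. All identifiers are hypothetical.

```python
import ntpath

# Rules paraphrased from Microsoft Learn "Naming Files, Paths, and Namespaces".
INVALID_CHARS = set('<>:"/\\|?*') | {chr(c) for c in range(32)}
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)
MAX_PATH = 260  # includes the terminating null, so at most 259 visible characters


def is_valid_segment(seg: str) -> bool:
    if seg in (".", ".."):
        return True  # current / parent directory references are allowed
    if not seg or seg[-1] in " .":
        return False  # empty, or ends in a space or period
    if any(ch in INVALID_CHARS for ch in seg):
        return False
    # Reserved device names stay reserved with an extension, e.g. "NUL.txt".
    return seg.split(".", 1)[0].upper() not in RESERVED_NAMES


def is_valid_path(path: str) -> bool:
    # \\?\ is deliberately unsupported, as in Owen's design; rejecting the
    # \\.\ device namespace too is an assumption of this sketch.
    if path.startswith(("\\\\?\\", "\\\\.\\")):
        return False
    # Optional length check (an assumption; Owen's parser does not do this).
    if len(path) >= MAX_PATH:
        return False
    _device, rest = ntpath.splitdrive(path)
    segs = [s for s in rest.replace("/", "\\").split("\\") if s]
    return all(is_valid_segment(s) for s in segs)
```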

Cheers,

Owen Brunker

thanks also

you obviously know far more about this than me.

I did state way back

If you would like to give a corrected spec then I will do my best to implement it for you (and the world in general) as soon as I get a chance.

are four return fields enough?

if it is UNC what separates the “device” portion from the “path” portion?

if you give me a corrected spec I will do a third (and hopefully final) version.

Owen: at the top of this thread it says

This is the first time Owen has posted — let’s welcome them to our community!

So welcome! I apologize that your thread has gone awry and want to reassure you this is not usually the case.

It depends on your goal. For example, the filename extension only matters for invoking the default program that works with the corresponding kind of content. Only files that have some special meaning for the OS must have specific extensions. Therefore, in most cases the name part can be treated as a single entity without splitting off an extension.
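A small Python sketch of that idea, assuming a hypothetical name_parts helper: the name is returned whole by default, and ntpath.splitext is only applied when the caller actually needs the extension.

```python
import ntpath


def name_parts(path: str, split_extension: bool = False) -> tuple[str, str]:
    # By default the name is kept whole and the extension output is empty.
    name = ntpath.basename(path)
    if not split_extension:
        return name, ""
    # Only split when the caller cares about the extension, e.g. to decide
    # which program should open the file. Leading dots (".profile") are
    # not treated as an extension by splitext.
    return ntpath.splitext(name)
```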

Two major “external” forms of UNC names for files/directories are:
\\server\share\<path>\name
\\?\UNC\server\share\<path>\name
The second form allows longer names. Here the 'share' part can be the name of a volume, e.g. C$.
The main internal format used by Windows for naming various objects is similar, but begins with \\.\ (backslash, backslash, dot, backslash). This "internal" format can be used in programs to name files too.
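A Python sketch that recognises the prefixes just described and rewrites the \\?\UNC\ form back to the plain \\server\share form. The function names and the category labels are hypothetical; the prefixes themselves are the ones listed above.

```python
def classify_prefix(path: str) -> str:
    # Order matters: the longer \\?\UNC\ prefix must be tested before \\?\.
    if path.upper().startswith("\\\\?\\UNC\\"):
        return "extended UNC"       # \\?\UNC\server\share\...
    if path.startswith("\\\\?\\"):
        return "extended"           # e.g. \\?\C:\...
    if path.startswith("\\\\.\\"):
        return "device namespace"   # \\.\...
    if path.startswith("\\\\"):
        return "UNC"                # \\server\share\...
    if len(path) >= 2 and path[1] == ":" and path[0].isalpha():
        return "drive"
    return "relative"


def to_plain_unc(path: str) -> str:
    # Rewrite \\?\UNC\server\share\... as \\server\share\...;
    # any other path is returned unchanged.
    prefix = "\\\\?\\UNC\\"
    if path.upper().startswith(prefix):
        return "\\\\" + path[len(prefix):]
    return path
```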

All the specifications for file naming can be found in the Windows documentation. Some points need to be taken into account when parsing file names: whether the named file/directory exists, and whether the name can be transformed during parsing. If the file/directory has to be, or can be, created, any checks for path and name existence must be bypassed; only the correctness of the name matters. If the name can't be transformed (e.g. converted from relative to absolute), the parsing routine must handle special names like '..'.
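The two cases can be sketched in Python with a hypothetical parse_name function: nothing touches the filesystem, so names of files that do not exist yet parse normally; without transformation '..' is kept as a segment, and with it ntpath.normpath collapses it lexically. This is only a lexical transformation (it ignores links and does not make relative paths absolute, which would need the current directory).

```python
import ntpath


def parse_name(path: str, transform: bool = False) -> tuple[str, list[str]]:
    # Syntax only: no existence checks, so a file that is about to be
    # created parses the same way as one that already exists.
    if transform:
        # Lexical normalisation: collapses "." and ".." and unifies separators.
        path = ntpath.normpath(path)
    device, rest = ntpath.splitdrive(path)
    segs = [s for s in rest.replace("/", "\\").split("\\") if s]
    return device, segs
```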

If the filename can be transformed, the question is how deep such normalization must go. In the simplest case a call to LONGPATH is enough. In many cases the Clarion RTL needs stricter analysis. For example, the same file can be referred to by different names, e.g. X:\something\file.tps and Y:\subdir\something\file.tps, where X: is mapped to \\server\share\dir\subdir and Y: is mapped to \\server\share\dir. The TopSpeed driver needs to ignore attempts to include multiple copies of the physically same file in the LOGOUT list, even if they have different names.

The RTL implements and exports a function which does deep normalization of file names and breaks them into 4 strings: server/computer name, drive/share/volume name, path and name. This function first appeared in C60, and its analysis has become stricter from version to version. Those who wish can find this function in the import list of the TopSpeed driver's DLL and try to work out its parameters.
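To illustrate the idea (not the RTL's exported function, whose name and parameters are not given here), a Python sketch can map drive letters to their UNC targets, normalise lexically, and split into the same four strings. DRIVE_MAP and canonical_parts are hypothetical; a real implementation would query Windows for the mapping (for example via the Win32 WNetGetConnection function), and lower-casing is only an approximation of Windows' case-insensitive comparison.

```python
import ntpath

# Hypothetical mappings from the X:/Y: example; real code would ask Windows.
DRIVE_MAP = {
    "X:": r"\\server\share\dir\subdir",
    "Y:": r"\\server\share\dir",
}


def canonical_parts(path: str) -> tuple[str, str, str, str]:
    """Return (server, share or drive, directory path, name), case-folded."""
    drive, rest = ntpath.splitdrive(path)
    mapped = DRIVE_MAP.get(drive.upper())
    if mapped is not None:
        path = mapped + rest  # replace the mapped drive with its UNC target
    path = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(path)
    if drive.startswith("\\\\"):
        server, share = drive[2:].split("\\", 1)
    else:
        server, share = "", drive  # local drive: no server part
    directory, name = ntpath.split(rest)
    return server.lower(), share.lower(), directory.lower(), name.lower()
```

With this, both names from the example reduce to the same four strings, which is the property a LOGOUT duplicate check would rely on.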