Is it time for FILE:MaxFilePath EQUATE(260) to change to FILE:MaxFilePath EQUATE(32767)?

anon23294430 · May 24, 2022, 10:41pm

After reading the blog post My take on “where’s all the code” (nullprogram.com), which makes a very good point about MAX_PATH, I did a little investigating. It seems the MAX_PATH limit has been removed, but it’s still an opt-in operation.
Maximum Path Length Limitation - Win32 apps | Microsoft Docs

It seems that even in C11, in [root]\libsrc\win\equates.clw, it’s still set to 260:

FILE:MaxFileName EQUATE(256)
FILE:MaxFilePath EQUATE(260)

The maximum CSTRING size is 4MB (roughly 4,000,000 bytes), more than enough space for a long filename, so is it time for …

FILE:MaxFileName EQUATE(256)
FILE:MaxFilePath EQUATE(32767)

?

Any thoughts?

Mark_Sarson · May 25, 2022, 8:26am

Hi Richard

I would hazard a guess that the problem isn’t solved just by altering the length; the Win32 calls that are executed under the hood would have to change as well.

There is a more detailed explanation of what I mean here: Maximum Path Length Limitation - Win32 apps | Microsoft Docs

Mark

anon23294430 · May 25, 2022, 8:45am

I see what you mean, e.g. this can only do 260 characters:
GetLongPathNameA function (fileapi.h) - Win32 apps | Microsoft Docs

whereas this can do 32,767 characters:
GetLongPathNameW function (fileapi.h) - Win32 apps | Microsoft Docs
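To make the difference concrete, here is a minimal Python sketch (Python is used purely for illustration, and `api_family_for` is a hypothetical helper, not part of Clarion or Win32). It assumes both limits are counted with the terminating NUL included, and it counts UTF-16 code units, which is how the W functions measure length. The A functions count bytes in the ANSI code page, which is the same number only for single-byte code pages.

```python
MAX_PATH = 260            # classic limit, terminating NUL included
LONG_PATH_LIMIT = 32767   # W-API figure from the docs (assumed to include the NUL)


def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text needs in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def api_family_for(path: str) -> str:
    """Say which API family could accept a path of this length."""
    needed = utf16_units(path) + 1  # +1 for the terminating NUL
    if needed <= MAX_PATH:
        return "A or W"
    if needed <= LONG_PATH_LIMIT:
        return "W only"
    return "too long"
```

For example, `api_family_for("C:\\" + "a" * 256)` is a 259-character path that still fits the A functions, while one more character pushes it into "W only" territory.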

I was aware of POSIX case sensitivity, which is switched off by default in Windows but can cause its own headaches if switched on.

FILE_FLAG_POSIX_SEMANTICS

0x01000000 Access will occur according to POSIX rules. This includes allowing multiple files with names, differing only in case, for file systems that support that naming. Use care when using this option, because files created with this flag may not be accessible by applications that are written for MS-DOS or 16-bit Windows.

Edit
I couldn’t find an MS blog listing the affected APIs, and I couldn’t see anything on the webpage last night. I’ve since found a couple of blogs that do list them, and then, lo and behold, the final section at the bottom of the MS page lists all the affected APIs. I could have sworn it wasn’t there last night, but this isn’t the first time MS webpages have been missing info and I’ve had to use other devices to track it down!

These are the directory management functions that no longer have MAX_PATH restrictions if you opt-in to long path behavior: CreateDirectoryW, CreateDirectoryExW, GetCurrentDirectoryW, RemoveDirectoryW, SetCurrentDirectoryW.

These are the file management functions that no longer have MAX_PATH restrictions if you opt-in to long path behavior: CopyFileW, CopyFile2, CopyFileExW, CreateFileW, CreateFile2, CreateHardLinkW, CreateSymbolicLinkW, DeleteFileW, FindFirstFileW, FindFirstFileExW, FindNextFileW, GetFileAttributesW, GetFileAttributesExW, SetFileAttributesW, GetFullPathNameW, GetLongPathNameW, MoveFileW, MoveFileExW, MoveFileWithProgressW, ReplaceFileW, SearchPathW, FindFirstFileNameW, FindNextFileNameW, FindFirstStreamW, FindNextStreamW, GetCompressedFileSizeW, GetFinalPathNameByHandleW.
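Separately from the opt-in, the same Microsoft page describes the `\\?\` prefix, which the W functions accept for extended-length paths. The sketch below shows the string transformation only (Python for illustration; `to_extended_path` is a hypothetical helper). It assumes the input is already fully qualified, since Windows does no normalization of prefixed paths, so relative paths, `.` and `..` are not resolved.

```python
EXTENDED_PREFIX = "\\\\?\\"           # the four characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # \\?\UNC\ for network shares


def to_extended_path(path: str) -> str:
    """Return the extended-length form of a fully qualified Windows path."""
    # Already in extended-length form: leave it alone.
    if path.startswith(EXTENDED_PREFIX):
        return path
    # UNC path \\server\share\... becomes \\?\UNC\server\share\...
    if path.startswith("\\\\"):
        return EXTENDED_UNC_PREFIX + path[2:]
    # Drive-letter path such as C:\dir\file
    if len(path) >= 3 and path[0].isalpha() and path[1:3] == ":\\":
        return EXTENDED_PREFIX + path
    raise ValueError("extended-length paths must be fully qualified: " + path)
```

Either way, the result is only useful when passed to a W function, so it does not help code that is tied to the A API.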

CarlBarnes · May 25, 2022, 7:26pm

Clarion 2 - 11.1 calls the Windows ANSI or “A” API. It is limited to 260 and will forever be limited. It is deprecated and is unlikely to be enhanced.

All this new stuff requires the Windows UNICODE or “W” API. Almost every new API in Windows since Vista only has a UNICODE prototype with no ANSI wrapper. If Clarion 12 delivers on UNICODE this will be possible but will require lots of code changes e.g. to USTRINGs for everything.

jslarve · May 26, 2022, 1:41am

I can only imagine disaster in changing this equate. Plus, a lot of already compiled stuff has it baked in.

Create a new equate and there is peace in the world.

The existing size is already larger than some programs support. Not sure about the current version, but Excel only supported 218 not long ago.

anon23294430 · May 26, 2022, 4:09am

I suppose if it was easy, MS would have made the ANSI APIs wrappers around the wide APIs.

This is sounding like maintaining an old country pile (slang for real estate), where backwards compatibility comes back to bite you, so to speak. I guess search and replace hasn’t reached some quarters and fuzzers are a distant future.

vitesse · May 27, 2022, 12:39am

And possibly a performance hit for everything using USTRING instead of STRING, but I guess time will tell.

anon23294430 · May 28, 2022, 12:58am

Depends on the CPU.
instruction_tables.pdf (agner.org)

I wonder if this Ustring you refer to is this Unicode string?
_UNICODE_STRING (ntdef.h) - Win32 apps | Microsoft Docs

typedef struct _UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

RtlUnicodeStringInitEx function (ntstrsafe.h) - Windows drivers | Microsoft Docs

RtlAnsiStringToUnicodeString function (wdm.h) - Windows drivers | Microsoft Docs

RtlUnicodeStringToAnsiString function (wdm.h) - Windows drivers | Microsoft Docs

RtlFreeUnicodeString function (wdm.h) - Windows drivers | Microsoft Docs

jslarve · May 28, 2022, 1:22am

USTRING is a Clarion data type that was supposed to appear in C11, but it hasn’t appeared yet. Maybe C12.

That structure doesn’t look like something we’d want, as the MaximumLength is only 16 bits. A Clarion &STRING variable consists of 2 LONGs. One for ADDRESS() and one for SIZE().

anon23294430 · May 28, 2022, 2:20am

I thought Windows only went up to UTF-16?

C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs | Microsoft Docs

UTF-8, as its name suggests, uses 8-bit code units.

UTF-16 uses 16-bit code units.

I know UTF-32 exists (UTF-32 - Wikipedia), but I don’t think Windows supports it.

C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs | Microsoft Docs

There is actually another Unicode encoding, which is less well-known and less used in practice than its siblings: UTF-32. As its name clearly suggests, it’s based on 32-bit code units. So a GCC/Linux 32-bit wchar_t is a good candidate for the UTF-32 encoding on the Linux platform.

This ambiguity on the size of wchar_t determines a consequent lack of portability of C++ code based on it (including the std::wstring class itself). On the other hand, std::string, which is char-based, is portable. However, from a practical perspective, it’s worth noting that the use of wstring to store UTF-16 encoded text is just fine in Windows-specific C++ code.

It’s worth noting that the C++ standard doesn’t specify the size of the wchar_t type, so while it amounts to 16 bits with the Visual C++ compiler, other C++ compilers are free to use different sizes. And, in fact, the size of wchar_t defined by the GNU GCC C++ compiler on Linux is 32 bits. Because the wchar_t type has different sizes on different compilers and platforms, the std::wstring class, which is based on that type, is non-portable. In other words, wstring can be used to store Unicode text encoded in UTF-16 on Windows with the Visual C++ compiler (where the size of wchar_t is 16 bits), but not on Linux with the GCC C++ compiler, which defines a different-sized 32-bit wchar_t type.

Here the MS wchar_t is defined as a 16-bit value:
char, wchar_t, char8_t, char16_t, char32_t | Microsoft Docs

The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it represents a 16-bit wide character used to store Unicode encoded as UTF-16LE, the native character type on Windows operating systems.

So all in all, I don’t think we have much of a choice over whether we want that structure or not.
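As a rough illustration of the code-unit sizes quoted above, this Python sketch (Python for illustration only; `code_units` is a hypothetical helper) counts how many code units the same text needs in each encoding. It shows why a single 16-bit wchar_t is not always a whole character: anything outside the Basic Multilingual Plane takes two UTF-16 code units.

```python
def code_units(text: str) -> dict:
    """Count the code units needed for text in each UTF encoding."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }
```

`code_units("A")`, `code_units("Ω")` and `code_units("\U0001F600")` give one, one and two UTF-16 units respectively, while UTF-32 always needs exactly one unit per code point.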

jslarve · May 28, 2022, 3:43am

MaximumLength is a USHORT, which is the 16 bits to which I was referring, i.e. 65535 bytes. I thought there was a larger rendition.
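The arithmetic behind that 16-bit field can be sketched as follows (Python for illustration; `unicode_string_lengths` is a hypothetical helper that only mirrors the Length and MaximumLength fields, both counted in bytes, and is not an actual Windows routine). At 2 bytes per UTF-16 code unit, a USHORT caps the buffer at 32,767 code units, which appears to line up with the 32,767-character figure quoted for the W functions.

```python
USHORT_MAX = 0xFFFF  # largest value a 16-bit unsigned field can hold
WCHAR_SIZE = 2       # bytes per UTF-16 code unit


def unicode_string_lengths(text: str, include_nul: bool = True):
    """Return (Length, MaximumLength) in bytes for text stored as UTF-16."""
    length = len(text.encode("utf-16-le"))
    # Assumption: MaximumLength covers the text plus an optional NUL terminator.
    maximum = length + (WCHAR_SIZE if include_nul else 0)
    if maximum > USHORT_MAX:
        raise OverflowError("buffer of %d bytes does not fit in a USHORT" % maximum)
    return length, maximum
```

`USHORT_MAX // WCHAR_SIZE` is 32767, so a NUL-terminated string can carry at most 32,766 characters in this layout.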

anon23294430 · May 28, 2022, 9:59am

That’s all I can find. I’ve looked a few times over the years and those APIs are what I keep coming back to.

I’m already using this in C6:

UnicodeToAnsi        PROCEDURE  (pUnicodeAddress,pAnsiAddress) ! Declare Procedure
    Loc:CodePage            = GLO:DBCodePage
    Loc:dwFlags             = GLO:UnicodeFlags
    Loc:lpWideCharStr       = pUnicodeAddress
    Loc:cchWideChar         = IS_lstrlenW(pUnicodeAddress)
    Loc:lpMultiByteStr      = pAnsiAddress
    Loc:cbMultiByte         = IS_lstrlenA(pAnsiAddress) ! Output buffer size in bytes; lstrlenA only gives that if the buffer is pre-filled
    Loc:lpDefaultChar       = 0 !Set to NULL to use system defaults
    Loc:UsedDefaultChar     = 0
    Loc:lpUsedDefaultChar   = address(Loc:UsedDefaultChar)
    Loc:ResultLength        = IS_WideCharToMultiByte(Loc:CodePage,Loc:dwFlags,Loc:lpWideCharStr,Loc:cchWideChar,Loc:lpMultiByteStr,Loc:cbMultiByte,Loc:lpDefaultChar,Loc:lpUsedDefaultChar)
    IF Loc:ResultLength = 0

because the rules for processing Unicode chars are a bit more complex.
The sad history of Unicode printf-style format specifiers in Visual C++ - The Old New Thing (microsoft.com)
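For readers without the rest of that procedure, the general idea of a wide-to-ANSI conversion with a default character can be sketched like this (Python for illustration only; `wide_to_ansi` is a hypothetical helper, and it is not claimed to match WideCharToMultiByte in every detail, for instance in best-fit mappings or surrogate handling). The code page `cp1252` and the `?` default are assumptions chosen for the example.

```python
def wide_to_ansi(wide: bytes, codepage: str = "cp1252", default_char: bytes = b"?"):
    """Convert UTF-16LE bytes to a single-byte code page.

    Returns (ansi_bytes, used_default), where used_default is True if any
    character had no mapping and was replaced by default_char.
    """
    text = wide.decode("utf-16-le")
    out = bytearray()
    used_default = False
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # No mapping in the target code page: substitute and remember it.
            out += default_char
            used_default = True
    return bytes(out), used_default
```

The `used_default` flag plays the same role as the value that `Loc:lpUsedDefaultChar` points at: it tells the caller the conversion was lossy.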

The bottom line is that whilst Windows was the first to use Unicode, in the UCS-2 encoding, the “industry” went another way.
Unicode in Microsoft Windows - Wikipedia

Some of the Ansi api calls also hint at the data coming from a unicode source which backs up the assertions made about using UCS-2.

With the web (i.e. websites) typically using UTF-8, which is also the default encoding for HTML5, and Windows now supporting UTF-16LE, it seems handling the Byte Order Mark properly will take a bit of time to get right.
Bush hid the facts - Wikipedia
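When a Byte Order Mark is actually present, checking it first avoids the kind of guessing behind the bug linked above. A minimal sketch (Python for illustration; `sniff_bom` is a hypothetical helper, and text without a BOM still needs some other way to determine its encoding):

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, payload_without_bom), or (None, data) if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data
```

The ordering matters: testing for the UTF-16 LE signature before the UTF-32 LE one would misidentify every UTF-32 LE file.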

One other thing: how do you create keys or sort orders if different Unicode encodings are stored in the DB, and what’s everyone else doing?
Supporting Multilingual Databases with Unicode (oracle.com)

Considering the three Unicode encodings, UTF-8, UTF-16 and UTF-32, each UTF-n variation is merely a mathematical transformatio
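One concrete reason the key question matters: a plain binary sort gives different orders depending on the encoding. In this sketch (Python for illustration; the helper names are hypothetical), UTF-8 bytes sort in code point order, but UTF-16 code units do not once a character needs a surrogate pair.

```python
def utf8_key(text: str) -> bytes:
    """Binary sort key as a UTF-8 column would compare it."""
    return text.encode("utf-8")


def utf16_key(text: str) -> bytes:
    """Binary sort key by UTF-16 code units.

    Big-endian bytes compare the same way as the 16-bit units themselves.
    """
    return text.encode("utf-16-be")


# U+FF5E (fullwidth tilde) and U+1F600 (an emoji outside the BMP)
samples = ["\U0001F600", "\uFF5E"]
by_utf8 = sorted(samples, key=utf8_key)    # code point order
by_utf16 = sorted(samples, key=utf16_key)  # surrogate D83D sorts before FF5E
```

Here `by_utf8` puts U+FF5E first and `by_utf16` puts U+1F600 first, so an index built on raw bytes is only stable if every row uses the same encoding.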

Maybe the Unicode standards will have settled down enough by the time C12 is out?

vitesse · May 28, 2022, 1:30pm

I was more referring to the upcoming Clarion USTRING datatype which AFAIK is not yet implemented - see ustring in the help:

USTRING (Unicode string)

                  length
label  USTRING (  string constant  )  [,DIM( )] [,OVER( )] [,NAME( )] [,EXTERNAL] [,DLL]
                  picture             [,STATIC] [,THREAD] [,AUTO] [,PRIVATE] [,PROTECTED]

USTRING A Unicode character string.

Format: A fixed number of bytes.

Size: 4MB at design time. Can be extended using NEW at runtime.

length	A numeric constant that defines the maximum number of characters in the string. This must include the terminating null character.
string constant	A string constant containing the initial value of the string. The length of the string is set to the length of the string constant plus the terminating character. To define a Unicode string literal you must use the U specifier to tell the compiler that the static string is Unicode text (see details below).
picture	The picture token used to format the values assigned to the string. The length of the string is the number of bytes needed to contain the formatted string plus the terminating character. Ustring variables are not initialized unless given a string constant.
DIM	Dimension the variable as an array.
OVER	Share a memory location with another variable.
NAME	Specify an alternate, “external” name for the field.
EXTERNAL	Specify the variable is defined, and its memory is allocated, in an external library. Not valid within FILE, QUEUE, or GROUP declarations.
DLL	Specify the variable is defined in a .DLL. This is required in addition to the EXTERNAL attribute.
STATIC	Specify the variable’s memory is permanently allocated.
THREAD	Specify memory for the variable is allocated once for each execution thread. Also implicitly adds the STATIC attribute on Procedure Local data.
AUTO	Specify the variable has no initial value.
PRIVATE	Specify the variable is not visible outside the module containing the CLASS methods. Valid only in a CLASS.
PROTECTED	Specify the variable is not visible outside base CLASS and derived CLASS methods. Valid only in a CLASS.

USTRING declares a Unicode character string terminated by a null character (ASCII zero). The length parameter declares the number of characters (minus 1 for the null character) that the USTRING can contain. The memory allocated is double the declared size (2 bytes per character). The memory assigned to the USTRING is initialized to a zero length string unless the AUTO attribute is present.

A USTRING can contain both Unicode and ANSI characters. To define a Unicode string literal you must use the U specifier to tell the compiler that the static string is Unicode text. The specifier can be upper U or lower case u and must be placed immediately before the apostrophe. For example:

UST USTRING(U'Ω') ! define a Unicode string literal

or

UST USTRING(u'Ω') ! define a Unicode string literal

You can address multiple characters within a USTRING using the “string slicing” technique. This technique performs similar action to the SUB function, but does no bounds checking so care must be used. String slicing is not allowed on the left side of assignments.

For example:

USTR[1] = 'A' <----- Error, results in an invalid string

anon23294430 · May 28, 2022, 4:13pm

I forgot about that one; I’ve been doing my own Unicode handling because I couldn’t wait.

dcash · May 31, 2022, 10:50pm

I suggest to just do things the Clarion way, and assume that everyone will only ever use 8.3 character filenames. Problem solved.