Libcurl url spaces

carlos.pizzi · February 19, 2024, 11:23pm

Hello, how are you all! I hope you are all well!

I am working with libcurl and I ran into a problem with spaces in the URL.

I solved it with the following code:

        LOOP C#=1 TO LEN(CLIP(LEFT(pURL)))
            IF pURL[C#]=' '
                l:url=CLIP(LEFT(l:url))&'%20'
            ELSE
                l:url=CLIP(LEFT(l:url))&pURL[C#]
            END
        END

My question: can it be made more “elegant” or is there a libcurl method that solves it?

Thank you so much.

Bruce · February 20, 2024, 1:06am

What you are doing is an example of URL encoding. Space is one of the characters that needs to be encoded, but there are others too.

It likely won’t shock you that StringTheory includes a UrlEncode (and UrlDecode) method.
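On the libcurl side of the question: libcurl does ship `curl_easy_escape()` (its result is released with `curl_free()`), but it is meant for a single URL component, because it also escapes characters such as `:` and `/`. For readers who want to see what "the others" means in practice, here is a minimal sketch in C rather than Clarion. It is not StringTheory's or libcurl's implementation; `url_encode_component` is a hypothetical name, and the choice to encode everything outside the RFC 3986 "unreserved" set is an assumption that suits query values and path segments, not whole URLs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl or StringTheory.
   Percent-encodes every byte of src that is not an RFC 3986 "unreserved"
   character (A-Z, a-z, 0-9, '-', '.', '_', '~').
   Returns the encoded length (excluding the NUL), or -1 if dst is too small. */
int url_encode_component(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0) return -1;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') ||
                         c == '-' || c == '.' || c == '_' || c == '~';
        size_t need = unreserved ? 1 : 3;

        /* Refuse to truncate: the caller must supply a big enough buffer. */
        if (out + need + 1 > dst_size) { dst[0] = '\0'; return -1; }

        if (unreserved) {
            dst[out++] = (char)c;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }
    dst[out] = '\0';
    return (int)out;
}
```

With a 64-byte buffer, `url_encode_component("a b&c", buf, sizeof buf)` returns 9 and leaves `a%20b%26c` in `buf`, so the ampersand is encoded along with the space.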

jslarve · February 20, 2024, 4:59am

@Mike_Duglas has an addon called “printf” (GitHub: mikeduglas/printf, “Convenient string formatting”).
From the docs:

%u - an url encoded string (the spaces get encoded to %20)

vitesse · February 20, 2024, 11:03am

Hi Carlos

I see you already have a couple of answers including Bruce mentioning ST (which is what I would typically use) and also Jeff mentioning Mike’s printf.

But a couple of other suggestions if those are not an option or if you just wish to learn how to improve your code.

Your code is buggy: get rid of the left() calls. If you are not sure why, consider a string ‘__________a_b_c’, where I have used the underscore character to show a space. LEN(CLIP(LEFT(pURL))) is 5 for that string, so your loop will only process the first five characters of pURL, which are all spaces.
If you are interested in speed, use subscripting rather than calling clip() within the loop. You will need two subscripts, one for pURL and the other for l:url. This code can be a little tricky (the replacement for a space is three characters, not one), which is why it is often better to use a pre-written string library that you hope has already been debugged. But you will learn more by doing it yourself. Also, as your string is fixed length and will not automatically expand, characters could get truncated at the end. Some people get around this by defining a string much larger than they think they will ever need, to have some “headroom”, but it is not ideal.
It is better not to use implicit variables: define your LONG subscripts rather than using C# etc.
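To make the two-subscript idea concrete, here is a rough sketch in C rather than Clarion (so it is not a drop-in replacement for the code above). `encode_spaces` is a hypothetical name; the source length is passed in explicitly, much as you would pass LEN(CLIP(pURL)), and the function reports an error instead of silently truncating when the destination is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper illustrating the two-index approach.
   "in" walks the source and "out" walks the destination; a space moves
   "out" forward by three positions, any other byte by one.
   Returns the encoded length, or -1 if dst cannot hold the result. */
int encode_spaces(const char *src, size_t src_len, char *dst, size_t dst_size)
{
    size_t in;
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (in = 0; in < src_len; in++) {
        size_t need = (src[in] == ' ') ? 3 : 1;

        /* Keep one byte free for the terminating NUL. */
        if (out + need >= dst_size) return -1;

        if (src[in] == ' ') {
            memcpy(dst + out, "%20", 3);
        } else {
            dst[out] = src[in];
        }
        out += need;
    }

    if (out >= dst_size) return -1;
    dst[out] = '\0';
    return (int)out;
}
```

Because nothing is left-justified, leading spaces are encoded like any others: the 5-byte input of two spaces followed by `a b` becomes the 11-byte `%20%20a%20b`. Encoding `a b c` into an 8-byte buffer returns -1, since the result needs 9 bytes plus the terminator.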

hope all that helps

Bruce · February 22, 2024, 9:47am

I would caution against this. Subscripting on text strings is a habit that will be coming back to bite us over the next decade. At this stage subscripting a text string is like storing a 2 digit year number, in 1999. It’s been “fine up to now” but in the short term it’s going to start being a problem.

The root of the issue is that subscripting is based on the premise that 1 byte in the string = 1 character in the string. This premise is only valid for some text encodings, and as we start using more mainstream encodings that premise no longer holds, and all code relying on it will have to be fixed.
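A small illustration of that premise failing, sketched in C and assuming the text is valid UTF-8 (one of the mainstream encodings in question). `utf8_code_points` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a NUL-terminated string, assuming valid UTF-8.
   Every byte that is not a continuation byte (bit pattern 10xxxxxx)
   starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For the UTF-8 bytes of "café" (`"caf\xC3\xA9"`), `strlen` reports 5 bytes while `utf8_code_points` reports 4, so a byte subscript that lands on the fifth byte points into the middle of the "é".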

If you need to optimise for speed (hint: in this case you don’t) use a string management class which is optimised for speed. That way the class has to handle multiple encodings, but you do not.

Bruce

vitesse · February 22, 2024, 11:59pm

Interesting comment Bruce.

Yes, Unicode is coming, and has been for many years. If you continue to use ANSI chars with STRING then presumably nothing would change and you would not have a sudden “Y2K moment”. Your code would run unchanged.

If you changed your variable to be USTRING (“coming soon”) then string slicing can still be used on the right (but not on the left) of an assignment. (See help for details). So in the case of using it on the left of a USTRING assignment you would get a compile error so it would not sneak in unnoticed.

From my reading of the help, string slicing on USTRINGs will be character based (not byte count based). So the comment “subscripting is based on the premise that 1 byte in the string = 1 character in the string” does not appear to be correct.

The USTRINGs look to be similar to CSTRINGs in that they are zero terminated and len() returns the used length of the contents - unlike STRINGs where len() returns the allocated length the same as size(). Size() returns the allocated size which in the case of USTRINGs is double, based on two bytes per character.

So if you want to store up to 10 chars you need:

10 byte STRING
11 byte CSTRING
22 byte USTRING

Of course the implementation could change between now and USTRINGs making it into the language (presumably in C12, also coming soon), but that’s my understanding based on the currently available information.

Many people will not need Unicode (and its performance and memory overheads), but it will be a blessing and a relief for those who need it and have been waiting (patiently or otherwise) for it to arrive for some time.

jslarve · February 23, 2024, 12:02am

22 bytes on the USTRING might not cut it, depending on the content.

vitesse · February 23, 2024, 12:06am

Yes, agreed, but that is what the help is currently saying, hence my comment that things might change between now and it seeing the light of day out in the wild.

I guess we will see… soon

Bruce · February 23, 2024, 4:26am

This is a trap common to most programmers moving from ANSI to Unicode. It’s -really- hard to let go of things you know to be true.

All Unicode encodings are variable length. Let me repeat that, all Unicode encodings are variable length. So no, you cannot calculate the length of a string (number of chars) from the size (number of bytes). No, you cannot say “I have allocated n bytes, which means x characters”.
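To put numbers on that, here is a sketch in C that assumes a two-byte-per-unit UTF-16 layout with a 16-bit terminator. Whether USTRING actually works this way is an assumption on my part, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16 code units needed for one Unicode code point:
   1 inside the Basic Multilingual Plane, 2 (a surrogate pair) above U+FFFF. */
size_t utf16_units(uint32_t code_point)
{
    assert(code_point <= 0x10FFFF);
    return code_point <= 0xFFFF ? 1 : 2;
}

/* Bytes needed to store n code points as UTF-16 plus a 16-bit terminator. */
size_t utf16_bytes(const uint32_t *code_points, size_t n)
{
    size_t units = 1;  /* the terminator */
    size_t i;

    for (i = 0; i < n; i++) {
        units += utf16_units(code_points[i]);
    }
    return units * 2;
}
```

Ten copies of U+0041 ("A") need 22 bytes under these assumptions, but ten copies of U+1F600 (an emoji outside the Basic Multilingual Plane) need 42, so the same allocation holds a different number of code points depending on the content.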

And string slicing should not be done in code. Yes, libraries will use it in places, but there will be a lot of boiler plate to make that work.

Ultimately it’s likely all the functions you mention (LEN etc) will operate on Code Points and not characters.

Yes, no existing code will be broken. So it’s not Y2K in that sense. But our mindsets need a change, and weaning off the string-slicing habit is a foundation of that.

Incidentally, for the original code above, performance could be improved by simply using a CSTRING or PSTRING instead of a STRING.

vitesse · February 23, 2024, 4:46am

Yes, I am in agreement, but it seems the help clearly says that if you say ustring(20) it will allocate 42 bytes.

I guess we will not know if that is correct until it is released. The two options as I see it are:

a) the allocation is variable/dynamic depending on what characters or code points are to be represented

or

b) the allocation is fixed and the number of chars will be variable.

I don’t see that this follows providing (as the help suggests) the slicing is NOT based on bytes.

I don’t really want to argue with you in public Bruce but that is a gross oversimplification…

It depends on the size of the fields and how “full” they are as to whether appending to clip(myString) [which scans backwards from the right] or myCstring [which scans forwards from the left] is faster. What is definitely faster is to remove the scanning (whether backwards or forwards) altogether!

Anyway, enough for now.

Bruce · February 23, 2024, 11:49am

That’s fine, but neither of those numbers is the “number of characters”. The size and the number of characters are two distinct values that have no relationship to each other.

This is indeed the case, a USTRING is declared with fixed size. You -cannot- infer any “number of characters” from that. Period.

Character-level manipulation of Unicode strings is a complex topic. As a first approximation, people using slicing will get it wrong, because they are slicing code points, not characters. It will appear to work for some subset of characters, but be broken in the general case.