Moving beyond Clarion 10 11897 (Cross-Thread Deadlock Silent GPFs)

Back in September 2017 I first put into production an application that worked fine with C10 11897, but with C10 12799 would occasionally (anywhere from once a week to 5 times in a day) give silent GPFs on certain workstations running Windows 7 or Windows 10. Usually I would see a Cross-Thread Deadlock show up in the Windows Event Viewer for the hang. I rolled back to C10 11897 and had no problems. I shelved the issue for the time being, but then in June 2018 decided to track it down. It is something that I could never reproduce at will, but some sites had it more often than others.

I figured it was a global variable issue, so I made sure all of mine were updated with a critical section. Using DebugView and the Clarion debug runtime dll and Capesoft’s GPF reporter I got lots of data with line numbers and such, but no smoking gun. Fortunately, I had a customer site where I could load test versions on two of their computers, run DebugView, and use TeamViewer to record their 12-hour sessions, so I could replay what they were doing when a crash occurred. For all the logs generated, I only got one place in my code that I could fix, but the other crashes were happening when different procedures would exit (i.e. ThisWindow.Kill). Different procedures, different threads referenced for the crashes that would occur at various places in the program.

This is a standard ABC MDI app. I had put in the Faux Max template a while back, and that did help reduce the frequency of the crashes (by preventing windows from being maximized), but it didn’t fix things 100%. In fact, quite by accident, while reviewing a TeamViewer session I saw a workstation that still maximized a browse window, because it had saved that window setting before getting the updated program with FauxMax in it. That led to a crash that I was able to reproduce myself. Eliminating the saved maximize setting in the ini file fixed the crash for that browse, but the silent GPFs still persisted in other parts of the program.

I had seen a discussion in the Clarion chat group last summer about converting an MDI app to SDI, which fixed a lot of weird lockups for the user in a heavily-threaded app. The user wrote that he had a template change which used the second form of Open(Window) to supply the parent (owner) window, to get around possible situations where (say) an update form might appear behind its parent browse. I was concerned it would take a lot of testing for my app, so I saved that one for another day. But 9 months later I had run out of ideas and decided to go for it. I didn’t do a complete switch-over, though: I left the main frame (application) MDI, but changed all child windows to SDI by removing the MDI attribute. My theory was that since Microsoft has written that MDI is deprecated and not thread-safe, if I kept the MDI usage to a single thread then I wouldn’t have any cross-thread issues. Doing it this way I still have the same behavior I did before, without worrying about using Open(Window) with an owner, and no update forms are being covered up by their parent browses.

So anyway, I’m happy to report that during my last 12-hour session, I had no lockups on my two test computers (where before I would have at least 5 between the two of them on any given day) and the users didn’t even notice any changes. Of course now windows can be maximized since they are SDI (FauxMax only works on MDI windows), but since it was only MDI windows that have the maximize issue I’m fine with that. And I removed the Standard Window Top-Level Menu, since it only works with MDI windows.

These are the steps I took to convert my child windows from MDI to SDI. Credit for all these ideas goes to the chat between Peter Petropoulos, Owen Brunker, Mark Goldberg, Dennis Evans, and Bruce Johnson in CW-Talk back on June 19, 2018.

Steps to remove all MDI attributes from child windows (while leaving your Main Frame an Application)

  1. Export app to txa.

  2. Edit txa.

  3. Search and replace:
    ,MDI
    with nothing (case-sensitive).

  4. Search and replace:
    MDI,
    with nothing (case-sensitive). This one takes care of continued lines.

  5. Search and replace:
    %WindowOperationMode DEFAULT ('MDI')
    with
    %WindowOperationMode DEFAULT ('Use WINDOW setting')
    (This takes care of ABC reports that might have a progress window defaulted to MDI.)

  6. Save txa.

  7. Recreate the app from this newly-saved txa (New Solution, Project, or Application, then Application from Txa). I recreate it with the same filename, with the app’s solution open but the app closed, so that it overwrites the existing app file. Just don’t let it overwrite the appname.cwproj file.

  8. Recompile and test.
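
For anyone who would rather script steps 3 to 5 than do them by hand in an editor, here is a minimal Python sketch of the same replacements. It is not part of the original procedure. The function names and file names are made up for illustration, and it assumes the txa is a single-byte text file that uses plain apostrophes around 'MDI'. The replacements are just as blunt as the manual ones (any ",MDI" or "MDI," is removed wherever it appears), so write to a new file and compare it against the original before importing.

```python
import re

# Step 5 pattern. Whitespace is matched loosely in case the txa spaces
# the line differently from how it is shown in the post (an assumption).
_REPORT_MODE = re.compile(r"(%WindowOperationMode\s+DEFAULT\s*)\('MDI'\)")


def strip_mdi(txa_text):
    """Apply steps 3-5 to the text of an exported txa and return the result."""
    out = txa_text.replace(",MDI", "")   # step 3: attribute in the middle or at the end of a line
    out = out.replace("MDI,", "")        # step 4: attribute at the start of a continued line
    # step 5: report progress windows that default to MDI
    out = _REPORT_MODE.sub(r"\g<1>('Use WINDOW setting')", out)
    return out


def convert_file(src, dst):
    """Read src, apply strip_mdi, and write the result to dst (hypothetical helper)."""
    # latin-1 maps every byte to one character, so the file round-trips unchanged
    # apart from the replacements; newline="" keeps the original line endings.
    with open(src, "r", encoding="latin-1", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="latin-1", newline="") as f:
        f.write(strip_mdi(text))
```

Usage would be something like `convert_file("myapp.txa", "myapp_sdi.txa")` (placeholder names), followed by steps 7 and 8 with the new file.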

…jack


Solution for ABC Template Users of Clarion 10 and 11! Read on…

It has been almost a year since my original post. I had resigned myself to never using MDI again until a thread in the SoftVelocity Clarion 11 newsgroup entitled “MDI Hell” caught my eye back on 2020-01-29.

Lots of activity in this thread, and the OP (Original Poster) found several threading issues he fixed in his program, but the “smoking gun” he discovered was he had “Enable Run-time Translation” checked in the Global Properties (General Tab) of his Clarion 11 ABC Main MDI application. Once he unchecked it no more crashes.

I looked in my main app, and sure enough I had “Enable Run-time Translation” checked there as well. I dug up the oldest version of my app I could find (Clarion 6), and it was checked there too. It never made any difference until after C10.11897. And I couldn’t find it checked in any of my other app files.

Though I was hesitant about testing this again since my test users had grown a bit annoyed with me forcing a crashing program on them, I went ahead and gave it a whirl. Unfortunately, it is a lot harder to convert an SDI program back to MDI than the other way around (which I documented in my original post), other than doing it manually. But it only took me an hour or so to add the MDI attribute back to 143 windows.

I’ve had this MDI version (C10.12799) running at two sites for the last couple of weeks, and no crashes!

Full credit for this goes to the OP of “MDI Hell” (Bostjan Laba). What convinced me to try his solution was the PTSS 42978 he posted with an example program. I compiled it and tried it on two physical machines and one virtual machine–finally on the virtual one I was able to reproduce a crash after about 50 repetitions of his “recipe.”

Of course, this solution doesn’t help users who need “Enable Run-time Translation” checked. For those look in the C11 Newsgroup for the topic “Re: New C11 Release, Blog entry and AnyScreen newsgroup”, a post by KC Chin on 2020-02-13. He made some changes to the abutil.clw file for the “TranslatorClass.TranslateControl” procedure which might be a solution.

At least one other user with this same issue in the newsgroup has a Clarion (Legacy) Template app. Since Run-time Translator is only available for ABC Templates, he uses PDTranslator. But maybe there is something similar in PDTranslator that triggers the crashes with Clarion runtimes after C10.11897.

Now that SoftVelocity has PTSS 42978 with a reproducible example, I’m hoping they can identify the issue in the runtime and fix it for a future release.

For me it’s been 2.5 years from the time I first identified the issue until an MDI workaround was given to me, but I’m glad to have that old feeling of rock-solid stability back in my Clarion-produced application.

…jack


A number of threads lately led me to revisit this issue;

And a check there shows that the TranslatorClass indeed has this flaw. Whether it is the root of the problem or not is hard to say, but generally speaking using a CriticalProcedure can be dangerous, especially if the procedure calls other methods in the class.

FWIW, I did report the nesting CriticalProcedures (of which there are a couple) to support.
But in my testing, the removal of those nesting CriticalProcedures from the TranslatorClass (and replacement with “traditional” CS calls) did not fix the problem.
Didn’t investigate ASSERT() calls.

Using Bruce’s article on “Use of ASSERT statements…” as a guide, in Feb 2019 I commented-out any assert statements in the shipping ABC classes that were in procedures using a critical section in both my C10.12799 and C11 (all versions) files.

Though I didn’t comment out EVERY Assert statement, so if there is a nested call referencing one of these procedures from a critical section, then that would be problematic.

…jack

Update: As of build 11.0.13622 the problem ASSERT’s were fixed.

Hi, we are a small company from Pula, Croatia (QIQO d.o.o.), working with Clarion 10.0.0.11975 and the Legacy templates. We have the same (or a similar) problem with our application. From time to time the application runs into a silent GPF, and in the Windows Event log we find a Cross-Thread Deadlock entry. There is no error on screen, and the window is still active. We tried to implement the changes from Clarion 11.0.13622 (as Bruce suggested, the fix for the ASSERT problem), but it didn’t help. What we found is that every time this happens, the program stops on the same command (POPBIND at the end of ProcedureReturn). On the line before that command the program is still alive; the source after POPBIND is never reached. Any suggestion would be helpful. Is there any way to step into or debug the POPBIND function, to see what happens inside?