Finding a Faster Way, and not Doing Boring Work


One of my first really big projects was truly Epic in scope … convert all of Cohort's direct Chronicles global references to API calls, everywhere.

I definitely don’t remember how many lines of code there were across the Epic Cohort code base back then, but it wasn’t small. When I worked on this, EpicCare Ambulatory / Legacy hadn’t been released yet and every other application was fully text/terminal based. It wasn’t a two-week project. Worse, it really didn’t have a clear end-date, because no one knew how much work it would be, other than substantial. I’d proven I wasn’t terrible at my job and could be trusted to change a LOT of code.

MUMPS, the language Epic used at the time (and still uses, just under a different name), organizes source code into files called routines. Routines in the early-to-mid 1990s were size limited; I don't remember whether the cap was 2K or 4K of source code, but it really wasn't much. One of the astonishingly wild things about MUMPS the programming language is that it allows keyword abbreviations, and back when routine sizes were capped that low, the abbreviations were used exclusively.
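
To give a sense of what that looks like, here's a trivial made-up line written with full keywords and then abbreviated the way space-constrained code actually was; the interpreter treats the two identically:

MUMPS
        ; full keywords
        SET TOTAL=0 FOR I=1:1:10 SET TOTAL=TOTAL+I WRITE !,TOTAL
        ; the same line, abbreviated
        S TOTAL=0 F I=1:1:10 S TOTAL=TOTAL+I W !,TOTAL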

Until you've seen old MUMPS code, you can't really appreciate how unusual it was to read and understand. This is just a goofy little sample that doesn't do anything important other than exercise a number of MUMPS features:

MUMPS
MYEXAM  ; goofy sample: walk ^G("LAB",LID,DATE) and copy entries into ^G1
        N c,i,d,t S i="",%2=""
        W !,"I<3MUMPS: "
        S c=^G("LAB",LID,DATE),%2="",t=$$tm()
        F  D  Q:i=""
        . S i=$O(^G("LAB",LID,DATE,i))
        . Q:i="" 
        . S %1=^G("LAB",LID,DATE,i)
        . S:i>100 %2="!"
        . S ^G1("S",$I,t,c-i)=%1_%2        
        . W:i#5=0 "." W:i#100=0 !,i
        Q
tm()    Q $P($H,",",2)

Post-conditionals (S:i>100 %2="!") make for some fun code to read: perform the operation if the condition is true.

In addition to the limited file/routine sizes, MUMPS also was not fast, which meant code had to take a number of liberties if performance mattered. For example: not calling functions with arguments at all. The stack wasn't particularly efficient, so code would routinely document its requirements and simply expect the variable values to be available. Any variable set in the current process was visible without being passed; calling a function didn't prevent the called code from reading variables that had been declared or set elsewhere.

Aside: When a user connected to an Epic system, their terminal was attached to a new captive session, which was a new MUMPS process (also known as a $JOB). Epic code was called immediately, and that process stayed dedicated to that user until they exited the Epic application. As the MUMPS code executed, all variables declared on all prior execution stack levels remained available. A new stack level could redeclare a variable, and that new variable's value would be visible to further code until the stack level was popped. It's ingeniously simple and dastardly at the same time. So if a variable X was declared at stack level one, set to 1, and a function was called, the function could read the value of X without it being passed! If the function declared its own X (via NEW) and set it to a new value (or not), further code would reference the newly declared X rather than the X lower on the stack. As soon as that stack level was exited, the prior X was again in scope along with its value.
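
Here's a minimal sketch of that behavior (DEMO and SUB are made-up names, not Epic code):

MUMPS
DEMO    ; implicit scoping: called code sees the caller's variables
        S X=1
        D SUB
        W !,X        ; writes 1; SUB's NEWed X was popped when SUB quit
        Q
SUB     W !,X        ; writes 1; X comes straight from the caller's scope
        N X S X=2
        W !,X        ; writes 2; this X shadows the caller's X
        Q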

If you're thinking that must have been a super fun way to code, you'd absolutely be right! It was not uncommon at the time for MUMPS code to rely on assumed scratch variables that you didn't need to declare (Epic's convention was that %0 through %9 were scratch), but you used them at your own risk, as a function call or goto anywhere else might also use the same scratch variables.
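
A contrived sketch of the risk (CALLER and HELPER are made up): the caller stashes something in %1, and the callee quietly reuses it.

MUMPS
CALLER  ; %1 is assumed scratch, so nothing protects it
        S %1="patient-123"
        D HELPER
        W !,%1            ; now writes 42, not "patient-123"
        Q
HELPER  S %1=42           ; also treats %1 as scratch; no NEW, no warning
        Q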

I won't lie to you. Scratch variables would often make you scratch your head. Repeatedly. Great for performance (their use was validated with performance testing to confirm it benefited the end user), but lousy for developers. Lousy. Additionally, there were a handful of variable values that were well known across Epic code bases, and much of the core Foundations code expected them, so those were generally reasonable to intuit without concern. But occasionally they'd leak and cause unexpected conflicts during development.

Back to the project. Cohort, due to its size and the way public health labs operated, was often run in what were considered multiple "directories." It was one MUMPS system, but Cohort had multiple directories in which it would run. As Cohort systems grew larger, a different way to address not only the directories but also the core MUMPS globals became very important. (I don't remember the sizes of these Cohort systems anymore, but at the time they seemed GIGANTIC; probably on the order of tens of GBs or less, honestly.)

For these Very Large Systems, the MUMPS globals had to be stored in separate disk files, which meant every global reference needed to name the proper global file: ^Global() became ^[loc]Global(). And while the Cohort team could have adapted, I suppose, there was enough configuration and complexity that switching to the Foundations APIs designed to handle this new requirement was a much better long-term strategy.
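
Roughly, the change looked like this (the global and variable names here are made up, and the Foundations API itself isn't shown):

MUMPS
        ; before: implicit, always hits the current directory's global file
        S ^ORD("LAB",LID,DATE)=VAL
        ; after: extended reference naming which directory/global file to use
        S ^[LOC]ORD("LAB",LID,DATE)=VAL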

Every global reference needed to be reviewed and modified. Honestly, I wanted none of that. It was a mind-numbingly fraught-with-error-potential project that was as horrible as it sounds. Touch EVERYTHING. Test EVERYTHING.

As you can guess, MUMPS wasn't designed around modern principles of effective coding. There were no pure functions (and as I mentioned, often no functions at all). And there were these horrible, horrible things called naked global references:

MUMPS
S ^GLOB("DATA",456,"ABC")=1
S ^("DEF")=2

That was the equivalent of:

MUMPS
S ^GLOB("DATA",456,"ABC")=1
S ^GLOB("DATA",456,"DEF")=2

MUMPS would take the most recently referenced global name and all but the last subscript (the keys are called subscripts), and build a new global reference from that. The "most recent" reference was tracked per process, and any global reference counted, including ones made in code you called. Using a naked global reference was super risky if you didn't control every line of code between the most recent global reference and the line that used that syntax.
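
A sketch of why (NAKED, LOG, and ^AUDIT are made up for illustration): anything in between that touches a different global silently moves the naked reference along with it.

MUMPS
NAKED   ; the hazard: intervening code changes the naked indicator
        S ^GLOB("DATA",456,"ABC")=1
        D LOG              ; LOG makes its own global reference...
        S ^("DEF")=2       ; ...so this sets ^AUDIT("DEF"), not ^GLOB("DATA",456,"DEF")
        Q
LOG     S ^AUDIT($H)=$J    ; moves the naked indicator to ^AUDIT(
        Q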

Sure, it was a shortcut, but it was so frustratingly difficult to find and understand. Thankfully, it showed up mostly in older Epic MUMPS code, and the practice was actively discouraged in any new code.

I started to look at the Cohort code. The project was complex both from a technical perspective and from a timing/coordination perspective. I couldn't just start on line one and work my way through the code. The other Cohort developers would also be touching the same code as they worked on other projects. We had source control, of a sort: "Here's the source. One copy. Don't mess up something another dev is working on."

After a few days of getting a better understanding of the Cohort source code (HEY TL, I know what you did, and I know she's reading this: it was a great way for me to deeply learn the Cohort code), I came up with a proposal for my TL that was different from what she had originally suggested.

$TEXT is a function built into MUMPS that reads the source code of a routine, one line at a time. My proposal: write a parser that would make the changes to as much code as it could and flag anything that needed to be modified manually. She gave me about 2 weeks to work on the parser, as it seemed to address a lot of the technical and timing concerns. I jumped in. I've worked on a lot of parsers since and find them very intellectually stimulating. Writing a parser in MUMPS was all sorts of ridiculous at the time. It had to live within the same core constraints as all MUMPS code: limited stack, limited memory, slow CPU, no parsing libraries, no objects, simple string manipulation, … it was very basic.
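
In spirit, the outer loop looked something like this (a rough, hypothetical sketch; SCAN and the crude "anything that touches a global" check are mine, not the actual utility):

MUMPS
SCAN(RTN)       ; walk one routine's source using $TEXT and argument indirection
        N LINE,REF,SRC
        F LINE=1:1 S REF="+"_LINE_"^"_RTN,SRC=$T(@REF) Q:SRC=""  D
        . ; a real parser tokenizes SRC; here we just flag lines with a global reference
        . W:SRC["^" !,RTN,"+",LINE,": ",SRC
        Q

Something like D SCAN("MYEXAM") would then dump every line in that routine that touched a global.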

If you step back and look at MUMPS from a high level, at the language and its behavior, you'll see machine code. It has so much of the same essential look and functionality (and limitations). The original design and implementation of MUMPS had it running as its own operating system (yeah, it's THAT OLD). The connection to machine-code patterns makes sense, even though MUMPS is interpreted. There weren't a lot of programming languages to inspire language design back in the 1960s. Yep: 1960s.

My approach was to build a global structure mirroring the code base, but only as much of it as was needed. Things that didn't matter were ignored and not recorded (it was a slow process already, and recording source-code noise wasn't worth the time or disk). Using dozens of test routines containing the most complex Cohort code I'd encountered, my utility improved over the next two weeks to the point where it processed and converted the majority of the code successfully and logged the code that couldn't be safely converted automatically.
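
I don't remember the exact layout anymore, but the mirror structure would have been shaped something like this (the ^SCRATCH name and subscripts are purely illustrative, reusing RTN, LINE, and SRC from the sketch above):

MUMPS
        ; purely illustrative shape; the real structure and names are long forgotten
        S ^SCRATCH("SRC",RTN,LINE)=SRC                 ; the original source line
        S ^SCRATCH("REF","^G",RTN,LINE)=""             ; a global reference found on that line
        S ^SCRATCH("FLAG",RTN,LINE)="naked reference"  ; why the line needs manual review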

I know I ran a little long, and the TL was fine with that: she could see the end was in sight, and the time spent was still less than if I had done the same work manually. I'd guess it took me about 3 weeks with normal interruptions. I specifically recall it was about 70 hours total.

It was sincerely cool to see it churn through the code and make the necessary modifications. The parser was actually pretty sophisticated and had to handle the nuances and complexity of the Cohort code and MUMPS. It recorded routine jumps, functions, gotos, variables, … so much.

At the end of my development, we ran the tool on the Cohort code base successfully. The dozens of logged issues were fixed by hand by the team collectively. It worked. I think there were a few issues, but it had made many, many thousands of changes to the Cohort code base, so no one was surprised that it hit a few bumps. I'd run it on the code base and checked over the results myself, but the sheer number of changes meant it was nearly impossible to spot the issues it inadvertently caused.

The project was a success. My TL appreciated the out-of-the-box thinking about how to get the project done: it didn't waste Epic time or resources, it finished faster than I'd been expected to manage manually, and it caused many fewer interruptions for the other Cohort developers (🤣, there were only 2 others at the time).

The code was eventually included in Cohort's official code base for a variety of reasons; I believe it was called CLPARSE (or CLXPARSE?).

One of the other Cohort developers later took the code, modified it, and extended it into Epic's first linter for MUMPS code. My recollection is fuzzy, but I think the name became HXPARSE, which is likely far more familiar to Epic devs than Cohort is. 😁