Renovating a CGI app talk


Recently I was invited to give a tech talk at the Southampton Perl Mongers group online event. It was great to spend some time with local and not so local (Hello to our new friends in Texas!) Perl users.

The talk was an abbreviated version of the one I had planned to give at the German Perl Workshop 2021 but was unable to. In this post I want to share the content, plus some bonus material that probably should have been included in the first place.

Here is a recording of the talk, which I made after the event:

Overview

The talk covered some experiences I've had renovating an old Perl CGI app that I wrote in the late 1990s and early 2000s, and which had been parked gathering dust until Christmas 2020.

The three steps I described were:

  • Get it working
  • Tidy up
  • Modernise

Followed by some learnings and some advice.

Get it working

This seems obvious, but it made a big difference for me. Mixing "making it work" with making improvements is, I think, a mistake. It's tempting, but just getting it to work will more than likely force changes in the code; couple those changes with tidying up and you are more likely to break things.

One of the big challenges I found was (re)learning how the code works and what ideas and approaches were in use at the time. Just "getting it working" provides an opportunity to learn the general shape of the code and what it depends on to work, be that old CPAN modules, environment variables, files on disk or databases (and specific database versions).

You will have to change code even if you are just trying to get a working local version.

Getting it working TWICE is super valuable too: both a local dev setup and a fresh server somewhere. Doing it twice helps find things that are not obvious. For me, running the app on both a local Arch Linux machine and an Ubuntu server surfaced differences I would otherwise have missed.

In the talk, and afterwards in the discussion, we covered the idea of writing down what you learn as you go along. This is a skill and habit well worth developing if you are working on legacy code... or new code, in fact. I kept notes on blog articles I read and followed. I scribbled notes on how to start/stop things. I absolutely had moments where I did something on Friday and, come Monday, had forgotten what I had done, why, or how I came by the information that influenced what I tried. WRITE IT DOWN... it's worth it!

Plack::App::CGIBin

Specifically for me, I was working with an old set of .cgi files. They were previously running on a server under Apache. Locally, I did not have, nor want, that overhead. So I needed a tool called Plack::App::CGIBin, which lets you simply point it at a directory of .cgi files and serve them as a Plack app via plackup. This was essential for my local development setup so I could see the working app once again.
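For anyone who hasn't used it, a minimal app.psgi for this looks something like the sketch below (the cgi-bin path is an assumption about the layout, not my app's real structure):

```perl
# app.psgi -- run with: plackup app.psgi
use strict;
use warnings;
use Plack::App::CGIBin;
use Plack::Builder;

builder {
    # Serve every .cgi file under ./cgi-bin at /cgi-bin/..., much as
    # Apache did on the original server.
    mount '/cgi-bin' => Plack::App::CGIBin->new( root => './cgi-bin' )->to_app;
};
```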

At this stage the home page loaded nicely, but not a lot else did, as the CPAN modules the app needed were not on my system.

Carton and cpanfile

Managing the app's dependencies was important (especially with an older application, where breaking changes can often appear). My choice here was to use the tool carton, which reads a cpanfile and installs the CPAN modules locally (into a directory called local in the working directory). You can then run against these specific CPAN modules using carton exec.
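As a rough sketch (the modules and versions below are illustrative, not the app's actual dependency list), a cpanfile is just a list of requires lines:

```perl
# cpanfile
requires 'Plack',              '1.0047';
requires 'Plack::App::CGIBin';
requires 'CGI',                '4.51';
requires 'DBI',                '1.643';

on 'develop' => sub {
    requires 'Perl::Tidy';
    requires 'Perl::Critic';
};

# Then:
#   carton install                  # installs into ./local
#   carton exec plackup app.psgi    # runs against exactly those modules
```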

When I was trying to get the application up and running, I had some issues, and it was really helpful to have two copies of the source code in different directories and be able to use carton to run different versions of the same CPAN modules. This helped me identify breaking changes. Not having to rely on (or mess with) system-wide CPAN modules was, and is, really valuable.

Hard coding and deleting

As I got to understand the code better while getting it working, it became clear that some things were not worth retaining, so the delete key was a really effective tool for getting the application working. The other trick I used was to simplify the problem by hard coding some variables that were originally designed to be more flexible but generated complexity.
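A contrived before-and-after of the kind of hard coding I mean (the variable and values are made up for illustration):

```perl
use strict;
use warnings;

# Before: flexible, configurable... and a source of complexity I did not
# need while just trying to get the app running again.
# my $theme = $cgi->param('theme') // $config->{default_theme} // 'classic';

# After: hard coded to the only value actually in use.
my $theme = 'classic';

print "Using theme: $theme\n";
```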

Deleting code and hard coding things helped get the app to the state where it "worked" again. Not 100% of the functionality was restored, and that in itself was a great learning experience: it's easy to assume that all the features the legacy code had in place mattered and/or worked.

Tidy up

Once the application was working, the next phase was to tidy the code in advance of planned modernisation.

I feel it's valuable to separate the tidy up from the modernisation, even though I find (and found) that the act of tidying up involves some modernisation anyway. What I mean here is that I knew at this stage that I wanted to replace the data handling part of the code, but decided to tidy up first, knowing it would involve some changes to the code AND some collateral modernisation. Trying to modernise while cleaning up would, I think, have made the task more difficult.

Perltidy

I suspect most Perl developers have used Perltidy and are familiar with the way it formats source code in a consistent manner. When picking up a legacy code base it's really helpful to find a moment to perltidy everything. This will make the existing code look more familiar (especially for those of us who use Perltidy a lot and are used to its style choices). I tend not to customise the perltidy settings much, leaving it pretty much on the defaults.
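A small helper script of the sort I mean is sketched below; it assumes a typical lib/ and cgi-bin/ layout and uses perltidy's -b flag to edit files in place (keeping .bak copies):

```perl
#!/usr/bin/env perl
# tidy-all.pl -- tidy every Perl file in the tree in place.
use strict;
use warnings;
use File::Find qw(find);
use Perl::Tidy ();

my @files;
find(
    sub { push @files, $File::Find::name if /\.(?:pl|pm|cgi)$/ },
    'lib', 'cgi-bin',
);

# -b: modify each file in place, leaving a .bak copy of the original.
Perl::Tidy::perltidy( argv => [ '-b', @files ] ) if @files;
```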

Perlcritic

Another well-used tool, perlcritic allows you to automate some stylistic decisions that are regarded as "Best Practices", specifically the standards from the book "Perl Best Practices". I am not saying that all the PBP standards are ones I follow, but I appreciate the standardisation it offers. It is a wonderful tool to help shape a legacy codebase. It did help me identify some common things to improve (two-argument file opens that should be three-argument, for example).
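The file open change is a good illustration of what perlcritic flags (the filename below is just for the example):

```perl
use strict;
use warnings;

my $logfile = 'app.log';    # illustrative filename

# Old style: two-argument open with a bareword filehandle, flagged by
# policies such as InputOutput::ProhibitTwoArgOpen.
open( LOG, ">>$logfile" ) or die "Cannot open $logfile: $!";
close LOG;

# Tidied style: three arguments, explicit mode, lexical filehandle.
open( my $log_fh, '>>', $logfile ) or die "Cannot open $logfile: $!";
close $log_fh;
```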

WebService::SQLFormat

SQL is a language in itself, so, like the Perl, I think it's valuable to have some consistent formatting of the SQL in the app. I didn't use an ORM when the app was first written, and broadly I find the benefits of writing SQL outweigh the benefits of an ORM... your mileage may vary.

I started out using a website and copying and pasting SQL back and forth. Later I identified the WebService::SQLFormat module and was able to write a small script that formatted my SQL in a more automated fashion.
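The script was along these lines; I'm writing the constructor arguments and method name from memory, so treat them as assumptions and check the module's documentation:

```perl
#!/usr/bin/env perl
# sqlformat.pl -- read SQL on STDIN, print it formatted.
use strict;
use warnings;
use WebService::SQLFormat;

my $formatter = WebService::SQLFormat->new(
    reindent     => 1,
    keyword_case => 'upper',
);

my $sql = do { local $/; <STDIN> };
print $formatter->format_sql($sql);
```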

As with Perltidy, I don't necessarily agree with or like all the formatting choices it makes, but I value the consistency and ease of automation more than my aesthetic preferences.

Human Eye

A simple trick I applied to this code was zooming my editor out so that the code is tiny and I can see the "shape" of it.

It's a remarkably effective way of "seeing" code smells. The easiest to describe is complexity added by layers of loops or if statements: you can easily see multiple levels of indentation and tell that something there is complex. This is helped by having previously run Perltidy, so do that first.

You also see large/long subs and are able to see dense code.

Zooming in on the areas that look wrong from "30,000 feet", you can make quick gains by tackling them, then zooming back out to find what else looks problematic.

I did use some other tools to help with this, like trying to measure cyclomatic complexity or file size. Frankly, though they were helpful, the human eye and mind are exceptionally good at seeing patterns, and I got more benefit from this trick than from the tools.

Modernise

Having tidied up the code, I was in a position to modernise, starting with introducing Dancer2 as my web framework (Mojo or Catalyst might have been your choice... I went with Dancer2 as it's a tool I know well).

Having created the basics, the next stage was a cut-and-paste exercise of moving code out of the CGIs and into routes. I was fortunate that I had used HTML templates originally, so I was saved the pain of breaking the HTML out of the code... many of us have been there and it's not fun.
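The shape of that exercise, roughly: the body of a .cgi script becomes the body of a route handler. The path, template and helper below are made-up examples, not the app's real code:

```perl
package MyApp;
use Dancer2;

# What used to be list_items.cgi, pasted into a route.
get '/items' => sub {
    my @items = _load_items();
    template 'items/list' => { items => \@items };
};

sub _load_items {
    # Placeholder for the original CGI's data-fetching logic.
    return ( { name => 'example' } );
}

1;
```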

Database change

The original code used a module called DBD::AnyData, which was handy at the time (it's deprecated now). I used it to read and write CSV files using SQL statements. Yes, really. It was a mad decision to use CSV files as the data store for the app, but in terms of modernisation it was fortunate, as it meant I hadn't written code to read/write data to files directly. I had written SQL inserts and selects, which meant it was comparatively easy to migrate the app to Postgres.

I did need to write the schema creation etc.
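The shape of the change was roughly this; only the connection (and the schema) had to move, because the queries were already plain SQL. The DSNs, credentials and table below are illustrative, not the app's real values:

```perl
use strict;
use warnings;
use DBI;

# Then: DBD::AnyData, with CSV files pretending to be tables.
# my $dbh = DBI->connect( 'dbi:AnyData:', undef, undef, { RaiseError => 1 } );
# $dbh->func( 'items', 'CSV', 'data/items.csv', 'ad_catalog' );

# Now: PostgreSQL via DBD::Pg.
my $dbh = DBI->connect(
    'dbi:Pg:dbname=myapp;host=localhost',
    'myapp_user',
    'secret',
    { RaiseError => 1, AutoCommit => 1 },
);

# The existing SQL carried over largely unchanged.
my $rows = $dbh->selectall_arrayref(
    'SELECT id, name FROM items WHERE active = ?',
    { Slice => {} },
    1,
);
```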

Database Migrations

The schema was "OK" but not perfect, and as I tidied more and had to make more changes I became annoyed with destroying and recreating the database each time. I explored using migration tools like sqitch, but ended up quickly writing a migration tool in the admin area of the app that applies SQL statements in numerical order from a directory of .sql files, with a migration level stored in the database to prevent re-applying the same changes. Not sophisticated... but it works and was the simplest solution at the time.
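A minimal sketch of that home-grown runner is below; the table name, directory layout and naive statement splitting are assumptions, not the app's actual code:

```perl
use strict;
use warnings;
use DBI;

sub run_migrations {
    my ( $dbh, $dir ) = @_;

    # Current level from a one-row bookkeeping table.
    my ($level) = $dbh->selectrow_array('SELECT level FROM migration_level');
    $level //= 0;

    # Files named like 001-create-items.sql, 002-add-index.sql, ...
    for my $file ( sort glob("$dir/*.sql") ) {
        my ($number) = $file =~ m{(\d+)[^/]*\.sql$}
            or next;
        next if $number <= $level;

        open my $fh, '<', $file or die "Cannot read $file: $!";
        my $sql = do { local $/; <$fh> };
        close $fh;

        # Naive split on ";" at end of line -- fine for simple DDL files.
        $dbh->do($_) for grep { /\S/ } split /;\s*\n/, $sql;

        $dbh->do( 'UPDATE migration_level SET level = ?', undef, $number );
        $level = $number;
    }
}
```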

Docker

Initially I had a local installation of Postgres, but I work from multiple machines and quickly moved to using a dockerised installation of Postgres to simplify my development cycles.

Following this I added a Perl container to run the app itself.
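For anyone curious, the Postgres side is little more than a docker run along these lines (the container name, password, database and version are illustrative, not my real settings):

```
docker run -d \
  --name myapp-postgres \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_DB=myapp \
  -p 5432:5432 \
  postgres:13
```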

Learnings

In preparing and giving the talk I was struck by how much of what mattered to me was not the "technical" parts but the "human" parts.

  • Write it down: This ended up being more important than pretty much anything else. Having notes on what I was doing and why proved important when I took breaks from the code and came back to it. When I did this poorly I would come back and not recall what specific things I was trying to do and why. This was really prevalent in the Postgres changes: I am not a Docker guru and followed several guides; at least once I came back the next day and got lost because I had not taken notes on which article I had read and what it taught me.

  • Automate it: Perltidy, Perlcritic, etc. The more I was able to automate, the easier it became to use the tools and to remain consistent. This is also true of deploying the code. A "pipeline", be it GitHub Actions or a bash script, makes life so much easier, and the easier the process the better. My mind could stay on the code and the problems, not on how I got this deployed, etc. So take those moments to automate things.

  • Do the simple things first: It's tempting to get stuck in and tackle the hard problems at the start. I think this is a mistake. By starting with small, simple things you gain familiarity with the code and the "domain". You discover the hidden complexity before you get deep into hard problems. That way, when you get to the hard problems, your familiarity is high and you've discovered many of the complexities already. So do those simple things; it's why we often give the new member of a development team the simple "fix the typo in the template" ticket, right? Simple tasks mean you run through all the steps sooner and resolve the things that are not obvious.

  • Legacy code is a great place: There is enjoyment to be had in an older code base. Unlike writing from new, a legacy code base is nuanced. You often have multiple ideas spanning the code base. As a developer touching legacy code you have the opportunity to discover how it was built, how it was changed, and why. This is satisfying work; learning how to code in the "old" style can be fun. Moving code from old to new can also be satisfying. Finding the multiple styles of a code base and bringing them into alignment is a skill in itself, and one that deserves more highlighting.

Legacy code has had many decisions made already, so the "paralysis by analysis" problem is often avoided: you are stepping into a situation where decisions have already been made. Another skill is understanding the constraints and mindsets that led to the code looking as it does. The old adage that the people who wrote it were doing the best they could given the situation is valuable to keep in mind.

Legacy code is important: it exists and it did something. New code is a gamble in a way; it's not proven, it's not been used. So legacy code, and maintaining it, is a vital part of our industry, and we need more people to value it.

  • Perl's Legacy: Working in multiple languages, it's interesting to see the influence Perl has had on newer languages.

For example, Go has great testing and formatting tools out of the box. JavaScript's npm is CPAN. PHP is getting better and has better tooling than it once did. Raku, oddly given its lineage, does not have great tooling; there is no Rakucritic or Rakutidy. Go in many ways shows the most influence of legacy Perl: it is arguably built with legacy in mind, intentionally constrained, and comes with tools to help the code age well.

Summary

The recording of the talk is embedded above; thanks for reading along.