More ways of starting on legacy code

Tags: legacy_code

This week I wanted to explore more ways of approaching a new code base, be it Perl or any other programming language.

Online tools

If you are using a tool like GitHub, you have some tools at your disposal already. In the earlier posts, the screenshots of code I shared were from GitHub; I just zoomed out in the browser. I find the Contributors info in "Insights" really useful. You can see who wrote much of the code, then look at the commits to get a feel for where the churn is.
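If the code is not on GitHub, a rough equivalent of the Contributors view can be pulled from the repository itself. The following is a minimal sketch, assuming you have saved the output of `git log --format=%an` (one author name per commit) as text; the function name and the sample names are made up for illustration.

```python
from collections import Counter


def commits_per_author(log_text):
    """Count commits per author from `git log --format=%an` output.

    Each non-blank line is expected to hold one author name.
    Returns (author, commit_count) pairs, busiest author first.
    """
    names = [line.strip() for line in log_text.splitlines() if line.strip()]
    return Counter(names).most_common()


# Hypothetical sample; in practice, feed in the real git log output.
sample_log = "Alice\nBob\nAlice\nCarol\nAlice\nBob\n"
for author, count in commits_per_author(sample_log):
    print(f"{count:4d}  {author}")
```

The authors at the top of the list are usually the people to ask about the history of the code.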

As a Perl person, you might like to take a look at Kritika.io as an example of the value of good tools. Unlike many such tools, it has strong Perl support, including Perl::Critic (more on that shortly), and it gives school-style "grades" to specific files (A, B, B-, etc.). It also tells you about churn, complexity and lines of code.

One of the interesting parts of Kritika (and similar tools) is the diagrammatic depictions of your code base, such as this one:

Diagram showing files as circles: larger circles have changed more frequently, darker circles have changed more recently

This gives a perspective on change frequency over a period of time. In this case, the larger circles are files that have changed more frequently and the darker ones are files that were changed more recently. This can give some insight into the areas that change often, which are frequently the problem areas. A large, frequently changed file, for example, might be one to consider breaking up.
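The same idea can be roughed out without a hosted tool. This is only a sketch, assuming you have the output of `git log --name-only --pretty=format:` (file paths, with blank lines between commits) and a mapping of path to line count gathered however you like; the function names, thresholds and sample paths are all hypothetical.

```python
from collections import Counter


def change_counts(log_text):
    """Count how many commits touched each path.

    Expects the text of `git log --name-only --pretty=format:`,
    where every non-blank line is a file path.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def hotspots(changes, line_counts, min_changes=3, min_lines=200):
    """Return (path, changes, lines) for files that are both large and
    frequently changed, most frequently changed first.

    The thresholds are arbitrary starting points, not recommendations.
    """
    picked = [
        (path, count, line_counts[path])
        for path, count in changes.items()
        if count >= min_changes and line_counts.get(path, 0) >= min_lines
    ]
    return sorted(picked, key=lambda row: (-row[1], row[0]))


# Hypothetical sample data.
sample_log = "lib/A.pm\nlib/B.pm\n\nlib/A.pm\n\nlib/A.pm\nlib/B.pm\n"
sample_sizes = {"lib/A.pm": 500, "lib/B.pm": 900}
for path, count, lines in hotspots(change_counts(sample_log), sample_sizes):
    print(f"{path}: {count} changes, {lines} lines")
```

Files that come out of this filter are candidates for a closer look, and possibly for breaking up.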

Local tools

Kritika is an interesting tool that brings us to Perl::Critic, a staple of Perl development and a really great tool that is sometimes undervalued. It's incredibly extensible, so it can do far more than it does "out of the box". I'd strongly recommend running it on any new code base to help identify known "code smells".

Along with identifying code smells, it's very good at giving you some statistics on your code base, via the --statistics option (or --statistics-only to print the statistics without the individual violations), for example:


$ carton exec 'perlcritic --statistics-only lib'
   19 files.
   33 subroutines/methods.
1,153 statements.

1,670 lines, consisting of:
      316 blank lines.
        3 comment lines.
        0 data lines.
    1,351 lines of Perl code.
        0 lines of POD.

Average McCabe score of subroutines was 2.33.

2 violations.
Violations per file was 0.105.
Violations per statement was 0.002.
Violations per line of code was 0.001.

2 severity 4 violations.

2 violations of Subroutines::RequireFinalReturn.

So in this (not terrible) example, we can see that the average McCabe complexity score of the subroutines is 2.33, along with statistics on violations per file, per statement and per line of code.

We can also derive other measures from these numbers, such as the ratio of subroutines to files.
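As a worked example, the figures above give 33 / 19, or about 1.7 subroutines per file, and 1,153 / 33, or about 35 statements per subroutine. Here is a small sketch that pulls those numbers out of the statistics text; the function names are my own, and it assumes the report layout shown above.

```python
import re


def parse_stats(report):
    """Extract the headline counts from perlcritic statistics output.

    Looks for lines such as '1,153 statements.' and strips the
    thousands separators before converting to int.
    """
    stats = {}
    for key in ("files", "subroutines/methods", "statements"):
        match = re.search(r"([\d,]+)\s+" + re.escape(key) + r"\.", report)
        if match:
            stats[key] = int(match.group(1).replace(",", ""))
    return stats


def ratios(stats):
    """Derive simple size ratios from the parsed counts."""
    subs = stats["subroutines/methods"]
    return {
        "subs_per_file": subs / stats["files"],
        "statements_per_sub": stats["statements"] / subs,
    }


# The first three lines of the report shown above.
report = "   19 files.\n   33 subroutines/methods.\n1,153 statements.\n"
for name, value in ratios(parse_stats(report)).items():
    print(f"{name}: {value:.2f}")
```

Tracking these ratios over time can show whether files and subroutines are growing or being broken up.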

Another interesting way of looking at your new code base is Gource, which animates the history of a repository.

In a Gource video, you can see how developers have worked on the code base and spot hot-spots and trends.

Another interesting tool that may help with a legacy Perl code base is Code::Statistics. Once it is installed, run codestat collect to gather statistics on your code base, then run codestat report, which will give you something like this:

================================================================================
                                  RootDocument
================================================================================

averages
ccomp: 19.5462060820785
lines: 438.453793917921
sdepth: 0
size: 12888.2081488043


                                     ccomp
top ten
Path                                  Line Col Ccomp Lines Sdepth Size    Dev.
--------------------------------------------------------------------------------
/SQL/Translator/Parser/DB2/Grammar.pm    1   1  6419 47955      0 2484388 328.40
ocal/lib/perl5/Perl/Tidy/Formatter.pm    1   1  3389 20274      0  782689 173.38
ocal/lib/perl5/Perl/Tidy/Tokenizer.pm    1   1  1193  8885      0  322357  61.03
/local/lib/perl5/Module/Build/Base.pm    1   1  1108  5568      0  161767  56.69
vwjl/local/lib/perl5/IO/Socket/SSL.pm    1   1   984  3509      0  109767  50.34
ib/perl5/Perl/Tidy/VerticalAligner.pm    1   1   681  4892      0  183931  34.84
l/local/lib/perl5/Parse/RecDescent.pm    1   1   592  6610      0  221223  30.29
cal/lib/perl5/DBIx/Class/ResultSet.pm    1   1   587  4824      0  143831  30.03
dev/vwjl/local/lib/perl5/Perl/Tidy.pm    1   1   572  4140      0  156109  29.26
ancew/dev/vwjl/local/lib/perl5/MCE.pm    1   1   562  2106      0   67048  28.75
--------------------------------------------------------------------------------

bottom ten
Path                                  Line Col Ccomp Lines Sdepth Size    Dev.
--------------------------------------------------------------------------------
e/lancew/dev/vwjl/lib/VWJL/Contest.pm    1   1     1    15      0     151   0.05
w/dev/vwjl/lib/VWJL/Infrastructure.pm    1   1     1    12      0     252   0.05
WJL/Infrastructure/DatabaseResults.pm    1   1     1    29      0     541   0.05
home/lancew/dev/vwjl/lib/VWJL/Waza.pm    1   1     1    27      0     251   0.05
/lancew/dev/vwjl/lib/vwjl_redirect.pm    1   1     1     8      0      99   0.05
ome/lancew/dev/vwjl/local/bin/plackup    1   1     1   236      0    7444   0.05
cal/lib/perl5/App/Cmd/ArgProcessor.pm    1   1     1    51      0     934   0.05
/lib/perl5/App/Cmd/Command/version.pm    1   1     1    68      0    1255   0.05
wjl/local/lib/perl5/App/Cmd/Plugin.pm    1   1     1    43      0     688   0.05
hare/dist/Dancer2/skel/lib/AppFile.pm    1   1     1    10      0     151   0.05
--------------------------------------------------------------------------------

And much much more.

Run the tests!

This is probably one of the key approaches: run the tests. Understand the tests.

Then write more tests.

No-tech solution

Draw it; never underestimate the power of drawing a diagram that shows the connections between classes.

Start with whatever you find easiest (say the homepage, if it is a web app), and work through all the connections. Map out what data is provided to the template and what that data connects to. What methods are called? What data is passed around?

This might help you identify which code is "business logic", which is "infrastructure", and so on. Where are the linkages? What files have too much code in them? Is there a clear flow to the code?
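If you want a starting point before picking up the pen, a script can rough out the connections for you. This is a minimal sketch, assuming that `use` and `require` statements are a fair proxy for the links between classes (it will miss modules loaded dynamically). The function names and the sample sources are made up, and the output is Graphviz DOT text that you can render or simply read.

```python
import re

# Matches `use Some::Module` or `require Some::Module` at the start of a line.
# Requiring a leading capital skips pragmas such as strict and warnings.
USE_RE = re.compile(r"^\s*(?:use|require)\s+([A-Z]\w*(?:::\w+)*)", re.MULTILINE)


def dependencies(sources, prefix):
    """Return sorted (module, dependency) edges.

    sources maps a module name to its Perl source text. Only
    dependencies whose name starts with prefix are kept, so that
    CPAN modules do not swamp the picture.
    """
    edges = set()
    for module, text in sources.items():
        for dep in USE_RE.findall(text):
            if dep.startswith(prefix) and dep != module:
                edges.add((module, dep))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    lines = ["digraph deps {"]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sources; in practice, read the .pm files under lib/.
sample_sources = {
    "VWJL::Contest": "package VWJL::Contest;\nuse strict;\nuse Moo;\nuse VWJL::Waza;\n",
    "VWJL::Waza": "package VWJL::Waza;\nuse warnings;\n",
}
print(to_dot(dependencies(sample_sources, "VWJL")))
```

The generated graph is no substitute for drawing it yourself, but it can tell you which boxes to draw first.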

Visualising it can be super powerful, even if, like me, you are not particularly artistic.