Fixing GitHub language detection of Perl

Recently I discovered how to get GitHub to identify Perl code as Perl when it mistakes it for Raku/Perl 6.

GitHub uses a tool called "Linguist" to detect which languages are in your repository, but sometimes it gets things wrong. I have some Raku repositories and some Perl ones, and sometimes .pm and .t files get classified as Raku even though they are Perl. Both languages use these extensions, so Linguist has to guess from the file contents.

So after a bunch of googling I worked out that you can tell Linguist explicitly which language a file is by using a .gitattributes file. To this file I added:

*.pm linguist-language=Perl
*.t linguist-language=Perl

That was enough to persuade GitHub to classify my code correctly: my pure Perl code base now shows as 100% Perl. I think that is a good thing for the language, because it makes the amount of Perl code on GitHub more visible to the people browsing it.
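GitHub's language bar is based on how many bytes of code Linguist attributes to each language, so once every .pm and .t file is forced to Perl, Perl owns all of the bytes. The Python sketch below is a simplified model of that arithmetic, not Linguist's actual code: the file names, sizes and initial guesses are hypothetical, and matching each pattern against the file name with fnmatchcase only approximates Git's rule that a pattern without a slash, such as *.pm, matches at any directory depth.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical repository: path -> (language Linguist guessed, size in bytes)
FILES = {
    "lib/My/Module.pm": ("Raku", 6000),
    "lib/My/Module/Util.pm": ("Perl", 3000),
    "t/basic.t": ("Raku", 1000),
}

# Mirrors the .gitattributes lines: pattern -> forced language
OVERRIDES = {"*.pm": "Perl", "*.t": "Perl"}


def effective_language(path, guessed, overrides):
    """Return the overriding language if a pattern matches the file name."""
    name = PurePosixPath(path).name
    # In .gitattributes a later matching line wins, so check patterns in reverse.
    for pattern, language in reversed(list(overrides.items())):
        if fnmatchcase(name, pattern):
            return language
    return guessed


def breakdown(files, overrides):
    """Percentage of total bytes per language (simplified stats-bar model)."""
    totals = {}
    for path, (guessed, size) in files.items():
        language = effective_language(path, guessed, overrides)
        totals[language] = totals.get(language, 0) + size
    grand_total = sum(totals.values())
    return {lang: round(100 * size / grand_total, 1) for lang, size in totals.items()}


print(breakdown(FILES, {}))         # {'Raku': 70.0, 'Perl': 30.0}
print(breakdown(FILES, OVERRIDES))  # {'Perl': 100.0}
```

The real statistics also skip files Linguist treats as vendored, generated or documentation, which this model ignores.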

Using the same approach I was able to fix an Elm application I host on GitHub, which was reported as containing JavaScript and HTML when in fact it is all Elm. It's a trick you could also use to keep vendored code out of your language statistics. In my case I just added the following to the .gitattributes file:

*.html linguist-detectable=false

This is because elm make generates an HTML file with the JavaScript embedded in it. Excluding .html files stops the repository looking like a mix of Elm, HTML and JavaScript when it is actually all Elm.
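Linguist reads the same .gitattributes rules that Git does, so one way to check them before pushing is git check-attr, which prints the value Git resolves for each attribute on a given path. The Python sketch below assumes git is on your PATH; it writes both rule sets from this post into a throwaway repository and asks Git about a few hypothetical paths (the files themselves do not need to exist).

```python
import subprocess
import tempfile
from pathlib import Path

# Both rule sets from this post.
GITATTRIBUTES = """\
*.pm linguist-language=Perl
*.t linguist-language=Perl
*.html linguist-detectable=false
"""

# Hypothetical paths; git check-attr only looks at the path, not the file.
PATHS = ["lib/Foo.pm", "t/basic.t", "index.html", "src/Main.elm"]
ATTRIBUTES = ["linguist-language", "linguist-detectable"]


def check_attributes(paths, attributes, gitattributes):
    """Return {path: {attribute: value}} as resolved by git check-attr."""
    with tempfile.TemporaryDirectory() as repo:
        # A fresh, empty repository; output is captured to hide init hints.
        subprocess.run(["git", "init", "-q", repo], check=True, capture_output=True)
        Path(repo, ".gitattributes").write_text(gitattributes)
        result = subprocess.run(
            ["git", "check-attr", *attributes, "--", *paths],
            cwd=repo, check=True, capture_output=True, text=True,
        )
    resolved = {}
    # Each output line looks like "<path>: <attribute>: <value>", where the
    # value is the assigned string, or set/unset/unspecified.
    for line in result.stdout.splitlines():
        path, attribute, value = line.rsplit(": ", 2)
        resolved.setdefault(path, {})[attribute] = value
    return resolved


if __name__ == "__main__":
    for path, attrs in check_attributes(PATHS, ATTRIBUTES, GITATTRIBUTES).items():
        print(path, attrs)
```

In your own repository the equivalent one-liner is git check-attr linguist-language linguist-detectable -- index.html. Git should report false for linguist-detectable on index.html, Perl for the .pm and .t paths, and unspecified wherever no rule matched, as for src/Main.elm.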

Not a huge thing, but I thought I'd post it so others can find it and help make sure Perl shows up correctly in the percentages.