Building a tool to integrate Readwise.io highlights into my Zettelkasten via Perl

Recently I started using Readwise.io as a replacement for both my RSS reader and my bookmarking app. I also use a Zettelkasten system for things I want to record, so I decided to integrate the two via the API so that highlights I make in Readwise end up as markdown files on my local machine (and in a git repo).

The editor I mostly use for my Zettelkasten is the lovely Zettlr, or vim, so I wanted to craft simple markdown files. Being me, I reached for Perl and have written a small CPAN module along the way called WebService::Readwise, which includes an example script covering the basics of what I describe in this tale.

The API

Readwise provide a nice simple API (the details are here: readwise.io/api_deets) which, with a simple token authentication header, allows you to access your entire set of highlights.

Having done some simple curl-based testing, I wanted to build a script. So of course I needed to yak shave and create a module for the Readwise interactions.

The WebService::Readwise CPAN module

The module (at time of writing version 0.002 after one release) is a simple Moo-based module that provides an object-oriented approach to using the Readwise API.

Currently it only provides the specific features I wanted, namely getting the highlights from Readwise via either /export or /highlights.

Neither of these requires much to start with, so all I did was build an HTTP::Tiny object as the default of an attribute, so I can call $self->http->request(...) within the class:


has http => (
    is      => 'ro',
    default => sub {
        return HTTP::Tiny->new;
    },
);

As I need a token for authentication, I created an attribute for that too, this time defaulting to an environment variable if one is present:


has token => (
    is       => 'ro',
    required => 0,
    default  => sub { return $ENV{WEBSERVICE_READWISE_TOKEN} },
);

I use direnv a lot to manage environment variables; so I have written a few modules that use the environment variables like this.

To make life easier, I also default an attribute called base_url to the URL of the API.
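
The base_url attribute is not shown here, so the following is only a minimal sketch of what such a default could look like. The package name is hypothetical, and the URL value is an assumption based on the public Readwise API documentation rather than a copy of the module's code:

```perl
# Hypothetical stand-in package, not the real WebService::Readwise.
package Local::ReadwiseSketch;
use Moo;

has base_url => (
    is      => 'ro',
    # Assumed value: the v2 API root from the public Readwise docs.
    default => sub { return 'https://readwise.io/api/v2/' },
);

package main;
use strict;
use warnings;

# The default applies when nothing is passed to the constructor.
my $client = Local::ReadwiseSketch->new;
print $client->base_url . "export/\n";

# Because it is an attribute, it can be overridden, e.g. for a local test server.
my $test_client = Local::ReadwiseSketch->new( base_url => 'http://localhost:8080/' );
print $test_client->base_url . "export/\n";
```

One benefit of keeping the URL in an attribute rather than hard-coding it in each method is that tests can point the client somewhere other than the live service.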

Then all I need is a simple method to GET the appropriate url:


sub export {
    my ( $self, %params ) = @_;

    my $path = 'export/';
    if ( %params && $params{'pageCursor'} ) {
        $path .= '?pageCursor=' . $params{pageCursor};
    }

    my $response = $self->http->request(
        'GET',
        $self->base_url . $path,
        { headers => { Authorization => "Token $self->{token}", }, }
    );

    if ( !$response->{success} ) {
        return 'Response error';
    }

    my $json = decode_json $response->{content};

    return $json;
}

The above is slightly more complicated than you might have expected, as there is pagination happening via the API. I could/should create an export_all method that gets all the pages of data, but currently I do that via the script.
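
For illustration, here is one way such an export_all helper might look. It is a sketch only and not part of the module at the time of writing: export_all is written as a plain function taking the client object, and Local::FakeReadwise is a made-up stub standing in for WebService::Readwise so the example runs without a token or network access.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps calling $client->export until the API
# stops returning a nextPageCursor, collecting every page of results.
sub export_all {
    my ($client) = @_;

    my @entries;
    my %params;
    while (1) {
        my $result = $client->export(%params);

        # export() returns a plain string on failure, so stop here
        # rather than try to dereference it.
        die "Readwise export failed: $result\n"
            unless ref($result) eq 'HASH';

        push @entries, @{ $result->{results} || [] };

        my $cursor = $result->{nextPageCursor};
        last unless $cursor;
        %params = ( pageCursor => $cursor );
    }

    return \@entries;
}

# Stub client (an assumption for this sketch) serving two canned pages.
{
    package Local::FakeReadwise;
    sub new { return bless {}, shift }

    sub export {
        my ( $self, %params ) = @_;
        my %pages = (
            ''   => { results => [ 'a', 'b' ], nextPageCursor => 'p2' },
            'p2' => { results => ['c'],        nextPageCursor => undef },
        );
        return $pages{ $params{pageCursor} // '' };
    }
}

my $entries = export_all( Local::FakeReadwise->new );
print scalar(@$entries), " entries\n";    # prints "3 entries"
```

With something like this in the module, the script's paging loop would shrink to a single call.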

export_to_zettlkasten.pl

Having built myself a little module to take care of the API, it was time to get all my highlights and write each one out as a markdown file.

The script ended up looking like this:


use strict;
use warnings;

use lib './lib';
use WebService::Readwise;
use DateTime;
use DateTime::Format::ISO8601;
use utf8::all;

#use Data::Dumper;

$|++;

# This is an example script that exports all your highlights from
# Readwise.io and writes Markdown files for your Zettelkasten notes
# system.
# This was the module author's original use case.
#
# Assumes that the WEBSERVICE_READWISE_TOKEN environment variable
# has been set

my $rw     = WebService::Readwise->new;
my $result = $rw->export;

my @entries;
push @entries, @{ $result->{results} };
while ( $result->{nextPageCursor} ) {
    $result = $rw->export( pageCursor => $result->{nextPageCursor} );
    push @entries, @{ $result->{results} };
}

for my $entry (@entries) {
    #warn Dumper [ sort keys %$entry ];
    for my $h ( @{ $entry->{highlights} } ) {
#        warn Dumper [ sort keys %$h ];

        my @tags = sort map { $_->{name} } @{ $h->{tags} };
        push @tags, 'readwise';

        my $dt = DateTime::Format::ISO8601->parse_datetime(
            $h->{highlighted_at}
            || $h->{created_at}
        );
        my $zettel_id
            = $dt->ymd('')
            . sprintf( '%02d', $dt->hour )
            . sprintf( '%02d', $dt->minute )
            . sprintf( '%03d', $dt->fractional_second * 10 );

        # Keep multi-line highlights inside the markdown blockquote
        $h->{text} =~ s/\n/\n> /g;

        my $text = sprintf(
            <<'END',
# %s
(Author: %s)

ZettelID: %s

%s

> %s

%s

SOURCE: [%s](%s)

%s

Date Highlighted: %s
END
            $entry->{readable_title},
            $entry->{author} || '',
            $zettel_id,
            @tags
            ? 'Tags: #' . join( " #", @tags )
            : '',
            $h->{text},
            $h->{note} ? 'NB: ' . $h->{note} : '',
            $h->{readwise_url},
            $h->{readwise_url},
            $entry->{source_url} ? '[' . $entry->{source_url} .'](' . $entry->{source_url} . ')': '',
            $h->{highlighted_at},
        );

        open my $out, '>', './z/' . $zettel_id . '.md'
            or die "Cannot write ./z/$zettel_id.md: $!";
        print $out $text;
        close $out;
    }
}

This is pretty basic; though it could easily be simpler if I moved the pagination into the module and created the export_all method.

All it really does is loop over all the sources and then all the highlights (you can highlight more than one passage in a book or webpage) and write out a file for each, with a filename that fits how I identify the documents in my Zettelkasten.
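
To make the filename scheme concrete, here is a small sketch of how the ID is derived from a highlight's timestamp. The script itself uses DateTime::Format::ISO8601; this version swaps in a core-Perl regex so it runs without extra modules. The function name zettel_id_for and the sample timestamp are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the ID built in the script:
# YYYYMMDD, then hour and minute, then the seconds in tenths
# zero-padded to three digits.
sub zettel_id_for {
    my ($timestamp) = @_;

    my ( $y, $m, $d, $hour, $min, $sec ) = $timestamp
        =~ /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)/
        or die "Unrecognised timestamp: $timestamp\n";

    return "$y$m$d$hour$min" . sprintf( '%03d', $sec * 10 );
}

my $id = zettel_id_for('2023-03-01T09:05:07.250000Z');
print "./z/$id.md\n";    # prints "./z/202303010905072.md"
```

Two highlights made within the same tenth of a second would get the same ID, and the later file would overwrite the earlier one.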

Releasing to CPAN

I have written a few modules and released them; so I had my PAUSE credentials and so forth.

My preferred tool for managing CPAN modules at the moment is dzil as it does most of the work for me.

So when I started the module I actually used dzil to do that.

My dist.ini currently looks like this:


name    = WebService-Readwise
author  = Lance Wicks <lw@judocoach.com>
license = Perl_5
copyright_holder = Lance Wicks
copyright_year   = 2023

version = 0.002

[GitHub::Meta]
[PodWeaver]
[@Starter]

[Test::CPAN::Changes]

[Prereqs]
HTTP::Tiny       = 0.088 ;
JSON::MaybeXS    = 1.004005 ;
Moo              = 2.005005 ;
namespace::clean = 0.27 ;
perl             = v5.010 ;

[Prereqs / TestRequires]
Test2::V0 = 0.000159 ;
Test::Pod::Coverage = 1.10 ;

Though originally I really only had the [@Starter] plugin bundle in there. I am not a Dist::Zilla expert by a long shot, so it's perhaps not one to copy.

I added [GitHub::Meta], which means MetaCPAN shows nice things like the git repo, issues, etc.

[PodWeaver] takes care of some of the basic documentation for me, so the module itself only really contains the relevant parts and the boilerplate is generated for me.

Using dzil release --trial I was able to upload a couple of test versions of the module before releasing the initial "proper" one with dzil release.

What's next?

WebService::Readwise is really basic at the moment. It can be extended and refactored, and honestly I would invite any novice developer interested in CPAN to reach out to me, as I'd love to help you do that.

I might even ticket up the remaining methods and any ideas I have for a refactor. That export_all method would save space in the script that creates the markdown files for a start!

So please do say hello, and if you've never contributed to a CPAN module and fancy giving it a go, I would really welcome helping you give it a try with this module.

It's pretty simple, and the Readwise service with their new Reader app is seemingly pretty popular (I like it), as there are existing plugins for projects like Obsidian and Logseq. It's nice to add a Perl tool to that mix.