An Unsuccessful Evernote to Google Docs Conversion Project

I’ve been using Evernote for around 20 years. It’s OK, but the product seems to be flailing these days. Plus, I was forced to become a paying user. That, in itself, is no sin but it motivated me to investigate alternatives.

Meanwhile, I’ve been reflecting on all the gigabytes of Google Drive storage space available to me. I had also been looking for a programming project to exercise my new skills in the Go programming language, so it occurred to me to try using Go to move all my Evernote notes to Google Docs.

This note describes my efforts to use Go to migrate my 480 Evernote notes into Google Docs. Although I will eventually write some code, there are various things to think about before starting. I’ll include my thought process as the project progresses.

How To Export My Evernote Notes

This is an important question. Even though I have only 480 notes, manually copying them to Google Docs is out of the question. Plus, what kind of programming project would that be? I should say that the vast majority of my notes only contain plain text. I don’t make much use of Evernote’s more advanced storage features, which is another reason why I’m thinking of migrating away.

I know that both Evernote and Google Docs have APIs that I might be able to use for this project. But, after some reflection, I realized that using them might be overkill, at least for the export step. If Evernote includes a way of exporting my notes into a single file in a usable format, I might be able to then use this file as input to the Google Docs import step.

I did some research and found that currently the only way to export from Evernote is by saving the notes in ENEX format, which is XML-based. Older versions of Evernote allowed exporting in HTML format, but this is no longer supported. That’s probably just as well because I suspect the XML format will be easier to deal with. Actually doing the export turned out to be trivial, and I now have a file called EverNote.enex, which is 11 MB in size.

One interesting thing is that every day or so an Evernote popup appears telling me about new Evernote features to expect. One of the new features is Additional export options. I wonder if such options will help this project.

In any case, I now have EverNote.enex on my Fedora Linux system. (I should mention that I run the Evernote desktop client on Windows. However, for this project, I’m doing all the conversion work on Linux, mainly because my Linux machine is much more powerful than my Windows machine.)

How Will Evernote Notes be Organized in Google Docs?

Google Docs, which is built on top of Google Drive, allows the creation of a directory hierarchy. So, I’m thinking that each Evernote tag will be represented by a sub-directory. One thing I haven’t figured out is what to do with notes that have multiple tags. If Google Drive supported symbolic links then this would be easy, but I don’t know if it does. [Update] I did some searching and it looks like it does, but I’m not going to spend much time on this now.
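One possible answer to the multiple-tag question is to file a note under every tag it carries. The sketch below only builds the folder plan in memory and does not call Google Drive at all. The note type, the sample tags, and the "Untagged" folder name are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// note is a hypothetical, pared-down view of an Evernote note.
type note struct {
	Title string
	Tags  []string
}

// untaggedFolder is an assumed folder name for notes that carry no tag.
const untaggedFolder = "Untagged"

// folderPlan maps each tag (used as a folder name) to the titles filed
// under it. A note with several tags is listed once under each of them.
func folderPlan(notes []note) map[string][]string {
	plan := make(map[string][]string)
	for _, n := range notes {
		if len(n.Tags) == 0 {
			plan[untaggedFolder] = append(plan[untaggedFolder], n.Title)
			continue
		}
		for _, tag := range n.Tags {
			plan[tag] = append(plan[tag], n.Title)
		}
	}
	return plan
}

func main() {
	notes := []note{
		{Title: "Setting Up Fedora", Tags: []string{"linux"}},
		{Title: "tmux notes", Tags: []string{"linux", "tools"}},
		{Title: "Shopping list"},
	}
	plan := folderPlan(notes)

	// Sort the folder names so the listing is printed in a stable order.
	folders := make([]string, 0, len(plan))
	for f := range plan {
		folders = append(folders, f)
	}
	sort.Strings(folders)
	for _, f := range folders {
		fmt.Printf("%s/: %v\n", f, plan[f])
	}
}
```

Filing a note under each tag duplicates it, which is the price of not relying on links between folders.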

Working With EverNote.enex

Evernote has a document called How Evernote’s XML Export Format Works at https://evernote.com/blog/how-evernotes-xml-export-format-works/. It was written back in 2013, which explains why it mentions HTML as an export format even though that is no longer supported. But it’s a start. It says that an ENEX file follows version 3 of the Evernote Export document type definition (DTD), found at http://xml.evernote.com/pub/evernote-export3.dtd.

I now have an XML file, but I don’t really know what to do with it. I’ve never worked with XML before. I’m thinking that as a first step I could write something that would show me the tags assigned to each note. That would accomplish two things:

  1. Parse the XML file.
  2. See if any of my notes have more than one tag assigned to them, which is an issue I mentioned above.

A shortcut to accomplishing these goals is to use Firefox to do the initial parsing. Indeed, if I tell Firefox to open the XML file, it shows me a tree-structured representation of the XML. From that it shouldn’t be hard to look at the <tag> lines to find notes with multiple tags. [Update] I used Vim to remove all non-<tag> lines from EverNote.enex. This didn’t help: although I could now see a whole bunch of tags, I couldn’t see which note each one belonged to, so I couldn’t tell whether any single note had more than one tag. I needed a more intelligent scheme. So I used a regular expression that deleted all lines that didn’t contain “note>” or “tag>”. This still left a bunch of lines with XML elements I didn’t want to see, so I removed those lines too. The end result showed me that I do indeed have several notes with more than one tag. This isn’t a bad thing; it just means I’ll have to be careful to do the right thing with the multiple tags.
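The same check (does any note carry more than one tag?) can also be sketched in Go instead of with Vim and regular expressions. This is a minimal sketch, assuming the export has the shape en-export > note > title/tag. The sample data and the multiTagTitles helper are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// export and taggedNote model only the elements needed for this check.
type export struct {
	Notes []taggedNote `xml:"note"`
}

type taggedNote struct {
	Title string   `xml:"title"`
	Tags  []string `xml:"tag"` // a slice collects every <tag> child
}

// multiTagTitles returns the titles of notes that have more than one tag.
func multiTagTitles(data []byte) ([]string, error) {
	var e export
	if err := xml.Unmarshal(data, &e); err != nil {
		return nil, err
	}
	var titles []string
	for _, n := range e.Notes {
		if len(n.Tags) > 1 {
			titles = append(titles, n.Title)
		}
	}
	return titles, nil
}

// sample is made-up data in the assumed shape of an ENEX export.
const sample = `<en-export>
  <note><title>Setting Up Fedora</title><tag>linux</tag></note>
  <note><title>tmux notes</title><tag>linux</tag><tag>tools</tag></note>
</en-export>`

func main() {
	titles, err := multiTagTitles([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, t := range titles {
		fmt.Println(t)
	}
}
```

To run it against the real export, the sample constant would be replaced by the bytes read from EverNote.enex.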

Querying EverNote.enex

This got me thinking that it would be better to have some kind of query language I could apply to the parsed XML so that I could select arbitrary collections of items. I have to confess that at this stage of the project I have no idea how to do this. But any project dealing with XML would have similar needs, so I suspect that such a tool already exists.

Flash forward a couple of days, and I’ve learned about XPath, which Wikipedia says “is a query language for selecting nodes from an XML document”. This sounds like exactly what I’m looking for. I’ve also learned about XQuery, a more powerful query language that builds on XPath. There are web sites for both where you can paste in some XML and then execute queries. This might be fine for some simple queries, but after I’m done playing around I’m going to need a local program, because the web sites only accept a limited amount of XML. I did some research and found xmlstarlet, a command-line XML toolkit that can run XPath queries against a local file.

Now that I have some experience manipulating EverNote.enex it’s time to look at it in more detail. I’m especially interested in deciding which elements I can ignore, and which I’ll have to put in Google Docs.

Here’s a schema for the elements in EverNote.enex that I care about:

<en-export>
  <note>
    <title> </title>
    <tag> </tag> [<tag> </tag>] …
    <content> </content>
    <data> </data>
  </note>
</en-export>

I remembered that I had heard of the Go xml.Unmarshal() function. This parses well-formed XML and puts the results into a struct. I started looking around for examples of how to use xml.Unmarshal(). Although I found several good examples, none of them really described how to construct the struct to contain the results. So I blindly tried modifying the example at https://tutorialedge.net/golang/parsing-xml-with-golang/. Amazingly, after fixing a few dumb mistakes, the result worked! It found 480 notes, which is correct.

Here’s the code:

package main

import (
        "encoding/xml"
        "fmt"
        "os"
        "strings"
)

// Notes mirrors the <en-export> root element.
type Notes struct {
        XMLName xml.Name `xml:"en-export"`
        Notes   []Note   `xml:"note"`
}

// Note holds the elements of a <note> that I care about.
type Note struct {
        XMLName xml.Name `xml:"note"`
        Title   string   `xml:"title"`
        Tags    []string `xml:"tag"` // a note can have several <tag> elements
        Content string   `xml:"content"`
        // As in the schema above. In the ENEX DTD, <data> appears to sit
        // inside <resource>, so `xml:"resource>data"` may be needed instead.
        Data string `xml:"data"`
}

func main() {
        var notes Notes

        // Read the whole export into memory (os.ReadFile needs Go 1.16+).
        byteValue, err := os.ReadFile("EverNote.enex")
        if err != nil {
                fmt.Println(err)
                os.Exit(1)
        }

        // Stop on a parse error instead of printing an empty result.
        if err := xml.Unmarshal(byteValue, &notes); err != nil {
                fmt.Println(err)
                os.Exit(1)
        }

        fmt.Printf("Found %d notes\n", len(notes.Notes))
        for i, note := range notes.Notes {
                fmt.Printf("\n\nNote %d\n", i)
                fmt.Printf("Title: %.40s\n", note.Title)
                fmt.Printf("Tags: %.40s\n", strings.Join(note.Tags, ", "))
                fmt.Printf("Content: %.40s\n", note.Content)
                fmt.Printf("Data: %.40s\n", note.Data)
        }
}

In order to make the output semi-readable I only print the first 40 characters from each element.
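Since the examples I found did not explain how to build the struct, here is a small sketch of the field-tag rules that xml.Unmarshal follows: a plain name matches a child element, a slice collects repeated children, a>b reaches a nested element, ,attr reads an attribute, and ,chardata reads an element’s text. The sample XML is made up, and the resource/data nesting with an encoding attribute is my assumption about the ENEX layout.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type enExport struct {
	XMLName xml.Name   `xml:"en-export"` // the root element must have this name
	Notes   []noteElem `xml:"note"`      // repeated child element -> slice
}

type noteElem struct {
	Title string     `xml:"title"`         // single child element -> string
	Tags  []string   `xml:"tag"`           // zero or more children -> slice
	Data  []dataElem `xml:"resource>data"` // a>b reaches a nested element
}

type dataElem struct {
	Encoding string `xml:"encoding,attr"` // attribute of <data>
	Body     string `xml:",chardata"`     // text inside <data>
}

// parseExport decodes an export document into the structs above.
func parseExport(data []byte) (enExport, error) {
	var e enExport
	err := xml.Unmarshal(data, &e)
	return e, err
}

// sample is made-up data; the <resource> layout is an assumption.
const sample = `<en-export>
  <note>
    <title>tmux notes</title>
    <tag>linux</tag>
    <tag>tools</tag>
    <resource><data encoding="base64">aGVsbG8=</data></resource>
  </note>
</en-export>`

func main() {
	e, err := parseExport([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, n := range e.Notes {
		fmt.Println("Title:", n.Title)
		fmt.Println("Tags:", n.Tags)
		for _, d := range n.Data {
			fmt.Printf("Data (%s): %s\n", d.Encoding, d.Body)
		}
	}
}
```

Elements that have no matching field are simply skipped, which is why a struct can model only the parts of the file that matter.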

Naming Google Doc Files

As I said above, I had been thinking of creating one Google Doc file for each Evernote note. One of the first problems with such an approach is deciding what to name each file. I had naively thought that I could use each Evernote note’s title as the file name. Now that I can see the XML representation of each note clearly, I no longer think this is such a good idea. This is because the default title for a note is the first line of the note. In some cases this would work fine, assuming I don’t mind spaces in filenames. For example, I might have files named “Setting Up Fedora” or “tmux notes”. But what about “# diff mod_status.c mod_status.c.orig” or “A 32Kb (2^13) RAM chip organized as 8K X 4 is composed of 8192 units, each with”? Short of manually modifying all 480 notes to have a useful title, I’m not sure what to do right now.
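One mechanical option, short of retitling every note by hand, would be to derive a tidier name from each title. The sketch below keeps letters, digits, and a few punctuation characters, caps the length, and falls back to a numbered name when nothing usable is left. The docName helper, the 40-character cap, and the "Note N" fallback are all assumptions, and the result is lossy: it tidies a title but cannot make a poor title meaningful.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// maxNameLen is an assumed cap on generated names, counted in runes.
const maxNameLen = 40

// docName derives a document name from a note title. Letters, digits,
// '.', '_' and '-' are kept; any run of other characters becomes a single
// space. If nothing is left, the name falls back to "Note <index>".
func docName(title string, index int) string {
	words := strings.FieldsFunc(title, func(r rune) bool {
		keep := unicode.IsLetter(r) || unicode.IsDigit(r) ||
			strings.ContainsRune("._-", r)
		return !keep
	})
	name := strings.Join(words, " ")
	if runes := []rune(name); len(runes) > maxNameLen {
		name = strings.TrimSpace(string(runes[:maxNameLen]))
	}
	if name == "" {
		return fmt.Sprintf("Note %d", index)
	}
	return name
}

func main() {
	titles := []string{
		"Setting Up Fedora",
		"# diff mod_status.c mod_status.c.orig",
		"A 32Kb (2^13) RAM chip organized as 8K X 4 is composed of 8192 units, each with",
		"???",
	}
	for i, t := range titles {
		fmt.Printf("%q -> %q\n", t, docName(t, i))
	}
}
```

Two notes can still end up with the same generated name, so a real run would need some way to tell them apart, such as appending the note index.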

Formatting Commands in Notes

I noticed another problem. Even a simple looking note can contain a lot of formatting commands. The best way to explain what I’m talking about is a short example.

Somehow the following note:

# diff mod_status.c mod_status.c.orig
 413d412
 <         ap_rputs("<script src=\"sorttable.js\"></script>\n", r);
 565c564
 <             ap_rputs("\n\n<table class=\"sortable\" border=\"0\"><tr>"
 ---
 >             ap_rputs("\n\n<table border=\"0\"><tr>"

appears in the exported XML file as (partially edited)

<content>
<![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
# diff mod_status.c mod_status.c.orig<br/>
413d412<br/>
&lt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ap_rputs(&quot;&lt;script src=\&quot;sorttable.js\&quot;&gt;&lt;/script&gt;\n&quot;, r);<br/>
565c564<br/>
&lt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ap_rputs(&quot;\n\n&lt;table class=\&quot;sortable\&quot; border=\&quot;0\&quot;&gt;&lt;tr&gt;&quot;<br/>
---<br/>
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ap_rputs(&quot;\n\n&lt;table border=\&quot;0\&quot;&gt;&lt;tr&gt;&quot;<br/><div><br/></div><div><br/></div><div>tar rvf httpd-2.2.3.tar.gz&nbsp; $a/mod_status.c</div></en-note>      ]]>
</content>

This is very discouraging because I wouldn’t want all those formatting commands to appear in a Google Doc file. But, I’m not sure I could reliably strip them out. In retrospect this isn’t surprising because the only thing that Evernote says can be done with an exported file is to import it back into Evernote. Maybe using an XML export is not the way to handle this problem. I think that there’s also an API into Evernote, which I’m going to look into next.
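For mostly plain-text notes like mine, one way to attempt the stripping is to run the note content through Go’s xml.Decoder, keep only the character data, and turn <br/> and closing </div> tags into newlines. This is a rough sketch rather than a reliable converter: the enmlToText helper is hypothetical, the sample is a shortened made-up note, and it assumes that the HTML entity table in encoding/xml covers the entities a note uses. Anything beyond text (tables, images, checkboxes) is simply dropped.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// enmlToText reduces the markup stored in a note's <content> to plain text.
// <br/> and closing </div> tags become newlines; all other tags are dropped.
func enmlToText(enml string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(enml))
	dec.Entity = xml.HTMLEntity // lets the decoder resolve &nbsp; and friends
	var out strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.CharData:
			out.Write(t)
		case xml.EndElement:
			// A self-closing <br/> also produces an EndElement token.
			if t.Name.Local == "br" || t.Name.Local == "div" {
				out.WriteByte('\n')
			}
		}
	}
	// Turn non-breaking spaces (U+00A0) back into ordinary spaces.
	text := strings.ReplaceAll(out.String(), "\u00a0", " ")
	// Trimming also removes any indentation at the very start of the note.
	return strings.TrimSpace(text), nil
}

// sample is a shortened, made-up note in the style of the export above.
const sample = `<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note style="word-wrap: break-word;">413d412<br/>&lt;&nbsp;&nbsp;ap_rputs(&quot;x&quot;);<br/><div>tar rvf a.tar</div></en-note>`

func main() {
	text, err := enmlToText(sample)
	if err != nil {
		fmt.Println("convert error:", err)
		return
	}
	fmt.Println(text)
}
```

The input here is the string that ends up in a note’s Content field after xml.Unmarshal, since the CDATA wrapper is already removed at that point.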

[Update 1] I just found https://dev.evernote.com/doc/articles/enml.php which describes the format actually used to store notes. It’s just as I guessed above. This leaves me in a bad state. Even if I were to use the Evernote API I’d still end up with the same extraneous formatting commands. I don’t know how to overcome this so I’m going to stop writing. If I figure out how to solve it I’ll continue this note.

[Update 2] I tried several of the open-source equivalents of Evernote but none of them appealed to me. So, I bit the bullet and manually copied all my Evernote notes into Microsoft OneNote. It’s not perfect but it’s good enough. Plus, I save enough from not having to pay for Evernote to mostly pay for an Office 365 subscription.

Posted on January 21, 2021.
