ML engineer discovers pagination
One of my motivations for using a database for lifelab is my slight obsession with performance. I want my notes to be snappy even after 60 years, without requiring a faster computer. If society collapses and I am stuck with this M3 MacBook Air for the rest of my life, lifelab shouldn't lag over time. I was fairly traumatized by TiddlyWiki, Logseq and Roam because their performance scaling over time is atrocious. Everything always felt slower after a year, but I never had quantified proof. Turns out some fantastic dude did it for me, even before the LLM era. I wish I had that kind of dedication.
I found an extensive benchmark by Alexander Rink, which compares various apps (all of which I have used!) on a common dataset.
As a proper ML engineer, I intended to game the hell out of it and gloat.
I adapted the 5000-page dataset listed here. In lifelab, each tag (link) is a separate page, so on top of the original 5k pages we generate 10k implicit pages from the backlinks. Additionally, each top-level bullet becomes a block, creating about 36k blocks across 15k pages. It is not apples to apples, but I felt the effort was sufficient for a comparison. The end content should be the same: individual paragraphs on pages, with backlinks to other pages.
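To make those numbers concrete, here is a toy sketch of how the dataset expands (the parsing is deliberately simplified; lifelab's real importer handles much more syntax):

```python
import re

def expand_pages(pages: dict[str, str]) -> tuple[set[str], int]:
    """Given {page_name: markdown}, return the full page set
    (original + implicit pages created from [[links]]) and the
    number of blocks (top-level bullets)."""
    all_pages = set(pages)
    blocks = 0
    for body in pages.values():
        for line in body.splitlines():
            if line.startswith("- "):  # top-level bullet == one block
                blocks += 1
            # every [[link]] target becomes its own implicit page
            all_pages.update(re.findall(r"\[\[([^\]]+)\]\]", line))
    return all_pages, blocks

pages = {
    "2026-03-14": "- wrote about [[benchmarks]] and [[postgres]]\n- went for a run",
    "benchmarks": "- compared [[obsidian]] with [[roam]]",
}
all_pages, blocks = expand_pages(pages)
# 2 original pages + 3 new implicit ones = 5 pages, 3 blocks
```

Scaled up, the same expansion is what turns 5k source pages into ~15k pages and ~36k blocks.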
Alexander measures a lot of things that matter when you use a PKM daily: app startup time, opening pages with heavy backlinking, expanding the backlinks/references panel, full-text search, and import/export. He tests five tools — Roam, Obsidian, Logseq, RemNote, and Craft — at 2k, 5k, and 10k pages.
For my own comparison (which I intend to win), I'll note that not everything he measures is equally relevant for daily use. Export and import mostly demonstrate that online-only Roam Research fails completely, but day to day they barely matter.
Let's start with a number of curated wins to show off.
| Operation | Obsidian | Roam | LifeLab (E2E) | LifeLab (API) |
|---|---|---|---|---|
| Import (5K files) | 18s | minutes¹ | 3.3s | — |
| Cold start | instant | 10-14s | 1.1s | 3ms |
| Open page | fast | slow | 143ms | 3.3ms |
| Backlinks (10K+) | 5s | 2m 44s | ~600ms | 0.4-0.9s |
| Full-text search | fast | — | 2-3.5s | 2-3.5s |
| Create page | — | — | 95ms | 3.5ms |
Import
Import is something that seems incredibly stressful for app users, but it becomes trivial if you own your full stack. For common apps, you need to rely on the app being able to ingest data in a particular format (even markdown has variations which are not easily compatible) and then hope the authors anticipated the volumes of slop you intend to push through their system. Since Roam Research is online only, the only way to send data is by uploading it via their API, which may or may not support bulk import; you are at their mercy and can only complain on forums.
Lifelab has (among other things) a Postgres database backend. A bit of Python code and pg_restore loads 3x the number of pages in 3 seconds. In the table above, Obsidian has it easiest since the markdown files are already there and it can do indexing asynchronously; you can use Obsidian while that happens in the background. Roam Research would be completely blocked until the import finishes.
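A minimal sketch of what that bit of Python code can look like, assuming a simple pages(name, body) table (the real schema has more to it). The trick is serializing everything into one buffer and handing it to Postgres as a single COPY, instead of row-by-row inserts:

```python
import io

def to_copy_tsv(pages: dict[str, str]) -> io.StringIO:
    """Serialize pages into a tab-separated buffer suitable for
    Postgres COPY ... FROM STDIN, escaping the characters COPY
    treats specially (backslash first, then tab and newline)."""
    buf = io.StringIO()
    for name, body in pages.items():
        row = "\t".join(
            col.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
            for col in (name, body)
        )
        buf.write(row + "\n")
    buf.seek(0)
    return buf

buf = to_copy_tsv({"2026-03-14": "- went for a run\n- wrote this post"})
# With a real connection (psycopg 3), the bulk load is a single COPY:
#   with cur.copy("COPY pages (name, body) FROM STDIN") as copy:
#       copy.write(buf.read())
```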
Cold start
For cold start, Obsidian can immediately open a file and render it; there should be no dependency on vault size. According to the benchmark, Roam Research took 14s with 5k pages ingested. I feel that is unacceptable, and I hope they improved it, since it would not scale over time. Lifelab is a website, so there is some overhead from loading the React code and from the handful of database queries that populate all fields. A fairer comparison would be Obsidian with a bunch of Dataview plugins, but I will just take the L here.
Open page
In the benchmark, Alexander switched between pages in an open app and judged the speed. I didn't see any quantitative measurement, so I will say my 600ms is in the "fast" category and call it a tie.
Backlinks
For backlinks I can flex easily. To resolve backlinks and get all linked blocks, it is a single database join between two tables, with a god damn GIN index. When I designed Lifelab I thought about backlinks a lot. I was really confident I would win here. And I did. Look at it!
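For illustration, a hypothetical schema along these lines (not lifelab's exact DDL) lets Postgres answer the backlink query straight from the index:

```sql
-- Hypothetical schema, not lifelab's exact DDL.
CREATE TABLE pages  (name text PRIMARY KEY);
CREATE TABLE blocks (
    id   bigserial PRIMARY KEY,
    page text REFERENCES pages(name),
    body text NOT NULL,
    tags text[] NOT NULL          -- page names this block links to
);
CREATE INDEX blocks_tags_gin ON blocks USING gin (tags);

-- All blocks linking to a page: one join, served by the GIN index.
SELECT b.page, b.body
FROM pages p
JOIN blocks b ON b.tags @> ARRAY[p.name]
WHERE p.name = 'benchmarks';
```

The `@>` containment operator is exactly what GIN indexes on arrays accelerate, so the lookup cost stays roughly flat as the block count grows.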
Full text search
Search I confidently lost, because I did not optimize the Postgres full-text search at all and it does a table scan on every query. Obsidian has this nailed down really well and it is quite snappy.
I feel pretty good about the numbers above and could have just declared victory, but in the spirit of academic honesty, I will not stop there. You may notice the table above has two columns, one for the E2E test and one for the API test. This is because I quickly realized the backend can handle anything; it never really was the problem.
In the rest of the article, I will share my complete fuckups and how I actually got to that table. Because when I first ran the benchmark, the UI would crash! It took about two days of work fixing lifelab to even get to the benchmarking stage.
The root cause ended up being me:
- being lazy and implementing one API that queries all available pages from the database, and
- being strict and forcing all relevant components to use this one inefficient API.
I had a lot of niceties, like tag (== page name) suggestions, frequently visited pages in the sidebar, autocomplete for pages in the editor, a monthly calendar view in the journal, and more. Each of these components would independently query the backend for a list of all pages before filtering it down to what's relevant. The backend happily sends out 10k-entry lists because it can cache them in memory, but the React frontend received the list 9 different times, completely nuking its memory.
The solution was simple (and I feel stupid for having to type this out): each component should query only what it needs, instead of fetching all pages and filtering them client-side, taking up far less frontend memory.
I ended up replacing listSummary with a more configurable endpoint that lets each component define a filter on the pages. The filter is applied both in the database and in the Rust in-memory cache, which keeps a limited number of entries. Each React component also gets its own small dedicated cache.
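Conceptually, the new endpoint behaves like this (names and parameters are illustrative, not lifelab's real API): filter server-side and cap the result, instead of shipping every page name to every client component.

```python
def list_pages(all_pages: list[str], prefix: str = "", limit: int = 50) -> list[str]:
    """Illustrative version of the configurable endpoint: each
    component passes its own filter and gets back a bounded list,
    instead of 10k names it mostly throws away."""
    out = []
    for name in all_pages:
        if name.startswith(prefix):
            out.append(name)
            if len(out) == limit:  # cap the payload server-side
                break
    return out

pages = [f"journal/2026-03-{d:02d}" for d in range(1, 15)] + ["blog/lifelab"]
list_pages(pages, prefix="journal/", limit=5)
# → ['journal/2026-03-01', 'journal/2026-03-02', 'journal/2026-03-03',
#    'journal/2026-03-04', 'journal/2026-03-05']
```

In the real thing the same filter also runs inside the database query and the Rust cache; the point is that nobody ever materializes the full list on the frontend.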
For autocomplete I had to do more thinking. To have instant autocomplete and suggestions for tags, I do need all the tags in memory when the autocomplete prompt triggers. This clearly did not scale at all, making my app join the Roam Research corner of shame. Even Obsidian had to do tricks to actually handle this many links!
Instead of lame tricks, I did something better: I implemented the software to work a certain way and hope people don't hold their Lifelab wrong. Lifelab introduces a hierarchy to tags. For example, this blog post is in :blog/lifelab/ML engineer discovers pagination and :journal/2026-03-14. I adjusted the autocomplete and tag suggestion code to suggest the root nodes and only query their children once one is selected. YAY PREFIXES!
The cool thing is it made everything run instantly; the bad thing is that this is not acceptable for a general PKM, because the moment you use completely flat tags with no hierarchy, it will regress to the out-of-memory crash after a few years. I agonized over doing the same tricks as Obsidian, like falling back to a simpler algorithm once things get too big, but I would instead argue this should be part of how the system is organized. Take it or leave it. Flat tags are not sustainable over a lifetime if you want fancy autocomplete.
It is ironic that I hated filesystem folders because they felt limiting, yet reintroduced them via hierarchical tags, and that ended up saving me. Joke's on me. Use folders.
Alternative solution
I wonder if it would make sense to group data by temporal proximity instead of hierarchy. Like in a given year a set of tags would be visible and cached, and everything in the more distant past would be considered archived. They would still be available, but only the last 1 year of tags would get actively retrieved for autocomplete. This would allow us to support flat tags, since only a subset of them would be retrieved.
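A sketch of that idea (purely hypothetical — nothing like this is implemented in lifelab):

```python
from datetime import date, timedelta

def active_tags(last_used: dict[str, date], today: date,
                window_days: int = 365) -> set[str]:
    """Temporal alternative to hierarchy: only tags touched within
    the window stay in the autocomplete cache; older tags are
    'archived' but remain queryable on demand."""
    cutoff = today - timedelta(days=window_days)
    return {tag for tag, used in last_used.items() if used >= cutoff}

today = date(2026, 3, 14)
active_tags({"gym": date(2026, 2, 1), "thesis": date(2021, 5, 9)}, today)
# → {'gym'}
```

The working set would then grow with how much you write per year, not with the total lifetime of the system, which is what flat tags would need to stay viable.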
In this exploration, I ran through a full computer science undergrad curriculum:
- Find a benchmark
- Try to game it for a quick win in an afternoon.
- Get obliterated by implementation reality.
- Rediscover CS concepts that I should have totally already known, like:
  - demand paging
  - the beauty of prefix trees and the query efficiency gained by having a hierarchical structure on your data
I will begrudgingly admit folders (namespaces) are very useful and delivered the log(n) lookups I so desired. I learned quite a bit about benchmarking and had fun with the testing harness. Doing this also made me appreciate the power of owning your own software. With the help of your friend Claude, you can tailor your software to serve you perfectly and never have to rely on some dumb ML vibe coder to do it properly for you. Also, online-only note-taking apps should have a really robust import system or they will get shamed.
P.S. I fixed the full-text search to take 38ms on average on the same dataset, down from 2.4s. I absolutely hate losing.
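For the curious, the fix is the textbook Postgres recipe — precompute a tsvector column and index it with GIN, so queries stop scanning the table (column names are illustrative, not lifelab's exact schema):

```sql
-- Standard Postgres full-text setup; names are illustrative.
ALTER TABLE blocks
    ADD COLUMN body_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX blocks_body_tsv_gin ON blocks USING gin (body_tsv);

-- The search now hits the GIN index instead of scanning every row.
SELECT page, body FROM blocks
WHERE body_tsv @@ plainto_tsquery('english', 'pagination');
```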