Journal: 2025-11-28
What is LifeLab? Techie version
LifeLab is also two things:
1. Ode to databases
Dinosaur era tech
My longtime friend sixhobbits always harps on about just using a database, and I finally took their words literally. LifeLab is centered around a Postgres database with a REST API. Everything lives in a main table of blocks with jsonb metadata. No document databases, no CRDTs, no graph databases, just PostgreSQL. There are some operations on top, but most boil down to a wrapper around SQL. The goal was to make everything work with one database and as few tables as possible. For example, even the CSS theming is just a data block in the database. We query all CSS blocks and load whichever one has "css_enabled" in its metadata as the active stylesheet. When we run code cells, we can query the same database for the active CSS and use it to theme matplotlib charts!
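The "theme is just a block" idea above can be sketched in a few lines. The real app uses Postgres with a jsonb metadata column (where a containment filter like `metadata @> '{"css_enabled": true}'` would do the work in SQL); here sqlite3 stands in so the sketch is self-contained, the filtering happens in Python, and the table and column names are my assumptions, not the actual schema.

```python
import json
import sqlite3

# Toy stand-in for the blocks table: metadata is JSON text here, jsonb in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocks (id INTEGER PRIMARY KEY, block_type TEXT, content TEXT, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO blocks (block_type, content, metadata) VALUES (?, ?, ?)",
    [
        ("data", "body { background: #111; }", json.dumps({"kind": "css"})),
        ("data", "body { background: #fdf6e3; }", json.dumps({"kind": "css", "css_enabled": True})),
    ],
)

def active_css(conn):
    """Return the stylesheet of the one CSS block flagged css_enabled."""
    rows = conn.execute("SELECT content, metadata FROM blocks WHERE block_type = 'data'")
    for content, meta in rows:
        m = json.loads(meta)
        if m.get("kind") == "css" and m.get("css_enabled"):
            return content
    return None

print(active_css(conn))  # only the second block is enabled
```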
I was fairly liberal with indices, trading memory consumption for speed, but I am fairly confident the database will outlast a lifetime's worth of notes with sub-second queries. The closest app is Trilium Notes, which is built around an SQLite database with embeddable JavaScript; I just did not like the user interface so much.
Blocks is all you need
The schema is fairly simple: a table for blocks. Blocks have content, metadata, links, and a block type (data, markdown, code, task). A second table holds pages, autogenerated from block links. A single block contains an entire cell (for example, this whole article is one block, and a task is also one block). I thought about this heterogeneous data; maybe it would be better for each block type to have its own table? But then I would have to do multi-table joins. My 2 CS degrees were not sufficient to make the call, so I opted for the conceptually simpler option. I am sorry.
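A minimal sketch of that two-table shape, again with sqlite3 standing in for Postgres so it runs anywhere. The column names and the CHECK constraint are my guesses at the described schema, not the actual DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blocks (
    id         INTEGER PRIMARY KEY,
    block_type TEXT NOT NULL CHECK (block_type IN ('data', 'markdown', 'code', 'task')),
    content    TEXT,
    metadata   TEXT,   -- jsonb in Postgres
    links      TEXT    -- outgoing block links; a jsonb array in Postgres
);
-- pages are autogenerated from block links, so they only point back at blocks
CREATE TABLE pages (
    id       INTEGER PRIMARY KEY,
    title    TEXT UNIQUE,
    block_id INTEGER REFERENCES blocks(id)
);
""")

conn.execute("INSERT INTO blocks (block_type, content) VALUES ('task', 'write journal entry')")
row = conn.execute("SELECT block_type, content FROM blocks").fetchone()
print(row)  # the one task block, whole cell in one row
```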
DIY RAG
Everyone was obsessed with RAG and embedding databases. I caved and added a pgvector column for custom embeddings on pages. The idea is to evaluate various embedding approaches, models, and strategies on my own data and see how they behave, because I felt most benchmarks are rather vague. Do I average embeddings over all blocks? Do I feed all blocks into a single model? What is the difference? Can we do cross-modal embeddings for image pages and text pages?
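One of the strategies in question, a page embedding as the mean of its block embeddings, is simple enough to sketch. Plain Python lists stand in for pgvector vectors, and the 2-dimensional values are toy examples, not real model output.

```python
# Average per-block embeddings into one page-level vector.
def mean_embedding(block_embeddings):
    dim = len(block_embeddings[0])
    n = len(block_embeddings)
    return [sum(vec[i] for vec in block_embeddings) / n for i in range(dim)]

# Cosine similarity, the usual pgvector comparison for retrieval.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

page = mean_embedding([[1.0, 0.0], [0.0, 1.0]])
print(page)                          # [0.5, 0.5]
print(round(cosine(page, [1.0, 1.0]), 6))  # 1.0: same direction as the sum of its blocks
```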
2. Over-engineered to hell
Rust backend
The LifeLab backend was vibe-written in Rust despite being just a simple web server. I could probably have written a (worse) FastAPI server by myself, but ended up spending $100 worth of Claude Code tokens to write it in Rust. I initially thought that for a REST API it wouldn't matter, but the compiler actually found a lot of bugs and prevented Claude Code from getting away with slop. For vibe-coded Python I had to be fairly careful reviewing changes, but somehow for Rust, if it compiles, it just works?!
Python kernel
LifeLab includes a Python kernel for code execution. Data flows between Rust and Python via Apache Arrow's binary serialization. Do I need this to query a bunch of task blocks and show them? Probably not. There is also a generic JSON-RPC interface for integrating any scripting language to extend the system. As long as you define the notebook bindings, the Rust backend handles all database operations. Right now there is support for Rhai, just because I wanted to learn it, but I could add Datalog and Racket just because I can.
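To make the scripting-language story concrete, here is what one JSON-RPC 2.0 round trip might look like. The method name, params, and the toy dispatcher are all invented for illustration; the real bindings are whatever the Rust backend defines.

```python
import json

# Hypothetical request from a scripting-language client to the backend.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "execute_cell",          # invented method name
    "params": {"block_id": 42, "code": "print(2 + 2)"},
}
wire = json.dumps(request)             # what actually travels over the pipe

def handle(raw):
    """Toy dispatcher standing in for the Rust side."""
    msg = json.loads(raw)
    if msg["method"] != "execute_cell":
        return {"jsonrpc": "2.0", "id": msg["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    result = {"stdout": "4\n"}          # pretend execution of the cell
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}

print(handle(wire))
```

The point of the shape: a kernel only needs to speak this envelope, and the backend owns every database touch.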
Analysis libraries
Python has Hugging Face and matplotlib installed by default, but you can define another data block with Python requirements and install them on demand.
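A sketch of what "requirements in a data block" could look like: read the block's content as a requirements.txt-style list and turn it into a pip invocation. The block format, package names, and version pin are all assumptions; the real app may parse something different, and this sketch only builds the command rather than running it.

```python
import sys

# Content of a hypothetical requirements data block.
requirements_block = "polars==1.9.0\nseaborn\n"

def pip_command(block_content):
    """Turn a requirements-style block into an on-demand pip install command."""
    pkgs = [line.strip() for line in block_content.splitlines() if line.strip()]
    return [sys.executable, "-m", "pip", "install", *pkgs]

cmd = pip_command(requirements_block)
print(cmd[-2:])  # ['polars==1.9.0', 'seaborn']
```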
Passkeys, PWA, all the bling
I had never done a full-stack app before, so there was a lot of learning about authentication involved.
O(n) OCD
I tried to make it snappy: computations run only on demand, with aggressive caching. One thing I am not happy about in other note-taking apps is the data schema. Notion is block-based, but every character run is a block, which makes for constant database joins. A paragraph can be something like 50 blocks, which just feels weirdly wrong to me. (Also, most block-based editors have this ever-so-slight latency which drives me nuts.) On the other hand, if you want to get tasks in Emacs, you need to parse the text with regexes. Now this is fast as fuck, but it is also O(n) in document size. Querying tasks or JSON objects in LifeLab and Notion is an indexed lookup, O(log n) at worst: no parsing text files, no temporary databases, just a single source of truth with some B-trees.
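The contrast above in miniature: regex-scanning a text buffer for tasks touches every line every time, while a maintained index answers in one lookup. The org-ish TODO syntax and the dict-shaped index are illustrative stand-ins for Emacs parsing and a database index, respectively.

```python
import re

# A "file" with 1000 lines of prose and two tasks buried at the end.
doc = "\n".join(["some prose"] * 1000 + ["TODO buy milk", "TODO file taxes"])

# Emacs-style: rescan the whole text on every query -- O(n).
def tasks_by_regex(text):
    return re.findall(r"^TODO (.+)$", text, flags=re.MULTILINE)

# Index-style: the index is kept up to date on write,
# so reads never touch the 1000 prose lines.
index = {"task": ["buy milk", "file taxes"]}

def tasks_by_index(idx):
    return idx.get("task", [])

print(tasks_by_regex(doc) == tasks_by_index(index))  # True: same answer, very different cost
```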
I tried to kill as many linear scans as possible with SQL wrappers. I am fairly confident I could not implement a faster text data structure by myself. Does it matter for a text platform? Probably not. When code snippets create blocks, everything is collected by the Rust endpoint and executed as a single transaction instead of hitting the database in a for loop.
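The single-transaction idea sketched with sqlite3 standing in for Postgres and the same toy blocks table as before: all the inserts a code cell produces commit (or roll back) together, instead of one commit per block in a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY, block_type TEXT, content TEXT)")

# Pretend a code cell produced 100 new blocks.
new_blocks = [("markdown", f"cell output {i}") for i in range(100)]

with conn:  # one transaction: all 100 rows land together, or none do
    conn.executemany("INSERT INTO blocks (block_type, content) VALUES (?, ?)", new_blocks)

count = conn.execute("SELECT count(*) FROM blocks").fetchone()[0]
print(count)  # 100
```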