Journal: 2025-11-28
What is LifeLab? Techie version
LifeLab is also two things:
1. Ode to databases
Dinosaur era tech
My longtime friend sixhobbits always harps on about just using a database, and I finally took their words literally. LifeLab is centered around a Postgres database with a REST API. Everything lives in a main table of blocks with jsonb metadata. No document databases, no CRDTs, no graph databases, just PostgreSQL. There are some operations on top, but most boil down to a wrapper around SQL. The goal was to make everything work with one database and as few tables as possible. For example, even the CSS theming is just a data block in the database. We query all CSS blocks and load whichever one has "css_enabled" in its metadata as the active stylesheet. When we run code cells, we can query the same database for the active CSS and use it to theme matplotlib charts!
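The "theme is just a block" idea above can be sketched in a few lines. The real app uses Postgres with a jsonb metadata column (where a containment filter like `metadata @> '{"css_enabled": true}'` would do the work in SQL); here sqlite3 stands in so the sketch is self-contained, the filtering happens in Python, and the table and column names are my assumptions, not the actual schema.

```python
import json
import sqlite3

# Toy stand-in for the blocks table: metadata is JSON text here, jsonb in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocks (id INTEGER PRIMARY KEY, block_type TEXT, content TEXT, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO blocks (block_type, content, metadata) VALUES (?, ?, ?)",
    [
        ("data", "body { background: #111; }", json.dumps({"kind": "css"})),
        ("data", "body { background: #fdf6e3; }", json.dumps({"kind": "css", "css_enabled": True})),
    ],
)

def active_css(conn):
    """Return the stylesheet of the one CSS block flagged css_enabled."""
    rows = conn.execute("SELECT content, metadata FROM blocks WHERE block_type = 'data'")
    for content, meta in rows:
        m = json.loads(meta)
        if m.get("kind") == "css" and m.get("css_enabled"):
            return content
    return None

print(active_css(conn))  # only the second block is enabled
```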
I was fairly liberal with indices, trading memory consumption for speed, but I am fairly confident the database will outlast a lifetime's worth of notes with sub-second queries. The closest app is Trilium Notes, which is built around an SQLite database with embeddable JavaScript; I just did not like the user interface so much.
Blocks is all you need
The schema is fairly simple: a table for blocks. Blocks have content, metadata, links, and a block type (data, markdown, code, task). A second table holds pages, autogenerated from block links. A single block contains an entire cell (for example, this whole article is one block, and a task is also one block). I thought about this heterogeneous data; maybe it would be better for each block type to have its own table? But then I would have to do multi-table joins. My 2 CS degrees were not sufficient to make the call, so I opted for the conceptually simpler option. I am sorry.
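A minimal sketch of that two-table shape, again with sqlite3 standing in for Postgres so it runs anywhere. The column names and the CHECK constraint are my guesses at the described schema, not the actual DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blocks (
    id         INTEGER PRIMARY KEY,
    block_type TEXT NOT NULL CHECK (block_type IN ('data', 'markdown', 'code', 'task')),
    content    TEXT,
    metadata   TEXT,   -- jsonb in Postgres
    links      TEXT    -- outgoing block links; a jsonb array in Postgres
);
-- pages are autogenerated from block links, so they only point back at blocks
CREATE TABLE pages (
    id       INTEGER PRIMARY KEY,
    title    TEXT UNIQUE,
    block_id INTEGER REFERENCES blocks(id)
);
""")

conn.execute("INSERT INTO blocks (block_type, content) VALUES ('task', 'write journal entry')")
row = conn.execute("SELECT block_type, content FROM blocks").fetchone()
print(row)  # the one task block, whole cell in one row
```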
DIY RAG
Everyone was obsessed with RAG and embedding databases. I caved and added a pgvector column for custom embeddings on pages. The idea is to evaluate various embedding approaches, models, and strategies on my own data and see how they behave, because I felt most benchmarks are rather vague. Do I average embeddings over all blocks? Do I feed all blocks into a single model? What is the difference? Can we do cross-modal embeddings for image pages and text pages?
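One of the strategies in question, a page embedding as the mean of its block embeddings, is simple enough to sketch. Plain Python lists stand in for pgvector vectors, and the 2-dimensional values are toy examples, not real model output.

```python
# Average per-block embeddings into one page-level vector.
def mean_embedding(block_embeddings):
    dim = len(block_embeddings[0])
    n = len(block_embeddings)
    return [sum(vec[i] for vec in block_embeddings) / n for i in range(dim)]

# Cosine similarity, the usual pgvector comparison for retrieval.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

page = mean_embedding([[1.0, 0.0], [0.0, 1.0]])
print(page)                          # [0.5, 0.5]
print(round(cosine(page, [1.0, 1.0]), 6))  # 1.0: same direction as the sum of its blocks
```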
2. Over-engineered to hell
Rust backend
The LifeLab backend was vibe-written in Rust despite being just a simple web server. I could probably have written a (worse) FastAPI server by myself, but ended up spending $100 worth of Claude Code tokens to write it in Rust. I initially thought that for a REST API it wouldn't matter, but the compiler actually found a lot of bugs and prevented Claude Code from getting away with slop. For vibe-coded Python I had to be fairly careful reviewing changes, but somehow for Rust, if it compiles, it just works?!
Python kernel
LifeLab includes a Python kernel for code execution. Data flows between Rust and Python via Apache Arrow's binary serialization. Do I need this to query a bunch of task blocks and show them? Probably not. There is also a generic JSON-RPC interface for integrating any scripting language to extend the system. As long as you define the notebook bindings, the Rust backend handles all database operations. Right now there is support for Rhai, just because I wanted to learn it, but I could add Datalog and Racket just because I can.
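To make the scripting-language story concrete, here is what one JSON-RPC 2.0 round trip might look like. The method name, params, and the toy dispatcher are all invented for illustration; the real bindings are whatever the Rust backend defines.

```python
import json

# Hypothetical request from a scripting-language client to the backend.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "execute_cell",          # invented method name
    "params": {"block_id": 42, "code": "print(2 + 2)"},
}
wire = json.dumps(request)             # what actually travels over the pipe

def handle(raw):
    """Toy dispatcher standing in for the Rust side."""
    msg = json.loads(raw)
    if msg["method"] != "execute_cell":
        return {"jsonrpc": "2.0", "id": msg["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    result = {"stdout": "4\n"}          # pretend execution of the cell
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}

print(handle(wire))
```

The point of the shape: a kernel only needs to speak this envelope, and the backend owns every database touch.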
Analysis libraries
Python has Hugging Face and matplotlib installed by default, but you can define another data block with Python requirements and install them on demand.
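A sketch of what "requirements in a data block" could look like: read the block's content as a requirements.txt-style list and turn it into a pip invocation. The block format, package names, and version pin are all assumptions; the real app may parse something different, and this sketch only builds the command rather than running it.

```python
import sys

# Content of a hypothetical requirements data block.
requirements_block = "polars==1.9.0\nseaborn\n"

def pip_command(block_content):
    """Turn a requirements-style block into an on-demand pip install command."""
    pkgs = [line.strip() for line in block_content.splitlines() if line.strip()]
    return [sys.executable, "-m", "pip", "install", *pkgs]

cmd = pip_command(requirements_block)
print(cmd[-2:])  # ['polars==1.9.0', 'seaborn']
```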
Passkeys, PWA, all the bling
I had never done a full-stack app before, so there was a lot of learning about authentication involved.
O(n) OCD
I tried to make it snappy: computations run only on demand, with aggressive caching. One thing I am not happy about in other note-taking apps is the data schema. Notion is block-based, but every character run is a block, which makes for constant database joins. A paragraph can be something like 50 blocks, which just feels weirdly wrong to me. (Also, most block-based editors have this ever-so-slight latency which drives me nuts.) On the other hand, if you want to get tasks in Emacs, you need to parse the text with regexes. Now this is fast as fuck, but it is also O(n) in document size. Querying tasks or JSON objects in LifeLab and Notion is an indexed lookup, O(log n) at worst: no parsing text files, no temporary databases, just a single source of truth with some B-trees.
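The contrast above in miniature: regex-scanning a text buffer for tasks touches every line every time, while a maintained index answers in one lookup. The org-ish TODO syntax and the dict-shaped index are illustrative stand-ins for Emacs parsing and a database index, respectively.

```python
import re

# A "file" with 1000 lines of prose and two tasks buried at the end.
doc = "\n".join(["some prose"] * 1000 + ["TODO buy milk", "TODO file taxes"])

# Emacs-style: rescan the whole text on every query -- O(n).
def tasks_by_regex(text):
    return re.findall(r"^TODO (.+)$", text, flags=re.MULTILINE)

# Index-style: the index is kept up to date on write,
# so reads never touch the 1000 prose lines.
index = {"task": ["buy milk", "file taxes"]}

def tasks_by_index(idx):
    return idx.get("task", [])

print(tasks_by_regex(doc) == tasks_by_index(index))  # True: same answer, very different cost
```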
I tried to kill as many linear scans as possible with SQL wrappers. I am fairly confident I could not implement a faster text data structure by myself. Does it matter for a text platform? Probably not. When code snippets create blocks, everything is collected by the Rust endpoint and executed as a single transaction instead of hitting the database in a for loop.
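The single-transaction idea sketched with sqlite3 standing in for Postgres and the same toy blocks table as before: all the inserts a code cell produces commit (or roll back) together, instead of one commit per block in a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY, block_type TEXT, content TEXT)")

# Pretend a code cell produced 100 new blocks.
new_blocks = [("markdown", f"cell output {i}") for i in range(100)]

with conn:  # one transaction: all 100 rows land together, or none do
    conn.executemany("INSERT INTO blocks (block_type, content) VALUES (?, ?)", new_blocks)

count = conn.execute("SELECT count(*) FROM blocks").fetchone()[0]
print(count)  # 100
```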