I spend an insane amount of time on the computer and have
for many years. I find software deeply fascinating. I feel
that there are secrets still hidden within the nature of
computing.
However, computing is just a piece of the puzzle. The world
is an integrated system - every part is of at least some
interest to me.
Most important are my wife Anna, Mr. Moose, Calvin, and our
family and friends.
AI Zeitgeist Explained
Posted May 7th, 2025
There's a mood toward AI, an intuition, a gut feel so many
people have but cannot quite explain. Here are three
statements that crystallize it:
Autonomy is more valuable than intelligence.
There is no substitute for the type of intelligence that
emerges from biological brains.
As long as a human mind is needed, it is beneficial for
the human mind to be intelligent. Once it is not
beneficial for the human mind to be intelligent, the
human mind is not needed.
JSON Columns are Technical Debt
Posted May 5th, 2025
On many occasions I've stuffed data into a JSON column
instead of a proper normalized schema.
Most of the time, I'm just trying to implement something
fast, for one reason or another. Rarely is there a true
technical justification.
However, there are good use cases for JSON columns:
truly unstructured data
when the number of fields exceeds or approaches the
database's column limit
There could be more, but the above are the reasons I've
encountered.
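To make the tradeoff concrete, here's a minimal sketch using
Bun's built-in sqlite client (which later posts on this site
lean on); the table and column names are made up for
illustration:

// A minimal sketch: the same "user settings" data as a JSON blob
// versus a normalized table. Names are hypothetical.
import { Database } from "bun:sqlite";

const db = new Database(":memory:");

// JSON column: fast to ship, but the shape of `settings` is invisible to the schema.
db.run("CREATE TABLE users_json (id INTEGER PRIMARY KEY, settings TEXT)");
db.query("INSERT INTO users_json (settings) VALUES (?)").run(
  JSON.stringify({ theme: "dark", emails: true }),
);

// Normalized: every field is declared, constrained, and indexable.
db.run(
  "CREATE TABLE user_settings (user_id INTEGER NOT NULL, theme TEXT NOT NULL, emails INTEGER NOT NULL)",
);
db.query("INSERT INTO user_settings (user_id, theme, emails) VALUES (?, ?, ?)").run(1, "dark", 1);

// Filtering on a field inside the JSON blob needs json_extract (and a scan);
// the normalized column takes a plain WHERE clause and can use an index.
console.log(
  db.query("SELECT id FROM users_json WHERE json_extract(settings, '$.theme') = 'dark'").all(),
  db.query("SELECT user_id FROM user_settings WHERE theme = 'dark'").all(),
);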
Single File Site
Posted May 4, 2025
My blog was previously a multi-page site statically
generated using my
Minimum Static Site Generator (minssg), with 5-20s from
commit to published. Yet, that's not simple enough for me.
It still needed the Bun runtime, a GitHub Action, template
syntax, etc.
Introducing single file site:
1 file, 1 page
no class names
no separate css files
no separate js files
no build
and a couple other things about this new site:
no LLM. Hand typing and googling things like the old
days still has huge value. Might write a post about this
topic.
HTML semantics, plain CSS, JS only when necessary
I suspect I could have 100s of blog posts on this page
before noticing any latency. And with modern HTTP streaming,
I might not even notice the latency because the first few
posts load instantly and the rest stream in below the
viewport.
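Purely as an illustration of that streaming behavior (not
something this site actually needs), here's a hypothetical
Bun server that flushes posts to the browser chunk by chunk;
the posts array and port are placeholders:

// Illustrative only: chunked streaming of a long single-page blog so the
// first posts render while the rest are still in flight.
const posts = ["<article>post 1</article>", "<article>post 2</article>"];
const enc = new TextEncoder();

Bun.serve({
  port: 3000, // placeholder
  fetch() {
    const stream = new ReadableStream({
      async start(controller) {
        controller.enqueue(enc.encode("<!doctype html><html><body>"));
        for (const post of posts) {
          controller.enqueue(enc.encode(post));
          await Bun.sleep(50); // pretend later posts take longer to produce
        }
        controller.enqueue(enc.encode("</body></html>"));
        controller.close();
      },
    });
    return new Response(stream, { headers: { "content-type": "text/html" } });
  },
});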
This desire for ultra minimalism is catalyzed by my day job
experience where it takes 10 minutes to deploy the simplest
frontend update. THOUSANDS of indirect dependencies, npm
vulnerability warnings, constant breaking changes, bugs
unfixed for years, abstraction lock-in, etc.
Recently came across a software company with over a dozen
patent numbers listed at the bottom of their website.
Looking at a few of them, I noticed a pattern.
Each patent filing includes a beefy PDF specification that's
around 40 pages long. The core idea is usually captured by
the abstract and perhaps a single diagram. The rest of the
PDF is filled with many different use case examples,
variations of the same diagram, and pages of text explaining
each diagram.
Most importantly I noticed that the core idea is very
high-level. The specification is NOT nearly enough
instruction to develop the actual thing. It takes
considerable time and technical effort to take one of these
ideas and create an actual working system.
So I'm left wondering:
Do these patents exist to protect ideas that would
otherwise be too easy to "steal" or re-invent
independently?
Is the idea worth more than the implementation?
Perhaps, but implementation is hard, and there are
almost always multiple ways of doing it, each with
pros and cons relative to the others. There's so much room
for competition just within implementation of the
same idea.
If I were to build a system similar to the one
described in one of these patents - where is the
line of infringement? What can be similar and what
must be different?
It is a challenging series of puzzles with human scores
around 80% while the state-of-the-art algorithm has achieved
only 43% as of August 2024.
ARC-AGI is explicitly designed to compare artificial
intelligence with human intelligence. To do this, ARC-AGI
explicitly lists the knowledge priors humans have, to provide
a fair ground for comparing AI systems. These core knowledge
priors are ones that humans naturally possess, even in
childhood. ... ARC-AGI avoids a reliance on any information
that isn’t part of these priors, for example acquired or
cultural knowledge, like language.
Discovering ARC-AGI reminded me of a thought experiment I
was obsessed with a few years ago: an idea of "Universal
Economic Value" (UEV).
Adjacent to universal basic income (UBI), UEV is the idea
that there is some way to capitalize on the basic value of
human intelligence to give everyone some baseline economic
worth.
Physical labor has historically been that source of value,
but we observe how wages stagnate while costs explode, and
we also observe how labor, of all types, becomes automated
over time.
ARC-AGI rekindled the idea by demonstrating there is still
some fundamental disparity between human and machine
intelligence.
What if any human could earn a wage by completing small
cognitive tasks within a larger information system?
Similar to how car manufacturing pipelines work - where
there are big complicated machines doing large portions -
alongside humans doing smaller, more intricate segments.
Like Wikipedia but queryable, highly structured,
programmatically interactive.
Like Wolfram Language but open source, open data, and a
simpler interface.
The schema of this database and query interface must be
intuitive, and grow at a significantly slower rate than the
data. If the schema and interface grow in complexity equal
to the data - then it adds little to no value. I'm not sure
if it's possible to find a schema and interface that are
static. I have to imagine at least in the beginning, as data
is added, new requirements of the schema and interface will
be discovered frequently.
A semi-automated or fully-automated ingestion pipeline will
be key to success. For comparison, Wolfram Language started
development over 30 years ago. There is a massive amount of
existing knowledge, but it is either unstructured or
structured differently. My hope is that scripts and language
model technology can aid in the import of knowledge into the
system from sources like Wikipedia, academic papers, open
source projects, text books, and more.
Large language models are in some sense a type of knowledge
engine - just with a different architecture than imagined
here. LLMs are neural, stochastic, evolved via gradient
descent, tokens in and tokens out. This is a very powerful
type of knowledge engine. However, the type of knowledge
engine I'm talking about is more classical, built on a
relational/graph database, with a query interface. The
advantages of the classical type of knowledge engine are
efficiency (speed, scale, energy), structure/determinism,
and interpretability. The beauty is that both types of
knowledge engines complement one another.
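To make the interaction model concrete, here is a purely
hypothetical TypeScript sketch; the class, methods, and
sample facts are invented for illustration and are not an
existing system:

// Hypothetical sketch only: a tiny in-memory "classical" knowledge engine
// with relational-style and graph-style access. Nothing here exists yet.

type Entity = { id: string; type: string; props: Record<string, unknown> };
type Relation = { from: string; kind: string; to: string };

class KnowledgeEngine {
  constructor(private entities: Entity[], private relations: Relation[]) {}

  // Relational-style: entities of a type matching simple property filters.
  find(type: string, where: Record<string, unknown> = {}): Entity[] {
    return this.entities.filter(
      (e) => e.type === type && Object.entries(where).every(([k, v]) => e.props[k] === v),
    );
  }

  // Graph-style: follow typed edges out of an entity.
  related(fromId: string, kind: string): Entity[] {
    return this.relations
      .filter((r) => r.from === fromId && r.kind === kind)
      .flatMap((r) => this.entities.filter((e) => e.id === r.to));
  }
}

// Tiny sample dataset; a real system would ingest this from Wikipedia,
// papers, textbooks, and so on.
const kb = new KnowledgeEngine(
  [
    { id: "element:helium", type: "ChemicalElement", props: { symbol: "He", period: 1 } },
    { id: "element:neon", type: "ChemicalElement", props: { symbol: "Ne", period: 2 } },
    { id: "person:ramsay", type: "Person", props: { name: "William Ramsay" } },
  ],
  [
    { from: "element:helium", kind: "discoveredBy", to: "person:ramsay" },
    { from: "element:neon", kind: "discoveredBy", to: "person:ramsay" },
  ],
);

// Deterministic, interpretable answers from structured data:
console.log(kb.find("ChemicalElement", { period: 2 }).map((e) => e.props.symbol)); // ["Ne"]
console.log(kb.related("element:neon", "discoveredBy").map((e) => e.props.name)); // ["William Ramsay"]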
One of the most important topics in the history of human
technology.
Cellular agriculture focuses on the production of
agricultural products from cell cultures using a combination
of biotechnology, tissue engineering, molecular biology, and
synthetic biology to create and design new methods of
producing proteins, fats, and tissues that would otherwise
come from traditional agriculture.[1] Most of the industry
is focused on animal products such as meat, milk, and eggs,
produced in cell culture rather than raising and
slaughtering farmed livestock which is associated with
substantial global problems of detrimental environmental
impacts (e.g. of meat production), animal welfare, food
security and human health.
— Wikipedia
Further scaling will require commercial production in
significantly larger facilities than what currently exists.
Scaling will also require solving an array of complex
challenges that will influence the cost of production. These
challenges span five key areas: cell lines, cell culture
media, bioprocess design, scaffolding, and end product
design and characterization.
Most programming languages drop you off the IO cliff, leave
you in the middle of Turing-complete nowhere, invoice you
after "it works on my machine".
We end up installing huge graphs of dependencies, some well
maintained, some not, some secure, some vulnerable, some
synergize with each other, many don't.
We end up paying for and fetching against third-party APIs
to do things our language cannot do, that no open source
package covers, or that our hardware cannot handle.
We end up setting up docker stacks, kubernetes clusters, "App
Platform" stacks, insert orchestration abstraction here,
composed of proprietary and open source images that are
wired together to have some emergent behavior.
We use postgres or mysql because we need persistence, but
it's too hard to manage a database ourselves, so we use a
managed database service in the cloud; those managed
services have bad DX and are limited, so we use PaaS
products that are wrappers over the same managed services.
We don't test our systems - or we orchestrate dependencies,
mocks, pipelines, and headless browsers wrapped in SaaS
products, giving developers who don't test a reason to
stick to their ways.
Some language ecosystems are better at X, don't have problem
Y, can do Z, can't do V, language U is better than language
P for B but not A, good for E but slow for anything else.
Language 123 is perfect and solves all problems but won't be
1.0 for 10 years. Language AAA solves 90% of all problems,
but it doesn't have any open source packages I can leech off
of to build an app - and I'll still be fired for using it in
production even if it can serve 10 billion hello world
requests per second.
We write our code on OS 1 but it doesn't work on OS 2, and
we search the internet for wrappers that will port it to
mobile, or we'd like to run it on an Arduino with 2KB of
flash memory while a hello world has a 4MB runtime overhead.
The reality
Making computers is insanely, insanely, insanely
complicated. Many, many, many humans are needed to build one
like we expect today. Software is also quite complicated.
Many, many humans are needed to develop the stuff we use
today. It is a shock that any of it works at all.
Having so many humans involved makes things even more
complicated. There is intrinsic complexity in computation
and engineering - and emergent complexity in so many humans
developing the technology in parallel.
Standards and protocols are something we humans developed to
wrangle the mess. k is a protocol - a way to talk to
computers and for computers to talk to each other.
The protocol
Humans talk to computers through characters over keys,
clicks over buttons, taps over screens, voice over
microphone.
Computers talk to each other through bits over wire or
wave.
Computers talk to humans through pixels, audio.
Every computer is different. Different hardware,
different software, different internal state, different
location, different connectivity.
Every human is different. Different body, different
cognition and behavior, different knowledge, different
location, different capability.
A computer should be aware of the knowledge,
capabilities, and constraints of computers it interacts
with.
A computer should be aware of the knowledge,
capabilities, and constraints of humans it interacts
with.
A human should be aware of the knowledge, capabilities,
and constraints of computers they interact with.
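Purely as an illustration (nothing below is part of an
actual k specification), that mutual awareness could be
imagined as small profile messages exchanged when a session
starts; every name here is invented:

// Hypothetical illustration only: one possible shape for the awareness
// described above. Nothing here is part of an actual k specification.

type Modality = "text" | "pixels" | "audio" | "keys" | "clicks" | "taps" | "voice";

interface Profile {
  kind: "computer" | "human";
  capabilities: string[]; // what this party knows how to do
  constraints: string[];  // what it cannot do, or must avoid
  inputs: Modality[];     // how it receives information
  outputs: Modality[];    // how it sends information
}

// At the start of a session, each side shares its profile so the other
// can adapt what it sends and how it sends it.
function usableChannels(sender: Profile, receiver: Profile): Modality[] {
  return sender.outputs.filter((m) => receiver.inputs.includes(m));
}

const laptop: Profile = {
  kind: "computer",
  capabilities: ["render-html", "play-audio"],
  constraints: ["sometimes-offline"],
  inputs: ["keys", "clicks", "voice"],
  outputs: ["pixels", "audio"],
};

const person: Profile = {
  kind: "human",
  capabilities: ["read-english"],
  constraints: ["no-headphones"],
  inputs: ["pixels"],
  outputs: ["keys", "clicks"],
};

console.log(usableChannels(laptop, person)); // ["pixels"] -> prefer visual output here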
I've long felt there could be a more idiomatic database for
JavaScript.
i.e. one that feels more like JavaScript and less like an
SQL interface
the bun (https://bun.sh) runtime has a built-in synchronous sqlite client that
is very ergonomic and very performant.
The bun client is based on
https://www.npmjs.com/package/better-sqlite3 so the hope
is that this library can be run on Node.js too by
configuring an adapter.
the synchronous nature of the sqlite client makes the
map and array operations feel like their vanilla JS
counterparts. No promises/async/await/try-catch.
as an alternative to node:cluster, one can use
Bun.spawn and Bun.serve({ reusePort: true }) to create
multiple bun processes that handle requests
concurrently (see the sketch at the end of this post).
sqlite WAL mode allows multiple concurrent readers and
the writer does not block the readers.
sqlite can be easily backed up and restored because it's
a single file (2 files if including the WAL file, which
is persisted on macOS)
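To ground the notes above, a minimal sketch of those
ergonomics with bun:sqlite; the file name and table are
placeholders:

// Minimal sketch of the synchronous ergonomics described above.
// File and table names are placeholders.
import { Database } from "bun:sqlite";

const db = new Database("app.sqlite");

// WAL mode: multiple concurrent readers, and the writer doesn't block them.
db.run("PRAGMA journal_mode = WAL;");

db.run("CREATE TABLE IF NOT EXISTS todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL)");

// Synchronous reads and writes: no promises, async/await, or try/catch
// ceremony just to touch local data.
db.query("INSERT INTO todos (title) VALUES (?)").run("write blog post");
const titles = db
  .query("SELECT title FROM todos")
  .all()
  .map((row: any) => row.title);

console.log(titles);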
Questions:
what are the performance/scaling characteristics? So far
it seems very good for most low-to-medium scale
applications. To help with scaling, it is easy to create
table partitions and even database partitions. More on
this to come.
how to handle the intercept/hooks feature across
multiple spawned bun processes? Because the same code is
running on each process, the same intercept/hook code
will execute. It's that simple.
how to achieve zero downtime deployments? I do not think
it is possible without the two container versions
sharing the same file system. However, I think a couple
minutes of downtime is acceptable especially during
low-traffic hours. Also, if application code is built
ontop a "serverless" runtime, instead of baked into the
container image, most deployments won't involve an
instance restart/swap. more on this to come.
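And the node:cluster alternative mentioned in the notes,
sketched with placeholder file names (the kernel-level load
balancing from reusePort applies on Linux):

// cluster.ts (placeholder name): spawn one server process per CPU
// as an alternative to node:cluster.
import { cpus } from "node:os";

for (const _ of cpus()) {
  Bun.spawn(["bun", "server.ts"], { stdout: "inherit", stderr: "inherit" });
}

// server.ts (placeholder name): every spawned process runs this; reusePort
// lets them all bind port 3000, and on Linux the kernel spreads incoming
// connections across them.
Bun.serve({
  port: 3000,
  reusePort: true,
  fetch() {
    return new Response(`handled by pid ${process.pid}\n`);
  },
});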
Just completed the first pass at writing a basic TCP relay
server "Written in Rust"™️®. Getting Rust programs to
compile feels similar to doing a newspaper sudoku.
It's been a good learning experience.
The purpose of this TCP relay server is to eventually act as
a relay for replicas in a distributed pubsub cluster.
Replicas that receive a message from client A and need to
broadcast to client B that is connected to another replica
will broadcast the message through the relay.
Without the relay server, the replicas would need to connect
to each other directly, which is a possible
alternative btw.
Update on August 8, 2024: I'm thinking about
jsolite instead.
Can S3 be used in place of a database? Should S3 be used in
place of a database?
I saw a comment on Stack Overflow: "s3 wasn't designed for
that". Perhaps they're right, but this is just an exercise
at this point. If it were to work out and actually be good
for some situations - then fantastic! If it sucks - then I
know more about databases and distributed systems.
Notes:
Each replica of the s3db server will post a file to s3
with its IP address. Each replica periodically lists
the IPs and initiates a TCP connection to any replica
that has a lesser IP than its own. This way all replicas
will automatically discover and connect to each other (a
sketch of this loop follows these notes).
S3 is strongly consistent but there is no native locking
feature, so locking has to be implemented in the s3db
server logic. This is probably going to be the greatest
source of complexity and latency within the entire
system.
I have some thoughts on how locking might work, but I'll
write more about that later.
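A rough sketch of that discovery loop, assuming the AWS SDK
v3 S3 client; the bucket, prefix, IP, and connect() stub are
placeholders:

// Rough sketch of the replica discovery described above. Bucket name,
// key prefix, IP, and the connect() stub are placeholders.
import { S3Client, PutObjectCommand, ListObjectsV2Command } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "s3db-cluster"; // placeholder
const PREFIX = "replicas/";    // one key per live replica
const MY_IP = "10.0.0.7";      // placeholder; discovered at startup in practice

// 1. Announce this replica by writing a key containing its IP.
await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: `${PREFIX}${MY_IP}`, Body: "" }));

// 2. Periodically list peers and connect to every replica with a lesser IP,
//    so each pair of replicas ends up with exactly one connection.
async function discover() {
  const res = await s3.send(new ListObjectsV2Command({ Bucket: BUCKET, Prefix: PREFIX }));
  const peers = (res.Contents ?? [])
    .map((obj) => obj.Key!.slice(PREFIX.length))
    .filter((ip) => ip < MY_IP); // lexicographic; real code would compare numerically

  for (const ip of peers) connect(ip);
}

function connect(ip: string) {
  // placeholder: open (or reuse) a TCP connection to the peer replica
  console.log(`would connect to replica at ${ip}`);
}

setInterval(discover, 10_000); // "periodically", per the note above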
Today I'm starting a thought diet. Here are the guidelines:
I can only think about computer stuff while I'm at the
computer (unless I specifically get up from the computer
to pace around the room over a problem)
Avoid simulating the past, future or possible realities
Focus on smells, sounds, tastes, feelings, and entities
around me
I'll keep this page updated (I think) as I go.
A few days later: I'm doing ok with it. I have violated the
first guideline several times. I give myself slack; it's a
long-term lifestyle goal, not a fad.