vineri, 12 decembrie 2025

 <optimization>

Assume you have several Docker containers running PostgreSQL — for example, postgres1, postgres2, and so on — and you want to dump their databases and copy them to another server. Doing this manually for each container quickly becomes tedious. Here’s a streamlined approach I use.

Instead of dumping each container manually, combine docker exec with xargs to iterate over them:

docker ps --filter "name=postgres" --format "{{.Names}}" | xargs -I {} sh -c 'docker exec {} pg_dumpall -U postgres > {}.sql'

(The sh -c wrapper matters: without it, the shell processes the redirection before xargs substitutes {}, so every dump would land in a single literal file named {}.sql.)

You can then use xargs with scp to transfer all dumps efficiently:

ls postgres*.sql | xargs -I {} scp {} user@remote-server:/path/to/dumps/
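The redirection subtlety in the dump command is easy to demonstrate without Docker. A minimal sketch, using printf as a stand-in for `docker exec ... pg_dumpall`:

```shell
cd "$(mktemp -d)"

# Naive form: the parent shell opens "{}.sql" BEFORE xargs substitutes {},
# so both "dumps" land in one literal file named {}.sql.
printf 'pg1\npg2\n' | xargs -I {} printf 'dump of %s\n' {} > '{}.sql'

# Correct form: the redirection happens inside sh -c, after substitution,
# producing one file per "container": pg1.sql and pg2.sql.
printf 'pg1\npg2\n' | xargs -I {} sh -c 'printf "dump of %s\n" {} > {}.sql'
```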

 

How I tracked down a thread block that looked like a database problem

 <method>

Recently I had a web application that behaved as if it were quietly freezing from the inside out. Threads were piling up, each one waiting forever, never terminating, like cars stuck behind an invisible traffic light that never turns green.

At first glance, the symptom looked straightforward: the threads were blocked at the database level. They were trying to update a table, couldn’t acquire a lock, and ended up waiting indefinitely. Because only operations related to a single element were stuck, I assumed it wasn’t a full table lock but a row lock. Unfortunately, I couldn’t confirm this, because by the time I saw the issue, the application had already been restarted, clearing all locks.

Still, the database block felt like a consequence, not the root cause. My job was to find the first thread that got stuck. That original thread triggered the row lock; all other threads simply joined the queue behind it.

So I needed a way to reconstruct the story from logs alone.

My method was simple:

  1. Export all logs from Graylog for the timeframe into a CSV.
  2. Instead of analysing it with Python/Pandas, I chose a quicker path: upload the CSV into a Postgres table with matching columns.
  3. Query the table to find the last time each thread wrote a log line. If a thread is blocked, you stop seeing it. And there it was: the first thread that went silent.
  4. Check which endpoint it called and with what parameters. A second voilà: the request was trying to move a folder under itself.
  5. In parallel I checked the endpoint code with Cursor, and it clearly showed several recursive branches that could loop forever.
  6. Reproduce: call the endpoint with parameters that move a folder under itself. Instant block.
  7. Fix: add validation to prevent this. 
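Step 3 above boils down to a single GROUP BY. A minimal sketch of the idea, using Python's built-in sqlite3 as a stand-in for the Postgres table (the column names `thread` and `ts` are illustrative assumptions; a real Graylog CSV export has its own):

```python
import sqlite3

# In-memory stand-in for the Postgres table loaded from the Graylog CSV.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (thread TEXT, ts TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("worker-1", "2025-11-10 10:00:01", "handling request"),
        ("worker-1", "2025-11-10 10:00:05", "still alive"),
        ("worker-2", "2025-11-10 10:00:02", "handling request"),
        # worker-2 goes silent here: it is the first blocked thread.
        ("worker-3", "2025-11-10 10:00:03", "handling request"),
        ("worker-3", "2025-11-10 10:00:09", "still alive"),
    ],
)

# Last log line per thread; the earliest "last line" belongs to the first
# thread that went silent, i.e. the prime suspect for the original lock.
rows = conn.execute(
    "SELECT thread, MAX(ts) AS last_seen FROM logs "
    "GROUP BY thread ORDER BY last_seen"
).fetchall()
print(rows[0][0])  # → worker-2, the first thread to go silent
```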

The Moment of Contradiction: The Point Where Learning Begins

Necessity... if you have a need, you want to move, and you learn.
If you have no need, you don't want to move from the state you are in, and you don't learn.

It is one thing to talk about something, but when you practice it you reach the moment of contradiction, when what you know no longer helps you solve the problem.
The first reaction is to get annoyed.
But if you keep going, you force yourself to change, to make a modification or to accumulate something new.

When Learning Jumps Levels: Rethinking the Path from Simple to Advanced

I often say that complex topics need to be approached step by step, so that the reader understands how we arrived there: from simple to complex, following one of the core principles of education, where sequencing matters.
And indeed, the order does matter. But it seems to me that some concepts, even though they are more advanced than earlier ones and should logically come later, actually mark the start of a different stage of thinking and could therefore deserve to be introduced first.

I often think about this: when a problem appears, the first solution that comes to mind is usually the trivial one. But sometimes it would be better if the advanced, optimal solution came to mind first.

***

 From https://youtu.be/X1HSGEADAhE?t=1573 : 

***

We even teach student programmers in their first years, when they still don’t understand what programming is and are seeing a computer for the first time. Many of them we teach to program in C or C++.

And then, once they’ve learned C or C++, a year or two passes, and in their third or fourth year, or even later when they’ve already become programmers, we start retraining or further educating them in proper design.

We tell them:
"You know, objects are actually certain entities, certain abstractions inside your program. And these objects should have the property of encapsulation. They should hide information."

The programmer—well, the C-style programmer—looks at this and it’s all new to them.
"What do you mean, hide information? How is it that they should encapsulate data and not expose it?"

Well, in C, everything is exposed. In C, there’s global state basically everywhere.
Sure, it’s not encouraged, but people still do it. Global variables are a common practice.

And suddenly, in their third or fourth year—or even in their fifteenth year of working as a programmer—they’re told that everything should have been done differently in OOP. And that’s the problem.

So Richard proposes:
"Let’s introduce them to OOP in the first year—before we even teach them how to program, before they write their first program. Let’s explain what OOP is, why objects are needed, what encapsulation and polymorphism are."

He believes they can understand it—even if it’s with sticks and pictures, with some auxiliary or educational languages.

***
 

joi, 13 noiembrie 2025

Avoiding the Planning Trap: Why N×M is a Planning Disaster (The Cartesian Product of Tasks)

I’ve been thinking a lot about planning lately. I realized I’ve fallen into the same trap many times, and understanding the root cause has made all the difference.

My first impulse was always to over-organize. I’d create a neat system:

  •  I'd create a separate folder and file for each family of tasks (e.g., "Work," "Personal," "Learning"). It felt great at the beginning—so structured! But after a few days, it became a hassle.
  •  The same feeling occurred when I tried building a complex goalscape, with first-level goals for task families, and then another level of goals for each individual task.

In both cases, I was creating a system that demanded too much energy just to maintain. 

*** 

I was unknowingly introducing an unnecessary dimension to my planning.

When I used multiple files, it felt like I was generating a final complexity of N x M, where N is the number of files (or task families) and M is the total number of tasks. 

*** 

Then, I tried the opposite: one single file.

In this file, I created a section for each task family, and a simple Table of Contents (ToC) at the beginning. Each task lives in its relevant section.

Immediately, the system simplified. By removing the file-switching dimension, the complexity was converted to a simple line: M, the total number of tasks. 

***

This is exactly like when I analyze code.

If I am analyzing a method x() with N lines, and it calls another method y() with M lines, it’s easy to get lost by constantly jumping: analyze a line in x(), then go deep into y(), then jump back to x(), and so on.
 

The better approach is to stay at the level of x(), or to finish analyzing y() completely before returning; either way, avoid the inefficient back-and-forth.

 

duminică, 19 octombrie 2025

PostgreSQL: ERROR: missing chunk number 0 for toast value 123456 in pg_toast_2619

Problem: After database migration

SELECT * FROM <table_name>

fails with the following message:

ERROR: missing chunk number 0 for toast value 123456 in pg_toast_2619


At first, I found https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a and https://newbiedba.wordpress.com/2015/07/07/postgresql-missing-chunk-0-for-toast-value-in-pg_toast/, but I could not isolate the corrupted row: every query of the following form
SELECT * FROM <table_name> ORDER BY id LIMIT 5000 OFFSET 0;
SELECT * FROM <table_name> ORDER BY id LIMIT 5000 offset 5000;
... 
even
SELECT * FROM <table_name> ORDER BY id LIMIT 1;
failed with the same error message.

A second article (see https://fluca1978.github.io/2021/02/08/PostgreSQLToastCorruption), together with a set of Postgres functions for detecting corrupted TOAST entries (see https://gitlab.com/fluca1978/fluca1978-pg-utils/-/blob/master/examples/toast/find_bad_toast.sql), showed me how to find the table backed by `pg_toast_2619`, since it was already clear that the problem was not in my <table_name>:
SELECT reltoastrelid, reltoastrelid::regclass, pg_relation_filepath( reltoastrelid::regclass ), relname
FROM   pg_class
WHERE  reltoastrelid = 'pg_toast.pg_toast_2619'::regclass
  AND  relkind = 'r';
The result was pg_statistic, and I ran the inverse query to double-check:
SELECT reltoastrelid, reltoastrelid::regclass, pg_relation_filepath( reltoastrelid::regclass ), relname
FROM   pg_class
WHERE  relname = 'pg_statistic'
  AND  relkind = 'r';
the result was pg_toast_2619. Hence the problem was not in <table_name> but in pg_statistic, more precisely in the TOAST table of pg_statistic. But why did SELECT * FROM <table_name> trigger this error? Because Postgres reads pg_statistic to plan the query. Meanwhile, a colleague of mine
found https://www.ongres.com/blog/solving-postgresql-statistic-corruption/. This article shows how to deal with such cases and also notes that pg_toast_2619 is effectively constant across Postgres installations (pg_statistic is a system catalog with a fixed OID, 2619).
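For reference, the approach in that article rests on the fact that pg_statistic is derived data: ANALYZE rebuilds it. A blunt sketch of the idea (the article shows a more targeted variant that deletes only the affected rows; this must be run as a superuser, and a backup first is wise):

```sql
-- pg_statistic is rebuilt by ANALYZE, so its corrupted rows can be deleted.
DELETE FROM pg_statistic;

-- Regenerate planner statistics for the whole database.
ANALYZE;
```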

What I learned from this problem:
  1. How to find a table by its corresponding TOAST table.
  2. There are hidden actions behind a simple SELECT * FROM <table_name>: PostgreSQL may first read another table (here, pg_statistic) before reading `<table_name>` itself. Not everything is as it seems at first sight.

PostgreSQL: ERROR: Key (id)=(4703) is duplicated. could not create unique index ""

Problem: After database migration
REINDEX TABLE <table_name>;
fails with the following message:

ERROR:  Key (id)=(4703) is duplicated. could not create unique index "<table_name>"

It's very strange because a simple
SELECT * FROM <table_name> WHERE id = 4703;
returns only one row. So where does REINDEX find the second (or more) row with id = 4703?

After searching Google for "Key (id)=(4703) is duplicated. could not create unique index", I found
https://stackoverflow.com/questions/45315510/postgres-could-not-create-unique-index-key-is-duplicated,
which inspired me to try
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
SET LOCAL enable_indexonlyscan = off;
SELECT * FROM <table_name> WHERE id = 4703;
COMMIT;
(Note: SET LOCAL only takes effect inside a transaction block, hence the BEGIN/COMMIT.)
This query returned two rows. To understand the difference in behavior, run both queries with EXPLAIN (i.e. EXPLAIN SELECT * FROM <table_name> ...): the first query uses the index to locate the row, and the second row is simply not present in the index, which is why we don't see it. The second query scans the table physically and therefore finds both rows.

Why there were two rows with the same id is another question. My assumption is that one row was marked as deleted in the source database, the data was corrupted somewhere along the way, and the row reappeared as non-deleted in the destination database. The database was migrated byte by byte, not via an .sql export/import.
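I did not need to clean this up myself, but for completeness, a sketch of how such a phantom row could be removed before reindexing. This is an untested assumption on my part: which copy to keep needs inspection first (the sketch keeps the one with the smallest ctid), and the planner settings force the physical scan so the DELETE actually sees both rows:

```sql
BEGIN;
-- Force a physical scan so the DELETE sees rows missing from the index.
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
SET LOCAL enable_indexonlyscan = off;

-- Keep one physical copy (smallest ctid), delete the other(s).
DELETE FROM <table_name>
 WHERE id = 4703
   AND ctid <> (SELECT MIN(ctid) FROM <table_name> WHERE id = 4703);

-- Inspect the result, then COMMIT (or ROLLBACK) and re-run REINDEX.
COMMIT;
```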

What I learned from this problem:

  1. How to force Postgres not to use (or to use) a table's index for a specific query.
  2. A SELECT * FROM <table_name> is, in most cases, not a purely physical read of `<table_name>`; more generally, in Postgres, not everything is as it seems.