RSS

Tag Archives: programming

Storage is Essentially Free!

Twice in the last few days, I ran into low, artificial limits on how much text a website would allow me to enter into a field.

Once upon a time, this was reasonable. Disk was expensive. RAM was expensive. Constraints were real, and tradeoffs were unavoidable.

That time is long past.

Out of curiosity, I asked Gemini to help me research the history of nonvolatile storage costs. This is what I learned:

  • In 1956, an IBM 350 provided about 3.75 MB of nonvolatile storage for $9,200/MB.
  • Today — during a price spike caused by a NAND shortage — a 2 TB SSD costs around $250.

That’s a 99.99999864% reduction in price per megabyte, without adjusting the 1956 price for nearly 70 years of inflation. For all practical purposes, storage is free.

And yet, both places I hit limits this week were meant to hold directions.

In one case, I was updating instructions to cover a scenario that had been missed in the previous version. I couldn’t add the necessary text without hitting the limit. I trimmed the existing content and ended up with something serviceable, but fragile. The very next update will run into the same wall.

This is absurd.

Text (especially text meant to instruct) should be effectively infinite. Modern disks can handle it. Modern memory can handle it. The constraint is not technical; it’s conceptual.

(And don’t get me started on amateurs who think eight characters is enough for a first name, so I end up with mail addressed to “Christop!” See Falsehoods Programmers Believe About Names.)

 
Leave a comment

Posted by on February 10, 2026 in Software techniques

 

Tags: , , ,

DAGs in SQL

What is a DAG and Why Should You Care?

DAG stands for Directed, Acyclic Graph.  It sounds a bit obscure but it has many practical applications, especially in software design.  But what is it?  The first two words are just modifiers so, what is a graph?

Fundamentally, a graph is a mathematical abstraction.  I could tell you it is made up of nodes and edges but that’s still fairly abstract. You can visualize a graph as being islands (the nodes) connected by bridges (the edges), or cities and the roads connecting them.  Indeed, one of the classic software problems, the traveling salesman, is about just such a scene. The salesperson wants to visit their customers on all of the islands without visiting any island (or using any bridge) twice.  Is it possible?  Can you write a program to do it?  Or a program to show it’s not possible?

There are many variations on the problem. What if each bridge had a different toll and the goal is not to avoid repeats but to minimize total tolls paid?

Another interesting variation arises when the bridges or roads allow traffic to pass in only one direction. In the concrete, we recognize this as a one-way street.  In the abstract, it is a directed edge: an edge that can be used to go from A to B but not back from B to A.

A poor traveler with a bad map or a bad sense of direction might set out on a trip across these one-way bridges and come to realize they have been driving in circles (A to B to C and somehow back to A).  A good civil engineer might be able to set the direction of the bridges so that you couldn’t go in circles.  The graph corresponding to these islands and bridges has no cycles, it’s acyclic.

So, a directed, acyclic graph is one where the edges have direction and you can’t return to your starting point following the direction of the edges.

But why should you care?

Graphs are abstract but describe various real-world problems very well.  For example, consider the files and folders (or directories) on your computer.  We’re used to seeing these presented as an outline where you can navigate down from the root of the disk to a folder in the root, to another folder inside that, and so on and so on.  An outline or organization like this is a “tree,” a special type of directed graph where there is only one path from the root to the leaf.  Picture a real tree and you see the same thing: for any leaf, there is only one path from root to trunk to limb to branch to twig to leaf.

For a more complex example, consider trying to classify animals that have frequent contact with people.  We’ll exclude animals seen on safari or in zoos. We can start by dividing them into Farm Animals and Pets.  Horse, pig, chicken: all farm animals. Goldfish, canaries, guinea pigs: all pets. What about dogs? Dogs are great pets but they also are used on farms to herd livestock. So is a dog a farm animal or a pet?  It’s both. You can’t really use a tree to draw these categories but you can use a DAG!  There are two paths from the root to dog: Frequent Contact → Farm Animals → Dog, and Frequent Contact → Pets → Dog.  (We could divide farm animals into food animals and work animals, or pets into cuddly and not cuddly but dogs still fit two categories.)  A dog is not a goldfish or a pig but because going from one category to a subcategory is directional, we’ll never be confused about that.

A Simple DAG Database

Codd and others did a lot of work creating relational databases. Open source tools like MySQL and products like SQL Server have decades or maybe centuries of labor in them to implement those theories. An edge is a relationship between two nodes so why wouldn’t you use a relational database to store it? Why reinvent the wheel?

You might say because node and edge aren’t SQL types but neither are bank account and product, but banking and e-commerce systems certainly use relational databases.

How do we represent a node in SQL? A table, naturally. What columns does the table have?  Names are convenient for people to refer to things but numeric IDs are somewhat better for computers. So, let’s say a node has an ID, a name, and (why not?) a description.

CREATE TABLE [dag].[DagNode](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Description] [nvarchar](250) NULL)

An edge connects two nodes.  The ID we gave nodes gives us an easy way to represent the nodes.  The natural way to present this is a two-column table where each row is two IDs: one for the parent (the source of the edge) and one for the child (the destination of the edge).

CREATE TABLE [dag].[DagEdge](
[ParentID] [int] NOT NULL,
[ChildID] [int] NOT NULL)

Leaves and Referential Integrity

The table creation above omits the referential integrity constraints in my live database. The DagEdge table in my database has foreign key constraints that make sure both IDs refer to an actual node.

But what about leaves? In graph theory, a leaf is a node with only one edge. But that has some limitations in a database. First, the items we’re organizing in a DAG likely have different attributes than the name and description on a node. So, we create a new table for our leaves.

CREATE TABLE [dag].[DagLeaf](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL)

But then, we can’t use DagEdge to connect a node to a leaf because of the foreign key on ChildID. So, we create a new table to hold the node/leaf relationships.

CREATE TABLE [dag].[DagNodeLeaf](
[NodeID] [int] NOT NULL,
[LeafID] [int] NOT NULL)

And then add foreign keys to those referring to DagNode and DagLeaf, respectively.

While the DagNode and DagEdge tables are generic, we need a data table and a relation table for every type of data we’re going to organize into DAGs.

Examples

Code supporting this post is available on GitHub. That includes a script to create the tables mentioned above (complete with indices and constraints), a script to populate with some sample data, and a script full of sample queries to show how the data and functions behave. My examples are for SQL Server. Other databases of interest are MySQL and Postgresql. I’m happy to consider pull requests for those or others you may choose to implement.

Thanks

Thanks to CyberReef for allowing me to share this code. Something like it is used in the MobileWall product and related services. Hopefully, I didn’t add too many bugs as I tried to abstract the concepts.

Thanks, also, to my colleagues Carmen, Jessica, and Siva who reviewed the original code and provided valuable feedback.

 
Leave a comment

Posted by on December 4, 2025 in Software techniques

 

Tags: , , ,

Test 25x Faster!

My very first professional programming project was debugging and completing a lab automation system written in BASIC. It ran on a desktop HP computer and controlled instruments and equipment through GPIB. The challenge was that it controlled experiments in “real time,” not the “really fast response” that “real time” often means, but rather by the clock. It would do something, wait a minute or 15 minutes or something, do the next thing, wait a while, etc. The system started with bugs fairly early in the run of the experiment so I could start the program running, take a short break, and come back to find the program had crashed. I’d figure out what went wrong, fix it, and start the program again. But this time the program didn’t crash in the code I’d just fixed; it ran longer and crashed in 10 minutes instead of the previous five. Do you see where this is going? Eventually, I had to wait hours for the next crash and the better the code got, the longer I had to wait to fix the next issue!

My current team also does work that has to happen by the clock. The software counts some things and takes certain actions at certain limits. The counts reset at the end of the period which might be an hour, a day, a week, or a month. We can fake the data that causes the counts to change, but we don’t want to wait around to see if a monthly action works as it should. And messing with the system clock to fool the software is messy. Fortunately, one of the most interesting aspects of the system (one that we need to test carefully and repeatedly to avoid regression) involves when things are supposed to happen at different time scales. Do things that happen at a small time scale interact appropriately with things that happen at a larger time scale?

Once we were confident that the system properly recognized the end of an hour, day, etc. (in core code that was unlikely to change), we sought a way to speed up testing of other features so we didn’t have a month-long test in our release process. What we realized is that an hour is 1/24 of a day and a day is around 1/30 of a month. So if our production system is primarily concerned with days and months, we can take production configuration, change days to hours and months to days, and test in roughly 1/25 of the time. An overnight test with this substitution effectively tests two weeks (14-18 days) of real execution that straddles a month boundary. A weekend-long test covers 3-4 months of real world cycling through the logic in the system! And we don’t have to disable NTP or play any other shenanigans with the system time.

Just don’t ask me what happens around Daylight Saving Time transitions.

 
Leave a comment

Posted by on June 17, 2024 in Uncategorized

 

Tags: , ,

Language Shapes Thought

My wife is a public relations professional. She works daily with the English language. When reviewing or editing a colleague’s or client’s writing, she is constantly looking to see if it is clear and if it is correct. English has rules and while they may be looser than those imposed by computer languages, they are important to clear communication. We frequently discuss the irony that when I am reviewing code, I am doing the same thing but in various computer languages. I work to make sure the code clearly communicates intent to the computer and comments clearly communicate intent to other developers.

Some formality can be achieve with things like UML but, for the most part, my colleagues and I talk about code using English. We might say that two files in the same directory are “siblings.” Or that one node in a DAG is a cousin to another. These kinds of relationships are important and I’ve found myself realizing that there’s no easy way to refer to a parent’s sibling. Of course English has “aunt” and “uncle” but those gendered words don’t fit well in computer science.

I was reminded of this recently when I read A Psychologist Explains How The Language You Speak Manifests Your Reality in Forbes. It talks about how language shapes perception and what you can convey. In Mandarin, it seems, the word you choose for “aunt” conveys whether she is on your father’s or mother’s side, as well as whether she’s an aunt by birth or marriage. (They don’t say if there is a vague, gender-neutral word for “parent’s sibling.”) I was also intrigued by Bilingualism Is Reworking This Language’s Rainbow (in Scientific American) which discussed how some human languages are better than others for describing a range of colors.

Similarly, computer languages restrict what you can express easily, and in some cases limit what you can do at all. Early in my computer science education, I took a course called “Computer Languages.” It was a survey course designed to introduce students to varied languages. It covered APL, LISP, Fortran, and SNOBOL. The instructor drove home the strengths and weaknesses of the languages by having us use each language to solve a problem it was ilsuited for. We were tasked with solving the travelling salesman problem in Fortran. That is a classic illustration of the power of recursion, often used to demonstrate how LISP works. But Fortran does not support recursion!

It is said that to a man with a hammer, every problem looks like a nail. If Fortran was the only language in my toolbox I could be forgiven for using it when presented with a problem better suited to LISP or C. But my toolbox contains more than a dozen languages. I can and do pick from a handful of modern candidates when picking the tool for a new problem.

As a hiring manager, I’ve often said that I would prefer not to hire a developer who only knows one language. But even two similar languages are fairly limiting. I’d look for a compiled language and a scripting language. Or a procedural language and a declarative language. If you know C, you can get up to speed on C# fairly quickly. But if all your experience is in procedural languages, you’re likely to write a lot of loops in C# instead of using LINQ. If you know SQL, then LINQ feels natural. Like a Tsimane’ speaker borrowing azul from Spanish to describe blue, knowing multiple computer languages allows you to express more programs more clearly than you could otherwise.

Languages — human and computer — grow by borrowing from other languages. Speakers and programmers benefit from knowing more than one language, even if they routinely use just one. Go learn another language; whether it is your 3rd or 13th you’ll be a better programmer for it.

 
Leave a comment

Posted by on March 20, 2024 in Uncategorized

 

Tags: ,