From time to time a scientific Python user will tell me that they’d like to use COUNTLESS 2D, but they’re working with a sparse dataset. For instance, they might have a volume of cells and they’d like to paint a dot over each cell body’s center of mass. However, if they were to use COUNTLESS 2D, the surrounding numerous black pixels would erase the small dots within a small number of downsamples, leaving the canvas entirely black when zoomed out.
If the dot size is a single pixel and will never be adjacent to another pixel (even when downsampled), then a simple solution is to use
np.maximum. However, if there is a chance that your labels might collide, it makes more sense to treat the surrounding black pixels as background and proceed with choosing the mode of each 2x2 block.
Stippled COUNTLESS can be stated simply. Let the four pixels of a 2x2 block be labeled A, B, C, and D.
- If there’s a non-zero pair match among A,B, and C return that value.
- If there is no non-zero pair match between A, B, or C pick D unless D is zero.
- If D is zero, try A, then B, then C until a non-zero value is found or all elements have been tried in which case return zero.
The discussion below describes why this is correct.
Recall the most important lines of the COUNTLESS 2D algorithm:
# LISTING 1: COUNTLESS 2D Algorithm
1. def countless2d(a, b, c, d):
2. ab_ac = a * ((a == b) | (a == c))
3. bc = b * (b == c)
4. abc = ab_ac | bc
5. return abc + (abc == 0) * d
First consider situation (b) from Figure 1 where only A is black. On Line 2, if A is zero and B,C are nonzero, then
ab_ac is zero, Line 3 will pick B if B and C match else it will also be zero, in which case D will be chosen on Line 5.
Second, consider situation (j) where C and D are black. If A does not match B, it can’t match C, and thus
ab_ac is zero. B does not match C, so Line 3 is zero. D will be chosen, but D is also zero when we really wanted either A or B to be picked. What a dilemma.
Case (n), where A, B, and D are zero, is also nefarious. A matches B but evaluates to zero, B doesn’t match C, and D is chosen, but we really wanted C!
Fortunately, there is a simple solution. I’ll derive it by treating increasingly complex cases from single black pixels, then double black, then to triple and quadruple black.
Single Black Pixel (b — e)
In the case of the single black pixel, the original algorithm does the right thing in cases (b), (c), and (d). It either picks D if there are no matches, or AB, AC, or BC if there is a match. At first glance it appears that (b) might pose a problem because factor A is zero, but luckily Line 2 would produce zero anyway because it differs from B and C.
Case (e) provides the first real challenge. If there are no matches among A, B, and C, D will be chosen, but we’d like to have picked either A, B, or C instead. This can be remedied by inserting A if D is zero.
# LISTING 2: Single Black Pixel Solution# Modified Line 5
5. return abc + (abc == 0) * (d + (d == 0) * a)
Double Black Pixel (f — k)
The double black pixels pose more of a challenge but they also provide some relief at the same time. It’s now possible for A to be zero while matching B, C, or D, for B to be zero while matching C, and for D to be zero at the same time as another pixel. However, if two pixels are black, we can choose any of the non-zero pixels as they will either be different (making the choice arbitrary) or matching (making the choice identical).
Our original algorithm correctly handles cases (f), (g), and (i). Our modified algorithm correctly handles cases (f), (g), (i), (j), and (k). However, it misses case (h). This is because (h) is zeroed at both A and D. If B matches C, it will be handled correctly, but if B and C don’t match the algorithm will return zero when it should return one of them.
We can modify our algorithm some more. Instead of only checking D then A, why not check D, A, and B?
# LISTING 3: Double Black Pixel Solution# Insert an extra line before the return
5. nonzero = a + (a == 0) * b
6. return abc + (abc == 0) * (d + (d == 0) * nonzero)
This solves case (h) for us by selecting B in the case of a non-match. It doesn’t interfere with the other examples because this logic only has an impact if zero is about to be returned.
Triple Black Pixel (l — o)
The triple black makes things even tougher. Consider case (n), which is case (h) with B blacked out. However, it looks like our modified algorithm already handles cases (l), (m), and (o) already! The original algorithm would have only handled case (o) properly. Let’s add in logic for case (n).
# LISTING 4: Triple Black Pixel Solution# Insert an extra line before the return
5. nonzero = a + (a == 0) * (b + (b == 0) * c)
6. return abc + (abc == 0) * (d + (d == 0) * nonzero)
We’ve done it! All of the cases are solved. The final case of the quadruple black pixel will always return zero correctly.
Rendered in a slightly different form, our algorithm would look like this:
# LISTING 5: Stippled COUNTLESS as if statementsif ((a == b or a == c) and a > 0):
elif (b == c and b > 0):
elif (d > 0):
elif (a > 0):
elif (b > 0):
Stippled COUNTLESS has some additional overhead compared with
quick_countless as can be seen in Figure 2. However, because zero is now treated as a separate category, we no longer have to widen the array and add one as in
countless. This results in nearly double the performance as the amount of memory requiring processing is about half.
The code used for testing this pipeline can be found on Github. Contributions in additional languages and feedback are welcomed and will be credited.
This article is a followup to “COUNTLESS — High Performance 2x Downsampling of Labeled Images Using Python and Numpy” (2017).