All the logic is contained in Node 2.
We need to “look at” each input value twice: first for determining which one is larger, then again to create the output stream. Node 1 passes two copies of IN.A
to Node 2, and Node 2 stores a copy of IN.B
in BAK
.
In Chapter 3, Sequence Amplifier taught us to speed up a calculation by using multiple nodes to do the same operations on multiple streams of data. Today we’ll use multiple nodes to do different operations on the same stream of data.
It wasn’t very efficient to have Node 2 doing all the work; in the size-optimized solution, Node 2 is always running while Nodes 6 and 9 are idle almost half the time. We can split its work into three separate tasks.
Node 2 still calculates IN.B - IN.A
, but instead of using the result to sort the outputs as before, it just passes the result to Node 6 and grabs another input value as soon as possible.
Node 6 puts the outputs in the correct order, and Node 9 adds a 0
onto the end.
After running the program, look at the path from IN.B
to OUT
. Node 6 is busiest at 8% idle, but Nodes 2 and 9 are only 17% idle. (The other nodes are less busy, but they’re just for plumbing.) By keeping each node about equally busy with its own step in the pipeline, we cut down on the amount of time spent waiting.
For homework, swap the order of the MOV ACC RIGHT
and MOV ACC DOWN
instructions in Node 1. What happens? Why?
Solution by scrat98.
The idea is the following: we need to store the difference IN.B
– IN.A
and the number IN.A
. First, we can determine in what order to display the numbers A and B, and second, we know these two numbers (B = B – A + A).
For a start, we will send the number A into two processors. Fundamentally, to first send to the right processor, and then only to the down(below will be clearer).
In the right CPU, we calculate the difference B – A, while at the down CPU we will only get a top – for parallelization processes.
Next we look at the value of B – A, and left get A twice – first time to determine the number of B(as B is B – A + A), and the second time to display the number of A. Now it becomes clear that logical in the beginning to submit A into the right CPU and then only to the down, because most of the work is here and we fundamentally get the difference B – A before the number A.
Note: If you decide to change the flow of A first at the down processor and then to the right, we get 138 cycles, you can check:)
Neither of these solutions uses the SWP
instruction.
Solution by gmnenad.
Node 7 behaves like an extra register for node 8, storing the sum of all the inputs so far.
Node 5 checks whether the input is 0 (signaling the end of a sequence) and passes the result to node 9, which counts the length of each sequence.
Solution by CaitSith2.
Node 4 decides whether the input is 0 or not. If it is not 0, the input is passed to Node 8, where it is added, and node 9 gets instructed to increment the ACC. A 0 likewise tells nodes 8 and 9 to output their totals, and reset. This is done JRO style.
Node 6 subtracts the current input from the previous input stored in ACC
, takes the absolute value of the difference, and uses it to determine the output. It then takes a second copy of the current input (generated by node 1) and stores it in ACC
.
Optimizations by CaitSith2 and Solomute.
Node 1 splits the input stream in order to stagger it. Node 2 uses the BAK
register to continuously supply the previous value to node 5, which subtracts it from the current value to get the difference. To eliminate the need to check for two distinct cases (difference > 10 and difference < -10), node 8 computes the absolute value of the difference. Finally, node 9 checks whether the absolute difference is at least 10.
For homework:
MOV ACC RIGHT
and MOV ACC DOWN
instructions in node 1. How many cycles are wasted?Finishing touches by CaitSith2 and Solomute.
Nodes 0 through 3 output their respective input numbers when their inputs change from zero to one; they output zero for all other conditions. (They are ARMED
when the previous input was zero, and DISARMED
when the previous input was one.)
Nodes 5, 6, and 9 collect the outputs from nodes 0 through 3. Since we are guaranteed that two interrupts will never change in the same input cycle, adding the output values together results in the correct value for OUT
.
Solution by CaitSith2.
Works much like the optimized for speed solution, ecept that node 0 passes the value to node 1; node 1 passes its value and node 0’s value to node 2; node 3 passes its value to node 2; and node 2 passes its value down, along with the values from nodes 0, 1, and 3. Node 6 then adds up the values and passes the result down.