All the logic is contained in Node 2.
We need to “look at” each input value twice: first for determining which one is larger, then again to create the output stream. Node 1 passes two copies of
IN.A to Node 2, and Node 2 stores a copy of
In Chapter 3, Sequence Amplifier taught us to speed up a calculation by using multiple nodes to do the same operations on multiple streams of data. Today we’ll use multiple nodes to do different operations on the same stream of data.
It wasn’t very efficient to have Node 2 doing all the work; in the size-optimized solution, Node 2 is always running while Nodes 6 and 9 are idle almost half the time. We can split its work into three separate tasks.
Node 2 still calculates
IN.B - IN.A, but instead of using the result to sort the outputs as before, it just passes the result to Node 6 and grabs another input value as soon as possible.
Node 6 puts the outputs in the correct order, and Node 9 adds a
0 onto the end.
After running the program, look at the path from
OUT. Node 6 is busiest at 8% idle, but Nodes 2 and 9 are only 17% idle. (The other nodes are less busy, but they’re just for plumbing.) By keeping each node about equally busy with its own step in the pipeline, we cut down on the amount of time spent waiting.
For homework, swap the order of the
MOV ACC RIGHT and
MOV ACC DOWN instructions in Node 1. What happens? Why?
Solution by scrat98.
The idea is the following: we need to store the difference
IN.A and the number
IN.A. First, we can determine in what order to display the numbers A and B, and second, we know these two numbers (B = B – A + A).
For a start, we will send the number A into two processors. Fundamentally, to first send to the right processor, and then only to the down(below will be clearer).
In the right CPU, we calculate the difference B – A, while at the down CPU we will only get a top – for parallelization processes.
Next we look at the value of B – A, and left get A twice – first time to determine the number of B(as B is B – A + A), and the second time to display the number of A. Now it becomes clear that logical in the beginning to submit A into the right CPU and then only to the down, because most of the work is here and we fundamentally get the difference B – A before the number A.
Note: If you decide to change the flow of A first at the down processor and then to the right, we get 138 cycles, you can check:)
Neither of these solutions uses the
Solution by gmnenad.
Node 7 behaves like an extra register for node 8, storing the sum of all the inputs so far.
Node 5 checks whether the input is 0 (signaling the end of a sequence) and passes the result to node 9, which counts the length of each sequence.
Solution by CaitSith2.
Node 4 decides whether the input is 0 or not. If it is not 0, the input is passed to Node 8, where it is added, and node 9 gets instructed to increment the ACC. A 0 likewise tells nodes 8 and 9 to output their totals, and reset. This is done JRO style.
Node 6 subtracts the current input from the previous input stored in
ACC, takes the absolute value of the difference, and uses it to determine the output. It then takes a second copy of the current input (generated by node 1) and stores it in
Node 1 splits the input stream in order to stagger it. Node 2 uses the
BAK register to continuously supply the previous value to node 5, which subtracts it from the current value to get the difference. To eliminate the need to check for two distinct cases (difference > 10 and difference < -10), node 8 computes the absolute value of the difference. Finally, node 9 checks whether the absolute difference is at least 10.
MOV ACC RIGHTand
MOV ACC DOWNinstructions in node 1. How many cycles are wasted?
Nodes 0 through 3 output their respective input numbers when their inputs change from zero to one; they output zero for all other conditions. (They are
ARMED when the previous input was zero, and
DISARMED when the previous input was one.)
Nodes 5, 6, and 9 collect the outputs from nodes 0 through 3. Since we are guaranteed that two interrupts will never change in the same input cycle, adding the output values together results in the correct value for
Solution by CaitSith2.
Works much like the optimized for speed solution, ecept that node 0 passes the value to node 1; node 1 passes its value and node 0’s value to node 2; node 3 passes its value to node 2; and node 2 passes its value down, along with the values from nodes 0, 1, and 3. Node 6 then adds up the values and passes the result down.