Perhaps the GPU parking lot, aka the register file waiting on long-latency returns,
is a side effect of not having the ability to issue a load which pushes data to a different SIMD unit's register file?
If loads could be issued and return somewhere else, one could possibly split a problem into 2 components:
the part figuring out how to route memory traffic, and the part consuming the memory traffic.
No call and return, thus no parking of state after loads.
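
A toy software analogy of that split, written as a minimal Python sketch. Everything here is hypothetical: `router`, `consumer`, and the `inbox` queue are illustrative names, and the queue only stands in for the other SIMD unit's register file. It does not model real GPU hardware or any existing API.

```python
from collections import deque

def router(indices, memory, inbox):
    """Routing half: decides which addresses to fetch.
    The loaded data is pushed to `inbox` (a stand-in for another unit's
    register file) instead of returning here, so no state is parked."""
    for i in indices:
        inbox.append(memory[i])  # fire and forget: nothing waits on the result

def consumer(inbox):
    """Consuming half: works on whatever data has arrived.
    It never computes an address and never issues a load of its own."""
    total = 0
    while inbox:
        total += inbox.popleft()
    return total

memory = [10, 20, 30, 40, 50]
inbox = deque()
router([4, 0, 2], memory, inbox)  # routes memory[4], memory[0], memory[2]
result = consumer(inbox)
print(result)
```

The router keeps nothing alive per outstanding load, and the consumer only holds state for data which has already landed.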