Perhaps the GPU parking lot, aka the register file holding state while waiting on long-latency returns,
is a side effect of not having the ability to issue a load which pushes its data into a different SIMD unit's register file?
If loads could be issued by one unit and return somewhere else, one could possibly split a problem into two components:
the part figuring out how to route memory traffic, and the part consuming the memory traffic.
No call and return, thus no parking of state after loads: the issuer has nothing to wait for, and the consumer only ever sees data that has already arrived.
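As a rough software analogy only (this is a toy model in Python, not a description of any real GPU or ISA), the split might look like the sketch below. The names `router`, `memory`, `consumer`, the `latency` parameter, and the idea of a numbered destination slot are all hypothetical, made up for illustration. The router decides what to fetch and where it should land, the toy memory pushes each value to its destination rather than back to the issuer, and the consumer just eats arrivals.

```python
from collections import deque

def router(indices):
    """Routing half: decide what to fetch and where it should land.
    Yields (address, dest_slot) requests and keeps no per-request state."""
    for slot, addr in enumerate(indices):
        yield addr, slot

def memory(requests, data, latency=3):
    """Toy memory: a request issued at step t completes at step t + latency.
    The value is pushed to the destination slot, never back to the issuer."""
    in_flight = deque()
    for step, (addr, slot) in enumerate(requests):
        in_flight.append((step + latency, slot, data[addr]))
        # Deliver everything whose latency has elapsed by this step.
        while in_flight and in_flight[0][0] <= step:
            _, dest, value = in_flight.popleft()
            yield dest, value
    # Drain whatever is still in flight once the router has finished.
    while in_flight:
        _, dest, value = in_flight.popleft()
        yield dest, value

def consumer(arrivals):
    """Consuming half: only touches data that has already arrived."""
    total = 0
    for _dest, value in arrivals:
        total += value
    return total

data = [10, 20, 30, 40, 50]
indices = [4, 0, 2, 2]
result = consumer(memory(router(indices), data))
```

The point of the toy is that `router` never holds anything live across the latency window; the only state in flight lives inside `memory`. Whether real hardware could reserve destination registers cheaply enough for this to pay off is exactly the open question.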