20130415

Joined Epic

Moved on to something new, joined the render team at Epic!

My Other Project

Hope to finally get this beast on the track this year. Fabrication work thanks to Luke and team at Blu808. Full build thread here.

Repost: Linear Lighting Food for Thought

Sorry, another requested repost; restoring the old links was broken.

I'm going to start with some reference material here, then later get to how this relates to physically based shading in games. If you are a photographer and you start with un-edited raw captures from your camera, then you already understand the visual example: what is rendered by light in real life is a lot different than what your mind sees. If you don't have an sRGB color calibrated monitor, then these images are going to look wrong, sorry.

First A Visual Example
Here is what the raw un-edited capture looks like of a scene shot around sun-down on the lava fields on the Big Island in Hawaii. This is not what I would describe as a good landscape shot professionally, but it is good enough to make the point at hand. The only adjustment done here is the application of the camera's auto white balance recorded at the time of capture, and a color conversion from the camera's native linear colorspace to sRGB for saving as a PNG. No adjustment was done to contrast or saturation or anything else. Take note of the lack of contrast, the lack of saturation, the camera's auto white point not understanding the color of the sunset, etc. The real physically correct image looks "dull" and "flat".

Below is an example edited version which contains white point adjustment, tone curve to bring out contrast, and a little increased saturation masked to low saturation areas of the photo (vibrancy control in Aperture). This is closer to what I remember the scene to have actually looked like, and very close to how it felt to be in this location around sunset. Notice how the difference between the darker old and lighter newer lava flow is visible. Notice how the sunset skybox reflection is clearly seen in the rock, how now the lighting direction is obvious from the contrast of the warm tones of the sunset to the cool tones of the rest of the sky reflecting in the scene. Side note, I'm apologizing early for the clamping on the blacks on this image. A good edit of this photo would not have this problem, but I used Aperture and it lacks good black control when using correct linear curves adjustment.


Relating to Physically Based Shading in Games
The post processing step has the function of translating a physically lit scene into something closer to what the human mind would perceive. A typical art pipeline does not decouple the physical lighting from the post processing. Often it is the environment artist's job to work under some standard tone-mapping, and attempt to place lights to bring out the feeling of the scene. An interesting complement to this would be to also work with post processing OFF and attempt to match un-edited raw camera reference material. Then later attempt to tune the post processing, then repeat this cycle.

Post processing is a critical step in reaching the desired feeling of the scene. A few games have employed adaptive exposure level control, but I'm not sure if anyone is doing adaptive tone-mapping in the blacks, and this might be useful in games. From the photography perspective, working with scenes with large dynamic range, control over the toe of the tone mapping curve is very important. The aim is to squash the darks and the highlights to leave as much contrast for the mid tones as possible without making the scene look too unnatural. Games have this easy compared to print. The dynamic range of luster photo paper is bad, the dynamic range of canvas is horribly bad. Below is the tone curve from the above edited image. It would take a lot more fine control over the darks to avoid the incorrect black clipping in my example.


Surface Format Choice
When working in a linear colorspace, the 10-bit and 11-bit formats do not offer enough precision to have a lossless conversion to 8-bit/channel sRGB even for just the 0.0 to 1.0 range. Integer formats have a loss of precision in the darks and will result in a certain amount of banding during conversion. Floating point formats have a loss of precision in the lights and will also result in banding. The N:1 column below states that in the worst case 1 value in the linear format skips N values in the 8-bit sRGB conversion.

COLORSPACE & FORMAT     sRGB BANDING   LOCATION OF BANDING
Linear UNORM10          4:1            blacks (0 to 0.13)
Linear UNORM11          2:1            blacks (0 to 0.06)
Linear FP10/64512.0     3:1            whites (0.39 to 1.0)
Linear FP11/65024.0     2:1            whites (0.75 to 1.0)
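The two UNORM rows can be sanity checked with a few lines of Python. This is only a sketch under my own assumptions: it uses the standard piecewise sRGB encode, round-to-nearest when converting to 8-bit, and reads the N:1 column as "largest jump in 8-bit sRGB codes between two adjacent linear codes". The function names are made up for illustration.

```python
def linear_to_srgb(x):
    """Standard piecewise sRGB encode for a linear value in [0, 1]."""
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1.0 / 2.4) - 0.055


def worst_case_skip(bits):
    """Largest jump in 8-bit sRGB codes between adjacent linear UNORM codes."""
    top = (1 << bits) - 1  # 1023 for UNORM10, 2047 for UNORM11
    prev = 0               # linear code 0 encodes to sRGB code 0
    worst = 0
    for code in range(1, top + 1):
        # Round to nearest 8-bit code (assumption: no dithering).
        srgb8 = int(linear_to_srgb(code / top) * 255.0 + 0.5)
        worst = max(worst, srgb8 - prev)
        prev = srgb8
    return worst


for bits in (10, 11):
    print("UNORM%d worst case skip: %d" % (bits, worst_case_skip(bits)))
```

Under those assumptions this reports 4 for UNORM10 and 2 for UNORM11, in line with the table. The worst jumps land in the first few codes above black, where the sRGB curve is steepest.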

FP16 has more than enough range to work without banding. Assuming direct linear FP16 to 8-bit/channel sRGB, the following table shows the amount of total dynamic range given certain scale factors, and provides the worst case precision after the conversion. In this case 1:N means N FP16 values map to one 8-bit sRGB value. N larger than 1 describes cases of increased precision.

LINEAR FP16 SCALING     sRGB PRECISION   DYNAMIC RANGE
FP16*5348.0             1:1              1:350M
FP16*2604.0             1:2              1:170M
FP16*1284.0             1:4              1:84M
FP16*638.0              1:8              1:41M

Note that 8-bit sRGB does not have enough precision itself to reproduce an image on today's high contrast displays without showing banding, so games still need to temporally dither using some kind of film grain even if they don't want the grain to be visible outside of just removing the banding seen with 24-bit/pixel display.
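To make the dithering point concrete, here is a small sketch of quantizing to 8-bit with and without per-frame noise. The uniform noise of one code width is my assumption standing in for film grain, and the function names are made up. The idea is that a value sitting between two 8-bit codes always snaps to the same code without dither, while with dither it flickers between the two neighbors so that the average over frames lands on the true value.

```python
import math
import random


def quantize8(x):
    """Round an encoded value in [0, 1] to the nearest 8-bit code."""
    return min(255, max(0, int(math.floor(x * 255.0 + 0.5))))


def quantize8_dithered(x, rng):
    """Same as quantize8, but add noise in [-0.5, 0.5) codes before rounding."""
    noise = rng.random() - 0.5  # assumption: uniform noise, one code wide
    return min(255, max(0, int(math.floor(x * 255.0 + 0.5 + noise))))


rng = random.Random(1)          # fixed seed so the run is repeatable
value = 100.25 / 255.0          # sits a quarter of the way between two codes
frames = [quantize8_dithered(value, rng) for _ in range(10000)]

print("no dither:", quantize8(value))
print("dithered average:", sum(frames) / len(frames))
```

A real implementation would do this per pixel in the final pass and vary the noise pattern every frame, which is what makes it temporal.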

Repost: Understand the Speed of Light of Game Input Latency

This is just a requested repost of the technical parts of the prior post...

----BACKGROUND----

PC CRTs
After the graphics card finishes rendering the frame, the graphics card scans out the frame, and during scan out the display is physically displaying the frame. This was the ultimate in high quality and low latency. In contrast, modern flat panels need to buffer some amount of the frame (and sometimes the entire frame) before display, and have pixel switching latency.

PC Flat-Panel Display Latency
TFT Central measures input latency for PC displays. Low quality displays can have upwards of 30ms of latency, average displays around 16ms, and better displays around 5ms. Even 120Hz displays can have 12ms of latency. Note all these numbers represent the time to add on top of scan-out.

HDTV Display Latency
For game mode, HDTVtest measures input latency for HDTV displays. I personally use a Samsung PS50C series plasma for a PC monitor which has an 18ms display latency. Looking through the site, the popular or newer reviewed plasma HDTVs typically have latencies ranging between 16ms for the more popular less expensive models, to 30ms on the more expensive models. Non-plasma popular or newer reviewed HDTVs show a similar 16ms to 30ms input latency. Looks like the HDTV industry has made some advances in reducing latency, now matching typical PC displays with the exception of the best PC displays.

Mouse Input Latency
Most default crappy PC mice have a 125Hz update rate (or an 8ms latency). My crappy mouse updates at 8ms. Gaming mice go all the way down to 1ms.

Dissecting PC DX WDDM Graphics Stack Latency
For DirectX, GPUView is the tool to use. Here are the stages of the Windows WDDM driver stack (note this graph is not exactly correct),


For DX, the vendor's graphics driver is cut into two parts, user-mode and kernel-mode, and in-between is a Microsoft layer which controls memory management and GPU scheduling. When the application issues a DX API call, this hits the graphics vendor's user-mode driver (which, note, might queue the API call to a user-mode driver background thread, and then immediately return so the game thread can continue to issue draw calls as fast as possible). When the user-mode driver (finally) processes the API call, it queues grouped commands to the Microsoft layer which schedules the groups, then eventually passes the groups of commands to the graphics vendor's kernel-mode driver which then enqueues the groups of commands to the GPU.

Latency of the PC DX driver stack depends on when all these threads physically get scheduled on the machine. Quality of service of frame rate also depends on when these threads get scheduled on the machine. Since the Windows CPU scheduler cannot ensure real-time scheduling, the DX driver stack utilizes a latency buffer of up to 3 frames before the GPU starts to render if the application is GPU bound.

Two to three frames of latency is typical of the PC driver stack. For this reason, many serious gamers disable v-sync, accept tearing, and attempt to maximize frame rate --- all in the name of reducing input latency.

Some games introduce GPU queries to force the PC driver stack to remove these extra frames of latency. This can result in the GPU idling waiting on the long CPU pipeline to feed in commands. When doing this it is likely a good idea to ensure there are some idle CPU hardware threads which can be used by the driver threads.

----SPEED OF LIGHT----

First The 60Hz Console Title
Starting with measured numbers from Digital Foundry. Subtracting out the 50ms display latency of their low quality Dell panel (16.6ms of scanout + 33ms of display latency), they have found a typical good 60Hz title has a 16ms game pipeline latency. They are using a 60Hz camera, so let's assume +/-16ms of possible error.

Diving into the optimized game pipeline,

(1.) Kick view-independent draw calls.
(2.) Read input.
(3.) Kick view-dependent draw calls.
(4.) CPU starts on next frame which can include (1.).
(5.) GPU finishes frame.
(6.) V-sync window.
(7.) Scan-out + extra flat panel display input latency.

Note the game issues view-independent draw calls before reading input. Reading input is delayed to as late as physically possible (2.). Controller input to display input jitter will affect latency. Let's say controller input is 10ms (due to OS or hardware buffering). In the impossible best case, the controller and the display are completely in sync and the game always gets the controller input right when it needs it. In the worst case the game just misses the controller input, resulting in 10ms extra latency in this made-up example.

The second addition of latency is the time it takes for the engine to begin rendering view-dependent commands (3.) based on input. Typical 60Hz games are likely to have relatively few view-independent draws (with the exception of streaming resources, or a game which does something awesome like texture space shading).

A third addition of latency is the GPU rendering far enough ahead of v-sync to ensure the frame can have some run-time variability without missing.

Add all these extra sources of latency, and one is likely looking at up to an extra entire frame of latency (which is within the possible measurement error of the Digital Foundry article). Total latency for a good 60Hz console title is probably somewhere around 16.7ms to 33.3ms, adding 16.7ms for scanout, then pairing with a good HDTV (like a 16.7ms latency plasma), and total button to display latency is likely somewhere around 50ms to 66.7ms (or 3-4 frames). The Digital Foundry article measures around 66ms button to screen on a display which has similar latency to a good plasma HDTV.

Extending to 30Hz Console Title
A 30Hz title can have more view-independent draw calls, which can be leveraged to reduce latency. The cost of display to input jitter, the latency of view-dependent issue to kick, etc, can have relatively less of an effect at 30Hz. The Digital Foundry article measures 50ms-66ms of game latency for a good 30Hz title (given possible error and subtracting out scan-out and display latency). Now if a 30Hz game scans out at 60Hz (double scan frames), add 16.7ms for scan-out then 16.7ms for a good HDTV display's added latency, and totals for a 30Hz title could be as low as 83ms-100ms for button to display.

Quick Estimates of PC Latency
Let's start with speed of light for a 120Hz display with 6ms latency over scan-out, but let's skip over some important details: 8.3ms rendered frame + 8.3ms scanout + 6ms display = 22.6ms. Now if the DX driver stack buffers an extra 1 frame = 31ms. Or 2 frames = 39ms. Or 3 frames = 48ms. Which is starting to get near the latency of a 60Hz console title with a good HDTV. That's at 120Hz, if your PC title is 30Hz then the 2-3 frames of PC DX driver stack latency is a huge deal.
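The estimate above is simple enough to wrap in a function, which makes it easy to try other refresh rates. This is just a sketch of the same arithmetic: one rendered frame, plus however many frames the driver stack buffers, plus one frame of scan-out, plus the display's added latency. The function and parameter names are made up, and because it uses 1000/120 instead of the rounded 8.3ms the results differ from the hand numbers by a fraction of a millisecond.

```python
def pc_latency_ms(refresh_hz, display_ms, buffered_frames=0):
    """Rendered frame + buffered driver frames + scan-out + display latency."""
    frame_ms = 1000.0 / refresh_hz
    render_ms = frame_ms                     # one frame of rendering
    queued_ms = buffered_frames * frame_ms   # frames sitting in the driver stack
    scanout_ms = frame_ms                    # one refresh to scan the frame out
    return render_ms + queued_ms + scanout_ms + display_ms


# 120Hz display with 6ms of latency over scan-out, 0 to 3 buffered frames.
for frames in range(4):
    print(frames, "buffered:", round(pc_latency_ms(120.0, 6.0, frames), 1), "ms")

# Same display latency, but the title runs at 30Hz with 3 buffered frames.
print("30Hz, 3 buffered:", round(pc_latency_ms(30.0, 6.0, 3), 1), "ms")
```

The 30Hz line is the reason the driver buffering matters so much at low frame rates: each buffered frame costs 33.3ms instead of 8.3ms.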

How about my setup with a mid-range GTX 560ti? I've removed the driver buffering problem by my CPU input thread writing directly into a pinned memory buffer, with the GPU reading input from pinned memory directly then computing the view matrix and storing to the constant buffer right before the GPU renders the view-dependent part of the frame. My crappy mouse is 8ms of latency, and I have say 12ms/frame of view-dependent GPU work. Expected total latency: 8ms mouse + 12ms drawing + 4ms v-sync window + 16.7ms scanout + 18ms plasma display = 59ms expected mouse to display. Hopefully right in line with a good 60Hz console experience.

Continuing with a high-end GPU with a gamer mouse and good PC display: 1ms mouse + 6ms drawing + 2ms v-sync window + 8.3ms scanout + 5ms display = 22.3ms.
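Both of these budgets are just per-stage sums, so they are easy to keep in a small table and re-add when a number changes. A sketch using the stage values from the two setups above (the helper name and the list layout are made up for illustration):

```python
def total_latency_ms(stages):
    """Sum a list of (stage name, milliseconds) pairs."""
    return sum(ms for _, ms in stages)


# Mid-range setup: crappy 125Hz mouse, GTX 560ti, 60Hz plasma.
mid_range = [
    ("mouse", 8.0),
    ("view-dependent drawing", 12.0),
    ("v-sync window", 4.0),
    ("scan-out", 16.7),
    ("plasma display", 18.0),
]

# High-end setup: gamer mouse, fast GPU, good 120Hz PC display.
high_end = [
    ("mouse", 1.0),
    ("drawing", 6.0),
    ("v-sync window", 2.0),
    ("scan-out", 8.3),
    ("display", 5.0),
]

for name, stages in (("mid-range", mid_range), ("high-end", high_end)):
    print(name, round(total_latency_ms(stages), 1), "ms mouse to display")
```

Keeping the stages named also makes it obvious where the time goes: on the mid-range setup, scan-out plus display is more than half of the total.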

----EXTRA----

Understanding Tiler GPUs and Latency with Low Frame Rates
In contrast to desktop and fast notebook GPUs, most phone and tablet GPUs (one exception being NVIDIA's Tegra GPUs) are tilers. Tilers buffer up a complete frame, then optionally reorder draw calls to avoid scene (or framebuffer target) changes, and then issue commands to the GPU. A tiler GPU needs to run two frames in parallel: the vertex load from the current frame, and the pixel load from the prior frame. So on top of whatever CPU latency window is required because of the non-real-time scheduling in the OS, the GPU itself needs to at least double buffer.

20130406

XIPH Videos

XIPH Videos - Second video is a great intro to digital signals.

20130401

Blu808 C6 Race Car

The Blu808 C6 Corvette Race Car is up for sale. This car won its class at the Global Time Attack series championship finals at Infineon Raceway in 2011.

20130331

Hardware Official Teaser

Mini ITX with GTX Titan

Anyone have any other sources for mini ITX cases which fit full size GPUs, please write a comment.

Those small form factor (4" wide) boxes sporting a GTX Titan GPU at the NVIDIA booth at GDC are Falcon Northwest's Tiki.


For those looking for just a mini ITX case (which fits a full size GPU) instead of the entire machine, m3cases.com is crowdfunding the M3A2.