20150722

1536-3 : Simplify, Repeat

Night 3 on the 1536 project, decided to make some changes before going full ahead with writing an editor.

(1.) Switched the boot loader to not relocate to zero, leaving the BIOS areas alone. I doubt I'll ever go back to real mode after switching to long mode. This ends up making the code easier, and provides room for the next change.

(2.) Switched the boot loader to fetch the first 15 tracks instead of just 1 track. Now have a little over 472KB of source to work with on boot, which is effectively infinite for this project. The motivation for this change was the realization that x86-64 assembly source would get big. Don't want to change this later. 472KB is close to the maximum without checking for things like the EBDA, or handling non-full-track reads.

(3.) Switched to an easier source model. Lines are now always 64 characters long. Comments are switched to a single \ which functions like the C // style comment, ignoring the rest of the line. Since lines are always 64 bytes (cacheline sized and aligned), the interpreter can quickly skip over comments (see the small sketch after this list). This trades increased source size for simplification of the editor: fixed-size lines make everything trivial.

(4.) Making a convention that syntax highlighting with color only gets a single line of context, which translates into: don't let things wrap. Easy.
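
Regarding (3.), a tiny illustrative C sketch (the real interpreter is x86, not C) of why fixed 64-byte lines make comment skipping nearly free: a \ comment runs to end of line, and lines are 64-byte aligned, so skipping is just rounding the source pointer up to the next 64-byte boundary,

#include <stdint.h>

const char *skipComment(const char *p)
{
    /* advance to the next 64-byte boundary, i.e. the start of the next line */
    return (const char *)(((uintptr_t)p | 63u) + 1u);
}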

Star Wars Battle Pod Arcade Review : Save Your Tokens for Air Hockey

Went to the Cary, North Carolina Dave and Busters a few days ago to try out the Star Wars Battle Pod, after posting someone's youtube review ages ago. The arcade experience in the US has certainly changed since I was a youth. Nearly everything I loved as a kid is gone, with the exception of some classic physical games like air hockey, skee ball, pool, etc. The Battle Pod is a great example of how the spirit of the arcade is getting lost. Starting with the screen: it's a spherical projection screen where Dave and Busters had the awesome idea of keeping their card reader illuminated so strongly during gameplay that the screen black level was practically white: nearly impossible to see what was going on. That might have actually been a blessing, because whoever wrote the spherical projection code apparently figured out how to do something worse than bilinear filtering: it looks horrible. What is left is relatively low resolution, which would be fine if properly filtered, except in this case the aliasing is so bad that I kept getting the feeling that the only point of the card reader was to hand out a refund to pay for the player's eye pain. It gets better: the game hitches, and doesn't even feel like 30 Hz, let alone the 60 Hz which sets the minimum bar for frame rate in a real arcade game. The classic arcades were defined by perceptually zero-latency input designed to take a beating, locked frame rates at the highest rate possible on the display hardware, and stunning visuals pushing the limits of that hardware. Someone badly needs to bring that experience back...

20150719

CRT Shadow Masks vs LCD

Found these two images below (click for full resolution) on this thread http://www.vogons.org/viewtopic.php?t=32331 when looking for source photographs of CRT shadow masks. Great visual description of why CRT shadow masks were so good for image quality in comparison to modern LCDs. Sure, the LCD image is double scanned (2 scanlines per pixel), but the comparison would still hold even if it were not.



20150718

Stochastic 1 Sample/Pixel Lit Fog Stills

Found the old stochastic 1 sample/pixel lit fog. Left the post-process grain on; it is smoother in practice. This could be an algorithm which only looks good in this demo, never really tried adjustments on other content...











General idea is to stochastically select one z value per pixel between the eye and the opaque backing z value, based on the volume of material in between. In this demo I just used a very large sphere of volumetric stuff behind the center sphere. Each {z} point is shaded and "lit" (fake in the demo), and also has some opacity value. Then there is a separate spatial+temporal filter process which attempts to remove the noise from the extremely sparse volume sampling, and also correctly manage disocclusions, etc. The volume is treated separately from the opaque layer, and the two are blended together before the final temporal noise reduction pass (the scene is traced). The demo was running at 120Hz, and didn't ever look right at 60Hz. These temporal techniques are all about visual masking of artifacts in motion, so they tend to be highly tuned just to the point of perceptual artifacts at a given target frame rate. The one takeaway from this little project was to weight samples in the filter based on similarity of their backing opaque z value to the center's backing opaque z value (instead of using shading z values). This tends to maintain an even gradient across objects which are at a similar distance from the eye, which is what one would expect in general for diffuse fog volumes.

Something I didn't try but would help here is to decouple volume density sampling (aka alpha value) from shaded color. Run alpha computation at a higher sampling rate, then mix together later...
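
To make the general idea concrete, here is a minimal C sketch of the per-pixel stochastic depth selection. This is a guess at the mechanic for illustration only: fogDensity, fogShade, the step count, and the extinction-based alpha are stand-ins, not the demo's code,

#include <math.h>

typedef struct { float r, g, b, a; } FogSample;   /* color + opacity                          */
float     fogDensity(float z);                     /* stand-in: density along the view ray     */
FogSample fogShade(float z);                       /* stand-in: "lit" fog point (fake lighting) */

FogSample stochasticFogSample(float opaqueZ, float u)   /* u = per-pixel random in [0,1)       */
{
    const int   STEPS = 16;                        /* assumption, actual sampling unknown       */
    const float dz    = opaqueZ / STEPS;
    float total = 0.0f;                            /* optical depth from eye to opaque backing  */
    for (int i = 0; i < STEPS; i++) total += fogDensity((i + 0.5f) * dz) * dz;
    float target = u * total, accum = 0.0f, z = opaqueZ;
    for (int i = 0; i < STEPS; i++) {              /* walk until the sum crosses the threshold  */
        accum += fogDensity((i + 0.5f) * dz) * dz;
        if (accum >= target) { z = (i + 0.5f) * dz; break; }
    }
    FogSample s = fogShade(z);                     /* shade the single selected point           */
    s.a = 1.0f - expf(-total);                     /* assumed opacity model (extinction)        */
    return s;
}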

Algorithm
Runs a spatial filter on {color,alpha} with 13 taps in the following pattern,

. . . . . . . . .
. . x . . . x . .
. . . . x . . . .
. . x . . . x . .
x . . . x . . . x
. . x . . . x . .
. . . . x . . . .
. . x . . . x . .
. . . . . . . . .


Pixel weights are "gaussian * f(sampleOpaqueBackingZ,centerOpaqueBackingZ)", where f(s,c) decreases weight as opaque z-buffer value becomes non-matching (filter intent is that fog tends to have similar effect when the opaque backing is at a similar distance away),

r = min(c,s)/max(c,s);
return (r*r)*(r*r);
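
For illustration, a C sketch of how that weight could be applied over the 13-tap pattern; this is not the demo's shader, and the fetch functions, gaussian term, and normalization by the weight sum are assumptions,

#include <math.h>

void  fetchColorAlpha(int x, int y, float out[4]);   /* stand-in: {color,alpha} fetch           */
float fetchOpaqueZ(int x, int y);                    /* stand-in: opaque backing z fetch        */
float gaussianWeight(int dx, int dy);                /* stand-in: the gaussian term             */

static float zWeight(float s, float c)               /* the f(s,c) given above                  */
{
    float r = fminf(c, s) / fmaxf(c, s);             /* demo had fixes for zero depths, omitted */
    return (r * r) * (r * r);
}

void spatialPass1(int px, int py, float out[4])
{
    static const int tap[13][2] = {                  /* offsets read off the pattern above      */
        {-2,-3},{ 2,-3},{ 0,-2},{-2,-1},{ 2,-1},
        {-4, 0},{ 0, 0},{ 4, 0},{-2, 1},{ 2, 1},
        { 0, 2},{-2, 3},{ 2, 3} };
    float centerZ = fetchOpaqueZ(px, py);
    float sum[4] = {0.0f, 0.0f, 0.0f, 0.0f}, wSum = 0.0f;
    for (int i = 0; i < 13; i++) {
        float ca[4];
        fetchColorAlpha(px + tap[i][0], py + tap[i][1], ca);
        float w = gaussianWeight(tap[i][0], tap[i][1])
                * zWeight(fetchOpaqueZ(px + tap[i][0], py + tap[i][1]), centerZ);
        for (int j = 0; j < 4; j++) sum[j] += ca[j] * w;
        wSum += w;
    }
    for (int j = 0; j < 4; j++) out[j] = sum[j] / wSum;   /* assumed normalization              */
}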


Runs a second spatial filter with 13 taps in the following pattern,

. . . . . . . . .
. . . . . . . . .
. . . . x . . . .
. . . x x x . . .
. . x x x x x . .
. . . x x x . . .
. . . . x . . . .
. . . . . . . . .
. . . . . . . . .


Pixel weights are "gaussian * f(sampleOpaqueBackingZ,centerOpaqueBackingZ)", where f(s,c) does something similar,

r = 1.0/(1.0+abs(c-s)/min(s,c));

Cannot remember why these two spatial filter passes have different z based weighting functions. Turns out the temporal filter has another depth weighting function. They both have some fixes for when depths are zero which I didn't bother to copy in. The temporal filter reprojects 5 points in a packed + pattern. Want to use reprojected Z (project reprojected backing Z into the current frame). Does a neighborhood clamp, then has reprojection weights based on "gaussian * f(sampleOpaqueBackingZ,centerOpaqueBackingZ)", where f(s,c) does something similar but with a depth bias,

r = 1.0/(1.0+abs(c-s)/c);
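
For illustration, a C sketch of the clamp-and-blend part of the temporal pass using that last weight function; the 5-point packed + reprojection and the gaussian term are omitted, and the feedback constant and fetch plumbing are assumptions, not the demo's code,

#include <math.h>

static float clampf(float x, float lo, float hi) { return fminf(fmaxf(x, lo), hi); }

static float zWeightTemporal(float s, float c)   /* the f(s,c) with depth bias given above      */
{
    return 1.0f / (1.0f + fabsf(c - s) / c);     /* demo had fixes for zero depths, omitted     */
}

/* current: this frame's filtered fog {color,alpha}; nMin/nMax: neighborhood min/max of the
   current frame; history: reprojected previous output.                                         */
void temporalBlend(const float current[4], const float nMin[4], const float nMax[4],
                   const float history[4], float historyOpaqueZ, float currentOpaqueZ,
                   float out[4])
{
    const float feedback = 0.9f;                 /* assumed tuned constant                      */
    float w = feedback * zWeightTemporal(historyOpaqueZ, currentOpaqueZ);
    for (int i = 0; i < 4; i++) {
        float h = clampf(history[i], nMin[i], nMax[i]);   /* neighborhood clamp                 */
        out[i] = current[i] * (1.0f - w) + h * w;
    }
}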

20150715

1536-2 : Assembling From the Nothing

Started bringing up a limited-subset x86-64 assembler. The full x86-64 opcode encoding space is an unfortunate beast of complexity which I'd like to avoid. So I did...

Compromises
This prototype sticks to exactly 4-byte or 8-byte instructions (8-byte only if the instruction contains a 32-bit immediate/displacement). The native x86-64 opcodes are prefix-padded to fill out the full 4-byte word. Given that x86-64 CPUs fetch instructions in 16-byte chunks, this makes it easy to maintain branch alignment visually in the code. Since x86-64 float opcodes are natively 4 bytes without the REX prefix, I'm self-limiting to only 8 registers for this assembler, which is good enough for the intended usage. I'm not doing doubles and certainly not wasting time on vector instructions (have an attached GPU for that!). Supported opcode forms in classic Intel syntax,

op;
op reg;
op reg,reg;
op reg,imm32;
op reg,[reg];
op reg,[reg+imm8];
op reg,[reg+imm32];
op reg,[imm32];
op reg,[reg+reg]; <- For LEA only.
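
For a concrete feel of the fixed 4-byte form, here is an illustrative C sketch (not this project's code, which is built up in its own macro language below): it packs "add eax,[rbx+imm8]" into a single little-endian 32-bit word, padded out with a redundant 3E prefix byte, the same trick visible in the hand-assembled opcode literals further down,

#include <stdint.h>

/* Bytes in memory: 3E 03 43 disp8, where 3E is a harmless padding prefix, 03 is
   ADD r32,r/m32, and 43 is the modrm byte (mod=01 disp8, reg=000 eax, rm=011 rbx).
   The low byte of the returned dword is the first instruction byte, matching the
   little-endian 32-bit word writes used to assemble here.                         */
uint32_t add_eax_rbx_disp8(uint8_t disp8)
{
    return 0x3Eu | (0x03u << 8) | (0x43u << 16) | ((uint32_t)disp8 << 24);
}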


This is a bloody ugly list which needed translation into some kind of naming in which "op" changes based on the form. I borrowed some forth-isms: @ for load, ! for store. Then added ' for imm8, " for imm32, and # for RIP-relative [imm32]. A 32-bit ADD and LEA end up with this mess of options (note . pushes a word's value on the stack, so A. pushes 0 for EAX in this context, , pushes a hex number, and / executes the opcode word which assembles the instruction to the current assembly write position),

A.B.+/ .......... add eax,ebx;
A.1234,"+/ ...... add eax,0x1234;
A.B.@+/ ......... add eax,[rbx];
A.B.12,'@+/ ..... add eax,[rbx+0x12];
A.B.1234,"@+/ ... add eax,[rbx+0x1234];
A.LABEL.#@+/ .... add eax,[LABEL]; <- RIP relative
A.B.12,'+=/ ..... lea eax,[rbx+0x12];
A.B.C.+=/ ....... lea eax,[rbx+rcx*1];


Then using L to expand from 32-bit operand to 64-bit operand,

A.B.L+/ .......... add rax,rbx;
A.1234,L"+/ ...... add rax,0x1234;
A.B.L@+/ ......... add rax,[rbx];
A.B.12,L'@+/ ..... add rax,[rbx+0x12];
A.B.1234,L"@+/ ... add rax,[rbx+0x1234];
A.LABEL.L#@+/ .... add rax,[LABEL]; <- RIP relative
A.B.12,L'+=/ ..... lea rax,[rbx+0x12];
A.B.C.L+=/ ....... lea rax,[rbx+rcx*1];


Source Example With Google Docs Mockup Syntax Highlighting
Font and colors are not what I'm going for, just enough to get to the next step. This is an expanded example which starts building up enough of an assembler to boot and clear the VGA text screen. Some of this got copied from older projects in which I used "X" instead of "L" to mark the 64-bit operand (just noticed I need to fix the shifts...). Currently I just copy from this to a text file which gets included into the boot loader on build.



From Nothing to Something
This starts by semi-self-documenting hand-assembled x86 instructions via macros. So "YB8-L'![F87B8948,/]" reads like this,

(1.) Y.B.8-,L'! packed to a word name YB8-L'! with tag characters removed.
(2.) [ which starts the macro.
(3.) F87B8948 which is {48 (REX 64-bit operand), 89 (store version of MOV), 7B (modrm byte: rdi,[rbx+imm8]), F8 (-8)}. The dword gets written little-endian, so the bytes land in memory in instruction order: 48 89 7B F8.
(4.) , which pushes the number on the data stack.
(5.) / which, since it follows the , tag and the word string is empty, executes the empty word, which pops the data stack and writes a 32-bit value to the asm position.
(6.) ] which ends the macro.

Later YB8-L'! with ; appended can be used to assemble that instruction by interpreting the macro.

The first assembled words are $, which pushes the current assembly position on the stack, and $DRP (which is actually a bug which needs to be removed). The $! word pops an address from the data stack, and stores the current assembly position to the given address. This is later used for instruction build macros which do things like PSH`, where the ` results in the dictionary address for the PSH word being placed on the data stack. The end game is getting to the point where, given one of the opcode forms, it is possible to write the following to produce a function which compiles an opcode,

C033403E,^`_;

Which pushes the 4-byte opcode base 0xC033403E, then the opcode name ^ for XOR, then runs the _ macro which assembles this into:

MOV eax,0xC033403E;
JMP X86-RM;


Immediately afterwards it is possible to execute the ^ word (call it) and assemble an XOR instruction. The X86-RM expects to get the REG and RM operands from the data stack with base instruction opcode data in EAX.
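
What X86-RM actually does isn't spelled out above, but the shape is guessable from the xor example; here is a C sketch of that guess, purely for illustration (the real word is a handful of hand-assembled instructions, and this merging scheme is an assumption),

#include <stdint.h>

/* base: 4-byte opcode word with an empty modrm in its top byte (e.g. 0xC033403E for XOR);
   reg/rm: register numbers popped off the data stack; here: current assembly write position.
   Merging reg=0 (eax) and rm=3 (ebx) into 0xC033403E gives bytes 3E 40 33 C3 in memory,
   a prefix-padded "xor eax,ebx".                                                              */
uint32_t *x86_rm(uint32_t base, uint32_t reg, uint32_t rm, uint32_t *here)
{
    *here++ = base | (((reg << 3) | rm) << 24);
    return here;                                   /* advanced by one 4-byte instruction       */
}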

Making a Mess to Clean Up
This about concludes the worst part of getting going from nothing, except for the PTSD dreams where people only speak in mixed hex and x86 machine code: FUCOM! REX DA TEST JO. When placed into final context there will be a few KB of source to build an assembler which covers all the functionality I need for the rest of the system. At this point I can easily add instructions and a few more of the opcode forms as they are needed. And it becomes very easy to write assembly like this,

A.A.^/ Y.B8000,"/ C.1F40,"/ L!REP/

Which is this in Intel syntax,

xor eax,eax; <- set eax to zero
mov edi,0xB8000; <- VGA text memory start address
mov ecx,0x1F40; <- 80x50 times two bytes per character
cld;
rep stosq; <- using old CISC style slow form to "do:mov [rdi],rax;add rdi,8;dec rcx;jnz do;"


20150714

1536-1 : The Programmer Addiction = Feedback

Continuing on the 1536-byte-loader based system. Interpreter finished, under the 1536-byte goal. Second major goal is to get the instant-feedback productivity addiction loop going: modify, get feedback, repeat. Have a simple ASCII to custom 64-character set encoding converter, and a way to include the converted source text starting at sector 3 in the boot loader. First major test: getting an infinite loop or infinite reboot working. Source without any syntax coloring,
\1536-1 --- BOOTUP INTO SPIN OR REBOOT LOOP\

800000,     \PUSH BOOT-UP COMPILE POSITION\
BOOT:       \STORE IN BOOT WHICH GETS CALLED TO RUN SYSTEM\
FEEBFEEB,/  \WRITE OPCODE TO JUMP TO SELF\
EAEAEAEA,/  \OR WRITE OPCODE TO CRASH ON INVALID INSTRUCTION\
]           \END COMPILE - LOADER WILL THEN JMP TO BOOT WORD\

Loader sets up memory with the dictionary at 1MB (2MB chunk with 1MB overflow), copies the source to 4MB (4MB chunk maximum), then starts the compile position at 8MB (so 8MB and on is the rest of the memory on the system). Had one major bug getting the interpreter up: forgot that \ in NASM results in a line continuation even when in a comment, which removed a line of a lookup table, resulting in a crash. Tracking down bugs is very easy: add "JMP $" or "db 0xEA" in NASM to hang or reboot respectively.

Adjustments
Adjusted the character syntax.
- - Negate the 64-bit number, add dash to the string.
. - Lookup word in dictionary, and push 64-bit value from word entry onto data stack.
, - Push 64-bit number on data stack.
: - Lookup word in dictionary, pop 64-bit value from data stack to word entry.
; - Lookup word in dictionary, interpret string starting at address stored in word entry.
[ - Lookup word in dictionary, store pointer to source after the [ in the word entry, skip past next ].
] - When un-matched with [, this ends interpretation via RET.
\ - Ignore text until the next \.
/ - Lookup word in dictionary, call to address stored in dictionary entry.
` - Lookup word in dictionary, push address of word on data stack.

Space and every character above, except the - char, clears the working word string and number. So a bare , results in pushing a zero on the data stack. And a bare / results in calling the empty word, which I've set up as a word that pops from the data stack and writes a 32-bit word to the compile position, then advances the compile position. This provides the base mechanics to start creating opcodes via manual hand assembly and build out an assembler, which is the topic of the next post...
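
For illustration only, here is a rough C sketch of that dispatch shape; the real interpreter is a few hundred bytes of x86, and enc6(), lookup(), the stacks, and the omitted tags are stand-ins and assumptions rather than the actual system,

#include <stdint.h>

uint64_t dstack[64]; int dsp;                  /* data stack                                    */
uint64_t name;                                 /* packed 6-bit/char word string                 */
uint64_t num;                                  /* 64-bit number (meaningful when chars are hex) */

static void     push(uint64_t v) { dstack[dsp++] = v; }
static uint64_t pop(void)        { return dstack[--dsp]; }
uint64_t *lookup(uint64_t packedName);         /* stand-in: dictionary name -> value cell       */
int       enc6(int c);                         /* stand-in: ASCII -> 6-bit charset index        */
int       isNameChar(int c);                   /* stand-in: one of 0-9 A-Z #$@!+*^&|=?<>_       */

void interpret(const char *src)
{
    for (; *src; src++) {
        int c = *src;
        if (c == '\\') { while (*++src && *src != '\\'); if (!*src) break;
                         name = num = 0; continue; }                       /* skip comment      */
        if (isNameChar(c)) {                   /* name and number build up together             */
            name = (name << 6) | (uint64_t)enc6(c);
            num  = (num << 4)  | (uint64_t)(c <= '9' ? c - '0' : c - 'A' + 10);
            continue;
        }
        switch (c) {                           /* post-fix tag dispatch                         */
        case '-': num = (uint64_t)(-(int64_t)num);
                  name = (name << 6) | (uint64_t)enc6('-'); continue;      /* does not clear    */
        case ',': push(num);                                    break;
        case '.': push(*lookup(name));                          break;
        case ':': *lookup(name) = pop();                        break;
        case '/': ((void (*)(void))(uintptr_t)*lookup(name))(); break;     /* call machine code */
        case '`': push((uint64_t)(uintptr_t)lookup(name));      break;
        default:  break;                       /* space etc.; ';' '[' ']' omitted from sketch   */
        }
        name = 0; num = 0;                     /* every other tag and space clears both         */
    }
}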

Great Tube: Old computers did it better!

20150712

Solskogen 2015 and Misc Demo Tubes







Running on a ZX Evolution,




Oh How Programming Has Changed

Programming in the 1990's: power on computer, open up text file in source editor, edit, run shell script to compile, test, repeat.

Programming today: Put in windows password to get machine to wake up after automatically going to sleep, find that trackpad no longer works for some reason, pull usb mouse from another computer, click away popup of firefox asking for a security update right now, then find that Windows is also forcing an update and a reboot, find something else to do for a while, come back after machine reboots, bring up text file in source editor, edit, open developer command prompt for visual studio, find error message "ERROR: Cannot determine the location of the VS Common Tool folder", search on the internet for solution and find nothing that works, open up visual studio instead, get message that visual studio license no longer works, requires login, attempt to login with windows sign on, find that account has been temporarily disabled for some reason, go through process to re-enable account via email, click through message on email, re-login to windows sign on, get new message, account still disabled and for "security reasons" must re-enable via process through phone, choose SMS process, wait for a while, phone never gets SMS message, scramble to find another solution, try call method instead, finally get windows sign on re-enabled, finally get visual studio to work, build console project, click on option to not create a new directory, find it creates a new directory anyway, close visual studio, move around files, reopen, add source file to project, attempt to make a quick change to the code, notice that by default visual studio is reforming the code to something other than what is desired, look on internet to find out how to disable auto-format, disable auto-format, press F7 to compile, test, out of time for the day, repeat again tomorrow with a different selection of things which are randomly broken...

20150710

Inspiration Reboot

Quite inspired by the insane one still or video per day at beeple.tumblr.com. Attempting to get back in the groove of consistently taking a small amount of non-work time every day to reboot fun projects. I'm on week 2 now of probably a three-month process of healing from a torn lower back; sitting in front of a computer is now low enough pain to have fun again...

1536
Setting a new 1536-byte (3x 512-byte sector) constraint for a bootloader + source interpreter which brings up a PC in 64-bit long mode with a nice 8x8 pix VGA font and with 30720 bytes (60 sectors, to fill out one track) of source text for an editor and USB driver. USB provides thumb drive access to load in more stuff. Have the 1st sector bringing up VGA and long mode, the 2nd sector with the 64-character font, and the last 512-byte sector currently in progress as the interpreter. Went full circle back to something slow, but dead simple: the interpreter works on bytes as input. The following selection of characters appends simultaneously to a 60-bit word string (10 chars at 6 bits/char) and a 64-bit number,

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ#$@!+*-^&|=?<>_

Then giving a "color forth tag" like meaning to another fixed set of characters,

~ - Negate the 64-bit number.
. - Lookup word in dictionary, and push 64-bit value onto data stack.
, - Push 64-bit number on data stack.
: - Lookup word in dictionary, pop 64-bit value from data stack to word.
; - Write 32-bit number at compile position.
" - Lookup word in dictionary, interpret the string at address stored in word.
[ - Lookup word in dictionary, store compile position in word, append the string from [ to ] at the compile position.
] - When un-matched with [, this ends interpretation via RET.
\ - Ignore text until the next \.
` - Lookup word in dictionary, call to address stored in word.

That set of characters replaces the "space" character in forth, each working like a post-fix tag on either the string or number just parsed from input. The set of tags is minimal but flexible enough to build up a forth-style macro language assembler, with everything defined in the source itself. More on this next time. One nice side effect of post-fix tags is that syntax highlighting is trivial by reading characters backwards starting at the end of the block.
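
A tiny sketch of that reverse-scan idea (an assumed mechanic for illustration, not code from this project): walking backwards, each tag character sets the color used for the name or number characters sitting in front of it,

enum { COLOR_DEFAULT = 0 };
int colorOfTag(char tag);                      /* stand-in: e.g. ',' -> the number color        */
int isTagChar(char c);                         /* stand-in: one of  ~ . , : ; " [ ] \ `         */

void highlight(const char *block, int length, int *colorOut)
{
    int color = COLOR_DEFAULT;
    for (int i = length - 1; i >= 0; i--) {    /* single backwards pass                         */
        if (isTagChar(block[i])) color = colorOfTag(block[i]);
        colorOut[i] = color;                   /* chars inherit the color of the tag that will
                                                  consume them                                   */
    }
}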

Sony Wega CRT HDTVs
The old Wega CRT HDTVs work quite well. They apparently are nearly fixed-frequency 1080 interlaced, around 540 lines per field at ~60 Hz, and unlike prior NTSC CRT TVs, they seem to not do any real progressive scanning. Taking a working 1080i modeline, converting it to 540p, and driving the CRT results in the Wega initiating a "mode-reset" when it doesn't see the interlaced fields for the 2nd frame. However 480p modes do work (perhaps with an internal conversion to 1080i). Given that 1080i modes are totally useless as the 60Hz interlace flicker is horrible, and 540p won't work, these HDTVs should be complete garbage. However 720p works awesome, as the TV's processing to re-sample to 1080i does not flicker any worse than 60Hz already does. In theory the even and odd fields (in alternating frames) share around 75% of an input line (540/720), and likely more if the re-sampling has some low-pass filtering. Drop in a PS4 game which has aliasing problems, and the CRT HDTV works like magic. These late-model "hi-scan" Wega CRTs only had a roughly 853-pixel-wide aperture grille: 853x540 from what was a 1920x1080 render is a good amount of super-sampling...

20150709

GPU Unchained ASCII Notes

Posted stills prior; here are the notes. Hopefully the "pre" tag works for everyone...
================================================================================


                                                               .
                     ##X=--=X##  ##X=--=X##  ##      ##
                     ##          __      ##  ##      ##
                .    ##    -=##  ##      ##  ##      ##
  .                  ##      ##  ##X=--=X##  ##      ##               .
                     ##X=--=X##  ##          ##X=--=X##
          .                                 _               .
                              #                                 #        .
 .      .   #   # #=-=# #=-=# #=-=# #=-=# -=#   #=-=# #=-=# #=-=#         .
.:...    .. #   # #   # #     #   # ___ #   #   #   # # __# #   #   ...  :::..:.
:::::::::::.# . # # . # # .:. # : # #   # . = . # . # # ... # . #.::::::::::::::
|||||||'||| #=-=# # : # #=-=# # | # #=-=# #=-=# # : # #=-=# #=-=# ||||.|||||||||           
!!!''!;!!!!!.....!..!..!.....!.!!!.!.....!.....!.!!!.!.....!.....!!!!!!!!!'''!!!
:/:::/::::::::::::::::::::::::::'''''''''':::''::::::::::::::::::::/:::::::.::::
''''/   ''''';' ' '     '                              ';'    '' '  ''''''''';''  
   /     '  /                  ___                                   . '''  /
                                |imothy Lottes



================================================================================
         VISUALIZING ALIASING IN MOTION - STILL VS SCROLLING FONTS [a]
================================================================================

     "Evaluation criteria for correct sharpness
                                should be different for stills and video"

SPATIAL PRECISION VS SHARPNESS
------------------------------
 - In motion, sharpness must be reduced to have good spatial precision

LIVE EXAMPLE OF FONT RENDERING
------------------------------
 - 1/4 x 1/4 resolution rendering 
 - Render via stylized even field aperture grille (arcade Trinitron)
 - Font designed on pixel grid 
 - Font rendered via function of min 1D distance to capsules (for lines)


================================================================================
                         FILTERING DIGITAL ORIGAMI [b]
================================================================================

      "Filtering is critical 
              to convince the skeptical mind that a scene is believable"

LIVE EXAMPLE OF SIMPLE SDF SCENE
--------------------------------
 - 1/2 x 1/2 resolution tracing
 - Full resolution reconstruction
 - Lots of high frequency sub-pixel sized content

PROGRESSION TO SIMPLE SAMPLE JITTERED TEMPORAL FILTERING
--------------------------------------------------------
 - Simple gaussian resolve kernel (not enough samples/pix for negative lobes)
 - Temporal feedback also weights samples by rcp of reprojected luma difference


================================================================================
                   LIMITS OF PER-PIXEL SEARCH AND SDF LOD [c]
================================================================================

      "Highly variable frame timing is a poison for high refresh rates"

RAY HIT?
--------
 - Standard termination,
     if(abs(distanceToSurface) <= distanceAlongRay * coneRadius) hit
 - LOD termination,
     ... <= distanceAlongRay * (coneRadius + steps * lodFactor)) hit
 - Expands surfaces as ray march gets progressively more expensive

CHOOSING A GOOD FILTER KERNEL
-----------------------------
 - Biased: 
     1/(1 + abs(a-b) * const)
 - Still not great: 
     1/(1 + (abs(a-b) / max(minimum,min(a,b))) * const)
 - Better:
     square(1 - abs(a-b) / max(a,b,minimum))
     

================================================================================
            1/4 AREA TRACE+FEEDBACK + FULL AREA RECONSTRUCTION [d]
================================================================================

     "Maintaining higher spatial precision
               without increasing shading rate for higher quality visual"

VISUAL EXAMPLE
--------------   
 - Still tracing at sample-jittered half resolution (1/4 area).
 - Still sampling reprojection at half resolution.
 - But sampling reprojection from full resolution source.
 - And running reconstruction at full resolution.


================================================================================
                  HIERARCHICAL TRAVERSAL IN SINGLE KERNEL [e]
================================================================================

    "Reorder work to keep ALUs fully loaded,
                      ok to leverage poor data access patterns if ALU bound"

THREAD TO WORK MAPPING
----------------------
 - Not pixel to thread -> too much ALU under utilization after rays hit surf
 - Rather fixed distance estimator iterations per thread

SHADER
------
 - Fetch ray
 - For fixed traversal iterations
    - Distance estimation
    - Walk forward
    - If ray hit (aka "close enough")
       - Store result
       - Start on another ray

--------------------------------------------------------------------------------
                                  BREAD CRUMBS [e]
--------------------------------------------------------------------------------

  "When working in parallel, don't wait, if data isn't ready, just compute it"

TRAVERSAL ORDERING
------------------
 - Order rays by parent cones first in space filling curve,

     0 -> 12 -> 4589 -> and so on
          34    67ab
                cdgh
                efij

ACCELERATION
------------
 - Cones end by writing their stop depth into memory.
 - Child cones will check for completed parent result,
   otherwise will start from scratch.

--------------------------------------------------------------------------------
                                 IN THE NOISE [e]  
--------------------------------------------------------------------------------

                   "Degrade to pleasing noise when out of time"

TRADE RELATIVELY FIXED TIME FOR NOISE
-------------------------------------
 - Fixed maximum DE iterations -> not always going to traverse full scene
 - Ok fine, lets not require tracing all rays

RAY ORDERING FOR FULL RESOLUTION
--------------------------------
 - Fill ordering,

    0 7 E 5   0 . . .   0 7 . 5   0 7 . 5
    C 3 A 1   . 3 . 1   . 3 . 1   . 3 A 1
    8 F 6 D   . . . .   . . 6 .   8 . 6 .
    4 B 2 9   . . 2 .   4 . 2 .   4 B 2 9

 - Pattern defined by math (fast but perhaps not ideal)
 - Each frame gets different fill order (holes not always in same place)


================================================================================
                FILTERED RECONSTRUCTION FROM DIFFERENT DOMAIN [f]
================================================================================

360 DEG FISHEYE EXAMPLE
-----------------------
 - Rendering into an octahedron map
 - Each sample writes out {x,y} projected screen position
 - Distortion finds the nearest texel in the octahedron map
 - Then samples a neighborhood around the texel
 - Then uses the difference in pixel center and sample {x,y} to filter
 - Reprojection samples from the warped reconstructed frame
 - Ideally adjust the filter kernel based on domain distortion
    - Showing simple adjustment of kernel size here (anisotropic is better)

20150630

Sugar Free Peppermint Chocolate Chip Custard Ice Cream [SFPCCCIC]

EDIT: Photo and updated ingredients...



Been experimenting with ketogenic-friendly ultra-low-carb ice cream. This, along with bacon-wrapped sour cream, is one of the perks of the extremely high-fat, low-carb lifestyle. First experiment didn't go as planned: placing the result in the freezer was a mistake. Apparently sugar in normal ice cream actually is the key component which enables the ice cream to maintain a great texture when frozen. Second pass, I'm keeping the result in the fridge. Ingredients for part one,

1 pint - whole cream
3 - egg yolks
1/2 tsp - peppermint extract
6 drops - pure liquid stevia

Blend everything together in a blender on high. Pour into a pan on the stove and, stirring, slowly bring up to 160 deg F. Pour into a chilled container, store in the fridge until chilled. Extra ingredients for part two,

1 - Lindt 90% chocolate square

Pour the chilled mixture into the ice cream maker, along with the chopped chocolate square. Lindt 90% is better than the lower-cocoa options thanks to its higher fat content. Churn until the mixture looks like ice cream. Add to a chilled container, store in the fridge until chilled. Eat. Turned out awesome. EDIT: 1/2 tsp of peppermint per pint of cream is about the max which seems to taste good.

20150624

AMD Fury X (aka Fiji) is a Beast of a GPU Compute Platform

Compute Raw Specs
Raw specs from wikipedia adjusted to per millisecond: comparing what the two vendors built around 600 mm^2 on 28 nm,

AMD FURY X: 8.6 GFlop/ms, 0.5 GB/ms, 0.27 GTex/ms
NV TITAN X: 6.1 GFlop/ms, 0.3 GB/ms, 0.19 GTex/ms


Or the same numbers in operations per pixel at 1920x1080 at 60 Hz,

AMD FURY X: 69 KFlop/pix, 4.0 KB/pix, 2.2 KTex/pix
NV TITAN X: 49 KFlop/pix, 2.4 KB/pix, 1.5 KTex/pix


Think about what is possible with 69 thousand flops per pixel per frame.
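
For reference, the unit conversion behind that per-pixel number, as a tiny C snippet using the Fury X row from the tables above,

#include <stdio.h>

int main(void)
{
    double gflopPerMs  = 8.6;                  /* Fury X from the per-ms table      */
    double msPerFrame  = 1000.0 / 60.0;        /* 60 Hz                             */
    double pixels      = 1920.0 * 1080.0;
    double kflopPerPix = gflopPerMs * 1.0e9 * msPerFrame / pixels / 1.0e3;
    printf("%.0f KFlop/pix\n", kflopPerPix);   /* prints roughly 69                 */
    return 0;
}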

HBM
HBM definitely represents the future of bandwidth scaling for GPUs: a change which brings the memory clocks down and bus width up (512 bytes wide on Fury X vs 48 bytes wide on Titan X). This will have side effects on ideal algorithm design: ideal access granularity gets larger. Things like random access global atomics and random access 16-byte vector load/store operations become much less interesting (bad idea before, worse idea now). Working in LDS with shared atomics, staying in cache, etc, becomes more rewarding.

20150623

BenQ XL2730Z Blur Reduction vs CRT

TFT Central Review for the BenQ XL2730Z

The BenQ XL2730Z is one of the best 2560x1440 LCDs on the market. This display has a blur reduction mode which enables a strobe back-light (which only works properly with fixed 120Hz or 144Hz refresh rates). Its sharpness control also enables a reduction of sharpness, which can cut out a lot of the hard exact square pixel LCD feel, leaving an image which is more pleasing to look at. And it provides an exact pixel mode which instead of enlarging lower resolution modes, keeps a 1:1 mapping of input to output pixel, and draws the outside borders with black. So 1920x1080 still looks good, just doesn't fully fill the screen. Compared to other LCDs this display is awesome, and running with a 1000Hz gaming mouse at 144Hz with a strobe back-light is a wonderful experience in comparison to what most people are used to.

Compared to CRTs?
CRTs still easily surpass even these top-of-the-line LCDs in quality. One of the core differences is that even the best LCDs cannot transition pixels fast enough for a true low persistence display. The BenQ for instance has frame cross-talk (seeing part of one or more other frames). At the default "Area" setting (which controls the vertical position of minimal cross-talk), at 144Hz, up to 2 extra frames are partly visible at the top and bottom of the screen; toward the center just 1 extra frame is partly visible. I have a feeling that the variable cross-talk has to do with the fact that the back-light has to be a global strobe, but pixel rows are still scanned (changed) in the classic CRT scan order. So row-to-strobe timing is only best at one point on the screen. Likewise the LCD uses a column dither on the transition overdrive: even and odd pixel columns respectively over-shoot and under-shoot the transition, such that the 2-column average is closer to the correct signal. The minimal 1-extra-frame cross-talk is a product of the inability to transition to the proper signal in one frame. This is quite distracting in comparison with the CRT. The CRT can easily offer low persistence at much lower frame rates without transition artifacts, while providing filtered pixels at a lower resolution and presenting a proper image instead of a lesson in cubism. So what would be 2560x1440x120Hz or 442 Mpix/s on the LCD still does not compare to the quality of 1600x1200x76Hz or 146 Mpix/s on the CRT. And the CRT can use that 3x GPU performance for higher quality visuals.

Other Technology?
OLED isn't yet cost effective for consumers for desktop displays. For example, for $26K it is possible to get a Sony BVME250A - 25" Trimaster EL OLED Master Monitor. Sony's pro OLEDs seem to max out at 75Hz, not sure they even have a global strobe?

Prysm sells Laser Phosphor Display (LPD) Tiles: a CRT-like phosphor display driven instead by a laser. Each 4:3 aspect ratio tile is a near border-less 20"x15" (25" diagonal) with 320x240 or 427x320 resolution, scanned at 360 Hz according to their product sheet. I'm guessing these are also prohibitively expensive? If this is a fixed 360 Hz scanning rate, that would also be a serious problem: a 90Hz input would be quad strobed, and that won't look good at all.