Introduction to the Direct3D 11 Graphics Pipeline

Allison Klein

Senior Lead Program Manager

Direct3D

Microsoft

Executive Summary: D3D 11

Direct3D 11 focuses on scalability and performance, creating a better development experience, and extending the reach of the GPU
Direct3D 11 is a strict superset of D3D 10 & 10.1

D3D 11 adds support for new features to D3D 10.1
The fastest way to move to Direct3D 11 is to start developing on Direct3D 10/10.1 today

Direct3D 11 will be available on Windows Vista & future Windows operating systems
Direct3D 11 will run on down-level hardware

You can all go back to sleep now.

Outline

Overview
Drilldown
Summary

Direct3D 10

Cleaner API: easier coding than Direct3D 9
More efficient DDI: driver optimization
A more consistent experience across hardware!

Tighter specification
Elimination of caps

Direct3D 10.1

Improved multisampling

MSAA depth access in shader
Expose sample positions
Explicit coverage control
4-sample MSAA required

Improved fixed-function blending

Per-MRT blend mode
16-bit integer blending

Arrays of cube maps

Direct3D 10.1 (Cont’d)

Improved performance over Direct3D 10

6-10% for common cases
20-30% for applications relying on MSAA, such as deferred shading engines

Algorithms closer to Direct3D 11 and future APIs

Direct3D Issues/Opportunities

Scalability
Performance
Cross-Platform Content and Techniques
General-Purpose Data-Parallel Computing


Outline

Overview
Drilldown

Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features

Summary

Current Authoring Pipeline

(Rocket Frog taken from Loop & Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches")

[Diagram: Sub-D Modeling, Animation, Displacement Map, Polygon Mesh, Generate LODs]

Character Authoring (Cont’d)

Trends

Denser meshes, more detailed characters

~5K triangles -> 30-100K triangles

More complex animations

Animations on polygon mesh vertices more costly

Result

Indirection in authoring pipeline more painful
Painful I/O issues

Solution

Use higher-level surface representation longer

Animate control cage (~5K vertices)
Generate displacement & normal maps

Direct3D 11 Pipeline

Direct3D 10 pipeline
Plus
Three new stages for Tessellation

[Diagram: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Stream Output / Rasterizer -> Pixel Shader -> Output Merger]

Hull Shader (HS)

[Diagram: Hull Shader, Tessellator, Domain Shader]

HS input:
• patch control pts

HS output:
• Patch control pts after basis conversion
• TessFactors (how much to tessellate)
• fixed tessellator mode declarations

One Hull Shader invocation per patch

Fixed-Function Tessellator (TS)

[Diagram: Hull Shader, Tessellator, Domain Shader]

TS input:
• TessFactors (how much to tessellate)
• fixed tessellator mode declarations

TS output:
• U V {W} domain points
• topology (to primitive assembly)

Note: Tessellator does not see control points

Tessellator operates per patch
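
The TS input/output relationship above can be sketched on the CPU. This is a minimal illustration under simplifying assumptions, not the hardware algorithm: it handles only a quad domain with one uniform integer factor, while the real tessellator also takes per-edge factors and other modes. The names `DomainPoint`, `tessellateQuadDomain`, and `triangleCount` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A (u, v) domain point in the unit square, the kind of value the tessellator emits.
struct DomainPoint {
    float u;
    float v;
};

// Hypothetical helper: emits an (n+1) x (n+1) grid of domain points for a quad
// patch split into n segments per edge. Note that no control points are passed
// in, matching the slide: the tessellator only sees the tessellation factor.
std::vector<DomainPoint> tessellateQuadDomain(int segments) {
    assert(segments >= 1);
    std::vector<DomainPoint> points;
    points.reserve(static_cast<std::size_t>(segments + 1) * (segments + 1));
    for (int row = 0; row <= segments; ++row) {
        for (int col = 0; col <= segments; ++col) {
            points.push_back({static_cast<float>(col) / segments,
                              static_cast<float>(row) / segments});
        }
    }
    return points;
}

// Size of the matching topology: every grid cell is split into two triangles.
int triangleCount(int segments) {
    assert(segments >= 1);
    return 2 * segments * segments;
}
```

Raising `segments` changes only the number of domain points and triangles; the patch data itself stays the same size, which is why a coarse model can be refined per machine.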

Domain Shader (DS)

[Diagram: Hull Shader, Tessellator, Domain Shader]

DS input:
• U V {W} domain points
• control points
• TessFactors

DS output:
• one vertex

One Domain Shader invocation per point from Tessellator
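
To make "one invocation per domain point, one vertex out" concrete, here is a minimal CPU sketch of what a domain shader could compute for a bicubic Bezier patch. It assumes 16 control points stored row by row and evaluates position only (no displacement). `Vec3`, `bernstein3`, and `evaluateBezierPatch` are hypothetical names, and a real domain shader would be written in HLSL rather than C++.

```cpp
#include <array>
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// Cubic Bernstein basis weights at parameter t in [0, 1].
std::array<float, 4> bernstein3(float t) {
    const float s = 1.0f - t;
    return {{s * s * s, 3.0f * s * s * t, 3.0f * s * t * t, t * t * t}};
}

// One "domain shader invocation": maps a single (u, v) domain point to a
// surface position, using the patch's 16 control points (row index follows v).
Vec3 evaluateBezierPatch(const std::array<Vec3, 16>& cp, float u, float v) {
    assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
    const std::array<float, 4> bu = bernstein3(u);
    const std::array<float, 4> bv = bernstein3(v);
    Vec3 p{0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            const float w = bv[row] * bu[col];  // tensor-product weight
            const Vec3& c = cp[row * 4 + col];
            p.x += w * c.x;
            p.y += w * c.y;
            p.z += w * c.z;
        }
    }
    return p;
}
```

Each call depends only on its own (u, v) and the shared control points, so the invocations are independent and can run in parallel.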

Direct3D 11 Pipeline

[Diagram: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Stream Output / Rasterizer -> Pixel Shader -> Output Merger]

D3D11 HW Feature
D3D11 Only
Fundamental primitive is patch (not triangle)
Superset of Xbox 360 tessellation

Example Surface Processing Pipeline

[Diagram, in flow order:]
patch control points -> vertex shader (Animate/skin Control Points) -> transformed control points
transformed control points -> hull shader (Transform basis, Determine how much to tessellate; Sub-D Patch -> Bezier Patch) -> control points in Bezier patch, Tess Factors
Tess Factors -> tessellator (Tessellate!) -> U V {W} domain points
control points in Bezier patch, U V {W} domain points, displacement map -> domain shader (Evaluate surface including displacement)

Single-pass process!

New Authoring Pipeline

(Rocket Frog taken from Loop & Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches")

[Diagram: Sub-D Modeling, Animation, Displacement Map, GPU, Optimally Tessellated Mesh]

Tessellation: Summary

Helps us get closer to eliminating “pointy heads”
Scales visual quality across PC hardware configurations
Supports performance increases

Coarse model = compression, faster I/O to GPU
Rendering tailored to each end user’s hardware

Better cross-platform (Windows + Xbox 360) development experience

Xbox 360 has a subset of D3D11’s tessellation
Parity = ease of cross-platform development
Extra features = innovation for Windows gaming

Render content as the artist created it!

Want to Know More?

“Direct3D 11 Tessellation”
Tuesday, 4:00-4:55pm (Next)
Kev Gee (Microsoft)

“Advanced Topics in GPU Tessellation”
Wednesday, 10:15-11:10am
Natasha Tatarchuk (AMD)

“Water-Tight, Textured, Displaced Subdivision Surface Tessellation Using Direct3D 11”
Wednesday, 1:30-2:25pm
Ignacio Castano (NVIDIA)


GPGPU = Data Parallel Computing

GPU performance continues to grow
Many applications scale well to massive parallelism without tricky code changes
Direct3D is the API for talking to GPU
How do we expand Direct3D to GPGPU?

Direct3D 11 Pipeline

Direct3D 10 pipeline
Plus
Three new stages for Tessellation
Plus
Compute Shader

[Diagram: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Stream Output / Rasterizer -> Pixel Shader -> Output Merger; Compute Shader; Data Structure]

Integration with Direct3D

Fully supports all Direct3D resources
Targets graphics/media data types
Evolution of DirectX HLSL
Graphics pipeline updated to emit general data structures…
…which can then be manipulated by compute shader…
And then rendered by Direct3D again

Example Scenario

[Diagram: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Stream Output / Rasterizer -> Pixel Shader -> Output Merger; Compute Shader; Data Structure]

Render scene
Write out scene image
Use Compute for image post-processing
Output final image

Target Applications

Image/Post processing:

Image Reduction
Image Histogram
Image Convolution
Image FFT

A-Buffer/OIT
Ray-tracing, radiosity, etc.
Physics
AI
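
As an illustration of one listed target, the image histogram, here is a minimal CPU sketch of a data-parallel formulation: the image is split into fixed-size groups, each group fills a private histogram, and the partial results are merged. The groups run serially here; the assumption (not stated on the slide) is that each group could map to an independent batch of GPU threads. All names are hypothetical, and an actual compute shader would be written in HLSL.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

// Emulates one group of workers: builds a private histogram over its own
// slice of the image, touching no data owned by any other group.
Histogram histogramForGroup(const std::vector<std::uint8_t>& pixels,
                            std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= pixels.size());
    Histogram local{};  // value-initialized, so all bins start at zero
    for (std::size_t i = begin; i < end; ++i) {
        ++local[pixels[i]];
    }
    return local;
}

// Splits the image into groups, then merges the per-group histograms.
Histogram imageHistogram(const std::vector<std::uint8_t>& pixels,
                         std::size_t groupSize) {
    assert(groupSize > 0);
    Histogram total{};
    for (std::size_t begin = 0; begin < pixels.size(); begin += groupSize) {
        const std::size_t end = std::min(begin + groupSize, pixels.size());
        const Histogram local = histogramForGroup(pixels, begin, end);
        for (std::size_t bin = 0; bin < total.size(); ++bin) {
            total[bin] += local[bin];
        }
    }
    return total;
}
```

Because the per-group step shares no state, only the final merge needs coordination, which is the property that lets this kind of algorithm scale with the number of parallel units.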

Compute Shader: Summary

Enables much more general algorithms
Transparent parallel processing model
Full cross-vendor support

Broadest possible installed base

Want to Know More?

“Direct3D 11 Compute Shader—More Generality for Advanced Techniques”
Wednesday, 4:00-4:55pm
Chas Boyd (Microsoft)


Multithreading Today

[Diagram: Physics, Graphics, AI, GPU]

Multithreading Today

[Diagram: Physics, CPU-Bound Graphics, AI, GPU]

D3D11 Multithreading Usage

Enables distribution across threads of

Application code
Runtime
Driver

Device: free-threaded resource creation
Immediate Context: your single primary device for state & draws
Deferred Contexts: your per-thread devices for state & draws
Display Lists: Recorded sequence of graphics commands
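
The split between deferred contexts, display lists, and the immediate context can be sketched as a record-then-playback pattern. This is a minimal model with commands stored as plain strings. The class and method names are hypothetical and do not mirror the actual Direct3D 11 interfaces. Recording is shown without real threads; the assumption is that each deferred context would be owned by its own worker thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A recorded sequence of graphics commands (the slide's "display list").
using DisplayList = std::vector<std::string>;

// Stand-in for a deferred context: records state changes and draws
// instead of submitting them.
class DeferredContext {
public:
    void setState(const std::string& name) { recorded_.push_back("state:" + name); }
    void draw(int vertexCount) {
        assert(vertexCount > 0);
        recorded_.push_back("draw:" + std::to_string(vertexCount));
    }
    // Ends recording and hands back the list, leaving this context empty.
    DisplayList finish() {
        DisplayList out;
        out.swap(recorded_);
        return out;
    }

private:
    DisplayList recorded_;
};

// Stand-in for the immediate context: the single place where commands
// are actually submitted, in the order the lists are executed.
class ImmediateContext {
public:
    void execute(const DisplayList& list) {
        submitted_.insert(submitted_.end(), list.begin(), list.end());
    }
    const std::vector<std::string>& submitted() const { return submitted_; }

private:
    std::vector<std::string> submitted_;
};
```

Since each deferred context owns its recording buffer, recording needs no locks; ordering is decided only when the main thread plays the lists back.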

Direct3D 11 Multithreading

Now, the following can be distributed across threads:
Application
Direct3D 11 Runtime
Direct3D 11 Drivers
Updated Direct3D 10 and 10.1 Drivers

Direct3D 11 Multithreading

[Diagram: Application -> Direct3D 11 Runtime -> Existing 10/10.1 Drivers -> Direct3D 10/10.1 HW; Direct3D 11 Runtime -> Direct3D 11 Driver -> Direct3D 11 HW]

Direct3D 11 Multithreading

[Diagram: Application -> Direct3D 11 Runtime -> New 10/10.1 Drivers -> Direct3D 10/10.1 HW; Direct3D 11 Runtime -> Direct3D 11 Driver -> Direct3D 11 HW]

Multithreading: Summary

Improves performance
Scalable across hardware configurations in two ways:

# of CPUs
Graphics cards/drivers

Better cross-platform (Windows+Xbox 360) development experience

Want to Know More?

“Multithreaded Rendering for Games”

Wednesday, 1:30-2:25pm
Matt Lee (Microsoft)


Shader Issues Today

Shaders getting bigger, more complex
Shaders need to target wide range of hardware
Two approaches today:

Write specialized shaders

Good: Build optimal shaders as specializations
Bad: Generates lots of shaders

Write “one shader to rule them all”

Combines multiple shaders
Good: Reduces shader binding changes
Bad: Code is complex

Answer: Subroutines

Shader Subroutines

Über-shader

foo (…) {
    if (m == 1) {
        // do material 1
    } else if (m == 2) {
        // do material 2
    }
    if (l == 1) {
        // do light model 1
    } else if (l == 2) {
        // do light model 2
    }
}

Dynamic Subroutine

Material1(…) { … }
Material2(…) { … }
Light1(…) { … }
Light2(…) { … }

foo(…) {
    (*material)(…);
    (*light)(…);
}

Application binds appropriate *material, *light
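
The dynamic-subroutine pattern above has a close CPU analogue in function pointers. The following is a minimal C++ sketch of that analogue, not HLSL. The shading formulas, the `ShadingInput` fields, and every function name are placeholders chosen for illustration.

```cpp
#include <cassert>

// Hypothetical per-pixel inputs; real shaders would carry far more state.
struct ShadingInput {
    float albedo;
    float nDotL;
};

using MaterialFn = float (*)(const ShadingInput&);
using LightFn = float (*)(const ShadingInput&);

// Placeholder material routines.
float material1(const ShadingInput& in) { return in.albedo; }
float material2(const ShadingInput& in) { return in.albedo * 0.5f; }

// Placeholder light models: a clamped term and a wrapped term.
float light1(const ShadingInput& in) { return in.nDotL > 0.0f ? in.nDotL : 0.0f; }
float light2(const ShadingInput& in) { return 0.5f * in.nDotL + 0.5f; }

// The application fills this in once; it then applies to the whole draw.
struct SubroutineBindings {
    MaterialFn material;
    LightFn light;
};

// Equivalent of foo(...): no branching on material or light IDs,
// just calls through whatever the application bound.
float shade(const SubroutineBindings& bound, const ShadingInput& in) {
    assert(bound.material != nullptr && bound.light != nullptr);
    return bound.material(in) * bound.light(in);
}
```

Switching from `{material1, light1}` to `{material2, light2}` changes behavior without recompiling `shade` and without the nested `if` chains of the über-shader version.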

Shader Subroutines

Details

Calls must be fast
Binding applies to all primitives in a Draw call
Binding operation must be fast
Need parameter passing mechanism
Need access to textures, samplers, etc. 

Advantages

Reduce register usage in Über-shaders

Not worst case of all if statements

Allows specialization of subroutines

Want to Know More?

“High Level Shader Language (HLSL) Update—Introducing Version 5.0”
Tuesday, 5:05-6:00pm
Michael Oneppo (Microsoft)


Why New Texture Formats?

Existing block palette interpolations too simple
Results often rife with blocking artifacts
No high dynamic range (HDR) support
NB: All are issues we heard from developers

Two New BC’s for Direct3D 11

BC6 (aka BC6H)

High dynamic range
6:1 compression (16 bpc RGB)
Targeting high (not lossless) visual quality

BC7

LDR with alpha
3:1 compression for RGB or 4:1 for RGBA
High visual quality
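
The quoted ratios can be checked with a little arithmetic. The sketch below assumes that a compressed 4x4 block occupies 128 bits (8 bits per texel) in both formats, a figure that is not stated on the slide, and that the uncompressed LDR source uses 8 bits per channel. The constant and function names are hypothetical.

```cpp
#include <cassert>

// Assumption: one compressed block covers 4x4 texels and occupies 128 bits.
const int kBlockTexels = 4 * 4;
const int kBlockBits = 128;
const int kCompressedBitsPerTexel = kBlockBits / kBlockTexels;  // 8

// Ratio of uncompressed bits per texel to compressed bits per texel.
int compressionRatio(int channels, int bitsPerChannel) {
    assert(channels > 0 && bitsPerChannel > 0);
    return (channels * bitsPerChannel) / kCompressedBitsPerTexel;
}
```

Under these assumptions, `compressionRatio(3, 16)` gives 6 for BC6H (48 bits of RGB down to 8), while `compressionRatio(3, 8)` and `compressionRatio(4, 8)` give 3 and 4 for BC7 with RGB and RGBA sources.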

New BC’s: Compression

Block compression (unchanged)

Each block independent
Fixed compression ratio

Multiple block types (new)

Tailored to different types of content
Smooth gradients vs. noisy normal maps
Varied alpha vs. constant alpha

Also new: decompression results must be bit-accurate with spec

Multiple Block Types

Different numbers of color interpolation lines

Less variance in one block means:

1 color line
Higher-precision endpoints

More variance in one block means:

2 (BC6 & 7) or 3 (BC7 only) color lines
Lower-precision endpoints and interpolation bits

Different numbers of index bits

2 or 3 bits to express position on color line

Alpha

Some blocks have implied 1.0 alpha
Others encode alpha
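
How a 2- or 3-bit index "expresses position on a color line" can be sketched as interpolation between two endpoints. This is a simplified, evenly spaced integer lerp for a single 8-bit channel. It is an assumption for illustration only: the actual formats must be bit-accurate with the spec, whose interpolation weights are not reproduced here. `interpolateChannel` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Decodes one channel of one texel: index 0 selects endpoint e0, the
// largest index selects e1, and indices in between are evenly spaced.
std::uint8_t interpolateChannel(std::uint8_t e0, std::uint8_t e1,
                                int index, int indexBits) {
    assert(indexBits == 2 || indexBits == 3);
    const int maxIndex = (1 << indexBits) - 1;  // 3 or 7
    assert(index >= 0 && index <= maxIndex);
    // Integer lerp; adding maxIndex / 2 rounds to the nearest step.
    const int value =
        (e0 * (maxIndex - index) + e1 * index + maxIndex / 2) / maxIndex;
    return static_cast<std::uint8_t>(value);
}
```

With 2 index bits a texel can take one of 4 positions on the line, with 3 bits one of 8, which is the trade between index precision and endpoint precision that the block types vary.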

Partitions

When using multiple color lines, each pixel needs to be associated with a color line

Choosing with individual per-pixel bits is expensive

For a 4x4 block with 2 color lines

2^16 possible partition patterns

16 to 64 well-chosen partition patterns give a good approximation of the full set
BC6H: 32 partitions
BC7: 64 partitions, shares first 32 with BC6H
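
The saving from a partition table can be sketched as follows. Each pattern is stored as a 16-bit mask in which bit (y * 4 + x) names the color line for that texel, and a block stores only a small index into the table. The four masks below are illustrative only and are NOT the actual BC6H or BC7 tables; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative patterns for 2 color lines (not the real format tables).
// Bit (y * 4 + x) is 0 or 1: the color line used by texel (x, y).
const std::uint16_t kExamplePartitions[4] = {
    0xCCCC,  // right two columns use line 1
    0xFF00,  // bottom two rows use line 1
    0x8888,  // rightmost column uses line 1
    0xF000,  // bottom row uses line 1
};

// Looks up which color line a texel belongs to under a given pattern.
int colorLineForTexel(int partitionIndex, int x, int y) {
    assert(partitionIndex >= 0 && partitionIndex < 4);
    assert(x >= 0 && x < 4 && y >= 0 && y < 4);
    return (kExamplePartitions[partitionIndex] >> (y * 4 + x)) & 1;
}

// Bits needed to select one of patternCount entries (ceiling of log2).
int selectorBits(int patternCount) {
    assert(patternCount >= 1);
    int bits = 0;
    while ((1 << bits) < patternCount) {
        ++bits;
    }
    return bits;
}
```

Selecting among all 2^16 patterns costs 16 bits per block, whereas a 32-entry table costs 5 bits and a 64-entry table costs 6, leaving the remaining bits for endpoints and indices.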

Example Partition Table

A 32-partition table for 2 color lines

Comparisons

[Figure: Orig vs. BC3 and Orig vs. BC7, with Abs Error]

Comparisons

[Figure: Orig vs. BC3 and Orig vs. BC7, with Abs Error]

Comparisons

[Figure: HDR Original at given exposure vs. BC6 at given exposure, with Abs Error]


A Plethora of Other Features

Addressable Stream Out
Draw Indirect
Pull-model attribute eval
Improved Gather4
Min-LOD texture clamps
16K texture limits
Required 8-bit subtexel, submip filtering precision
Conservative oDepth
2 GB Resources
Geometry shader instance programming model
Optional double support
Read-only depth or stencil views


Direct3D 11

Direct3D 11 is a strict superset of Direct3D 10 & 10.1

Direct3D 11 adds support for features like multithreading, tessellation, and compute to Direct3D 10.1
The fastest way to move to Direct3D 11 is to start developing on Direct3D 10/10.1 today

Direct3D 11 will be available on Windows Vista and future Windows operating systems
Direct3D 11 will run on down-level hardware

Multithreading!
Direct3D 10.1, 10, and 9 hardware/drivers
Full functionality (for example, tessellation) will require Direct3D 11 hardware

When Can I Get It?

Preview bits will be in the November 2008 SDK

Will work on Windows Vista
Will run on Direct3D 10/10.1 hardware
Full documentation, samples, etc.

Questions?

www.xnagamefest.com

© 2008 Microsoft Corporation. All rights reserved.

This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.