OpenGL-Terms.txt


- OpenGL vs. GLSL

	OpenGL is not a programming language, but a (graphics) API (or library); GLSL is a (shading) programming language.
	OpenGL solution (e.g. game, simulator, or animation): application + rendering (or computing) programs.
	Application: orchestrate what to render (geometry) and supply (or consume) resources (other vertex attributes, uniforms, shader storage blocks, atomic counters, and textures) required (or generated) by rendering programs.
	Rendering programs: define how rendering is performed (based on supplied resources) e.g. coordinate space transformations, lighting models, visual effects (fog or shadows), etc.

- GPU architectures

	Hardware solution to implement rendering (graphics) pipeline.
	Major architecture types: immediate-mode rendering vs. tile-based rendering.
	
		Immediate-mode rendering (traditional): implement the logical rendering pipeline quite literally.
		Tile-based rendering: execute the rendering pipeline on a per-tile basis (i.e. execute many pipeline backend stages - such as rasterization or framebuffer operations - separately for each individual tile after the pipeline frontend stages have completed).

- Core profile vs. Compatibility profile

	OpenGL specification is "forked" into two profiles:
	
		Core profile: Modern; Removes a number of legacy features, leaving only those that are truly accelerated by current graphics hardware.
		Compatibility profile: Maintains backward compatibility with all revisions of OpenGL back to version 1.0.

- OpenGL extensions

	Enhance OpenGL core functionality.
	3 major classifications of extensions: vendor, EXT (written together by two or more vendors), and ARB (official part of OpenGL).

- Direct State Access (DSA)

	A functionality (core since version 4.5) that allows modifying OpenGL objects without having to bind them to the context.

- Fixed-function stage (block) vs. Programmable shader stage

	Fixed-function stage: Non-programmable stage of the rendering pipeline. Can be (slightly) customized by setting up built-in variables of the pipeline.
	Programmable shader stage: Programmable stage of the rendering pipeline. Fully customizable.

- Rendering pipeline stages

	Front-end vs. Back-end
	
	Front-end (vertex processing): Vertex fetching/pulling (FF); Vertex shader; Tessellation (Tessellation Control shader + Tessellation Engine (FF) + Tessellation Evaluation shader); Geometry shader.
	Vertex post-processing stage (FF): Primitive assembly; Clipping; Viewport transformation; Culling; Rasterization.
	Back-end (fragment processing): Fragment shader; Framebuffer operations (FF); Pixel operations - testing (FF); Compute shader (special; single-stage pipeline).
	
	Vertex fetching/pulling: Provide (automatically) inputs to the vertex shader.
	Tessellation control shader: Calculate (LOD: Level of Detail) tessellation level (inner and outer) to be input into the tessellation engine and define data to be passed to the tessellation evaluation shader.
	Tessellation primitive generator (TPG; or just tessellator or tessellation engine): Generate new vertices based on specified tessellation levels.
	Tessellation evaluation shader: Calculate position (and others) for newly generated vertices.
	Geometry shader: Create/Discard geometry.
	
	Primitive assembly: Grouping of vertices into points (trivial), lines and triangles, i.e. primitives.
	Clipping: Determine potentially visible primitives (or which part of them). Occurs in clip space (homogeneous coordinates), before perspective division; a vertex is inside the view volume when -w <= x, y, z <= w. The subsequent perspective division results in normalized device coordinates (NDC; from -1.0 to 1.0 in x, y and z dimensions by default; the z range can be switched to 0.0 to 1.0 with glClipControl, core since version 4.5).
	Viewport transformation: Place geometry in the window (or viewport). Results in window coordinates (from (0, 0) - usually bottom left - to (width, height); continuous values, not integer pixel indices).
	Culling: Determine whether the triangle faces toward (front-facing) or away (back-facing) from the viewer and discard the faces selected for culling.
	Rasterization: Determine which pixels are covered by the primitive (half-space based method for triangles) and generate the fragments that need to be colored.
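	
	A minimal sketch of the viewport transformation and the facing test described above, written in plain Python rather than through the OpenGL API. The function names are hypothetical, counter-clockwise winding is assumed to be front-facing (the OpenGL default), and NDC 1.0 is assumed to map to the far edge of the viewport.

```python
def viewport_transform(ndc_x, ndc_y, x, y, width, height):
    """Map NDC x/y in [-1, 1] to window coordinates of a viewport at (x, y)."""
    win_x = x + (ndc_x + 1.0) * 0.5 * width
    win_y = y + (ndc_y + 1.0) * 0.5 * height
    return win_x, win_y


def signed_area(a, b, c):
    """Signed area of a 2D triangle; positive when a, b, c run counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def is_front_facing(a, b, c):
    """Assumes counter-clockwise winding means front-facing."""
    return signed_area(a, b, c) > 0.0
```

	For an 800 x 600 viewport at the origin, viewport_transform(0.0, 0.0, 0, 0, 800, 600) gives (400.0, 300.0), the center of the window. A signed area of exactly zero corresponds to a degenerate triangle.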
	
	Fragment shader: Determine the color of each fragment.
	Framebuffer operations: Represent visible content of the screen and a number of additional regions of memory used to store per-pixel values other than color. Default framebuffer is provided by windowing system. Framebuffer object.
	Pixel operations - testing: Scissor test, stencil test (any purpose testing), depth test, blending (or logical operation) stage. Each uses its own framebuffer (single or multiple).
	
	Compute shader: Can be thought of as a separate pipeline that runs independently of the other graphics-oriented stages. Work item and local workgroups.

- Degenerate primitive

	A line with zero length or a triangle with zero area (it has two or more vertices in the exact same place). OpenGL discards this type of (degenerate) primitive.

- Barycentric Coordinates Interpolation

	Interpolate vertex data across the triangle's surface.
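	
	A minimal sketch of barycentric interpolation over a 2D triangle, in plain Python. The function names are hypothetical, and the interpolation shown is linear in screen space; it does not include the perspective correction that OpenGL applies by default.

```python
def barycentric_weights(p, a, b, c):
    """Return the weights (wa, wb, wc) of point p relative to triangle a, b, c."""
    def edge(u, v, w):
        # Twice the signed area of triangle (u, v, w).
        return (v[0] - u[0]) * (w[1] - u[1]) - (w[0] - u[0]) * (v[1] - u[1])

    area = edge(a, b, c)
    if area == 0:
        raise ValueError("degenerate triangle")
    return edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area


def interpolate(weights, value_a, value_b, value_c):
    """Blend per-vertex values (tuples of equal length) with the given weights."""
    wa, wb, wc = weights
    return tuple(wa * x + wb * y + wc * z
                 for x, y, z in zip(value_a, value_b, value_c))
```

	With vertices (0, 0), (4, 0), (0, 4) colored red, green and blue, the point (1, 1) gets weights (0.5, 0.25, 0.25) and therefore the color (0.5, 0.25, 0.25).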

- Homogeneous coordinate system

	Used in projective geometry (four-component variables); Here, math is simpler than in regular Cartesian space (three-component variables).

- Perspective division

	Transformation (divide all four components of the position by the last, w component) from homogeneous coordinates (or homogeneous space) to Cartesian coordinates (or Cartesian space).
	Results in normalized device coordinates (or normalized device space).
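	
	The division can be sketched in a few lines of plain Python. The function name is hypothetical; it returns only the three Cartesian components, since w becomes 1 after dividing by itself.

```python
def perspective_divide(clip_position):
    """Convert a homogeneous (x, y, z, w) position to Cartesian (x, y, z)."""
    x, y, z, w = clip_position
    if w == 0.0:
        raise ValueError("w is zero: the position has no Cartesian equivalent")
    return (x / w, y / w, z / w)
```

	For example, the homogeneous position (2.0, -4.0, 1.0, 4.0) becomes (0.5, -1.0, 0.25).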

- Vertex array object

	Represent vertex fetching stage of the OpenGL pipeline.
	Only one active (bound to the OpenGL context) at a time; required (mandatory) to execute drawing commands; setup is optional if configuration does not require any vertex attribute.
	Store all of the state related to the input to the OpenGL pipeline i.e. vertex attributes configuration (e.g. settings of dynamic values feeding - external data format and type, mapping between vertex attribute indices and buffer binding points, and data source - or index buffer binding).

- Vertex attribute

	An input variable to the vertex shader (i.e. input to the OpenGL graphics pipeline).
	It is how vertex (array) data is introduced into the OpenGL pipeline.
	If you don’t specify locations in shader code (not recommended approach), OpenGL will assign locations for you (it is possible to get these indices later).
	Can be independently set up (i.e. each can be fed - value assignment - in a different way).
	
		Static value: same value for all vertices on drawing command invoke (e.g. color of the material bound to a game object). Set by command (not stored in a vertex array object, but a special context state). Enabled by default.
		Dynamic value: own value for each vertex on drawing command invoke (e.g. space coordinates - position, surface normal or texture coordinates of a vertex relative to the geometry of a game object). Fed from a buffer (require specifying data format and its location). Require enabling.
	
	Depending on how data is laid out in memory (for dynamic values):
	
		Separate attributes: Located either in different buffers or at least at different locations (offsets) in the same buffer. Structure-of-arrays (SoA) data (a set of tightly packed, independent arrays of data).
		Interleaved attributes: Located in the same buffer. Array-of-structures (AoS) form of data.
	
	Per-vertex-attribute stored data is separated into two concepts: vertex attribute format and (source) buffer binding point.
	
		Vertex attribute format: size, type and normalization; relative offset from mapped buffer binding point base offset; mapped buffer binding point; enabled/disabled.
		Buffer binding point: source buffer object; base offset, stride and instance divisor (used for all vertex attributes pulling data from this binding point).
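	
	A sketch of how the two concepts combine when a vertex is fetched: the byte address of an attribute is the binding point's base offset, plus the vertex index times the stride, plus the attribute's relative offset. This is plain Python with hypothetical function names and an assumed interleaved layout (position, normal, texture coordinates as 32-bit floats); it makes no OpenGL calls.

```python
def interleaved_layout(attributes):
    """Return ({name: relative_offset}, stride) for tightly packed AoS data.

    `attributes` is a list of (name, size_in_bytes) pairs.
    """
    offsets = {}
    stride = 0
    for name, size in attributes:
        offsets[name] = stride
        stride += size
    return offsets, stride


def attribute_address(vertex_index, base_offset, stride, relative_offset):
    """Byte address inside the buffer bound to the buffer binding point."""
    return base_offset + vertex_index * stride + relative_offset
```

	With position (12 bytes), normal (12 bytes) and texture coordinates (8 bytes), the stride is 32 bytes and the normal of vertex 2 starts at byte 76 when the base offset is 0.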
	
	It is recommended to use single (or as few as possible) large (vertex) buffer object to store vertex data - even to store data from multiple geometries. In any case, it is possible to use any desired buffer object configuration (e.g. per-vertex-attribute buffer object).
	It is recommended to group multiple vertex attributes into a single buffer binding point when possible:
	
		Data must be stored interleaved (because must use same stride value - vertex size) in the same buffer object; vertex attributes must be a) all pure (non-instanced) or b) all instanced using same divisor value.
		Switching the source that vertex attribute data will be fed from (e.g. to draw a new geometry batch) is simplified: just bind a different buffer object to the corresponding buffer binding point.
		This approach was not possible in the past, because of the lack of buffer binding points - so vertex attributes were mapped one-by-one to corresponding buffer(s); switching the source required multiple (as many as vertex attributes) OpenGL function calls.

- Instanced (vertex) attribute

	Dynamic value (buffered; instanced - per-instance - arrays) vertex attribute whose value is updated (read from buffer) every certain number of instances (a.k.a. divisor; configurable).
	Formula for calculating the fetching index: (instance / divisor) + baseInstance, using integer division.
	Works with both instanced and non-instanced drawing commands.
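	
	The fetching index formula can be sketched in plain Python as follows. The function name is hypothetical; a divisor of 0 is rejected here because it means the attribute advances per vertex and is not instanced at all.

```python
def instanced_fetch_index(instance, divisor, base_instance=0):
    """Index of the buffer element read for a given instance."""
    if divisor <= 0:
        raise ValueError("divisor must be positive for an instanced attribute")
    return instance // divisor + base_instance
```

	With a divisor of 2, instances 0 to 5 read elements 0, 0, 1, 1, 2, 2, so each value is shared by two consecutive instances.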

- Target

	Type of use an object (e.g. buffer or texture object) is intended for.
	Depending on the number of available binding points: non-indexed vs. indexed.
	
	Non-indexed: a single generic binding point.
	Indexed: multiple indexed binding points additional to the generic one.
	
- Binding point

	Access point an object is intended to be bound to.
	Depending on intended use: generic vs. indexed.
	
	Generic: non-specific reference to corresponding target.
	
		Nominate an object (e.g. the buffer containing vertex indices for indexed drawing or command data for indirect drawing).
		Perform operations on nominated object via OpenGL functions (e.g. buffer - or texture - memory allocation).
		Let OpenGL context perform operations on nominated object when required.
	
	Indexed: specific reference (by index) to corresponding target.
	
		Only one object (full or a range) can be bound at a time.
		Terminology varies depending on the object (e.g. buffer binding point for a buffer object or texture/image unit for a texture object - sampler/image).

- Buffer

	Linear blocks (memory allocations) of untyped (generic; can be used for a number of purposes - even at same time) data.
	Buffer object: represent a buffer in OpenGL. Require naming (so it can be identified), memory allocation (so data can be put in and read from) and to be bound to a target (or buffer binding point - so it can be used by OpenGL).
	
		Name: identifier for a buffer object. It is possible to generate a buffer name without an associated buffer object, but never the other way around.
		Data store: (manually) allocated memory (server side; in the memory of the graphics card) for a buffer object. Depending on used allocation command, can become immutable (neither can be resized nor redefined - flags). It can be accessed (write and read) by a) using OpenGL commands (copy existing data) or b) mapping - using also OpenGL commands - the buffer object (direct access - e.g. load from file).
	
	Target: non-indexed and indexed; a buffer can be bound to multiple binding points (generic and indexed) at same time.
	
		Vertex buffer object (VBO): common buffer object (same as any other buffer object) intended to store vertex data (hence its name - non-indexed target GL_ARRAY_BUFFER; attached through the indexed vertex buffer binding points) for a vertex array object.
		Element buffer object (EBO): common buffer object intended to store index data (non-indexed target GL_ELEMENT_ARRAY_BUFFER) for a vertex array object.
		Uniform buffer object (UBO): common buffer object intended to store uniform data (indexed target GL_UNIFORM_BUFFER) for a program object.
		Shader storage buffer object (SSBO): common buffer object intended to store shader data (indexed target GL_SHADER_STORAGE_BUFFER) for a program object.
		Atomic counter buffer object (ACBO): common buffer object intended to store atomic counter data (indexed target GL_ATOMIC_COUNTER_BUFFER) for a program object.
		Draw indirect buffer object (???): common buffer object to store parameters (non-indexed target GL_DRAW_INDIRECT_BUFFER) for a drawing command.
		Transform feedback buffer object (???): common buffer object to store pipeline frontend output data (indexed target GL_TRANSFORM_FEEDBACK_BUFFER) for a program object.

- Uniform

	Data that stays the same for an entire primitive batch (e.g. transformation matrix of the geometry attached to a single game object) or longer (e.g. light sources of an entire scene).
	Pass data (not really a form of storage) directly from your application into any shader stage.
	Depending on usage in a program: active uniforms (affect the stage output; exposed by a fully linked program) vs. inactive uniforms (the compiler discards unused declared uniforms).
	Depending on how they are declared: default block uniforms vs. uniform blocks.
	
	Default block uniforms (or non-buffered uniforms): uniforms declared in the default block - at the global scope in a shader (using GLSL uniform keyword).
	
		If you don’t specify locations in shader code (not recommended approach), OpenGL will assign locations for you (it is possible to get these indices later). All non-array/struct types will be assigned a single location.
	
	(Named) Uniform blocks (or buffered uniforms): uniforms declared in a named block (using GLSL interface blocks) whose values are stored in buffer objects.
	
		Efficient usage: single call to command to bind a uniform buffer object to a program (set values of all related uniform) - vs. multiple calls to command to set the value of all required uniforms from default block.
		Easy to reuse: use same uniform values on different programs.
		Easy to update (depending on the represented data, a uniform may be updated more or less frequently): update the value once in a unique buffer with a single call to command.
		Depending on memory layout (how data is stored in the buffer): standard layout vs. shared layout.
		
		Standard layout: assume (following a set of rules) specific locations (offset; can be directly specified in shader code) for members within the block. May leave some empty space between the various members of the block (making the buffer larger than it needs to be - performance loss).
		Shared layout (by default): let OpenGL decide where it would like the data (application needs to figure out - query - where to put the data so that OpenGL can read it; requires more work from the application) - most efficient.
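	
	A sketch of how member offsets could be derived under the standard layout, assuming it refers to the "std140" rules and covering only a handful of non-array types. It is plain Python with hypothetical names; the alignment and size table is limited to the types listed.

```python
# type name: (base alignment, size), both in bytes, following std140 rules
STD140_TYPES = {
    "float": (4, 4),
    "int": (4, 4),
    "vec2": (8, 8),
    "vec3": (16, 12),
    "vec4": (16, 16),
    "mat4": (16, 64),  # four vec4 columns
}


def std140_offsets(members):
    """Return ({name: offset}, bytes_used) for a list of (name, type) members."""
    offsets = {}
    cursor = 0
    for name, type_name in members:
        alignment, size = STD140_TYPES[type_name]
        cursor = (cursor + alignment - 1) // alignment * alignment  # round up
        offsets[name] = cursor
        cursor += size
    return offsets, cursor
```

	For a block declaring float scale; vec3 translation; mat4 rotation; int enabled; the offsets come out as 0, 16, 32 and 96. The 12 unused bytes after scale are the kind of empty space mentioned above.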

- Shader storage block

	Buffered storage that can be both read and written (therefore more versatile than uniforms, but with slower access) by a shader.
	Declared in a named block (using GLSL interface blocks).
	Supports the more efficient "std430" packing layout qualifier.
	Size limited only by hardware resources.

- Atomic counter

	Buffered counter that can be both read and written by a shader.

- Texture

	Structured form of storage that can be both read and written by a shader.
	Most often used to store image data; also as an output alternative to default framebuffer.
	Require (similarly to buffers) creation (texture object and related name), memory allocation and target binding (to one of the multiple texture units - 0, by default, if not specified - of a target to be precise), always specifying the corresponding target (or texture type).
	Layouts: 1D, 2D (most common), 3D, rectangle (legacy 2D texture use case), cube map (related 6 square - 2D - images), buffer (buffered 1D texture use case), and multisample (multiple colors per texel), together with corresponding array forms (1D, 2D, cube map, and multisample array).
	Store multiple textures: higher order texture type vs. corresponding array texture type (e.g. 3D vs. 2D array to store multiple 2D textures)? Array texture types do not apply filtering between elements (or layers); they also require using higher order texture type functions (e.g. memory allocation and data update).
	Usage depends on variable type used in shader code: sampling or imaging.
	
	KTX (Khronos TeXture format): image format that can store all of the formats supported by OpenGL and represent advanced features like mipmaps, cube maps, and so on.
	
	Texture coordinates: set of values (component number depends on texture type) used to read from a texture.
	
		Usually, pulled from per-vertex input, passed through unmodified and interpolated on vertex post-processing fixed-function stage.
		Normally, single per-vertex set is enough to access every texture of the material bound to a game object.
		Can be generated offline procedurally or assigned by hand by an artist using a modeling program and stored in a game object file.
		Usually, normalized: range between 0.0 and 1.0 (it is possible specifying texture coordinates out of range).
	
	Compression
	
		Benefits: reduce the amount of storage space required for image data; require less memory bandwidth (because the graphics processor needs to read less data when fetching from a compressed texture).
		Supported format types: generic (implementation specific), RGTC (one- and two-channel signed and unsigned textures), BPTC (8-bit per-channel normalized data and 32-bit per-channel floating-point data), ETC2 / EAC (extremely low bit-per-pixel applications), and others implementation specific (e.g. S3TC - earlier version of DXT format - and ETC1).
		Ask OpenGL to compress a texture into some format when loading (not recommended) vs. load an already compressed texture (stored in a file - imaging tools used for creating textures and other images allow saving data directly in a compressed format).
		No real difference from using uncompressed textures (GPU handles the conversion when it samples from the texture).
	
	Views
	
		Reuse texture data in one texture object with another (because existing textures might not match what shaders expect).
		Use cases: pretend that a texture of one type is actually a texture of a different type (e.g. 2D texture and single layer 2D array view); pretend that the data in the texture object is actually a different format (must be compatible i.e. same class) than what is really stored in memory (e.g. GL_RGBA32F texture and GL_RGBA32UI view).
		Most texture targets (all except buffer textures) can have at least a view of a texture with the same target.
		Once a view of a texture is created, it can be used like any other texture of the new type (i.e. in both OpenGL and shader code).

- Texture sampling

	Represent a whole texture.
	Require binding to a texture unit.
	
	Sampler object: store sampling parameters (i.e. control how texture data is read).
		
		Easy to use same sampling parameters for a large number of textures.
		Each texture object contains an embedded sampler object used when no sampler object is bound to the corresponding texture unit.
	
	Wrapping: define how values out of the real samples are calculated (when supplied texture coordinate falls outside expected range).
	
		Mode: repeat, mirrored repeat, clamp to edge, or clamp to border (color can be specified as a sampling parameter).
		Can be specified for each component of a texture coordinate individually.
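	
	A sketch of three of the wrapping modes applied to a single normalized texture coordinate, in plain Python with hypothetical function names. Clamp to edge is assumed to clamp to the centers of the first and last texels; clamp to border is left out because it returns a color rather than a coordinate.

```python
import math


def wrap_repeat(s):
    """Keep only the fractional part, so the texture tiles."""
    return s - math.floor(s)


def wrap_mirrored_repeat(s):
    """Tile the texture, flipping it on every odd repetition."""
    whole = math.floor(s)
    fraction = s - whole
    return fraction if whole % 2 == 0 else 1.0 - fraction


def wrap_clamp_to_edge(s, size):
    """Clamp to the centers of the edge texels of a texture `size` texels wide."""
    half_texel = 0.5 / size
    return min(max(s, half_texel), 1.0 - half_texel)
```

	For the coordinate 1.25, repeat yields 0.25 while mirrored repeat yields 0.75; on a 4-texel texture, clamp to edge yields 0.875.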
	
	Filtering: define how values between the real samples are calculated (when supplied texture coordinate falls between texels).
	
		There is almost never a one-to-one correspondence between texels in the texture map and pixels on the screen (possible by texturing geometry in 2D graphics rendering - e.g. UI elements or text).
		Texture images are always either stretched (magnification) or shrunk (minification) as they are applied to geometric surfaces (orientation, among others, can also affect).
		Mode: nearest neighbor (blocky), or linear interpolation (blurry).
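	
	A sketch of the two filtering modes on a one-dimensional, single-channel texture, in plain Python. The function names are hypothetical, texel centers are assumed to sit at (i + 0.5) / size, and out-of-range neighbors are clamped to the edge texel.

```python
import math


def sample_nearest(texels, u):
    """Return the texel whose cell contains the normalized coordinate u."""
    size = len(texels)
    index = min(max(math.floor(u * size), 0), size - 1)
    return texels[index]


def sample_linear(texels, u):
    """Blend the two texels closest to u, weighted by distance to their centers."""
    size = len(texels)
    position = u * size - 0.5          # position measured in texel centers
    left = math.floor(position)
    weight = position - left
    low = texels[min(max(left, 0), size - 1)]
    high = texels[min(max(left + 1, 0), size - 1)]
    return low * (1.0 - weight) + high * weight
```

	On the two-texel texture [0.0, 1.0], sampling at u = 0.5 returns 1.0 with nearest neighbor (a hard step) and 0.5 with linear interpolation (a smooth blend).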
	
	Mipmaps: powerful texturing technique that can improve both the rendering performance and the visual quality of a scene.
	
		Use a whole series of images (levels) from largest to smallest (each one-half the size on each axis, until all dimensions reach 1) into a single texture.
		Require one-third more memory.
		Deal with scintillation effect: aliasing artifacts that appear on the surface of objects rendered very small on screen compared to the relative size of the texture applied (most noticeable when the camera or the objects are in motion).
		Support rectangular (non-square) texture.
		Can be automatically generated by OpenGL or precomputed (best quality).
		Only for minification, as all mip levels are smaller versions of the base texture (level 0) itself.
		Supports filtering between levels (i.e. level selection).
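	
	A sketch that enumerates the levels of a mipmap chain and measures the extra memory they need relative to the base level, in plain Python with hypothetical function names. Each level halves both dimensions, rounding down and never going below 1.

```python
def mip_chain(width, height):
    """Return the (width, height) of every level, starting with level 0."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def mip_overhead(width, height):
    """Extra texels needed by levels 1..n, as a fraction of the base level."""
    base = width * height
    total = sum(w * h for w, h in mip_chain(width, height))
    return (total - base) / base
```

	A 256 x 256 texture has 9 levels and an overhead very close to one third; a non-square 8 x 2 texture has the levels (8, 2), (4, 1), (2, 1), (1, 1).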
	
	Format (sized) and type: describe existing components (and its ordering) and data type (and its size - commonly same for all components).
	
		Internal: Format and type of texture data in server side (i.e. graphics card - GPU).
		External: Format and type of texture data in client side (i.e. application - CPU).
		
	Swizzle: rearrange the component (even if missing) order of texture data on the fly as it is read by the graphics hardware (e.g. generate grayscale image from single channel texture). It is also possible to assign a fixed value (0 or 1).
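	
	A sketch of a swizzle applied to one texel after it is read, in plain Python. The function name is hypothetical; the selector names mirror the OpenGL swizzle values (red, green, blue, alpha, zero, one), and a single-channel texel is assumed to be read as (r, 0, 0, 1).

```python
def apply_swizzle(texel, swizzle):
    """Rearrange an (r, g, b, a) texel according to four selector names."""
    red, green, blue, alpha = texel
    sources = {
        "RED": red, "GREEN": green, "BLUE": blue, "ALPHA": alpha,
        "ZERO": 0.0, "ONE": 1.0,
    }
    return tuple(sources[selector] for selector in swizzle)
```

	Reading the red channel into all three color components, apply_swizzle((0.4, 0.0, 0.0, 1.0), ("RED", "RED", "RED", "ONE")), gives the grayscale texel (0.4, 0.4, 0.4, 1.0).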
	
	Layout: physical arrangement of texture data in memory (e.g. left-to-right, top-to-bottom in memory with texels closely following each other). Specify separately on both unpack (how OpenGL will read texture data from client memory - or corresponding buffer) and pack (how OpenGL will write texture data into memory).
	
		Byte swapping; LSB first: redefine byte/bit ordering. Useful when applications share images with other machines (endianness).
		Row length; Skip pixel; Skip rows: select only a subrectangle of the entire rectangle of image data stored in memory.
		Alignment: fit machine byte alignment that optimizes moving pixel data to and from memory (e.g. 4 bytes multiple in machines with 32-bit words). Tightly packed and byte-aligned data uses 1 byte alignment.
		Image height; Skip images: delimit and access any desired subvolume or subset of slices of an array texture.
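	
	A sketch of how alignment, row length and the skip parameters determine where a texel sits in client memory, in plain Python. The function and parameter names are hypothetical stand-ins for the pixel store parameters, and the default alignment of 4 matches the OpenGL default.

```python
def row_stride(width, bytes_per_pixel, alignment=4, row_length=0):
    """Bytes from the start of one row to the start of the next.

    A row_length of 0 means rows are `width` pixels long in memory.
    """
    pixels_per_row = row_length if row_length > 0 else width
    unpadded = pixels_per_row * bytes_per_pixel
    return (unpadded + alignment - 1) // alignment * alignment  # round up


def texel_offset(x, y, bytes_per_pixel, stride, skip_pixels=0, skip_rows=0):
    """Byte offset of texel (x, y) of the selected subrectangle."""
    return (y + skip_rows) * stride + (x + skip_pixels) * bytes_per_pixel
```

	A 3-pixel-wide RGB image with 1 byte per component has 9 bytes of data per row, but a stride of 12 bytes under the default alignment of 4. Uploading such tightly packed data without setting the alignment to 1 is a common source of skewed textures.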

- Texture imaging

	Represent a single image from a texture.
	Require binding to an image unit.
	Explicit texel access (no filtering) to a texture for both read and write.

- Buffer vs. Texture

	Buffer
	
		Generic storage system, i.e. support any data type.
		Require target specification only on binding (not on creation nor memory allocation).
		Can be bound to multiple targets (even at same time).
		Variable types in shader code: vertex array, uniform, storage block, and atomic counter.
		Read and write (storage block and atomic counter).
		Support atomic operation and memory barriers (storage block and atomic counter).
	
	Texture
	
		Specific storage system, i.e. support only specified data type and format.
		Require target specification on creation, memory allocation and binding.
		Can be only bound to single target (the one specified on creation).
		Variable types in shader code: sampler and image.
		Read and write (image).
		Support atomic operation and memory barriers (image).

- Memory access safety mechanisms

	OpenGL is expected to be running in deeply pipelined and highly parallel systems: multithreading (multiple shader instances running simultaneously).
	Multiple processes accessing same memory at same time to perform read and write operations is potentially risky: incomplete cycle of operations and improper ordering of operations (memory hazards: RAW or read-after-write, WAW and WAR).
	OpenGL include mechanisms to alleviate and control these dangers: atomic memory operations and memory barriers.
	
	Atomic memory operation:
	
		Sequence of a read from memory potentially followed by a write to memory that must be uninterrupted for the result to be correct.
		Operations are serialized to avoid contention (condition on which two or more threads of execution attempt to use a single shared resource), i.e. only one will get to go at one time.
		On shader code (on a member of a shader storage block).
	
	Memory barrier (or memory ordering):
	
		Synchronize (await) access to memory (by subsystem).
		On both application and shader code.

- Texel

	Represent a color from a texture that is applied to a pixel fragment in the framebuffer.

- Fragment

	An element that may ultimately contribute to the final color of a pixel.

- Shader - Compiler / Program - Linker

	Every OpenGL implementation includes built-in compiler (generate binary file - internal form - from shader code) and linker (merge shader binaries into a program).
	It is possible retrieving information about both compiling and linking processes e.g. process status (is compiled/linked) or information log.

- Program pipeline

	Different architecture strategies (or configurations): monolithic vs. separated.
	Monolithic program: contain a shader (or more than one) for each active stage.
	
		Allow compiler to perform inter-stage optimizations during linking e.g. discard unused output from a vertex shader.
		Potential cost of flexibility (because of program specialization; high number of possible shader combinations) and/or performance (because of program switching).
		Interface matching occurs during program linking.
	
	Separated programs: multiple program objects each containing shaders for only a single stage in the pipeline (or just a few of the stages) i.e. each representing a section of the pipeline.
	
		Program objects are linked in separable mode (specified before linking) and attached (by stage; not full program required) to a program pipeline object.
		Attached program objects still benefit from inter-stage optimizations individually (own attached shaders), but not between each other; can be switched around at will with relatively little cost in performance.
		Interface matching occurs during program switching (every part of the interface is considered to be active and used during program linking).
		It is necessary to redeclare "gl_PerVertex" block (e.g. field "gl_Position") in the vertex shader stage.

	Interface matching: specific set of rules and optimization applied during program linking.
	
		Output variables of one shader stage end up connected to the inputs of the subsequent stage if they match exactly in name, type and qualification (plus others, depending on data type); it is possible to avoid matching by name by assigning a location (with layout qualifier) to corresponding inputs and outputs.
		It is possible to query input and output interfaces (and corresponding resources) of a program.
		
	Shader subroutine: abstract function declaration (subroutine uniform) with multiple implementations (subroutine function or subroutine); C-like function pointer.	
	
		The value of a subroutine uniform (active subroutine index) must be always set (because default value is not defined); values are lost every time a new program is used (thus, reset required).
		Setting the value of a single subroutine uniform takes less time than switching a program object (even linked in separable mode).
		If you don’t specify indices - "index" layout qualifier - for subroutines in shader code, OpenGL will assign indices for you (it is possible to get these indices later - by subroutine name); it is not possible to specify indices for subroutine uniforms.
	
	Program binary: binary object that represents the internal version (OpenGL implementation - graphics card vendor - specific; not portable between machines) of a program object.
	
		Program objects are linked in retrievable mode (specified before linking).
		Hand the binary back bypassing both compiler and linker (time saving).
		Wait until the shaders have been used a few times for real rendering before retrieving binaries (recommended); this gives the OpenGL implementation a chance to recompile any shaders that need it, and to store a number of versions of each program in a single binary.

- Drawing commands

	Categorized as: Indexed / Non-Indexed - Instanced / Non-Instanced - Direct / Indirect
	Some can be considered supersets of others (e.g. a non-instanced drawing command is actually a call to a single-copy instanced command).
	
	Non-Indexed ("...Array...")
	
		Per-vertex data (values of vertex attributes) is issued to the vertex shader sequentially - in the order it appears in the buffers.
		It is possible to select a subset (first vertex and number of vertices) of the source data (e.g. draw specific geometry from a buffer storing data for multiple geometries).
		It is possible to use gl_VertexID GLSL built-in variable to index into per-vertex data (not recommended; using vertex attributes is much more efficient) or as a parameter into a shader function to generate (random) per-vertex data.
		Require much more memory storage (because vertices shared by multiple surfaces across a geometry must define - or simply repeat - their per-vertex data multiple times).
	
	Indexed ("...Element...")
	
		Include an indirection step to read from a buffer (target GL_ELEMENT_ARRAY_BUFFER) the indices of the vertices whose data is to be accessed.
		Data type storing index value can be 1 byte (GL_UNSIGNED_BYTE), 2 bytes (GL_UNSIGNED_SHORT) or 4 bytes (GL_UNSIGNED_INT) long.
		It is possible to select a subset (offset of first vertex index and number of vertex indices) of the source index data and to offset (base vertex) queried values before they are used (e.g. draw specific geometry from an index buffer object storing locally defined indices - starting at the value 0 - of multiple geometries).
		It is possible to use gl_BaseVertex GLSL built-in variable (OpenGL v4.6 - or "ARB_shader_draw_parameters" extension or manually injected).
		All vertex attributes will use same index value; managing certain vertex attributes can be problematic in specific scenarios due to their nature (e.g. surface normal at vertices shared between surfaces forming a right angle - like a cube).
		Require less memory storage (shared vertices define per-vertex data only once).
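	
	A sketch of the indirection step of an indexed drawing command with a base vertex, simulated in plain Python on lists instead of buffers. The function name is hypothetical, and first_index counts index elements, whereas the OpenGL commands take a byte offset into the index buffer.

```python
def fetch_indexed(vertices, indices, first_index, count, base_vertex=0):
    """Return the vertices issued to the vertex shader, in order."""
    selected = indices[first_index:first_index + count]
    return [vertices[index + base_vertex] for index in selected]


# Two quads stored back to back in one vertex list; both reuse the same
# locally defined indices (starting at 0) for their two triangles.
QUAD_INDICES = [0, 1, 2, 2, 1, 3]
VERTICES = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]
```

	Drawing the second quad only requires a base vertex of 4: fetch_indexed(VERTICES, QUAD_INDICES, 0, 6, base_vertex=4) yields b0, b1, b2, b2, b1, b3. Each quad stores 4 vertices instead of the 6 a non-indexed draw would need.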
	
	Non-Instanced
	
		Draw a single copy of a geometry.
	
	Instanced ("...Instanced")
	
		Draw many copies (thousands or millions) of the same geometry with a single function call (vertex data is sent to the server only once).
		It is possible to offset (base instance) instance index values before they are used (to query instanced vertex attributes).
		It is possible to use gl_BaseInstance GLSL built-in variable (OpenGL v4.6 - or "ARB_shader_draw_parameters" extension or manually injected).
		Extremely powerful when used together with gl_InstanceID GLSL built-in variable to index into per-instance data (e.g. textures or (instanced) uniform arrays - whose elements represent each of the instances to be processed) or as a parameter into a shader function to generate (random) per-instance data (e.g. object pose or scale transformations).
		Allow the use of instanced vertex attributes (efficiency improvement against indexing instanced data structures is not guaranteed - OpenGL implementation specific).
		Reduce synchronization (save bandwidth) between client and server.
		
	Direct:
	
		Pass drawing command parameters directly into the function call.
	
	Indirect ("...Indirect...")
	
		Include an indirection step to read from a buffer (target GL_DRAW_INDIRECT_BUFFER) the parameters of the drawing command (or multiple drawing commands) to be executed.
		Data storing drawing command parameters (distinguishing non-indexed from indexed) must use a predefined structure.
		Mixing parameters of non-indexed and indexed drawing commands in multidrawing mode is not allowed.
		It is possible to select a subset (offset of first drawing command parameter structure) of the source data.
		It is possible to use gl_DrawID GLSL built-in variable (OpenGL v4.6 - or "ARB_shader_draw_parameters" extension or manually injected as an instanced vertex attribute shared among instances of each drawing command) to index into per-drawing-command data or as a parameter into a shader function to generate (random) per-drawing-command data.
		Reduce synchronization (save bandwidth) between client and server, and allow multiple threads on the CPU (not only the rendering one) and the GPU itself (generate own work - parameters do not make a round trip from the GPU to the application and back) to generate parameters for drawing commands. All the CPU does is decide when to issue the drawing command.
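	
	A sketch of the predefined parameter structures, packed into bytes with Python's struct module as they might be before being uploaded to the indirect buffer. The function names are hypothetical; the field order follows the non-indexed and indexed indirect command structures (four unsigned integers, and five 32-bit fields with a signed base vertex, respectively).

```python
import struct


def pack_draw_arrays_indirect(count, instance_count, first, base_instance):
    """16 bytes: parameters of one non-indexed indirect drawing command."""
    return struct.pack("=4I", count, instance_count, first, base_instance)


def pack_draw_elements_indirect(count, instance_count, first_index,
                                base_vertex, base_instance):
    """20 bytes: parameters of one indexed indirect drawing command."""
    return struct.pack("=3IiI", count, instance_count, first_index,
                       base_vertex, base_instance)
```

	Commands for multidrawing can be concatenated into one byte string, for example b"".join(pack_draw_elements_indirect(6, 1, 0, base, 0) for base in (0, 4)); the differing sizes of the two structures are one reason the two kinds cannot be mixed in a single command array.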
	
	Multidrawing is supported by both non-indexed and indexed drawing commands, either direct or indirect; all the geometries drawn within the same multidrawing command must share the same VAO and VBO (plus EBO in the case of indexed drawing), which should already be bound to the corresponding targets or buffer binding points.
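	
	The predefined structures mentioned for indirect drawing can be sketched on the application side as plain C structs. The two command layouts follow the field order given in the OpenGL specification; the helper names (MeshRange, build_indexed_commands, indexed_command_offset) are hypothetical and only illustrate how the data for a GL_DRAW_INDIRECT_BUFFER might be prepared, including the byte offset used to select a subset of the commands. No OpenGL call is made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout read by non-indexed indirect draws (e.g. glMultiDrawArraysIndirect). */
typedef struct {
    uint32_t count;         /* vertices per instance */
    uint32_t instanceCount; /* number of instances */
    uint32_t first;         /* first vertex */
    uint32_t baseInstance;  /* offset applied to instanced attributes */
} DrawArraysIndirectCommand;

/* Layout read by indexed indirect draws (e.g. glMultiDrawElementsIndirect). */
typedef struct {
    uint32_t count;         /* indices per instance */
    uint32_t instanceCount; /* number of instances */
    uint32_t firstIndex;    /* first index inside the EBO */
    int32_t  baseVertex;    /* value added to every fetched index */
    uint32_t baseInstance;  /* offset applied to instanced attributes */
} DrawElementsIndirectCommand;

/* Hypothetical description of one mesh stored in a shared VBO/EBO. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
} MeshRange;

/* Hypothetical helper: byte offset of the command at position first_cmd in a
   tightly packed array, i.e. the value that selects a subset of the buffer. */
static size_t indexed_command_offset(size_t first_cmd)
{
    return first_cmd * sizeof(DrawElementsIndirectCommand);
}

/* Hypothetical helper: one indexed command per mesh, one instance each.
   Storing the draw number in baseInstance is one common way to emulate a
   per-draw identifier through an instanced vertex attribute. */
static void build_indexed_commands(const MeshRange *meshes, size_t n,
                                   DrawElementsIndirectCommand *out)
{
    assert(meshes != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i].count         = meshes[i].index_count;
        out[i].instanceCount = 1;
        out[i].firstIndex    = meshes[i].first_index;
        out[i].baseVertex    = meshes[i].base_vertex;
        out[i].baseInstance  = (uint32_t)i;
    }
}
```

	Because the two layouts differ in size (16 vs. 20 bytes), a single multidrawing command cannot walk a buffer that mixes them, which matches the restriction stated above.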

- Stripping

	Draw (indexed mode only) large stripified geometry within a single command (performance enhancement).
	Each triangle in a strip (except the first one) is represented by a single vertex (i.e. less geometry to process in the pipeline); face direction of the strip is determined by the winding of the first triangle.
	Specialized tools are used to create long strips from unconnected triangle geometries.
	Primitive restart (optional - requires enabling): inform the position in the geometry (vertex index) where one strip (or fan or loop) ends and the next begins (i.e. reduce the number of required drawing commands); to be obtained from corresponding stripping tool; restart test happens before the base vertex value is added to queried index.
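	
	A minimal CPU-side sketch of how an indexed triangle strip with primitive restart expands into individual triangles. The function name decode_triangle_strip and the Triangle type are hypothetical; the vertex swap on odd triangles is the usual convention for keeping the winding of the first triangle. Note that the restart comparison uses the raw index, before the base vertex is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t a, b, c; } Triangle;

/* Expand a strip into triangles; returns how many triangles were written. */
static size_t decode_triangle_strip(const uint32_t *indices, size_t index_count,
                                    uint32_t restart_index, int32_t base_vertex,
                                    Triangle *out, size_t out_capacity)
{
    size_t written = 0;
    size_t run = 0;                 /* vertices seen in the current strip */
    uint32_t prev2 = 0, prev1 = 0;  /* the two most recent vertices */

    assert(indices != NULL && out != NULL);
    for (size_t i = 0; i < index_count; ++i) {
        /* Restart test happens on the raw index, before base vertex. */
        if (indices[i] == restart_index) {
            run = 0;
            continue;
        }
        uint32_t v = (uint32_t)((int64_t)indices[i] + base_vertex);
        if (run >= 2) {
            if (written == out_capacity)
                return written;
            /* Every vertex after the second adds exactly one triangle;
               odd triangles swap their first two vertices. */
            if ((run - 2) % 2 == 0)
                out[written] = (Triangle){ prev2, prev1, v };
            else
                out[written] = (Triangle){ prev1, prev2, v };
            ++written;
        }
        prev2 = prev1;
        prev1 = v;
        ++run;
    }
    return written;
}
```

	With the indices 0 1 2 3 R 4 5 6 (R being the restart index), this yields the triangles (0,1,2), (2,1,3) and (4,5,6) from a single index list, i.e. two strips in one drawing command.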

- Transform feedback

	Store per-vertex outputs from the last available programmable stage of the pipeline frontend (a.k.a. varyings).
	It becomes the last (highly configurable fixed-function) stage of the pipeline frontend when active.
	It is possible to stop pipeline processing after transform feedback execution by disabling rasterization if drawing is not required.
	
	Before transform feedback can be used, the varyings state must be specified and the corresponding buffer objects prepared.
	Varyings can be recorded separately (each into own buffer) or interleaved (all into single buffer, one after another; allows leaving gaps in the output structures stored in the transform feedback buffer and/or writing different sets of varyings interleaved into multiple buffers).
	It is not possible to manually select varyings transform feedback buffer binding points; they are automatically assigned sequentially based on specification (so the order matters).
	Transform feedback buffers size must be allocated carefully (ensure enough memory space to hold expected output data). Also need to be carefully bound into corresponding transform feedback buffer binding points.
	Buffers used for writing during transform feedback mode are also normally used for reading in subsequent drawing commands (which may cause an error if still bound to transform feedback buffer target).
	
	Transform feedback varyings state is maintained per-program (i.e. different programs using even the same shaders can record different sets of attributes).
	Chosen varyings (unlike remaining common shader outputs) may (or not) be processed further in the pipeline.
	Varyings must be specified before program linking occurs (and any change to the varyings specification requires linking the corresponding program again); otherwise, the linker could discard such outputs because it does not know their intended use (i.e. they are considered unused further in the pipeline).
	
	Transform feedback mode can be started (made active) and stopped, as well as paused and resumed.
	
		On activation, it starts writing data at the beginning of the buffers bound for transform feedback, overwriting what might be there already.
		Pausing after completing a draw call is typically done to load up different data or load up an entirely different program for another draw call (store outputs from multiple draw calls all to the same buffer).
		On resume, it continues to record the output of the frontend from wherever it left off in the transform feedback buffers.
		Varyings are recorded into the transform feedback buffers until transform feedback mode is exited or until the space allocated for the transform feedback buffers is exhausted.
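	
	The start / pause / resume / exhaustion behavior described above can be illustrated with a toy CPU-side model. This is not OpenGL code: XfbModel and the xfb_* functions are hypothetical names, and recording is simplified to one vertex at a time (a real implementation decides per primitive whether the output still fits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one transform feedback buffer binding. */
typedef struct {
    unsigned char *storage;  /* memory backing the buffer */
    size_t capacity;         /* allocated size in bytes */
    size_t offset;           /* next write position */
    bool active;
    bool paused;
} XfbModel;

/* Activation always restarts writing at the beginning of the buffer. */
static void xfb_begin(XfbModel *x)  { x->offset = 0; x->active = true; x->paused = false; }
static void xfb_pause(XfbModel *x)  { if (x->active) x->paused = true; }
/* Resuming keeps the offset, so recording continues where it left off. */
static void xfb_resume(XfbModel *x) { if (x->active) x->paused = false; }
static void xfb_end(XfbModel *x)    { x->active = false; x->paused = false; }

/* Record one vertex worth of varyings; returns true if it was stored. */
static bool xfb_record(XfbModel *x, const void *vertex, size_t size)
{
    assert(x != NULL && vertex != NULL);
    if (!x->active || x->paused)
        return false;                      /* nothing is captured */
    if (size > x->capacity - x->offset)
        return false;                      /* space exhausted: output dropped */
    memcpy(x->storage + x->offset, vertex, size);
    x->offset += size;
    return true;
}
```

	In this model, a second xfb_begin overwrites earlier results, which is why the buffers need to be sized for the full expected output of everything recorded between activation and exit.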
	
	Changing its state (e.g. resize - reallocate - transform feedback buffers or change bindings) is not allowed while active.
	A transform feedback object is included with OpenGL context by default and, furthermore, it is also possible to create multiple transform feedback objects.
	Output data can be read back by the application, but it is mostly used as the source of data for subsequent drawing commands (e.g. particle systems or physics simulations; used along double-buffering technique).
	Moving massive parallel executions (e.g. state mutation: state data, update operations and results) from the CPU to the GPU results in faster processing and minimizes data uploading (between client - CPU - and server - graphics card).
	
	It is recommended (if possible; cumbersome for applications already using tessellation and/or geometry shading because of required ordering guarantees) to use the compute shader in modern graphics APIs to improve performance.
	From the OpenGL implementation point of view, when transform feedback mode is active the data must land in the transform feedback buffer in the same order as the input data:

		You have to wait until the last shader stage for the "n" primitive has been executed before you know where to put the resulting data from the "n+1" primitive (easy if the vertex shader is the only stage in the frontend pipeline, but a big deal if tessellation and/or geometry shader are involved - because it can discard/generate geometry).
		Tile-based rendering (architecture) also poses a problem compared to immediate-mode rendering. You have to process all the primitives in full (cannot drop any output) and in order, leading to a significant performance drop because GPU vendors can no longer play all their binning games and keep that data on-chip.

- Clipping

	Determine which primitives may be fully or partially visible and construct a set of primitives from them that will lie entirely inside the (canonical) view volume.
	The clip volume is made up of six standard clip planes; for each vertex, a signed distance is computed against each plane (a total of six signed distances).
		
		Absolute value determines how far the vertex is to the plane.
		Sign determines which side of the plane the vertex is on (positive, inside the plane; negative, outside).
	
	For every primitive type (point, line or triangle):
	
		If all the vertices lie inside the volume, primitive is trivially accepted.
		If only some of the vertices lie inside the volume, clipping is required.
		If all the vertices lie outside the same plane, primitive is trivially discarded.
		If all the vertices lie outside the volume (but not the same plane), clipping may be required (to be checked; otherwise, primitive is discarded).
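	
	The four cases above can be sketched with per-vertex outcodes built from the six signed distances. This is a minimal sketch assuming the standard clip volume -w <= x, y, z <= w in clip space; the names Vec4, ClipResult, outcode and classify are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;

typedef enum {
    PRIM_ACCEPT,      /* all vertices inside the volume */
    PRIM_DISCARD,     /* all vertices outside the same plane */
    PRIM_NEEDS_CLIP,  /* some vertices inside, some outside */
    PRIM_NEEDS_CHECK  /* all outside, but not the same plane */
} ClipResult;

/* One bit per clip plane, set when the signed distance is negative. */
static unsigned outcode(Vec4 v)
{
    const float d[6] = {
        v.w + v.x, v.w - v.x,   /* left, right */
        v.w + v.y, v.w - v.y,   /* bottom, top */
        v.w + v.z, v.w - v.z    /* near, far */
    };
    unsigned code = 0;
    for (int i = 0; i < 6; ++i)
        if (d[i] < 0.0f)
            code |= 1u << i;
    return code;
}

/* Classify a point (1 vertex), line (2) or triangle (3). */
static ClipResult classify(const Vec4 *verts, size_t count)
{
    unsigned any = 0, all = 0x3Fu;
    size_t inside = 0;

    assert(verts != NULL && count >= 1 && count <= 3);
    for (size_t i = 0; i < count; ++i) {
        unsigned c = outcode(verts[i]);
        any |= c;          /* planes violated by at least one vertex */
        all &= c;          /* planes violated by every vertex */
        if (c == 0)
            ++inside;
    }
    if (any == 0) return PRIM_ACCEPT;
    if (all != 0) return PRIM_DISCARD;
    if (inside > 0) return PRIM_NEEDS_CLIP;
    return PRIM_NEEDS_CHECK;
}
```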
	
	Guard band (included in the GPU)
	
		A region outside clip volume in which primitives are allowed to pass through even though they will not be visible.
		In some cases, it may be faster to allow such primitives to pass through the clipping phase unmodified and instead have the rasterizer throw away parts that will not be visible.
		Primitives requiring clipping that fall inside the guard band are also considered to be trivially accepted; otherwise, clipping is performed.
		Affects neither trivially accepted nor trivially discarded primitives.
		The width of the guard band is quite large - usually at least as big as the view volume itself.
	
	User-defined clipping
	
		A set of additional distances (OpenGL implementation specific) is available to the application that can be written inside the vertex, tessellation or geometry shaders.
		Required distances must be enabled by the application before being available for a shader; otherwise, written values are ignored.

- Tessellation

	The process of breaking (or subdividing) a large general-purpose primitive type referred to as a patch (GL_PATCHES constant) into many smaller primitives (i.e. triangles, lines or points).
	A patch is a group of vertices (or control points) whose size can range within [1, GL_MAX_PATCH_VERTICES] - the maximum is OpenGL implementation specific, but never less than 32.
	The number of vertices per patch may be specified by the application (GL_PATCH_VERTICES constant) before issuing corresponding drawing command (OpenGL uses it later - after vertex shader stage - to group vertices into patches); otherwise, default value is 3.
	Tessellation process is divided into three distinct stages fitted between the vertex shader and the geometry shader stages.
	
	Tessellation control shader (TCS)
	
		Programmable-shader stage responsible for two main jobs: determining the amount of tessellation to be applied, and performing any special transformations on the input patch data.
		
		The amount of tessellation to be applied (on TPG stage) is described by outer and inner tessellation levels.
		Tessellation levels may be understood as the number of segments into which the corresponding edge (of the abstract patch) must be divided.
		Required tessellation levels and their interpretation depend on the abstract patch type (i.e. quad, triangle or isolines block) specified on TES stage - Warning! They do not refer to the output patch from TCS stage.
		Values are stored in the array-form built-in variables gl_TessLevelOuter[4] and gl_TessLevelInner[2], respectively.
		When present, the TCS must write values into the corresponding variables; otherwise (TCS missing), they must be set from the application (GL_PATCH_DEFAULT_OUTER_LEVEL and GL_PATCH_DEFAULT_INNER_LEVEL constants, respectively) - Warning! The same values are then used for all patches within the issued drawing command.
		It is possible to use static values, but most usually they are dynamically calculated (recommended only once per patch to improve performance; otherwise, all invocations must compute and write the same values) based on a specific approach (e.g. distance from the camera) - Warning! Use the same values for edges shared by two (or more) patches to avoid gaps and breaks.
		A zero or negative value (or NaN) in any of the outer tessellation levels relevant to the abstract patch type makes the TPG discard the patch.
		
		Performing any special transformations on the input patch data will produce an output patch (which is later sent untouched to TES stage) - Warning! Most usually no transformations are performed (i.e. output patch is the same as input one), but in any case the data transfer from the input patch to the output one must be done manually.
		When present, the TCS must specify the output patch size ("vertices = patch_size" output layout qualifier), which may (or may not) match the input patch size (specified by the application); when the TCS is missing, the output patch size is the same as the input patch size.
		The number of TCS invocations will be the same as output patch size i.e. TCS invocations do not run per input patch vertex, but per output patch vertex.
		Each TCS invocation has access to whole input patch data (per-vertex array-form built-in - gl_in[gl_MaxPatchVertices] and others - and user-defined variables) and can index it as desired - Warning! Per-vertex invocation identifier gl_InvocationID is commonly used when output patch size matches input patch one.
		Each TCS invocation can only write own (corresponding per-vertex invocation identifier) output data (per-vertex array-form built-in - gl_out[] - and user-defined variables), but can access other invocations output data (require synchronization - i.e. use of barriers).
		Similarly to tessellation levels, all TCS invocations can write into same per-patch user-defined variables ("patch" keyword); usually not in array-form, but there is no limitation on this.
		
		TCS stage is optional (it can be omitted when there is no need either to calculate tessellation levels or to perform any special transformations on the input patch data; most usually this does not happen).
		When discarded, output patch data is directly sent from vertex shader stage to TES stage.
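		
		The distance-based approach and the shared-edge warning above can be sketched as follows. This is a CPU-side illustration in C rather than TCS code, and the mapping itself is an assumption: edge_tess_level is a hypothetical function that derives the level from the squared distance between the camera and the edge midpoint. Because the midpoint depends only on the two edge vertices, every patch sharing that edge computes the same outer level.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical mapping: max_level at or below near_dist, 1.0 at or beyond
   far_dist, linear in squared distance in between. */
static float edge_tess_level(Vec3 a, Vec3 b, Vec3 camera,
                             float near_dist, float far_dist, float max_level)
{
    assert(far_dist > near_dist && near_dist >= 0.0f && max_level >= 1.0f);

    /* The midpoint is symmetric in a and b, so the result does not depend
       on which patch (or which vertex order) evaluates the edge. */
    const Vec3 mid = { 0.5f * (a.x + b.x), 0.5f * (a.y + b.y), 0.5f * (a.z + b.z) };
    const float dx = mid.x - camera.x;
    const float dy = mid.y - camera.y;
    const float dz = mid.z - camera.z;
    const float dist2 = dx * dx + dy * dy + dz * dz;

    const float near2 = near_dist * near_dist;
    const float far2  = far_dist * far_dist;
    float t = (dist2 - near2) / (far2 - near2);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;

    /* t = 0 gives max_level, t = 1 gives 1.0; never zero or negative. */
    return max_level + t * (1.0f - max_level);
}
```

		In a real TCS the same idea would typically be evaluated once per patch and written to gl_TessLevelOuter, with the inner levels derived from the outer ones; that derivation is left out here.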
	
	Tessellation primitive generator (TPG)
	
		Fixed-function stage responsible for creating a set of new primitives from the abstract patch - Warning! TPG stage does not process the output patch from TCS stage, but a standard abstract patch type specified on TES stage.
		The purpose of TPG stage is to determine how many (tessellated) vertices to generate, in which order to generate them, and what kind of primitives to build out of them.
		Affecting factors: tessellation levels, abstract patch type, spacing, primitive ordering and primitive generation.
		It is only executed if TES stage is active in the current program or program pipeline (therefore many options controlling tessellation are specified in the TES stage).
		
		Each generated vertex has a normalized position (i.e. the coordinates are in the range [0, 1] - "uvw" space) within the abstract patch.
		This position component number (two or three) and their interpretation depend on the abstract patch type.
		It is provided to the TES stage via "gl_TessCoord" built-in input variable.
		
		TPG stage is commonly misunderstood to break the output patch from TCS stage down into quads, triangles or isolines blocks.
		It actually breaks the abstract patch of the type specified on TES stage (i.e. quad, triangle or isolines block) down into multiple primitives of the corresponding type (triangles for quad and triangle or lines for isolines block; points if explicitly specified).
		Position coordinates of tessellated vertices refer to corresponding unit-sized abstract patch (therefore they are said to be normalized).
		It is later in the TES stage where these tessellated vertices are interpolated along the output patch from TCS stage.
		Interpolation process is fairly straightforward when the size of the output patch from TCS stage matches the number of vertices of the abstract patch type; otherwise, it is cumbersome (actually possible?) to arrange tessellated vertices along the output patch.
		Thus, common patch sizes include 3 for triangular patches, 4 for quad patches or 16 for bicubic patches.
	
	Tessellation evaluation shader (TES)
	
		Programmable-shader stage responsible for both specifying many tessellation options and interpolating tessellated vertex data along the output patch from TCS stage.
		
		TES must specify abstract patch type (or domain), spacing, primitive ordering and primitive generation.
		Abstract patch type: quad, triangle or isolines block.
		Spacing: equal (by default) or fractional (even or odd; provide smoother behavior as the tessellation levels change).
		Primitive ordering: clockwise or counter-clockwise - Warning! Only controls the winding order of the triangles within the abstract patch; winding order of final triangles is based on computed positions on TES stage.
		Primitive generation: triangles for quad and triangle or lines for isolines block (i.e. implicitly defined by the abstract patch type); points if explicitly specified.
		
		The number of TES invocations is driven by the number of tessellated vertices generated by the TPG stage (not by the output patch size), i.e. TES invocations do not run per output patch vertex, but per abstract patch tessellated vertex.
		Each TES invocation has access to whole output patch data (per-vertex array-form built-in - gl_in[gl_MaxPatchVertices] and others - and user-defined variables) and can index it as desired.
		Tessellated vertex positions will be processed according to the abstract patch type and the output patch (e.g. as barycentric coordinates for triangle or horizontal-vertical coordinates for quad and isolines block).
		Each TES invocation can only write its own output data (per-vertex scalar-form built-in and user-defined variables).
		
		The TES stage must know perfectly how the control points that form the output patch of the TCS stage are arranged in memory so that the applied interpolation works correctly; otherwise, the results obtained will probably be different from what is expected.
		It is very important to be aware of how the coordinates of the tessellated vertices should be applied in the interpolation, as well as to know the type of interpolation to apply (it will depend both on the TCS stage output patch and on the abstract patch type).
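		
		For the two straightforward cases (3 control points with the triangle domain, 4 control points with the quad domain), the interpolation a TES performs can be sketched on the CPU as below. The function names are hypothetical, and the quad control point arrangement (cp[0] at (0,0), cp[1] at (1,0), cp[2] at (0,1), cp[3] at (1,1)) is an assumption; a different arrangement in memory requires a different formula, which is exactly the pitfall described above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 lerp3(Vec3 a, Vec3 b, float t)
{
    Vec3 r = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
    return r;
}

/* Triangle domain: the tessellation coordinate (u, v, w) is barycentric
   (u + v + w = 1), so each component weights one control point. */
static Vec3 interpolate_triangle(const Vec3 *cp, float u, float v, float w)
{
    assert(cp != NULL);
    Vec3 r = {
        u * cp[0].x + v * cp[1].x + w * cp[2].x,
        u * cp[0].y + v * cp[1].y + w * cp[2].y,
        u * cp[0].z + v * cp[1].z + w * cp[2].z
    };
    return r;
}

/* Quad domain: (u, v) are horizontal-vertical coordinates; interpolate along
   the bottom and top edges first, then between the two results. */
static Vec3 interpolate_quad(const Vec3 *cp, float u, float v)
{
    assert(cp != NULL);
    const Vec3 bottom = lerp3(cp[0], cp[1], u);
    const Vec3 top    = lerp3(cp[2], cp[3], u);
    return lerp3(bottom, top, v);
}
```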
	
	Main use case: dynamic level of detail (LOD) e.g. procedural terrain generation (with either height or displacement maps - more control; use of textures - or without - best overall performance; use of pure maths) or Bezier patches.
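	
	As an illustration of the Bezier patch use case, a bicubic patch (16 control points, quad domain) can be evaluated at a tessellation coordinate (u, v) with cubic Bernstein polynomials. This is a CPU-side sketch of the math a TES would typically perform; the names bernstein3 and bezier_patch, and the row-major control point order (cp[row * 4 + column], u along columns, v along rows), are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Cubic Bernstein basis functions evaluated at t in [0, 1]. */
static void bernstein3(float t, float b[4])
{
    const float s = 1.0f - t;
    b[0] = s * s * s;
    b[1] = 3.0f * t * s * s;
    b[2] = 3.0f * t * t * s;
    b[3] = t * t * t;
}

/* Evaluate a bicubic Bezier patch: weighted sum of the 4x4 control points,
   each weighted by the product of its u and v basis functions. */
static Vec3 bezier_patch(const Vec3 *cp, float u, float v)
{
    float bu[4], bv[4];
    Vec3 p = { 0.0f, 0.0f, 0.0f };

    assert(cp != NULL);
    bernstein3(u, bu);
    bernstein3(v, bv);
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            const float w = bv[row] * bu[col];
            p.x += w * cp[row * 4 + col].x;
            p.y += w * cp[row * 4 + col].y;
            p.z += w * cp[row * 4 + col].z;
        }
    }
    return p;
}
```

	The surface passes through the four corner control points (at (u, v) = (0,0), (1,0), (0,1) and (1,1)), while the remaining twelve only shape it.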