dplug.graphics.stb_image_resize source code

1 /* stb_image_resize - v0.97 - public domain image resizing
2    by Jorge L Rodriguez (@VinoBS) - 2014
3    http://github.com/nothings/stb
4 
5    Written with emphasis on usability, portability, and efficiency. (No
6    SIMD or threads, so it can be easily outperformed by libs that use those.)
7    Only scaling and translation are supported, no rotations or shears.
8    Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation.
9 
10    QUICKSTART
11       stbir_resize_uint8(      input_pixels , in_w , in_h , 0,
12                                output_pixels, out_w, out_h, 0, num_channels)
13 
14       stbir_resize_uint8_srgb( input_pixels , in_w , in_h , 0,
15                                output_pixels, out_w, out_h, 0,
16                                num_channels , alpha_chan  , 0)
17       stbir_resize_uint8_srgb_edgemode(
18                                input_pixels , in_w , in_h , 0,
19                                output_pixels, out_w, out_h, 0,
20                                num_channels , alpha_chan  , 0, STBIR_EDGE_CLAMP)
21                                                             // WRAP/REFLECT/ZERO
22 
23    FULL API
24       See the "header file" section of the source for API documentation.
25 
26    ADDITIONAL DOCUMENTATION
27 
28       SRGB & FLOATING POINT REPRESENTATION
29          The sRGB functions presume IEEE floating point. If you do not have
30          IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
31          a slower implementation.
32 
33       MEMORY ALLOCATION
34          The resize functions here perform a single memory allocation using
35          malloc. To control the memory allocation, before the #include that
36          triggers the implementation, do:
37 
38             #define STBIR_MALLOC(size,context) ...
39             #define STBIR_FREE(ptr,context)   ...
40 
41          Each resize function makes exactly one call to malloc/free, so to use
42          temp memory, store the temp memory in the context and return that.
43 
44       DEFAULT FILTERS
45          For functions which don't provide explicit control over what filters
46          to use, you can change the compile-time defaults with
47 
48             #define STBIR_DEFAULT_FILTER_UPSAMPLE     STBIR_FILTER_something
49             #define STBIR_DEFAULT_FILTER_DOWNSAMPLE   STBIR_FILTER_something
50 
51          See stbir_filter in the header-file section for the list of filters.
52 
53       NEW FILTERS
54          A number of 1D filter kernels are used. For a list of
55          supported filters see the stbir_filter enum. To add a new filter,
56          write a filter function and add it to stbir__filter_info_table.
57 
58       MAX CHANNELS
59          If your image has more than 64 channels, define STBIR_MAX_CHANNELS
60          to the max you'll have. (This D port fixes STBIR_MAX_CHANNELS at 4.)
61 
62       ALPHA CHANNEL
63          Most of the resizing functions provide the ability to control how
64          the alpha channel of an image is processed. The important things
65          to know about this:
66 
67          1. The best mathematically-behaved version of alpha to use is
68          called "premultiplied alpha", in which the other color channels
69          have had the alpha value multiplied in. If you use premultiplied
70          alpha, linear filtering (such as image resampling done by this
71          library, or performed in texture units on GPUs) does the "right
72          thing". While premultiplied alpha is standard in the movie CGI
73          industry, it is still uncommon in the videogame/real-time world.
74 
75          If you linearly filter non-premultiplied alpha, strange effects
76          occur. (For example, the 50/50 average of 99% transparent bright green
77          and 1% transparent black produces 50% transparent dark green when
78          non-premultiplied, whereas premultiplied it produces 50%
79          transparent near-black. The former introduces green energy
80          that doesn't exist in the source image.)
81 
82          2. Artists should not edit premultiplied-alpha images; artists
83          want non-premultiplied alpha images. Thus, art tools generally output
84          non-premultiplied alpha images.
85 
86          3. You will get best results in most cases by converting images
87          to premultiplied alpha before processing them mathematically.
88 
89          4. If you pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED, the
90          resizer does not do anything special for the alpha channel;
91          it is resampled identically to other channels. This produces
92          the correct results for premultiplied-alpha images, but produces
93          less-than-ideal results for non-premultiplied-alpha images.
94 
95          5. If you do not pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED,
96          then the resizer weights the contribution of input pixels
97          based on their alpha values, or, equivalently, it multiplies
98          the alpha value into the color channels, resamples, then divides
99          by the resultant alpha value. Input pixels which have alpha=0 do
100          not contribute at all to output pixels unless _all_ of the input
101          pixels affecting that output pixel have alpha=0, in which case
102          the result for that pixel is the same as it would be with
103          STBIR_FLAG_ALPHA_PREMULTIPLIED. However, this is only true for
104          input images in integer formats. For input images in float format,
105          input pixels with alpha=0 have no effect, and output pixels
106          which have alpha=0 will be 0 in all channels. (For float images,
107          you can manually achieve the same result by adding a tiny epsilon
108          value to the alpha channel of every image, and then subtracting
109          or clamping it at the end.)
110 
111          6. You can suppress the behavior described in #5 and make
112          all-0-alpha pixels have 0 in all channels by #defining
113          STBIR_NO_ALPHA_EPSILON.
114 
115          7. You can separately control whether the alpha channel is
116          interpreted as linear or affected by the colorspace. By default
117          it is linear; you almost never want to apply the colorspace.
118          (For example, graphics hardware does not apply sRGB conversion
119          to the alpha channel.)
120 
121    CONTRIBUTORS
122       Jorge L Rodriguez: Implementation
123       Sean Barrett: API design, optimizations
124       Aras Pranckevicius: bugfix
125       Nathan Reed: warning fixes
126 
127    REVISIONS
128       0.97 (2020-02-02) fixed warning
129       0.96 (2019-03-04) fixed warnings
130       0.95 (2017-07-23) fixed warnings
131       0.94 (2017-03-18) fixed warnings
132       0.93 (2017-03-03) fixed bug with certain combinations of heights
133       0.92 (2017-01-02) fix integer overflow on large (>2GB) images
134       0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
135       0.90 (2014-09-17) first released version
136 
137    LICENSE
138      See end of file for license information.
139 
140    TODO
141       Don't decode all of the image data when only processing a partial tile
142       Don't use full-width decode buffers when only processing a partial tile
143       When processing wide images, break processing into tiles so data fits in L1 cache
144       Installable filters?
145       Resize that respects alpha test coverage
146          (Reference code: FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage:
147          https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp )
148 */
149 /**
150 Resizer ported to D from C. Removed a few features that didn't make sense in Dplug.
151 Added Ryhor Spivak's work on the Lanczos filter, plus a few more Lanczos kernels.
152 Copyright: (c) Guillaume Piolat (2021)
153 */
154 module dplug.graphics.stb_image_resize;
155 
156 
157 import core.stdc.stdlib: malloc, free;
158 import core.stdc.string: memset;
159 
160 import inteli.smmintrin;
161 import inteli.math;
162 
163 import dplug.core.math : fast_fabs, fast_pow, fast_ceil, fast_floor, fast_sin;
164 import dplug.core.vec;
165 
166 
167 nothrow:
168 @nogc:
169 
170 
171 //////////////////////////////////////////////////////////////////////////////
172 //
173 // Easy-to-use API:
174 //
175 //     * "input pixels" points to an array of image data with 'num_channels' channels (e.g. RGB=3, RGBA=4)
176 //     * input_w is input image width (x-axis), input_h is input image height (y-axis)
177 //     * stride is the offset between successive rows of image data in memory, in bytes. you can
178 //       specify 0 to mean packed contiguously in memory
179 //     * alpha channel is treated identically to other channels.
180 //     * colorspace is linear or sRGB as specified by function name
181 //     * returned result is 1 for success or 0 in case of an error.
182 //       #define assert() to trigger an assert on parameter validation errors.
183 //     * Memory required grows approximately linearly with input and output size, but with
184 //       discontinuities at input_w == output_w and input_h == output_h.
185 //     * In this D port the resampling filter is passed explicitly through the 'filter' parameter.
186 //       STBIR_DEFAULT_FILTER_UPSAMPLE and STBIR_DEFAULT_FILTER_DOWNSAMPLE hold the compile-time
187 //       defaults; the medium-complexity API section lists the available filters.
188 
189 int stbir_resize_uint8(const(ubyte)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
190                        ubyte* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
191                        int num_channels, int filter, void *alloc_context)
192 {
193     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
194                                    output_pixels, output_w, output_h, output_stride_in_bytes,
195                                    0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT8, filter, filter,
196                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
197 }
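
// Usage sketch (not part of the original source): downsample a packed 4x4
// single-channel image to 2x2 with the function above. The image size, pixel
// value and filter choice are illustrative assumptions. Scratch memory comes
// from the STBAllocatorContext declared later in this module.
unittest
{
    ubyte[4 * 4] src = 128;              // constant gray input, tightly packed
    ubyte[2 * 2] dst;

    STBAllocatorContext alloc;           // owns the resizer's scratch buffer

    int ok = stbir_resize_uint8(src.ptr, 4, 4, 0,   // stride 0 = packed rows
                                dst.ptr, 2, 2, 0,
                                1,                  // num_channels
                                STBIR_FILTER_MITCHELL,
                                &alloc);
    assert(ok == 1);                     // 1 is the documented success value
}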
198 
199 int stbir_resize_uint16(const(ushort)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
200                        ushort* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
201                        int num_channels, int filter, void *alloc_context)
202 {
203     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
204                                    output_pixels, output_w, output_h, output_stride_in_bytes,
205                                    0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT16, filter, filter,
206                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
207 }
208 
209 
210 // The following functions interpret image data as gamma-corrected sRGB.
211 // Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
212 // or otherwise provide the index of the alpha channel. Flags value
213 // of 0 will probably do the right thing if you're not sure what
214 // the flags mean.
215 
216 enum STBIR_ALPHA_CHANNEL_NONE      = -1;
217 
218 // Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
219 // use alpha-weighted resampling (effectively premultiplying, resampling,
220 // then unpremultiplying).
221 enum STBIR_FLAG_ALPHA_PREMULTIPLIED = (1 << 0);
222 
223 // The specified alpha channel should be handled as gamma-corrected value even
224 // when doing sRGB operations.
225 enum STBIR_FLAG_ALPHA_USES_COLORSPACE = (1 << 1);
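
// Worked example (a sketch, not part of the original source) of why the
// resizer weights colors by alpha when STBIR_FLAG_ALPHA_PREMULTIPLIED is not
// set. It repeats the arithmetic from the header comment: a 50/50 average of
// 99% transparent bright green and 1% transparent black. Only plain float
// math is used here; the resizer itself is not called.
unittest
{
    // Straight-alpha RGBA pixels with components in [0, 1].
    float[4] green = [0.0f, 1.0f, 0.0f, 0.01f]; // bright green, 99% transparent
    float[4] black = [0.0f, 0.0f, 0.0f, 0.99f]; // black, 1% transparent

    // Naive average of the stored green values: far too much green.
    float naiveG = 0.5f * (green[1] + black[1]);

    // Alpha-weighted average: premultiply, average, then unpremultiply.
    float avgA      = 0.5f * (green[3] + black[3]);
    float weightedG = 0.5f * (green[1] * green[3] + black[1] * black[3]) / avgA;

    assert(naiveG == 0.5f);                           // 50% transparent dark green
    assert(weightedG > 0.009f && weightedG < 0.011f); // about 0.01: near-black
}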
226 
227 int stbir_resize_uint8_srgb(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
228                             ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
229                             int num_channels, int alpha_channel, int flags, void* alloc_context, int filter)
230 {
231     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
232                                    output_pixels, output_w, output_h, output_stride_in_bytes,
233                                    0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
234                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_SRGB);
235 }
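
// Usage sketch (not part of the original source): upsample a packed 2x2 RGBA
// image to 4x4 in sRGB space. The image contents, the alpha index (3) and the
// filter are illustrative assumptions. Note the argument order: here the
// allocator context comes before the filter, unlike stbir_resize_uint8.
unittest
{
    ubyte[2 * 2 * 4] src = 255;          // opaque white, straight alpha
    ubyte[4 * 4 * 4] dst;

    STBAllocatorContext alloc;

    int ok = stbir_resize_uint8_srgb(src.ptr, 2, 2, 0,
                                     dst.ptr, 4, 4, 0,
                                     4,      // num_channels
                                     3,      // alpha_channel index
                                     0,      // flags: alpha-weighted, linear alpha
                                     &alloc,
                                     STBIR_FILTER_CATMULLROM);
    assert(ok == 1);                     // 1 is the documented success value
}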
236 
237 alias stbir_edge = int;
238 enum : stbir_edge
239 {
240     STBIR_EDGE_CLAMP   = 1,
241     STBIR_EDGE_REFLECT = 2,
242     STBIR_EDGE_WRAP    = 3,
243     STBIR_EDGE_ZERO    = 4,
244 }
245 
246 
247 //////////////////////////////////////////////////////////////////////////////
248 //
249 // Medium-complexity API
250 //
251 // This extends the easy-to-use API as follows:
252 //
253 //     * Alpha-channel can be processed separately
254 //       * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
255 //         * Alpha channel will not be gamma corrected (unless flags&STBIR_FLAG_ALPHA_USES_COLORSPACE)
256 //         * Filters will be weighted by alpha channel (unless flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
257 //     * Filter can be selected explicitly
258 //     * uint16 image type
259 //     * sRGB colorspace available for all types
260 //     * context parameter for passing to STBIR_MALLOC
261 
262 alias stbir_filter = int;
263 enum : stbir_filter
264 {
265     STBIR_FILTER_DEFAULT      = 0,  // use same filter type that easy-to-use API chooses
266     STBIR_FILTER_BOX          = 1,  // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
267     STBIR_FILTER_TRIANGLE     = 2,  // On upsampling, produces same results as bilinear texture filtering
268     STBIR_FILTER_CUBICBSPLINE = 3,  // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
269     STBIR_FILTER_CATMULLROM   = 4,  // An interpolating cubic spline
270     STBIR_FILTER_MITCHELL     = 5,  // Mitchell-Netravali filter with B=1/3, C=1/3
271     STBIR_FILTER_LANCZOS2     = 6,  // Lanczos 2
272     STBIR_FILTER_LANCZOS2_5   = 7,  // Lanczos 2.5
273     STBIR_FILTER_LANCZOS3     = 8,  // Lanczos 3
274     STBIR_FILTER_LANCZOS4     = 9,  // Lanczos 4
275     STBIR_FILTER_MK_2013      = 10, // Magic Kernel, without sharpening
276     STBIR_FILTER_MKS_2013_86  = 11, // Magic Kernel + Sharp 2013, but with only 86% sharpening (Dplug Issue #729)
277     STBIR_FILTER_MKS_2013     = 12, // Magic Kernel + Sharp 2013 (the one recommended by John Costella in 2013)
278     STBIR_FILTER_MKS_2021     = 13, // Magic Kernel + Sharp 2021 (the one recommended to us by John Costella in 2022)
279 
280     // To be continued, as John Costella has other kernels...
281 }
282 
283 alias stbir_colorspace = int;
284 enum : stbir_colorspace 
285 {
286     STBIR_COLORSPACE_LINEAR,
287     STBIR_COLORSPACE_SRGB,
288 
289     STBIR_MAX_COLORSPACES,
290 }
291 
292 
293 //////////////////////////////////////////////////////////////////////////////
294 //
295 // Full-complexity API
296 //
297 // This extends the medium API as follows:
298 //
299 //     * uint32 image type
300 //     * not typesafe
301 //     * separate filter types for each axis
302 //     * separate edge modes for each axis
303 //     * can specify scale explicitly for subpixel correctness
304 //     * can specify image source tile using texture coordinates
305 
306 alias stbir_datatype = int;
307 enum : stbir_datatype
308 {
309     STBIR_TYPE_UINT8 ,
310     STBIR_TYPE_UINT16,
311     STBIR_TYPE_UINT32,
312     STBIR_TYPE_FLOAT ,
313 
314     STBIR_MAX_TYPES
315 }
316 
317 // (s0, t0) & (s1, t1) are the top-left and bottom right corner (uv addressing style: [0, 1]x[0, 1]) of a region of the input image to use.
318 
319 struct STBAllocatorContext
320 {
321 nothrow:
322 @nogc:
323     void* buf = null;
324     size_t length = 0;
325 
326     @disable this(this);
327 
328     ~this()
329     {
330         alignedFree(buf, 1);
331     }
332 
333     void* reallocDiscard(size_t numBytes)
334     {
335         if (length < numBytes)
336         {         
337             buf = alignedReallocDiscard(buf, numBytes, 1);
338             length = numBytes;
339         }
340         return buf;
341     }
342 }
343 
344 void* STBIR_MALLOC(size_t size, void* context)
345 {
346     assert(context !is null);
347     STBAllocatorContext* alloc = cast(STBAllocatorContext*)context;
348     return alloc.reallocDiscard(size);
349 }
350 
351 void STBIR_FREE(void* p, void* context)
352 {
353     assert(context !is null);
354     // will be freed when resizer is freed, because it's relatively small and shared.
355 }
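
// Sketch (not part of the original source) of how the allocator context above
// behaves, as read from its code: the buffer only grows, a smaller request
// reuses it, and STBIR_FREE is a no-op because the context's destructor
// releases the memory. The request sizes are arbitrary.
unittest
{
    STBAllocatorContext ctx;

    void* a = STBIR_MALLOC(64, &ctx);
    assert(a !is null);
    assert(ctx.length == 64);

    void* b = STBIR_MALLOC(16, &ctx);    // smaller request: same buffer
    assert(b is a);
    assert(ctx.length == 64);

    STBIR_FREE(b, &ctx);                 // does nothing
    assert(ctx.buf is a);
}                                        // ctx goes out of scope and frees buf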
356 
357 enum STBIR_DEFAULT_FILTER_UPSAMPLE = STBIR_FILTER_CATMULLROM;
358 
359 enum STBIR_DEFAULT_FILTER_DOWNSAMPLE = STBIR_FILTER_MITCHELL;
360 
361 enum STBIR_MAX_CHANNELS = 4;
362 
363 // This value is added to alpha just before premultiplication to avoid
364 // zeroing out color values. It is equivalent to 2^-80. If you don't want
365 // that behavior (it may interfere if you have floating point images with
366 // very small alpha values) then you can define STBIR_NO_ALPHA_EPSILON to
367 // disable it.
368 enum float STBIR_ALPHA_EPSILON = (cast(float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20));
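
// Check (a sketch, not part of the original source) of the arithmetic in the
// comment above: four divisions by 2^20 give 2^-80. Because the epsilon is a
// power of two, scaling a color by it and dividing it back out is exact as
// long as nothing underflows. The color value 0.8 is an arbitrary example.
static assert(STBIR_ALPHA_EPSILON == 0x1p-80f);

unittest
{
    float color = 0.8f;
    float alpha = 0.0f + STBIR_ALPHA_EPSILON;  // fully transparent input pixel
    float premultiplied = color * alpha;       // tiny, but not flushed to zero
    assert(premultiplied > 0);
    assert(premultiplied / alpha == color);    // color survives the round trip
}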
369 
370 // must match stbir_datatype
371 static immutable ubyte[4] stbir__type_size = 
372 [
373     1, // STBIR_TYPE_UINT8
374     2, // STBIR_TYPE_UINT16
375     4, // STBIR_TYPE_UINT32
376     4, // STBIR_TYPE_FLOAT
377 ];
378 
379 // Kernel function centered at 0
380 alias stbir__kernel_fn = float function(float x, float scale);
381 alias stbir__support_fn = float function(float scale);
382 
383 struct stbir__filter_info
384 {
385     stbir__kernel_fn kernel;
386     stbir__support_fn support;
387 }
388 
389 // When upsampling, the contributors are which source pixels contribute.
390 // When downsampling, the contributors are which destination pixels are contributed to.
391 struct stbir__contributors
392 {
393     int n0; // First contributing pixel
394     int n1; // Last contributing pixel
395 }
396 
397 struct stbir__info
398 {
399     const(void)* input_data;
400     int input_w;
401     int input_h;
402     int input_stride_bytes;
403 
404     void* output_data;
405     int output_w;
406     int output_h;
407     int output_stride_bytes;
408 
409     float s0, t0, s1, t1;
410 
411     float horizontal_shift; // Units: output pixels
412     float vertical_shift;   // Units: output pixels
413     float horizontal_scale;
414     float vertical_scale;
415 
416     int channels;
417     int alpha_channel;
418     uint flags;
419     stbir_datatype type;
420     stbir_filter horizontal_filter;
421     stbir_filter vertical_filter;
422     stbir_edge edge_horizontal;
423     stbir_edge edge_vertical;
424     stbir_colorspace colorspace;
425 
426     stbir__contributors* horizontal_contributors;
427     float* horizontal_coefficients;
428 
429     stbir__contributors* vertical_contributors;
430     float* vertical_coefficients;
431 
432     int decode_buffer_pixels;
433     float* decode_buffer;
434 
435     float* horizontal_buffer;
436 
437     // cache these because ceil/floor are inexplicably showing up in profile
438     int horizontal_coefficient_width;
439     int vertical_coefficient_width;
440     int horizontal_filter_pixel_width;
441     int vertical_filter_pixel_width;
442     int horizontal_filter_pixel_margin;
443     int vertical_filter_pixel_margin;
444     int horizontal_num_contributors;
445     int vertical_num_contributors;
446 
447     int ring_buffer_length_bytes;   // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
448     int ring_buffer_num_entries;    // Total number of entries in the ring buffer.
449     int ring_buffer_first_scanline;
450     int ring_buffer_last_scanline;
451     int ring_buffer_begin_index;    // first_scanline is at this index in the ring buffer
452     float* ring_buffer;
453 
454     float* encode_buffer; // A temporary buffer to store floats so we don't lose precision while we do multiply-adds.
455 
456     int horizontal_contributors_size;
457     int horizontal_coefficients_size;
458     int vertical_contributors_size;
459     int vertical_coefficients_size;
460     int decode_buffer_size;
461     int horizontal_buffer_size;
462     int ring_buffer_size;
463     int encode_buffer_size;
464 }
465 
466 
467 static immutable float stbir__max_uint8_as_float  = 255.0f;
468 static immutable float stbir__max_uint16_as_float = 65535.0f;
469 static immutable double stbir__max_uint32_as_float = 4294967295.0;
470 
471 
472 int stbir__min(int a, int b)
473 {
474     return a < b ? a : b;
475 }
476 
477 float stbir__saturate(float x)
478 {
479     if (x < 0)
480         return 0;
481 
482     if (x > 1)
483         return 1;
484 
485     return x;
486 }
487 
488 static immutable float[256] stbir__srgb_uchar_to_linear_float = 
489 [
490     0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
491     0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
492     0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
493     0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
494     0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
495     0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
496     0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
497     0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
498     0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
499     0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
500     0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
501     0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
502     0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
503     0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
504     0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
505     0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
506     0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
507     0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
508     0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
509     0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
510     0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
511     0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
512     0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
513     0.982251f, 0.991102f, 1.0f
514 ];
515 
516 float stbir__srgb_to_linear(float f)
517 {
518     if (f <= 0.04045f)
519         return f / 12.92f;
520     else
521         return cast(float)fast_pow((f + 0.055f) / 1.055f, 2.4f);
522 }
523 
524 float stbir__linear_to_srgb(float f)
525 {
526     if (f <= 0.0031308f)
527         return f * 12.92f;
528     else
529         return 1.055f * _mm_pow_ss(f, 0.4166666666f) - 0.055f;
530 }
531 /*
532 __m128 stbir__linear_to_srgb(__m128 f)
533 {
534     __m128 below = f * _mm_set1_ps(12.92f);
535     __m128 exponentiated = _mm_set1_ps(1.055f) * _mm_pow_ps(f, 0.4166666666f) - _mm_set1_ps(0.055f);
536     __m128 mask  =_mm_cmplt_ps(f, _mm_set1_ps(0.0031308f));
537     __m128i result = (cast(__m128i)below & cast(__m128i)mask) | (cast(__m128i)exponentiated & ~cast(__m128i)mask);
538     return cast(__m128)result;
539 }*/
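
// Sanity sketch (not part of the original source): the two scalar conversions
// above should be approximate inverses, and stbir__srgb_to_linear should agree
// with the 256-entry lookup table. The 1e-3 tolerance is a loose assumption
// chosen to absorb the approximations in fast_pow and _mm_pow_ss.
unittest
{
    float s   = 128 / 255.0f;                       // mid-gray in sRGB
    float lin = stbir__srgb_to_linear(s);

    // Table entry 128 is 0.215861.
    assert(fast_fabs(lin - stbir__srgb_uchar_to_linear_float[128]) < 1e-3f);

    // Power-law segment round trip.
    assert(fast_fabs(stbir__linear_to_srgb(lin) - s) < 1e-3f);

    // Linear segment round trip (inputs at or below 0.04045 use f / 12.92).
    float dark = 0.02f;
    assert(fast_fabs(stbir__linear_to_srgb(stbir__srgb_to_linear(dark)) - dark) < 1e-3f);
}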
540 
541 union stbir__FP32
542 {
543     uint u;
544     float f;
545 }
546 
547 static immutable uint[104] fp32_to_srgb8_tab4 = 
548 [
549     0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
550     0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
551     0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
552     0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
553     0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
554     0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
555     0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
556     0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
557     0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
558     0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
559     0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
560     0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
561     0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
562 ];
563 
564 ubyte stbir__linear_to_srgb_uchar(float in_)
565 {
566     static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
567     static const stbir__FP32 minval = { (127-13) << 23 };
568     uint tab,bias,scale,t;
569     stbir__FP32 f;
570 
571     // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
572     // The tests are carefully written so that NaNs map to 0, same as in the reference
573     // implementation.
574     if (!(in_ > minval.f)) // written this way to trap NaNs
575         in_ = minval.f;
576     if (in_ > almostone.f)
577         in_ = almostone.f;
578 
579     // Do the table lookup and unpack bias, scale
580     f.f = in_;
581     tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
582     bias = (tab >> 16) << 9;
583     scale = tab & 0xffff;
584 
585     // Grab next-highest mantissa bits and perform linear interpolation
586     t = (f.u >> 12) & 0xff;
587     return cast(ubyte) ((bias + scale*t) >> 16);
588 }
589 
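590 // Same as above, but converts 4 floats at once. The 4 resulting bytes are
590 // packed into the low 32 bits of the returned vector.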
591 __m128i stbir__linear_to_srgb_uchar(__m128 in_)
592 {
593     static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
594     static const stbir__FP32 minval = { (127-13) << 23 };
595     in_ = _mm_max_ps(in_, _mm_set1_ps(minval.f));
596     in_ = _mm_min_ps(in_, _mm_set1_ps(almostone.f));
597 
598     __m128i f = cast(__m128i) in_;
599     __m128i tblIndex = _mm_srli_epi32(f - _mm_set1_epi32(minval.u), 20);
600 
601     __m128i tab = _mm_setr_epi32(fp32_to_srgb8_tab4[ tblIndex.array[0] ], 
602                                  fp32_to_srgb8_tab4[ tblIndex.array[1] ],
603                                  fp32_to_srgb8_tab4[ tblIndex.array[2] ],
604                                  fp32_to_srgb8_tab4[ tblIndex.array[3] ]);
605     __m128i bias = _mm_slli_epi32(_mm_srli_epi32(tab, 16), 9);
606     __m128i scale = _mm_and_si128(tab, _mm_set1_epi32(0xffff));
607 
608     __m128i t = _mm_srli_epi32(f, 12) &  _mm_set1_epi32(0xff);
609     __m128i r = _mm_srli_epi32(bias + _mm_mullo_epi32(scale, t), 16);
610     __m128i zero = _mm_setzero_si128();
611     r = _mm_packs_epi32(r, zero);
612     r = _mm_packus_epi16(r, zero);
613     return r;
614 }
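
// Sketch (not part of the original source) exercising both conversions above.
// The scalar asserts follow from the table: 2^-13 maps to 0 and 1-eps maps to
// 255, and NaN is trapped to 0. The SIMD version is expected to match the
// scalar one lane by lane for ordinary inputs; the test values are arbitrary.
unittest
{
    assert(stbir__linear_to_srgb_uchar(0.0f) == 0);
    assert(stbir__linear_to_srgb_uchar(1.0f) == 255);
    assert(stbir__linear_to_srgb_uchar(float.nan) == 0);

    float[4] lanes = [0.0f, 0.25f, 0.5f, 1.0f];
    __m128i packed = stbir__linear_to_srgb_uchar(_mm_loadu_ps(lanes.ptr));

    // The four results sit in the low four bytes, lane 0 in the lowest byte.
    uint bytes = cast(uint) _mm_cvtsi128_si32(packed);
    foreach (i; 0 .. 4)
    {
        ubyte simdResult = cast(ubyte)(bytes >> (8 * i));
        assert(simdResult == stbir__linear_to_srgb_uchar(lanes[i]));
    }
}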
615 
616 float stbir__filter_trapezoid(float x, float scale)
617 {
618     float halfscale = scale / 2;
619     float t = 0.5f + halfscale;
620     assert(scale <= 1);
621 
622     x = cast(float)fast_fabs(x);
623 
624     if (x >= t)
625         return 0;
626     else
627     {
628         float r = 0.5f - halfscale;
629         if (x <= r)
630             return 1;
631         else
632             return (t - x) / scale;
633     }
634 }
635 
636 float stbir__support_trapezoid(float scale)
637 {
638     assert(scale <= 1);
639     return 0.5f + scale / 2;
640 }
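
// Worked values (a sketch, not part of the original source) for the trapezoid
// kernel above at scale = 0.5, an arbitrary 2x downsampling ratio. The flat
// top spans |x| <= 0.25, the ramps end at |x| = 0.75, and the support equals
// the ramp end.
unittest
{
    immutable float scale = 0.5f;

    assert(stbir__support_trapezoid(scale) == 0.75f);

    assert(stbir__filter_trapezoid(0.0f,  scale) == 1.0f);  // flat top
    assert(stbir__filter_trapezoid(0.25f, scale) == 1.0f);  // edge of flat top
    assert(stbir__filter_trapezoid(0.5f,  scale) == 0.5f);  // middle of the ramp
    assert(stbir__filter_trapezoid(0.75f, scale) == 0.0f);  // end of the ramp
    assert(stbir__filter_trapezoid(-0.5f, scale) == 0.5f);  // kernel is even
}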
641 
642 float stbir__filter_triangle(float x, float s)
643 {
644     x = cast(float)fast_fabs(x);
645 
646     if (x <= 1.0f)
647         return 1 - x;
648     else
649         return 0;
650 }
651 
652 float stbir__filter_cubic(float x, float s)
653 {
654     x = cast(float)fast_fabs(x);
655 
656     if (x < 1.0f)
657         return (4 + x*x*(3*x - 6))/6;
658     else if (x < 2.0f)
659         return (8 + x*(-12 + x*(6 - x)))/6;
660 
661     return (0.0f);
662 }
663 
664 float stbir__filter_catmullrom(float x, float s)
665 {
666     x = cast(float)fast_fabs(x);
667 
668     if (x < 1.0f)
669         return 1 - x*x*(2.5f - 1.5f*x);
670     else if (x < 2.0f)
671         return 2 - x*(4 + x*(0.5f*x - 2.5f));
672 
673     return (0.0f);
674 }
675 
676 float stbir__filter_mitchell(float x, float s)
677 {
678     x = cast(float)fast_fabs(x);
679 
680     if (x < 1.0f)
681         return (16 + x*x*(21 * x - 36))/18;
682     else if (x < 2.0f)
683         return (32 + x*(-60 + x*(36 - 7*x)))/18;
684 
685     return (0.0f);
686 }
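
// Sketch (not part of the original source): the triangle, cubic B-spline,
// Catmull-Rom and Mitchell kernels above each sum to 1 over integer-spaced
// taps, which is why resampling a constant image with them keeps it constant.
// The sample offset 0.25 and the 1e-5 tolerance are arbitrary choices.
unittest
{
    immutable float x = 0.25f;
    float tri = 0, bspline = 0, catmull = 0, mitchell = 0;

    foreach (k; -2 .. 3)   // taps at x-2, x-1, x, x+1, x+2 cover the support
    {
        tri      += stbir__filter_triangle  (x + k, 1.0f);
        bspline  += stbir__filter_cubic     (x + k, 1.0f);
        catmull  += stbir__filter_catmullrom(x + k, 1.0f);
        mitchell += stbir__filter_mitchell  (x + k, 1.0f);
    }

    assert(fast_fabs(tri      - 1.0f) < 1e-5f);
    assert(fast_fabs(bspline  - 1.0f) < 1e-5f);
    assert(fast_fabs(catmull  - 1.0f) < 1e-5f);
    assert(fast_fabs(mitchell - 1.0f) < 1e-5f);
}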
687 
688 float stbir__filter_lanczos(float A)(float x, float s)
689 {
690     x = cast(float)fast_fabs(x);
691 
692     if (x <= float.min_normal)
693         return 1.0f;
694 
695     if (x < A)
696     {
697         float pix = 3.14159265358979323846f*x;
698         return A*fast_sin(pix)*fast_sin(pix/A)/(pix*pix);
699     }
700 
701     return 0.0f;
702 }
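
// Sketch (not part of the original source) of the Lanczos kernel's shape,
// using the A = 3 instantiation: it is 1 at the center, crosses zero at the
// other integers inside the window, and is exactly 0 from |x| = A outward.
// The 1e-5 tolerance is an assumption about the accuracy of fast_sin.
unittest
{
    alias lanczos3 = stbir__filter_lanczos!3.0f;

    assert(lanczos3(0.0f, 1.0f) == 1.0f);
    assert(fast_fabs(lanczos3(1.0f, 1.0f)) < 1e-5f);   // sinc zero crossing
    assert(fast_fabs(lanczos3(2.0f, 1.0f)) < 1e-5f);   // sinc zero crossing
    assert(lanczos3(3.0f, 1.0f) == 0.0f);              // outside the window
    assert(lanczos3(-0.5f, 1.0f) == lanczos3(0.5f, 1.0f)); // kernel is even
}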
703 
704 float stbir__filter_mk2013(float x, float s) nothrow @nogc
705 {
706     x = fast_fabs(x);
707     if (x < 0.5)
708         return 0.75 - x * x;
709 
710     if (x < 1.5)
711         return 0.5 * (x - 1.5)*(x - 1.5);
712 
713     return 0.0f;
714 }
715 
716 float stbir__filter_mks2013_hs(float x, float s) nothrow @nogc
717 {
718     // Perhaps possible to do better with "MKS 2021".
719     return 0.14f * stbir__filter_mk2013(x, s)
720          + 0.86f * stbir__filter_mks2013(x, s);
721 }
722 
723 float stbir__filter_mks2013(float x, float s) nothrow @nogc
724 {
725     x = fast_fabs(x);
726 
727     if (x <= float.min_normal)
728         return 17.0f / 16.0f;
729 
730     if (x < 0.5)
731         return 17.0 / 16.0 - 7.0 * x * x / 4.0;
732 
733     if (x < 1.5)
734     {
735         double x2 = x * x;
736         return 0.25 * (4 * x2 - 11.0 * x + 7.0);
737     }
738 
739     if (x < 2.5)
740     {
741         return -0.125 * (x - 5.0 / 2.0)*(x - 5.0 / 2.0);
742     }
743     return 0.0f;
744 }
745 
746 float stbir__filter_mks2021(float x, float s) nothrow @nogc
747 {
748     x = fast_fabs(x);
749     float x2 = x * x;
750 
751     if (x < 0.5)
752         return 577.0f / 576.0f - (239.0f / 144.0f) * x2;
753 
754     if (x < 1.5)
755         return (140 * x2 - 379 * x + 239) / 144.0f;
756 
757     if (x < 2.5)
758         return -(24 * x2 - 113 * x + 130) / 144.0f;
759 
760     if (x < 3.5)
761         return (4 * x2 - 27 * x + 45) / 144.0f;
762 
763     if (x < 4.5)
764         return -(4 * x2 - 36 * x + 81) / 1152.0f;
765 
766     return 0.0f;
767 }
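
// Illustrative sanity check (a sketch added for documentation, not part of the
// upstream stb code): sampled at integer offsets, each kernel above sums to 1,
// which is consistent with the 0.9 to 1.1 range that the coefficient code asserts
// on before normalizing. `approx` is a helper introduced only for this example.
unittest
{
    static bool approx(float a, float b) nothrow @nogc
    {
        float d = a - b;
        return d < 1e-5f && d > -1e-5f;
    }

    // Mitchell: the taps at -1, 0, +1 are 1/18, 16/18, 1/18.
    assert(approx(stbir__filter_mitchell(0.0f, 1.0f), 16.0f / 18.0f));
    assert(approx(stbir__filter_mitchell(1.0f, 1.0f), 1.0f / 18.0f));
    assert(stbir__filter_mitchell(2.0f, 1.0f) == 0.0f);

    // mk2013: 0.125 + 0.75 + 0.125.
    float sum = 0;
    for (int i = -1; i <= 1; ++i)
        sum += stbir__filter_mk2013(cast(float)i, 1.0f);
    assert(approx(sum, 1.0f));

    // mks2013: only the taps at 0 and +/-2 are non-zero (17/16 and -1/32).
    sum = 0;
    for (int i = -2; i <= 2; ++i)
        sum += stbir__filter_mks2013(cast(float)i, 1.0f);
    assert(approx(sum, 1.0f));

    // mks2021: only the taps at 0 and +/-4 are non-zero (577/576 and -1/1152).
    sum = 0;
    for (int i = -4; i <= 4; ++i)
        sum += stbir__filter_mks2021(cast(float)i, 1.0f);
    assert(approx(sum, 1.0f));
}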
768 
769 float stbir__support_zero(float s)
770 {
771     return 0;
772 }
773 
774 float stbir__support_one(float s)
775 {
776     return 1;
777 }
778 
779 float stbir__support_two(float s)
780 {
781     return 2;
782 }
783 
784 float stbir__support_three(float s)
785 {
786     return 3;
787 }
788 
789 float stbir__support_four(float s)
790 {
791     return 4;
792 }
793 
794 float stbir__support_five(float s)
795 {
796     return 5;
797 }
798 
799 static immutable stbir__filter_info[14] stbir__filter_info_table = 
800 [
801         { null,                      &stbir__support_zero },
802         { &stbir__filter_trapezoid,  &stbir__support_trapezoid },
803         { &stbir__filter_triangle,   &stbir__support_one },
804         { &stbir__filter_cubic,      &stbir__support_two },
805         { &stbir__filter_catmullrom, &stbir__support_two },
806         { &stbir__filter_mitchell,   &stbir__support_two },
807         { &stbir__filter_lanczos!2.0f, &stbir__support_two },
808         { &stbir__filter_lanczos!2.5f, &stbir__support_three },
809         { &stbir__filter_lanczos!3.0f, &stbir__support_three },
810         { &stbir__filter_lanczos!4.0f, &stbir__support_four },
811         { &stbir__filter_mk2013,       &stbir__support_three },
812         { &stbir__filter_mks2013_hs,   &stbir__support_three },
813         { &stbir__filter_mks2013,      &stbir__support_three },
814         { &stbir__filter_mks2021,      &stbir__support_five },
815         ];
816 
817 
818 static int stbir__use_upsampling(float ratio)
819 {
820     return ratio > 1;
821 }
822 
823 static int stbir__use_width_upsampling(stbir__info* stbir_info)
824 {
825     return stbir__use_upsampling(stbir_info.horizontal_scale);
826 }
827 
828 static int stbir__use_height_upsampling(stbir__info* stbir_info)
829 {
830     return stbir__use_upsampling(stbir_info.vertical_scale);
831 }
832 
833 // This is the maximum number of input samples that can affect an output sample
834 // with the given filter
835 static int stbir__get_filter_pixel_width(stbir_filter filter, float scale)
836 {
837     assert(filter != 0);
838     assert(filter < stbir__filter_info_table.length);
839 
840     if (stbir__use_upsampling(scale))
841         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2);
842     else
843         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2 / scale);
844 }
845 
846 // This is how much to expand buffers to account for filters seeking outside
847 // the image boundaries.
848 static int stbir__get_filter_pixel_margin(stbir_filter filter, float scale)
849 {
850     return stbir__get_filter_pixel_width(filter, scale) / 2;
851 }
852 
853 static int stbir__get_coefficient_width(stbir_filter filter, float scale)
854 {
855     if (stbir__use_upsampling(scale))
856         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1 / scale) * 2);
857     else
858         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2);
859 }
860 
861 static int stbir__get_contributors(float scale, stbir_filter filter, int input_size, int output_size)
862 {
863     if (stbir__use_upsampling(scale))
864         return output_size;
865     else
866         return (input_size + stbir__get_filter_pixel_margin(filter, scale) * 2);
867 }
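
// Worked example of the sizing helpers above (a sketch under assumptions):
// index 5 is the Mitchell entry of stbir__filter_info_table (support 2), and the
// cast assumes stbir_filter is an integer-like type. Neither is guaranteed by
// this part of the file.
unittest
{
    stbir_filter mitchell = cast(stbir_filter) 5;

    // 2x downsample (scale 0.5): ceil(2 * 2 / 0.5) = 8 input pixels can affect
    // one output pixel, so the margin is 4 and each contributor holds 4 coefficients.
    assert(stbir__get_filter_pixel_width(mitchell, 0.5f) == 8);
    assert(stbir__get_filter_pixel_margin(mitchell, 0.5f) == 4);
    assert(stbir__get_coefficient_width(mitchell, 0.5f) == 4);

    // Downsampling keeps one contributor per input pixel plus both margins: 100 + 2 * 4.
    assert(stbir__get_contributors(0.5f, mitchell, 100, 50) == 108);

    // 2x upsample (scale 2): ceil(2 * 2) = 4 input pixels, one contributor per output pixel.
    assert(stbir__get_filter_pixel_width(mitchell, 2.0f) == 4);
    assert(stbir__get_coefficient_width(mitchell, 2.0f) == 4);
    assert(stbir__get_contributors(2.0f, mitchell, 100, 200) == 200);
}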
868 
869 static int stbir__get_total_horizontal_coefficients(stbir__info* info)
870 {
871     return info.horizontal_num_contributors
872          * stbir__get_coefficient_width      (info.horizontal_filter, info.horizontal_scale);
873 }
874 
875 static int stbir__get_total_vertical_coefficients(stbir__info* info)
876 {
877     return info.vertical_num_contributors
878          * stbir__get_coefficient_width      (info.vertical_filter, info.vertical_scale);
879 }
880 
881 static stbir__contributors* stbir__get_contributor(stbir__contributors* contributors, int n)
882 {
883     return &contributors[n];
884 }
885 
886 // For performance, this code is duplicated in stbir__resample_horizontal_upsample/downsample;
887 // if you change it here, change it there too.
888 static float* stbir__get_coefficient(float* coefficients, stbir_filter filter, float scale, int n, int c)
889 {
890     int width = stbir__get_coefficient_width(filter, scale);
891     return &coefficients[width*n + c];
892 }
893 
894 static int stbir__edge_wrap_slow(stbir_edge edge, int n, int max)
895 {
896     switch (edge)
897     {
898     case STBIR_EDGE_ZERO:
899         return 0; // we'll decode the wrong pixel here, and then overwrite with 0s later
900 
901     case STBIR_EDGE_CLAMP:
902         if (n < 0)
903             return 0;
904 
905         if (n >= max)
906             return max - 1;
907 
908         return n; // NOTREACHED
909 
910     case STBIR_EDGE_REFLECT:
911     {
912         if (n < 0)
913         {
914             if (n > -max)
915                 return -n;
916             else
917                 return max - 1;
918         }
919 
920         if (n >= max)
921         {
922             int max2 = max * 2;
923             if (n >= max2)
924                 return 0;
925             else
926                 return max2 - n - 1;
927         }
928 
929         return n; // NOTREACHED
930     }
931 
932     case STBIR_EDGE_WRAP:
933         if (n >= 0)
934             return (n % max);
935         else
936         {
937             int m = (-n) % max;
938 
939             if (m != 0)
940                 m = max - m;
941 
942             return (m);
943         }
944         // NOTREACHED
945 
946     default:
947         assert(false, "Unimplemented edge type");
948     }
949 }
950 
951 static int stbir__edge_wrap(stbir_edge edge, int n, int max)
952 {
953     // avoid per-pixel switch
954     if (n >= 0 && n < max)
955         return n;
956     return stbir__edge_wrap_slow(edge, n, max);
957 }
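
// Illustrative check of the edge modes (a sketch added for documentation; the
// values follow from the switch in stbir__edge_wrap_slow for a row of 10 pixels).
unittest
{
    // In-range indices take the fast path and come back unchanged.
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP, 3, 10) == 3);

    // CLAMP pins out-of-range indices to the nearest valid pixel.
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP, -3, 10) == 0);
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP, 12, 10) == 9);

    // WRAP tiles the image: -1 maps to the last pixel, 10 back to the first.
    assert(stbir__edge_wrap(STBIR_EDGE_WRAP, -1, 10) == 9);
    assert(stbir__edge_wrap(STBIR_EDGE_WRAP, 10, 10) == 0);

    // REFLECT mirrors indices at the borders.
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT, -2, 10) == 2);
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT, 11, 10) == 8);

    // ZERO returns a placeholder index; the decoder overwrites those pixels with 0 later.
    assert(stbir__edge_wrap(STBIR_EDGE_ZERO, -1, 10) == 0);
}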
958 
959 // What input pixels contribute to this output pixel?
960 static void stbir__calculate_sample_range_upsample(int n, float out_filter_radius, float scale_ratio, float out_shift, int* in_first_pixel, int* in_last_pixel, float* in_center_of_out)
961 {
962     float out_pixel_center = cast(float)n + 0.5f;
963     float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
964     float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
965 
966     float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) / scale_ratio;
967     float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) / scale_ratio;
968 
969     *in_center_of_out = (out_pixel_center + out_shift) / scale_ratio;
970     *in_first_pixel = cast(int)(fast_floor(in_pixel_influence_lowerbound + 0.5));
971     *in_last_pixel = cast(int)(fast_floor(in_pixel_influence_upperbound - 0.5));
972 }
973 
974 // What output pixels does this input pixel contribute to?
975 static void stbir__calculate_sample_range_downsample(int n, float in_pixels_radius, float scale_ratio, float out_shift, int* out_first_pixel, int* out_last_pixel, float* out_center_of_in)
976 {
977     float in_pixel_center = cast(float)n + 0.5f;
978     float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
979     float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
980 
981     float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale_ratio - out_shift;
982     float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale_ratio - out_shift;
983 
984     *out_center_of_in = in_pixel_center * scale_ratio - out_shift;
985     *out_first_pixel = cast(int)(fast_floor(out_pixel_influence_lowerbound + 0.5));
986     *out_last_pixel = cast(int)(fast_floor(out_pixel_influence_upperbound - 0.5));
987 }
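
// Worked example of the two range helpers above (a sketch added for documentation).
// The radii assume a kernel with support 2, and all values are exactly representable
// as floats, so exact comparison is safe here.
unittest
{
    int first, last;
    float center;

    // 2x upsample: radius in output space is support * scale = 4.
    // Output pixel 3 has its center at 3.5, which is 1.75 in input space,
    // and gathers from input pixels 0 through 3.
    stbir__calculate_sample_range_upsample(3, 4.0f, 2.0f, 0.0f, &first, &last, &center);
    assert(first == 0 && last == 3);
    assert(center == 1.75f);

    // 2x downsample: radius in input space is support / scale = 4.
    // Input pixel 4 has its center at 4.5, which is 2.25 in output space,
    // and contributes to output pixels 0 through 3.
    stbir__calculate_sample_range_downsample(4, 4.0f, 0.5f, 0.0f, &first, &last, &center);
    assert(first == 0 && last == 3);
    assert(center == 2.25f);
}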
988 
989 static void stbir__calculate_coefficients_upsample(stbir_filter filter, float scale, int in_first_pixel, int in_last_pixel, float in_center_of_out, stbir__contributors* contributor, float* coefficient_group)
990 {
991     int i;
992     float total_filter = 0;
993     float filter_scale;
994 
995     assert(in_last_pixel - in_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
996 
997     contributor.n0 = in_first_pixel;
998     contributor.n1 = in_last_pixel;
999 
1000     assert(contributor.n1 >= contributor.n0);
1001 
1002     for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
1003     {
1004         float in_pixel_center = cast(float)(i + in_first_pixel) + 0.5f;
1005         coefficient_group[i] = stbir__filter_info_table[filter].kernel(in_center_of_out - in_pixel_center, 1 / scale);
1006 
1007         // If the coefficient is zero, skip it. (Don't do the <0 check here, we want the influence of those outside pixels.)
1008         if (i == 0 && !coefficient_group[i])
1009         {
1010             contributor.n0 = ++in_first_pixel;
1011             i--;
1012             continue;
1013         }
1014 
1015         total_filter += coefficient_group[i];
1016     }
1017 
1018     assert(stbir__filter_info_table[filter].kernel(cast(float)(in_last_pixel + 1) + 0.5f - in_center_of_out, 1/scale) == 0);
1019 
1020     assert(total_filter > 0.9);
1021     assert(total_filter < 1.1f); // Make sure it's not way off.
1022 
1023     // Make sure the sum of all coefficients is 1.
1024     filter_scale = 1 / total_filter;
1025 
1026     for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
1027         coefficient_group[i] *= filter_scale;
1028 
1029     for (i = in_last_pixel - in_first_pixel; i >= 0; i--)
1030     {
1031         if (coefficient_group[i])
1032             break;
1033 
1034         // This line has no weight. We can skip it.
1035         contributor.n1 = contributor.n0 + i - 1;
1036     }
1037 }
1038 
1039 static void stbir__calculate_coefficients_downsample(stbir_filter filter, float scale_ratio, int out_first_pixel, int out_last_pixel, float out_center_of_in, stbir__contributors* contributor, float* coefficient_group)
1040 {
1041     int i;
1042 
1043     assert(out_last_pixel - out_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale_ratio) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
1044 
1045     contributor.n0 = out_first_pixel;
1046     contributor.n1 = out_last_pixel;
1047 
1048     assert(contributor.n1 >= contributor.n0);
1049 
1050     for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
1051     {
1052         float out_pixel_center = cast(float)(i + out_first_pixel) + 0.5f;
1053         float x = out_pixel_center - out_center_of_in;
1054         coefficient_group[i] = stbir__filter_info_table[filter].kernel(x, scale_ratio) * scale_ratio;
1055     }
1056 
1057     assert(stbir__filter_info_table[filter].kernel(cast(float)(out_last_pixel + 1) + 0.5f - out_center_of_in, scale_ratio) == 0);
1058 
1059     for (i = out_last_pixel - out_first_pixel; i >= 0; i--)
1060     {
1061         if (coefficient_group[i])
1062             break;
1063 
1064         // This line has no weight. We can skip it.
1065         contributor.n1 = contributor.n0 + i - 1;
1066     }
1067 }
1068 
1069 static void stbir__normalize_downsample_coefficients(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, int input_size, int output_size)
1070 {
1071     int num_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1072     int num_coefficients = stbir__get_coefficient_width(filter, scale_ratio);
1073     int i, j;
1074     int skip;
1075 
1076     for (i = 0; i < output_size; i++)
1077     {
1078         float scale;
1079         float total = 0;
1080 
1081         for (j = 0; j < num_contributors; j++)
1082         {
1083             if (i >= contributors[j].n0 && i <= contributors[j].n1)
1084             {
1085                 float coefficient = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0);
1086                 total += coefficient;
1087             }
1088             else if (i < contributors[j].n0)
1089                 break;
1090         }
1091 
1092         assert(total > 0.9f);
1093         assert(total < 1.1f);
1094 
1095         scale = 1 / total;
1096 
1097         for (j = 0; j < num_contributors; j++)
1098         {
1099             if (i >= contributors[j].n0 && i <= contributors[j].n1)
1100                 *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0) *= scale;
1101             else if (i < contributors[j].n0)
1102                 break;
1103         }
1104     }
1105 
1106     // Optimize: Skip zero coefficients and contributions outside of image bounds.
1107     // Do this after normalizing because normalization depends on the n0/n1 values.
1108     for (j = 0; j < num_contributors; j++)
1109     {
1110         int range, max, width;
1111 
1112         skip = 0;
1113         while (*stbir__get_coefficient(coefficients, filter, scale_ratio, j, skip) == 0)
1114             skip++;
1115 
1116         contributors[j].n0 += skip;
1117 
1118         while (contributors[j].n0 < 0)
1119         {
1120             contributors[j].n0++;
1121             skip++;
1122         }
1123 
1124         range = contributors[j].n1 - contributors[j].n0 + 1;
1125         max = stbir__min(num_coefficients, range);
1126 
1127         width = stbir__get_coefficient_width(filter, scale_ratio);
1128         for (i = 0; i < max; i++)
1129         {
1130             if (i + skip >= width)
1131                 break;
1132 
1133             *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i) = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i + skip);
1134         }
1135 
1136         continue;
1137     }
1138 
1139     // Using min to avoid writing into invalid pixels.
1140     for (i = 0; i < num_contributors; i++)
1141         contributors[i].n1 = stbir__min(contributors[i].n1, output_size - 1);
1142 }
1143 
1144 // Each scan line uses the same kernel values so we should calculate the kernel
1145 // values once and then we can use them for every scan line.
1146 static void stbir__calculate_filters(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, float shift, int input_size, int output_size)
1147 {
1148     int n;
1149     int total_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1150 
1151     if (stbir__use_upsampling(scale_ratio))
1152     {
1153         float out_pixels_radius = stbir__filter_info_table[filter].support(1 / scale_ratio) * scale_ratio;
1154 
1155         // Looping through out pixels
1156         for (n = 0; n < total_contributors; n++)
1157         {
1158             float in_center_of_out; // Center of the current out pixel in the in pixel space
1159             int in_first_pixel, in_last_pixel;
1160 
1161             stbir__calculate_sample_range_upsample(n, out_pixels_radius, scale_ratio, shift, &in_first_pixel, &in_last_pixel, &in_center_of_out);
1162 
1163             stbir__calculate_coefficients_upsample(filter, scale_ratio, in_first_pixel, in_last_pixel, in_center_of_out, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1164         }
1165     }
1166     else
1167     {
1168         float in_pixels_radius = stbir__filter_info_table[filter].support(scale_ratio) / scale_ratio;
1169 
1170         // Looping through in pixels
1171         for (n = 0; n < total_contributors; n++)
1172         {
1173             float out_center_of_in; // Center of the current in pixel in the out pixel space
1174             int out_first_pixel, out_last_pixel;
1175             int n_adjusted = n - stbir__get_filter_pixel_margin(filter, scale_ratio);
1176 
1177             stbir__calculate_sample_range_downsample(n_adjusted, in_pixels_radius, scale_ratio, shift, &out_first_pixel, &out_last_pixel, &out_center_of_in);
1178 
1179             stbir__calculate_coefficients_downsample(filter, scale_ratio, out_first_pixel, out_last_pixel, out_center_of_in, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1180         }
1181 
1182         stbir__normalize_downsample_coefficients(contributors, coefficients, filter, scale_ratio, input_size, output_size);
1183     }
1184 }
1185 
1186 static float* stbir__get_decode_buffer(stbir__info* stbir_info)
1187 {
1188     // The 0 index of the decode buffer starts after the margin. This makes
1189     // it okay to use negative indexes on the decode buffer.
1190     return &stbir_info.decode_buffer[stbir_info.horizontal_filter_pixel_margin * stbir_info.channels];
1191 }
1192 
1193 int STBIR__DECODE(int type, int colorspace)
1194 {
1195     return type * STBIR_MAX_COLORSPACES + colorspace;
1196 }
1197 
1198 static void stbir__decode_scanline(stbir__info* stbir_info, int n)
1199 {
1200     int c;
1201     int channels = stbir_info.channels;
1202     int alpha_channel = stbir_info.alpha_channel;
1203     int type = stbir_info.type;
1204     int colorspace = stbir_info.colorspace;
1205     int input_w = stbir_info.input_w;
1206     size_t input_stride_bytes = stbir_info.input_stride_bytes;
1207     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1208     stbir_edge edge_horizontal = stbir_info.edge_horizontal;
1209     stbir_edge edge_vertical = stbir_info.edge_vertical;
1210     size_t in_buffer_row_offset = stbir__edge_wrap(edge_vertical, n, stbir_info.input_h) * input_stride_bytes;
1211     const void* input_data = cast(char *) stbir_info.input_data + in_buffer_row_offset;
1212     int max_x = input_w + stbir_info.horizontal_filter_pixel_margin;
1213     int decode = STBIR__DECODE(type, colorspace);
1214 
1215     int x = -stbir_info.horizontal_filter_pixel_margin;
1216 
1217     // special handling for STBIR_EDGE_ZERO because it needs to return an item that doesn't appear in the input,
1218     // and we want to avoid paying overhead on every pixel if not STBIR_EDGE_ZERO
1219     if (edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info.input_h))
1220     {
1221         for (; x < max_x; x++)
1222             for (c = 0; c < channels; c++)
1223                 decode_buffer[x*channels + c] = 0;
1224         return;
1225     }
1226 
1227     switch (decode)
1228     {
1229     case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1230         for (; x < max_x; x++)
1231         {
1232             int decode_pixel_index = x * channels;
1233             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1234             for (c = 0; c < channels; c++)
1235                 decode_buffer[decode_pixel_index + c] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + c]) / stbir__max_uint8_as_float;
1236         }
1237         break;
1238 
1239     case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
1240         if (channels == 4 && alpha_channel == 3 && !(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1241         {
1242             // This avoids one table lookup, but the table is the fastest way to convert from sRGB to linear float
1243             for (; x < max_x; x++)
1244             {
1245                 int decode_pixel_index = x * channels;
1246                 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1247                 for (c = 0; c < 3; c++)
1248                     decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
1249                 ubyte alpha = (cast(const(ubyte)*)input_data)[input_pixel_index + 3];
1250                 decode_buffer[decode_pixel_index + 3] = cast(float)(alpha * 0.00392156862f);
1251             }
1252         }
1253         // General path. After the fast path above, x == max_x, so this loop does nothing.
1254         for (; x < max_x; x++)
1255         {
1256             int decode_pixel_index = x * channels;
1257             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1258             for (c = 0; c < channels; c++)
1259                 decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
1260 
1261             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1262                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint8_as_float;
1263         }
1264         break;
1265 
1266     case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1267     {
1268         if (channels == 1 && edge_horizontal == STBIR_EDGE_CLAMP)
1269         {
1270             for (; x < max_x; x++)
1271             {
1272                 int decode_pixel_index = x;
1273                 int input_pixel_index = stbir__edge_wrap(STBIR_EDGE_CLAMP, x, input_w) * channels;
1274                 ushort depth = (cast(const(ushort)*)input_data)[input_pixel_index];
1275                 decode_buffer[decode_pixel_index] = depth / stbir__max_uint16_as_float;
1276             }
1277         }
1278         else if (channels == 4 && edge_horizontal == STBIR_EDGE_CLAMP)
1279         {
1280             __m128i zero = _mm_setzero_si128();
1281             __m128 normalizingFactor = _mm_set1_ps(1 / 65535.0f);
1282 
1283             for (; x < max_x; x++)
1284             {
1285                 int decode_pixel_index = x * channels;
1286                 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1287 
1288                 // load four values at once
1289                 __m128i mmPixel = _mm_loadu_si64( (cast(const(ushort)*)input_data) + input_pixel_index );
1290                 mmPixel = _mm_unpacklo_epi16(mmPixel, zero); // convert to 32-bit
1291                 __m128 fPixel = _mm_cvtepi32_ps(mmPixel) * normalizingFactor;
1292                 _mm_storeu_ps(&decode_buffer[decode_pixel_index], fPixel);
1293             }
1294         }
1295         else
1296         {
1297             for (; x < max_x; x++)
1298             {
1299                 int decode_pixel_index = x * channels;
1300                 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1301                 for (c = 0; c < channels; c++)
1302                 {
1303                     ushort depth = (cast(const(ushort)*)input_data)[input_pixel_index + c];
1304                     decode_buffer[decode_pixel_index + c] = depth / stbir__max_uint16_as_float;
1305                 }
1306             }
1307         }
1308         break;
1309     }
1310 
1311     case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1312         for (; x < max_x; x++)
1313         {
1314             int decode_pixel_index = x * channels;
1315             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1316             for (c = 0; c < channels; c++)
1317                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float);
1318 
1319             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1320                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint16_as_float;
1321         }
1322         break;
1323 
1324     case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1325         for (; x < max_x; x++)
1326         {
1327             int decode_pixel_index = x * channels;
1328             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1329             for (c = 0; c < channels; c++)
1330                 decode_buffer[decode_pixel_index + c] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float);
1331         }
1332         break;
1333 
1334     case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1335         for (; x < max_x; x++)
1336         {
1337             int decode_pixel_index = x * channels;
1338             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1339             for (c = 0; c < channels; c++)
1340                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float));
1341 
1342             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1343                 decode_buffer[decode_pixel_index + alpha_channel] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint32_as_float);
1344         }
1345         break;
1346 
1347     case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1348         for (; x < max_x; x++)
1349         {
1350             int decode_pixel_index = x * channels;
1351             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1352             for (c = 0; c < channels; c++)
1353                 decode_buffer[decode_pixel_index + c] = (cast(const(float)*)input_data)[input_pixel_index + c];
1354         }
1355         break;
1356 
1357     case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1358         for (; x < max_x; x++)
1359         {
1360             int decode_pixel_index = x * channels;
1361             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1362             for (c = 0; c < channels; c++)
1363                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(const(float)*)input_data)[input_pixel_index + c]);
1364 
1365             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1366                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(const(float)*)input_data)[input_pixel_index + alpha_channel];
1367         }
1368 
1369         break;
1370 
1371     default:
1372         assert(!"Unknown type/colorspace/channels combination.");
1373         break;
1374     }
1375 
1376     if (!(stbir_info.flags & STBIR_FLAG_ALPHA_PREMULTIPLIED))
1377     {
1378         for (x = -stbir_info.horizontal_filter_pixel_margin; x < max_x; x++)
1379         {
1380             int decode_pixel_index = x * channels;
1381 
1382             // If the alpha value is 0 it will clobber the color values. Make sure it's not.
1383             float alpha = decode_buffer[decode_pixel_index + alpha_channel];
1384 
1385             version(STBIR_NO_ALPHA_EPSILON)
1386             {}
1387             else
1388             {
1389                 if (stbir_info.type != STBIR_TYPE_FLOAT) {
1390                     alpha += STBIR_ALPHA_EPSILON;
1391                     decode_buffer[decode_pixel_index + alpha_channel] = alpha;
1392                 }
1393             }
1394 
1395             for (c = 0; c < channels; c++)
1396             {
1397                 if (c == alpha_channel)
1398                     continue;
1399 
1400                 decode_buffer[decode_pixel_index + c] *= alpha;
1401             }
1402         }
1403     }
1404 
1405     if (edge_horizontal == STBIR_EDGE_ZERO)
1406     {
1407         for (x = -stbir_info.horizontal_filter_pixel_margin; x < 0; x++)
1408         {
1409             for (c = 0; c < channels; c++)
1410                 decode_buffer[x*channels + c] = 0;
1411         }
1412         for (x = input_w; x < max_x; x++)
1413         {
1414             for (c = 0; c < channels; c++)
1415                 decode_buffer[x*channels + c] = 0;
1416         }
1417     }
1418 }
1419 
1420 static float* stbir__get_ring_buffer_entry(float* ring_buffer, int index, int ring_buffer_length)
1421 {
1422     return &ring_buffer[index * ring_buffer_length];
1423 }
1424 
1425 static float* stbir__add_empty_ring_buffer_entry(stbir__info* stbir_info, int n)
1426 {
1427     int ring_buffer_index;
1428     float* ring_buffer;
1429 
1430     stbir_info.ring_buffer_last_scanline = n;
1431 
1432     if (stbir_info.ring_buffer_begin_index < 0)
1433     {
1434         ring_buffer_index = stbir_info.ring_buffer_begin_index = 0;
1435         stbir_info.ring_buffer_first_scanline = n;
1436     }
1437     else
1438     {
1439         ring_buffer_index = (stbir_info.ring_buffer_begin_index + (stbir_info.ring_buffer_last_scanline - stbir_info.ring_buffer_first_scanline)) % stbir_info.ring_buffer_num_entries;
1440         assert(ring_buffer_index != stbir_info.ring_buffer_begin_index);
1441     }
1442 
1443     ring_buffer = stbir__get_ring_buffer_entry(stbir_info.ring_buffer, ring_buffer_index, stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof));
1444     memset(ring_buffer, 0, stbir_info.ring_buffer_length_bytes);
1445 
1446     return ring_buffer;
1447 }
1448 
1449 
1450 static void stbir__resample_horizontal_upsample(stbir__info* stbir_info, float* output_buffer)
1451 {
1452     int x, k;
1453     int output_w = stbir_info.output_w;
1454     int channels = stbir_info.channels;
1455     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1456     stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1457     float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1458     int coefficient_width = stbir_info.horizontal_coefficient_width;
1459 
1460     for (x = 0; x < output_w; x++)
1461     {
1462         int n0 = horizontal_contributors[x].n0;
1463         int n1 = horizontal_contributors[x].n1;
1464 
1465         int out_pixel_index = x * channels;
1466         int coefficient_group = coefficient_width * x;
1467         int coefficient_counter = 0;
1468 
1469         assert(n1 >= n0);
1470         assert(n0 >= -stbir_info.horizontal_filter_pixel_margin);
1471         assert(n1 >= -stbir_info.horizontal_filter_pixel_margin);
1472         assert(n0 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1473         assert(n1 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1474 
1475         switch (channels) {
1476             case 1:
1477                 for (k = n0; k <= n1; k++)
1478                 {
1479                     int in_pixel_index = k * 1;
1480                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1481                     //assert(coefficient != 0);
1482                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1483                 }
1484                 break;
1485             case 2:
1486                 for (k = n0; k <= n1; k++)
1487                 {
1488                     int in_pixel_index = k * 2;
1489                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1490                     //assert(coefficient != 0);
1491                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1492                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1493                 }
1494                 break;
1495             case 3:
1496                 for (k = n0; k <= n1; k++)
1497                 {
1498                     int in_pixel_index = k * 3;
1499                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1500                     //assert(coefficient != 0);
1501                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1502                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1503                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1504                 }
1505                 break;
1506             case 4:
1507                 for (k = n0; k <= n1; k++)
1508                 {
1509                     int in_pixel_index = k * 4;
1510                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1511                     //assert(coefficient != 0);
1512                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1513                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1514                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1515                     output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1516                 }
1517                 break;
1518             default:
1519                 for (k = n0; k <= n1; k++)
1520                 {
1521                     int in_pixel_index = k * channels;
1522                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1523                     int c;
1524                     //assert(coefficient != 0);
1525                     for (c = 0; c < channels; c++)
1526                         output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1527                 }
1528                 break;
1529         }
1530     }
1531 }
1532 
1533 static void stbir__resample_horizontal_downsample(stbir__info* stbir_info, float* output_buffer)
1534 {
1535     int x, k;
1536     int input_w = stbir_info.input_w;
1537     int channels = stbir_info.channels;
1538     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1539     stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1540     float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1541     int coefficient_width = stbir_info.horizontal_coefficient_width;
1542     int filter_pixel_margin = stbir_info.horizontal_filter_pixel_margin;
1543     int max_x = input_w + filter_pixel_margin * 2;
1544 
1545     assert(!stbir__use_width_upsampling(stbir_info));
1546 
1547     switch (channels) {
1548         case 1:
1549             for (x = 0; x < max_x; x++)
1550             {
1551                 int n0 = horizontal_contributors[x].n0;
1552                 int n1 = horizontal_contributors[x].n1;
1553 
1554                 int in_x = x - filter_pixel_margin;
1555                 int in_pixel_index = in_x * 1;
1556                 int max_n = n1;
1557                 int coefficient_group = coefficient_width * x;
1558 
1559                 for (k = n0; k <= max_n; k++)
1560                 {
1561                     int out_pixel_index = k * 1;
1562                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1563                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1564                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1565                 }
1566             }
1567             break;
1568 
1569         case 2:
1570             for (x = 0; x < max_x; x++)
1571             {
1572                 int n0 = horizontal_contributors[x].n0;
1573                 int n1 = horizontal_contributors[x].n1;
1574 
1575                 int in_x = x - filter_pixel_margin;
1576                 int in_pixel_index = in_x * 2;
1577                 int max_n = n1;
1578                 int coefficient_group = coefficient_width * x;
1579 
1580                 for (k = n0; k <= max_n; k++)
1581                 {
1582                     int out_pixel_index = k * 2;
1583                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1584                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1585                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1586                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1587                 }
1588             }
1589             break;
1590 
1591         case 3:
1592             for (x = 0; x < max_x; x++)
1593             {
1594                 int n0 = horizontal_contributors[x].n0;
1595                 int n1 = horizontal_contributors[x].n1;
1596 
1597                 int in_x = x - filter_pixel_margin;
1598                 int in_pixel_index = in_x * 3;
1599                 int max_n = n1;
1600                 int coefficient_group = coefficient_width * x;
1601 
1602                 for (k = n0; k <= max_n; k++)
1603                 {
1604                     int out_pixel_index = k * 3;
1605                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1606                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1607                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1608                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1609                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1610                 }
1611             }
1612             break;
1613 
1614         case 4:
1615             for (x = 0; x < max_x; x++)
1616             {
1617                 int n0 = horizontal_contributors[x].n0;
1618                 int n1 = horizontal_contributors[x].n1;
1619 
1620                 int in_x = x - filter_pixel_margin;
1621                 int in_pixel_index = in_x * 4;
1622                 int max_n = n1;
1623                 int coefficient_group = coefficient_width * x;
1624 
1625                 for (k = n0; k <= max_n; k++)
1626                 {
1627                     int out_pixel_index = k * 4;
1628                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1629                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1630 
1631                     version(DigitalMars)
1632                     {
1633                         output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1634                         output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1635                         output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1636                         output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1637                     }
1638                     else
1639                     {
1640                         __m128 A = _mm_loadu_ps(&decode_buffer[in_pixel_index]);
1641                         __m128 B = _mm_loadu_ps(&output_buffer[out_pixel_index]);
1642                         B = B + A * _mm_set1_ps(coefficient);
1643                         _mm_storeu_ps(&output_buffer[out_pixel_index], B);
1644                     }
1645                 }
1646             }
1647             break;
1648 
1649         default:
1650             for (x = 0; x < max_x; x++)
1651             {
1652                 int n0 = horizontal_contributors[x].n0;
1653                 int n1 = horizontal_contributors[x].n1;
1654 
1655                 int in_x = x - filter_pixel_margin;
1656                 int in_pixel_index = in_x * channels;
1657                 int max_n = n1;
1658                 int coefficient_group = coefficient_width * x;
1659 
1660                 for (k = n0; k <= max_n; k++)
1661                 {
1662                     int c;
1663                     int out_pixel_index = k * channels;
1664                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1665                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1666                     for (c = 0; c < channels; c++)
1667                         output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1668                 }
1669             }
1670             break;
1671     }
1672 }
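
// Illustrative sketch, not part of the original port: when downsampling, the
// loop above walks the input pixels and scatters each one into the output
// pixels n0..n1 it contributes to, so the output buffer must start zeroed.
// `sketch_scatter_1ch` and its parameters are hypothetical names used only
// for this example; it assumes a single channel and no filter margin.
private void sketch_scatter_1ch(const(float)[] input, const(int)[] n0, const(int)[] n1,
                                const(float)[] coefficients, int coefficient_width,
                                float[] output) pure nothrow @nogc @safe
{
    foreach (x; 0 .. input.length)
        for (int k = n0[x]; k <= n1[x]; k++)
            output[k] += input[x] * coefficients[coefficient_width * x + (k - n0[x])];
}

unittest
{
    // 4 -> 2 box downsample: every input pixel feeds one output pixel at weight 0.5.
    float[4] input = [2.0f, 4.0f, 6.0f, 8.0f];
    int[4] n0 = [0, 0, 1, 1];
    int[4] n1 = [0, 0, 1, 1];
    float[4] coefficients = [0.5f, 0.5f, 0.5f, 0.5f];
    float[2] output = [0.0f, 0.0f];
    sketch_scatter_1ch(input[], n0[], n1[], coefficients[], 1, output[]);
    assert(output[0] == 3.0f);
    assert(output[1] == 7.0f);
}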
1673 
1674 static void stbir__decode_and_resample_upsample(stbir__info* stbir_info, int n)
1675 {
1676     // Decode the nth scanline from the source image into the decode buffer.
1677     stbir__decode_scanline(stbir_info, n);
1678 
1679     // Now resample it into the ring buffer.
1680     if (stbir__use_width_upsampling(stbir_info))
1681         stbir__resample_horizontal_upsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1682     else
1683         stbir__resample_horizontal_downsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1684 
1685     // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
1686 }
1687 
1688 static void stbir__decode_and_resample_downsample(stbir__info* stbir_info, int n)
1689 {
1690     // Decode the nth scanline from the source image into the decode buffer.
1691     stbir__decode_scanline(stbir_info, n);
1692 
1693     memset(stbir_info.horizontal_buffer, 0, stbir_info.output_w * stbir_info.channels * float.sizeof);
1694 
1695     // Now resample it into the horizontal buffer.
1696     if (stbir__use_width_upsampling(stbir_info))
1697         stbir__resample_horizontal_upsample(stbir_info, stbir_info.horizontal_buffer);
1698     else
1699         stbir__resample_horizontal_downsample(stbir_info, stbir_info.horizontal_buffer);
1700 
1701     // Now it's sitting in the horizontal buffer ready to be distributed into the ring buffers.
1702 }
1703 
1704 // Get the specified scan line from the ring buffer.
1705 static float* stbir__get_ring_buffer_scanline(int get_scanline, float* ring_buffer, int begin_index, int first_scanline, int ring_buffer_num_entries, int ring_buffer_length)
1706 {
1707     int ring_buffer_index = (begin_index + (get_scanline - first_scanline)) % ring_buffer_num_entries;
1708     return stbir__get_ring_buffer_entry(ring_buffer, ring_buffer_index, ring_buffer_length);
1709 }
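
// Illustrative sketch, not part of the original port: the lookup above maps an
// absolute scanline number to a ring-buffer slot by offsetting from the oldest
// stored scanline and wrapping with a modulo. `sketch_ring_slot` is a
// hypothetical name used only for this example, and it assumes
// scanline >= first_scanline.
private int sketch_ring_slot(int scanline, int begin_index, int first_scanline, int num_entries) pure nothrow @nogc @safe
{
    return (begin_index + (scanline - first_scanline)) % num_entries;
}

unittest
{
    // A 4-entry ring whose oldest scanline (10) lives in slot 2.
    assert(sketch_ring_slot(10, 2, 10, 4) == 2);
    assert(sketch_ring_slot(11, 2, 10, 4) == 3);
    assert(sketch_ring_slot(12, 2, 10, 4) == 0); // wraps around
    assert(sketch_ring_slot(13, 2, 10, 4) == 1);
}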
1710 
1711 
1712 static void stbir__encode_scanline(stbir__info* stbir_info, int num_pixels, void* output_buffer, float* encode_buffer, int channels, int alpha_channel, int decode)
1713 {
1714     int x;
1715     int n;
1716     int num_nonalpha;
1717     ushort[STBIR_MAX_CHANNELS] nonalpha;
1718 
1719     if (!(stbir_info.flags&STBIR_FLAG_ALPHA_PREMULTIPLIED))
1720     {
1721         for (x=0; x < num_pixels; ++x)
1722         {
1723             int pixel_index = x*channels;
1724 
1725             float alpha = encode_buffer[pixel_index + alpha_channel];
1726             float reciprocal_alpha = alpha ? 1.0f / alpha : 0;
1727 
1728             // unrolling this produced a 1% slowdown upscaling a large RGBA linear-space image on my machine - stb
1729             for (n = 0; n < channels; n++)
1730                 if (n != alpha_channel)
1731                     encode_buffer[pixel_index + n] *= reciprocal_alpha;
1732 
1733             // We added in a small epsilon to prevent the color channel from being deleted with zero alpha.
1734             // Because we only add it for integer types, it will automatically be discarded on integer
1735             // conversion, so we don't need to subtract it back out (which would be problematic for
1736             // numeric precision reasons).
1737         }
1738     }
1739 
1740     // build a table of all channels that need colorspace correction, so
1741     // we don't perform colorspace correction on channels that don't need it.
1742     for (x = 0, num_nonalpha = 0; x < channels; ++x)
1743     {
1744         if (x != alpha_channel || (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1745         {
1746             nonalpha[num_nonalpha++] = cast(ushort)x;
1747         }
1748     }
1749 
1750     static int STBIR__ROUND_INT_f(float f)
1751     {
1752         return cast(int)(f + 0.5f);
1753     }
1754     static int STBIR__ROUND_INT_d(double f)
1755     {
1756         return cast(int)(f + 0.5);
1757     }
1758     static uint STBIR__ROUND_UINT_f(float f)
1759     {
1760         return cast(uint)(f + 0.5f);
1761     }
1762     static uint STBIR__ROUND_UINT_d(double f)
1763     {
1764         return cast(uint)(f + 0.5);
1765     }
1766 
1767     static ubyte STBIR__ENCODE_LINEAR8(float f)
1768     {
1769         return cast(ubyte) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint8_as_float );
1770     }
1771 
1772     static ushort STBIR__ENCODE_LINEAR16(float f)
1773     {
1774         return cast(ushort) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint16_as_float );
1775     }
1776 
1777     switch (decode)
1778     {
1779         case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1780             for (x=0; x < num_pixels; ++x)
1781             {
1782                 int pixel_index = x*channels;
1783 
1784                 for (n = 0; n < channels; n++)
1785                 {
1786                     int index = pixel_index + n;
1787                     (cast(ubyte*)output_buffer)[index] = STBIR__ENCODE_LINEAR8(encode_buffer[index]);
1788                 }
1789             }
1790             break;
1791 
1792         case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
1793         {
1794             // SIMD special case for 4 channels with no separate alpha channel, because the scalar path is slow in stock stb_image_resize.
1795             if (channels == 4 && alpha_channel == -1 && (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1796             {
1797                 for (x = 0; x < num_pixels; ++x)
1798                 {
1799                     __m128i zero = _mm_setzero_si128();
1800 
1801                     __m128 fpixels = _mm_loadu_ps( &encode_buffer[4*x] );
1802                     __m128i fpixels_desrgb = stbir__linear_to_srgb_uchar(fpixels);
1803                     _mm_storeu_si32( (cast(ubyte*)output_buffer) + 4*x, fpixels_desrgb);
1804                 }
1805             }
1806             else
1807             {
1808                 for (x = 0; x < num_pixels; ++x)
1809                 {
1810                     int pixel_index = x*channels;
1811 
1812                     for (n = 0; n < num_nonalpha; n++)
1813                     {
1814                         int index = pixel_index + nonalpha[n];
1815                         (cast(ubyte*)output_buffer)[index] = stbir__linear_to_srgb_uchar(encode_buffer[index]);
1816                     }
1817 
1818                     if (!(stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1819                         (cast(ubyte*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR8(encode_buffer[pixel_index+alpha_channel]);
1820                 }
1821             }
1822             break;
1823         }
1824 
1825         case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1826             for (x=0; x < num_pixels; ++x)
1827             {
1828                 int pixel_index = x*channels;
1829 
1830                 for (n = 0; n < channels; n++)
1831                 {
1832                     int index = pixel_index + n;
1833                     (cast(ushort*)output_buffer)[index] = STBIR__ENCODE_LINEAR16(encode_buffer[index]);
1834                 }
1835             }
1836             break;
1837 
1838         case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1839             for (x=0; x < num_pixels; ++x)
1840             {
1841                 int pixel_index = x*channels;
1842 
1843                 for (n = 0; n < num_nonalpha; n++)
1844                 {
1845                     int index = pixel_index + nonalpha[n];
1846                     (cast(ushort*)output_buffer)[index] = cast(ushort)STBIR__ROUND_INT_f(stbir__linear_to_srgb(stbir__saturate(encode_buffer[index])) * stbir__max_uint16_as_float);
1847                 }
1848 
1849                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1850                     (cast(ushort*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR16(encode_buffer[pixel_index + alpha_channel]);
1851             }
1852 
1853             break;
1854 
1855         case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1856             for (x=0; x < num_pixels; ++x)
1857             {
1858                 int pixel_index = x*channels;
1859 
1860                 for (n = 0; n < channels; n++)
1861                 {
1862                     int index = pixel_index + n;
1863                     (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[index])) * stbir__max_uint32_as_float);
1864                 }
1865             }
1866             break;
1867 
1868         case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1869             for (x=0; x < num_pixels; ++x)
1870             {
1871                 int pixel_index = x*channels;
1872 
1873                 for (n = 0; n < num_nonalpha; n++)
1874                 {
1875                     int index = pixel_index + nonalpha[n];
1876                     (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__linear_to_srgb(stbir__saturate(encode_buffer[index]))) * stbir__max_uint32_as_float);
1877                 }
1878 
1879                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1880                     (cast(uint*)output_buffer)[pixel_index + alpha_channel] = cast(uint) STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[pixel_index + alpha_channel])) * stbir__max_uint32_as_float);
1881             }
1882             break;
1883 
1884         case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1885             for (x=0; x < num_pixels; ++x)
1886             {
1887                 int pixel_index = x*channels;
1888 
1889                 for (n = 0; n < channels; n++)
1890                 {
1891                     int index = pixel_index + n;
1892                     (cast(float*)output_buffer)[index] = encode_buffer[index];
1893                 }
1894             }
1895             break;
1896 
1897         case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1898             for (x=0; x < num_pixels; ++x)
1899             {
1900                 int pixel_index = x*channels;
1901 
1902                 for (n = 0; n < num_nonalpha; n++)
1903                 {
1904                     int index = pixel_index + nonalpha[n];
1905                     (cast(float*)output_buffer)[index] = stbir__linear_to_srgb(encode_buffer[index]);
1906                 }
1907 
1908                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1909                     (cast(float*)output_buffer)[pixel_index + alpha_channel] = encode_buffer[pixel_index + alpha_channel];
1910             }
1911             break;
1912 
1913         default:
1914             assert(!"Unknown type/colorspace/channels combination.");
1915             break;
1916     }
1917 }
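
// Illustrative sketch, not part of the original port: for one RGBA pixel, the
// un-premultiply step followed by the 8-bit linear encode amounts to the steps
// below. `sketch_encode_rgba8` is a hypothetical name, the alpha channel is
// assumed to be index 3, and clamping is done inline rather than through
// stbir__saturate.
private void sketch_encode_rgba8(ref float[4] px, ref ubyte[4] result) pure nothrow @nogc @safe
{
    float alpha = px[3];
    float reciprocal_alpha = (alpha != 0) ? 1.0f / alpha : 0.0f;
    foreach (c; 0 .. 3)
        px[c] *= reciprocal_alpha; // colour channels only; alpha is left alone

    foreach (c; 0 .. 4)
    {
        float v = px[c];
        if (v < 0) v = 0;
        if (v > 1) v = 1;
        result[c] = cast(ubyte) cast(int)(v * 255.0f + 0.5f); // round to nearest
    }
}

unittest
{
    float[4] px = [0.25f, 0.5f, 0.0f, 0.5f]; // premultiplied by alpha = 0.5
    ubyte[4] enc;
    sketch_encode_rgba8(px, enc);
    assert(enc[0] == 128); // 0.25 / 0.5 = 0.5, then 127.5 + 0.5 truncates to 128
    assert(enc[1] == 255); // 0.5 / 0.5 = 1.0
    assert(enc[2] == 0);
    assert(enc[3] == 128); // alpha itself is not divided
}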
1918 
1919 static void stbir__resample_vertical_upsample(stbir__info* stbir_info, int n)
1920 {
1921     int x, k;
1922     int output_w = stbir_info.output_w;
1923     stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
1924     float* vertical_coefficients = stbir_info.vertical_coefficients;
1925     int channels = stbir_info.channels;
1926     int alpha_channel = stbir_info.alpha_channel;
1927     int type = stbir_info.type;
1928     int colorspace = stbir_info.colorspace;
1929     int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
1930     void* output_data = stbir_info.output_data;
1931     float* encode_buffer = stbir_info.encode_buffer;
1932     int decode = STBIR__DECODE(type, colorspace);
1933     int coefficient_width = stbir_info.vertical_coefficient_width;
1934     int coefficient_counter;
1935     int contributor = n;
1936 
1937     float* ring_buffer = stbir_info.ring_buffer;
1938     int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
1939     int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
1940     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
1941 
1942     int n0,n1, output_row_start;
1943     int coefficient_group = coefficient_width * contributor;
1944 
1945     n0 = vertical_contributors[contributor].n0;
1946     n1 = vertical_contributors[contributor].n1;
1947 
1948     output_row_start = n * stbir_info.output_stride_bytes;
1949 
1950     assert(stbir__use_height_upsampling(stbir_info));
1951 
1952     memset(encode_buffer, 0, output_w * float.sizeof * channels);
1953 
1954     // I tried reblocking this for better cache usage of encode_buffer
1955     // (using x_outer, k, x_inner), but it lost speed. -- stb
1956 
1957     coefficient_counter = 0;
1958     switch (channels) {
1959         case 1:
1960             for (k = n0; k <= n1; k++)
1961             {
1962                 int coefficient_index = coefficient_counter++;
1963                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1964                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1965                 for (x = 0; x < output_w; ++x)
1966                 {
1967                     int in_pixel_index = x * 1;
1968                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1969                 }
1970             }
1971             break;
1972         case 2:
1973             for (k = n0; k <= n1; k++)
1974             {
1975                 int coefficient_index = coefficient_counter++;
1976                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1977                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1978                 for (x = 0; x < output_w; ++x)
1979                 {
1980                     int in_pixel_index = x * 2;
1981                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1982                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
1983                 }
1984             }
1985             break;
1986         case 3:
1987             for (k = n0; k <= n1; k++)
1988             {
1989                 int coefficient_index = coefficient_counter++;
1990                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1991                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1992                 for (x = 0; x < output_w; ++x)
1993                 {
1994                     int in_pixel_index = x * 3;
1995                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1996                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
1997                     encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
1998                 }
1999             }
2000             break;
2001         case 4:
2002             for (k = n0; k <= n1; k++)
2003             {
2004                 int coefficient_index = coefficient_counter++;
2005                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
2006                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
2007                 for (x = 0; x < output_w; ++x)
2008                 {
2009                     int in_pixel_index = x * 4;
2010                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
2011                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
2012                     encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
2013                     encode_buffer[in_pixel_index + 3] += ring_buffer_entry[in_pixel_index + 3] * coefficient;
2014                 }
2015             }
2016             break;
2017         default:
2018             for (k = n0; k <= n1; k++)
2019             {
2020                 int coefficient_index = coefficient_counter++;
2021                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
2022                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
2023                 for (x = 0; x < output_w; ++x)
2024                 {
2025                     int in_pixel_index = x * channels;
2026                     int c;
2027                     for (c = 0; c < channels; c++)
2028                         encode_buffer[in_pixel_index + c] += ring_buffer_entry[in_pixel_index + c] * coefficient;
2029                 }
2030             }
2031             break;
2032     }
2033     stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, encode_buffer, channels, alpha_channel, decode);
2034 }
2035 
2036 static void stbir__resample_vertical_downsample(stbir__info* stbir_info, int n)
2037 {
2038     int x, k;
2039     int output_w = stbir_info.output_w;
2040     stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
2041     float* vertical_coefficients = stbir_info.vertical_coefficients;
2042     int channels = stbir_info.channels;
2043     int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
2044     float* horizontal_buffer = stbir_info.horizontal_buffer;
2045     int coefficient_width = stbir_info.vertical_coefficient_width;
2046     int contributor = n + stbir_info.vertical_filter_pixel_margin;
2047 
2048     float* ring_buffer = stbir_info.ring_buffer;
2049     int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
2050     int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
2051     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
2052     int n0,n1;
2053 
2054     n0 = vertical_contributors[contributor].n0;
2055     n1 = vertical_contributors[contributor].n1;
2056 
2057     assert(!stbir__use_height_upsampling(stbir_info));
2058 
2059     for (k = n0; k <= n1; k++)
2060     {
2061         int coefficient_index = k - n0;
2062         int coefficient_group = coefficient_width * contributor;
2063         float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
2064 
2065         float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
2066 
2067         switch (channels) {
2068             case 1:
2069                 for (x = 0; x < output_w; x++)
2070                 {
2071                     int in_pixel_index = x * 1;
2072                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
2073                 }
2074                 break;
2075             case 2:
2076                 for (x = 0; x < output_w; x++)
2077                 {
2078                     int in_pixel_index = x * 2;
2079                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
2080                     ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
2081                 }
2082                 break;
2083             case 3:
2084                 for (x = 0; x < output_w; x++)
2085                 {
2086                     int in_pixel_index = x * 3;
2087                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
2088                     ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
2089                     ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
2090                 }
2091                 break;
2092             case 4:
2093 
2094                 __m128 vCoefficients = _mm_set1_ps(coefficient);
2095 
2096                 for (x = 0; x < output_w; x++)
2097                 {
2098                     int in_pixel_index = x * 4;
2099                     __m128 A = _mm_loadu_ps(&horizontal_buffer[in_pixel_index]);
2100                     __m128 B = _mm_loadu_ps(&ring_buffer_entry[in_pixel_index]);
2101                     _mm_storeu_ps( &ring_buffer_entry[in_pixel_index], B + A * vCoefficients);
2102                 }
2103                 break;
2104             default:
2105                 for (x = 0; x < output_w; x++)
2106                 {
2107                     int in_pixel_index = x * channels;
2108 
2109                     int c;
2110                     for (c = 0; c < channels; c++)
2111                         ring_buffer_entry[in_pixel_index + c] += horizontal_buffer[in_pixel_index + c] * coefficient;
2112                 }
2113                 break;
2114         }
2115     }
2116 }
2117 
2118 static void stbir__buffer_loop_upsample(stbir__info* stbir_info)
2119 {
2120     int y;
2121     float scale_ratio = stbir_info.vertical_scale;
    float out_scanlines_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(1/scale_ratio) * scale_ratio;

    assert(stbir__use_height_upsampling(stbir_info));

    for (y = 0; y < stbir_info.output_h; y++)
    {
        float in_center_of_out = 0; // Center of the current out scanline in the in scanline space
        int in_first_scanline = 0, in_last_scanline = 0;

        stbir__calculate_sample_range_upsample(y, out_scanlines_radius, scale_ratio, stbir_info.vertical_shift, &in_first_scanline, &in_last_scanline, &in_center_of_out);

        assert(in_last_scanline - in_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);

        if (stbir_info.ring_buffer_begin_index >= 0)
        {
            // Get rid of whatever we don't need anymore.
            while (in_first_scanline > stbir_info.ring_buffer_first_scanline)
            {
                if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
                {
                    // We just popped the last scanline off the ring buffer.
                    // Reset it to the empty state.
                    stbir_info.ring_buffer_begin_index = -1;
                    stbir_info.ring_buffer_first_scanline = 0;
                    stbir_info.ring_buffer_last_scanline = 0;
                    break;
                }
                else
                {
                    stbir_info.ring_buffer_first_scanline++;
                    stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
                }
            }
        }

        // Load in new ones.
        if (stbir_info.ring_buffer_begin_index < 0)
            stbir__decode_and_resample_upsample(stbir_info, in_first_scanline);

        while (in_last_scanline > stbir_info.ring_buffer_last_scanline)
            stbir__decode_and_resample_upsample(stbir_info, stbir_info.ring_buffer_last_scanline + 1);

        // Now all buffers should be ready to write a row of vertical sampling.
        stbir__resample_vertical_upsample(stbir_info, y);
    }
}

// Encodes to the output, then pops, every ring buffer scanline that precedes
// first_necessary_scanline.
static void stbir__empty_ring_buffer(stbir__info* stbir_info, int first_necessary_scanline)
{
    int output_stride_bytes = stbir_info.output_stride_bytes;
    int channels = stbir_info.channels;
    int alpha_channel = stbir_info.alpha_channel;
    int type = stbir_info.type;
    int colorspace = stbir_info.colorspace;
    int output_w = stbir_info.output_w;
    void* output_data = stbir_info.output_data;
    int decode = STBIR__DECODE(type, colorspace);

    float* ring_buffer = stbir_info.ring_buffer;
    int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);

    if (stbir_info.ring_buffer_begin_index >= 0)
    {
        // Get rid of whatever we don't need anymore.
        while (first_necessary_scanline > stbir_info.ring_buffer_first_scanline)
        {
            // Only scanlines that fall inside the output image are written out.
            if (stbir_info.ring_buffer_first_scanline >= 0 && stbir_info.ring_buffer_first_scanline < stbir_info.output_h)
            {
                int output_row_start = stbir_info.ring_buffer_first_scanline * output_stride_bytes;
                float* ring_buffer_entry = stbir__get_ring_buffer_entry(ring_buffer, stbir_info.ring_buffer_begin_index, ring_buffer_length);
                stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, ring_buffer_entry, channels, alpha_channel, decode);
            }

            if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
            {
                // We just popped the last scanline off the ring buffer.
                // Reset it to the empty state.
                stbir_info.ring_buffer_begin_index = -1;
                stbir_info.ring_buffer_first_scanline = 0;
                stbir_info.ring_buffer_last_scanline = 0;
                break;
            }
            else
            {
                stbir_info.ring_buffer_first_scanline++;
                stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
            }
        }
    }
}

static void stbir__buffer_loop_downsample(stbir__info* stbir_info)
{
    int y;
    float scale_ratio = stbir_info.vertical_scale;
    int output_h = stbir_info.output_h;
    float in_pixels_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(scale_ratio) / scale_ratio;
    int pixel_margin = stbir_info.vertical_filter_pixel_margin;
    int max_y = stbir_info.input_h + pixel_margin;

    assert(!stbir__use_height_upsampling(stbir_info));

    // y walks the input scanlines, including the filter margin above and below the image.
    for (y = -pixel_margin; y < max_y; y++)
    {
        float out_center_of_in; // Center of the current in scanline in the out scanline space
        int out_first_scanline, out_last_scanline;

        stbir__calculate_sample_range_downsample(y, in_pixels_radius, scale_ratio, stbir_info.vertical_shift, &out_first_scanline, &out_last_scanline, &out_center_of_in);

        assert(out_last_scanline - out_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);

        // Skip input scanlines that contribute to no output scanline.
        if (out_last_scanline < 0 || out_first_scanline >= output_h)
            continue;

        stbir__empty_ring_buffer(stbir_info, out_first_scanline);

        stbir__decode_and_resample_downsample(stbir_info, y);

        // Load in new ones.
        if (stbir_info.ring_buffer_begin_index < 0)
            stbir__add_empty_ring_buffer_entry(stbir_info, out_first_scanline);

        while (out_last_scanline > stbir_info.ring_buffer_last_scanline)
            stbir__add_empty_ring_buffer_entry(stbir_info, stbir_info.ring_buffer_last_scanline + 1);

        // Now the horizontal buffer is ready to write to all ring buffer rows.
        stbir__resample_vertical_downsample(stbir_info, y);
    }

    // Flush whatever is still in the ring buffer to the output.
    stbir__empty_ring_buffer(stbir_info, stbir_info.output_h);
}

static void stbir__setup(stbir__info *info, int input_w, int input_h, int output_w, int output_h, int channels)
{
    info.input_w = input_w;
    info.input_h = input_h;
    info.output_w = output_w;
    info.output_h = output_h;
    info.channels = channels;
}

// Stores the source region (s0,t0)-(s1,t1) and derives the scale and shift for
// each axis. A non-null transform supplies them directly as
// [x_scale, y_scale, x_offset, y_offset].
static void stbir__calculate_transform(stbir__info *info, float s0, float t0, float s1, float t1, float *transform)
{
    info.s0 = s0;
    info.t0 = t0;
    info.s1 = s1;
    info.t1 = t1;

    if (transform)
    {
        info.horizontal_scale = transform[0];
        info.vertical_scale   = transform[1];
        info.horizontal_shift = transform[2];
        info.vertical_shift   = transform[3];
    }
    else
    {
        info.horizontal_scale = (cast(float)info.output_w / info.input_w) / (s1 - s0);
        info.vertical_scale = (cast(float)info.output_h / info.input_h) / (t1 - t0);

        info.horizontal_shift = s0 * info.output_w / (s1 - s0);
        info.vertical_shift = t0 * info.output_h / (t1 - t0);
    }
}
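
// Illustrative sketch, not part of the original stb_image_resize source: a
// unittest showing the values stbir__calculate_transform stores. It assumes
// stbir__info can be default-initialized and that its scale and shift fields
// are floats, as the assignments above suggest.
unittest
{
    stbir__info info;

    // Whole image (s0,t0,s1,t1 = 0,0,1,1), 4x4 -> 8x8: scale 2, no shift.
    stbir__setup(&info, 4, 4, 8, 8, 1);
    stbir__calculate_transform(&info, 0, 0, 1, 1, null);
    assert(info.horizontal_scale == 2.0f && info.vertical_scale == 2.0f);
    assert(info.horizontal_shift == 0.0f && info.vertical_shift == 0.0f);

    // Central half of the input (0.25..0.75) mapped onto a 4x4 output.
    stbir__setup(&info, 4, 4, 4, 4, 1);
    stbir__calculate_transform(&info, 0.25f, 0.25f, 0.75f, 0.75f, null);
    assert(info.horizontal_scale == 2.0f && info.horizontal_shift == 2.0f);

    // An explicit transform overrides the region-derived values.
    float[4] t = [3.0f, 5.0f, 0.5f, -0.5f];
    stbir__calculate_transform(&info, 0, 0, 1, 1, t.ptr);
    assert(info.horizontal_scale == 3.0f && info.vertical_scale == 5.0f);
    assert(info.horizontal_shift == 0.5f && info.vertical_shift == -0.5f);
}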

static void stbir__choose_filter(stbir__info *info, stbir_filter h_filter, stbir_filter v_filter)
{
    if (h_filter == 0)
        h_filter = stbir__use_upsampling(info.horizontal_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
    if (v_filter == 0)
        v_filter = stbir__use_upsampling(info.vertical_scale)   ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
    info.horizontal_filter = h_filter;
    info.vertical_filter = v_filter;
}
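
// Illustrative sketch, not part of the original source: with
// STBIR_FILTER_DEFAULT, each axis picks its filter from its own scale. This
// assumes STBIR_FILTER_DEFAULT is 0 and that stbir__use_upsampling treats a
// scale above 1 as upsampling, as in the C original.
unittest
{
    stbir__info info;
    info.horizontal_scale = 2.0f; // enlarging horizontally
    info.vertical_scale = 0.5f;   // shrinking vertically
    stbir__choose_filter(&info, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT);
    assert(info.horizontal_filter == STBIR_DEFAULT_FILTER_UPSAMPLE);
    assert(info.vertical_filter == STBIR_DEFAULT_FILTER_DOWNSAMPLE);
}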

// Records the size in bytes of every temporary buffer on info and returns
// their sum, which is the amount of scratch memory one resize needs.
static uint stbir__calculate_memory(stbir__info *info)
{
    int pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
    int filter_height = stbir__get_filter_pixel_width(info.vertical_filter, info.vertical_scale);

    info.horizontal_num_contributors = stbir__get_contributors(info.horizontal_scale, info.horizontal_filter, info.input_w, info.output_w);
    info.vertical_num_contributors   = stbir__get_contributors(info.vertical_scale  , info.vertical_filter  , info.input_h, info.output_h);

    // One extra entry because floating point precision problems sometimes make an additional entry necessary.
    info.ring_buffer_num_entries = filter_height + 1;

    info.horizontal_contributors_size = info.horizontal_num_contributors                  * cast(int)(stbir__contributors.sizeof);
    info.horizontal_coefficients_size = stbir__get_total_horizontal_coefficients(info)    * cast(int)(float.sizeof);
    info.vertical_contributors_size   = info.vertical_num_contributors                    * cast(int)(stbir__contributors.sizeof);
    info.vertical_coefficients_size   = stbir__get_total_vertical_coefficients(info)      * cast(int)(float.sizeof);
    info.decode_buffer_size           = (info.input_w + pixel_margin * 2) * info.channels * cast(int)(float.sizeof);
    info.horizontal_buffer_size       = info.output_w * info.channels                     * cast(int)(float.sizeof);
    info.ring_buffer_size             = info.output_w * info.channels                     * info.ring_buffer_num_entries * cast(int)(float.sizeof);
    info.encode_buffer_size           = info.output_w * info.channels                     * cast(int)(float.sizeof);

    assert(info.horizontal_filter != 0);
    assert(info.horizontal_filter < stbir__filter_info_table.length); // this now happens too late
    assert(info.vertical_filter != 0);
    assert(info.vertical_filter < stbir__filter_info_table.length); // this now happens too late

    if (stbir__use_height_upsampling(info))
        // The horizontal buffer is for when we're downsampling the height and we
        // can't output the result of sampling the decode buffer directly into the
        // ring buffers.
        info.horizontal_buffer_size = 0;
    else
        // The encode buffer is to retain precision in the height upsampling method
        // and isn't used when height downsampling.
        info.encode_buffer_size = 0;

    return info.horizontal_contributors_size + info.horizontal_coefficients_size
        + info.vertical_contributors_size + info.vertical_coefficients_size
        + info.decode_buffer_size + info.horizontal_buffer_size
        + info.ring_buffer_size + info.encode_buffer_size;
}

// Validates the parameters, carves tempmem into the working buffers, and runs
// the resize. Returns 1 on success and 0 if a parameter is rejected.
static int stbir__resize_allocated(stbir__info *info,
    const void* input_data, int input_stride_in_bytes,
    void* output_data, int output_stride_in_bytes,
    int alpha_channel, uint flags, stbir_datatype type,
    stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace,
    void* tempmem, size_t tempmem_size_in_bytes)
{
    size_t memory_required = stbir__calculate_memory(info);

    // A stride of 0 means the rows are tightly packed.
    int width_stride_input = input_stride_in_bytes ? input_stride_in_bytes : info.channels * info.input_w * stbir__type_size[type];
    int width_stride_output = output_stride_in_bytes ? output_stride_in_bytes : info.channels * info.output_w * stbir__type_size[type];

    assert(info.channels >= 0);
    assert(info.channels <= STBIR_MAX_CHANNELS);

    if (info.channels < 0 || info.channels > STBIR_MAX_CHANNELS)
        return 0;

    assert(info.horizontal_filter < stbir__filter_info_table.length);
    assert(info.vertical_filter < stbir__filter_info_table.length);

    if (info.horizontal_filter >= stbir__filter_info_table.length)
        return 0;
    if (info.vertical_filter >= stbir__filter_info_table.length)
        return 0;

    // A negative alpha_channel means there is no alpha channel to treat specially.
    if (alpha_channel < 0)
        flags |= STBIR_FLAG_ALPHA_USES_COLORSPACE | STBIR_FLAG_ALPHA_PREMULTIPLIED;

    if (!(flags&STBIR_FLAG_ALPHA_USES_COLORSPACE) || !(flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)) {
        assert(alpha_channel >= 0 && alpha_channel < info.channels);
    }

    if (alpha_channel >= info.channels)
        return 0;

    assert(tempmem);

    if (!tempmem)
        return 0;

    assert(tempmem_size_in_bytes >= memory_required);

    if (tempmem_size_in_bytes < memory_required)
        return 0;

    memset(tempmem, 0, tempmem_size_in_bytes);

    info.input_data = input_data;
    info.input_stride_bytes = width_stride_input;

    info.output_data = output_data;
    info.output_stride_bytes = width_stride_output;

    info.alpha_channel = alpha_channel;
    info.flags = flags;
    info.type = type;
    info.edge_horizontal = edge_horizontal;
    info.edge_vertical = edge_vertical;
    info.colorspace = colorspace;

    info.horizontal_coefficient_width   = stbir__get_coefficient_width  (info.horizontal_filter, info.horizontal_scale);
    info.vertical_coefficient_width     = stbir__get_coefficient_width  (info.vertical_filter  , info.vertical_scale  );
    info.horizontal_filter_pixel_width  = stbir__get_filter_pixel_width (info.horizontal_filter, info.horizontal_scale);
    info.vertical_filter_pixel_width    = stbir__get_filter_pixel_width (info.vertical_filter  , info.vertical_scale  );
    info.horizontal_filter_pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
    info.vertical_filter_pixel_margin   = stbir__get_filter_pixel_margin(info.vertical_filter  , info.vertical_scale  );

    info.ring_buffer_length_bytes = info.output_w * info.channels * cast(int)(float.sizeof);
    info.decode_buffer_pixels = info.input_w + info.horizontal_filter_pixel_margin * 2;

    // Returns the address current_size bytes past current, typed as newtype*.
    static newtype* STBIR__NEXT_MEMPTR(newtype)(void* current, size_t current_size)
    {
        return cast(newtype*)( (cast(ubyte*)current) + current_size );
    }

    // Lay the buffers out back to back inside tempmem.
    info.horizontal_contributors = cast(stbir__contributors *) tempmem;
    info.horizontal_coefficients = STBIR__NEXT_MEMPTR!float              (info.horizontal_contributors, info.horizontal_contributors_size);
    info.vertical_contributors   = STBIR__NEXT_MEMPTR!stbir__contributors(info.horizontal_coefficients, info.horizontal_coefficients_size);
    info.vertical_coefficients   = STBIR__NEXT_MEMPTR!float              (info.vertical_contributors,   info.vertical_contributors_size);
    info.decode_buffer           = STBIR__NEXT_MEMPTR!float              (info.vertical_coefficients,   info.vertical_coefficients_size);

    if (stbir__use_height_upsampling(info))
    {
        info.horizontal_buffer   = null;
        info.ring_buffer         = STBIR__NEXT_MEMPTR!float              (info.decode_buffer,           info.decode_buffer_size);
        info.encode_buffer       = STBIR__NEXT_MEMPTR!float              (info.ring_buffer,             info.ring_buffer_size);

        assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.encode_buffer, info.encode_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
    }
    else
    {
        info.horizontal_buffer   = STBIR__NEXT_MEMPTR!float              (info.decode_buffer,           info.decode_buffer_size);
        info.ring_buffer         = STBIR__NEXT_MEMPTR!float              (info.horizontal_buffer,       info.horizontal_buffer_size);
        info.encode_buffer = null;

        assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.ring_buffer, info.ring_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
    }

    // This signals that the ring buffer is empty
    info.ring_buffer_begin_index = -1;

    stbir__calculate_filters(info.horizontal_contributors, info.horizontal_coefficients, info.horizontal_filter, info.horizontal_scale, info.horizontal_shift, info.input_w, info.output_w);
    stbir__calculate_filters(info.vertical_contributors, info.vertical_coefficients, info.vertical_filter, info.vertical_scale, info.vertical_shift, info.input_h, info.output_h);

    if (stbir__use_height_upsampling(info))
        stbir__buffer_loop_upsample(info);
    else
        stbir__buffer_loop_downsample(info);

    return 1;
}


// Common entry point behind every public resize function: sets up the info
// struct, makes the single temporary allocation, resizes, and frees it.
static int stbir__resize_arbitrary(
    void *alloc_context,
    const void* input_data, int input_w, int input_h, int input_stride_in_bytes,
    void* output_data, int output_w, int output_h, int output_stride_in_bytes,
    float s0, float t0, float s1, float t1, float *transform,
    int channels, int alpha_channel, uint flags, stbir_datatype type,
    stbir_filter h_filter, stbir_filter v_filter,
    stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace)
{
    stbir__info info;
    int result;
    size_t memory_required;
    void* extra_memory;

    stbir__setup(&info, input_w, input_h, output_w, output_h, channels);
    stbir__calculate_transform(&info, s0,t0,s1,t1,transform);
    stbir__choose_filter(&info, h_filter, v_filter);
    memory_required = stbir__calculate_memory(&info);
    extra_memory = STBIR_MALLOC(memory_required, alloc_context);

    if (!extra_memory)
        return 0;

    result = stbir__resize_allocated(&info, input_data, input_stride_in_bytes,
                                            output_data, output_stride_in_bytes,
                                            alpha_channel, flags, type,
                                            edge_horizontal, edge_vertical,
                                            colorspace, extra_memory, memory_required);

    STBIR_FREE(extra_memory, alloc_context);

    return result;
}



int stbir_resize_uint8_srgb_edgemode(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                                     ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                                     int num_channels, int alpha_channel, int flags,
                                     stbir_edge edge_wrap_mode)
{
    return stbir__resize_arbitrary(null, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
        edge_wrap_mode, edge_wrap_mode, STBIR_COLORSPACE_SRGB);
}
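
// Illustrative usage sketch, not part of the original source: enlarge a 2x2
// single-channel image to 4x4. Passing 0 for both strides means the rows are
// tightly packed, and a negative alpha_channel means no channel is treated as
// alpha (see stbir__resize_allocated). STBIR_EDGE_CLAMP is assumed to be
// visible here under that name, as in the QUICKSTART at the top of the file.
unittest
{
    ubyte[2 * 2] input = [0, 255, 255, 0];
    ubyte[4 * 4] output;
    int ok = stbir_resize_uint8_srgb_edgemode(input.ptr, 2, 2, 0,
                                              output.ptr, 4, 4, 0,
                                              1, -1, 0, STBIR_EDGE_CLAMP);
    // The function returns 1 on success and 0 when an argument is rejected
    // or the temporary allocation fails.
    assert(ok == 1);
}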

int stbir_resize_uint8_generic(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                               ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                               int num_channels, int alpha_channel, int flags,
                               stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
                               void *alloc_context)
{
    return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
        edge_wrap_mode, edge_wrap_mode, space);
}

int stbir_resize_uint16_generic(const ushort *input_pixels  , int input_w , int input_h , int input_stride_in_bytes,
                                ushort *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
                                int num_channels, int alpha_channel, int flags,
                                stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
                                void *alloc_context)
{
    return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT16, filter, filter,
        edge_wrap_mode, edge_wrap_mode, space);
}


int stbir_resize(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                 void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                 stbir_datatype datatype,
                 int num_channels, int alpha_channel, int flags,
                 stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
                 stbir_filter filter_horizontal,  stbir_filter filter_vertical,
                 stbir_colorspace space, void *alloc_context)
{
    return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        0,0,1,1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
        edge_mode_horizontal, edge_mode_vertical, space);
}


int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                          stbir_datatype datatype,
                          int num_channels, int alpha_channel, int flags,
                          stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
                          stbir_filter filter_horizontal,  stbir_filter filter_vertical,
                          stbir_colorspace space, void *alloc_context,
                          float x_scale, float y_scale,
                          float x_offset, float y_offset)
{
    // Packed in the order stbir__calculate_transform reads it:
    // [x_scale, y_scale, x_offset, y_offset].
    float[4] transform;
    transform[0] = x_scale;
    transform[1] = y_scale;
    transform[2] = x_offset;
    transform[3] = y_offset;
    return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        0,0,1,1,transform.ptr,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
        edge_mode_horizontal, edge_mode_vertical, space);
}

int stbir_resize_region(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                        void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                        stbir_datatype datatype,
                        int num_channels, int alpha_channel, int flags,
                        stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
                        stbir_filter filter_horizontal,  stbir_filter filter_vertical,
                        stbir_colorspace space, void *alloc_context,
                        float s0, float t0, float s1, float t1)
{
    return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        s0,t0,s1,t1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
        edge_mode_horizontal, edge_mode_vertical, space);
}

/*
------------------------------------------------------------------------------
This software is available under 2 licenses -- choose whichever you prefer.
------------------------------------------------------------------------------
ALTERNATIVE A - MIT License
Copyright (c) 2017 Sean Barrett
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
------------------------------------------------------------------------------
ALTERNATIVE B - Public Domain (www.unlicense.org)
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.
In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
------------------------------------------------------------------------------
*/