1 /* stb_image_resize - v0.96 - public domain image resizing
2    by Jorge L Rodriguez (@VinoBS) - 2014
3    http://github.com/nothings/stb
4 
5    Written with emphasis on usability, portability, and efficiency. (No
   SIMD or threads, so it can be easily outperformed by libs that use those.)
   Only scaling and translation are supported, no rotations or shears.
8    Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation.
9 
10    QUICKSTART
11       stbir_resize_uint8(      input_pixels , in_w , in_h , 0,
12                                output_pixels, out_w, out_h, 0, num_channels)
13 
14       stbir_resize_uint8_srgb( input_pixels , in_w , in_h , 0,
15                                output_pixels, out_w, out_h, 0,
16                                num_channels , alpha_chan  , 0)
17       stbir_resize_uint8_srgb_edgemode(
18                                input_pixels , in_w , in_h , 0,
19                                output_pixels, out_w, out_h, 0,
20                                num_channels , alpha_chan  , 0, STBIR_EDGE_CLAMP)
21                                                             // WRAP/REFLECT/ZERO
22 
23    FULL API
24       See the "header file" section of the source for API documentation.
25 
26    ADDITIONAL DOCUMENTATION
27 
28       SRGB & FLOATING POINT REPRESENTATION
29          The sRGB functions presume IEEE floating point. If you do not have
30          IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
31          a slower implementation.
32 
33       MEMORY ALLOCATION
34          The resize functions here perform a single memory allocation using
35          malloc. To control the memory allocation, before the #include that
36          triggers the implementation, do:
37 
38             #define STBIR_MALLOC(size,context) ...
39             #define STBIR_FREE(ptr,context)   ...
40 
41          Each resize function makes exactly one call to malloc/free, so to use
42          temp memory, store the temp memory in the context and return that.
43 
44       DEFAULT FILTERS
45          For functions which don't provide explicit control over what filters
46          to use, you can change the compile-time defaults with
47 
48             #define STBIR_DEFAULT_FILTER_UPSAMPLE     STBIR_FILTER_something
49             #define STBIR_DEFAULT_FILTER_DOWNSAMPLE   STBIR_FILTER_something
50 
51          See stbir_filter in the header-file section for the list of filters.
52 
53       NEW FILTERS
54          A number of 1D filter kernels are used. For a list of
55          supported filters see the stbir_filter enum. To add a new filter,
56          write a filter function and add it to stbir__filter_info_table.
57 
58       MAX CHANNELS
         STBIR_MAX_CHANNELS is set to 4 in this port. If your image has more
         channels than that, raise STBIR_MAX_CHANNELS to the max you'll have.
61 
62       ALPHA CHANNEL
63          Most of the resizing functions provide the ability to control how
64          the alpha channel of an image is processed. The important things
65          to know about this:
66 
67          1. The best mathematically-behaved version of alpha to use is
68          called "premultiplied alpha", in which the other color channels
69          have had the alpha value multiplied in. If you use premultiplied
70          alpha, linear filtering (such as image resampling done by this
71          library, or performed in texture units on GPUs) does the "right
72          thing". While premultiplied alpha is standard in the movie CGI
73          industry, it is still uncommon in the videogame/real-time world.
74 
75          If you linearly filter non-premultiplied alpha, strange effects
76          occur. (For example, the 50/50 average of 99% transparent bright green
77          and 1% transparent black produces 50% transparent dark green when
78          non-premultiplied, whereas premultiplied it produces 50%
79          transparent near-black. The former introduces green energy
80          that doesn't exist in the source image.)
81 
82          2. Artists should not edit premultiplied-alpha images; artists
83          want non-premultiplied alpha images. Thus, art tools generally output
84          non-premultiplied alpha images.
85 
86          3. You will get best results in most cases by converting images
87          to premultiplied alpha before processing them mathematically.
88 
89          4. If you pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED, the
90          resizer does not do anything special for the alpha channel;
91          it is resampled identically to other channels. This produces
92          the correct results for premultiplied-alpha images, but produces
93          less-than-ideal results for non-premultiplied-alpha images.
94 
95          5. If you do not pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED,
96          then the resizer weights the contribution of input pixels
97          based on their alpha values, or, equivalently, it multiplies
98          the alpha value into the color channels, resamples, then divides
99          by the resultant alpha value. Input pixels which have alpha=0 do
100          not contribute at all to output pixels unless _all_ of the input
101          pixels affecting that output pixel have alpha=0, in which case
102          the result for that pixel is the same as it would be without
103          STBIR_FLAG_ALPHA_PREMULTIPLIED. However, this is only true for
104          input images in integer formats. For input images in float format,
105          input pixels with alpha=0 have no effect, and output pixels
106          which have alpha=0 will be 0 in all channels. (For float images,
107          you can manually achieve the same result by adding a tiny epsilon
108          value to the alpha channel of every image, and then subtracting
109          or clamping it at the end.)
110 
111          6. You can suppress the behavior described in #5 and make
112          all-0-alpha pixels have 0 in all channels by #defining
113          STBIR_NO_ALPHA_EPSILON.
114 
115          7. You can separately control whether the alpha channel is
116          interpreted as linear or affected by the colorspace. By default
117          it is linear; you almost never want to apply the colorspace.
118          (For example, graphics hardware does not apply sRGB conversion
119          to the alpha channel.)
120 
121    CONTRIBUTORS
122       Jorge L Rodriguez: Implementation
123       Sean Barrett: API design, optimizations
124       Aras Pranckevicius: bugfix
125       Nathan Reed: warning fixes
126 
127    REVISIONS
128       0.97 (2020-02-02) fixed warning
129       0.96 (2019-03-04) fixed warnings
130       0.95 (2017-07-23) fixed warnings
131       0.94 (2017-03-18) fixed warnings
132       0.93 (2017-03-03) fixed bug with certain combinations of heights
133       0.92 (2017-01-02) fix integer overflow on large (>2GB) images
134       0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
135       0.90 (2014-09-17) first released version
136 
137    LICENSE
138      See end of file for license information.
139 
140    TODO
141       Don't decode all of the image data when only processing a partial tile
142       Don't use full-width decode buffers when only processing a partial tile
143       When processing wide images, break processing into tiles so data fits in L1 cache
144       Installable filters?
145       Resize that respects alpha test coverage
146          (Reference code: FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage:
147          https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp )
148 */
149 /**
Resizer ported to D from C. Removed a few features that didn't make sense in Dplug.
Added Ryhor Spivak's work on the Lanczos filter, plus a few more Lanczos kernels.
152 Copyright: (c) Guillaume Piolat (2021)
153 */
154 module dplug.graphics.stb_image_resize;
155 
156 
157 enum DPLUG_USE_STB_IMAGE_RESIZE_V2 = true;
158 
159 
160 static if (!DPLUG_USE_STB_IMAGE_RESIZE_V2)
161 {
162  
163 
164 import core.stdc.stdlib: malloc, free;
165 import core.stdc.string: memset;
166 
167 import inteli.smmintrin;
168 import inteli.math;
169 
170 import dplug.core.math : fast_fabs, fast_pow, fast_ceil, fast_floor, fast_sin;
171 import dplug.core.vec;
172 
173 
174 nothrow:
175 @nogc:
176 
177 
178 //////////////////////////////////////////////////////////////////////////////
179 //
180 // Easy-to-use API:
181 //
182 //     * "input pixels" points to an array of image data with 'num_channels' channels (e.g. RGB=3, RGBA=4)
183 //     * input_w is input image width (x-axis), input_h is input image height (y-axis)
//     * stride is the offset between successive rows of image data in memory, in bytes. You can
//       specify 0 to mean packed contiguously in memory
186 //     * alpha channel is treated identically to other channels.
187 //     * colorspace is linear or sRGB as specified by function name
188 //     * returned result is 1 for success or 0 in case of an error.
189 //       #define assert() to trigger an assert on parameter validation errors.
190 //     * Memory required grows approximately linearly with input and output size, but with
191 //       discontinuities at input_w == output_w and input_h == output_h.
//     * Passing STBIR_FILTER_DEFAULT as the filter selects a "default" resampling filter defined at
//       compile time. To change those defaults, edit the STBIR_DEFAULT_FILTER_UPSAMPLE and
//       STBIR_DEFAULT_FILTER_DOWNSAMPLE enums, or pass an explicit stbir_filter value.
195 
196 int stbir_resize_uint8(const(ubyte)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
197                        ubyte* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
198                        int num_channels, int filter, void *alloc_context)
199 {
200     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
201                                    output_pixels, output_w, output_h, output_stride_in_bytes,
202                                    0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT8, filter, filter,
203                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
204 }
205 
int stbir_resize_uint16(const(ushort)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                        ushort* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                        int num_channels, int filter, void *alloc_context)
209 {
210     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
211                                    output_pixels, output_w, output_h, output_stride_in_bytes,
212                                    0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT16, filter, filter,
213                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
214 }
215 
216 
217 // The following functions interpret image data as gamma-corrected sRGB.
218 // Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
219 // or otherwise provide the index of the alpha channel. Flags value
220 // of 0 will probably do the right thing if you're not sure what
221 // the flags mean.
222 
223 enum STBIR_ALPHA_CHANNEL_NONE      = -1;
224 
225 // Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
226 // use alpha-weighted resampling (effectively premultiplying, resampling,
227 // then unpremultiplying).
228 enum STBIR_FLAG_ALPHA_PREMULTIPLIED = (1 << 0);
229 
// The specified alpha channel should be handled as a gamma-corrected value,
// even when doing sRGB operations.
232 enum STBIR_FLAG_ALPHA_USES_COLORSPACE = (1 << 1);
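
// Worked example (illustrative only, not part of the original stb code): why
// alpha-weighted resampling matters. It repeats the arithmetic from the
// ALPHA CHANNEL notes at the top of this file: a 50/50 average of a
// 99%-transparent bright green pixel and a 1%-transparent black pixel.
// All local names below are hypothetical.
unittest
{
    // Non-premultiplied green channel and alpha (opacity) of both pixels.
    float greenG = 1.0f, greenA = 0.01f; // bright green, 99% transparent
    float blackG = 0.0f, blackA = 0.99f; // black, 1% transparent

    // Naive average of the non-premultiplied colors: green leaks in at 50%.
    float naiveG = 0.5f * (greenG + blackG);
    assert(naiveG == 0.5f);

    // Premultiply, average, then divide by the averaged alpha.
    float premulG   = 0.5f * (greenG * greenA + blackG * blackA);
    float alpha     = 0.5f * (greenA + blackA);
    float weightedG = premulG / alpha;

    assert(alpha > 0.499f && alpha < 0.501f);         // 50% transparent
    assert(weightedG > 0.009f && weightedG < 0.011f); // near-black, no green glow
}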
233 
234 int stbir_resize_uint8_srgb(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
235                             ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
236                             int num_channels, int alpha_channel, int flags, void* alloc_context, int filter)
237 {
238     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
239                                    output_pixels, output_w, output_h, output_stride_in_bytes,
240                                    0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
241                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_SRGB);
242 }
243 
244 alias stbir_edge = int;
245 enum : stbir_edge
246 {
247     STBIR_EDGE_CLAMP   = 1,
248     STBIR_EDGE_REFLECT = 2,
249     STBIR_EDGE_WRAP    = 3,
250     STBIR_EDGE_ZERO    = 4,
251 }
252 
253 
254 //////////////////////////////////////////////////////////////////////////////
255 //
256 // Medium-complexity API
257 //
258 // This extends the easy-to-use API as follows:
259 //
260 //     * Alpha-channel can be processed separately
261 //       * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
//         * Alpha channel will not be gamma corrected (unless flags&STBIR_FLAG_ALPHA_USES_COLORSPACE)
263 //         * Filters will be weighted by alpha channel (unless flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
264 //     * Filter can be selected explicitly
265 //     * uint16 image type
266 //     * sRGB colorspace available for all types
267 //     * context parameter for passing to STBIR_MALLOC
268 
269 alias stbir_filter = int;
270 enum : stbir_filter
271 {
272     STBIR_FILTER_DEFAULT      = 0,  // use same filter type that easy-to-use API chooses
273     STBIR_FILTER_BOX          = 1,  // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
274     STBIR_FILTER_TRIANGLE     = 2,  // On upsampling, produces same results as bilinear texture filtering
    STBIR_FILTER_CUBICBSPLINE = 3,  // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
    STBIR_FILTER_CATMULLROM   = 4,  // An interpolating cubic spline
    STBIR_FILTER_MITCHELL     = 5,  // Mitchell-Netravali filter with B=1/3, C=1/3
278     STBIR_FILTER_LANCZOS2     = 6,  // Lanczos 2
279     STBIR_FILTER_LANCZOS2_5   = 7,  // Lanczos 2.5
280     STBIR_FILTER_LANCZOS3     = 8,  // Lanczos 3
281     STBIR_FILTER_LANCZOS4     = 9,  // Lanczos 4
282     STBIR_FILTER_MK_2013      = 10, // Magic Kernel, without sharpening
283     STBIR_FILTER_MKS_2013_86  = 11, // Magic Kernel + Sharp 2013, but with only 86% sharpening (Dplug Issue #729)
284     STBIR_FILTER_MKS_2013     = 12, // Magic Kernel + Sharp 2013 (the one recommended by John Costella in 2013)
285     STBIR_FILTER_MKS_2021     = 13, // Magic Kernel + Sharp 2021 (the one recommended to us by John Costella in 2022)
286 
287     // To be continued, as John Costella has other kernels...
288 }
289 
290 alias stbir_colorspace = int;
291 enum : stbir_colorspace 
292 {
293     STBIR_COLORSPACE_LINEAR,
294     STBIR_COLORSPACE_SRGB,
295 
296     STBIR_MAX_COLORSPACES,
297 }
298 
299 
300 //////////////////////////////////////////////////////////////////////////////
301 //
302 // Full-complexity API
303 //
304 // This extends the medium API as follows:
305 //
306 //     * uint32 image type
307 //     * not typesafe
308 //     * separate filter types for each axis
309 //     * separate edge modes for each axis
310 //     * can specify scale explicitly for subpixel correctness
311 //     * can specify image source tile using texture coordinates
312 
313 alias stbir_datatype = int;
314 enum : stbir_datatype
315 {
316     STBIR_TYPE_UINT8 ,
317     STBIR_TYPE_UINT16,
318     STBIR_TYPE_UINT32,
319     STBIR_TYPE_FLOAT ,
320 
321     STBIR_MAX_TYPES
322 }
323 
// (s0, t0) and (s1, t1) are the top-left and bottom-right corners (UV addressing style: [0, 1]x[0, 1])
// of the region of the input image to use.
325 
326 struct STBAllocatorContext
327 {
328 nothrow:
329 @nogc:
330     void* buf = null;
331     size_t length = 0;
332 
333     @disable this(this);
334 
335     ~this()
336     {
337         alignedFree(buf, 1);
338     }
339 
340     void* reallocDiscard(size_t numBytes)
341     {
342         if (length < numBytes)
343         {         
344             buf = alignedReallocDiscard(buf, numBytes, 1);
345             length = numBytes;
346         }
347         return buf;
348     }
349 }
350 
351 void* STBIR_MALLOC(size_t size, void* context)
352 {
353     assert(context !is null);
354     STBAllocatorContext* alloc = cast(STBAllocatorContext*)context;
355     return alloc.reallocDiscard(size);
356 }
357 
358 void STBIR_FREE(void* p, void* context)
359 {
360     assert(context !is null);
    // Intentionally a no-op: the buffer belongs to the STBAllocatorContext and is released
    // when the resizer is freed, because it's relatively small and shared.
362 }
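
// Usage sketch (illustrative; the buffer names and sizes are made up): halve a
// 4x4 single-channel image with the easy API. The STBAllocatorContext holds the
// scratch allocation, so it can be reused across calls and is released by its
// destructor. How the call behaves beyond the documented 1/0 return value is
// an assumption here.
unittest
{
    ubyte[4 * 4] src = 128; // uniform gray input, rows packed (stride 0)
    ubyte[2 * 2] dst;
    STBAllocatorContext alloc;

    int ok = stbir_resize_uint8(src.ptr, 4, 4, 0,
                                dst.ptr, 2, 2, 0,
                                1, STBIR_FILTER_DEFAULT, &alloc);
    if (ok == 0)
    {
        // The resize failed (per the easy-API notes, 0 signals an error);
        // handle it here.
    }
}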
363 
364 enum STBIR_DEFAULT_FILTER_UPSAMPLE = STBIR_FILTER_CATMULLROM;
365 
366 enum STBIR_DEFAULT_FILTER_DOWNSAMPLE = STBIR_FILTER_MITCHELL;
367 
368 enum STBIR_MAX_CHANNELS = 4;
369 
370 // This value is added to alpha just before premultiplication to avoid
371 // zeroing out color values. It is equivalent to 2^-80. If you don't want
372 // that behavior (it may interfere if you have floating point images with
373 // very small alpha values) then you can define STBIR_NO_ALPHA_EPSILON to
374 // disable it.
375 enum float STBIR_ALPHA_EPSILON = (cast(float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20));
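
// Compile-time check (added for illustration): four divisions by 2^20 give
// exactly 2^-80, which is still a normal float, so the value above is exact.
static assert(STBIR_ALPHA_EPSILON == 0x1p-80f);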
376 
377 // must match stbir_datatype
378 static immutable ubyte[4] stbir__type_size = 
379 [
380     1, // STBIR_TYPE_UINT8
381     2, // STBIR_TYPE_UINT16
382     4, // STBIR_TYPE_UINT32
383     4, // STBIR_TYPE_FLOAT
384 ];
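
// Compile-time checks (added for illustration) of the "must match
// stbir_datatype" requirement: one entry per type, holding its size in bytes.
static assert(stbir__type_size.length == STBIR_MAX_TYPES);
static assert(stbir__type_size[STBIR_TYPE_UINT16] == ushort.sizeof);
static assert(stbir__type_size[STBIR_TYPE_FLOAT]  == float.sizeof);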
385 
386 // Kernel function centered at 0
387 alias stbir__kernel_fn = float function(float x, float scale);
388 alias stbir__support_fn = float function(float scale);
389 
390 struct stbir__filter_info
391 {
392     stbir__kernel_fn kernel;
393     stbir__support_fn support;
394 }
395 
396 // When upsampling, the contributors are which source pixels contribute.
397 // When downsampling, the contributors are which destination pixels are contributed to.
398 struct stbir__contributors
399 {
400     int n0; // First contributing pixel
401     int n1; // Last contributing pixel
402 }
403 
404 struct stbir__info
405 {
406     const(void)* input_data;
407     int input_w;
408     int input_h;
409     int input_stride_bytes;
410 
411     void* output_data;
412     int output_w;
413     int output_h;
414     int output_stride_bytes;
415 
416     float s0, t0, s1, t1;
417 
418     float horizontal_shift; // Units: output pixels
419     float vertical_shift;   // Units: output pixels
420     float horizontal_scale;
421     float vertical_scale;
422 
423     int channels;
424     int alpha_channel;
425     uint flags;
426     stbir_datatype type;
427     stbir_filter horizontal_filter;
428     stbir_filter vertical_filter;
429     stbir_edge edge_horizontal;
430     stbir_edge edge_vertical;
431     stbir_colorspace colorspace;
432 
433     stbir__contributors* horizontal_contributors;
434     float* horizontal_coefficients;
435 
436     stbir__contributors* vertical_contributors;
437     float* vertical_coefficients;
438 
439     int decode_buffer_pixels;
440     float* decode_buffer;
441 
442     float* horizontal_buffer;
443 
444     // cache these because ceil/floor are inexplicably showing up in profile
445     int horizontal_coefficient_width;
446     int vertical_coefficient_width;
447     int horizontal_filter_pixel_width;
448     int vertical_filter_pixel_width;
449     int horizontal_filter_pixel_margin;
450     int vertical_filter_pixel_margin;
451     int horizontal_num_contributors;
452     int vertical_num_contributors;
453 
454     int ring_buffer_length_bytes;   // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
455     int ring_buffer_num_entries;    // Total number of entries in the ring buffer.
456     int ring_buffer_first_scanline;
457     int ring_buffer_last_scanline;
458     int ring_buffer_begin_index;    // first_scanline is at this index in the ring buffer
459     float* ring_buffer;
460 
461     float* encode_buffer; // A temporary buffer to store floats so we don't lose precision while we do multiply-adds.
462 
463     int horizontal_contributors_size;
464     int horizontal_coefficients_size;
465     int vertical_contributors_size;
466     int vertical_coefficients_size;
467     int decode_buffer_size;
468     int horizontal_buffer_size;
469     int ring_buffer_size;
470     int encode_buffer_size;
471 }
472 
473 
474 static immutable float stbir__max_uint8_as_float  = 255.0f;
475 static immutable float stbir__max_uint16_as_float = 65535.0f;
476 static immutable double stbir__max_uint32_as_float = 4294967295.0;
477 
478 
479 int stbir__min(int a, int b)
480 {
481     return a < b ? a : b;
482 }
483 
484 float stbir__saturate(float x)
485 {
486     if (x < 0)
487         return 0;
488 
489     if (x > 1)
490         return 1;
491 
492     return x;
493 }
494 
495 static immutable float[256] stbir__srgb_uchar_to_linear_float = 
496 [
497     0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
498     0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
499     0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
500     0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
501     0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
502     0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
503     0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
504     0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
505     0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
506     0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
507     0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
508     0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
509     0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
510     0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
511     0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
512     0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
513     0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
514     0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
515     0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
516     0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
517     0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
518     0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
519     0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
520     0.982251f, 0.991102f, 1.0f
521 ];
522 
523 float stbir__srgb_to_linear(float f)
524 {
525     if (f <= 0.04045f)
526         return f / 12.92f;
527     else
528         return cast(float)fast_pow((f + 0.055f) / 1.055f, 2.4f);
529 }
530 
531 float stbir__linear_to_srgb(float f)
532 {
533     if (f <= 0.0031308f)
534         return f * 12.92f;
535     else
536         return 1.055f * _mm_pow_ss(f, 0.4166666666f) - 0.055f;
537 }
538 /*
539 __m128 stbir__linear_to_srgb(__m128 f)
540 {
541     __m128 below = f * _mm_set1_ps(12.92f);
542     __m128 exponentiated = _mm_set1_ps(1.055f) * _mm_pow_ps(f, 0.4166666666f) - _mm_set1_ps(0.055f);
543     __m128 mask  =_mm_cmplt_ps(f, _mm_set1_ps(0.0031308f));
544     __m128i result = (cast(__m128i)below & cast(__m128i)mask) | (cast(__m128i)exponentiated & ~cast(__m128i)mask);
545     return cast(__m128)result;
546 }*/
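
// Illustrative check (not in the original source): for small inputs both sRGB
// transfer functions use their linear segment, so they can be compared with
// the lookup table without going through the pow() approximations.
unittest
{
    // 10/255 is below the 0.04045 threshold, so this is simply f / 12.92.
    float lin  = stbir__srgb_to_linear(10 / 255.0f);
    float tab  = stbir__srgb_uchar_to_linear_float[10];
    float diff = lin - tab;
    assert(diff > -1e-5f && diff < 1e-5f);

    // The result is below 0.0031308, so converting back multiplies by 12.92
    // and recovers the encoded value.
    float back = stbir__linear_to_srgb(lin);
    float err  = back - 10 / 255.0f;
    assert(err > -1e-6f && err < 1e-6f);
}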
547 
548 union stbir__FP32
549 {
550     uint u;
551     float f;
552 }
553 
554 static immutable uint[104] fp32_to_srgb8_tab4 = 
555 [
556     0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
557     0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
558     0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
559     0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
560     0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
561     0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
562     0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
563     0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
564     0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
565     0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
566     0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
567     0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
568     0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
569 ];
570 
571 ubyte stbir__linear_to_srgb_uchar(float in_)
572 {
573     static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
574     static const stbir__FP32 minval = { (127-13) << 23 };
575     uint tab,bias,scale,t;
576     stbir__FP32 f;
577 
578     // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
579     // The tests are carefully written so that NaNs map to 0, same as in the reference
580     // implementation.
581     if (!(in_ > minval.f)) // written this way to trap NaNs
582         in_ = minval.f;
583     if (in_ > almostone.f)
584         in_ = almostone.f;
585 
586     // Do the table lookup and unpack bias, scale
587     f.f = in_;
588     tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
589     bias = (tab >> 16) << 9;
590     scale = tab & 0xffff;
591 
592     // Grab next-highest mantissa bits and perform linear interpolation
593     t = (f.u >> 12) & 0xff;
594     return cast(ubyte) ((bias + scale*t) >> 16);
595 }
596 
// Same as above, but converts 4 floats at once.
598 __m128i stbir__linear_to_srgb_uchar(__m128 in_)
599 {
600     static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
601     static const stbir__FP32 minval = { (127-13) << 23 };
602     in_ = _mm_max_ps(in_, _mm_set1_ps(minval.f));
603     in_ = _mm_min_ps(in_, _mm_set1_ps(almostone.f));
604 
605     __m128i f = cast(__m128i) in_;
606     __m128i tblIndex = _mm_srli_epi32(f - _mm_set1_epi32(minval.u), 20);
607 
608     __m128i tab = _mm_setr_epi32(fp32_to_srgb8_tab4[ tblIndex.array[0] ], 
609                                  fp32_to_srgb8_tab4[ tblIndex.array[1] ],
610                                  fp32_to_srgb8_tab4[ tblIndex.array[2] ],
611                                  fp32_to_srgb8_tab4[ tblIndex.array[3] ]);
612     __m128i bias = _mm_slli_epi32(_mm_srli_epi32(tab, 16), 9);
613     __m128i scale = _mm_and_si128(tab, _mm_set1_epi32(0xffff));
614 
615     __m128i t = _mm_srli_epi32(f, 12) &  _mm_set1_epi32(0xff);
616     __m128i r = _mm_srli_epi32(bias + _mm_mullo_epi32(scale, t), 16);
617     __m128i zero = _mm_setzero_si128();
618     r = _mm_packs_epi32(r, zero);
619     r = _mm_packus_epi16(r, zero);
620     return r;
621 }
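
// Illustrative check of the scalar conversion at its clamping limits. The
// expected values were worked out by hand from fp32_to_srgb8_tab4: inputs are
// clamped to [2^-13, 1-eps], and those two ends encode to 0 and 255.
unittest
{
    assert(stbir__linear_to_srgb_uchar(0.0f)  == 0);
    assert(stbir__linear_to_srgb_uchar(-1.0f) == 0);     // clamped from below
    assert(stbir__linear_to_srgb_uchar(1.0f)  == 255);
    assert(stbir__linear_to_srgb_uchar(2.0f)  == 255);   // clamped from above
    assert(stbir__linear_to_srgb_uchar(float.nan) == 0); // NaN maps to 0
}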
622 
623 float stbir__filter_trapezoid(float x, float scale)
624 {
625     float halfscale = scale / 2;
626     float t = 0.5f + halfscale;
627     assert(scale <= 1);
628 
629     x = cast(float)fast_fabs(x);
630 
631     if (x >= t)
632         return 0;
633     else
634     {
635         float r = 0.5f - halfscale;
636         if (x <= r)
637             return 1;
638         else
639             return (t - x) / scale;
640     }
641 }
642 
643 float stbir__support_trapezoid(float scale)
644 {
645     assert(scale <= 1);
646     return 0.5f + scale / 2;
647 }
648 
649 float stbir__filter_triangle(float x, float s)
650 {
651     x = cast(float)fast_fabs(x);
652 
653     if (x <= 1.0f)
654         return 1 - x;
655     else
656         return 0;
657 }
658 
659 float stbir__filter_cubic(float x, float s)
660 {
661     x = cast(float)fast_fabs(x);
662 
663     if (x < 1.0f)
664         return (4 + x*x*(3*x - 6))/6;
665     else if (x < 2.0f)
666         return (8 + x*(-12 + x*(6 - x)))/6;
667 
668     return (0.0f);
669 }
670 
671 float stbir__filter_catmullrom(float x, float s)
672 {
673     x = cast(float)fast_fabs(x);
674 
675     if (x < 1.0f)
676         return 1 - x*x*(2.5f - 1.5f*x);
677     else if (x < 2.0f)
678         return 2 - x*(4 + x*(0.5f*x - 2.5f));
679 
680     return (0.0f);
681 }
682 
683 float stbir__filter_mitchell(float x, float s)
684 {
685     x = cast(float)fast_fabs(x);
686 
687     if (x < 1.0f)
688         return (16 + x*x*(21 * x - 36))/18;
689     else if (x < 2.0f)
690         return (32 + x*(-60 + x*(36 - 7*x)))/18;
691 
692     return (0.0f);
693 }
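
// Illustrative check (not in the original source) of what "interpolating"
// means for the cubic kernels above. The scale argument is unused by these
// kernels, so 1.0f is passed as a placeholder.
unittest
{
    // Catmull-Rom is 1 at the sample itself and 0 at the next integer offset,
    // so original sample values are reproduced exactly.
    assert(stbir__filter_catmullrom(0.0f, 1.0f) == 1.0f);
    assert(stbir__filter_catmullrom(1.0f, 1.0f) == 0.0f);

    // The B-spline and Mitchell kernels smooth instead: their peak is below 1.
    assert(stbir__filter_cubic(0.0f, 1.0f)    < 1.0f); // 4/6
    assert(stbir__filter_mitchell(0.0f, 1.0f) < 1.0f); // 16/18

    // The four Catmull-Rom taps around a position 0.25 past a pixel sum to 1.
    float sum = stbir__filter_catmullrom(-1.25f, 1.0f)
              + stbir__filter_catmullrom(-0.25f, 1.0f)
              + stbir__filter_catmullrom( 0.75f, 1.0f)
              + stbir__filter_catmullrom( 1.75f, 1.0f);
    assert(sum > 0.9999f && sum < 1.0001f);
}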
694 
695 float stbir__filter_lanczos(float A)(float x, float s)
696 {
697     x = cast(float)fast_fabs(x);
698 
699     if (x <= float.min_normal)
700         return 1.0f;
701 
702     if (x < A)
703     {
704         float pix = 3.14159265358979323846f*x;
705         return A*fast_sin(pix)*fast_sin(pix/A)/(pix*pix);
706     }
707 
708     return 0.0f;
709 }
710 
711 float stbir__filter_mk2013(float x, float s) nothrow @nogc
712 {
713     x = fast_fabs(x);
714     if (x < 0.5)
715         return 0.75 - x * x;
716 
717     if (x < 1.5)
718         return 0.5 * (x - 1.5)*(x - 1.5);
719 
720     return 0.0f;
721 }
722 
723 float stbir__filter_mks2013_hs(float x, float s) nothrow @nogc
724 {
725     // Perhaps possible to do better with "MKS 2021".
726     return 0.14f * stbir__filter_mk2013(x, s)
727          + 0.86f * stbir__filter_mks2013(x, s);
728 }
729 
730 float stbir__filter_mks2013(float x, float s) nothrow @nogc
731 {
732     x = fast_fabs(x);
733 
734     if (x <= float.min_normal)
735         return 17.0f / 16.0f;
736 
737     if (x < 0.5)
738         return 17.0 / 16.0 - 7.0 * x * x / 4.0;
739 
740     if (x < 1.5)
741     {
742         double x2 = x * x;
743         return 0.25 * (4 * x2 - 11.0 * x + 7.0);
744     }
745 
746     if (x < 2.5)
747     {
748         return -0.125 * (x - 5.0 / 2.0)*(x - 5.0 / 2.0);
749     }
750     return 0.0f;
751 }
752 
753 float stbir__filter_mks2021(float x, float s) nothrow @nogc
754 {
755     x = fast_fabs(x);
756     float x2 = x * x;
757 
758     if (x < 0.5)
759         return 577.0f / 576.0f - (239.0f / 144.0f) * x2;
760 
761     if (x < 1.5)
762         return (140 * x2 - 379 * x + 239) / 144.0f;
763 
764     if (x < 2.5)
765         return -(24 * x2 - 113 * x + 130) / 144.0f;
766 
767     if (x < 3.5)
768         return (4 * x2 - 27 * x + 45) / 144.0f;
769 
770     if (x < 4.5)
771         return -(4 * x2 - 36 * x + 81) / 1152.0f;
772 
773     return 0.0f;
774 }
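
// Illustrative sanity check, a sketch that is not part of upstream
// stb_image_resize: the Magic Kernel variants above are expected to form a
// partition of unity, i.e. for any sub-pixel offset t the taps kernel(t + k)
// over all integers k should sum to 1 (up to rounding). The tolerance and the
// tap range -5 .. 5 are assumptions, chosen to cover the widest kernel here
// (MKS 2021, which is non-zero up to |x| < 4.5).
unittest
{
    foreach (step; 0 .. 16)
    {
        float t = step / 16.0f; // sub-pixel offset in [0, 1)
        float sumMk2013 = 0, sumMks2013 = 0, sumMks2021 = 0;

        foreach (k; -5 .. 6)
        {
            float x = t + k;
            // The second kernel argument is unused by these filters.
            sumMk2013  += stbir__filter_mk2013(x, 1.0f);
            sumMks2013 += stbir__filter_mks2013(x, 1.0f);
            sumMks2021 += stbir__filter_mks2021(x, 1.0f);
        }

        assert(sumMk2013  > 0.999f && sumMk2013  < 1.001f);
        assert(sumMks2013 > 0.999f && sumMks2013 < 1.001f);
        assert(sumMks2021 > 0.999f && sumMks2021 < 1.001f);
    }
}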
775 
776 float stbir__support_zero(float s)
777 {
778     return 0;
779 }
780 
781 float stbir__support_one(float s)
782 {
783     return 1;
784 }
785 
786 float stbir__support_two(float s)
787 {
788     return 2;
789 }
790 
791 float stbir__support_three(float s)
792 {
793     return 3;
794 }
795 
796 float stbir__support_four(float s)
797 {
798     return 4;
799 }
800 
801 float stbir__support_five(float s)
802 {
803     return 5;
804 }
805 
806 static immutable stbir__filter_info[14] stbir__filter_info_table = 
807 [
808         { null,                      &stbir__support_zero },
809         { &stbir__filter_trapezoid,  &stbir__support_trapezoid },
810         { &stbir__filter_triangle,   &stbir__support_one },
811         { &stbir__filter_cubic,      &stbir__support_two },
812         { &stbir__filter_catmullrom, &stbir__support_two },
813         { &stbir__filter_mitchell,   &stbir__support_two },
814         { &stbir__filter_lanczos!2.0f, &stbir__support_two },
815         { &stbir__filter_lanczos!2.5f, &stbir__support_three },
816         { &stbir__filter_lanczos!3.0f, &stbir__support_three },
817         { &stbir__filter_lanczos!4.0f, &stbir__support_four },
818         { &stbir__filter_mk2013,       &stbir__support_three },
819         { &stbir__filter_mks2013_hs,   &stbir__support_three },
820         { &stbir__filter_mks2013,      &stbir__support_three },
821         { &stbir__filter_mks2021,      &stbir__support_five },
822         ];
823 
824 
825 static int stbir__use_upsampling(float ratio)
826 {
827     return ratio > 1;
828 }
829 
830 static int stbir__use_width_upsampling(stbir__info* stbir_info)
831 {
832     return stbir__use_upsampling(stbir_info.horizontal_scale);
833 }
834 
835 static int stbir__use_height_upsampling(stbir__info* stbir_info)
836 {
837     return stbir__use_upsampling(stbir_info.vertical_scale);
838 }
839 
840 // This is the maximum number of input samples that can affect an output sample
841 // with the given filter
842 static int stbir__get_filter_pixel_width(stbir_filter filter, float scale)
843 {
844     assert(filter != 0);
845     assert(filter < stbir__filter_info_table.length);
846 
847     if (stbir__use_upsampling(scale))
848         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2);
849     else
850         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2 / scale);
851 }
852 
853 // This is how much to expand buffers to account for filters seeking outside
854 // the image boundaries.
855 static int stbir__get_filter_pixel_margin(stbir_filter filter, float scale)
856 {
857     return stbir__get_filter_pixel_width(filter, scale) / 2;
858 }
859 
860 static int stbir__get_coefficient_width(stbir_filter filter, float scale)
861 {
862     if (stbir__use_upsampling(scale))
863         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1 / scale) * 2);
864     else
865         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2);
866 }
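
// Worked example (a sketch, not upstream code) of the size helpers above for
// the Mitchell kernel, whose support radius is 2. Casting table index 5 to
// stbir_filter is an assumption based on the order of
// stbir__filter_info_table; real code should use the named filter constant.
unittest
{
    stbir_filter mitchell = cast(stbir_filter)5;

    // Downsampling by 2 (scale 0.5): the kernel is stretched by 1/scale in
    // input space, so up to 2 * 2 / 0.5 = 8 input pixels affect one output pixel.
    assert(stbir__get_filter_pixel_width(mitchell, 0.5f) == 8);
    assert(stbir__get_filter_pixel_margin(mitchell, 0.5f) == 4);
    assert(stbir__get_coefficient_width(mitchell, 0.5f) == 4);

    // Upsampling by 2 (scale 2): the kernel keeps its native width of 2 * 2.
    assert(stbir__get_filter_pixel_width(mitchell, 2.0f) == 4);
    assert(stbir__get_filter_pixel_margin(mitchell, 2.0f) == 2);
    assert(stbir__get_coefficient_width(mitchell, 2.0f) == 4);
}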
867 
868 static int stbir__get_contributors(float scale, stbir_filter filter, int input_size, int output_size)
869 {
870     if (stbir__use_upsampling(scale))
871         return output_size;
872     else
873         return (input_size + stbir__get_filter_pixel_margin(filter, scale) * 2);
874 }
875 
876 static int stbir__get_total_horizontal_coefficients(stbir__info* info)
877 {
878     return info.horizontal_num_contributors
879          * stbir__get_coefficient_width      (info.horizontal_filter, info.horizontal_scale);
880 }
881 
882 static int stbir__get_total_vertical_coefficients(stbir__info* info)
883 {
884     return info.vertical_num_contributors
885          * stbir__get_coefficient_width      (info.vertical_filter, info.vertical_scale);
886 }
887 
888 static stbir__contributors* stbir__get_contributor(stbir__contributors* contributors, int n)
889 {
890     return &contributors[n];
891 }
892 
893 // For perf reasons this code is duplicated in stbir__resample_horizontal_upsample/downsample,
894 // if you change it here change it there too.
895 static float* stbir__get_coefficient(float* coefficients, stbir_filter filter, float scale, int n, int c)
896 {
897     int width = stbir__get_coefficient_width(filter, scale);
898     return &coefficients[width*n + c];
899 }
900 
901 static int stbir__edge_wrap_slow(stbir_edge edge, int n, int max)
902 {
903     switch (edge)
904     {
905     case STBIR_EDGE_ZERO:
906         return 0; // we'll decode the wrong pixel here, and then overwrite with 0s later
907 
908     case STBIR_EDGE_CLAMP:
909         if (n < 0)
910             return 0;
911 
912         if (n >= max)
913             return max - 1;
914 
915         return n; // NOTREACHED
916 
917     case STBIR_EDGE_REFLECT:
918     {
919         if (n < 0)
920         {
            if (-n < max) // the mirrored index is still inside the image
922                 return -n;
923             else
924                 return max - 1;
925         }
926 
927         if (n >= max)
928         {
929             int max2 = max * 2;
930             if (n >= max2)
931                 return 0;
932             else
933                 return max2 - n - 1;
934         }
935 
936         return n; // NOTREACHED
937     }
938 
939     case STBIR_EDGE_WRAP:
940         if (n >= 0)
941             return (n % max);
942         else
943         {
944             int m = (-n) % max;
945 
946             if (m != 0)
947                 m = max - m;
948 
949             return (m);
950         }
951         // NOTREACHED
952 
953     default:
954         assert(false, "Unimplemented edge type");
955     }
956 }
957 
958 static int stbir__edge_wrap(stbir_edge edge, int n, int max)
959 {
960     // avoid per-pixel switch
961     if (n >= 0 && n < max)
962         return n;
963     return stbir__edge_wrap_slow(edge, n, max);
964 }
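
// Worked example (a sketch, not upstream code) of how out-of-range sample
// indices are remapped for a row of 4 pixels. The expected values follow from
// stbir__edge_wrap_slow above; only indices within one image width of the
// border are exercised here.
unittest
{
    enum int w = 4;

    // In-range indices take the fast path and are returned unchanged.
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP, 2, w) == 2);

    // CLAMP pins to the nearest valid pixel.
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP, -3, w) == 0);
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP,  7, w) == 3);

    // WRAP tiles the image.
    assert(stbir__edge_wrap(STBIR_EDGE_WRAP,  5, w) == 1);
    assert(stbir__edge_wrap(STBIR_EDGE_WRAP, -1, w) == 3);

    // REFLECT mirrors at the border: -1 maps to 1 on the left side, while
    // 4 maps to 3 and 5 maps to 2 on the right side.
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT, -1, w) == 1);
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT,  4, w) == 3);
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT,  5, w) == 2);

    // ZERO returns a placeholder index; the decoded sample is zeroed later.
    assert(stbir__edge_wrap(STBIR_EDGE_ZERO, -1, w) == 0);
}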
965 
966 // What input pixels contribute to this output pixel?
967 static void stbir__calculate_sample_range_upsample(int n, float out_filter_radius, float scale_ratio, float out_shift, int* in_first_pixel, int* in_last_pixel, float* in_center_of_out)
968 {
969     float out_pixel_center = cast(float)n + 0.5f;
970     float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
971     float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
972 
973     float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) / scale_ratio;
974     float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) / scale_ratio;
975 
976     *in_center_of_out = (out_pixel_center + out_shift) / scale_ratio;
977     *in_first_pixel = cast(int)(fast_floor(in_pixel_influence_lowerbound + 0.5));
978     *in_last_pixel = cast(int)(fast_floor(in_pixel_influence_upperbound - 0.5));
979 }
980 
981 // What output pixels does this input pixel contribute to?
982 static void stbir__calculate_sample_range_downsample(int n, float in_pixels_radius, float scale_ratio, float out_shift, int* out_first_pixel, int* out_last_pixel, float* out_center_of_in)
983 {
984     float in_pixel_center = cast(float)n + 0.5f;
985     float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
986     float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
987 
988     float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale_ratio - out_shift;
989     float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale_ratio - out_shift;
990 
991     *out_center_of_in = in_pixel_center * scale_ratio - out_shift;
992     *out_first_pixel = cast(int)(fast_floor(out_pixel_influence_lowerbound + 0.5));
993     *out_last_pixel = cast(int)(fast_floor(out_pixel_influence_upperbound - 0.5));
994 }
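
// Worked example (a sketch, not upstream code) of the two range helpers for a
// kernel with support radius 2 and no shift. The radii passed in mirror what
// stbir__calculate_filters computes: support * scale when upsampling and
// support / scale when downsampling.
unittest
{
    int first, last;
    float center;

    // Upsampling by 2: output pixel 3 is centred at 3.5 / 2 = 1.75 in input
    // space and gathers from input pixels 0 .. 3.
    stbir__calculate_sample_range_upsample(3, 2 * 2.0f, 2.0f, 0.0f, &first, &last, &center);
    assert(center == 1.75f);
    assert(first == 0 && last == 3);

    // Downsampling by 2: input pixel 4 is centred at 4.5 * 0.5 = 2.25 in output
    // space and scatters into output pixels 0 .. 3.
    stbir__calculate_sample_range_downsample(4, 2 / 0.5f, 0.5f, 0.0f, &first, &last, &center);
    assert(center == 2.25f);
    assert(first == 0 && last == 3);
}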
995 
996 static void stbir__calculate_coefficients_upsample(stbir_filter filter, float scale, int in_first_pixel, int in_last_pixel, float in_center_of_out, stbir__contributors* contributor, float* coefficient_group)
997 {
998     int i;
999     float total_filter = 0;
1000     float filter_scale;
1001 
1002     assert(in_last_pixel - in_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
1003 
1004     contributor.n0 = in_first_pixel;
1005     contributor.n1 = in_last_pixel;
1006 
1007     assert(contributor.n1 >= contributor.n0);
1008 
1009     for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
1010     {
1011         float in_pixel_center = cast(float)(i + in_first_pixel) + 0.5f;
1012         coefficient_group[i] = stbir__filter_info_table[filter].kernel(in_center_of_out - in_pixel_center, 1 / scale);
1013 
1014         // If the coefficient is zero, skip it. (Don't do the <0 check here, we want the influence of those outside pixels.)
1015         if (i == 0 && !coefficient_group[i])
1016         {
1017             contributor.n0 = ++in_first_pixel;
1018             i--;
1019             continue;
1020         }
1021 
1022         total_filter += coefficient_group[i];
1023     }
1024 
1025     assert(stbir__filter_info_table[filter].kernel(cast(float)(in_last_pixel + 1) + 0.5f - in_center_of_out, 1/scale) == 0);
1026 
1027     assert(total_filter > 0.9);
1028     assert(total_filter < 1.1f); // Make sure it's not way off.
1029 
1030     // Make sure the sum of all coefficients is 1.
1031     filter_scale = 1 / total_filter;
1032 
1033     for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
1034         coefficient_group[i] *= filter_scale;
1035 
1036     for (i = in_last_pixel - in_first_pixel; i >= 0; i--)
1037     {
1038         if (coefficient_group[i])
1039             break;
1040 
1041         // This line has no weight. We can skip it.
1042         contributor.n1 = contributor.n0 + i - 1;
1043     }
1044 }
1045 
1046 static void stbir__calculate_coefficients_downsample(stbir_filter filter, float scale_ratio, int out_first_pixel, int out_last_pixel, float out_center_of_in, stbir__contributors* contributor, float* coefficient_group)
1047 {
1048     int i;
1049 
    assert(out_last_pixel - out_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale_ratio) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
1051 
1052     contributor.n0 = out_first_pixel;
1053     contributor.n1 = out_last_pixel;
1054 
1055     assert(contributor.n1 >= contributor.n0);
1056 
1057     for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
1058     {
1059         float out_pixel_center = cast(float)(i + out_first_pixel) + 0.5f;
1060         float x = out_pixel_center - out_center_of_in;
1061         coefficient_group[i] = stbir__filter_info_table[filter].kernel(x, scale_ratio) * scale_ratio;
1062     }
1063 
1064     assert(stbir__filter_info_table[filter].kernel(cast(float)(out_last_pixel + 1) + 0.5f - out_center_of_in, scale_ratio) == 0);
1065 
1066     for (i = out_last_pixel - out_first_pixel; i >= 0; i--)
1067     {
1068         if (coefficient_group[i])
1069             break;
1070 
1071         // This line has no weight. We can skip it.
1072         contributor.n1 = contributor.n0 + i - 1;
1073     }
1074 }
1075 
1076 static void stbir__normalize_downsample_coefficients(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, int input_size, int output_size)
1077 {
1078     int num_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1079     int num_coefficients = stbir__get_coefficient_width(filter, scale_ratio);
1080     int i, j;
1081     int skip;
1082 
1083     for (i = 0; i < output_size; i++)
1084     {
1085         float scale;
1086         float total = 0;
1087 
1088         for (j = 0; j < num_contributors; j++)
1089         {
1090             if (i >= contributors[j].n0 && i <= contributors[j].n1)
1091             {
1092                 float coefficient = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0);
1093                 total += coefficient;
1094             }
1095             else if (i < contributors[j].n0)
1096                 break;
1097         }
1098 
1099         assert(total > 0.9f);
1100         assert(total < 1.1f);
1101 
1102         scale = 1 / total;
1103 
1104         for (j = 0; j < num_contributors; j++)
1105         {
1106             if (i >= contributors[j].n0 && i <= contributors[j].n1)
1107                 *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0) *= scale;
1108             else if (i < contributors[j].n0)
1109                 break;
1110         }
1111     }
1112 
1113     // Optimize: Skip zero coefficients and contributions outside of image bounds.
1114     // Do this after normalizing because normalization depends on the n0/n1 values.
1115     for (j = 0; j < num_contributors; j++)
1116     {
1117         int range, max, width;
1118 
1119         skip = 0;
1120         while (*stbir__get_coefficient(coefficients, filter, scale_ratio, j, skip) == 0)
1121             skip++;
1122 
1123         contributors[j].n0 += skip;
1124 
1125         while (contributors[j].n0 < 0)
1126         {
1127             contributors[j].n0++;
1128             skip++;
1129         }
1130 
1131         range = contributors[j].n1 - contributors[j].n0 + 1;
1132         max = stbir__min(num_coefficients, range);
1133 
1134         width = stbir__get_coefficient_width(filter, scale_ratio);
1135         for (i = 0; i < max; i++)
1136         {
1137             if (i + skip >= width)
1138                 break;
1139 
1140             *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i) = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i + skip);
1141         }
1142 
1143         continue;
1144     }
1145 
1146     // Using min to avoid writing into invalid pixels.
1147     for (i = 0; i < num_contributors; i++)
1148         contributors[i].n1 = stbir__min(contributors[i].n1, output_size - 1);
1149 }
1150 
1151 // Each scan line uses the same kernel values so we should calculate the kernel
1152 // values once and then we can use them for every scan line.
1153 static void stbir__calculate_filters(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, float shift, int input_size, int output_size)
1154 {
1155     int n;
1156     int total_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1157 
1158     if (stbir__use_upsampling(scale_ratio))
1159     {
1160         float out_pixels_radius = stbir__filter_info_table[filter].support(1 / scale_ratio) * scale_ratio;
1161 
1162         // Looping through out pixels
1163         for (n = 0; n < total_contributors; n++)
1164         {
1165             float in_center_of_out; // Center of the current out pixel in the in pixel space
1166             int in_first_pixel, in_last_pixel;
1167 
1168             stbir__calculate_sample_range_upsample(n, out_pixels_radius, scale_ratio, shift, &in_first_pixel, &in_last_pixel, &in_center_of_out);
1169 
1170             stbir__calculate_coefficients_upsample(filter, scale_ratio, in_first_pixel, in_last_pixel, in_center_of_out, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1171         }
1172     }
1173     else
1174     {
1175         float in_pixels_radius = stbir__filter_info_table[filter].support(scale_ratio) / scale_ratio;
1176 
1177         // Looping through in pixels
1178         for (n = 0; n < total_contributors; n++)
1179         {
            float out_center_of_in; // Center of the current in pixel in the out pixel space
1181             int out_first_pixel, out_last_pixel;
1182             int n_adjusted = n - stbir__get_filter_pixel_margin(filter, scale_ratio);
1183 
1184             stbir__calculate_sample_range_downsample(n_adjusted, in_pixels_radius, scale_ratio, shift, &out_first_pixel, &out_last_pixel, &out_center_of_in);
1185 
1186             stbir__calculate_coefficients_downsample(filter, scale_ratio, out_first_pixel, out_last_pixel, out_center_of_in, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1187         }
1188 
1189         stbir__normalize_downsample_coefficients(contributors, coefficients, filter, scale_ratio, input_size, output_size);
1190     }
1191 }
1192 
1193 static float* stbir__get_decode_buffer(stbir__info* stbir_info)
1194 {
1195     // The 0 index of the decode buffer starts after the margin. This makes
1196     // it okay to use negative indexes on the decode buffer.
1197     return &stbir_info.decode_buffer[stbir_info.horizontal_filter_pixel_margin * stbir_info.channels];
1198 }
1199 
1200 int STBIR__DECODE(int type, int colorspace)
1201 {
1202     return type * STBIR_MAX_COLORSPACES + colorspace;
1203 }
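
// Illustrative check (a sketch, not upstream code): STBIR__DECODE packs a
// (type, colorspace) pair into a single integer so that stbir__decode_scanline
// can dispatch with one switch. Because it is a plain function of its
// arguments, D can evaluate it at compile time, which is what lets it appear
// in case labels; the static asserts below rely on the same mechanism.
unittest
{
    // Same type, different colorspace: must not collide.
    static assert(STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR)
               != STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB));

    // Same colorspace, different type: must not collide either.
    static assert(STBIR__DECODE(STBIR_TYPE_UINT8,  STBIR_COLORSPACE_LINEAR)
               != STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR));
}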
1204 
1205 static void stbir__decode_scanline(stbir__info* stbir_info, int n)
1206 {
1207     int c;
1208     int channels = stbir_info.channels;
1209     int alpha_channel = stbir_info.alpha_channel;
1210     int type = stbir_info.type;
1211     int colorspace = stbir_info.colorspace;
1212     int input_w = stbir_info.input_w;
1213     size_t input_stride_bytes = stbir_info.input_stride_bytes;
1214     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1215     stbir_edge edge_horizontal = stbir_info.edge_horizontal;
1216     stbir_edge edge_vertical = stbir_info.edge_vertical;
1217     size_t in_buffer_row_offset = stbir__edge_wrap(edge_vertical, n, stbir_info.input_h) * input_stride_bytes;
1218     const void* input_data = cast(char *) stbir_info.input_data + in_buffer_row_offset;
1219     int max_x = input_w + stbir_info.horizontal_filter_pixel_margin;
1220     int decode = STBIR__DECODE(type, colorspace);
1221 
1222     int x = -stbir_info.horizontal_filter_pixel_margin;
1223 
1224     // special handling for STBIR_EDGE_ZERO because it needs to return an item that doesn't appear in the input,
1225     // and we want to avoid paying overhead on every pixel if not STBIR_EDGE_ZERO
1226     if (edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info.input_h))
1227     {
1228         for (; x < max_x; x++)
1229             for (c = 0; c < channels; c++)
1230                 decode_buffer[x*channels + c] = 0;
1231         return;
1232     }
1233 
1234     switch (decode)
1235     {
1236     case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1237         for (; x < max_x; x++)
1238         {
1239             int decode_pixel_index = x * channels;
1240             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1241             for (c = 0; c < channels; c++)
1242                 decode_buffer[decode_pixel_index + c] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + c]) / stbir__max_uint8_as_float;
1243         }
1244         break;
1245 
1246     case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
1247         if (channels == 4 && alpha_channel == 3 && !(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1248         {
            // Fast path for RGBA with linear alpha: this avoids one table lookup for alpha,
            // while the table stays the fastest way to convert RGB from sRGB to linear float.
1250             for (; x < max_x; x++)
1251             {
1252                 int decode_pixel_index = x * channels;
1253                 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1254                 for (c = 0; c < 3; c++)
1255                     decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
1256                 ubyte alpha = (cast(const(ubyte)*)input_data)[input_pixel_index + 3];
1257                 decode_buffer[decode_pixel_index + 3] = cast(float)(alpha * 0.00392156862f);
1258             }
1259         }
1260 
1261         for (; x < max_x; x++)
1262         {
1263             int decode_pixel_index = x * channels;
1264             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1265             for (c = 0; c < channels; c++)
1266                 decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
1267 
1268             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1269                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint8_as_float;
1270         }
1271         break;
1272 
1273     case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1274     {
1275         if (channels == 1 && edge_horizontal == STBIR_EDGE_CLAMP)
1276         {
1277             for (; x < max_x; x++)
1278             {
1279                 int decode_pixel_index = x;
1280                 int input_pixel_index = stbir__edge_wrap(STBIR_EDGE_CLAMP, x, input_w) * channels;
1281                 ushort depth = (cast(const(ushort)*)input_data)[input_pixel_index];
1282                 decode_buffer[decode_pixel_index] = depth / stbir__max_uint16_as_float;
1283             }
1284         }
1285         else if (channels == 4 && edge_horizontal == STBIR_EDGE_CLAMP)
1286         {
1287             __m128i zero = _mm_setzero_si128();
1288             __m128 normalizingFactor = _mm_set1_ps(1 / 65535.0f);
1289 
1290             for (; x < max_x; x++)
1291             {
1292                 int decode_pixel_index = x * channels;
1293                 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1294 
1295                 // load four values at once
1296                 __m128i mmPixel = _mm_loadu_si64( (cast(const(ushort)*)input_data) + input_pixel_index );
1297                 mmPixel = _mm_unpacklo_epi16(mmPixel, zero); // convert to 32-bit
1298                 __m128 fPixel = _mm_cvtepi32_ps(mmPixel) * normalizingFactor;
1299                 _mm_storeu_ps(&decode_buffer[decode_pixel_index], fPixel);
1300             }
1301         }
1302         else
1303         {
1304             for (; x < max_x; x++)
1305             {
1306                 int decode_pixel_index = x * channels;
1307                 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1308                 for (c = 0; c < channels; c++)
1309                 {
1310                     ushort depth = (cast(const(ushort)*)input_data)[input_pixel_index + c];
1311                     decode_buffer[decode_pixel_index + c] = depth / stbir__max_uint16_as_float;
1312                 }
1313             }
1314         }
1315         break;
1316     }
1317 
1318     case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1319         for (; x < max_x; x++)
1320         {
1321             int decode_pixel_index = x * channels;
1322             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1323             for (c = 0; c < channels; c++)
1324                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float);
1325 
1326             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1327                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint16_as_float;
1328         }
1329         break;
1330 
1331     case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1332         for (; x < max_x; x++)
1333         {
1334             int decode_pixel_index = x * channels;
1335             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1336             for (c = 0; c < channels; c++)
1337                 decode_buffer[decode_pixel_index + c] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float);
1338         }
1339         break;
1340 
1341     case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1342         for (; x < max_x; x++)
1343         {
1344             int decode_pixel_index = x * channels;
1345             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1346             for (c = 0; c < channels; c++)
1347                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float));
1348 
1349             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1350                 decode_buffer[decode_pixel_index + alpha_channel] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint32_as_float);
1351         }
1352         break;
1353 
1354     case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1355         for (; x < max_x; x++)
1356         {
1357             int decode_pixel_index = x * channels;
1358             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1359             for (c = 0; c < channels; c++)
1360                 decode_buffer[decode_pixel_index + c] = (cast(const(float)*)input_data)[input_pixel_index + c];
1361         }
1362         break;
1363 
1364     case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1365         for (; x < max_x; x++)
1366         {
1367             int decode_pixel_index = x * channels;
1368             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1369             for (c = 0; c < channels; c++)
1370                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(const(float)*)input_data)[input_pixel_index + c]);
1371 
1372             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1373                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(const(float)*)input_data)[input_pixel_index + alpha_channel];
1374         }
1375 
1376         break;
1377 
1378     default:
        assert(false, "Unknown type/colorspace/channels combination.");
1381     }
1382 
1383     if (!(stbir_info.flags & STBIR_FLAG_ALPHA_PREMULTIPLIED))
1384     {
1385         for (x = -stbir_info.horizontal_filter_pixel_margin; x < max_x; x++)
1386         {
1387             int decode_pixel_index = x * channels;
1388 
1389             // If the alpha value is 0 it will clobber the color values. Make sure it's not.
1390             float alpha = decode_buffer[decode_pixel_index + alpha_channel];
1391 
1392             version(STBIR_NO_ALPHA_EPSILON)
1393             {}
1394             else
1395             {
1396                 if (stbir_info.type != STBIR_TYPE_FLOAT) {
1397                     alpha += STBIR_ALPHA_EPSILON;
1398                     decode_buffer[decode_pixel_index + alpha_channel] = alpha;
1399                 }
1400             }
1401 
1402             for (c = 0; c < channels; c++)
1403             {
1404                 if (c == alpha_channel)
1405                     continue;
1406 
1407                 decode_buffer[decode_pixel_index + c] *= alpha;
1408             }
1409         }
1410     }
1411 
1412     if (edge_horizontal == STBIR_EDGE_ZERO)
1413     {
1414         for (x = -stbir_info.horizontal_filter_pixel_margin; x < 0; x++)
1415         {
1416             for (c = 0; c < channels; c++)
1417                 decode_buffer[x*channels + c] = 0;
1418         }
1419         for (x = input_w; x < max_x; x++)
1420         {
1421             for (c = 0; c < channels; c++)
1422                 decode_buffer[x*channels + c] = 0;
1423         }
1424     }
1425 }
1426 
1427 static float* stbir__get_ring_buffer_entry(float* ring_buffer, int index, int ring_buffer_length)
1428 {
1429     return &ring_buffer[index * ring_buffer_length];
1430 }
1431 
1432 static float* stbir__add_empty_ring_buffer_entry(stbir__info* stbir_info, int n)
1433 {
1434     int ring_buffer_index;
1435     float* ring_buffer;
1436 
1437     stbir_info.ring_buffer_last_scanline = n;
1438 
1439     if (stbir_info.ring_buffer_begin_index < 0)
1440     {
1441         ring_buffer_index = stbir_info.ring_buffer_begin_index = 0;
1442         stbir_info.ring_buffer_first_scanline = n;
1443     }
1444     else
1445     {
1446         ring_buffer_index = (stbir_info.ring_buffer_begin_index + (stbir_info.ring_buffer_last_scanline - stbir_info.ring_buffer_first_scanline)) % stbir_info.ring_buffer_num_entries;
1447         assert(ring_buffer_index != stbir_info.ring_buffer_begin_index);
1448     }
1449 
1450     ring_buffer = stbir__get_ring_buffer_entry(stbir_info.ring_buffer, ring_buffer_index, stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof));
1451     memset(ring_buffer, 0, stbir_info.ring_buffer_length_bytes);
1452 
1453     return ring_buffer;
1454 }
1455 
1456 
1457 static void stbir__resample_horizontal_upsample(stbir__info* stbir_info, float* output_buffer)
1458 {
1459     int x, k;
1460     int output_w = stbir_info.output_w;
1461     int channels = stbir_info.channels;
1462     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1463     stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1464     float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1465     int coefficient_width = stbir_info.horizontal_coefficient_width;
1466 
1467     for (x = 0; x < output_w; x++)
1468     {
1469         int n0 = horizontal_contributors[x].n0;
1470         int n1 = horizontal_contributors[x].n1;
1471 
1472         int out_pixel_index = x * channels;
1473         int coefficient_group = coefficient_width * x;
1474         int coefficient_counter = 0;
1475 
1476         assert(n1 >= n0);
1477         assert(n0 >= -stbir_info.horizontal_filter_pixel_margin);
1478         assert(n1 >= -stbir_info.horizontal_filter_pixel_margin);
1479         assert(n0 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1480         assert(n1 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1481 
1482         switch (channels) {
1483             case 1:
1484                 for (k = n0; k <= n1; k++)
1485                 {
1486                     int in_pixel_index = k * 1;
1487                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1488                     //assert(coefficient != 0);
1489                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1490                 }
1491                 break;
1492             case 2:
1493                 for (k = n0; k <= n1; k++)
1494                 {
1495                     int in_pixel_index = k * 2;
1496                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1497                     //assert(coefficient != 0);
1498                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1499                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1500                 }
1501                 break;
1502             case 3:
1503                 for (k = n0; k <= n1; k++)
1504                 {
1505                     int in_pixel_index = k * 3;
1506                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1507                     //assert(coefficient != 0);
1508                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1509                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1510                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1511                 }
1512                 break;
1513             case 4:
1514                 for (k = n0; k <= n1; k++)
1515                 {
1516                     int in_pixel_index = k * 4;
1517                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1518                     //assert(coefficient != 0);
1519                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1520                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1521                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1522                     output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1523                 }
1524                 break;
1525             default:
1526                 for (k = n0; k <= n1; k++)
1527                 {
1528                     int in_pixel_index = k * channels;
1529                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1530                     int c;
1531                     //assert(coefficient != 0);
1532                     for (c = 0; c < channels; c++)
1533                         output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1534                 }
1535                 break;
1536         }
1537     }
1538 }
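
// Illustration only, not part of the original port: a minimal single-channel
// sketch of the "gather" form used by stbir__resample_horizontal_upsample above.
// Each output pixel x sums input pixels n0[x]..n1[x], weighted by the
// coefficients stored in that pixel's coefficient group. The helper name
// stbirExampleGather and the hand-picked linear-blend coefficients are
// assumptions made for this example; the real coefficients come from the
// filter setup code elsewhere in this file.
version(unittest)
private void stbirExampleGather(const(float)* input, const(int)* n0, const(int)* n1,
                                const(float)* coeffs, int coeffWidth,
                                float* output, int outputW)
{
    for (int x = 0; x < outputW; x++)
    {
        int counter = 0;
        output[x] = 0;
        for (int k = n0[x]; k <= n1[x]; k++)
            output[x] += input[k] * coeffs[coeffWidth * x + counter++];
    }
}

unittest
{
    float[2] input = [1.0f, 3.0f];
    int[3] n0 = [0, 0, 1];
    int[3] n1 = [0, 1, 1];
    // Two coefficient slots per output pixel; unused slots stay zero.
    float[6] coeffs = [1.0f, 0.0f,  0.5f, 0.5f,  1.0f, 0.0f];
    float[3] output;

    stbirExampleGather(input.ptr, n0.ptr, n1.ptr, coeffs.ptr, 2, output.ptr, 3);

    assert(output[0] == 1.0f);
    assert(output[1] == 2.0f); // 0.5 * 1 + 0.5 * 3
    assert(output[2] == 3.0f);
}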
1539 
1540 static void stbir__resample_horizontal_downsample(stbir__info* stbir_info, float* output_buffer)
1541 {
1542     int x, k;
1543     int input_w = stbir_info.input_w;
1544     int channels = stbir_info.channels;
1545     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1546     stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1547     float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1548     int coefficient_width = stbir_info.horizontal_coefficient_width;
1549     int filter_pixel_margin = stbir_info.horizontal_filter_pixel_margin;
1550     int max_x = input_w + filter_pixel_margin * 2;
1551 
1552     assert(!stbir__use_width_upsampling(stbir_info));
1553 
1554     switch (channels) {
1555         case 1:
1556             for (x = 0; x < max_x; x++)
1557             {
1558                 int n0 = horizontal_contributors[x].n0;
1559                 int n1 = horizontal_contributors[x].n1;
1560 
1561                 int in_x = x - filter_pixel_margin;
1562                 int in_pixel_index = in_x * 1;
1563                 int max_n = n1;
1564                 int coefficient_group = coefficient_width * x;
1565 
1566                 for (k = n0; k <= max_n; k++)
1567                 {
1568                     int out_pixel_index = k * 1;
1569                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1570                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1571                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1572                 }
1573             }
1574             break;
1575 
1576         case 2:
1577             for (x = 0; x < max_x; x++)
1578             {
1579                 int n0 = horizontal_contributors[x].n0;
1580                 int n1 = horizontal_contributors[x].n1;
1581 
1582                 int in_x = x - filter_pixel_margin;
1583                 int in_pixel_index = in_x * 2;
1584                 int max_n = n1;
1585                 int coefficient_group = coefficient_width * x;
1586 
1587                 for (k = n0; k <= max_n; k++)
1588                 {
1589                     int out_pixel_index = k * 2;
1590                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1591                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1592                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1593                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1594                 }
1595             }
1596             break;
1597 
1598         case 3:
1599             for (x = 0; x < max_x; x++)
1600             {
1601                 int n0 = horizontal_contributors[x].n0;
1602                 int n1 = horizontal_contributors[x].n1;
1603 
1604                 int in_x = x - filter_pixel_margin;
1605                 int in_pixel_index = in_x * 3;
1606                 int max_n = n1;
1607                 int coefficient_group = coefficient_width * x;
1608 
1609                 for (k = n0; k <= max_n; k++)
1610                 {
1611                     int out_pixel_index = k * 3;
1612                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1613                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1614                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1615                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1616                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1617                 }
1618             }
1619             break;
1620 
1621         case 4:
1622             for (x = 0; x < max_x; x++)
1623             {
1624                 int n0 = horizontal_contributors[x].n0;
1625                 int n1 = horizontal_contributors[x].n1;
1626 
1627                 int in_x = x - filter_pixel_margin;
1628                 int in_pixel_index = in_x * 4;
1629                 int max_n = n1;
1630                 int coefficient_group = coefficient_width * x;
1631 
1632                 for (k = n0; k <= max_n; k++)
1633                 {
1634                     int out_pixel_index = k * 4;
1635                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1636                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1637 
1638                     version(DigitalMars)
1639                     {
1640                         output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1641                         output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1642                         output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1643                         output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1644                     }
1645                     else
1646                     {
1647                         __m128 A = _mm_loadu_ps(&decode_buffer[in_pixel_index]);
1648                         __m128 B = _mm_loadu_ps(&output_buffer[out_pixel_index]);
1649                         B = B + A * _mm_set1_ps(coefficient);
1650                         _mm_storeu_ps(&output_buffer[out_pixel_index], B);
1651                     }
1652                 }
1653             }
1654             break;
1655 
1656         default:
1657             for (x = 0; x < max_x; x++)
1658             {
1659                 int n0 = horizontal_contributors[x].n0;
1660                 int n1 = horizontal_contributors[x].n1;
1661 
1662                 int in_x = x - filter_pixel_margin;
1663                 int in_pixel_index = in_x * channels;
1664                 int max_n = n1;
1665                 int coefficient_group = coefficient_width * x;
1666 
1667                 for (k = n0; k <= max_n; k++)
1668                 {
1669                     int c;
1670                     int out_pixel_index = k * channels;
1671                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1672                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1673                     for (c = 0; c < channels; c++)
1674                         output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1675                 }
1676             }
1677             break;
1678     }
1679 }
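
// Illustration only, not part of the original port: a minimal single-channel
// sketch of the "scatter" form used by stbir__resample_horizontal_downsample
// above. Here the loop runs over input pixels, and each input pixel x adds its
// weighted value into output pixels n0[x]..n1[x]. The helper name
// stbirExampleScatter is hypothetical, and the sketch assumes a filter pixel
// margin of zero, so it omits the in_x = x - filter_pixel_margin offset. The
// output must be zeroed by the caller, as stbir__decode_and_resample_downsample
// does with memset.
version(unittest)
private void stbirExampleScatter(const(float)* input, int inputW,
                                 const(int)* n0, const(int)* n1,
                                 const(float)* coeffs, int coeffWidth,
                                 float* output)
{
    for (int x = 0; x < inputW; x++)
        for (int k = n0[x]; k <= n1[x]; k++)
            output[k] += input[x] * coeffs[coeffWidth * x + k - n0[x]];
}

unittest
{
    // 4 input pixels averaged pairwise into 2 output pixels (box filter).
    float[4] input = [2.0f, 4.0f, 6.0f, 8.0f];
    int[4] n0 = [0, 0, 1, 1];
    int[4] n1 = [0, 0, 1, 1];
    float[4] coeffs = [0.5f, 0.5f, 0.5f, 0.5f];
    float[2] output = [0.0f, 0.0f];

    stbirExampleScatter(input.ptr, 4, n0.ptr, n1.ptr, coeffs.ptr, 1, output.ptr);

    assert(output[0] == 3.0f);
    assert(output[1] == 7.0f);
}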
1680 
1681 static void stbir__decode_and_resample_upsample(stbir__info* stbir_info, int n)
1682 {
1683     // Decode the nth scanline from the source image into the decode buffer.
1684     stbir__decode_scanline(stbir_info, n);
1685 
1686     // Now resample it into the ring buffer.
1687     if (stbir__use_width_upsampling(stbir_info))
1688         stbir__resample_horizontal_upsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1689     else
1690         stbir__resample_horizontal_downsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1691 
1692     // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
1693 }
1694 
1695 static void stbir__decode_and_resample_downsample(stbir__info* stbir_info, int n)
1696 {
1697     // Decode the nth scanline from the source image into the decode buffer.
1698     stbir__decode_scanline(stbir_info, n);
1699 
1700     memset(stbir_info.horizontal_buffer, 0, stbir_info.output_w * stbir_info.channels * float.sizeof);
1701 
1702     // Now resample it into the horizontal buffer.
1703     if (stbir__use_width_upsampling(stbir_info))
1704         stbir__resample_horizontal_upsample(stbir_info, stbir_info.horizontal_buffer);
1705     else
1706         stbir__resample_horizontal_downsample(stbir_info, stbir_info.horizontal_buffer);
1707 
1708     // Now it's sitting in the horizontal buffer ready to be distributed into the ring buffers.
1709 }
1710 
1711 // Get the specified scan line from the ring buffer.
1712 static float* stbir__get_ring_buffer_scanline(int get_scanline, float* ring_buffer, int begin_index, int first_scanline, int ring_buffer_num_entries, int ring_buffer_length)
1713 {
1714     int ring_buffer_index = (begin_index + (get_scanline - first_scanline)) % ring_buffer_num_entries;
1715     return stbir__get_ring_buffer_entry(ring_buffer, ring_buffer_index, ring_buffer_length);
1716 }
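
// Illustration only, not part of the original port: the slot arithmetic of
// stbir__get_ring_buffer_scanline above, isolated so the wrap-around is easy
// to see. The helper name stbirExampleRingIndex is hypothetical; it returns
// only the slot index, not a pointer into the ring buffer.
version(unittest)
private int stbirExampleRingIndex(int scanline, int beginIndex, int firstScanline, int numEntries)
{
    return (beginIndex + (scanline - firstScanline)) % numEntries;
}

unittest
{
    // A ring with 4 slots whose oldest scanline (10) lives in slot 2.
    assert(stbirExampleRingIndex(10, 2, 10, 4) == 2);
    assert(stbirExampleRingIndex(11, 2, 10, 4) == 3);
    assert(stbirExampleRingIndex(12, 2, 10, 4) == 0); // wraps to the start
    assert(stbirExampleRingIndex(13, 2, 10, 4) == 1);
}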
1717 
1718 
1719 static void stbir__encode_scanline(stbir__info* stbir_info, int num_pixels, void *output_buffer, float *encode_buffer, int channels, int alpha_channel, int decode)
1720 {
1721     int x;
1722     int n;
1723     int num_nonalpha;
1724     ushort[STBIR_MAX_CHANNELS] nonalpha;
1725 
1726     if (!(stbir_info.flags&STBIR_FLAG_ALPHA_PREMULTIPLIED))
1727     {
1728         for (x=0; x < num_pixels; ++x)
1729         {
1730             int pixel_index = x*channels;
1731 
1732             float alpha = encode_buffer[pixel_index + alpha_channel];
1733             float reciprocal_alpha = (alpha != 0) ? 1.0f / alpha : 0.0f;
1734 
1735             // unrolling this produced a 1% slowdown upscaling a large RGBA linear-space image on my machine - stb
1736             for (n = 0; n < channels; n++)
1737                 if (n != alpha_channel)
1738                     encode_buffer[pixel_index + n] *= reciprocal_alpha;
1739 
1740             // We added in a small epsilon to prevent the color channel from being deleted with zero alpha.
1741             // Because we only add it for integer types, it will automatically be discarded on integer
1742             // conversion, so we don't need to subtract it back out (which would be problematic for
1743             // numeric precision reasons).
1744         }
1745     }
1746 
1747     // build a table of all channels that need colorspace correction, so
1748     // we don't perform colorspace correction on channels that don't need it.
1749     for (x = 0, num_nonalpha = 0; x < channels; ++x)
1750     {
1751         if (x != alpha_channel || (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1752         {
1753             nonalpha[num_nonalpha++] = cast(ushort)x;
1754         }
1755     }
1756 
1757     static int STBIR__ROUND_INT_f(float f)
1758     {
1759         return cast(int)(f + 0.5f);
1760     }
1761     static int STBIR__ROUND_INT_d(double f)
1762     {
1763         return cast(int)(f + 0.5);
1764     }
1765     static uint STBIR__ROUND_UINT_f(float f)
1766     {
1767         return cast(uint)(f + 0.5f);
1768     }
1769     static uint STBIR__ROUND_UINT_d(double f)
1770     {
1771         return cast(uint)(f + 0.5);
1772     }
1773 
1774     static ubyte STBIR__ENCODE_LINEAR8(float f)
1775     {
1776         return cast(ubyte) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint8_as_float );
1777     }
1778 
1779     static ushort STBIR__ENCODE_LINEAR16(float f)
1780     {
1781         return cast(ushort) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint16_as_float );
1782     }
1783 
1784     switch (decode)
1785     {
1786         case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1787             for (x=0; x < num_pixels; ++x)
1788             {
1789                 int pixel_index = x*channels;
1790 
1791                 for (n = 0; n < channels; n++)
1792                 {
1793                     int index = pixel_index + n;
1794                     (cast(ubyte*)output_buffer)[index] = STBIR__ENCODE_LINEAR8(encode_buffer[index]);
1795                 }
1796             }
1797             break;
1798 
1799         case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
1800         {
1801             // SSE special case for 4 channels with no alpha channel (all four sRGB-encoded), which is slow in the original stb_image_resize.
1802             if (channels == 4 && alpha_channel == -1 && (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1803             {
1804                 for (x = 0; x < num_pixels; ++x)
1805                 {
1808                     __m128 fpixels = _mm_loadu_ps( &encode_buffer[4*x] );
1809                     __m128i fpixels_desrgb = stbir__linear_to_srgb_uchar(fpixels);
1810                     _mm_storeu_si32( (cast(ubyte*)output_buffer) + 4*x, fpixels_desrgb);
1811                 }
1812             }
1813             else
1814             {
1815                 for (x = 0; x < num_pixels; ++x)
1816                 {
1817                     int pixel_index = x*channels;
1818 
1819                     for (n = 0; n < num_nonalpha; n++)
1820                     {
1821                         int index = pixel_index + nonalpha[n];
1822                         (cast(ubyte*)output_buffer)[index] = stbir__linear_to_srgb_uchar(encode_buffer[index]);
1823                     }
1824 
1825                     if (!(stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1826                         (cast(ubyte*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR8(encode_buffer[pixel_index+alpha_channel]);
1827                 }
1828             }
1829             break;
1830         }
1831 
1832         case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1833             for (x=0; x < num_pixels; ++x)
1834             {
1835                 int pixel_index = x*channels;
1836 
1837                 for (n = 0; n < channels; n++)
1838                 {
1839                     int index = pixel_index + n;
1840                     (cast(ushort*)output_buffer)[index] = STBIR__ENCODE_LINEAR16(encode_buffer[index]);
1841                 }
1842             }
1843             break;
1844 
1845         case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1846             for (x=0; x < num_pixels; ++x)
1847             {
1848                 int pixel_index = x*channels;
1849 
1850                 for (n = 0; n < num_nonalpha; n++)
1851                 {
1852                     int index = pixel_index + nonalpha[n];
1853                     (cast(ushort*)output_buffer)[index] = cast(ushort)STBIR__ROUND_INT_f(stbir__linear_to_srgb(stbir__saturate(encode_buffer[index])) * stbir__max_uint16_as_float);
1854                 }
1855 
1856                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1857                     (cast(ushort*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR16(encode_buffer[pixel_index + alpha_channel]);
1858             }
1859 
1860             break;
1861 
1862         case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1863             for (x=0; x < num_pixels; ++x)
1864             {
1865                 int pixel_index = x*channels;
1866 
1867                 for (n = 0; n < channels; n++)
1868                 {
1869                     int index = pixel_index + n;
1870                     (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[index])) * stbir__max_uint32_as_float);
1871                 }
1872             }
1873             break;
1874 
1875         case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1876             for (x=0; x < num_pixels; ++x)
1877             {
1878                 int pixel_index = x*channels;
1879 
1880                 for (n = 0; n < num_nonalpha; n++)
1881                 {
1882                     int index = pixel_index + nonalpha[n];
1883                     (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__linear_to_srgb(stbir__saturate(encode_buffer[index]))) * stbir__max_uint32_as_float);
1884                 }
1885 
1886                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1887                     (cast(uint*)output_buffer)[pixel_index + alpha_channel] = cast(uint) STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[pixel_index + alpha_channel])) * stbir__max_uint32_as_float);
1888             }
1889             break;
1890 
1891         case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1892             for (x=0; x < num_pixels; ++x)
1893             {
1894                 int pixel_index = x*channels;
1895 
1896                 for (n = 0; n < channels; n++)
1897                 {
1898                     int index = pixel_index + n;
1899                     (cast(float*)output_buffer)[index] = encode_buffer[index];
1900                 }
1901             }
1902             break;
1903 
1904         case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1905             for (x=0; x < num_pixels; ++x)
1906             {
1907                 int pixel_index = x*channels;
1908 
1909                 for (n = 0; n < num_nonalpha; n++)
1910                 {
1911                     int index = pixel_index + nonalpha[n];
1912                     (cast(float*)output_buffer)[index] = stbir__linear_to_srgb(encode_buffer[index]);
1913                 }
1914 
1915                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1916                     (cast(float*)output_buffer)[pixel_index + alpha_channel] = encode_buffer[pixel_index + alpha_channel];
1917             }
1918             break;
1919 
1920         default:
1921             assert(false, "Unknown type/colorspace/channels combination.");
1922             break;
1923     }
1924 }
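
// Illustration only, not part of the original port: the two scalar steps that
// stbir__encode_scanline above applies to a linear 8-bit pixel, shown on one
// RGBA value. First the color channels are divided by alpha (un-premultiply),
// then each channel is clamped, scaled, and rounded to the nearest integer.
// The helper names are hypothetical. The sketch assumes that stbir__saturate
// clamps to [0, 1] and that stbir__max_uint8_as_float equals 255.0f; neither
// definition is visible in this part of the file.
version(unittest)
{
    private float stbirExampleSaturate(float x)
    {
        return x < 0 ? 0 : (x > 1 ? 1 : x);
    }

    private ubyte stbirExampleEncodeLinear8(float f)
    {
        // Adding 0.5 before truncation rounds to the nearest integer.
        return cast(ubyte) cast(int)(stbirExampleSaturate(f) * 255.0f + 0.5f);
    }
}

unittest
{
    // Premultiplied RGBA pixel: color 0.5 stored as 0.25 at alpha 0.5.
    float[4] pixel = [0.25f, 0.25f, 0.25f, 0.5f];
    float alpha = pixel[3];
    float reciprocalAlpha = (alpha != 0) ? 1.0f / alpha : 0.0f;
    for (int n = 0; n < 3; n++)
        pixel[n] *= reciprocalAlpha;

    assert(pixel[0] == 0.5f);
    assert(stbirExampleEncodeLinear8(pixel[0]) == 128); // 127.5 rounds up
    assert(stbirExampleEncodeLinear8(1.5f) == 255);     // clamped high
    assert(stbirExampleEncodeLinear8(-0.25f) == 0);     // clamped low
}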
1925 
1926 static void stbir__resample_vertical_upsample(stbir__info* stbir_info, int n)
1927 {
1928     int x, k;
1929     int output_w = stbir_info.output_w;
1930     stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
1931     float* vertical_coefficients = stbir_info.vertical_coefficients;
1932     int channels = stbir_info.channels;
1933     int alpha_channel = stbir_info.alpha_channel;
1934     int type = stbir_info.type;
1935     int colorspace = stbir_info.colorspace;
1936     int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
1937     void* output_data = stbir_info.output_data;
1938     float* encode_buffer = stbir_info.encode_buffer;
1939     int decode = STBIR__DECODE(type, colorspace);
1940     int coefficient_width = stbir_info.vertical_coefficient_width;
1941     int coefficient_counter;
1942     int contributor = n;
1943 
1944     float* ring_buffer = stbir_info.ring_buffer;
1945     int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
1946     int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
1947     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
1948 
1949     int n0,n1, output_row_start;
1950     int coefficient_group = coefficient_width * contributor;
1951 
1952     n0 = vertical_contributors[contributor].n0;
1953     n1 = vertical_contributors[contributor].n1;
1954 
1955     output_row_start = n * stbir_info.output_stride_bytes;
1956 
1957     assert(stbir__use_height_upsampling(stbir_info));
1958 
1959     memset(encode_buffer, 0, output_w * float.sizeof * channels);
1960 
1961     // I tried reblocking this for better cache usage of encode_buffer
1962     // (using x_outer, k, x_inner), but it lost speed. -- stb
1963 
1964     coefficient_counter = 0;
1965     switch (channels) {
1966         case 1:
1967             for (k = n0; k <= n1; k++)
1968             {
1969                 int coefficient_index = coefficient_counter++;
1970                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1971                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1972                 for (x = 0; x < output_w; ++x)
1973                 {
1974                     int in_pixel_index = x * 1;
1975                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1976                 }
1977             }
1978             break;
1979         case 2:
1980             for (k = n0; k <= n1; k++)
1981             {
1982                 int coefficient_index = coefficient_counter++;
1983                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1984                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1985                 for (x = 0; x < output_w; ++x)
1986                 {
1987                     int in_pixel_index = x * 2;
1988                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1989                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
1990                 }
1991             }
1992             break;
1993         case 3:
1994             for (k = n0; k <= n1; k++)
1995             {
1996                 int coefficient_index = coefficient_counter++;
1997                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1998                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1999                 for (x = 0; x < output_w; ++x)
2000                 {
2001                     int in_pixel_index = x * 3;
2002                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
2003                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
2004                     encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
2005                 }
2006             }
2007             break;
2008         case 4:
2009             for (k = n0; k <= n1; k++)
2010             {
2011                 int coefficient_index = coefficient_counter++;
2012                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
2013                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
2014                 for (x = 0; x < output_w; ++x)
2015                 {
2016                     int in_pixel_index = x * 4;
2017                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
2018                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
2019                     encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
2020                     encode_buffer[in_pixel_index + 3] += ring_buffer_entry[in_pixel_index + 3] * coefficient;
2021                 }
2022             }
2023             break;
2024         default:
2025             for (k = n0; k <= n1; k++)
2026             {
2027                 int coefficient_index = coefficient_counter++;
2028                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
2029                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
2030                 for (x = 0; x < output_w; ++x)
2031                 {
2032                     int in_pixel_index = x * channels;
2033                     int c;
2034                     for (c = 0; c < channels; c++)
2035                         encode_buffer[in_pixel_index + c] += ring_buffer_entry[in_pixel_index + c] * coefficient;
2036                 }
2037             }
2038             break;
2039     }
2040     stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, encode_buffer, channels, alpha_channel, decode);
2041 }
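
// Illustration only, not part of the original port: the core of
// stbir__resample_vertical_upsample above, reduced to plain arrays. One output
// row is the coefficient-weighted sum of several already-resampled input rows.
// The helper name stbirExampleBlendRows is hypothetical, and the rows are
// stored back to back instead of being looked up through the ring buffer.
version(unittest)
private void stbirExampleBlendRows(const(float)* rows, int rowLength, int numRows,
                                   const(float)* coeffs, float* encode)
{
    for (int x = 0; x < rowLength; x++)
        encode[x] = 0;

    for (int k = 0; k < numRows; k++)
    {
        const(float)* row = rows + k * rowLength;
        for (int x = 0; x < rowLength; x++)
            encode[x] += row[x] * coeffs[k];
    }
}

unittest
{
    // Two rows of two floats each, blended 25% / 75%.
    float[4] rows = [1.0f, 2.0f,  3.0f, 4.0f];
    float[2] coeffs = [0.25f, 0.75f];
    float[2] encode;

    stbirExampleBlendRows(rows.ptr, 2, 2, coeffs.ptr, encode.ptr);

    assert(encode[0] == 2.5f); // 0.25 * 1 + 0.75 * 3
    assert(encode[1] == 3.5f); // 0.25 * 2 + 0.75 * 4
}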
2042 
2043 static void stbir__resample_vertical_downsample(stbir__info* stbir_info, int n)
2044 {
2045     int x, k;
2046     int output_w = stbir_info.output_w;
2047     stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
2048     float* vertical_coefficients = stbir_info.vertical_coefficients;
2049     int channels = stbir_info.channels;
2050     int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
2051     float* horizontal_buffer = stbir_info.horizontal_buffer;
2052     int coefficient_width = stbir_info.vertical_coefficient_width;
2053     int contributor = n + stbir_info.vertical_filter_pixel_margin;
2054 
2055     float* ring_buffer = stbir_info.ring_buffer;
2056     int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
2057     int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
2058     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
2059     int n0,n1;
2060 
2061     n0 = vertical_contributors[contributor].n0;
2062     n1 = vertical_contributors[contributor].n1;
2063 
2064     assert(!stbir__use_height_upsampling(stbir_info));
2065 
2066     for (k = n0; k <= n1; k++)
2067     {
2068         int coefficient_index = k - n0;
2069         int coefficient_group = coefficient_width * contributor;
2070         float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
2071 
2072         float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
2073 
2074         switch (channels) {
2075             case 1:
2076                 for (x = 0; x < output_w; x++)
2077                 {
2078                     int in_pixel_index = x * 1;
2079                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
2080                 }
2081                 break;
2082             case 2:
2083                 for (x = 0; x < output_w; x++)
2084                 {
2085                     int in_pixel_index = x * 2;
2086                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
2087                     ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
2088                 }
2089                 break;
2090             case 3:
2091                 for (x = 0; x < output_w; x++)
2092                 {
2093                     int in_pixel_index = x * 3;
2094                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
2095                     ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
2096                     ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
2097                 }
2098                 break;
2099             case 4:
2100 
2101                 __m128 vCoefficients = _mm_set1_ps(coefficient);
2102 
2103                 for (x = 0; x < output_w; x++)
2104                 {
2105                     int in_pixel_index = x * 4;
2106                     __m128 A = _mm_loadu_ps(&horizontal_buffer[in_pixel_index]);
2107                     __m128 B = _mm_loadu_ps(&ring_buffer_entry[in_pixel_index]);
2108                     _mm_storeu_ps( &ring_buffer_entry[in_pixel_index], B + A * vCoefficients);
2109                 }
2110                 break;
2111             default:
2112                 for (x = 0; x < output_w; x++)
2113                 {
2114                     int in_pixel_index = x * channels;
2115 
2116                     int c;
2117                     for (c = 0; c < channels; c++)
2118                         ring_buffer_entry[in_pixel_index + c] += horizontal_buffer[in_pixel_index + c] * coefficient;
2119                 }
2120                 break;
2121         }
2122     }
2123 }
2124 
2125 static void stbir__buffer_loop_upsample(stbir__info* stbir_info)
2126 {
2127     int y;
2128     float scale_ratio = stbir_info.vertical_scale;
2129     float out_scanlines_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(1/scale_ratio) * scale_ratio;
2130 
2131     assert(stbir__use_height_upsampling(stbir_info));
2132 
2133     for (y = 0; y < stbir_info.output_h; y++)
2134     {
2135         float in_center_of_out = 0; // Center of the current out scanline in the in scanline space
2136         int in_first_scanline = 0, in_last_scanline = 0;
2137 
2138         stbir__calculate_sample_range_upsample(y, out_scanlines_radius, scale_ratio, stbir_info.vertical_shift, &in_first_scanline, &in_last_scanline, &in_center_of_out);
2139 
2140         assert(in_last_scanline - in_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);
2141 
2142         if (stbir_info.ring_buffer_begin_index >= 0)
2143         {
2144             // Get rid of whatever we don't need anymore.
2145             while (in_first_scanline > stbir_info.ring_buffer_first_scanline)
2146             {
2147                 if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
2148                 {
2149                     // We just popped the last scanline off the ring buffer.
2150                     // Reset it to the empty state.
2151                     stbir_info.ring_buffer_begin_index = -1;
2152                     stbir_info.ring_buffer_first_scanline = 0;
2153                     stbir_info.ring_buffer_last_scanline = 0;
2154                     break;
2155                 }
2156                 else
2157                 {
2158                     stbir_info.ring_buffer_first_scanline++;
2159                     stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
2160                 }
2161             }
2162         }
2163 
2164         // Load in new ones.
2165         if (stbir_info.ring_buffer_begin_index < 0)
2166             stbir__decode_and_resample_upsample(stbir_info, in_first_scanline);
2167 
2168         while (in_last_scanline > stbir_info.ring_buffer_last_scanline)
2169             stbir__decode_and_resample_upsample(stbir_info, stbir_info.ring_buffer_last_scanline + 1);
2170 
2171         // Now all buffers should be ready to write a row of vertical sampling.
2172         stbir__resample_vertical_upsample(stbir_info, y);
2173     }
2174 }
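
/*
   WORKED TRACE (a sketch, not part of the original source; values are hypothetical)
      How the pop step in the loop above advances the ring buffer. Take
      ring_buffer_num_entries = 5, ring_buffer_begin_index = 3, and buffered
      scanlines 10..13. If the next output row needs in_first_scanline = 12:

         // pop scanline 10
         ring_buffer_first_scanline = 11;
         ring_buffer_begin_index    = (3 + 1) % 5;   // 4
         // pop scanline 11
         ring_buffer_first_scanline = 12;
         ring_buffer_begin_index    = (4 + 1) % 5;   // 0, wrapped around

      The loop then stops because in_first_scanline is no longer greater than
      ring_buffer_first_scanline. No pixel data moves; only the start index does.
*/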
2175 
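// Pops every ring buffer scanline that precedes first_necessary_scanline.
// Each popped scanline that lies inside the output image is encoded and
// written to output_data first. Used by the downsampling loop.
static void stbir__empty_ring_buffer(stbir__info* stbir_info, int first_necessary_scanline)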
2177 {
2178     int output_stride_bytes = stbir_info.output_stride_bytes;
2179     int channels = stbir_info.channels;
2180     int alpha_channel = stbir_info.alpha_channel;
2181     int type = stbir_info.type;
2182     int colorspace = stbir_info.colorspace;
2183     int output_w = stbir_info.output_w;
2184     void* output_data = stbir_info.output_data;
2185     int decode = STBIR__DECODE(type, colorspace);
2186 
2187     float* ring_buffer = stbir_info.ring_buffer;
2188     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
2189 
2190     if (stbir_info.ring_buffer_begin_index >= 0)
2191     {
2192         // Get rid of whatever we don't need anymore.
2193         while (first_necessary_scanline > stbir_info.ring_buffer_first_scanline)
2194         {
2195             if (stbir_info.ring_buffer_first_scanline >= 0 && stbir_info.ring_buffer_first_scanline < stbir_info.output_h)
2196             {
2197                 int output_row_start = stbir_info.ring_buffer_first_scanline * output_stride_bytes;
2198                 float* ring_buffer_entry = stbir__get_ring_buffer_entry(ring_buffer, stbir_info.ring_buffer_begin_index, ring_buffer_length);
2199                 stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, ring_buffer_entry, channels, alpha_channel, decode);
2200             }
2201 
2202             if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
2203             {
2204                 // We just popped the last scanline off the ring buffer.
2205                 // Reset it to the empty state.
2206                 stbir_info.ring_buffer_begin_index = -1;
2207                 stbir_info.ring_buffer_first_scanline = 0;
2208                 stbir_info.ring_buffer_last_scanline = 0;
2209                 break;
2210             }
2211             else
2212             {
2213                 stbir_info.ring_buffer_first_scanline++;
2214                 stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
2215             }
2216         }
2217     }
2218 }
2219 
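// Driver for vertical downsampling: each input scanline (including the filter
// margin above and below the image) is resampled horizontally and accumulated
// into the output scanlines held in the ring buffer. Finished output scanlines
// are encoded and written out by stbir__empty_ring_buffer.
static void stbir__buffer_loop_downsample(stbir__info* stbir_info)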
2221 {
2222     int y;
2223     float scale_ratio = stbir_info.vertical_scale;
2224     int output_h = stbir_info.output_h;
2225     float in_pixels_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(scale_ratio) / scale_ratio;
2226     int pixel_margin = stbir_info.vertical_filter_pixel_margin;
2227     int max_y = stbir_info.input_h + pixel_margin;
2228 
2229     assert(!stbir__use_height_upsampling(stbir_info));
2230 
2231     for (y = -pixel_margin; y < max_y; y++)
2232     {
        float out_center_of_in; // Center of the current in scanline in the out scanline space
2234         int out_first_scanline, out_last_scanline;
2235 
2236         stbir__calculate_sample_range_downsample(y, in_pixels_radius, scale_ratio, stbir_info.vertical_shift, &out_first_scanline, &out_last_scanline, &out_center_of_in);
2237 
2238         assert(out_last_scanline - out_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);
2239 
2240         if (out_last_scanline < 0 || out_first_scanline >= output_h)
2241             continue;
2242 
2243         stbir__empty_ring_buffer(stbir_info, out_first_scanline);
2244 
2245         stbir__decode_and_resample_downsample(stbir_info, y);
2246 
2247         // Load in new ones.
2248         if (stbir_info.ring_buffer_begin_index < 0)
2249             stbir__add_empty_ring_buffer_entry(stbir_info, out_first_scanline);
2250 
2251         while (out_last_scanline > stbir_info.ring_buffer_last_scanline)
2252             stbir__add_empty_ring_buffer_entry(stbir_info, stbir_info.ring_buffer_last_scanline + 1);
2253 
2254         // Now the horizontal buffer is ready to write to all ring buffer rows.
2255         stbir__resample_vertical_downsample(stbir_info, y);
2256     }
2257 
2258     stbir__empty_ring_buffer(stbir_info, stbir_info.output_h);
2259 }
2260 
2261 static void stbir__setup(stbir__info *info, int input_w, int input_h, int output_w, int output_h, int channels)
2262 {
2263     info.input_w = input_w;
2264     info.input_h = input_h;
2265     info.output_w = output_w;
2266     info.output_h = output_h;
2267     info.channels = channels;
2268 }
2269 
2270 static void stbir__calculate_transform(stbir__info *info, float s0, float t0, float s1, float t1, float *transform)
2271 {
2272     info.s0 = s0;
2273     info.t0 = t0;
2274     info.s1 = s1;
2275     info.t1 = t1;
2276 
2277     if (transform)
2278     {
2279         info.horizontal_scale = transform[0];
2280         info.vertical_scale   = transform[1];
2281         info.horizontal_shift = transform[2];
2282         info.vertical_shift   = transform[3];
2283     }
2284     else
2285     {
2286         info.horizontal_scale = (cast(float)info.output_w / info.input_w) / (s1 - s0);
2287         info.vertical_scale = (cast(float)info.output_h / info.input_h) / (t1 - t0);
2288 
2289         info.horizontal_shift = s0 * info.output_w / (s1 - s0);
2290         info.vertical_shift = t0 * info.output_h / (t1 - t0);
2291     }
2292 }
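
/*
   WORKED EXAMPLE (a sketch, not part of the original source)
      The arithmetic of the non-transform branch above, traced by hand for a
      hypothetical region resize: a 100x100 input, a 50x50 output, and the
      region (s0,t0)-(s1,t1) = (0.25,0.25)-(0.75,0.75).

         horizontal_scale = (50.0f / 100) / (0.75f - 0.25f);   // 0.5  / 0.5 = 1.0
         horizontal_shift = 0.25f * 50    / (0.75f - 0.25f);   // 12.5 / 0.5 = 25.0

      The vertical values are identical here. Assuming the sample-range helpers
      (defined outside this section) map an output coordinate to input space as
      (out + shift) / scale, the center of output pixel 0 lands at
      (0.5 + 25) / 1 = 25.5, which is the first input pixel inside the region
      (the region starts at 0.25 * 100 = 25).
*/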
2293 
2294 static void stbir__choose_filter(stbir__info *info, stbir_filter h_filter, stbir_filter v_filter)
2295 {
2296     if (h_filter == 0)
2297         h_filter = stbir__use_upsampling(info.horizontal_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
2298     if (v_filter == 0)
2299         v_filter = stbir__use_upsampling(info.vertical_scale)   ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
2300     info.horizontal_filter = h_filter;
2301     info.vertical_filter = v_filter;
2302 }
2303 
2304 static uint stbir__calculate_memory(stbir__info *info)
2305 {
2306     int pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
2307     int filter_height = stbir__get_filter_pixel_width(info.vertical_filter, info.vertical_scale);
2308 
2309     info.horizontal_num_contributors = stbir__get_contributors(info.horizontal_scale, info.horizontal_filter, info.input_w, info.output_w);
2310     info.vertical_num_contributors   = stbir__get_contributors(info.vertical_scale  , info.vertical_filter  , info.input_h, info.output_h);
2311 
2312     // One extra entry because floating point precision problems sometimes cause an extra to be necessary.
2313     info.ring_buffer_num_entries = filter_height + 1;
2314 
2315     info.horizontal_contributors_size = info.horizontal_num_contributors                  * cast(int)(stbir__contributors.sizeof);
2316     info.horizontal_coefficients_size = stbir__get_total_horizontal_coefficients(info)    * cast(int)(float.sizeof);
2317     info.vertical_contributors_size   = info.vertical_num_contributors                    * cast(int)(stbir__contributors.sizeof);
2318     info.vertical_coefficients_size   = stbir__get_total_vertical_coefficients(info)      * cast(int)(float.sizeof);
2319     info.decode_buffer_size           = (info.input_w + pixel_margin * 2) * info.channels * cast(int)(float.sizeof);
2320     info.horizontal_buffer_size       = info.output_w * info.channels                     * cast(int)(float.sizeof);
2321     info.ring_buffer_size             = info.output_w * info.channels                     * info.ring_buffer_num_entries * cast(int)(float.sizeof);
2322     info.encode_buffer_size           = info.output_w * info.channels                     * cast(int)(float.sizeof);
2323 
    assert(info.horizontal_filter != 0);
    assert(info.horizontal_filter < stbir__filter_info_table.length); // too late to guard: the filter was already used above to size the buffers
    assert(info.vertical_filter != 0);
    assert(info.vertical_filter < stbir__filter_info_table.length); // too late to guard, for the same reason
2328 
2329     if (stbir__use_height_upsampling(info))
2330         // The horizontal buffer is for when we're downsampling the height and we
2331         // can't output the result of sampling the decode buffer directly into the
2332         // ring buffers.
2333         info.horizontal_buffer_size = 0;
2334     else
2335         // The encode buffer is to retain precision in the height upsampling method
2336         // and isn't used when height downsampling.
2337         info.encode_buffer_size = 0;
2338 
2339     return info.horizontal_contributors_size + info.horizontal_coefficients_size
2340         + info.vertical_contributors_size + info.vertical_coefficients_size
2341         + info.decode_buffer_size + info.horizontal_buffer_size
2342         + info.ring_buffer_size + info.encode_buffer_size;
2343 }
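
/*
   WORKED EXAMPLE (a sketch, not part of the original source)
      Scratch buffer sizes for a hypothetical 512x512 -> 256x256 resize with
      4 channels. The filter-dependent inputs are made up for illustration:
      pixel_margin = 2 and filter_height = 4. The real values come from
      stbir__get_filter_pixel_margin and stbir__get_filter_pixel_width.
      float.sizeof is 4.

         ring_buffer_num_entries = 4 + 1;                   // 5
         decode_buffer_size      = (512 + 2 * 2) * 4 * 4;   // 8256 bytes
         horizontal_buffer_size  = 256 * 4 * 4;             // 4096 bytes
         ring_buffer_size        = 256 * 4 * 5 * 4;         // 20480 bytes
         encode_buffer_size      = 0;                       // unused, see below

      This assumes a vertical scale of 0.5 takes the downsampling branch, so the
      encode buffer is dropped and the horizontal buffer is kept. The contributor
      and coefficient tables are added on top of these four buffers.
*/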
2344 
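// Validates the arguments, carves tempmem into the scratch buffers sized by
// stbir__calculate_memory, builds the filter tables, and runs the upsampling
// or downsampling loop. Returns 1 on success and 0 if validation fails.
static int stbir__resize_allocated(stbir__info *info,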
2346     const void* input_data, int input_stride_in_bytes,
2347     void* output_data, int output_stride_in_bytes,
2348     int alpha_channel, uint flags, stbir_datatype type,
2349     stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace,
2350     void* tempmem, size_t tempmem_size_in_bytes)
2351 {
2352     size_t memory_required = stbir__calculate_memory(info);
2353 
2354     int width_stride_input = input_stride_in_bytes ? input_stride_in_bytes : info.channels * info.input_w * stbir__type_size[type];
2355     int width_stride_output = output_stride_in_bytes ? output_stride_in_bytes : info.channels * info.output_w * stbir__type_size[type];
2356 
2357     assert(info.channels >= 0);
2358     assert(info.channels <= STBIR_MAX_CHANNELS);
2359 
2360     if (info.channels < 0 || info.channels > STBIR_MAX_CHANNELS)
2361         return 0;
2362 
2363     assert(info.horizontal_filter < stbir__filter_info_table.length);
2364     assert(info.vertical_filter < stbir__filter_info_table.length);
2365 
2366     if (info.horizontal_filter >= stbir__filter_info_table.length)
2367         return 0;
2368     if (info.vertical_filter >= stbir__filter_info_table.length)
2369         return 0;
2370 
2371     if (alpha_channel < 0)
2372         flags |= STBIR_FLAG_ALPHA_USES_COLORSPACE | STBIR_FLAG_ALPHA_PREMULTIPLIED;
2373 
2374     if (!(flags&STBIR_FLAG_ALPHA_USES_COLORSPACE) || !(flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)) {
2375         assert(alpha_channel >= 0 && alpha_channel < info.channels);
2376     }
2377 
2378     if (alpha_channel >= info.channels)
2379         return 0;
2380 
2381     assert(tempmem);
2382 
2383     if (!tempmem)
2384         return 0;
2385 
2386     assert(tempmem_size_in_bytes >= memory_required);
2387 
2388     if (tempmem_size_in_bytes < memory_required)
2389         return 0;
2390 
2391     memset(tempmem, 0, tempmem_size_in_bytes);
2392 
2393     info.input_data = input_data;
2394     info.input_stride_bytes = width_stride_input;
2395 
2396     info.output_data = output_data;
2397     info.output_stride_bytes = width_stride_output;
2398 
2399     info.alpha_channel = alpha_channel;
2400     info.flags = flags;
2401     info.type = type;
2402     info.edge_horizontal = edge_horizontal;
2403     info.edge_vertical = edge_vertical;
2404     info.colorspace = colorspace;
2405 
2406     info.horizontal_coefficient_width   = stbir__get_coefficient_width  (info.horizontal_filter, info.horizontal_scale);
2407     info.vertical_coefficient_width     = stbir__get_coefficient_width  (info.vertical_filter  , info.vertical_scale  );
2408     info.horizontal_filter_pixel_width  = stbir__get_filter_pixel_width (info.horizontal_filter, info.horizontal_scale);
2409     info.vertical_filter_pixel_width    = stbir__get_filter_pixel_width (info.vertical_filter  , info.vertical_scale  );
2410     info.horizontal_filter_pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
2411     info.vertical_filter_pixel_margin   = stbir__get_filter_pixel_margin(info.vertical_filter  , info.vertical_scale  );
2412 
2413     info.ring_buffer_length_bytes = info.output_w * info.channels * cast(int)(float.sizeof);
2414     info.decode_buffer_pixels = info.input_w + info.horizontal_filter_pixel_margin * 2;
2415 
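    // Returns a pointer to the sub-buffer that starts immediately after a
    // previous sub-buffer of current_size bytes beginning at current.
    static newtype* STBIR__NEXT_MEMPTR(newtype)(void* current, size_t current_size)
    {
        return cast(newtype*)( (cast(ubyte*)current) + current_size );
    }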
2420 
2421     info.horizontal_contributors = cast(stbir__contributors *) tempmem;
2422     info.horizontal_coefficients = STBIR__NEXT_MEMPTR!float              (info.horizontal_contributors, info.horizontal_contributors_size);
2423     info.vertical_contributors   = STBIR__NEXT_MEMPTR!stbir__contributors(info.horizontal_coefficients, info.horizontal_coefficients_size);
2424     info.vertical_coefficients   = STBIR__NEXT_MEMPTR!float              (info.vertical_contributors,   info.vertical_contributors_size);
2425     info.decode_buffer           = STBIR__NEXT_MEMPTR!float              (info.vertical_coefficients,   info.vertical_coefficients_size);
2426 
2427     if (stbir__use_height_upsampling(info))
2428     {
2429         info.horizontal_buffer   = null;
2430         info.ring_buffer         = STBIR__NEXT_MEMPTR!float              (info.decode_buffer,           info.decode_buffer_size);
2431         info.encode_buffer       = STBIR__NEXT_MEMPTR!float              (info.ring_buffer,             info.ring_buffer_size);
2432 
2433         assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.encode_buffer, info.encode_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
2434     }
2435     else
2436     {
2437         info.horizontal_buffer   = STBIR__NEXT_MEMPTR!float              (info.decode_buffer,           info.decode_buffer_size);
2438         info.ring_buffer         = STBIR__NEXT_MEMPTR!float              (info.horizontal_buffer,       info.horizontal_buffer_size);
2439         info.encode_buffer = null;
2440 
2441         assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.ring_buffer, info.ring_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
2442     }
2443 
2444     // This signals that the ring buffer is empty
2445     info.ring_buffer_begin_index = -1;
2446 
2447     stbir__calculate_filters(info.horizontal_contributors, info.horizontal_coefficients, info.horizontal_filter, info.horizontal_scale, info.horizontal_shift, info.input_w, info.output_w);
2448     stbir__calculate_filters(info.vertical_contributors, info.vertical_coefficients, info.vertical_filter, info.vertical_scale, info.vertical_shift, info.input_h, info.output_h);
2449 
2450     if (stbir__use_height_upsampling(info))
2451         stbir__buffer_loop_upsample(info);
2452     else
2453         stbir__buffer_loop_downsample(info);
2454 
2455     return 1;
2456 }
2457 
2458 
2459 static int stbir__resize_arbitrary(
2460     void *alloc_context,
2461     const void* input_data, int input_w, int input_h, int input_stride_in_bytes,
2462     void* output_data, int output_w, int output_h, int output_stride_in_bytes,
2463     float s0, float t0, float s1, float t1, float *transform,
2464     int channels, int alpha_channel, uint flags, stbir_datatype type,
2465     stbir_filter h_filter, stbir_filter v_filter,
2466     stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace)
2467 {
2468     stbir__info info;
2469     int result;
2470     size_t memory_required;
2471     void* extra_memory;
2472 
2473     stbir__setup(&info, input_w, input_h, output_w, output_h, channels);
2474     stbir__calculate_transform(&info, s0,t0,s1,t1,transform);
2475     stbir__choose_filter(&info, h_filter, v_filter);
2476     memory_required = stbir__calculate_memory(&info);
2477     extra_memory = STBIR_MALLOC(memory_required, alloc_context);
2478 
2479     if (!extra_memory)
2480         return 0;
2481 
2482     result = stbir__resize_allocated(&info, input_data, input_stride_in_bytes,
2483                                             output_data, output_stride_in_bytes,
2484                                             alpha_channel, flags, type,
2485                                             edge_horizontal, edge_vertical,
2486                                             colorspace, extra_memory, memory_required);
2487 
2488     STBIR_FREE(extra_memory, alloc_context);
2489 
2490     return result;
2491 }
2492 
2493 
2494 
2495 int stbir_resize_uint8_srgb_edgemode(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2496                                                     ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2497                                               int num_channels, int alpha_channel, int flags,
2498                                               stbir_edge edge_wrap_mode)
2499 {
2500     return stbir__resize_arbitrary(null, input_pixels, input_w, input_h, input_stride_in_bytes,
2501         output_pixels, output_w, output_h, output_stride_in_bytes,
2502         0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
2503         edge_wrap_mode, edge_wrap_mode, STBIR_COLORSPACE_SRGB);
2504 }
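
/*
   USAGE SKETCH (not part of the original source; buffer sizes are illustrative)
      Halve a tightly packed 4x4 RGB image. A stride of 0 lets the resizer
      derive the row stride from width * channels, and a negative alpha_channel
      makes every channel be treated as plain color (see the alpha_channel < 0
      check in stbir__resize_allocated). STBIR_EDGE_CLAMP is assumed to be
      visible unqualified, like the other STBIR_ constants used in this file.

         ubyte[4 * 4 * 3] src;   // fill with pixel data first
         ubyte[2 * 2 * 3] dst;

         int ok = stbir_resize_uint8_srgb_edgemode(
                     src.ptr, 4, 4, 0,
                     dst.ptr, 2, 2, 0,
                     3, -1, 0, STBIR_EDGE_CLAMP);
         // ok is 1 on success, 0 on bad arguments or a failed allocation
*/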
2505 
2506 int stbir_resize_uint8_generic( const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2507                                                ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2508                                          int num_channels, int alpha_channel, int flags,
2509                                          stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
2510                                          void *alloc_context)
2511 {
2512     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2513         output_pixels, output_w, output_h, output_stride_in_bytes,
2514         0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
2515         edge_wrap_mode, edge_wrap_mode, space);
2516 }
2517 
int stbir_resize_uint16_generic(const(ushort)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                                               ushort*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2520                                          int num_channels, int alpha_channel, int flags,
2521                                          stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
2522                                          void *alloc_context)
2523 {
2524     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2525         output_pixels, output_w, output_h, output_stride_in_bytes,
2526         0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT16, filter, filter,
2527         edge_wrap_mode, edge_wrap_mode, space);
2528 }
2529 
2530 
2531 int stbir_resize(         const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2532                                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2533                                    stbir_datatype datatype,
2534                                    int num_channels, int alpha_channel, int flags,
2535                                    stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2536                                    stbir_filter filter_horizontal,  stbir_filter filter_vertical,
2537                                    stbir_colorspace space, void *alloc_context)
2538 {
2539     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2540         output_pixels, output_w, output_h, output_stride_in_bytes,
2541         0,0,1,1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2542         edge_mode_horizontal, edge_mode_vertical, space);
2543 }
2544 
2545 
2546 int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2547                                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2548                                    stbir_datatype datatype,
2549                                    int num_channels, int alpha_channel, int flags,
2550                                    stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2551                                    stbir_filter filter_horizontal,  stbir_filter filter_vertical,
2552                                    stbir_colorspace space, void *alloc_context,
2553                                    float x_scale, float y_scale,
2554                                    float x_offset, float y_offset)
2555 {
2556     float[4] transform;
2557     transform[0] = x_scale;
2558     transform[1] = y_scale;
2559     transform[2] = x_offset;
2560     transform[3] = y_offset;
2561     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2562         output_pixels, output_w, output_h, output_stride_in_bytes,
2563         0,0,1,1,transform.ptr,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2564         edge_mode_horizontal, edge_mode_vertical, space);
2565 }
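
/*
   USAGE SKETCH (not part of the original source; values are illustrative)
      Upsample an 8x8 single-channel image to 16x16 with an explicit transform.
      Judging from stbir__calculate_transform, the scales are output/input
      ratios and the offsets are expressed in output pixels; treat that reading
      as an assumption. STBIR_EDGE_CLAMP is assumed to be visible unqualified.

         ubyte[8 * 8]   src;   // fill with pixel data first
         ubyte[16 * 16] dst;

         int ok = stbir_resize_subpixel(
                     src.ptr, 8, 8, 0,
                     dst.ptr, 16, 16, 0,
                     STBIR_TYPE_UINT8,
                     1, -1, 0,
                     STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP,
                     STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
                     STBIR_COLORSPACE_SRGB, null,
                     2.0f, 2.0f,     // x_scale, y_scale
                     0.0f, 0.0f);    // x_offset, y_offset
*/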
2566 
2567 int stbir_resize_region(  const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2568                                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2569                                    stbir_datatype datatype,
2570                                    int num_channels, int alpha_channel, int flags,
2571                                    stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2572                                    stbir_filter filter_horizontal,  stbir_filter filter_vertical,
2573                                    stbir_colorspace space, void *alloc_context,
2574                                    float s0, float t0, float s1, float t1)
2575 {
2576     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2577         output_pixels, output_w, output_h, output_stride_in_bytes,
2578         s0,t0,s1,t1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2579         edge_mode_horizontal, edge_mode_vertical, space);
2580 }
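
/*
   USAGE SKETCH (not part of the original source; values are illustrative)
      Copy the central quarter of an 8x8 single-channel image into a 4x4
      output. The region is given in normalized coordinates, where (0,0)-(1,1)
      covers the whole input. With these numbers stbir__calculate_transform
      yields a scale of (4.0 / 8) / 0.5 = 1 on both axes.
      STBIR_EDGE_CLAMP is assumed to be visible unqualified.

         ubyte[8 * 8] src;   // fill with pixel data first
         ubyte[4 * 4] dst;

         int ok = stbir_resize_region(
                     src.ptr, 8, 8, 0,
                     dst.ptr, 4, 4, 0,
                     STBIR_TYPE_UINT8,
                     1, -1, 0,
                     STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP,
                     STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
                     STBIR_COLORSPACE_SRGB, null,
                     0.25f, 0.25f,   // s0, t0
                     0.75f, 0.75f);  // s1, t1
*/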
2581 
2582 /*
2583 ------------------------------------------------------------------------------
2584 This software is available under 2 licenses -- choose whichever you prefer.
2585 ------------------------------------------------------------------------------
2586 ALTERNATIVE A - MIT License
2587 Copyright (c) 2017 Sean Barrett
2588 Permission is hereby granted, free of charge, to any person obtaining a copy of
2589 this software and associated documentation files (the "Software"), to deal in
2590 the Software without restriction, including without limitation the rights to
2591 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
2592 of the Software, and to permit persons to whom the Software is furnished to do
2593 so, subject to the following conditions:
2594 The above copyright notice and this permission notice shall be included in all
2595 copies or substantial portions of the Software.
2596 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2597 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2598 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2599 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
2600 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2601 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2602 SOFTWARE.
2603 ------------------------------------------------------------------------------
2604 ALTERNATIVE B - Public Domain (www.unlicense.org)
2605 This is free and unencumbered software released into the public domain.
2606 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
2607 software, either in source code form or as a compiled binary, for any purpose,
2608 commercial or non-commercial, and by any means.
2609 In jurisdictions that recognize copyright laws, the author or authors of this
2610 software dedicate any and all copyright interest in the software to the public
2611 domain. We make this dedication for the benefit of the public at large and to
2612 the detriment of our heirs and successors. We intend this dedication to be an
2613 overt act of relinquishment in perpetuity of all present and future rights to
2614 this software under copyright law.
2615 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2616 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2617 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2618 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
2619 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
2620 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2621 ------------------------------------------------------------------------------
2622 */
2623 
2624 }