1 /* stb_image_resize - v0.96 - public domain image resizing
2    by Jorge L Rodriguez (@VinoBS) - 2014
3    http://github.com/nothings/stb
4 
5    Written with emphasis on usability, portability, and efficiency. (No
6    SIMD or threads, so it be easily outperformed by libs that use those.)
7    Only scaling and translation is supported, no rotations or shears.
8    Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation.
9 
10    QUICKSTART
11       stbir_resize_uint8(      input_pixels , in_w , in_h , 0,
12                                output_pixels, out_w, out_h, 0, num_channels)
13 
14       stbir_resize_uint8_srgb( input_pixels , in_w , in_h , 0,
15                                output_pixels, out_w, out_h, 0,
16                                num_channels , alpha_chan  , 0)
17       stbir_resize_uint8_srgb_edgemode(
18                                input_pixels , in_w , in_h , 0,
19                                output_pixels, out_w, out_h, 0,
20                                num_channels , alpha_chan  , 0, STBIR_EDGE_CLAMP)
21                                                             // WRAP/REFLECT/ZERO
22 
23    FULL API
24       See the "header file" section of the source for API documentation.
25 
26    ADDITIONAL DOCUMENTATION
27 
28       SRGB & FLOATING POINT REPRESENTATION
29          The sRGB functions presume IEEE floating point. If you do not have
30          IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
31          a slower implementation.
32 
33       MEMORY ALLOCATION
34          The resize functions here perform a single memory allocation using
35          malloc. To control the memory allocation, before the #include that
36          triggers the implementation, do:
37 
38             #define STBIR_MALLOC(size,context) ...
39             #define STBIR_FREE(ptr,context)   ...
40 
41          Each resize function makes exactly one call to malloc/free, so to use
42          temp memory, store the temp memory in the context and return that.
43 
44       DEFAULT FILTERS
45          For functions which don't provide explicit control over what filters
46          to use, you can change the compile-time defaults with
47 
48             #define STBIR_DEFAULT_FILTER_UPSAMPLE     STBIR_FILTER_something
49             #define STBIR_DEFAULT_FILTER_DOWNSAMPLE   STBIR_FILTER_something
50 
51          See stbir_filter in the header-file section for the list of filters.
52 
53       NEW FILTERS
54          A number of 1D filter kernels are used. For a list of
55          supported filters see the stbir_filter enum. To add a new filter,
56          write a filter function and add it to stbir__filter_info_table.
57 
58       MAX CHANNELS
59          If your image has more than 64 channels, define STBIR_MAX_CHANNELS
60          to the max you'll have.
61 
62       ALPHA CHANNEL
63          Most of the resizing functions provide the ability to control how
64          the alpha channel of an image is processed. The important things
65          to know about this:
66 
67          1. The best mathematically-behaved version of alpha to use is
68          called "premultiplied alpha", in which the other color channels
69          have had the alpha value multiplied in. If you use premultiplied
70          alpha, linear filtering (such as image resampling done by this
71          library, or performed in texture units on GPUs) does the "right
72          thing". While premultiplied alpha is standard in the movie CGI
73          industry, it is still uncommon in the videogame/real-time world.
74 
75          If you linearly filter non-premultiplied alpha, strange effects
76          occur. (For example, the 50/50 average of 99% transparent bright green
77          and 1% transparent black produces 50% transparent dark green when
78          non-premultiplied, whereas premultiplied it produces 50%
79          transparent near-black. The former introduces green energy
80          that doesn't exist in the source image.)
81 
82          2. Artists should not edit premultiplied-alpha images; artists
83          want non-premultiplied alpha images. Thus, art tools generally output
84          non-premultiplied alpha images.
85 
86          3. You will get best results in most cases by converting images
87          to premultiplied alpha before processing them mathematically.
88 
89          4. If you pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED, the
90          resizer does not do anything special for the alpha channel;
91          it is resampled identically to other channels. This produces
92          the correct results for premultiplied-alpha images, but produces
93          less-than-ideal results for non-premultiplied-alpha images.
94 
95          5. If you do not pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED,
96          then the resizer weights the contribution of input pixels
97          based on their alpha values, or, equivalently, it multiplies
98          the alpha value into the color channels, resamples, then divides
99          by the resultant alpha value. Input pixels which have alpha=0 do
100          not contribute at all to output pixels unless _all_ of the input
101          pixels affecting that output pixel have alpha=0, in which case
102          the result for that pixel is the same as it would be without
103          STBIR_FLAG_ALPHA_PREMULTIPLIED. However, this is only true for
104          input images in integer formats. For input images in float format,
105          input pixels with alpha=0 have no effect, and output pixels
106          which have alpha=0 will be 0 in all channels. (For float images,
107          you can manually achieve the same result by adding a tiny epsilon
108          value to the alpha channel of every image, and then subtracting
109          or clamping it at the end.)
110 
111          6. You can suppress the behavior described in #5 and make
112          all-0-alpha pixels have 0 in all channels by #defining
113          STBIR_NO_ALPHA_EPSILON.
114 
115          7. You can separately control whether the alpha channel is
116          interpreted as linear or affected by the colorspace. By default
117          it is linear; you almost never want to apply the colorspace.
118          (For example, graphics hardware does not apply sRGB conversion
119          to the alpha channel.)
120 
121    CONTRIBUTORS
122       Jorge L Rodriguez: Implementation
123       Sean Barrett: API design, optimizations
124       Aras Pranckevicius: bugfix
125       Nathan Reed: warning fixes
126 
127    REVISIONS
128       0.97 (2020-02-02) fixed warning
129       0.96 (2019-03-04) fixed warnings
130       0.95 (2017-07-23) fixed warnings
131       0.94 (2017-03-18) fixed warnings
132       0.93 (2017-03-03) fixed bug with certain combinations of heights
133       0.92 (2017-01-02) fix integer overflow on large (>2GB) images
134       0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
135       0.90 (2014-09-17) first released version
136 
137    LICENSE
138      See end of file for license information.
139 
140    TODO
141       Don't decode all of the image data when only processing a partial tile
142       Don't use full-width decode buffers when only processing a partial tile
143       When processing wide images, break processing into tiles so data fits in L1 cache
144       Installable filters?
145       Resize that respects alpha test coverage
146          (Reference code: FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage:
147          https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp )
148 */
149 /**
150 Resizer ported to D from C. Removed a few features that did'nt make sense in Dplug.
151 Added Ryhor Spivak work on Lanczos filter... also added a few more lanczos kernels.
152 Copyright: (c) Guillaume Piolat (2021)
153 */
154 module dplug.graphics.stb_image_resize;
155 
156 
157 import core.stdc.stdlib: malloc, free;
158 import core.stdc.string: memset;
159 
160 import inteli.emmintrin;
161 
162 import dplug.core.math : fast_fabs, fast_pow, fast_ceil, fast_floor, fast_sin;
163 import dplug.core.vec;
164 
165 
166 nothrow:
167 @nogc:
168 
169 
170 //////////////////////////////////////////////////////////////////////////////
171 //
172 // Easy-to-use API:
173 //
174 //     * "input pixels" points to an array of image data with 'num_channels' channels (e.g. RGB=3, RGBA=4)
175 //     * input_w is input image width (x-axis), input_h is input image height (y-axis)
176 //     * stride is the offset between successive rows of image data in memory, in bytes. you can
177 //       specify 0 to mean packed continuously in memory
178 //     * alpha channel is treated identically to other channels.
179 //     * colorspace is linear or sRGB as specified by function name
180 //     * returned result is 1 for success or 0 in case of an error.
181 //       #define assert() to trigger an assert on parameter validation errors.
182 //     * Memory required grows approximately linearly with input and output size, but with
183 //       discontinuities at input_w == output_w and input_h == output_h.
184 //     * These functions use a "default" resampling filter defined at compile time. To change the filter,
185 //       you can change the compile-time defaults by #defining STBIR_DEFAULT_FILTER_UPSAMPLE
186 //       and STBIR_DEFAULT_FILTER_DOWNSAMPLE, or you can use the medium-complexity API.
187 
188 int stbir_resize_uint8(const(ubyte)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
189                        ubyte* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
190                        int num_channels, int filter, void *alloc_context)
191 {
192     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
193                                    output_pixels, output_w, output_h, output_stride_in_bytes,
194                                    0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT8, filter, filter,
195                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
196 }
197 
198 int stbir_resize_uint16(const(ushort)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
199                        ushort* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
200                        int num_channels, int filter, void *alloc_context)
201 {
202     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
203                                    output_pixels, output_w, output_h, output_stride_in_bytes,
204                                    0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT16, filter, filter,
205                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
206 }
207 
208 
209 // The following functions interpret image data as gamma-corrected sRGB.
210 // Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
211 // or otherwise provide the index of the alpha channel. Flags value
212 // of 0 will probably do the right thing if you're not sure what
213 // the flags mean.
214 
215 enum STBIR_ALPHA_CHANNEL_NONE      = -1;
216 
217 // Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
218 // use alpha-weighted resampling (effectively premultiplying, resampling,
219 // then unpremultiplying).
220 enum STBIR_FLAG_ALPHA_PREMULTIPLIED = (1 << 0);
221 
222 // The specified alpha channel should be handled as gamma-corrected value even
223 // when doing sRGB operations.
224 enum STBIR_FLAG_ALPHA_USES_COLORSPACE = (1 << 1);
225 
226 int stbir_resize_uint8_srgb(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
227                             ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
228                             int num_channels, int alpha_channel, int flags, void* alloc_context, int filter)
229 {
230     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
231                                    output_pixels, output_w, output_h, output_stride_in_bytes,
232                                    0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
233                                    STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_SRGB);
234 }
235 
236 alias stbir_edge = int;
237 enum : stbir_edge
238 {
239     STBIR_EDGE_CLAMP   = 1,
240     STBIR_EDGE_REFLECT = 2,
241     STBIR_EDGE_WRAP    = 3,
242     STBIR_EDGE_ZERO    = 4,
243 }
244 
245 
246 //////////////////////////////////////////////////////////////////////////////
247 //
248 // Medium-complexity API
249 //
250 // This extends the easy-to-use API as follows:
251 //
252 //     * Alpha-channel can be processed separately
253 //       * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
254 //         * Alpha channel will not be gamma corrected (unless flags&STBIR_FLAG_GAMMA_CORRECT)
255 //         * Filters will be weighted by alpha channel (unless flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
256 //     * Filter can be selected explicitly
257 //     * uint16 image type
258 //     * sRGB colorspace available for all types
259 //     * context parameter for passing to STBIR_MALLOC
260 
261 alias stbir_filter = int;
262 enum : stbir_filter
263 {
264     STBIR_FILTER_DEFAULT      = 0,  // use same filter type that easy-to-use API chooses
265     STBIR_FILTER_BOX          = 1,  // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
266     STBIR_FILTER_TRIANGLE     = 2,  // On upsampling, produces same results as bilinear texture filtering
267     STBIR_FILTER_CUBICBSPLINE = 3,  // The cubic b-spline (aka Mitchell-Netrevalli with B=1,C=0), gaussian-esque
268     STBIR_FILTER_CATMULLROM   = 4,  // An interpolating cubic spline
269     STBIR_FILTER_MITCHELL     = 5,  // Mitchell-Netrevalli filter with B=1/3, C=1/3
270     STBIR_FILTER_LANCZOS2     = 6,  // Lanczos 2
271     STBIR_FILTER_LANCZOS2_5   = 7,  // Lanczos 2.5
272     STBIR_FILTER_LANCZOS3     = 8,  // Lanczos 3
273     STBIR_FILTER_LANCZOS4     = 9,  // Lanczos 4
274     STBIR_FILTER_MK_2013      = 10, // Magic Kernel, without sharpening
275     STBIR_FILTER_MKS_2013_86  = 11, // Magic Kernel + Sharp 2013, but with only 86% sharpening (Dplug Issue #729)
276     STBIR_FILTER_MKS_2013     = 12, // Magic Kernel + Sharp 2013 (the one recommended by John Costella in 2013)
277     STBIR_FILTER_MKS_2021     = 13, // Magic Kernel + Sharp 2021 (the one recommended to us by John Costella in 2022)
278 
279     // To be continued, as John Costella has other kernels...
280 }
281 
282 alias stbir_colorspace = int;
283 enum : stbir_colorspace 
284 {
285     STBIR_COLORSPACE_LINEAR,
286     STBIR_COLORSPACE_SRGB,
287 
288     STBIR_MAX_COLORSPACES,
289 }
290 
291 
292 //////////////////////////////////////////////////////////////////////////////
293 //
294 // Full-complexity API
295 //
296 // This extends the medium API as follows:
297 //
298 //     * uint32 image type
299 //     * not typesafe
300 //     * separate filter types for each axis
301 //     * separate edge modes for each axis
302 //     * can specify scale explicitly for subpixel correctness
303 //     * can specify image source tile using texture coordinates
304 
305 alias stbir_datatype = int;
306 enum : stbir_datatype
307 {
308     STBIR_TYPE_UINT8 ,
309     STBIR_TYPE_UINT16,
310     STBIR_TYPE_UINT32,
311     STBIR_TYPE_FLOAT ,
312 
313     STBIR_MAX_TYPES
314 }
315 
316 // (s0, t0) & (s1, t1) are the top-left and bottom right corner (uv addressing style: [0, 1]x[0, 1]) of a region of the input image to use.
317 
318 struct STBAllocatorContext
319 {
320 nothrow:
321 @nogc:
322     void* buf = null;
323     size_t length = 0;
324 
325     @disable this(this);
326 
327     ~this()
328     {
329         alignedFree(buf, 1);
330     }
331 
332     void* reallocDiscard(size_t numBytes)
333     {
334         if (length < numBytes)
335         {         
336             buf = alignedReallocDiscard(buf, numBytes, 1);
337             length = numBytes;
338         }
339         return buf;
340     }
341 }
342 
343 void* STBIR_MALLOC(size_t size, void* context)
344 {
345     assert(context !is null);
346     STBAllocatorContext* alloc = cast(STBAllocatorContext*)context;
347     return alloc.reallocDiscard(size);
348 }
349 
350 void STBIR_FREE(void* p, void* context)
351 {
352     assert(context !is null);
353     // will be freed when resizer is freed, because it's relatively small and shared.
354 }
355 
356 enum STBIR_DEFAULT_FILTER_UPSAMPLE = STBIR_FILTER_CATMULLROM;
357 
358 enum STBIR_DEFAULT_FILTER_DOWNSAMPLE = STBIR_FILTER_MITCHELL;
359 
360 enum STBIR_MAX_CHANNELS = 4;
361 
362 // This value is added to alpha just before premultiplication to avoid
363 // zeroing out color values. It is equivalent to 2^-80. If you don't want
364 // that behavior (it may interfere if you have floating point images with
365 // very small alpha values) then you can define STBIR_NO_ALPHA_EPSILON to
366 // disable it.
367 enum float STBIR_ALPHA_EPSILON = (cast(float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20));
368 
369 // must match stbir_datatype
370 static immutable ubyte[4] stbir__type_size = 
371 [
372     1, // STBIR_TYPE_UINT8
373     2, // STBIR_TYPE_UINT16
374     4, // STBIR_TYPE_UINT32
375     4, // STBIR_TYPE_FLOAT
376 ];
377 
378 // Kernel function centered at 0
379 alias stbir__kernel_fn = float function(float x, float scale);
380 alias stbir__support_fn = float function(float scale);
381 
382 struct stbir__filter_info
383 {
384     stbir__kernel_fn kernel;
385     stbir__support_fn support;
386 }
387 
388 // When upsampling, the contributors are which source pixels contribute.
389 // When downsampling, the contributors are which destination pixels are contributed to.
390 struct stbir__contributors
391 {
392     int n0; // First contributing pixel
393     int n1; // Last contributing pixel
394 }
395 
396 struct stbir__info
397 {
398     const(void)* input_data;
399     int input_w;
400     int input_h;
401     int input_stride_bytes;
402 
403     void* output_data;
404     int output_w;
405     int output_h;
406     int output_stride_bytes;
407 
408     float s0, t0, s1, t1;
409 
410     float horizontal_shift; // Units: output pixels
411     float vertical_shift;   // Units: output pixels
412     float horizontal_scale;
413     float vertical_scale;
414 
415     int channels;
416     int alpha_channel;
417     uint flags;
418     stbir_datatype type;
419     stbir_filter horizontal_filter;
420     stbir_filter vertical_filter;
421     stbir_edge edge_horizontal;
422     stbir_edge edge_vertical;
423     stbir_colorspace colorspace;
424 
425     stbir__contributors* horizontal_contributors;
426     float* horizontal_coefficients;
427 
428     stbir__contributors* vertical_contributors;
429     float* vertical_coefficients;
430 
431     int decode_buffer_pixels;
432     float* decode_buffer;
433 
434     float* horizontal_buffer;
435 
436     // cache these because ceil/floor are inexplicably showing up in profile
437     int horizontal_coefficient_width;
438     int vertical_coefficient_width;
439     int horizontal_filter_pixel_width;
440     int vertical_filter_pixel_width;
441     int horizontal_filter_pixel_margin;
442     int vertical_filter_pixel_margin;
443     int horizontal_num_contributors;
444     int vertical_num_contributors;
445 
446     int ring_buffer_length_bytes;   // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
447     int ring_buffer_num_entries;    // Total number of entries in the ring buffer.
448     int ring_buffer_first_scanline;
449     int ring_buffer_last_scanline;
450     int ring_buffer_begin_index;    // first_scanline is at this index in the ring buffer
451     float* ring_buffer;
452 
453     float* encode_buffer; // A temporary buffer to store floats so we don't lose precision while we do multiply-adds.
454 
455     int horizontal_contributors_size;
456     int horizontal_coefficients_size;
457     int vertical_contributors_size;
458     int vertical_coefficients_size;
459     int decode_buffer_size;
460     int horizontal_buffer_size;
461     int ring_buffer_size;
462     int encode_buffer_size;
463 }
464 
465 
466 static immutable float stbir__max_uint8_as_float  = 255.0f;
467 static immutable float stbir__max_uint16_as_float = 65535.0f;
468 static immutable double stbir__max_uint32_as_float = 4294967295.0;
469 
470 
471 int stbir__min(int a, int b)
472 {
473     return a < b ? a : b;
474 }
475 
476 float stbir__saturate(float x)
477 {
478     if (x < 0)
479         return 0;
480 
481     if (x > 1)
482         return 1;
483 
484     return x;
485 }
486 
487 static immutable float[256] stbir__srgb_uchar_to_linear_float = 
488 [
489     0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
490     0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
491     0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
492     0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
493     0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
494     0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
495     0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
496     0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
497     0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
498     0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
499     0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
500     0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
501     0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
502     0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
503     0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
504     0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
505     0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
506     0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
507     0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
508     0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
509     0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
510     0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
511     0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
512     0.982251f, 0.991102f, 1.0f
513 ];
514 
515 float stbir__srgb_to_linear(float f)
516 {
517     if (f <= 0.04045f)
518         return f / 12.92f;
519     else
520         return cast(float)fast_pow((f + 0.055f) / 1.055f, 2.4f);
521 }
522 
523 float stbir__linear_to_srgb(float f)
524 {
525     if (f <= 0.0031308f)
526         return f * 12.92f;
527     else
528         return 1.055f * cast(float)fast_pow(f, 1 / 2.4f) - 0.055f;
529 }
530 
531 union stbir__FP32
532 {
533     uint u;
534     float f;
535 }
536 
537 static immutable uint[104] fp32_to_srgb8_tab4 = 
538 [
539     0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
540     0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
541     0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
542     0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
543     0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
544     0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
545     0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
546     0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
547     0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
548     0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
549     0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
550     0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
551     0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
552 ];
553 
554 ubyte stbir__linear_to_srgb_uchar(float in_)
555 {
556     static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
557     static const stbir__FP32 minval = { (127-13) << 23 };
558     uint tab,bias,scale,t;
559     stbir__FP32 f;
560 
561     // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
562     // The tests are carefully written so that NaNs map to 0, same as in the reference
563     // implementation.
564     if (!(in_ > minval.f)) // written this way to catch NaNs
565         in_ = minval.f;
566     if (in_ > almostone.f)
567         in_ = almostone.f;
568 
569     // Do the table lookup and unpack bias, scale
570     f.f = in_;
571     tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
572     bias = (tab >> 16) << 9;
573     scale = tab & 0xffff;
574 
575     // Grab next-highest mantissa bits and perform linear interpolation
576     t = (f.u >> 12) & 0xff;
577     return cast(ubyte) ((bias + scale*t) >> 16);
578 }
579 
580 
581 float stbir__filter_trapezoid(float x, float scale)
582 {
583     float halfscale = scale / 2;
584     float t = 0.5f + halfscale;
585     assert(scale <= 1);
586 
587     x = cast(float)fast_fabs(x);
588 
589     if (x >= t)
590         return 0;
591     else
592     {
593         float r = 0.5f - halfscale;
594         if (x <= r)
595             return 1;
596         else
597             return (t - x) / scale;
598     }
599 }
600 
601 float stbir__support_trapezoid(float scale)
602 {
603     assert(scale <= 1);
604     return 0.5f + scale / 2;
605 }
606 
607 float stbir__filter_triangle(float x, float s)
608 {
609     x = cast(float)fast_fabs(x);
610 
611     if (x <= 1.0f)
612         return 1 - x;
613     else
614         return 0;
615 }
616 
617 float stbir__filter_cubic(float x, float s)
618 {
619     x = cast(float)fast_fabs(x);
620 
621     if (x < 1.0f)
622         return (4 + x*x*(3*x - 6))/6;
623     else if (x < 2.0f)
624         return (8 + x*(-12 + x*(6 - x)))/6;
625 
626     return (0.0f);
627 }
628 
629 float stbir__filter_catmullrom(float x, float s)
630 {
631     x = cast(float)fast_fabs(x);
632 
633     if (x < 1.0f)
634         return 1 - x*x*(2.5f - 1.5f*x);
635     else if (x < 2.0f)
636         return 2 - x*(4 + x*(0.5f*x - 2.5f));
637 
638     return (0.0f);
639 }
640 
641 float stbir__filter_mitchell(float x, float s)
642 {
643     x = cast(float)fast_fabs(x);
644 
645     if (x < 1.0f)
646         return (16 + x*x*(21 * x - 36))/18;
647     else if (x < 2.0f)
648         return (32 + x*(-60 + x*(36 - 7*x)))/18;
649 
650     return (0.0f);
651 }
652 
653 float stbir__filter_lanczos(float A)(float x, float s)
654 {
655     x = cast(float)fast_fabs(x);
656 
657     if (x <= float.min_normal)
658         return 1.0f;
659 
660     if (x < A)
661     {
662         float pix = 3.14159265358979323846f*x;
663         return A*fast_sin(pix)*fast_sin(pix/A)/(pix*pix);
664     }
665 
666     return 0.0f;
667 }
668 
669 float stbir__filter_mk2013(float x, float s) nothrow @nogc
670 {
671     x = fast_fabs(x);
672     if (x < 0.5)
673         return 0.75 - x * x;
674 
675     if (x < 1.5)
676         return 0.5 * (x - 1.5)*(x - 1.5);
677 
678     return 0.0f;
679 }
680 
681 float stbir__filter_mks2013_hs(float x, float s) nothrow @nogc
682 {
683     // Perhaps possible to do better with "MKS 2021".
684     return 0.14f * stbir__filter_mk2013(x, s)
685          + 0.86f * stbir__filter_mks2013(x, s);
686 }
687 
688 float stbir__filter_mks2013(float x, float s) nothrow @nogc
689 {
690     x = fast_fabs(x);
691 
692     if (x <= float.min_normal)
693         return 17.0f / 16.0f;
694 
695     if (x < 0.5)
696         return 17.0 / 16.0 - 7.0 * x * x / 4.0;
697 
698     if (x < 1.5)
699     {
700         double x2 = x * x;
701         return 0.25 * (4 * x2 - 11.0 * x + 7.0);
702     }
703 
704     if (x < 2.5)
705     {
706         return -0.125 * (x - 5.0 / 2.0)*(x - 5.0 / 2.0);
707     }
708     return 0.0f;
709 }
710 
711 float stbir__filter_mks2021(float x, float s) nothrow @nogc
712 {
713     x = fast_fabs(x);
714     float x2 = x * x;
715 
716     if (x < 0.5)
717         return 577.0f / 576.0f - (239.0f / 144.0f) * x2;
718 
719     if (x < 1.5)
720         return (140 * x2 - 379 * x + 239) / 144.0f;
721 
722     if (x < 2.5)
723         return -(24 * x2 - 113 * x + 130) / 144.0f;
724 
725     if (x < 3.5)
726         return (4 * x2 - 27 * x + 45) / 144.0f;
727 
728     if (x < 4.5)
729         return -(4 * x2 - 36 * x + 81) / 1152.0f;
730 
731     return 0.0f;
732 }
733 
734 float stbir__support_zero(float s)
735 {
736     return 0;
737 }
738 
739 float stbir__support_one(float s)
740 {
741     return 1;
742 }
743 
744 float stbir__support_two(float s)
745 {
746     return 2;
747 }
748 
749 float stbir__support_three(float s)
750 {
751     return 3;
752 }
753 
754 float stbir__support_four(float s)
755 {
756     return 4;
757 }
758 
759 float stbir__support_five(float s)
760 {
761     return 5;
762 }
763 
764 static immutable stbir__filter_info[14] stbir__filter_info_table = 
765 [
766         { null,                      &stbir__support_zero },
767         { &stbir__filter_trapezoid,  &stbir__support_trapezoid },
768         { &stbir__filter_triangle,   &stbir__support_one },
769         { &stbir__filter_cubic,      &stbir__support_two },
770         { &stbir__filter_catmullrom, &stbir__support_two },
771         { &stbir__filter_mitchell,   &stbir__support_two },
772         { &stbir__filter_lanczos!2.0f, &stbir__support_two },
773         { &stbir__filter_lanczos!2.5f, &stbir__support_three },
774         { &stbir__filter_lanczos!3.0f, &stbir__support_three },
775         { &stbir__filter_lanczos!4.0f, &stbir__support_four },
776         { &stbir__filter_mk2013,       &stbir__support_three },
777         { &stbir__filter_mks2013_hs,   &stbir__support_three },
778         { &stbir__filter_mks2013,      &stbir__support_three },
779         { &stbir__filter_mks2021,      &stbir__support_five },
780         ];
781 
782 
783 static int stbir__use_upsampling(float ratio)
784 {
785     return ratio > 1;
786 }
787 
788 static int stbir__use_width_upsampling(stbir__info* stbir_info)
789 {
790     return stbir__use_upsampling(stbir_info.horizontal_scale);
791 }
792 
793 static int stbir__use_height_upsampling(stbir__info* stbir_info)
794 {
795     return stbir__use_upsampling(stbir_info.vertical_scale);
796 }
797 
798 // This is the maximum number of input samples that can affect an output sample
799 // with the given filter
800 static int stbir__get_filter_pixel_width(stbir_filter filter, float scale)
801 {
802     assert(filter != 0);
803     assert(filter < stbir__filter_info_table.length);
804 
805     if (stbir__use_upsampling(scale))
806         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2);
807     else
808         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2 / scale);
809 }
810 
811 // This is how much to expand buffers to account for filters seeking outside
812 // the image boundaries.
813 static int stbir__get_filter_pixel_margin(stbir_filter filter, float scale)
814 {
815     return stbir__get_filter_pixel_width(filter, scale) / 2;
816 }
817 
818 static int stbir__get_coefficient_width(stbir_filter filter, float scale)
819 {
820     if (stbir__use_upsampling(scale))
821         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1 / scale) * 2);
822     else
823         return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2);
824 }
825 
826 static int stbir__get_contributors(float scale, stbir_filter filter, int input_size, int output_size)
827 {
828     if (stbir__use_upsampling(scale))
829         return output_size;
830     else
831         return (input_size + stbir__get_filter_pixel_margin(filter, scale) * 2);
832 }
833 
834 static int stbir__get_total_horizontal_coefficients(stbir__info* info)
835 {
836     return info.horizontal_num_contributors
837          * stbir__get_coefficient_width      (info.horizontal_filter, info.horizontal_scale);
838 }
839 
840 static int stbir__get_total_vertical_coefficients(stbir__info* info)
841 {
842     return info.vertical_num_contributors
843          * stbir__get_coefficient_width      (info.vertical_filter, info.vertical_scale);
844 }
845 
846 static stbir__contributors* stbir__get_contributor(stbir__contributors* contributors, int n)
847 {
848     return &contributors[n];
849 }
850 
851 // For perf reasons this code is duplicated in stbir__resample_horizontal_upsample/downsample,
852 // if you change it here change it there too.
853 static float* stbir__get_coefficient(float* coefficients, stbir_filter filter, float scale, int n, int c)
854 {
855     int width = stbir__get_coefficient_width(filter, scale);
856     return &coefficients[width*n + c];
857 }
858 
859 static int stbir__edge_wrap_slow(stbir_edge edge, int n, int max)
860 {
861     switch (edge)
862     {
863     case STBIR_EDGE_ZERO:
864         return 0; // we'll decode the wrong pixel here, and then overwrite with 0s later
865 
866     case STBIR_EDGE_CLAMP:
867         if (n < 0)
868             return 0;
869 
870         if (n >= max)
871             return max - 1;
872 
873         return n; // NOTREACHED
874 
875     case STBIR_EDGE_REFLECT:
876     {
877         if (n < 0)
878         {
879             if (n < max)
880                 return -n;
881             else
882                 return max - 1;
883         }
884 
885         if (n >= max)
886         {
887             int max2 = max * 2;
888             if (n >= max2)
889                 return 0;
890             else
891                 return max2 - n - 1;
892         }
893 
894         return n; // NOTREACHED
895     }
896 
897     case STBIR_EDGE_WRAP:
898         if (n >= 0)
899             return (n % max);
900         else
901         {
902             int m = (-n) % max;
903 
904             if (m != 0)
905                 m = max - m;
906 
907             return (m);
908         }
909         // NOTREACHED
910 
911     default:
912         assert(false, "Unimplemented edge type");
913     }
914 }
915 
916 static int stbir__edge_wrap(stbir_edge edge, int n, int max)
917 {
918     // avoid per-pixel switch
919     if (n >= 0 && n < max)
920         return n;
921     return stbir__edge_wrap_slow(edge, n, max);
922 }
923 
924 // What input pixels contribute to this output pixel?
925 static void stbir__calculate_sample_range_upsample(int n, float out_filter_radius, float scale_ratio, float out_shift, int* in_first_pixel, int* in_last_pixel, float* in_center_of_out)
926 {
927     float out_pixel_center = cast(float)n + 0.5f;
928     float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
929     float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
930 
931     float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) / scale_ratio;
932     float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) / scale_ratio;
933 
934     *in_center_of_out = (out_pixel_center + out_shift) / scale_ratio;
935     *in_first_pixel = cast(int)(fast_floor(in_pixel_influence_lowerbound + 0.5));
936     *in_last_pixel = cast(int)(fast_floor(in_pixel_influence_upperbound - 0.5));
937 }
938 
939 // What output pixels does this input pixel contribute to?
940 static void stbir__calculate_sample_range_downsample(int n, float in_pixels_radius, float scale_ratio, float out_shift, int* out_first_pixel, int* out_last_pixel, float* out_center_of_in)
941 {
942     float in_pixel_center = cast(float)n + 0.5f;
943     float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
944     float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
945 
946     float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale_ratio - out_shift;
947     float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale_ratio - out_shift;
948 
949     *out_center_of_in = in_pixel_center * scale_ratio - out_shift;
950     *out_first_pixel = cast(int)(fast_floor(out_pixel_influence_lowerbound + 0.5));
951     *out_last_pixel = cast(int)(fast_floor(out_pixel_influence_upperbound - 0.5));
952 }
953 
954 static void stbir__calculate_coefficients_upsample(stbir_filter filter, float scale, int in_first_pixel, int in_last_pixel, float in_center_of_out, stbir__contributors* contributor, float* coefficient_group)
955 {
956     int i;
957     float total_filter = 0;
958     float filter_scale;
959 
960     assert(in_last_pixel - in_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
961 
962     contributor.n0 = in_first_pixel;
963     contributor.n1 = in_last_pixel;
964 
965     assert(contributor.n1 >= contributor.n0);
966 
967     for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
968     {
969         float in_pixel_center = cast(float)(i + in_first_pixel) + 0.5f;
970         coefficient_group[i] = stbir__filter_info_table[filter].kernel(in_center_of_out - in_pixel_center, 1 / scale);
971 
972         // If the coefficient is zero, skip it. (Don't do the <0 check here, we want the influence of those outside pixels.)
973         if (i == 0 && !coefficient_group[i])
974         {
975             contributor.n0 = ++in_first_pixel;
976             i--;
977             continue;
978         }
979 
980         total_filter += coefficient_group[i];
981     }
982 
983     assert(stbir__filter_info_table[filter].kernel(cast(float)(in_last_pixel + 1) + 0.5f - in_center_of_out, 1/scale) == 0);
984 
985     assert(total_filter > 0.9);
986     assert(total_filter < 1.1f); // Make sure it's not way off.
987 
988     // Make sure the sum of all coefficients is 1.
989     filter_scale = 1 / total_filter;
990 
991     for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
992         coefficient_group[i] *= filter_scale;
993 
994     for (i = in_last_pixel - in_first_pixel; i >= 0; i--)
995     {
996         if (coefficient_group[i])
997             break;
998 
999         // This line has no weight. We can skip it.
1000         contributor.n1 = contributor.n0 + i - 1;
1001     }
1002 }
1003 
1004 static void stbir__calculate_coefficients_downsample(stbir_filter filter, float scale_ratio, int out_first_pixel, int out_last_pixel, float out_center_of_in, stbir__contributors* contributor, float* coefficient_group)
1005 {
1006     int i;
1007 
1008      assert(out_last_pixel - out_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale_ratio) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
1009 
1010     contributor.n0 = out_first_pixel;
1011     contributor.n1 = out_last_pixel;
1012 
1013     assert(contributor.n1 >= contributor.n0);
1014 
1015     for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
1016     {
1017         float out_pixel_center = cast(float)(i + out_first_pixel) + 0.5f;
1018         float x = out_pixel_center - out_center_of_in;
1019         coefficient_group[i] = stbir__filter_info_table[filter].kernel(x, scale_ratio) * scale_ratio;
1020     }
1021 
1022     assert(stbir__filter_info_table[filter].kernel(cast(float)(out_last_pixel + 1) + 0.5f - out_center_of_in, scale_ratio) == 0);
1023 
1024     for (i = out_last_pixel - out_first_pixel; i >= 0; i--)
1025     {
1026         if (coefficient_group[i])
1027             break;
1028 
1029         // This line has no weight. We can skip it.
1030         contributor.n1 = contributor.n0 + i - 1;
1031     }
1032 }
1033 
1034 static void stbir__normalize_downsample_coefficients(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, int input_size, int output_size)
1035 {
1036     int num_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1037     int num_coefficients = stbir__get_coefficient_width(filter, scale_ratio);
1038     int i, j;
1039     int skip;
1040 
1041     for (i = 0; i < output_size; i++)
1042     {
1043         float scale;
1044         float total = 0;
1045 
1046         for (j = 0; j < num_contributors; j++)
1047         {
1048             if (i >= contributors[j].n0 && i <= contributors[j].n1)
1049             {
1050                 float coefficient = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0);
1051                 total += coefficient;
1052             }
1053             else if (i < contributors[j].n0)
1054                 break;
1055         }
1056 
1057         assert(total > 0.9f);
1058         assert(total < 1.1f);
1059 
1060         scale = 1 / total;
1061 
1062         for (j = 0; j < num_contributors; j++)
1063         {
1064             if (i >= contributors[j].n0 && i <= contributors[j].n1)
1065                 *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0) *= scale;
1066             else if (i < contributors[j].n0)
1067                 break;
1068         }
1069     }
1070 
1071     // Optimize: Skip zero coefficients and contributions outside of image bounds.
1072     // Do this after normalizing because normalization depends on the n0/n1 values.
1073     for (j = 0; j < num_contributors; j++)
1074     {
1075         int range, max, width;
1076 
1077         skip = 0;
1078         while (*stbir__get_coefficient(coefficients, filter, scale_ratio, j, skip) == 0)
1079             skip++;
1080 
1081         contributors[j].n0 += skip;
1082 
1083         while (contributors[j].n0 < 0)
1084         {
1085             contributors[j].n0++;
1086             skip++;
1087         }
1088 
1089         range = contributors[j].n1 - contributors[j].n0 + 1;
1090         max = stbir__min(num_coefficients, range);
1091 
1092         width = stbir__get_coefficient_width(filter, scale_ratio);
1093         for (i = 0; i < max; i++)
1094         {
1095             if (i + skip >= width)
1096                 break;
1097 
1098             *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i) = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i + skip);
1099         }
1100 
1101         continue;
1102     }
1103 
1104     // Using min to avoid writing into invalid pixels.
1105     for (i = 0; i < num_contributors; i++)
1106         contributors[i].n1 = stbir__min(contributors[i].n1, output_size - 1);
1107 }
1108 
1109 // Each scan line uses the same kernel values so we should calculate the kernel
1110 // values once and then we can use them for every scan line.
1111 static void stbir__calculate_filters(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, float shift, int input_size, int output_size)
1112 {
1113     int n;
1114     int total_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1115 
1116     if (stbir__use_upsampling(scale_ratio))
1117     {
1118         float out_pixels_radius = stbir__filter_info_table[filter].support(1 / scale_ratio) * scale_ratio;
1119 
1120         // Looping through out pixels
1121         for (n = 0; n < total_contributors; n++)
1122         {
1123             float in_center_of_out; // Center of the current out pixel in the in pixel space
1124             int in_first_pixel, in_last_pixel;
1125 
1126             stbir__calculate_sample_range_upsample(n, out_pixels_radius, scale_ratio, shift, &in_first_pixel, &in_last_pixel, &in_center_of_out);
1127 
1128             stbir__calculate_coefficients_upsample(filter, scale_ratio, in_first_pixel, in_last_pixel, in_center_of_out, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1129         }
1130     }
1131     else
1132     {
1133         float in_pixels_radius = stbir__filter_info_table[filter].support(scale_ratio) / scale_ratio;
1134 
1135         // Looping through in pixels
1136         for (n = 0; n < total_contributors; n++)
1137         {
1138             float out_center_of_in; // Center of the current out pixel in the in pixel space
1139             int out_first_pixel, out_last_pixel;
1140             int n_adjusted = n - stbir__get_filter_pixel_margin(filter, scale_ratio);
1141 
1142             stbir__calculate_sample_range_downsample(n_adjusted, in_pixels_radius, scale_ratio, shift, &out_first_pixel, &out_last_pixel, &out_center_of_in);
1143 
1144             stbir__calculate_coefficients_downsample(filter, scale_ratio, out_first_pixel, out_last_pixel, out_center_of_in, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1145         }
1146 
1147         stbir__normalize_downsample_coefficients(contributors, coefficients, filter, scale_ratio, input_size, output_size);
1148     }
1149 }
1150 
1151 static float* stbir__get_decode_buffer(stbir__info* stbir_info)
1152 {
1153     // The 0 index of the decode buffer starts after the margin. This makes
1154     // it okay to use negative indexes on the decode buffer.
1155     return &stbir_info.decode_buffer[stbir_info.horizontal_filter_pixel_margin * stbir_info.channels];
1156 }
1157 
1158 int STBIR__DECODE(int type, int colorspace)
1159 {
1160     return type * STBIR_MAX_COLORSPACES + colorspace;
1161 }
1162 
1163 static void stbir__decode_scanline(stbir__info* stbir_info, int n)
1164 {
1165     int c;
1166     int channels = stbir_info.channels;
1167     int alpha_channel = stbir_info.alpha_channel;
1168     int type = stbir_info.type;
1169     int colorspace = stbir_info.colorspace;
1170     int input_w = stbir_info.input_w;
1171     size_t input_stride_bytes = stbir_info.input_stride_bytes;
1172     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1173     stbir_edge edge_horizontal = stbir_info.edge_horizontal;
1174     stbir_edge edge_vertical = stbir_info.edge_vertical;
1175     size_t in_buffer_row_offset = stbir__edge_wrap(edge_vertical, n, stbir_info.input_h) * input_stride_bytes;
1176     const void* input_data = cast(char *) stbir_info.input_data + in_buffer_row_offset;
1177     int max_x = input_w + stbir_info.horizontal_filter_pixel_margin;
1178     int decode = STBIR__DECODE(type, colorspace);
1179 
1180     int x = -stbir_info.horizontal_filter_pixel_margin;
1181 
1182     // special handling for STBIR_EDGE_ZERO because it needs to return an item that doesn't appear in the input,
1183     // and we want to avoid paying overhead on every pixel if not STBIR_EDGE_ZERO
1184     if (edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info.input_h))
1185     {
1186         for (; x < max_x; x++)
1187             for (c = 0; c < channels; c++)
1188                 decode_buffer[x*channels + c] = 0;
1189         return;
1190     }
1191 
1192     switch (decode)
1193     {
1194     case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1195         for (; x < max_x; x++)
1196         {
1197             int decode_pixel_index = x * channels;
1198             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1199             for (c = 0; c < channels; c++)
1200                 decode_buffer[decode_pixel_index + c] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + c]) / stbir__max_uint8_as_float;
1201         }
1202         break;
1203 
1204     case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
1205         for (; x < max_x; x++)
1206         {
1207             int decode_pixel_index = x * channels;
1208             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1209             for (c = 0; c < channels; c++)
1210                 decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
1211 
1212             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1213                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint8_as_float;
1214         }
1215         break;
1216 
1217     case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1218         for (; x < max_x; x++)
1219         {
1220             int decode_pixel_index = x * channels;
1221             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1222             for (c = 0; c < channels; c++)
1223                 decode_buffer[decode_pixel_index + c] = (cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float;
1224         }
1225         break;
1226 
1227     case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1228         for (; x < max_x; x++)
1229         {
1230             int decode_pixel_index = x * channels;
1231             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1232             for (c = 0; c < channels; c++)
1233                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float);
1234 
1235             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1236                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint16_as_float;
1237         }
1238         break;
1239 
1240     case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1241         for (; x < max_x; x++)
1242         {
1243             int decode_pixel_index = x * channels;
1244             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1245             for (c = 0; c < channels; c++)
1246                 decode_buffer[decode_pixel_index + c] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float);
1247         }
1248         break;
1249 
1250     case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1251         for (; x < max_x; x++)
1252         {
1253             int decode_pixel_index = x * channels;
1254             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1255             for (c = 0; c < channels; c++)
1256                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float));
1257 
1258             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1259                 decode_buffer[decode_pixel_index + alpha_channel] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint32_as_float);
1260         }
1261         break;
1262 
1263     case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1264         for (; x < max_x; x++)
1265         {
1266             int decode_pixel_index = x * channels;
1267             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1268             for (c = 0; c < channels; c++)
1269                 decode_buffer[decode_pixel_index + c] = (cast(const(float)*)input_data)[input_pixel_index + c];
1270         }
1271         break;
1272 
1273     case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1274         for (; x < max_x; x++)
1275         {
1276             int decode_pixel_index = x * channels;
1277             int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1278             for (c = 0; c < channels; c++)
1279                 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(const(float)*)input_data)[input_pixel_index + c]);
1280 
1281             if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1282                 decode_buffer[decode_pixel_index + alpha_channel] = (cast(const(float)*)input_data)[input_pixel_index + alpha_channel];
1283         }
1284 
1285         break;
1286 
1287     default:
1288         assert(!"Unknown type/colorspace/channels combination.");
1289         break;
1290     }
1291 
1292     if (!(stbir_info.flags & STBIR_FLAG_ALPHA_PREMULTIPLIED))
1293     {
1294         for (x = -stbir_info.horizontal_filter_pixel_margin; x < max_x; x++)
1295         {
1296             int decode_pixel_index = x * channels;
1297 
1298             // If the alpha value is 0 it will clobber the color values. Make sure it's not.
1299             float alpha = decode_buffer[decode_pixel_index + alpha_channel];
1300 
1301             version(STBIR_NO_ALPHA_EPSILON)
1302             {}
1303             else
1304             {
1305                 if (stbir_info.type != STBIR_TYPE_FLOAT) {
1306                     alpha += STBIR_ALPHA_EPSILON;
1307                     decode_buffer[decode_pixel_index + alpha_channel] = alpha;
1308                 }
1309             }
1310 
1311             for (c = 0; c < channels; c++)
1312             {
1313                 if (c == alpha_channel)
1314                     continue;
1315 
1316                 decode_buffer[decode_pixel_index + c] *= alpha;
1317             }
1318         }
1319     }
1320 
1321     if (edge_horizontal == STBIR_EDGE_ZERO)
1322     {
1323         for (x = -stbir_info.horizontal_filter_pixel_margin; x < 0; x++)
1324         {
1325             for (c = 0; c < channels; c++)
1326                 decode_buffer[x*channels + c] = 0;
1327         }
1328         for (x = input_w; x < max_x; x++)
1329         {
1330             for (c = 0; c < channels; c++)
1331                 decode_buffer[x*channels + c] = 0;
1332         }
1333     }
1334 }
1335 
1336 static float* stbir__get_ring_buffer_entry(float* ring_buffer, int index, int ring_buffer_length)
1337 {
1338     return &ring_buffer[index * ring_buffer_length];
1339 }
1340 
1341 static float* stbir__add_empty_ring_buffer_entry(stbir__info* stbir_info, int n)
1342 {
1343     int ring_buffer_index;
1344     float* ring_buffer;
1345 
1346     stbir_info.ring_buffer_last_scanline = n;
1347 
1348     if (stbir_info.ring_buffer_begin_index < 0)
1349     {
1350         ring_buffer_index = stbir_info.ring_buffer_begin_index = 0;
1351         stbir_info.ring_buffer_first_scanline = n;
1352     }
1353     else
1354     {
1355         ring_buffer_index = (stbir_info.ring_buffer_begin_index + (stbir_info.ring_buffer_last_scanline - stbir_info.ring_buffer_first_scanline)) % stbir_info.ring_buffer_num_entries;
1356         assert(ring_buffer_index != stbir_info.ring_buffer_begin_index);
1357     }
1358 
1359     ring_buffer = stbir__get_ring_buffer_entry(stbir_info.ring_buffer, ring_buffer_index, stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof));
1360     memset(ring_buffer, 0, stbir_info.ring_buffer_length_bytes);
1361 
1362     return ring_buffer;
1363 }
1364 
1365 
1366 static void stbir__resample_horizontal_upsample(stbir__info* stbir_info, float* output_buffer)
1367 {
1368     int x, k;
1369     int output_w = stbir_info.output_w;
1370     int channels = stbir_info.channels;
1371     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1372     stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1373     float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1374     int coefficient_width = stbir_info.horizontal_coefficient_width;
1375 
1376     for (x = 0; x < output_w; x++)
1377     {
1378         int n0 = horizontal_contributors[x].n0;
1379         int n1 = horizontal_contributors[x].n1;
1380 
1381         int out_pixel_index = x * channels;
1382         int coefficient_group = coefficient_width * x;
1383         int coefficient_counter = 0;
1384 
1385         assert(n1 >= n0);
1386         assert(n0 >= -stbir_info.horizontal_filter_pixel_margin);
1387         assert(n1 >= -stbir_info.horizontal_filter_pixel_margin);
1388         assert(n0 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1389         assert(n1 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1390 
1391         switch (channels) {
1392             case 1:
1393                 for (k = n0; k <= n1; k++)
1394                 {
1395                     int in_pixel_index = k * 1;
1396                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1397                     //assert(coefficient != 0);
1398                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1399                 }
1400                 break;
1401             case 2:
1402                 for (k = n0; k <= n1; k++)
1403                 {
1404                     int in_pixel_index = k * 2;
1405                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1406                     //assert(coefficient != 0);
1407                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1408                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1409                 }
1410                 break;
1411             case 3:
1412                 for (k = n0; k <= n1; k++)
1413                 {
1414                     int in_pixel_index = k * 3;
1415                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1416                     //assert(coefficient != 0);
1417                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1418                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1419                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1420                 }
1421                 break;
1422             case 4:
1423                 for (k = n0; k <= n1; k++)
1424                 {
1425                     int in_pixel_index = k * 4;
1426                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1427                     //assert(coefficient != 0);
1428                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1429                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1430                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1431                     output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1432                 }
1433                 break;
1434             default:
1435                 for (k = n0; k <= n1; k++)
1436                 {
1437                     int in_pixel_index = k * channels;
1438                     float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1439                     int c;
1440                     //assert(coefficient != 0);
1441                     for (c = 0; c < channels; c++)
1442                         output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1443                 }
1444                 break;
1445         }
1446     }
1447 }
1448 
1449 static void stbir__resample_horizontal_downsample(stbir__info* stbir_info, float* output_buffer)
1450 {
1451     int x, k;
1452     int input_w = stbir_info.input_w;
1453     int channels = stbir_info.channels;
1454     float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1455     stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1456     float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1457     int coefficient_width = stbir_info.horizontal_coefficient_width;
1458     int filter_pixel_margin = stbir_info.horizontal_filter_pixel_margin;
1459     int max_x = input_w + filter_pixel_margin * 2;
1460 
1461     assert(!stbir__use_width_upsampling(stbir_info));
1462 
1463     switch (channels) {
1464         case 1:
1465             for (x = 0; x < max_x; x++)
1466             {
1467                 int n0 = horizontal_contributors[x].n0;
1468                 int n1 = horizontal_contributors[x].n1;
1469 
1470                 int in_x = x - filter_pixel_margin;
1471                 int in_pixel_index = in_x * 1;
1472                 int max_n = n1;
1473                 int coefficient_group = coefficient_width * x;
1474 
1475                 for (k = n0; k <= max_n; k++)
1476                 {
1477                     int out_pixel_index = k * 1;
1478                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1479                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1480                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1481                 }
1482             }
1483             break;
1484 
1485         case 2:
1486             for (x = 0; x < max_x; x++)
1487             {
1488                 int n0 = horizontal_contributors[x].n0;
1489                 int n1 = horizontal_contributors[x].n1;
1490 
1491                 int in_x = x - filter_pixel_margin;
1492                 int in_pixel_index = in_x * 2;
1493                 int max_n = n1;
1494                 int coefficient_group = coefficient_width * x;
1495 
1496                 for (k = n0; k <= max_n; k++)
1497                 {
1498                     int out_pixel_index = k * 2;
1499                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1500                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1501                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1502                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1503                 }
1504             }
1505             break;
1506 
1507         case 3:
1508             for (x = 0; x < max_x; x++)
1509             {
1510                 int n0 = horizontal_contributors[x].n0;
1511                 int n1 = horizontal_contributors[x].n1;
1512 
1513                 int in_x = x - filter_pixel_margin;
1514                 int in_pixel_index = in_x * 3;
1515                 int max_n = n1;
1516                 int coefficient_group = coefficient_width * x;
1517 
1518                 for (k = n0; k <= max_n; k++)
1519                 {
1520                     int out_pixel_index = k * 3;
1521                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1522                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1523                     output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1524                     output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1525                     output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1526                 }
1527             }
1528             break;
1529 
1530         case 4:
1531             for (x = 0; x < max_x; x++)
1532             {
1533                 int n0 = horizontal_contributors[x].n0;
1534                 int n1 = horizontal_contributors[x].n1;
1535 
1536                 int in_x = x - filter_pixel_margin;
1537                 int in_pixel_index = in_x * 4;
1538                 int max_n = n1;
1539                 int coefficient_group = coefficient_width * x;
1540 
1541                 for (k = n0; k <= max_n; k++)
1542                 {
1543                     int out_pixel_index = k * 4;
1544                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1545                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1546 
1547                     version(DigitalMars)
1548                     {
1549                         output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1550                         output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1551                         output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1552                         output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1553                     }
1554                     else
1555                     {
1556                         __m128 A = _mm_loadu_ps(&decode_buffer[in_pixel_index]);
1557                         __m128 B = _mm_loadu_ps(&output_buffer[out_pixel_index]);
1558                         B = B + A * _mm_set1_ps(coefficient);
1559                         _mm_storeu_ps(&output_buffer[out_pixel_index], B);
1560                     }
1561                 }
1562             }
1563             break;
1564 
1565         default:
1566             for (x = 0; x < max_x; x++)
1567             {
1568                 int n0 = horizontal_contributors[x].n0;
1569                 int n1 = horizontal_contributors[x].n1;
1570 
1571                 int in_x = x - filter_pixel_margin;
1572                 int in_pixel_index = in_x * channels;
1573                 int max_n = n1;
1574                 int coefficient_group = coefficient_width * x;
1575 
1576                 for (k = n0; k <= max_n; k++)
1577                 {
1578                     int c;
1579                     int out_pixel_index = k * channels;
1580                     float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1581                     //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1582                     for (c = 0; c < channels; c++)
1583                         output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1584                 }
1585             }
1586             break;
1587     }
1588 }
1589 
1590 static void stbir__decode_and_resample_upsample(stbir__info* stbir_info, int n)
1591 {
1592     // Decode the nth scanline from the source image into the decode buffer.
1593     stbir__decode_scanline(stbir_info, n);
1594 
1595     // Now resample it into the ring buffer.
1596     if (stbir__use_width_upsampling(stbir_info))
1597         stbir__resample_horizontal_upsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1598     else
1599         stbir__resample_horizontal_downsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1600 
1601     // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
1602 }
1603 
1604 static void stbir__decode_and_resample_downsample(stbir__info* stbir_info, int n)
1605 {
1606     // Decode the nth scanline from the source image into the decode buffer.
1607     stbir__decode_scanline(stbir_info, n);
1608 
1609     memset(stbir_info.horizontal_buffer, 0, stbir_info.output_w * stbir_info.channels * float.sizeof);
1610 
1611     // Now resample it into the horizontal buffer.
1612     if (stbir__use_width_upsampling(stbir_info))
1613         stbir__resample_horizontal_upsample(stbir_info, stbir_info.horizontal_buffer);
1614     else
1615         stbir__resample_horizontal_downsample(stbir_info, stbir_info.horizontal_buffer);
1616 
1617     // Now it's sitting in the horizontal buffer ready to be distributed into the ring buffers.
1618 }
1619 
1620 // Get the specified scan line from the ring buffer.
1621 static float* stbir__get_ring_buffer_scanline(int get_scanline, float* ring_buffer, int begin_index, int first_scanline, int ring_buffer_num_entries, int ring_buffer_length)
1622 {
1623     int ring_buffer_index = (begin_index + (get_scanline - first_scanline)) % ring_buffer_num_entries;
1624     return stbir__get_ring_buffer_entry(ring_buffer, ring_buffer_index, ring_buffer_length);
1625 }
1626 
1627 
1628 static void stbir__encode_scanline(stbir__info* stbir_info, int num_pixels, void *output_buffer, float *encode_buffer, int channels, int alpha_channel, int decode)
1629 {
1630     int x;
1631     int n;
1632     int num_nonalpha;
1633     ushort[STBIR_MAX_CHANNELS] nonalpha;
1634 
1635     if (!(stbir_info.flags&STBIR_FLAG_ALPHA_PREMULTIPLIED))
1636     {
1637         for (x=0; x < num_pixels; ++x)
1638         {
1639             int pixel_index = x*channels;
1640 
1641             float alpha = encode_buffer[pixel_index + alpha_channel];
1642             float reciprocal_alpha = alpha ? 1.0f / alpha : 0;
1643 
1644             // unrolling this produced a 1% slowdown upscaling a large RGBA linear-space image on my machine - stb
1645             for (n = 0; n < channels; n++)
1646                 if (n != alpha_channel)
1647                     encode_buffer[pixel_index + n] *= reciprocal_alpha;
1648 
1649             // We added in a small epsilon to prevent the color channel from being deleted with zero alpha.
1650             // Because we only add it for integer types, it will automatically be discarded on integer
1651             // conversion, so we don't need to subtract it back out (which would be problematic for
1652             // numeric precision reasons).
1653         }
1654     }
1655 
1656     // build a table of all channels that need colorspace correction, so
1657     // we don't perform colorspace correction on channels that don't need it.
1658     for (x = 0, num_nonalpha = 0; x < channels; ++x)
1659     {
1660         if (x != alpha_channel || (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1661         {
1662             nonalpha[num_nonalpha++] = cast(ushort)x;
1663         }
1664     }
1665 
1666     static int STBIR__ROUND_INT_f(float f)
1667     {
1668         return cast(int)(f + 0.5f);
1669     }
1670     static int STBIR__ROUND_INT_d(double f)
1671     {
1672         return cast(int)(f + 0.5);
1673     }
1674     static int STBIR__ROUND_UINT_f(float f)
1675     {
1676         return cast(uint)(f + 0.5f);
1677     }
1678     static int STBIR__ROUND_UINT_d(double f)
1679     {
1680         return cast(uint)(f + 0.5);
1681     }
1682     
1683    
1684     static ubyte STBIR__ENCODE_LINEAR8(float f)
1685     {
1686         return cast(ubyte) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint8_as_float );
1687     }
1688 
1689     static ushort STBIR__ENCODE_LINEAR16(float f)
1690     {
1691         return cast(ushort) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint16_as_float );
1692     }
1693 
1694     switch (decode)
1695     {
1696         case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1697             for (x=0; x < num_pixels; ++x)
1698             {
1699                 int pixel_index = x*channels;
1700 
1701                 for (n = 0; n < channels; n++)
1702                 {
1703                     int index = pixel_index + n;
1704                     (cast(ubyte*)output_buffer)[index] = STBIR__ENCODE_LINEAR8(encode_buffer[index]);
1705                 }
1706             }
1707             break;
1708 
1709         case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
1710             for (x=0; x < num_pixels; ++x)
1711             {
1712                 int pixel_index = x*channels;
1713 
1714                 for (n = 0; n < num_nonalpha; n++)
1715                 {
1716                     int index = pixel_index + nonalpha[n];
1717                     (cast(ubyte*)output_buffer)[index] = stbir__linear_to_srgb_uchar(encode_buffer[index]);
1718                 }
1719 
1720                 if (!(stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
1721                     (cast(ubyte*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR8(encode_buffer[pixel_index+alpha_channel]);
1722             }
1723             break;
1724 
1725         case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1726             for (x=0; x < num_pixels; ++x)
1727             {
1728                 int pixel_index = x*channels;
1729 
1730                 for (n = 0; n < channels; n++)
1731                 {
1732                     int index = pixel_index + n;
1733                     (cast(ushort*)output_buffer)[index] = STBIR__ENCODE_LINEAR16(encode_buffer[index]);
1734                 }
1735             }
1736             break;
1737 
1738         case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1739             for (x=0; x < num_pixels; ++x)
1740             {
1741                 int pixel_index = x*channels;
1742 
1743                 for (n = 0; n < num_nonalpha; n++)
1744                 {
1745                     int index = pixel_index + nonalpha[n];
1746                     (cast(ushort*)output_buffer)[index] = cast(ushort)STBIR__ROUND_INT_f(stbir__linear_to_srgb(stbir__saturate(encode_buffer[index])) * stbir__max_uint16_as_float);
1747                 }
1748 
1749                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1750                     (cast(ushort*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR16(encode_buffer[pixel_index + alpha_channel]);
1751             }
1752 
1753             break;
1754 
1755         case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1756             for (x=0; x < num_pixels; ++x)
1757             {
1758                 int pixel_index = x*channels;
1759 
1760                 for (n = 0; n < channels; n++)
1761                 {
1762                     int index = pixel_index + n;
1763                     (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[index])) * stbir__max_uint32_as_float);
1764                 }
1765             }
1766             break;
1767 
1768         case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1769             for (x=0; x < num_pixels; ++x)
1770             {
1771                 int pixel_index = x*channels;
1772 
1773                 for (n = 0; n < num_nonalpha; n++)
1774                 {
1775                     int index = pixel_index + nonalpha[n];
1776                     (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__linear_to_srgb(stbir__saturate(encode_buffer[index]))) * stbir__max_uint32_as_float);
1777                 }
1778 
1779                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1780                     (cast(uint*)output_buffer)[pixel_index + alpha_channel] = cast(uint) STBIR__ROUND_INT_d((cast(double)stbir__saturate(encode_buffer[pixel_index + alpha_channel])) * stbir__max_uint32_as_float);
1781             }
1782             break;
1783 
1784         case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1785             for (x=0; x < num_pixels; ++x)
1786             {
1787                 int pixel_index = x*channels;
1788 
1789                 for (n = 0; n < channels; n++)
1790                 {
1791                     int index = pixel_index + n;
1792                     (cast(float*)output_buffer)[index] = encode_buffer[index];
1793                 }
1794             }
1795             break;
1796 
1797         case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1798             for (x=0; x < num_pixels; ++x)
1799             {
1800                 int pixel_index = x*channels;
1801 
1802                 for (n = 0; n < num_nonalpha; n++)
1803                 {
1804                     int index = pixel_index + nonalpha[n];
1805                     (cast(float*)output_buffer)[index] = stbir__linear_to_srgb(encode_buffer[index]);
1806                 }
1807 
1808                 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1809                     (cast(float*)output_buffer)[pixel_index + alpha_channel] = encode_buffer[pixel_index + alpha_channel];
1810             }
1811             break;
1812 
1813         default:
1814             assert(!"Unknown type/colorspace/channels combination.");
1815             break;
1816     }
1817 }
1818 
1819 static void stbir__resample_vertical_upsample(stbir__info* stbir_info, int n)
1820 {
1821     int x, k;
1822     int output_w = stbir_info.output_w;
1823     stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
1824     float* vertical_coefficients = stbir_info.vertical_coefficients;
1825     int channels = stbir_info.channels;
1826     int alpha_channel = stbir_info.alpha_channel;
1827     int type = stbir_info.type;
1828     int colorspace = stbir_info.colorspace;
1829     int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
1830     void* output_data = stbir_info.output_data;
1831     float* encode_buffer = stbir_info.encode_buffer;
1832     int decode = STBIR__DECODE(type, colorspace);
1833     int coefficient_width = stbir_info.vertical_coefficient_width;
1834     int coefficient_counter;
1835     int contributor = n;
1836 
1837     float* ring_buffer = stbir_info.ring_buffer;
1838     int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
1839     int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
1840     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
1841 
1842     int n0,n1, output_row_start;
1843     int coefficient_group = coefficient_width * contributor;
1844 
1845     n0 = vertical_contributors[contributor].n0;
1846     n1 = vertical_contributors[contributor].n1;
1847 
1848     output_row_start = n * stbir_info.output_stride_bytes;
1849 
1850     assert(stbir__use_height_upsampling(stbir_info));
1851 
1852     memset(encode_buffer, 0, output_w * float.sizeof * channels);
1853 
1854     // I tried reblocking this for better cache usage of encode_buffer
1855     // (using x_outer, k, x_inner), but it lost speed. -- stb
1856 
1857     coefficient_counter = 0;
1858     switch (channels) {
1859         case 1:
1860             for (k = n0; k <= n1; k++)
1861             {
1862                 int coefficient_index = coefficient_counter++;
1863                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1864                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1865                 for (x = 0; x < output_w; ++x)
1866                 {
1867                     int in_pixel_index = x * 1;
1868                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1869                 }
1870             }
1871             break;
1872         case 2:
1873             for (k = n0; k <= n1; k++)
1874             {
1875                 int coefficient_index = coefficient_counter++;
1876                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1877                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1878                 for (x = 0; x < output_w; ++x)
1879                 {
1880                     int in_pixel_index = x * 2;
1881                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1882                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
1883                 }
1884             }
1885             break;
1886         case 3:
1887             for (k = n0; k <= n1; k++)
1888             {
1889                 int coefficient_index = coefficient_counter++;
1890                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1891                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1892                 for (x = 0; x < output_w; ++x)
1893                 {
1894                     int in_pixel_index = x * 3;
1895                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1896                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
1897                     encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
1898                 }
1899             }
1900             break;
1901         case 4:
1902             for (k = n0; k <= n1; k++)
1903             {
1904                 int coefficient_index = coefficient_counter++;
1905                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1906                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1907                 for (x = 0; x < output_w; ++x)
1908                 {
1909                     int in_pixel_index = x * 4;
1910                     encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
1911                     encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
1912                     encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
1913                     encode_buffer[in_pixel_index + 3] += ring_buffer_entry[in_pixel_index + 3] * coefficient;
1914                 }
1915             }
1916             break;
1917         default:
1918             for (k = n0; k <= n1; k++)
1919             {
1920                 int coefficient_index = coefficient_counter++;
1921                 float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1922                 float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1923                 for (x = 0; x < output_w; ++x)
1924                 {
1925                     int in_pixel_index = x * channels;
1926                     int c;
1927                     for (c = 0; c < channels; c++)
1928                         encode_buffer[in_pixel_index + c] += ring_buffer_entry[in_pixel_index + c] * coefficient;
1929                 }
1930             }
1931             break;
1932     }
1933     stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, encode_buffer, channels, alpha_channel, decode);
1934 }
1935 
1936 static void stbir__resample_vertical_downsample(stbir__info* stbir_info, int n)
1937 {
1938     int x, k;
1939     int output_w = stbir_info.output_w;
1940     stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
1941     float* vertical_coefficients = stbir_info.vertical_coefficients;
1942     int channels = stbir_info.channels;
1943     int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
1944     float* horizontal_buffer = stbir_info.horizontal_buffer;
1945     int coefficient_width = stbir_info.vertical_coefficient_width;
1946     int contributor = n + stbir_info.vertical_filter_pixel_margin;
1947 
1948     float* ring_buffer = stbir_info.ring_buffer;
1949     int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
1950     int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
1951     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
1952     int n0,n1;
1953 
1954     n0 = vertical_contributors[contributor].n0;
1955     n1 = vertical_contributors[contributor].n1;
1956 
1957     assert(!stbir__use_height_upsampling(stbir_info));
1958 
1959     for (k = n0; k <= n1; k++)
1960     {
1961         int coefficient_index = k - n0;
1962         int coefficient_group = coefficient_width * contributor;
1963         float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
1964 
1965         float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
1966 
1967         switch (channels) {
1968             case 1:
1969                 for (x = 0; x < output_w; x++)
1970                 {
1971                     int in_pixel_index = x * 1;
1972                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
1973                 }
1974                 break;
1975             case 2:
1976                 for (x = 0; x < output_w; x++)
1977                 {
1978                     int in_pixel_index = x * 2;
1979                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
1980                     ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
1981                 }
1982                 break;
1983             case 3:
1984                 for (x = 0; x < output_w; x++)
1985                 {
1986                     int in_pixel_index = x * 3;
1987                     ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
1988                     ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
1989                     ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
1990                 }
1991                 break;
1992             case 4:
1993 
1994                 __m128 vCoefficients = _mm_set1_ps(coefficient);
1995 
1996                 for (x = 0; x < output_w; x++)
1997                 {
1998                     int in_pixel_index = x * 4;
1999                     __m128 A = _mm_loadu_ps(&horizontal_buffer[in_pixel_index]);
2000                     __m128 B = _mm_loadu_ps(&ring_buffer_entry[in_pixel_index]);
2001                     _mm_storeu_ps( &ring_buffer_entry[in_pixel_index], B + A * vCoefficients);
2002                 }
2003                 break;
2004             default:
2005                 for (x = 0; x < output_w; x++)
2006                 {
2007                     int in_pixel_index = x * channels;
2008 
2009                     int c;
2010                     for (c = 0; c < channels; c++)
2011                         ring_buffer_entry[in_pixel_index + c] += horizontal_buffer[in_pixel_index + c] * coefficient;
2012                 }
2013                 break;
2014         }
2015     }
2016 }
2017 
2018 static void stbir__buffer_loop_upsample(stbir__info* stbir_info)
2019 {
2020     int y;
2021     float scale_ratio = stbir_info.vertical_scale;
2022     float out_scanlines_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(1/scale_ratio) * scale_ratio;
2023 
2024     assert(stbir__use_height_upsampling(stbir_info));
2025 
2026     for (y = 0; y < stbir_info.output_h; y++)
2027     {
2028         float in_center_of_out = 0; // Center of the current out scanline in the in scanline space
2029         int in_first_scanline = 0, in_last_scanline = 0;
2030 
2031         stbir__calculate_sample_range_upsample(y, out_scanlines_radius, scale_ratio, stbir_info.vertical_shift, &in_first_scanline, &in_last_scanline, &in_center_of_out);
2032 
2033         assert(in_last_scanline - in_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);
2034 
2035         if (stbir_info.ring_buffer_begin_index >= 0)
2036         {
2037             // Get rid of whatever we don't need anymore.
2038             while (in_first_scanline > stbir_info.ring_buffer_first_scanline)
2039             {
2040                 if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
2041                 {
2042                     // We just popped the last scanline off the ring buffer.
2043                     // Reset it to the empty state.
2044                     stbir_info.ring_buffer_begin_index = -1;
2045                     stbir_info.ring_buffer_first_scanline = 0;
2046                     stbir_info.ring_buffer_last_scanline = 0;
2047                     break;
2048                 }
2049                 else
2050                 {
2051                     stbir_info.ring_buffer_first_scanline++;
2052                     stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
2053                 }
2054             }
2055         }
2056 
2057         // Load in new ones.
2058         if (stbir_info.ring_buffer_begin_index < 0)
2059             stbir__decode_and_resample_upsample(stbir_info, in_first_scanline);
2060 
2061         while (in_last_scanline > stbir_info.ring_buffer_last_scanline)
2062             stbir__decode_and_resample_upsample(stbir_info, stbir_info.ring_buffer_last_scanline + 1);
2063 
2064         // Now all buffers should be ready to write a row of vertical sampling.
2065         stbir__resample_vertical_upsample(stbir_info, y);
2066     }
2067 }
2068 
2069 static void stbir__empty_ring_buffer(stbir__info* stbir_info, int first_necessary_scanline)
2070 {
2071     int output_stride_bytes = stbir_info.output_stride_bytes;
2072     int channels = stbir_info.channels;
2073     int alpha_channel = stbir_info.alpha_channel;
2074     int type = stbir_info.type;
2075     int colorspace = stbir_info.colorspace;
2076     int output_w = stbir_info.output_w;
2077     void* output_data = stbir_info.output_data;
2078     int decode = STBIR__DECODE(type, colorspace);
2079 
2080     float* ring_buffer = stbir_info.ring_buffer;
2081     int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
2082 
2083     if (stbir_info.ring_buffer_begin_index >= 0)
2084     {
2085         // Get rid of whatever we don't need anymore.
2086         while (first_necessary_scanline > stbir_info.ring_buffer_first_scanline)
2087         {
2088             if (stbir_info.ring_buffer_first_scanline >= 0 && stbir_info.ring_buffer_first_scanline < stbir_info.output_h)
2089             {
2090                 int output_row_start = stbir_info.ring_buffer_first_scanline * output_stride_bytes;
2091                 float* ring_buffer_entry = stbir__get_ring_buffer_entry(ring_buffer, stbir_info.ring_buffer_begin_index, ring_buffer_length);
2092                 stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, ring_buffer_entry, channels, alpha_channel, decode);
2093             }
2094 
2095             if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
2096             {
2097                 // We just popped the last scanline off the ring buffer.
2098                 // Reset it to the empty state.
2099                 stbir_info.ring_buffer_begin_index = -1;
2100                 stbir_info.ring_buffer_first_scanline = 0;
2101                 stbir_info.ring_buffer_last_scanline = 0;
2102                 break;
2103             }
2104             else
2105             {
2106                 stbir_info.ring_buffer_first_scanline++;
2107                 stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
2108             }
2109         }
2110     }
2111 }
2112 
2113 static void stbir__buffer_loop_downsample(stbir__info* stbir_info)
2114 {
2115     int y;
2116     float scale_ratio = stbir_info.vertical_scale;
2117     int output_h = stbir_info.output_h;
2118     float in_pixels_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(scale_ratio) / scale_ratio;
2119     int pixel_margin = stbir_info.vertical_filter_pixel_margin;
2120     int max_y = stbir_info.input_h + pixel_margin;
2121 
2122     assert(!stbir__use_height_upsampling(stbir_info));
2123 
2124     for (y = -pixel_margin; y < max_y; y++)
2125     {
2126         float out_center_of_in; // Center of the current out scanline in the in scanline space
2127         int out_first_scanline, out_last_scanline;
2128 
2129         stbir__calculate_sample_range_downsample(y, in_pixels_radius, scale_ratio, stbir_info.vertical_shift, &out_first_scanline, &out_last_scanline, &out_center_of_in);
2130 
2131         assert(out_last_scanline - out_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);
2132 
2133         if (out_last_scanline < 0 || out_first_scanline >= output_h)
2134             continue;
2135 
2136         stbir__empty_ring_buffer(stbir_info, out_first_scanline);
2137 
2138         stbir__decode_and_resample_downsample(stbir_info, y);
2139 
2140         // Load in new ones.
2141         if (stbir_info.ring_buffer_begin_index < 0)
2142             stbir__add_empty_ring_buffer_entry(stbir_info, out_first_scanline);
2143 
2144         while (out_last_scanline > stbir_info.ring_buffer_last_scanline)
2145             stbir__add_empty_ring_buffer_entry(stbir_info, stbir_info.ring_buffer_last_scanline + 1);
2146 
2147         // Now the horizontal buffer is ready to write to all ring buffer rows.
2148         stbir__resample_vertical_downsample(stbir_info, y);
2149     }
2150 
2151     stbir__empty_ring_buffer(stbir_info, stbir_info.output_h);
2152 }
2153 
2154 static void stbir__setup(stbir__info *info, int input_w, int input_h, int output_w, int output_h, int channels)
2155 {
2156     info.input_w = input_w;
2157     info.input_h = input_h;
2158     info.output_w = output_w;
2159     info.output_h = output_h;
2160     info.channels = channels;
2161 }
2162 
2163 static void stbir__calculate_transform(stbir__info *info, float s0, float t0, float s1, float t1, float *transform)
2164 {
2165     info.s0 = s0;
2166     info.t0 = t0;
2167     info.s1 = s1;
2168     info.t1 = t1;
2169 
2170     if (transform)
2171     {
2172         info.horizontal_scale = transform[0];
2173         info.vertical_scale   = transform[1];
2174         info.horizontal_shift = transform[2];
2175         info.vertical_shift   = transform[3];
2176     }
2177     else
2178     {
2179         info.horizontal_scale = (cast(float)info.output_w / info.input_w) / (s1 - s0);
2180         info.vertical_scale = (cast(float)info.output_h / info.input_h) / (t1 - t0);
2181 
2182         info.horizontal_shift = s0 * info.output_w / (s1 - s0);
2183         info.vertical_shift = t0 * info.output_h / (t1 - t0);
2184     }
2185 }
2186 
2187 static void stbir__choose_filter(stbir__info *info, stbir_filter h_filter, stbir_filter v_filter)
2188 {
2189     if (h_filter == 0)
2190         h_filter = stbir__use_upsampling(info.horizontal_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
2191     if (v_filter == 0)
2192         v_filter = stbir__use_upsampling(info.vertical_scale)   ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
2193     info.horizontal_filter = h_filter;
2194     info.vertical_filter = v_filter;
2195 }
2196 
2197 static uint stbir__calculate_memory(stbir__info *info)
2198 {
2199     int pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
2200     int filter_height = stbir__get_filter_pixel_width(info.vertical_filter, info.vertical_scale);
2201 
2202     info.horizontal_num_contributors = stbir__get_contributors(info.horizontal_scale, info.horizontal_filter, info.input_w, info.output_w);
2203     info.vertical_num_contributors   = stbir__get_contributors(info.vertical_scale  , info.vertical_filter  , info.input_h, info.output_h);
2204 
2205     // One extra entry because floating point precision problems sometimes cause an extra to be necessary.
2206     info.ring_buffer_num_entries = filter_height + 1;
2207 
2208     info.horizontal_contributors_size = info.horizontal_num_contributors                  * cast(int)(stbir__contributors.sizeof);
2209     info.horizontal_coefficients_size = stbir__get_total_horizontal_coefficients(info)    * cast(int)(float.sizeof);
2210     info.vertical_contributors_size   = info.vertical_num_contributors                    * cast(int)(stbir__contributors.sizeof);
2211     info.vertical_coefficients_size   = stbir__get_total_vertical_coefficients(info)      * cast(int)(float.sizeof);
2212     info.decode_buffer_size           = (info.input_w + pixel_margin * 2) * info.channels * cast(int)(float.sizeof);
2213     info.horizontal_buffer_size       = info.output_w * info.channels                     * cast(int)(float.sizeof);
2214     info.ring_buffer_size             = info.output_w * info.channels                     * info.ring_buffer_num_entries * cast(int)(float.sizeof);
2215     info.encode_buffer_size           = info.output_w * info.channels                     * cast(int)(float.sizeof);
2216 
2217     assert(info.horizontal_filter != 0);
2218     assert(info.horizontal_filter < stbir__filter_info_table.length); // this now happens too late
2219     assert(info.vertical_filter != 0);
2220     assert(info.vertical_filter < stbir__filter_info_table.length); // this now happens too late
2221 
2222     if (stbir__use_height_upsampling(info))
2223         // The horizontal buffer is for when we're downsampling the height and we
2224         // can't output the result of sampling the decode buffer directly into the
2225         // ring buffers.
2226         info.horizontal_buffer_size = 0;
2227     else
2228         // The encode buffer is to retain precision in the height upsampling method
2229         // and isn't used when height downsampling.
2230         info.encode_buffer_size = 0;
2231 
2232     return info.horizontal_contributors_size + info.horizontal_coefficients_size
2233         + info.vertical_contributors_size + info.vertical_coefficients_size
2234         + info.decode_buffer_size + info.horizontal_buffer_size
2235         + info.ring_buffer_size + info.encode_buffer_size;
2236 }
2237 
2238 static int stbir__resize_allocated(stbir__info *info,
2239     const void* input_data, int input_stride_in_bytes,
2240     void* output_data, int output_stride_in_bytes,
2241     int alpha_channel, uint flags, stbir_datatype type,
2242     stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace,
2243     void* tempmem, size_t tempmem_size_in_bytes)
2244 {
2245     size_t memory_required = stbir__calculate_memory(info);
2246 
2247     int width_stride_input = input_stride_in_bytes ? input_stride_in_bytes : info.channels * info.input_w * stbir__type_size[type];
2248     int width_stride_output = output_stride_in_bytes ? output_stride_in_bytes : info.channels * info.output_w * stbir__type_size[type];
2249 
2250     assert(info.channels >= 0);
2251     assert(info.channels <= STBIR_MAX_CHANNELS);
2252 
2253     if (info.channels < 0 || info.channels > STBIR_MAX_CHANNELS)
2254         return 0;
2255 
2256     assert(info.horizontal_filter < stbir__filter_info_table.length);
2257     assert(info.vertical_filter < stbir__filter_info_table.length);
2258 
2259     if (info.horizontal_filter >= stbir__filter_info_table.length)
2260         return 0;
2261     if (info.vertical_filter >= stbir__filter_info_table.length)
2262         return 0;
2263 
2264     if (alpha_channel < 0)
2265         flags |= STBIR_FLAG_ALPHA_USES_COLORSPACE | STBIR_FLAG_ALPHA_PREMULTIPLIED;
2266 
2267     if (!(flags&STBIR_FLAG_ALPHA_USES_COLORSPACE) || !(flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)) {
2268         assert(alpha_channel >= 0 && alpha_channel < info.channels);
2269     }
2270 
2271     if (alpha_channel >= info.channels)
2272         return 0;
2273 
2274     assert(tempmem);
2275 
2276     if (!tempmem)
2277         return 0;
2278 
2279     assert(tempmem_size_in_bytes >= memory_required);
2280 
2281     if (tempmem_size_in_bytes < memory_required)
2282         return 0;
2283 
2284     memset(tempmem, 0, tempmem_size_in_bytes);
2285 
2286     info.input_data = input_data;
2287     info.input_stride_bytes = width_stride_input;
2288 
2289     info.output_data = output_data;
2290     info.output_stride_bytes = width_stride_output;
2291 
2292     info.alpha_channel = alpha_channel;
2293     info.flags = flags;
2294     info.type = type;
2295     info.edge_horizontal = edge_horizontal;
2296     info.edge_vertical = edge_vertical;
2297     info.colorspace = colorspace;
2298 
2299     info.horizontal_coefficient_width   = stbir__get_coefficient_width  (info.horizontal_filter, info.horizontal_scale);
2300     info.vertical_coefficient_width     = stbir__get_coefficient_width  (info.vertical_filter  , info.vertical_scale  );
2301     info.horizontal_filter_pixel_width  = stbir__get_filter_pixel_width (info.horizontal_filter, info.horizontal_scale);
2302     info.vertical_filter_pixel_width    = stbir__get_filter_pixel_width (info.vertical_filter  , info.vertical_scale  );
2303     info.horizontal_filter_pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
2304     info.vertical_filter_pixel_margin   = stbir__get_filter_pixel_margin(info.vertical_filter  , info.vertical_scale  );
2305 
2306     info.ring_buffer_length_bytes = info.output_w * info.channels * cast(int)(float.sizeof);
2307     info.decode_buffer_pixels = info.input_w + info.horizontal_filter_pixel_margin * 2;
2308 
2309     static newtype* STBIR__NEXT_MEMPTR(newtype)(void* current, size_t current_size)
2310     {
2311         return cast(newtype*)( (cast(ubyte*)current) + current_size );
2312     }
2313 
2314     info.horizontal_contributors = cast(stbir__contributors *) tempmem;
2315     info.horizontal_coefficients = STBIR__NEXT_MEMPTR!float              (info.horizontal_contributors, info.horizontal_contributors_size);
2316     info.vertical_contributors   = STBIR__NEXT_MEMPTR!stbir__contributors(info.horizontal_coefficients, info.horizontal_coefficients_size);
2317     info.vertical_coefficients   = STBIR__NEXT_MEMPTR!float              (info.vertical_contributors,   info.vertical_contributors_size);
2318     info.decode_buffer           = STBIR__NEXT_MEMPTR!float              (info.vertical_coefficients,   info.vertical_coefficients_size);
2319 
2320     if (stbir__use_height_upsampling(info))
2321     {
2322         info.horizontal_buffer   = null;
2323         info.ring_buffer         = STBIR__NEXT_MEMPTR!float              (info.decode_buffer,           info.decode_buffer_size);
2324         info.encode_buffer       = STBIR__NEXT_MEMPTR!float              (info.ring_buffer,             info.ring_buffer_size);
2325 
2326         assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.encode_buffer, info.encode_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
2327     }
2328     else
2329     {
2330         info.horizontal_buffer   = STBIR__NEXT_MEMPTR!float              (info.decode_buffer,           info.decode_buffer_size);
2331         info.ring_buffer         = STBIR__NEXT_MEMPTR!float              (info.horizontal_buffer,       info.horizontal_buffer_size);
2332         info.encode_buffer = null;
2333 
2334         assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.ring_buffer, info.ring_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
2335     }
2336 
2337     // This signals that the ring buffer is empty
2338     info.ring_buffer_begin_index = -1;
2339 
2340     stbir__calculate_filters(info.horizontal_contributors, info.horizontal_coefficients, info.horizontal_filter, info.horizontal_scale, info.horizontal_shift, info.input_w, info.output_w);
2341     stbir__calculate_filters(info.vertical_contributors, info.vertical_coefficients, info.vertical_filter, info.vertical_scale, info.vertical_shift, info.input_h, info.output_h);
2342 
2343     if (stbir__use_height_upsampling(info))
2344         stbir__buffer_loop_upsample(info);
2345     else
2346         stbir__buffer_loop_downsample(info);
2347 
2348     return 1;
2349 }
2350 
2351 
2352 static int stbir__resize_arbitrary(
2353     void *alloc_context,
2354     const void* input_data, int input_w, int input_h, int input_stride_in_bytes,
2355     void* output_data, int output_w, int output_h, int output_stride_in_bytes,
2356     float s0, float t0, float s1, float t1, float *transform,
2357     int channels, int alpha_channel, uint flags, stbir_datatype type,
2358     stbir_filter h_filter, stbir_filter v_filter,
2359     stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace)
2360 {
2361     stbir__info info;
2362     int result;
2363     size_t memory_required;
2364     void* extra_memory;
2365 
2366     stbir__setup(&info, input_w, input_h, output_w, output_h, channels);
2367     stbir__calculate_transform(&info, s0,t0,s1,t1,transform);
2368     stbir__choose_filter(&info, h_filter, v_filter);
2369     memory_required = stbir__calculate_memory(&info);
2370     extra_memory = STBIR_MALLOC(memory_required, alloc_context);
2371 
2372     if (!extra_memory)
2373         return 0;
2374 
2375     result = stbir__resize_allocated(&info, input_data, input_stride_in_bytes,
2376                                             output_data, output_stride_in_bytes,
2377                                             alpha_channel, flags, type,
2378                                             edge_horizontal, edge_vertical,
2379                                             colorspace, extra_memory, memory_required);
2380 
2381     STBIR_FREE(extra_memory, alloc_context);
2382 
2383     return result;
2384 }
2385 
2386 
2387 
2388 int stbir_resize_uint8_srgb_edgemode(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2389                                                     ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2390                                               int num_channels, int alpha_channel, int flags,
2391                                               stbir_edge edge_wrap_mode)
2392 {
2393     return stbir__resize_arbitrary(null, input_pixels, input_w, input_h, input_stride_in_bytes,
2394         output_pixels, output_w, output_h, output_stride_in_bytes,
2395         0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
2396         edge_wrap_mode, edge_wrap_mode, STBIR_COLORSPACE_SRGB);
2397 }
2398 
2399 int stbir_resize_uint8_generic( const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2400                                                ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2401                                          int num_channels, int alpha_channel, int flags,
2402                                          stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
2403                                          void *alloc_context)
2404 {
2405     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2406         output_pixels, output_w, output_h, output_stride_in_bytes,
2407         0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
2408         edge_wrap_mode, edge_wrap_mode, space);
2409 }
2410 
2411 int stbir_resize_uint16_generic(const ushort *input_pixels  , int input_w , int input_h , int input_stride_in_bytes,
2412                                                ushort *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
2413                                          int num_channels, int alpha_channel, int flags,
2414                                          stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
2415                                          void *alloc_context)
2416 {
2417     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2418         output_pixels, output_w, output_h, output_stride_in_bytes,
2419         0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT16, filter, filter,
2420         edge_wrap_mode, edge_wrap_mode, space);
2421 }
2422 
2423 
2424 int stbir_resize(         const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2425                                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2426                                    stbir_datatype datatype,
2427                                    int num_channels, int alpha_channel, int flags,
2428                                    stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2429                                    stbir_filter filter_horizontal,  stbir_filter filter_vertical,
2430                                    stbir_colorspace space, void *alloc_context)
2431 {
2432     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2433         output_pixels, output_w, output_h, output_stride_in_bytes,
2434         0,0,1,1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2435         edge_mode_horizontal, edge_mode_vertical, space);
2436 }
2437 
2438 
2439 int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2440                                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2441                                    stbir_datatype datatype,
2442                                    int num_channels, int alpha_channel, int flags,
2443                                    stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2444                                    stbir_filter filter_horizontal,  stbir_filter filter_vertical,
2445                                    stbir_colorspace space, void *alloc_context,
2446                                    float x_scale, float y_scale,
2447                                    float x_offset, float y_offset)
2448 {
2449     float[4] transform;
2450     transform[0] = x_scale;
2451     transform[1] = y_scale;
2452     transform[2] = x_offset;
2453     transform[3] = y_offset;
2454     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2455         output_pixels, output_w, output_h, output_stride_in_bytes,
2456         0,0,1,1,transform.ptr,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2457         edge_mode_horizontal, edge_mode_vertical, space);
2458 }
2459 
2460 int stbir_resize_region(  const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2461                                          void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2462                                    stbir_datatype datatype,
2463                                    int num_channels, int alpha_channel, int flags,
2464                                    stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2465                                    stbir_filter filter_horizontal,  stbir_filter filter_vertical,
2466                                    stbir_colorspace space, void *alloc_context,
2467                                    float s0, float t0, float s1, float t1)
2468 {
2469     return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2470         output_pixels, output_w, output_h, output_stride_in_bytes,
2471         s0,t0,s1,t1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2472         edge_mode_horizontal, edge_mode_vertical, space);
2473 }
2474 
2475 /*
2476 ------------------------------------------------------------------------------
2477 This software is available under 2 licenses -- choose whichever you prefer.
2478 ------------------------------------------------------------------------------
2479 ALTERNATIVE A - MIT License
2480 Copyright (c) 2017 Sean Barrett
2481 Permission is hereby granted, free of charge, to any person obtaining a copy of
2482 this software and associated documentation files (the "Software"), to deal in
2483 the Software without restriction, including without limitation the rights to
2484 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
2485 of the Software, and to permit persons to whom the Software is furnished to do
2486 so, subject to the following conditions:
2487 The above copyright notice and this permission notice shall be included in all
2488 copies or substantial portions of the Software.
2489 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2490 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2491 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2492 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
2493 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2494 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2495 SOFTWARE.
2496 ------------------------------------------------------------------------------
2497 ALTERNATIVE B - Public Domain (www.unlicense.org)
2498 This is free and unencumbered software released into the public domain.
2499 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
2500 software, either in source code form or as a compiled binary, for any purpose,
2501 commercial or non-commercial, and by any means.
2502 In jurisdictions that recognize copyright laws, the author or authors of this
2503 software dedicate any and all copyright interest in the software to the public
2504 domain. We make this dedication for the benefit of the public at large and to
2505 the detriment of our heirs and successors. We intend this dedication to be an
2506 overt act of relinquishment in perpetuity of all present and future rights to
2507 this software under copyright law.
2508 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2509 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2510 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2511 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
2512 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
2513 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2514 ------------------------------------------------------------------------------
2515 */