/* stb_image_resize - v0.97 - public domain image resizing
2 by Jorge L Rodriguez (@VinoBS) - 2014
3 http://github.com/nothings/stb
4
5 Written with emphasis on usability, portability, and efficiency. (No
SIMD or threads, so it may be easily outperformed by libs that use those.)
Only scaling and translation are supported; no rotations or shears.
8 Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation.
9
10 QUICKSTART
11 stbir_resize_uint8( input_pixels , in_w , in_h , 0,
12 output_pixels, out_w, out_h, 0, num_channels)
13
14 stbir_resize_uint8_srgb( input_pixels , in_w , in_h , 0,
15 output_pixels, out_w, out_h, 0,
16 num_channels , alpha_chan , 0)
17 stbir_resize_uint8_srgb_edgemode(
18 input_pixels , in_w , in_h , 0,
19 output_pixels, out_w, out_h, 0,
20 num_channels , alpha_chan , 0, STBIR_EDGE_CLAMP)
21 // WRAP/REFLECT/ZERO
22
23 FULL API
24 See the "header file" section of the source for API documentation.
25
26 ADDITIONAL DOCUMENTATION
27
28 SRGB & FLOATING POINT REPRESENTATION
29 The sRGB functions presume IEEE floating point. If you do not have
30 IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
31 a slower implementation.
32
33 MEMORY ALLOCATION
34 The resize functions here perform a single memory allocation using
35 malloc. To control the memory allocation, before the #include that
36 triggers the implementation, do:
37
38 #define STBIR_MALLOC(size,context) ...
39 #define STBIR_FREE(ptr,context) ...
40
41 Each resize function makes exactly one call to malloc/free, so to use
42 temp memory, store the temp memory in the context and return that.
43
44 DEFAULT FILTERS
45 For functions which don't provide explicit control over what filters
46 to use, you can change the compile-time defaults with
47
48 #define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_something
49 #define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_something
50
51 See stbir_filter in the header-file section for the list of filters.
52
53 NEW FILTERS
54 A number of 1D filter kernels are used. For a list of
55 supported filters see the stbir_filter enum. To add a new filter,
56 write a filter function and add it to stbir__filter_info_table.
57
58 MAX CHANNELS
59 If your image has more than 64 channels, define STBIR_MAX_CHANNELS
60 to the max you'll have.
61
62 ALPHA CHANNEL
63 Most of the resizing functions provide the ability to control how
64 the alpha channel of an image is processed. The important things
65 to know about this:
66
67 1. The best mathematically-behaved version of alpha to use is
68 called "premultiplied alpha", in which the other color channels
69 have had the alpha value multiplied in. If you use premultiplied
70 alpha, linear filtering (such as image resampling done by this
71 library, or performed in texture units on GPUs) does the "right
72 thing". While premultiplied alpha is standard in the movie CGI
73 industry, it is still uncommon in the videogame/real-time world.
74
75 If you linearly filter non-premultiplied alpha, strange effects
76 occur. (For example, the 50/50 average of 99% transparent bright green
77 and 1% transparent black produces 50% transparent dark green when
78 non-premultiplied, whereas premultiplied it produces 50%
79 transparent near-black. The former introduces green energy
80 that doesn't exist in the source image.)
81
82 2. Artists should not edit premultiplied-alpha images; artists
83 want non-premultiplied alpha images. Thus, art tools generally output
84 non-premultiplied alpha images.
85
86 3. You will get best results in most cases by converting images
87 to premultiplied alpha before processing them mathematically.
88
89 4. If you pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED, the
90 resizer does not do anything special for the alpha channel;
91 it is resampled identically to other channels. This produces
92 the correct results for premultiplied-alpha images, but produces
93 less-than-ideal results for non-premultiplied-alpha images.
94
95 5. If you do not pass the flag STBIR_FLAG_ALPHA_PREMULTIPLIED,
96 then the resizer weights the contribution of input pixels
97 based on their alpha values, or, equivalently, it multiplies
98 the alpha value into the color channels, resamples, then divides
99 by the resultant alpha value. Input pixels which have alpha=0 do
100 not contribute at all to output pixels unless _all_ of the input
101 pixels affecting that output pixel have alpha=0, in which case
the result for that pixel is the same as it would be with
STBIR_FLAG_ALPHA_PREMULTIPLIED. However, this is only true for
104 input images in integer formats. For input images in float format,
105 input pixels with alpha=0 have no effect, and output pixels
106 which have alpha=0 will be 0 in all channels. (For float images,
107 you can manually achieve the same result by adding a tiny epsilon
108 value to the alpha channel of every image, and then subtracting
109 or clamping it at the end.)
110
111 6. You can suppress the behavior described in #5 and make
112 all-0-alpha pixels have 0 in all channels by #defining
113 STBIR_NO_ALPHA_EPSILON.
114
115 7. You can separately control whether the alpha channel is
116 interpreted as linear or affected by the colorspace. By default
117 it is linear; you almost never want to apply the colorspace.
118 (For example, graphics hardware does not apply sRGB conversion
119 to the alpha channel.)
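
Worked example for item 1 (a sketch of the arithmetic only; colors are
written as (R, G, B, A) with A = opacity, so "99% transparent" is A = 0.01):

   bright green, 99% transparent:  (0, 1, 0, 0.01)
   black,         1% transparent:  (0, 0, 0, 0.99)

   Averaging the non-premultiplied values channel by channel:
      ((0 + 0)/2, (1 + 0)/2, (0 + 0)/2, (0.01 + 0.99)/2) = (0, 0.5, 0, 0.5)
      which is 50% transparent dark green.

   Premultiplying first (RGB times A), then averaging:
      green -> (0, 0.01, 0, 0.01), black -> (0, 0, 0, 0.99)
      average                = (0, 0.005, 0, 0.5)
      un-premultiplied color = (0, 0.005 / 0.5, 0) = (0, 0.01, 0)
      which is 50% transparent near-black.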
120
121 CONTRIBUTORS
122 Jorge L Rodriguez: Implementation
123 Sean Barrett: API design, optimizations
124 Aras Pranckevicius: bugfix
125 Nathan Reed: warning fixes
126
127 REVISIONS
128 0.97 (2020-02-02) fixed warning
129 0.96 (2019-03-04) fixed warnings
130 0.95 (2017-07-23) fixed warnings
131 0.94 (2017-03-18) fixed warnings
132 0.93 (2017-03-03) fixed bug with certain combinations of heights
133 0.92 (2017-01-02) fix integer overflow on large (>2GB) images
134 0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
135 0.90 (2014-09-17) first released version
136
137 LICENSE
138 See end of file for license information.
139
140 TODO
141 Don't decode all of the image data when only processing a partial tile
142 Don't use full-width decode buffers when only processing a partial tile
143 When processing wide images, break processing into tiles so data fits in L1 cache
144 Installable filters?
145 Resize that respects alpha test coverage
146 (Reference code: FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage:
147 https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp )
148 */
149 /**
Resizer ported to D from C. Removed a few features that didn't make sense in Dplug.
Added Ryhor Spivak's work on the Lanczos filter, plus a few more Lanczos kernels.
152 Copyright: (c) Guillaume Piolat (2021)
153 */
154 module dplug.graphics.stb_image_resize;
155
156
157 import core.stdc.stdlib: malloc, free;
158 import core.stdc.string: memset;
159
160 import inteli.smmintrin;
161 import inteli.math;
162
163 import dplug.core.math : fast_fabs, fast_pow, fast_ceil, fast_floor, fast_sin;
164 import dplug.core.vec;
165
166
167 nothrow:
168 @nogc:
169
170
171 //////////////////////////////////////////////////////////////////////////////
172 //
173 // Easy-to-use API:
174 //
// * "input_pixels" points to an array of image data with 'num_channels' channels (e.g. RGB=3, RGBA=4)
176 // * input_w is input image width (x-axis), input_h is input image height (y-axis)
// * stride is the offset between successive rows of image data in memory, in bytes. You can
// specify 0 to mean packed contiguously in memory
179 // * alpha channel is treated identically to other channels.
180 // * colorspace is linear or sRGB as specified by function name
181 // * returned result is 1 for success or 0 in case of an error.
182 // #define assert() to trigger an assert on parameter validation errors.
183 // * Memory required grows approximately linearly with input and output size, but with
184 // discontinuities at input_w == output_w and input_h == output_h.
185 // * These functions use a "default" resampling filter defined at compile time. To change the filter,
186 // you can change the compile-time defaults by #defining STBIR_DEFAULT_FILTER_UPSAMPLE
187 // and STBIR_DEFAULT_FILTER_DOWNSAMPLE, or you can use the medium-complexity API.
188
189 int stbir_resize_uint8(const(ubyte)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
190 ubyte* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
191 int num_channels, int filter, void *alloc_context)
192 {
193 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
194 output_pixels, output_w, output_h, output_stride_in_bytes,
195 0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT8, filter, filter,
196 STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
197 }
198
199 int stbir_resize_uint16(const(ushort)* input_pixels , int input_w , int input_h , int input_stride_in_bytes,
200 ushort* output_pixels, int output_w, int output_h, int output_stride_in_bytes,
201 int num_channels, int filter, void *alloc_context)
202 {
203 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
204 output_pixels, output_w, output_h, output_stride_in_bytes,
205 0,0,1,1,null,num_channels,-1,0, STBIR_TYPE_UINT16, filter, filter,
206 STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_LINEAR);
207 }
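
// Usage sketch for the easy API in this D port (illustrative only; the 2x2
// input and 4x4 output sizes are made-up values). Unlike the C signature shown
// in the QUICKSTART comment, the D functions above take an explicit `filter`
// and an `alloc_context`, which STBIR_MALLOC expects to point to an
// STBAllocatorContext.
unittest
{
    // 2x2 single-channel source image, rows packed contiguously (stride 0).
    static immutable ubyte[4] src = [   0, 255,
                                      255,   0 ];
    ubyte[16] dst; // 4x4 destination

    STBAllocatorContext ctx; // owns the temporary memory; freed by its destructor

    int ok = stbir_resize_uint8(src.ptr, 2, 2, 0,
                                dst.ptr, 4, 4, 0,
                                1,                        // num_channels
                                STBIR_FILTER_CATMULLROM,  // same as STBIR_DEFAULT_FILTER_UPSAMPLE
                                &ctx);
    // Per the API notes above, `ok` is 1 on success and 0 on error.
}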
208
209
210 // The following functions interpret image data as gamma-corrected sRGB.
211 // Specify STBIR_ALPHA_CHANNEL_NONE if you have no alpha channel,
212 // or otherwise provide the index of the alpha channel. Flags value
213 // of 0 will probably do the right thing if you're not sure what
214 // the flags mean.
215
216 enum STBIR_ALPHA_CHANNEL_NONE = -1;
217
218 // Set this flag if your texture has premultiplied alpha. Otherwise, stbir will
219 // use alpha-weighted resampling (effectively premultiplying, resampling,
220 // then unpremultiplying).
221 enum STBIR_FLAG_ALPHA_PREMULTIPLIED = (1 << 0);
222
// The specified alpha channel should be handled as a gamma-corrected value even
// when doing sRGB operations.
225 enum STBIR_FLAG_ALPHA_USES_COLORSPACE = (1 << 1);
226
227 int stbir_resize_uint8_srgb(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
228 ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
229 int num_channels, int alpha_channel, int flags, void* alloc_context, int filter)
230 {
231 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
232 output_pixels, output_w, output_h, output_stride_in_bytes,
233 0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
234 STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP, STBIR_COLORSPACE_SRGB);
235 }
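
// Usage sketch for the sRGB variant (illustrative values only). The image is
// assumed to be RGBA with non-premultiplied alpha in channel 3, so `flags` is
// left at 0 and the resizer applies alpha-weighted resampling. Note that in
// this port `alloc_context` comes before `filter`.
unittest
{
    // 2x1 RGBA source: an opaque red pixel next to a fully transparent one.
    static immutable ubyte[8] src = [ 255, 0, 0, 255,
                                        0, 0, 0,   0 ];
    ubyte[4] dst; // 1x1 RGBA destination

    STBAllocatorContext ctx;

    int ok = stbir_resize_uint8_srgb(src.ptr, 2, 1, 0,
                                     dst.ptr, 1, 1, 0,
                                     4,   // num_channels
                                     3,   // alpha_channel index
                                     0,   // flags: alpha is not premultiplied
                                     &ctx,
                                     STBIR_FILTER_MITCHELL); // same as STBIR_DEFAULT_FILTER_DOWNSAMPLE
    // `ok` is 1 on success and 0 on error.
}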
236
237 alias stbir_edge = int;
238 enum : stbir_edge
239 {
240 STBIR_EDGE_CLAMP = 1,
241 STBIR_EDGE_REFLECT = 2,
242 STBIR_EDGE_WRAP = 3,
243 STBIR_EDGE_ZERO = 4,
244 }
245
246
247 //////////////////////////////////////////////////////////////////////////////
248 //
249 // Medium-complexity API
250 //
251 // This extends the easy-to-use API as follows:
252 //
253 // * Alpha-channel can be processed separately
254 // * If alpha_channel is not STBIR_ALPHA_CHANNEL_NONE
// * Alpha channel will not be gamma corrected (unless flags&STBIR_FLAG_ALPHA_USES_COLORSPACE)
256 // * Filters will be weighted by alpha channel (unless flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)
257 // * Filter can be selected explicitly
258 // * uint16 image type
259 // * sRGB colorspace available for all types
260 // * context parameter for passing to STBIR_MALLOC
261
262 alias stbir_filter = int;
263 enum : stbir_filter
264 {
265 STBIR_FILTER_DEFAULT = 0, // use same filter type that easy-to-use API chooses
266 STBIR_FILTER_BOX = 1, // A trapezoid w/1-pixel wide ramps, same result as box for integer scale ratios
267 STBIR_FILTER_TRIANGLE = 2, // On upsampling, produces same results as bilinear texture filtering
STBIR_FILTER_CUBICBSPLINE = 3, // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0), gaussian-esque
269 STBIR_FILTER_CATMULLROM = 4, // An interpolating cubic spline
STBIR_FILTER_MITCHELL = 5, // Mitchell-Netravali filter with B=1/3, C=1/3
271 STBIR_FILTER_LANCZOS2 = 6, // Lanczos 2
272 STBIR_FILTER_LANCZOS2_5 = 7, // Lanczos 2.5
273 STBIR_FILTER_LANCZOS3 = 8, // Lanczos 3
274 STBIR_FILTER_LANCZOS4 = 9, // Lanczos 4
275 STBIR_FILTER_MK_2013 = 10, // Magic Kernel, without sharpening
276 STBIR_FILTER_MKS_2013_86 = 11, // Magic Kernel + Sharp 2013, but with only 86% sharpening (Dplug Issue #729)
277 STBIR_FILTER_MKS_2013 = 12, // Magic Kernel + Sharp 2013 (the one recommended by John Costella in 2013)
278 STBIR_FILTER_MKS_2021 = 13, // Magic Kernel + Sharp 2021 (the one recommended to us by John Costella in 2022)
279
280 // To be continued, as John Costella has other kernels...
281 }
282
283 alias stbir_colorspace = int;
284 enum : stbir_colorspace
285 {
286 STBIR_COLORSPACE_LINEAR,
287 STBIR_COLORSPACE_SRGB,
288
289 STBIR_MAX_COLORSPACES,
290 }
291
292
293 //////////////////////////////////////////////////////////////////////////////
294 //
295 // Full-complexity API
296 //
297 // This extends the medium API as follows:
298 //
299 // * uint32 image type
300 // * not typesafe
301 // * separate filter types for each axis
302 // * separate edge modes for each axis
303 // * can specify scale explicitly for subpixel correctness
304 // * can specify image source tile using texture coordinates
305
306 alias stbir_datatype = int;
307 enum : stbir_datatype
308 {
309 STBIR_TYPE_UINT8 ,
310 STBIR_TYPE_UINT16,
311 STBIR_TYPE_UINT32,
312 STBIR_TYPE_FLOAT ,
313
314 STBIR_MAX_TYPES
315 }
316
317 // (s0, t0) & (s1, t1) are the top-left and bottom right corner (uv addressing style: [0, 1]x[0, 1]) of a region of the input image to use.
318
319 struct STBAllocatorContext
320 {
321 nothrow:
322 @nogc:
323 void* buf = null;
324 size_t length = 0;
325
326 @disable this(this);
327
328 ~this()
329 {
330 alignedFree(buf, 1);
331 }
332
333 void* reallocDiscard(size_t numBytes)
334 {
335 if (length < numBytes)
336 {
337 buf = alignedReallocDiscard(buf, numBytes, 1);
338 length = numBytes;
339 }
340 return buf;
341 }
342 }
343
344 void* STBIR_MALLOC(size_t size, void* context)
345 {
346 assert(context !is null);
347 STBAllocatorContext* alloc = cast(STBAllocatorContext*)context;
348 return alloc.reallocDiscard(size);
349 }
350
351 void STBIR_FREE(void* p, void* context)
352 {
353 assert(context !is null);
354 // will be freed when resizer is freed, because it's relatively small and shared.
355 }
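
// Sketch of how the two hooks above behave with STBAllocatorContext (the byte
// counts are arbitrary illustration values): the context grows its buffer only
// when a larger request arrives, and STBIR_FREE releases nothing.
unittest
{
    STBAllocatorContext ctx;

    void* a = STBIR_MALLOC(64, &ctx);  // first request allocates
    void* b = STBIR_MALLOC(16, &ctx);  // smaller request reuses the same buffer
    assert(b is a);
    assert(ctx.length == 64);

    STBIR_FREE(b, &ctx);               // no-op: the memory is still owned by ctx
    assert(ctx.buf is a);
}   // ctx's destructor frees the buffer here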
356
357 enum STBIR_DEFAULT_FILTER_UPSAMPLE = STBIR_FILTER_CATMULLROM;
358
359 enum STBIR_DEFAULT_FILTER_DOWNSAMPLE = STBIR_FILTER_MITCHELL;
360
361 enum STBIR_MAX_CHANNELS = 4;
362
363 // This value is added to alpha just before premultiplication to avoid
364 // zeroing out color values. It is equivalent to 2^-80. If you don't want
365 // that behavior (it may interfere if you have floating point images with
366 // very small alpha values) then you can define STBIR_NO_ALPHA_EPSILON to
367 // disable it.
368 enum float STBIR_ALPHA_EPSILON = (cast(float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20));
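
// Sanity check of the comment above, plus a sketch of why the epsilon helps:
// a color premultiplied by (alpha + epsilon) can still be recovered when
// alpha is 0. The 0.5f color value is an arbitrary example.
unittest
{
    static assert(STBIR_ALPHA_EPSILON == 0x1p-80f); // four divisions by 2^20

    float color = 0.5f;
    float alpha = 0.0f + STBIR_ALPHA_EPSILON;
    float premultiplied = color * alpha;     // tiny, but not zero
    assert(premultiplied > 0);
    assert(premultiplied / alpha == color);  // exact: only powers of two are involved
}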
369
370 // must match stbir_datatype
371 static immutable ubyte[4] stbir__type_size =
372 [
373 1, // STBIR_TYPE_UINT8
374 2, // STBIR_TYPE_UINT16
375 4, // STBIR_TYPE_UINT32
376 4, // STBIR_TYPE_FLOAT
377 ];
378
379 // Kernel function centered at 0
380 alias stbir__kernel_fn = float function(float x, float scale);
381 alias stbir__support_fn = float function(float scale);
382
383 struct stbir__filter_info
384 {
385 stbir__kernel_fn kernel;
386 stbir__support_fn support;
387 }
388
389 // When upsampling, the contributors are which source pixels contribute.
390 // When downsampling, the contributors are which destination pixels are contributed to.
391 struct stbir__contributors
392 {
393 int n0; // First contributing pixel
394 int n1; // Last contributing pixel
395 }
396
397 struct stbir__info
398 {
399 const(void)* input_data;
400 int input_w;
401 int input_h;
402 int input_stride_bytes;
403
404 void* output_data;
405 int output_w;
406 int output_h;
407 int output_stride_bytes;
408
409 float s0, t0, s1, t1;
410
411 float horizontal_shift; // Units: output pixels
412 float vertical_shift; // Units: output pixels
413 float horizontal_scale;
414 float vertical_scale;
415
416 int channels;
417 int alpha_channel;
418 uint flags;
419 stbir_datatype type;
420 stbir_filter horizontal_filter;
421 stbir_filter vertical_filter;
422 stbir_edge edge_horizontal;
423 stbir_edge edge_vertical;
424 stbir_colorspace colorspace;
425
426 stbir__contributors* horizontal_contributors;
427 float* horizontal_coefficients;
428
429 stbir__contributors* vertical_contributors;
430 float* vertical_coefficients;
431
432 int decode_buffer_pixels;
433 float* decode_buffer;
434
435 float* horizontal_buffer;
436
437 // cache these because ceil/floor are inexplicably showing up in profile
438 int horizontal_coefficient_width;
439 int vertical_coefficient_width;
440 int horizontal_filter_pixel_width;
441 int vertical_filter_pixel_width;
442 int horizontal_filter_pixel_margin;
443 int vertical_filter_pixel_margin;
444 int horizontal_num_contributors;
445 int vertical_num_contributors;
446
447 int ring_buffer_length_bytes; // The length of an individual entry in the ring buffer. The total number of ring buffers is stbir__get_filter_pixel_width(filter)
448 int ring_buffer_num_entries; // Total number of entries in the ring buffer.
449 int ring_buffer_first_scanline;
450 int ring_buffer_last_scanline;
451 int ring_buffer_begin_index; // first_scanline is at this index in the ring buffer
452 float* ring_buffer;
453
454 float* encode_buffer; // A temporary buffer to store floats so we don't lose precision while we do multiply-adds.
455
456 int horizontal_contributors_size;
457 int horizontal_coefficients_size;
458 int vertical_contributors_size;
459 int vertical_coefficients_size;
460 int decode_buffer_size;
461 int horizontal_buffer_size;
462 int ring_buffer_size;
463 int encode_buffer_size;
464 }
465
466
467 static immutable float stbir__max_uint8_as_float = 255.0f;
468 static immutable float stbir__max_uint16_as_float = 65535.0f;
469 static immutable double stbir__max_uint32_as_float = 4294967295.0;
470
471
472 int stbir__min(int a, int b)
473 {
474 return a < b ? a : b;
475 }
476
477 float stbir__saturate(float x)
478 {
479 if (x < 0)
480 return 0;
481
482 if (x > 1)
483 return 1;
484
485 return x;
486 }
487
488 static immutable float[256] stbir__srgb_uchar_to_linear_float =
489 [
490 0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f, 0.002125f, 0.002428f, 0.002732f, 0.003035f,
491 0.003347f, 0.003677f, 0.004025f, 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f, 0.007499f,
492 0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f, 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f,
493 0.015209f, 0.015996f, 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f, 0.023153f, 0.024158f,
494 0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f, 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f,
495 0.038204f, 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f, 0.049707f, 0.051269f, 0.052861f,
496 0.054480f, 0.056128f, 0.057805f, 0.059511f, 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
497 0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f, 0.088656f, 0.090842f, 0.093059f, 0.095307f,
498 0.097587f, 0.099899f, 0.102242f, 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f, 0.122139f,
499 0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f, 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f,
500 0.155926f, 0.158961f, 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f, 0.184475f, 0.187821f,
501 0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f, 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f,
502 0.230740f, 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f, 0.262251f, 0.266356f, 0.270498f,
503 0.274677f, 0.278894f, 0.283149f, 0.287441f, 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
504 0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f, 0.356400f, 0.361307f, 0.366253f, 0.371238f,
505 0.376262f, 0.381326f, 0.386430f, 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f, 0.428691f,
506 0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f, 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f,
507 0.496933f, 0.502887f, 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f, 0.552011f, 0.558340f,
508 0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f, 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f,
509 0.637597f, 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f, 0.693872f, 0.701102f, 0.708376f,
510 0.715694f, 0.723055f, 0.730461f, 0.737911f, 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
511 0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f, 0.854993f, 0.863157f, 0.871367f, 0.879622f,
512 0.887923f, 0.896269f, 0.904661f, 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f, 0.973445f,
513 0.982251f, 0.991102f, 1.0f
514 ];
515
516 float stbir__srgb_to_linear(float f)
517 {
518 if (f <= 0.04045f)
519 return f / 12.92f;
520 else
521 return cast(float)fast_pow((f + 0.055f) / 1.055f, 2.4f);
522 }
523
524 float stbir__linear_to_srgb(float f)
525 {
526 if (f <= 0.0031308f)
527 return f * 12.92f;
528 else
529 return 1.055f * _mm_pow_ss(f, 0.4166666666f) - 0.055f;
530 }
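
// Small check of the piecewise sRGB curve implemented above, restricted to the
// linear segment so that no pow() approximation is involved. The 1e-5f
// tolerance is an arbitrary choice for this sketch.
unittest
{
    // 1/255 is below the 0.04045 threshold, so it decodes as f / 12.92 and
    // should agree with entry 1 of stbir__srgb_uchar_to_linear_float.
    float decoded = stbir__srgb_to_linear(1.0f / 255.0f);
    float tableError = decoded - stbir__srgb_uchar_to_linear_float[1];
    assert(tableError > -1e-5f && tableError < 1e-5f);

    // On the linear segment, encoding undoes decoding.
    float roundTripError = stbir__linear_to_srgb(decoded) - 1.0f / 255.0f;
    assert(roundTripError > -1e-5f && roundTripError < 1e-5f);
}
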
531 /*
532 __m128 stbir__linear_to_srgb(__m128 f)
533 {
534 __m128 below = f * _mm_set1_ps(12.92f);
535 __m128 exponentiated = _mm_set1_ps(1.055f) * _mm_pow_ps(f, 0.4166666666f) - _mm_set1_ps(0.055f);
536 __m128 mask =_mm_cmplt_ps(f, _mm_set1_ps(0.0031308f));
537 __m128i result = (cast(__m128i)below & cast(__m128i)mask) | (cast(__m128i)exponentiated & ~cast(__m128i)mask);
538 return cast(__m128)result;
539 }*/
540
541 union stbir__FP32
542 {
543 uint u;
544 float f;
545 }
546
547 static immutable uint[104] fp32_to_srgb8_tab4 =
548 [
549 0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d, 0x009a000d, 0x00a1000d,
550 0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a, 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a,
551 0x010e0033, 0x01280033, 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
552 0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067, 0x03110067, 0x03440067,
553 0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce, 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5,
554 0x06970158, 0x07420142, 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
555 0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e, 0x0fbc0150, 0x10630143,
556 0x11070264, 0x1238023e, 0x1357021d, 0x14660201, 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af,
557 0x18fe0331, 0x1a9602fe, 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
558 0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341, 0x2ebe031f, 0x304d0300,
559 0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5, 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401,
560 0x44c20798, 0x488e071e, 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
561 0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd, 0x787d076c, 0x7c330723,
562 ];
563
564 ubyte stbir__linear_to_srgb_uchar(float in_)
565 {
566 static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
567 static const stbir__FP32 minval = { (127-13) << 23 };
568 uint tab,bias,scale,t;
569 stbir__FP32 f;
570
571 // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
572 // The tests are carefully written so that NaNs map to 0, same as in the reference
573 // implementation.
574 if (!(in_ > minval.f)) // written this way to trap NaNs
575 in_ = minval.f;
576 if (in_ > almostone.f)
577 in_ = almostone.f;
578
579 // Do the table lookup and unpack bias, scale
580 f.f = in_;
581 tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
582 bias = (tab >> 16) << 9;
583 scale = tab & 0xffff;
584
585 // Grab next-highest mantissa bits and perform linear interpolation
586 t = (f.u >> 12) & 0xff;
587 return cast(ubyte) ((bias + scale*t) >> 16);
588 }
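
// Spot checks for the table-driven encoder above. The expected values were
// worked out by hand from fp32_to_srgb8_tab4 for this sketch: 0 and NaN clamp
// to the lower bound, 1 clamps to the upper bound.
unittest
{
    assert(stbir__linear_to_srgb_uchar(0.0f) == 0);
    assert(stbir__linear_to_srgb_uchar(1.0f) == 255);
    assert(stbir__linear_to_srgb_uchar(float.nan) == 0); // NaNs map to 0
    assert(stbir__linear_to_srgb_uchar(0.5f) == 188);    // half intensity in linear light
}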
589
// Same as above, but converts 4 floats at once.
591 __m128i stbir__linear_to_srgb_uchar(__m128 in_)
592 {
593 static const stbir__FP32 almostone = { 0x3f7fffff }; // 1-eps
594 static const stbir__FP32 minval = { (127-13) << 23 };
595 in_ = _mm_max_ps(in_, _mm_set1_ps(minval.f));
596 in_ = _mm_min_ps(in_, _mm_set1_ps(almostone.f));
597
598 __m128i f = cast(__m128i) in_;
599 __m128i tblIndex = _mm_srli_epi32(f - _mm_set1_epi32(minval.u), 20);
600
601 __m128i tab = _mm_setr_epi32(fp32_to_srgb8_tab4[ tblIndex.array[0] ],
602 fp32_to_srgb8_tab4[ tblIndex.array[1] ],
603 fp32_to_srgb8_tab4[ tblIndex.array[2] ],
604 fp32_to_srgb8_tab4[ tblIndex.array[3] ]);
605 __m128i bias = _mm_slli_epi32(_mm_srli_epi32(tab, 16), 9);
606 __m128i scale = _mm_and_si128(tab, _mm_set1_epi32(0xffff));
607
608 __m128i t = _mm_srli_epi32(f, 12) & _mm_set1_epi32(0xff);
609 __m128i r = _mm_srli_epi32(bias + _mm_mullo_epi32(scale, t), 16);
610 __m128i zero = _mm_setzero_si128();
611 r = _mm_packs_epi32(r, zero);
612 r = _mm_packus_epi16(r, zero);
613 return r;
614 }
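
// Sketch checking that the 4-wide overload agrees with the scalar one. The
// four results come back as bytes packed into the lowest 32-bit lane; the
// input values are arbitrary in-range samples.
unittest
{
    static immutable float[4] samples = [ 0.0f, 0.25f, 0.5f, 1.0f ];
    __m128 v = _mm_setr_ps(samples[0], samples[1], samples[2], samples[3]);
    __m128i packed = stbir__linear_to_srgb_uchar(v);

    uint bytes = cast(uint) packed.array[0];
    foreach (i; 0 .. 4)
    {
        ubyte lane = cast(ubyte)(bytes >> (8 * i));
        assert(lane == stbir__linear_to_srgb_uchar(samples[i]));
    }
}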
615
616 float stbir__filter_trapezoid(float x, float scale)
617 {
618 float halfscale = scale / 2;
619 float t = 0.5f + halfscale;
620 assert(scale <= 1);
621
622 x = cast(float)fast_fabs(x);
623
624 if (x >= t)
625 return 0;
626 else
627 {
628 float r = 0.5f - halfscale;
629 if (x <= r)
630 return 1;
631 else
632 return (t - x) / scale;
633 }
634 }
635
636 float stbir__support_trapezoid(float scale)
637 {
638 assert(scale <= 1);
639 return 0.5f + scale / 2;
640 }
641
642 float stbir__filter_triangle(float x, float s)
643 {
644 x = cast(float)fast_fabs(x);
645
646 if (x <= 1.0f)
647 return 1 - x;
648 else
649 return 0;
650 }
651
652 float stbir__filter_cubic(float x, float s)
653 {
654 x = cast(float)fast_fabs(x);
655
656 if (x < 1.0f)
657 return (4 + x*x*(3*x - 6))/6;
658 else if (x < 2.0f)
659 return (8 + x*(-12 + x*(6 - x)))/6;
660
661 return (0.0f);
662 }
663
664 float stbir__filter_catmullrom(float x, float s)
665 {
666 x = cast(float)fast_fabs(x);
667
668 if (x < 1.0f)
669 return 1 - x*x*(2.5f - 1.5f*x);
670 else if (x < 2.0f)
671 return 2 - x*(4 + x*(0.5f*x - 2.5f));
672
673 return (0.0f);
674 }
675
676 float stbir__filter_mitchell(float x, float s)
677 {
678 x = cast(float)fast_fabs(x);
679
680 if (x < 1.0f)
681 return (16 + x*x*(21 * x - 36))/18;
682 else if (x < 2.0f)
683 return (32 + x*(-60 + x*(36 - 7*x)))/18;
684
685 return (0.0f);
686 }
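
// Sketch illustrating a property shared by the kernels above: at a given
// sample phase, the weights taken at unit spacing sum to 1, so flat regions
// keep their value. The phase 0.5 and the tolerance are choices made for this
// example only.
unittest
{
    static immutable stbir__kernel_fn[4] kernels =
    [
        &stbir__filter_triangle,   &stbir__filter_cubic,
        &stbir__filter_catmullrom, &stbir__filter_mitchell,
    ];

    foreach (kernel; kernels)
    {
        // Taps at distance 0.5 and 1.5 on both sides cover a support of 2.
        float sum = 2 * (kernel(0.5f, 1.0f) + kernel(1.5f, 1.0f));
        assert(sum > 0.99999f && sum < 1.00001f);
    }

    // Catmull-Rom interpolates: weight 1 at the sample itself, 0 at its neighbors.
    assert(stbir__filter_catmullrom(0.0f, 1.0f) == 1.0f);
    assert(stbir__filter_catmullrom(1.0f, 1.0f) == 0.0f);
}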
687
688 float stbir__filter_lanczos(float A)(float x, float s)
689 {
690 x = cast(float)fast_fabs(x);
691
692 if (x <= float.min_normal)
693 return 1.0f;
694
695 if (x < A)
696 {
697 float pix = 3.14159265358979323846f*x;
698 return A*fast_sin(pix)*fast_sin(pix/A)/(pix*pix);
699 }
700
701 return 0.0f;
702 }
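
// Sketch of two exact properties of the windowed-sinc kernel above, using the
// Lanczos-3 instantiation: full weight at the center and zero weight at or
// beyond the window radius A. The sample positions are arbitrary.
unittest
{
    assert(stbir__filter_lanczos!3.0f( 0.0f, 1.0f) == 1.0f); // center tap
    assert(stbir__filter_lanczos!3.0f( 3.0f, 1.0f) == 0.0f); // x == A is outside the window
    assert(stbir__filter_lanczos!3.0f(-3.5f, 1.0f) == 0.0f); // the kernel is symmetric in x
}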
703
704 float stbir__filter_mk2013(float x, float s) nothrow @nogc
705 {
706 x = fast_fabs(x);
707 if (x < 0.5)
708 return 0.75 - x * x;
709
710 if (x < 1.5)
711 return 0.5 * (x - 1.5)*(x - 1.5);
712
713 return 0.0f;
714 }
715
716 float stbir__filter_mks2013_hs(float x, float s) nothrow @nogc
717 {
718 // Perhaps possible to do better with "MKS 2021".
719 return 0.14f * stbir__filter_mk2013(x, s)
720 + 0.86f * stbir__filter_mks2013(x, s);
721 }
722
723 float stbir__filter_mks2013(float x, float s) nothrow @nogc
724 {
725 x = fast_fabs(x);
726
727 if (x <= float.min_normal)
728 return 17.0f / 16.0f;
729
730 if (x < 0.5)
731 return 17.0 / 16.0 - 7.0 * x * x / 4.0;
732
733 if (x < 1.5)
734 {
735 double x2 = x * x;
736 return 0.25 * (4 * x2 - 11.0 * x + 7.0);
737 }
738
739 if (x < 2.5)
740 {
741 return -0.125 * (x - 5.0 / 2.0)*(x - 5.0 / 2.0);
742 }
743 return 0.0f;
744 }
745
746 float stbir__filter_mks2021(float x, float s) nothrow @nogc
747 {
748 x = fast_fabs(x);
749 float x2 = x * x;
750
751 if (x < 0.5)
752 return 577.0f / 576.0f - (239.0f / 144.0f) * x2;
753
754 if (x < 1.5)
755 return (140 * x2 - 379 * x + 239) / 144.0f;
756
757 if (x < 2.5)
758 return -(24 * x2 - 113 * x + 130) / 144.0f;
759
760 if (x < 3.5)
761 return (4 * x2 - 27 * x + 45) / 144.0f;
762
763 if (x < 4.5)
764 return -(4 * x2 - 36 * x + 81) / 1152.0f;
765
766 return 0.0f;
767 }
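
// Sketch checking that the Magic Kernel variants above are normalized: with
// taps at integer offsets (phase 0), the weights sum to 1. The offset range
// -4 .. 4 covers the widest kernel here (MKS 2021, nonzero for |x| < 4.5);
// the tolerance is an arbitrary choice for this example.
unittest
{
    static immutable stbir__kernel_fn[4] kernels =
    [
        &stbir__filter_mk2013,  &stbir__filter_mks2013_hs,
        &stbir__filter_mks2013, &stbir__filter_mks2021,
    ];

    foreach (kernel; kernels)
    {
        float sum = 0;
        foreach (offset; -4 .. 5)
            sum += kernel(cast(float) offset, 1.0f);
        assert(sum > 0.9999f && sum < 1.0001f);
    }
}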
768
769 float stbir__support_zero(float s)
770 {
771 return 0;
772 }
773
774 float stbir__support_one(float s)
775 {
776 return 1;
777 }
778
779 float stbir__support_two(float s)
780 {
781 return 2;
782 }
783
784 float stbir__support_three(float s)
785 {
786 return 3;
787 }
788
789 float stbir__support_four(float s)
790 {
791 return 4;
792 }
793
794 float stbir__support_five(float s)
795 {
796 return 5;
797 }
798
799 static immutable stbir__filter_info[14] stbir__filter_info_table =
800 [
801 { null, &stbir__support_zero },
802 { &stbir__filter_trapezoid, &stbir__support_trapezoid },
803 { &stbir__filter_triangle, &stbir__support_one },
804 { &stbir__filter_cubic, &stbir__support_two },
805 { &stbir__filter_catmullrom, &stbir__support_two },
806 { &stbir__filter_mitchell, &stbir__support_two },
807 { &stbir__filter_lanczos!2.0f, &stbir__support_two },
808 { &stbir__filter_lanczos!2.5f, &stbir__support_three },
809 { &stbir__filter_lanczos!3.0f, &stbir__support_three },
810 { &stbir__filter_lanczos!4.0f, &stbir__support_four },
811 { &stbir__filter_mk2013, &stbir__support_three },
812 { &stbir__filter_mks2013_hs, &stbir__support_three },
813 { &stbir__filter_mks2013, &stbir__support_three },
814 { &stbir__filter_mks2021, &stbir__support_five },
815 ];
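
// Sketch of how the table above is indexed: the stbir_filter value is the row
// number, so the table needs one row per enum member. Row 0
// (STBIR_FILTER_DEFAULT) has no kernel; stbir__get_filter_pixel_width asserts
// that it is never looked up directly.
unittest
{
    static assert(stbir__filter_info_table.length == STBIR_FILTER_MKS_2021 + 1);

    assert(stbir__filter_info_table[STBIR_FILTER_DEFAULT].kernel is null);
    assert(stbir__filter_info_table[STBIR_FILTER_LANCZOS3].support(1.0f) == 3);
    // The interpolating Catmull-Rom kernel gives full weight at its center.
    assert(stbir__filter_info_table[STBIR_FILTER_CATMULLROM].kernel(0.0f, 1.0f) == 1.0f);
}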
816
817
818 static int stbir__use_upsampling(float ratio)
819 {
820 return ratio > 1;
821 }
822
823 static int stbir__use_width_upsampling(stbir__info* stbir_info)
824 {
825 return stbir__use_upsampling(stbir_info.horizontal_scale);
826 }
827
828 static int stbir__use_height_upsampling(stbir__info* stbir_info)
829 {
830 return stbir__use_upsampling(stbir_info.vertical_scale);
831 }
832
833 // This is the maximum number of input samples that can affect an output sample
834 // with the given filter
835 static int stbir__get_filter_pixel_width(stbir_filter filter, float scale)
836 {
837 assert(filter != 0);
838 assert(filter < stbir__filter_info_table.length);
839
840 if (stbir__use_upsampling(scale))
841 return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2);
842 else
843 return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2 / scale);
844 }
845
846 // This is how much to expand buffers to account for filters seeking outside
847 // the image boundaries.
848 static int stbir__get_filter_pixel_margin(stbir_filter filter, float scale)
849 {
850 return stbir__get_filter_pixel_width(filter, scale) / 2;
851 }
852
853 static int stbir__get_coefficient_width(stbir_filter filter, float scale)
854 {
855 if (stbir__use_upsampling(scale))
856 return cast(int)fast_ceil(stbir__filter_info_table[filter].support(1 / scale) * 2);
857 else
858 return cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale) * 2);
859 }
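
// Worked example for the three helpers above with the Mitchell filter
// (support 2). The scale factors 2 and 0.5 are arbitrary illustration values.
unittest
{
    // Upsampling by 2: up to ceil(2 * 2) = 4 input samples affect one output sample.
    assert(stbir__get_filter_pixel_width(STBIR_FILTER_MITCHELL, 2.0f) == 4);

    // Downsampling by 2: the footprint widens to ceil(2 * 2 / 0.5) = 8 input samples,
    assert(stbir__get_filter_pixel_width(STBIR_FILTER_MITCHELL, 0.5f) == 8);
    // and the margin is half of that footprint.
    assert(stbir__get_filter_pixel_margin(STBIR_FILTER_MITCHELL, 0.5f) == 4);

    // The coefficient width omits the division by scale: ceil(2 * 2) = 4 in both cases.
    assert(stbir__get_coefficient_width(STBIR_FILTER_MITCHELL, 2.0f) == 4);
    assert(stbir__get_coefficient_width(STBIR_FILTER_MITCHELL, 0.5f) == 4);
}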
860
861 static int stbir__get_contributors(float scale, stbir_filter filter, int input_size, int output_size)
862 {
863 if (stbir__use_upsampling(scale))
864 return output_size;
865 else
866 return (input_size + stbir__get_filter_pixel_margin(filter, scale) * 2);
867 }
868
869 static int stbir__get_total_horizontal_coefficients(stbir__info* info)
870 {
871 return info.horizontal_num_contributors
872 * stbir__get_coefficient_width (info.horizontal_filter, info.horizontal_scale);
873 }
874
875 static int stbir__get_total_vertical_coefficients(stbir__info* info)
876 {
877 return info.vertical_num_contributors
878 * stbir__get_coefficient_width (info.vertical_filter, info.vertical_scale);
879 }
880
881 static stbir__contributors* stbir__get_contributor(stbir__contributors* contributors, int n)
882 {
883 return &contributors[n];
884 }
885
// For perf reasons this code is duplicated in stbir__resample_horizontal_upsample/downsample;
// if you change it here, change it there too.
888 static float* stbir__get_coefficient(float* coefficients, stbir_filter filter, float scale, int n, int c)
889 {
890 int width = stbir__get_coefficient_width(filter, scale);
891 return &coefficients[width*n + c];
892 }
893
894 static int stbir__edge_wrap_slow(stbir_edge edge, int n, int max)
895 {
896 switch (edge)
897 {
898 case STBIR_EDGE_ZERO:
899 return 0; // we'll decode the wrong pixel here, and then overwrite with 0s later
900
901 case STBIR_EDGE_CLAMP:
902 if (n < 0)
903 return 0;
904
905 if (n >= max)
906 return max - 1;
907
908 return n; // NOTREACHED
909
910 case STBIR_EDGE_REFLECT:
911 {
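        if (n < 0)
        {
            // -n is the mirrored index; it is only usable while it stays below max.
            // (Testing n < max here would always succeed for negative n.)
            if (-n < max)
                return -n;
            else
                return max - 1;
        }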
919
920 if (n >= max)
921 {
922 int max2 = max * 2;
923 if (n >= max2)
924 return 0;
925 else
926 return max2 - n - 1;
927 }
928
929 return n; // NOTREACHED
930 }
931
932 case STBIR_EDGE_WRAP:
933 if (n >= 0)
934 return (n % max);
935 else
936 {
937 int m = (-n) % max;
938
939 if (m != 0)
940 m = max - m;
941
942 return (m);
943 }
944 // NOTREACHED
945
946 default:
947 assert(false, "Unimplemented edge type");
948 }
949 }
950
951 static int stbir__edge_wrap(stbir_edge edge, int n, int max)
952 {
953 // avoid per-pixel switch
954 if (n >= 0 && n < max)
955 return n;
956 return stbir__edge_wrap_slow(edge, n, max);
957 }
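
// Illustrative check, not part of the original stb source: a sketch of what
// each edge mode returns for out-of-range indices on an 8-pixel row. The
// expected values follow from the switch in stbir__edge_wrap_slow above.
unittest
{
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP,   -3, 8) == 0);
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP,   10, 8) == 7);
    assert(stbir__edge_wrap(STBIR_EDGE_WRAP,    -1, 8) == 7);
    assert(stbir__edge_wrap(STBIR_EDGE_WRAP,     9, 8) == 1);
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT, -1, 8) == 1);
    assert(stbir__edge_wrap(STBIR_EDGE_REFLECT,  8, 8) == 7);
    assert(stbir__edge_wrap(STBIR_EDGE_ZERO,    -1, 8) == 0); // placeholder index; the pixel is zeroed later
    assert(stbir__edge_wrap(STBIR_EDGE_CLAMP,    3, 8) == 3); // in range: fast path, no switch
}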
958
959 // What input pixels contribute to this output pixel?
960 static void stbir__calculate_sample_range_upsample(int n, float out_filter_radius, float scale_ratio, float out_shift, int* in_first_pixel, int* in_last_pixel, float* in_center_of_out)
961 {
962 float out_pixel_center = cast(float)n + 0.5f;
963 float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
964 float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
965
966 float in_pixel_influence_lowerbound = (out_pixel_influence_lowerbound + out_shift) / scale_ratio;
967 float in_pixel_influence_upperbound = (out_pixel_influence_upperbound + out_shift) / scale_ratio;
968
969 *in_center_of_out = (out_pixel_center + out_shift) / scale_ratio;
970 *in_first_pixel = cast(int)(fast_floor(in_pixel_influence_lowerbound + 0.5));
971 *in_last_pixel = cast(int)(fast_floor(in_pixel_influence_upperbound - 0.5));
972 }
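
// Illustrative check, not part of the original stb source: a worked example of
// the upsampling range computation above, assuming fast_floor behaves like a
// mathematical floor. With 2x upsampling, no shift and a filter radius of 2
// output pixels, output pixel 0 is centered at 0.25 in input space and reads
// input pixels -1..0. The negative index is why the decode buffer has a margin.
unittest
{
    int first, last;
    float center;
    stbir__calculate_sample_range_upsample(0, 2.0f, 2.0f, 0.0f, &first, &last, &center);
    assert(center == 0.25f);
    assert(first == -1 && last == 0);
}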
973
974 // What output pixels does this input pixel contribute to?
975 static void stbir__calculate_sample_range_downsample(int n, float in_pixels_radius, float scale_ratio, float out_shift, int* out_first_pixel, int* out_last_pixel, float* out_center_of_in)
976 {
977 float in_pixel_center = cast(float)n + 0.5f;
978 float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
979 float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
980
981 float out_pixel_influence_lowerbound = in_pixel_influence_lowerbound * scale_ratio - out_shift;
982 float out_pixel_influence_upperbound = in_pixel_influence_upperbound * scale_ratio - out_shift;
983
984 *out_center_of_in = in_pixel_center * scale_ratio - out_shift;
985 *out_first_pixel = cast(int)(fast_floor(out_pixel_influence_lowerbound + 0.5));
986 *out_last_pixel = cast(int)(fast_floor(out_pixel_influence_upperbound - 0.5));
987 }
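
// Illustrative check, not part of the original stb source: the mirror image of
// the upsampling case, again assuming fast_floor behaves like a mathematical
// floor. With a 0.5 scale ratio, no shift and a radius of 2 input pixels,
// input pixel 4 lands at 2.25 in output space and contributes to output
// pixels 1..2.
unittest
{
    int first, last;
    float center;
    stbir__calculate_sample_range_downsample(4, 2.0f, 0.5f, 0.0f, &first, &last, &center);
    assert(center == 2.25f);
    assert(first == 1 && last == 2);
}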
988
989 static void stbir__calculate_coefficients_upsample(stbir_filter filter, float scale, int in_first_pixel, int in_last_pixel, float in_center_of_out, stbir__contributors* contributor, float* coefficient_group)
990 {
991 int i;
992 float total_filter = 0;
993 float filter_scale;
994
995 assert(in_last_pixel - in_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(1/scale) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
996
997 contributor.n0 = in_first_pixel;
998 contributor.n1 = in_last_pixel;
999
1000 assert(contributor.n1 >= contributor.n0);
1001
1002 for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
1003 {
1004 float in_pixel_center = cast(float)(i + in_first_pixel) + 0.5f;
1005 coefficient_group[i] = stbir__filter_info_table[filter].kernel(in_center_of_out - in_pixel_center, 1 / scale);
1006
1007 // If the coefficient is zero, skip it. (Don't do the <0 check here, we want the influence of those outside pixels.)
1008 if (i == 0 && !coefficient_group[i])
1009 {
1010 contributor.n0 = ++in_first_pixel;
1011 i--;
1012 continue;
1013 }
1014
1015 total_filter += coefficient_group[i];
1016 }
1017
1018 assert(stbir__filter_info_table[filter].kernel(cast(float)(in_last_pixel + 1) + 0.5f - in_center_of_out, 1/scale) == 0);
1019
1020 assert(total_filter > 0.9);
1021 assert(total_filter < 1.1f); // Make sure it's not way off.
1022
1023 // Make sure the sum of all coefficients is 1.
1024 filter_scale = 1 / total_filter;
1025
1026 for (i = 0; i <= in_last_pixel - in_first_pixel; i++)
1027 coefficient_group[i] *= filter_scale;
1028
1029 for (i = in_last_pixel - in_first_pixel; i >= 0; i--)
1030 {
1031 if (coefficient_group[i])
1032 break;
1033
1034 // This line has no weight. We can skip it.
1035 contributor.n1 = contributor.n0 + i - 1;
1036 }
1037 }
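
// Illustrative sketch, not part of the original stb source: the evaluate and
// normalise steps above, reduced to two taps. The `tent` function is a
// hypothetical stand-in for a filter kernel from stbir__filter_info_table;
// the real kernels also take a scale argument.
unittest
{
    // Hypothetical unit-width triangle kernel.
    static float tent(float x)
    {
        if (x < 0) x = -x;
        return x < 1.0f ? 1.0f - x : 0.0f;
    }

    float in_center_of_out = 0.25f; // output pixel 0 at 2x, in input space
    int in_first_pixel = -1;
    float[2] w;
    float total = 0;

    foreach (i; 0 .. 2)
    {
        float in_pixel_center = cast(float)(i + in_first_pixel) + 0.5f;
        w[i] = tent(in_center_of_out - in_pixel_center);
        total += w[i];
    }

    // Scale so the weights sum to 1, as the function above does.
    float filter_scale = 1 / total;
    foreach (ref c; w)
        c *= filter_scale;

    assert(w[0] == 0.25f && w[1] == 0.75f);
}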
1038
1039 static void stbir__calculate_coefficients_downsample(stbir_filter filter, float scale_ratio, int out_first_pixel, int out_last_pixel, float out_center_of_in, stbir__contributors* contributor, float* coefficient_group)
1040 {
1041 int i;
1042
1043 assert(out_last_pixel - out_first_pixel <= cast(int)fast_ceil(stbir__filter_info_table[filter].support(scale_ratio) * 2)); // Taken directly from stbir__get_coefficient_width() which we can't call because we don't know if we're horizontal or vertical.
1044
1045 contributor.n0 = out_first_pixel;
1046 contributor.n1 = out_last_pixel;
1047
1048 assert(contributor.n1 >= contributor.n0);
1049
1050 for (i = 0; i <= out_last_pixel - out_first_pixel; i++)
1051 {
1052 float out_pixel_center = cast(float)(i + out_first_pixel) + 0.5f;
1053 float x = out_pixel_center - out_center_of_in;
1054 coefficient_group[i] = stbir__filter_info_table[filter].kernel(x, scale_ratio) * scale_ratio;
1055 }
1056
1057 assert(stbir__filter_info_table[filter].kernel(cast(float)(out_last_pixel + 1) + 0.5f - out_center_of_in, scale_ratio) == 0);
1058
1059 for (i = out_last_pixel - out_first_pixel; i >= 0; i--)
1060 {
1061 if (coefficient_group[i])
1062 break;
1063
1064 // This line has no weight. We can skip it.
1065 contributor.n1 = contributor.n0 + i - 1;
1066 }
1067 }
1068
1069 static void stbir__normalize_downsample_coefficients(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, int input_size, int output_size)
1070 {
1071 int num_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1072 int num_coefficients = stbir__get_coefficient_width(filter, scale_ratio);
1073 int i, j;
1074 int skip;
1075
1076 for (i = 0; i < output_size; i++)
1077 {
1078 float scale;
1079 float total = 0;
1080
1081 for (j = 0; j < num_contributors; j++)
1082 {
1083 if (i >= contributors[j].n0 && i <= contributors[j].n1)
1084 {
1085 float coefficient = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0);
1086 total += coefficient;
1087 }
1088 else if (i < contributors[j].n0)
1089 break;
1090 }
1091
1092 assert(total > 0.9f);
1093 assert(total < 1.1f);
1094
1095 scale = 1 / total;
1096
1097 for (j = 0; j < num_contributors; j++)
1098 {
1099 if (i >= contributors[j].n0 && i <= contributors[j].n1)
1100 *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i - contributors[j].n0) *= scale;
1101 else if (i < contributors[j].n0)
1102 break;
1103 }
1104 }
1105
1106 // Optimize: Skip zero coefficients and contributions outside of image bounds.
1107 // Do this after normalizing because normalization depends on the n0/n1 values.
1108 for (j = 0; j < num_contributors; j++)
1109 {
1110 int range, max, width;
1111
1112 skip = 0;
1113 while (*stbir__get_coefficient(coefficients, filter, scale_ratio, j, skip) == 0)
1114 skip++;
1115
1116 contributors[j].n0 += skip;
1117
1118 while (contributors[j].n0 < 0)
1119 {
1120 contributors[j].n0++;
1121 skip++;
1122 }
1123
1124 range = contributors[j].n1 - contributors[j].n0 + 1;
1125 max = stbir__min(num_coefficients, range);
1126
1127 width = stbir__get_coefficient_width(filter, scale_ratio);
1128 for (i = 0; i < max; i++)
1129 {
1130 if (i + skip >= width)
1131 break;
1132
1133 *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i) = *stbir__get_coefficient(coefficients, filter, scale_ratio, j, i + skip);
1134 }
1135
1136 continue;
1137 }
1138
1139 // Using min to avoid writing into invalid pixels.
1140 for (i = 0; i < num_contributors; i++)
1141 contributors[i].n1 = stbir__min(contributors[i].n1, output_size - 1);
1142 }
1143
1144 // Each scan line uses the same kernel values so we should calculate the kernel
1145 // values once and then we can use them for every scan line.
1146 static void stbir__calculate_filters(stbir__contributors* contributors, float* coefficients, stbir_filter filter, float scale_ratio, float shift, int input_size, int output_size)
1147 {
1148 int n;
1149 int total_contributors = stbir__get_contributors(scale_ratio, filter, input_size, output_size);
1150
1151 if (stbir__use_upsampling(scale_ratio))
1152 {
1153 float out_pixels_radius = stbir__filter_info_table[filter].support(1 / scale_ratio) * scale_ratio;
1154
1155 // Looping through out pixels
1156 for (n = 0; n < total_contributors; n++)
1157 {
1158 float in_center_of_out; // Center of the current out pixel in the in pixel space
1159 int in_first_pixel, in_last_pixel;
1160
1161 stbir__calculate_sample_range_upsample(n, out_pixels_radius, scale_ratio, shift, &in_first_pixel, &in_last_pixel, &in_center_of_out);
1162
1163 stbir__calculate_coefficients_upsample(filter, scale_ratio, in_first_pixel, in_last_pixel, in_center_of_out, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1164 }
1165 }
1166 else
1167 {
1168 float in_pixels_radius = stbir__filter_info_table[filter].support(scale_ratio) / scale_ratio;
1169
1170 // Looping through in pixels
1171 for (n = 0; n < total_contributors; n++)
1172 {
            float out_center_of_in; // Center of the current in pixel in the out pixel space
1174 int out_first_pixel, out_last_pixel;
1175 int n_adjusted = n - stbir__get_filter_pixel_margin(filter, scale_ratio);
1176
1177 stbir__calculate_sample_range_downsample(n_adjusted, in_pixels_radius, scale_ratio, shift, &out_first_pixel, &out_last_pixel, &out_center_of_in);
1178
1179 stbir__calculate_coefficients_downsample(filter, scale_ratio, out_first_pixel, out_last_pixel, out_center_of_in, stbir__get_contributor(contributors, n), stbir__get_coefficient(coefficients, filter, scale_ratio, n, 0));
1180 }
1181
1182 stbir__normalize_downsample_coefficients(contributors, coefficients, filter, scale_ratio, input_size, output_size);
1183 }
1184 }
1185
1186 static float* stbir__get_decode_buffer(stbir__info* stbir_info)
1187 {
1188 // The 0 index of the decode buffer starts after the margin. This makes
1189 // it okay to use negative indexes on the decode buffer.
1190 return &stbir_info.decode_buffer[stbir_info.horizontal_filter_pixel_margin * stbir_info.channels];
1191 }
1192
1193 int STBIR__DECODE(int type, int colorspace)
1194 {
1195 return type * STBIR_MAX_COLORSPACES + colorspace;
1196 }
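
// Illustrative check, not part of the original stb source: STBIR__DECODE is a
// plain function of its arguments, so D can evaluate it at compile time, which
// is what allows it to appear in `case` labels. A sketch, assuming the linear
// and sRGB colorspace constants have different values:
static assert(STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR)
           != STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB));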
1197
1198 static void stbir__decode_scanline(stbir__info* stbir_info, int n)
1199 {
1200 int c;
1201 int channels = stbir_info.channels;
1202 int alpha_channel = stbir_info.alpha_channel;
1203 int type = stbir_info.type;
1204 int colorspace = stbir_info.colorspace;
1205 int input_w = stbir_info.input_w;
1206 size_t input_stride_bytes = stbir_info.input_stride_bytes;
1207 float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1208 stbir_edge edge_horizontal = stbir_info.edge_horizontal;
1209 stbir_edge edge_vertical = stbir_info.edge_vertical;
1210 size_t in_buffer_row_offset = stbir__edge_wrap(edge_vertical, n, stbir_info.input_h) * input_stride_bytes;
1211 const void* input_data = cast(char *) stbir_info.input_data + in_buffer_row_offset;
1212 int max_x = input_w + stbir_info.horizontal_filter_pixel_margin;
1213 int decode = STBIR__DECODE(type, colorspace);
1214
1215 int x = -stbir_info.horizontal_filter_pixel_margin;
1216
1217 // special handling for STBIR_EDGE_ZERO because it needs to return an item that doesn't appear in the input,
1218 // and we want to avoid paying overhead on every pixel if not STBIR_EDGE_ZERO
1219 if (edge_vertical == STBIR_EDGE_ZERO && (n < 0 || n >= stbir_info.input_h))
1220 {
1221 for (; x < max_x; x++)
1222 for (c = 0; c < channels; c++)
1223 decode_buffer[x*channels + c] = 0;
1224 return;
1225 }
1226
1227 switch (decode)
1228 {
1229 case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
1230 for (; x < max_x; x++)
1231 {
1232 int decode_pixel_index = x * channels;
1233 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1234 for (c = 0; c < channels; c++)
1235 decode_buffer[decode_pixel_index + c] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + c]) / stbir__max_uint8_as_float;
1236 }
1237 break;
1238
1239 case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
        if (channels == 4 && alpha_channel == 3 && !(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
        {
            // Fast path for RGBA with linear alpha: this avoids one table lookup per pixel (alpha),
            // while the table remains the fastest way to convert from sRGB to linear float.
            // It consumes every x, so the generic loop below then has nothing left to do.
            for (; x < max_x; x++)
            {
                int decode_pixel_index = x * channels;
                int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
                for (c = 0; c < 3; c++)
                    decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
                ubyte alpha = (cast(const(ubyte)*)input_data)[input_pixel_index + 3];
                decode_buffer[decode_pixel_index + 3] = cast(float)(alpha * 0.00392156862f);
            }
        }
1253
1254 for (; x < max_x; x++)
1255 {
1256 int decode_pixel_index = x * channels;
1257 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1258 for (c = 0; c < channels; c++)
1259 decode_buffer[decode_pixel_index + c] = stbir__srgb_uchar_to_linear_float[(cast(const(ubyte)*)input_data)[input_pixel_index + c]];
1260
1261 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1262 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ubyte)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint8_as_float;
1263 }
1264 break;
1265
1266 case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
1267 {
1268 if (channels == 1 && edge_horizontal == STBIR_EDGE_CLAMP)
1269 {
1270 for (; x < max_x; x++)
1271 {
1272 int decode_pixel_index = x;
1273 int input_pixel_index = stbir__edge_wrap(STBIR_EDGE_CLAMP, x, input_w) * channels;
1274 ushort depth = (cast(const(ushort)*)input_data)[input_pixel_index];
1275 decode_buffer[decode_pixel_index] = depth / stbir__max_uint16_as_float;
1276 }
1277 }
1278 else if (channels == 4 && edge_horizontal == STBIR_EDGE_CLAMP)
1279 {
1280 __m128i zero = _mm_setzero_si128();
1281 __m128 normalizingFactor = _mm_set1_ps(1 / 65535.0f);
1282
1283 for (; x < max_x; x++)
1284 {
1285 int decode_pixel_index = x * channels;
1286 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1287
1288 // load four values at once
1289 __m128i mmPixel = _mm_loadu_si64( (cast(const(ushort)*)input_data) + input_pixel_index );
1290 mmPixel = _mm_unpacklo_epi16(mmPixel, zero); // convert to 32-bit
1291 __m128 fPixel = _mm_cvtepi32_ps(mmPixel) * normalizingFactor;
1292 _mm_storeu_ps(&decode_buffer[decode_pixel_index], fPixel);
1293 }
1294 }
1295 else
1296 {
1297 for (; x < max_x; x++)
1298 {
1299 int decode_pixel_index = x * channels;
1300 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1301 for (c = 0; c < channels; c++)
1302 {
1303 ushort depth = (cast(const(ushort)*)input_data)[input_pixel_index + c];
1304 decode_buffer[decode_pixel_index + c] = depth / stbir__max_uint16_as_float;
1305 }
1306 }
1307 }
1308 break;
1309 }
1310
1311 case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
1312 for (; x < max_x; x++)
1313 {
1314 int decode_pixel_index = x * channels;
1315 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1316 for (c = 0; c < channels; c++)
1317 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + c]) / stbir__max_uint16_as_float);
1318
1319 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1320 decode_buffer[decode_pixel_index + alpha_channel] = (cast(float)(cast(const(ushort)*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint16_as_float;
1321 }
1322 break;
1323
1324 case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
1325 for (; x < max_x; x++)
1326 {
1327 int decode_pixel_index = x * channels;
1328 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1329 for (c = 0; c < channels; c++)
1330 decode_buffer[decode_pixel_index + c] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float);
1331 }
1332 break;
1333
1334 case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
1335 for (; x < max_x; x++)
1336 {
1337 int decode_pixel_index = x * channels;
1338 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1339 for (c = 0; c < channels; c++)
1340 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear(cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + c]) / stbir__max_uint32_as_float));
1341
1342 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1343 decode_buffer[decode_pixel_index + alpha_channel] = cast(float)((cast(double)(cast(const uint*)input_data)[input_pixel_index + alpha_channel]) / stbir__max_uint32_as_float);
1344 }
1345 break;
1346
1347 case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
1348 for (; x < max_x; x++)
1349 {
1350 int decode_pixel_index = x * channels;
1351 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1352 for (c = 0; c < channels; c++)
1353 decode_buffer[decode_pixel_index + c] = (cast(const(float)*)input_data)[input_pixel_index + c];
1354 }
1355 break;
1356
1357 case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
1358 for (; x < max_x; x++)
1359 {
1360 int decode_pixel_index = x * channels;
1361 int input_pixel_index = stbir__edge_wrap(edge_horizontal, x, input_w) * channels;
1362 for (c = 0; c < channels; c++)
1363 decode_buffer[decode_pixel_index + c] = stbir__srgb_to_linear((cast(const(float)*)input_data)[input_pixel_index + c]);
1364
1365 if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
1366 decode_buffer[decode_pixel_index + alpha_channel] = (cast(const(float)*)input_data)[input_pixel_index + alpha_channel];
1367 }
1368
1369 break;
1370
1371 default:
1372 assert(!"Unknown type/colorspace/channels combination.");
1373 break;
1374 }
1375
1376 if (!(stbir_info.flags & STBIR_FLAG_ALPHA_PREMULTIPLIED))
1377 {
1378 for (x = -stbir_info.horizontal_filter_pixel_margin; x < max_x; x++)
1379 {
1380 int decode_pixel_index = x * channels;
1381
1382 // If the alpha value is 0 it will clobber the color values. Make sure it's not.
1383 float alpha = decode_buffer[decode_pixel_index + alpha_channel];
1384
1385 version(STBIR_NO_ALPHA_EPSILON)
1386 {}
1387 else
1388 {
1389 if (stbir_info.type != STBIR_TYPE_FLOAT) {
1390 alpha += STBIR_ALPHA_EPSILON;
1391 decode_buffer[decode_pixel_index + alpha_channel] = alpha;
1392 }
1393 }
1394
1395 for (c = 0; c < channels; c++)
1396 {
1397 if (c == alpha_channel)
1398 continue;
1399
1400 decode_buffer[decode_pixel_index + c] *= alpha;
1401 }
1402 }
1403 }
1404
1405 if (edge_horizontal == STBIR_EDGE_ZERO)
1406 {
1407 for (x = -stbir_info.horizontal_filter_pixel_margin; x < 0; x++)
1408 {
1409 for (c = 0; c < channels; c++)
1410 decode_buffer[x*channels + c] = 0;
1411 }
1412 for (x = input_w; x < max_x; x++)
1413 {
1414 for (c = 0; c < channels; c++)
1415 decode_buffer[x*channels + c] = 0;
1416 }
1417 }
1418 }
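
// Illustrative sketch, not part of the original stb source: the alpha
// premultiply performed at the end of the decode step, followed by the
// reciprocal-alpha multiply that stbir__encode_scanline applies, on a single
// RGBA pixel. The alpha epsilon is left out here; the values are chosen so the
// float arithmetic is exact.
unittest
{
    float[4] px = [0.5f, 0.25f, 1.0f, 0.5f]; // R, G, B, A
    float alpha = px[3];

    foreach (c; 0 .. 3)
        px[c] *= alpha;                      // premultiply (decode side)
    assert(px[0] == 0.25f && px[1] == 0.125f && px[2] == 0.5f);

    float reciprocal_alpha = alpha ? 1.0f / alpha : 0;
    foreach (c; 0 .. 3)
        px[c] *= reciprocal_alpha;           // unpremultiply (encode side)
    assert(px[0] == 0.5f && px[1] == 0.25f && px[2] == 1.0f);
}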
1419
1420 static float* stbir__get_ring_buffer_entry(float* ring_buffer, int index, int ring_buffer_length)
1421 {
1422 return &ring_buffer[index * ring_buffer_length];
1423 }
1424
1425 static float* stbir__add_empty_ring_buffer_entry(stbir__info* stbir_info, int n)
1426 {
1427 int ring_buffer_index;
1428 float* ring_buffer;
1429
1430 stbir_info.ring_buffer_last_scanline = n;
1431
1432 if (stbir_info.ring_buffer_begin_index < 0)
1433 {
1434 ring_buffer_index = stbir_info.ring_buffer_begin_index = 0;
1435 stbir_info.ring_buffer_first_scanline = n;
1436 }
1437 else
1438 {
1439 ring_buffer_index = (stbir_info.ring_buffer_begin_index + (stbir_info.ring_buffer_last_scanline - stbir_info.ring_buffer_first_scanline)) % stbir_info.ring_buffer_num_entries;
1440 assert(ring_buffer_index != stbir_info.ring_buffer_begin_index);
1441 }
1442
1443 ring_buffer = stbir__get_ring_buffer_entry(stbir_info.ring_buffer, ring_buffer_index, stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof));
1444 memset(ring_buffer, 0, stbir_info.ring_buffer_length_bytes);
1445
1446 return ring_buffer;
1447 }
1448
1449
1450 static void stbir__resample_horizontal_upsample(stbir__info* stbir_info, float* output_buffer)
1451 {
1452 int x, k;
1453 int output_w = stbir_info.output_w;
1454 int channels = stbir_info.channels;
1455 float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1456 stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1457 float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1458 int coefficient_width = stbir_info.horizontal_coefficient_width;
1459
1460 for (x = 0; x < output_w; x++)
1461 {
1462 int n0 = horizontal_contributors[x].n0;
1463 int n1 = horizontal_contributors[x].n1;
1464
1465 int out_pixel_index = x * channels;
1466 int coefficient_group = coefficient_width * x;
1467 int coefficient_counter = 0;
1468
1469 assert(n1 >= n0);
1470 assert(n0 >= -stbir_info.horizontal_filter_pixel_margin);
1471 assert(n1 >= -stbir_info.horizontal_filter_pixel_margin);
1472 assert(n0 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1473 assert(n1 < stbir_info.input_w + stbir_info.horizontal_filter_pixel_margin);
1474
1475 switch (channels) {
1476 case 1:
1477 for (k = n0; k <= n1; k++)
1478 {
1479 int in_pixel_index = k * 1;
1480 float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1481 //assert(coefficient != 0);
1482 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1483 }
1484 break;
1485 case 2:
1486 for (k = n0; k <= n1; k++)
1487 {
1488 int in_pixel_index = k * 2;
1489 float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1490 //assert(coefficient != 0);
1491 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1492 output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1493 }
1494 break;
1495 case 3:
1496 for (k = n0; k <= n1; k++)
1497 {
1498 int in_pixel_index = k * 3;
1499 float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1500 //assert(coefficient != 0);
1501 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1502 output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1503 output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1504 }
1505 break;
1506 case 4:
1507 for (k = n0; k <= n1; k++)
1508 {
1509 int in_pixel_index = k * 4;
1510 float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1511 //assert(coefficient != 0);
1512 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1513 output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1514 output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1515 output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1516 }
1517 break;
1518 default:
1519 for (k = n0; k <= n1; k++)
1520 {
1521 int in_pixel_index = k * channels;
1522 float coefficient = horizontal_coefficients[coefficient_group + coefficient_counter++];
1523 int c;
1524 //assert(coefficient != 0);
1525 for (c = 0; c < channels; c++)
1526 output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1527 }
1528 break;
1529 }
1530 }
1531 }
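
// Illustrative sketch, not part of the original stb source: the upsampling
// loop above is a gather. Each output pixel walks its own contributor range
// n0..n1 and sums weighted input pixels. The contributor and weights below are
// hypothetical values for a single one-channel output pixel.
unittest
{
    float[3] decode = [10.0f, 20.0f, 30.0f]; // one-channel input row
    int n0 = 0, n1 = 1;                      // inputs that feed this output pixel
    float[2] coeff = [0.25f, 0.75f];         // their weights, summing to 1

    float result = 0;
    int coefficient_counter = 0;
    for (int k = n0; k <= n1; k++)
        result += decode[k] * coeff[coefficient_counter++];

    assert(result == 17.5f);
}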
1532
1533 static void stbir__resample_horizontal_downsample(stbir__info* stbir_info, float* output_buffer)
1534 {
1535 int x, k;
1536 int input_w = stbir_info.input_w;
1537 int channels = stbir_info.channels;
1538 float* decode_buffer = stbir__get_decode_buffer(stbir_info);
1539 stbir__contributors* horizontal_contributors = stbir_info.horizontal_contributors;
1540 float* horizontal_coefficients = stbir_info.horizontal_coefficients;
1541 int coefficient_width = stbir_info.horizontal_coefficient_width;
1542 int filter_pixel_margin = stbir_info.horizontal_filter_pixel_margin;
1543 int max_x = input_w + filter_pixel_margin * 2;
1544
1545 assert(!stbir__use_width_upsampling(stbir_info));
1546
1547 switch (channels) {
1548 case 1:
1549 for (x = 0; x < max_x; x++)
1550 {
1551 int n0 = horizontal_contributors[x].n0;
1552 int n1 = horizontal_contributors[x].n1;
1553
1554 int in_x = x - filter_pixel_margin;
1555 int in_pixel_index = in_x * 1;
1556 int max_n = n1;
1557 int coefficient_group = coefficient_width * x;
1558
1559 for (k = n0; k <= max_n; k++)
1560 {
1561 int out_pixel_index = k * 1;
1562 float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1563 //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1564 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1565 }
1566 }
1567 break;
1568
1569 case 2:
1570 for (x = 0; x < max_x; x++)
1571 {
1572 int n0 = horizontal_contributors[x].n0;
1573 int n1 = horizontal_contributors[x].n1;
1574
1575 int in_x = x - filter_pixel_margin;
1576 int in_pixel_index = in_x * 2;
1577 int max_n = n1;
1578 int coefficient_group = coefficient_width * x;
1579
1580 for (k = n0; k <= max_n; k++)
1581 {
1582 int out_pixel_index = k * 2;
1583 float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1584 //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1585 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1586 output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1587 }
1588 }
1589 break;
1590
1591 case 3:
1592 for (x = 0; x < max_x; x++)
1593 {
1594 int n0 = horizontal_contributors[x].n0;
1595 int n1 = horizontal_contributors[x].n1;
1596
1597 int in_x = x - filter_pixel_margin;
1598 int in_pixel_index = in_x * 3;
1599 int max_n = n1;
1600 int coefficient_group = coefficient_width * x;
1601
1602 for (k = n0; k <= max_n; k++)
1603 {
1604 int out_pixel_index = k * 3;
1605 float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1606 //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1607 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1608 output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1609 output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1610 }
1611 }
1612 break;
1613
1614 case 4:
1615 for (x = 0; x < max_x; x++)
1616 {
1617 int n0 = horizontal_contributors[x].n0;
1618 int n1 = horizontal_contributors[x].n1;
1619
1620 int in_x = x - filter_pixel_margin;
1621 int in_pixel_index = in_x * 4;
1622 int max_n = n1;
1623 int coefficient_group = coefficient_width * x;
1624
1625 for (k = n0; k <= max_n; k++)
1626 {
1627 int out_pixel_index = k * 4;
1628 float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1629 //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1630
1631 version(DigitalMars)
1632 {
1633 output_buffer[out_pixel_index + 0] += decode_buffer[in_pixel_index + 0] * coefficient;
1634 output_buffer[out_pixel_index + 1] += decode_buffer[in_pixel_index + 1] * coefficient;
1635 output_buffer[out_pixel_index + 2] += decode_buffer[in_pixel_index + 2] * coefficient;
1636 output_buffer[out_pixel_index + 3] += decode_buffer[in_pixel_index + 3] * coefficient;
1637 }
1638 else
1639 {
1640 __m128 A = _mm_loadu_ps(&decode_buffer[in_pixel_index]);
1641 __m128 B = _mm_loadu_ps(&output_buffer[out_pixel_index]);
1642 B = B + A * _mm_set1_ps(coefficient);
1643 _mm_storeu_ps(&output_buffer[out_pixel_index], B);
1644 }
1645 }
1646 }
1647 break;
1648
1649 default:
1650 for (x = 0; x < max_x; x++)
1651 {
1652 int n0 = horizontal_contributors[x].n0;
1653 int n1 = horizontal_contributors[x].n1;
1654
1655 int in_x = x - filter_pixel_margin;
1656 int in_pixel_index = in_x * channels;
1657 int max_n = n1;
1658 int coefficient_group = coefficient_width * x;
1659
1660 for (k = n0; k <= max_n; k++)
1661 {
1662 int c;
1663 int out_pixel_index = k * channels;
1664 float coefficient = horizontal_coefficients[coefficient_group + k - n0];
1665 //assert(coefficient != 0); // Note: this makes MKS 2021 crash
1666 for (c = 0; c < channels; c++)
1667 output_buffer[out_pixel_index + c] += decode_buffer[in_pixel_index + c] * coefficient;
1668 }
1669 }
1670 break;
1671 }
1672 }
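
// Illustrative sketch, not part of the original stb source: the downsampling
// loop above is a scatter. Each input pixel adds its weighted value into the
// output range n0..n1 it contributes to, which is why the output buffer has to
// be zeroed before the pass. The contributors below are hypothetical: a 2:1
// reduction where input x feeds only output x/2 with weight 0.5.
unittest
{
    float[4] decode = [8.0f, 4.0f, 2.0f, 6.0f]; // one-channel input row
    float[2] output = 0;                        // accumulator starts at zero

    foreach (x; 0 .. 4)
    {
        int n0 = x / 2, n1 = x / 2;
        float coefficient = 0.5f;
        for (int k = n0; k <= n1; k++)
            output[k] += decode[x] * coefficient;
    }

    assert(output[0] == 6.0f && output[1] == 4.0f);
}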
1673
1674 static void stbir__decode_and_resample_upsample(stbir__info* stbir_info, int n)
1675 {
1676 // Decode the nth scanline from the source image into the decode buffer.
1677 stbir__decode_scanline(stbir_info, n);
1678
1679 // Now resample it into the ring buffer.
1680 if (stbir__use_width_upsampling(stbir_info))
1681 stbir__resample_horizontal_upsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1682 else
1683 stbir__resample_horizontal_downsample(stbir_info, stbir__add_empty_ring_buffer_entry(stbir_info, n));
1684
1685 // Now it's sitting in the ring buffer ready to be used as source for the vertical sampling.
1686 }
1687
1688 static void stbir__decode_and_resample_downsample(stbir__info* stbir_info, int n)
1689 {
1690 // Decode the nth scanline from the source image into the decode buffer.
1691 stbir__decode_scanline(stbir_info, n);
1692
1693 memset(stbir_info.horizontal_buffer, 0, stbir_info.output_w * stbir_info.channels * float.sizeof);
1694
1695 // Now resample it into the horizontal buffer.
1696 if (stbir__use_width_upsampling(stbir_info))
1697 stbir__resample_horizontal_upsample(stbir_info, stbir_info.horizontal_buffer);
1698 else
1699 stbir__resample_horizontal_downsample(stbir_info, stbir_info.horizontal_buffer);
1700
1701 // Now it's sitting in the horizontal buffer ready to be distributed into the ring buffers.
1702 }
1703
1704 // Get the specified scan line from the ring buffer.
1705 static float* stbir__get_ring_buffer_scanline(int get_scanline, float* ring_buffer, int begin_index, int first_scanline, int ring_buffer_num_entries, int ring_buffer_length)
1706 {
1707 int ring_buffer_index = (begin_index + (get_scanline - first_scanline)) % ring_buffer_num_entries;
1708 return stbir__get_ring_buffer_entry(ring_buffer, ring_buffer_index, ring_buffer_length);
1709 }
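
// Illustrative check, not part of the original stb source: a sketch of how
// scanline numbers map onto ring buffer entries. The buffer here has 3 entries
// of 4 floats each; entry 2 holds scanline 10, so scanlines 11 and 12 wrap
// around to entries 0 and 1.
unittest
{
    float[12] storage = 0;
    assert(stbir__get_ring_buffer_scanline(10, storage.ptr, 2, 10, 3, 4) == storage.ptr + 8);
    assert(stbir__get_ring_buffer_scanline(11, storage.ptr, 2, 10, 3, 4) == storage.ptr);
    assert(stbir__get_ring_buffer_scanline(12, storage.ptr, 2, 10, 3, 4) == storage.ptr + 4);
}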


static void stbir__encode_scanline(stbir__info* stbir_info, int num_pixels, void *output_buffer, float *encode_buffer, int channels, int alpha_channel, int decode)
{
    int x;
    int n;
    int num_nonalpha;
    ushort[STBIR_MAX_CHANNELS] nonalpha;

    if (!(stbir_info.flags&STBIR_FLAG_ALPHA_PREMULTIPLIED))
    {
        for (x=0; x < num_pixels; ++x)
        {
            int pixel_index = x*channels;

            float alpha = encode_buffer[pixel_index + alpha_channel];
            float reciprocal_alpha = alpha ? 1.0f / alpha : 0;

            // unrolling this produced a 1% slowdown upscaling a large RGBA linear-space image on my machine - stb
            for (n = 0; n < channels; n++)
                if (n != alpha_channel)
                    encode_buffer[pixel_index + n] *= reciprocal_alpha;

            // We added in a small epsilon to prevent the color channel from being deleted with zero alpha.
            // Because we only add it for integer types, it will automatically be discarded on integer
            // conversion, so we don't need to subtract it back out (which would be problematic for
            // numeric precision reasons).
        }
    }

    // build a table of all channels that need colorspace correction, so
    // we don't perform colorspace correction on channels that don't need it.
    for (x = 0, num_nonalpha = 0; x < channels; ++x)
    {
        if (x != alpha_channel || (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
        {
            nonalpha[num_nonalpha++] = cast(ushort)x;
        }
    }

    static int STBIR__ROUND_INT_f(float f)
    {
        return cast(int)(f + 0.5f);
    }
    static int STBIR__ROUND_INT_d(double f)
    {
        return cast(int)(f + 0.5);
    }
    // The unsigned variants return uint so that values above int.max keep their meaning.
    static uint STBIR__ROUND_UINT_f(float f)
    {
        return cast(uint)(f + 0.5f);
    }
    static uint STBIR__ROUND_UINT_d(double f)
    {
        return cast(uint)(f + 0.5);
    }

    static ubyte STBIR__ENCODE_LINEAR8(float f)
    {
        return cast(ubyte) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint8_as_float );
    }

    static ushort STBIR__ENCODE_LINEAR16(float f)
    {
        return cast(ushort) STBIR__ROUND_INT_f(stbir__saturate(f) * stbir__max_uint16_as_float );
    }
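
    // Worked example for the 8-bit linear encoder above. Assumptions, since
    // neither is defined in this part of the file: stbir__saturate clamps to
    // [0, 1] and stbir__max_uint8_as_float is 255.0f.
    //     STBIR__ENCODE_LINEAR8( 0.5f) -> cast(int)(0.5f * 255.0f + 0.5f) = cast(int)(128.0f) = 128
    //     STBIR__ENCODE_LINEAR8( 1.7f) -> saturates to 1.0f -> cast(int)(255.5f) = 255
    //     STBIR__ENCODE_LINEAR8(-0.2f) -> saturates to 0.0f -> cast(int)(0.5f)   = 0
    // Adding 0.5 before the truncating cast rounds non-negative values to the
    // nearest integer.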

    switch (decode)
    {
        case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_LINEAR):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < channels; n++)
                {
                    int index = pixel_index + n;
                    (cast(ubyte*)output_buffer)[index] = STBIR__ENCODE_LINEAR8(encode_buffer[index]);
                }
            }
            break;

        case STBIR__DECODE(STBIR_TYPE_UINT8, STBIR_COLORSPACE_SRGB):
        {
            // Special case because of how slow it is in normal stb_image_resize.
            if (channels == 4 && alpha_channel == -1 && (stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
            {
                for (x = 0; x < num_pixels; ++x)
                {
                    __m128i zero = _mm_setzero_si128();

                    __m128 fpixels = _mm_loadu_ps( &encode_buffer[4*x] );
                    __m128i fpixels_desrgb = stbir__linear_to_srgb_uchar(fpixels);
                    _mm_storeu_si32( (cast(ubyte*)output_buffer) + 4*x, fpixels_desrgb);
                }
            }
            else
            {
                for (x = 0; x < num_pixels; ++x)
                {
                    int pixel_index = x*channels;

                    for (n = 0; n < num_nonalpha; n++)
                    {
                        int index = pixel_index + nonalpha[n];
                        (cast(ubyte*)output_buffer)[index] = stbir__linear_to_srgb_uchar(encode_buffer[index]);
                    }

                    if (!(stbir_info.flags & STBIR_FLAG_ALPHA_USES_COLORSPACE))
                        (cast(ubyte*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR8(encode_buffer[pixel_index+alpha_channel]);
                }
            }
            break;
        }

        case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_LINEAR):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < channels; n++)
                {
                    int index = pixel_index + n;
                    (cast(ushort*)output_buffer)[index] = STBIR__ENCODE_LINEAR16(encode_buffer[index]);
                }
            }
            break;

        case STBIR__DECODE(STBIR_TYPE_UINT16, STBIR_COLORSPACE_SRGB):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < num_nonalpha; n++)
                {
                    int index = pixel_index + nonalpha[n];
                    (cast(ushort*)output_buffer)[index] = cast(ushort)STBIR__ROUND_INT_f(stbir__linear_to_srgb(stbir__saturate(encode_buffer[index])) * stbir__max_uint16_as_float);
                }

                if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
                    (cast(ushort*)output_buffer)[pixel_index + alpha_channel] = STBIR__ENCODE_LINEAR16(encode_buffer[pixel_index + alpha_channel]);
            }

            break;

        case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_LINEAR):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < channels; n++)
                {
                    int index = pixel_index + n;
                    (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[index])) * stbir__max_uint32_as_float);
                }
            }
            break;

        case STBIR__DECODE(STBIR_TYPE_UINT32, STBIR_COLORSPACE_SRGB):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < num_nonalpha; n++)
                {
                    int index = pixel_index + nonalpha[n];
                    (cast(uint*)output_buffer)[index] = cast(uint)STBIR__ROUND_UINT_d((cast(double)stbir__linear_to_srgb(stbir__saturate(encode_buffer[index]))) * stbir__max_uint32_as_float);
                }

                // Use the unsigned rounding helper here: alpha values above int.max do not fit in an int.
                if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
                    (cast(uint*)output_buffer)[pixel_index + alpha_channel] = cast(uint) STBIR__ROUND_UINT_d((cast(double)stbir__saturate(encode_buffer[pixel_index + alpha_channel])) * stbir__max_uint32_as_float);
            }
            break;

        case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_LINEAR):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < channels; n++)
                {
                    int index = pixel_index + n;
                    (cast(float*)output_buffer)[index] = encode_buffer[index];
                }
            }
            break;

        case STBIR__DECODE(STBIR_TYPE_FLOAT, STBIR_COLORSPACE_SRGB):
            for (x=0; x < num_pixels; ++x)
            {
                int pixel_index = x*channels;

                for (n = 0; n < num_nonalpha; n++)
                {
                    int index = pixel_index + nonalpha[n];
                    (cast(float*)output_buffer)[index] = stbir__linear_to_srgb(encode_buffer[index]);
                }

                if (!(stbir_info.flags&STBIR_FLAG_ALPHA_USES_COLORSPACE))
                    (cast(float*)output_buffer)[pixel_index + alpha_channel] = encode_buffer[pixel_index + alpha_channel];
            }
            break;

        default:
            assert(!"Unknown type/colorspace/channels combination.");
            break;
    }
}

static void stbir__resample_vertical_upsample(stbir__info* stbir_info, int n)
{
    int x, k;
    int output_w = stbir_info.output_w;
    stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
    float* vertical_coefficients = stbir_info.vertical_coefficients;
    int channels = stbir_info.channels;
    int alpha_channel = stbir_info.alpha_channel;
    int type = stbir_info.type;
    int colorspace = stbir_info.colorspace;
    int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
    void* output_data = stbir_info.output_data;
    float* encode_buffer = stbir_info.encode_buffer;
    int decode = STBIR__DECODE(type, colorspace);
    int coefficient_width = stbir_info.vertical_coefficient_width;
    int coefficient_counter;
    int contributor = n;

    float* ring_buffer = stbir_info.ring_buffer;
    int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
    int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
    int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);

    int n0,n1, output_row_start;
    int coefficient_group = coefficient_width * contributor;

    n0 = vertical_contributors[contributor].n0;
    n1 = vertical_contributors[contributor].n1;

    output_row_start = n * stbir_info.output_stride_bytes;

    assert(stbir__use_height_upsampling(stbir_info));

    memset(encode_buffer, 0, output_w * float.sizeof * channels);

    // I tried reblocking this for better cache usage of encode_buffer
    // (using x_outer, k, x_inner), but it lost speed. -- stb

    coefficient_counter = 0;
    switch (channels) {
        case 1:
            for (k = n0; k <= n1; k++)
            {
                int coefficient_index = coefficient_counter++;
                float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
                float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
                for (x = 0; x < output_w; ++x)
                {
                    int in_pixel_index = x * 1;
                    encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
                }
            }
            break;
        case 2:
            for (k = n0; k <= n1; k++)
            {
                int coefficient_index = coefficient_counter++;
                float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
                float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
                for (x = 0; x < output_w; ++x)
                {
                    int in_pixel_index = x * 2;
                    encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
                    encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
                }
            }
            break;
        case 3:
            for (k = n0; k <= n1; k++)
            {
                int coefficient_index = coefficient_counter++;
                float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
                float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
                for (x = 0; x < output_w; ++x)
                {
                    int in_pixel_index = x * 3;
                    encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
                    encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
                    encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
                }
            }
            break;
        case 4:
            for (k = n0; k <= n1; k++)
            {
                int coefficient_index = coefficient_counter++;
                float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
                float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
                for (x = 0; x < output_w; ++x)
                {
                    int in_pixel_index = x * 4;
                    encode_buffer[in_pixel_index + 0] += ring_buffer_entry[in_pixel_index + 0] * coefficient;
                    encode_buffer[in_pixel_index + 1] += ring_buffer_entry[in_pixel_index + 1] * coefficient;
                    encode_buffer[in_pixel_index + 2] += ring_buffer_entry[in_pixel_index + 2] * coefficient;
                    encode_buffer[in_pixel_index + 3] += ring_buffer_entry[in_pixel_index + 3] * coefficient;
                }
            }
            break;
        default:
            for (k = n0; k <= n1; k++)
            {
                int coefficient_index = coefficient_counter++;
                float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);
                float coefficient = vertical_coefficients[coefficient_group + coefficient_index];
                for (x = 0; x < output_w; ++x)
                {
                    int in_pixel_index = x * channels;
                    int c;
                    for (c = 0; c < channels; c++)
                        encode_buffer[in_pixel_index + c] += ring_buffer_entry[in_pixel_index + c] * coefficient;
                }
            }
            break;
    }
    stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, encode_buffer, channels, alpha_channel, decode);
}

static void stbir__resample_vertical_downsample(stbir__info* stbir_info, int n)
{
    int x, k;
    int output_w = stbir_info.output_w;
    stbir__contributors* vertical_contributors = stbir_info.vertical_contributors;
    float* vertical_coefficients = stbir_info.vertical_coefficients;
    int channels = stbir_info.channels;
    int ring_buffer_entries = stbir_info.ring_buffer_num_entries;
    float* horizontal_buffer = stbir_info.horizontal_buffer;
    int coefficient_width = stbir_info.vertical_coefficient_width;
    int contributor = n + stbir_info.vertical_filter_pixel_margin;

    float* ring_buffer = stbir_info.ring_buffer;
    int ring_buffer_begin_index = stbir_info.ring_buffer_begin_index;
    int ring_buffer_first_scanline = stbir_info.ring_buffer_first_scanline;
    int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);
    int n0,n1;

    n0 = vertical_contributors[contributor].n0;
    n1 = vertical_contributors[contributor].n1;

    assert(!stbir__use_height_upsampling(stbir_info));

    for (k = n0; k <= n1; k++)
    {
        int coefficient_index = k - n0;
        int coefficient_group = coefficient_width * contributor;
        float coefficient = vertical_coefficients[coefficient_group + coefficient_index];

        float* ring_buffer_entry = stbir__get_ring_buffer_scanline(k, ring_buffer, ring_buffer_begin_index, ring_buffer_first_scanline, ring_buffer_entries, ring_buffer_length);

        switch (channels) {
            case 1:
                for (x = 0; x < output_w; x++)
                {
                    int in_pixel_index = x * 1;
                    ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
                }
                break;
            case 2:
                for (x = 0; x < output_w; x++)
                {
                    int in_pixel_index = x * 2;
                    ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
                    ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
                }
                break;
            case 3:
                for (x = 0; x < output_w; x++)
                {
                    int in_pixel_index = x * 3;
                    ring_buffer_entry[in_pixel_index + 0] += horizontal_buffer[in_pixel_index + 0] * coefficient;
                    ring_buffer_entry[in_pixel_index + 1] += horizontal_buffer[in_pixel_index + 1] * coefficient;
                    ring_buffer_entry[in_pixel_index + 2] += horizontal_buffer[in_pixel_index + 2] * coefficient;
                }
                break;
            case 4:

                __m128 vCoefficients = _mm_set1_ps(coefficient);

                for (x = 0; x < output_w; x++)
                {
                    int in_pixel_index = x * 4;
                    __m128 A = _mm_loadu_ps(&horizontal_buffer[in_pixel_index]);
                    __m128 B = _mm_loadu_ps(&ring_buffer_entry[in_pixel_index]);
                    _mm_storeu_ps( &ring_buffer_entry[in_pixel_index], B + A * vCoefficients);
                }
                break;
            default:
                for (x = 0; x < output_w; x++)
                {
                    int in_pixel_index = x * channels;

                    int c;
                    for (c = 0; c < channels; c++)
                        ring_buffer_entry[in_pixel_index + c] += horizontal_buffer[in_pixel_index + c] * coefficient;
                }
                break;
        }
    }
}

static void stbir__buffer_loop_upsample(stbir__info* stbir_info)
{
    int y;
    float scale_ratio = stbir_info.vertical_scale;
    float out_scanlines_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(1/scale_ratio) * scale_ratio;

    assert(stbir__use_height_upsampling(stbir_info));

    for (y = 0; y < stbir_info.output_h; y++)
    {
        float in_center_of_out = 0; // Center of the current out scanline in the in scanline space
        int in_first_scanline = 0, in_last_scanline = 0;

        stbir__calculate_sample_range_upsample(y, out_scanlines_radius, scale_ratio, stbir_info.vertical_shift, &in_first_scanline, &in_last_scanline, &in_center_of_out);

        assert(in_last_scanline - in_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);

        if (stbir_info.ring_buffer_begin_index >= 0)
        {
            // Get rid of whatever we don't need anymore.
            while (in_first_scanline > stbir_info.ring_buffer_first_scanline)
            {
                if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
                {
                    // We just popped the last scanline off the ring buffer.
                    // Reset it to the empty state.
                    stbir_info.ring_buffer_begin_index = -1;
                    stbir_info.ring_buffer_first_scanline = 0;
                    stbir_info.ring_buffer_last_scanline = 0;
                    break;
                }
                else
                {
                    stbir_info.ring_buffer_first_scanline++;
                    stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
                }
            }
        }

        // Load in new ones.
        if (stbir_info.ring_buffer_begin_index < 0)
            stbir__decode_and_resample_upsample(stbir_info, in_first_scanline);

        while (in_last_scanline > stbir_info.ring_buffer_last_scanline)
            stbir__decode_and_resample_upsample(stbir_info, stbir_info.ring_buffer_last_scanline + 1);

        // Now all buffers should be ready to write a row of vertical sampling.
        stbir__resample_vertical_upsample(stbir_info, y);
    }
}

static void stbir__empty_ring_buffer(stbir__info* stbir_info, int first_necessary_scanline)
{
    int output_stride_bytes = stbir_info.output_stride_bytes;
    int channels = stbir_info.channels;
    int alpha_channel = stbir_info.alpha_channel;
    int type = stbir_info.type;
    int colorspace = stbir_info.colorspace;
    int output_w = stbir_info.output_w;
    void* output_data = stbir_info.output_data;
    int decode = STBIR__DECODE(type, colorspace);

    float* ring_buffer = stbir_info.ring_buffer;
    int ring_buffer_length = stbir_info.ring_buffer_length_bytes / cast(int)(float.sizeof);

    if (stbir_info.ring_buffer_begin_index >= 0)
    {
        // Get rid of whatever we don't need anymore.
        while (first_necessary_scanline > stbir_info.ring_buffer_first_scanline)
        {
            if (stbir_info.ring_buffer_first_scanline >= 0 && stbir_info.ring_buffer_first_scanline < stbir_info.output_h)
            {
                int output_row_start = stbir_info.ring_buffer_first_scanline * output_stride_bytes;
                float* ring_buffer_entry = stbir__get_ring_buffer_entry(ring_buffer, stbir_info.ring_buffer_begin_index, ring_buffer_length);
                stbir__encode_scanline(stbir_info, output_w, cast(char *) output_data + output_row_start, ring_buffer_entry, channels, alpha_channel, decode);
            }

            if (stbir_info.ring_buffer_first_scanline == stbir_info.ring_buffer_last_scanline)
            {
                // We just popped the last scanline off the ring buffer.
                // Reset it to the empty state.
                stbir_info.ring_buffer_begin_index = -1;
                stbir_info.ring_buffer_first_scanline = 0;
                stbir_info.ring_buffer_last_scanline = 0;
                break;
            }
            else
            {
                stbir_info.ring_buffer_first_scanline++;
                stbir_info.ring_buffer_begin_index = (stbir_info.ring_buffer_begin_index + 1) % stbir_info.ring_buffer_num_entries;
            }
        }
    }
}

static void stbir__buffer_loop_downsample(stbir__info* stbir_info)
{
    int y;
    float scale_ratio = stbir_info.vertical_scale;
    int output_h = stbir_info.output_h;
    float in_pixels_radius = stbir__filter_info_table[stbir_info.vertical_filter].support(scale_ratio) / scale_ratio;
    int pixel_margin = stbir_info.vertical_filter_pixel_margin;
    int max_y = stbir_info.input_h + pixel_margin;

    assert(!stbir__use_height_upsampling(stbir_info));

    for (y = -pixel_margin; y < max_y; y++)
    {
        float out_center_of_in; // Center of the current in scanline in the out scanline space
        int out_first_scanline, out_last_scanline;

        stbir__calculate_sample_range_downsample(y, in_pixels_radius, scale_ratio, stbir_info.vertical_shift, &out_first_scanline, &out_last_scanline, &out_center_of_in);

        assert(out_last_scanline - out_first_scanline + 1 <= stbir_info.ring_buffer_num_entries);

        if (out_last_scanline < 0 || out_first_scanline >= output_h)
            continue;

        stbir__empty_ring_buffer(stbir_info, out_first_scanline);

        stbir__decode_and_resample_downsample(stbir_info, y);

        // Add empty ring buffer entries for the output scanlines this input scanline contributes to.
        if (stbir_info.ring_buffer_begin_index < 0)
            stbir__add_empty_ring_buffer_entry(stbir_info, out_first_scanline);

        while (out_last_scanline > stbir_info.ring_buffer_last_scanline)
            stbir__add_empty_ring_buffer_entry(stbir_info, stbir_info.ring_buffer_last_scanline + 1);

        // Now the horizontal buffer is ready to write to all ring buffer rows.
        stbir__resample_vertical_downsample(stbir_info, y);
    }

    stbir__empty_ring_buffer(stbir_info, stbir_info.output_h);
}

static void stbir__setup(stbir__info *info, int input_w, int input_h, int output_w, int output_h, int channels)
{
    info.input_w = input_w;
    info.input_h = input_h;
    info.output_w = output_w;
    info.output_h = output_h;
    info.channels = channels;
}

static void stbir__calculate_transform(stbir__info *info, float s0, float t0, float s1, float t1, float *transform)
{
    info.s0 = s0;
    info.t0 = t0;
    info.s1 = s1;
    info.t1 = t1;

    if (transform)
    {
        info.horizontal_scale = transform[0];
        info.vertical_scale   = transform[1];
        info.horizontal_shift = transform[2];
        info.vertical_shift   = transform[3];
    }
    else
    {
        info.horizontal_scale = (cast(float)info.output_w / info.input_w) / (s1 - s0);
        info.vertical_scale = (cast(float)info.output_h / info.input_h) / (t1 - t0);

        info.horizontal_shift = s0 * info.output_w / (s1 - s0);
        info.vertical_shift = t0 * info.output_h / (t1 - t0);
    }
}
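
// Worked example for the branch of stbir__calculate_transform taken when no
// explicit transform is passed (illustrative numbers only).
//   Whole image, 200x100 -> 100x50, with (s0, t0, s1, t1) = (0, 0, 1, 1):
//       horizontal_scale = (100 / 200) / (1 - 0) = 0.5    horizontal_shift = 0 * 100 / 1 = 0
//       vertical_scale   = ( 50 / 100) / (1 - 0) = 0.5    vertical_shift   = 0 *  50 / 1 = 0
//   Middle half of the width only, s0 = 0.25 and s1 = 0.75, 200 px in -> 100 px out:
//       horizontal_scale = (100 / 200) / (0.75 - 0.25) = 1.0
//       horizontal_shift = 0.25 * 100 / (0.75 - 0.25)  = 50
// The divisions are carried out in floating point because of the cast(float).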

static void stbir__choose_filter(stbir__info *info, stbir_filter h_filter, stbir_filter v_filter)
{
    if (h_filter == 0)
        h_filter = stbir__use_upsampling(info.horizontal_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
    if (v_filter == 0)
        v_filter = stbir__use_upsampling(info.vertical_scale) ? STBIR_DEFAULT_FILTER_UPSAMPLE : STBIR_DEFAULT_FILTER_DOWNSAMPLE;
    info.horizontal_filter = h_filter;
    info.vertical_filter = v_filter;
}

static uint stbir__calculate_memory(stbir__info *info)
{
    int pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
    int filter_height = stbir__get_filter_pixel_width(info.vertical_filter, info.vertical_scale);

    info.horizontal_num_contributors = stbir__get_contributors(info.horizontal_scale, info.horizontal_filter, info.input_w, info.output_w);
    info.vertical_num_contributors   = stbir__get_contributors(info.vertical_scale  , info.vertical_filter  , info.input_h, info.output_h);

    // One extra entry because floating point precision problems sometimes cause an extra to be necessary.
    info.ring_buffer_num_entries = filter_height + 1;

    info.horizontal_contributors_size = info.horizontal_num_contributors * cast(int)(stbir__contributors.sizeof);
    info.horizontal_coefficients_size = stbir__get_total_horizontal_coefficients(info) * cast(int)(float.sizeof);
    info.vertical_contributors_size = info.vertical_num_contributors * cast(int)(stbir__contributors.sizeof);
    info.vertical_coefficients_size = stbir__get_total_vertical_coefficients(info) * cast(int)(float.sizeof);
    info.decode_buffer_size = (info.input_w + pixel_margin * 2) * info.channels * cast(int)(float.sizeof);
    info.horizontal_buffer_size = info.output_w * info.channels * cast(int)(float.sizeof);
    info.ring_buffer_size = info.output_w * info.channels * info.ring_buffer_num_entries * cast(int)(float.sizeof);
    info.encode_buffer_size = info.output_w * info.channels * cast(int)(float.sizeof);

    assert(info.horizontal_filter != 0);
    assert(info.horizontal_filter < stbir__filter_info_table.length); // this now happens too late
    assert(info.vertical_filter != 0);
    assert(info.vertical_filter < stbir__filter_info_table.length); // this now happens too late

    if (stbir__use_height_upsampling(info))
        // The horizontal buffer is for when we're downsampling the height and we
        // can't output the result of sampling the decode buffer directly into the
        // ring buffers.
        info.horizontal_buffer_size = 0;
    else
        // The encode buffer is to retain precision in the height upsampling method
        // and isn't used when height downsampling.
        info.encode_buffer_size = 0;

    return info.horizontal_contributors_size + info.horizontal_coefficients_size
        + info.vertical_contributors_size + info.vertical_coefficients_size
        + info.decode_buffer_size + info.horizontal_buffer_size
        + info.ring_buffer_size + info.encode_buffer_size;
}
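
// Worked example of the buffer sizes computed by stbir__calculate_memory
// (illustrative numbers; the filter height of 4 is an assumed value, the real
// one comes from stbir__get_filter_pixel_width).
//   output_w = 64, channels = 4, float.sizeof = 4:
//       ring_buffer_num_entries = 4 + 1          = 5
//       ring_buffer_size        = 64 * 4 * 5 * 4 = 5120 bytes
//       horizontal_buffer_size  = 64 * 4 * 4     = 1024 bytes (set to 0 when upsampling the height)
//       encode_buffer_size      = 64 * 4 * 4     = 1024 bytes (set to 0 when downsampling the height)
// Only one of the last two buffers contributes to the returned total.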

static int stbir__resize_allocated(stbir__info *info,
    const void* input_data, int input_stride_in_bytes,
    void* output_data, int output_stride_in_bytes,
    int alpha_channel, uint flags, stbir_datatype type,
    stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace,
    void* tempmem, size_t tempmem_size_in_bytes)
{
    size_t memory_required = stbir__calculate_memory(info);

    int width_stride_input = input_stride_in_bytes ? input_stride_in_bytes : info.channels * info.input_w * stbir__type_size[type];
    int width_stride_output = output_stride_in_bytes ? output_stride_in_bytes : info.channels * info.output_w * stbir__type_size[type];

    assert(info.channels >= 0);
    assert(info.channels <= STBIR_MAX_CHANNELS);

    if (info.channels < 0 || info.channels > STBIR_MAX_CHANNELS)
        return 0;

    assert(info.horizontal_filter < stbir__filter_info_table.length);
    assert(info.vertical_filter < stbir__filter_info_table.length);

    if (info.horizontal_filter >= stbir__filter_info_table.length)
        return 0;
    if (info.vertical_filter >= stbir__filter_info_table.length)
        return 0;

    if (alpha_channel < 0)
        flags |= STBIR_FLAG_ALPHA_USES_COLORSPACE | STBIR_FLAG_ALPHA_PREMULTIPLIED;

    if (!(flags&STBIR_FLAG_ALPHA_USES_COLORSPACE) || !(flags&STBIR_FLAG_ALPHA_PREMULTIPLIED)) {
        assert(alpha_channel >= 0 && alpha_channel < info.channels);
    }

    if (alpha_channel >= info.channels)
        return 0;

    assert(tempmem);

    if (!tempmem)
        return 0;

    assert(tempmem_size_in_bytes >= memory_required);

    if (tempmem_size_in_bytes < memory_required)
        return 0;

    memset(tempmem, 0, tempmem_size_in_bytes);

    info.input_data = input_data;
    info.input_stride_bytes = width_stride_input;

    info.output_data = output_data;
    info.output_stride_bytes = width_stride_output;

    info.alpha_channel = alpha_channel;
    info.flags = flags;
    info.type = type;
    info.edge_horizontal = edge_horizontal;
    info.edge_vertical = edge_vertical;
    info.colorspace = colorspace;

    info.horizontal_coefficient_width   = stbir__get_coefficient_width  (info.horizontal_filter, info.horizontal_scale);
    info.vertical_coefficient_width     = stbir__get_coefficient_width  (info.vertical_filter  , info.vertical_scale  );
    info.horizontal_filter_pixel_width  = stbir__get_filter_pixel_width (info.horizontal_filter, info.horizontal_scale);
    info.vertical_filter_pixel_width    = stbir__get_filter_pixel_width (info.vertical_filter  , info.vertical_scale  );
    info.horizontal_filter_pixel_margin = stbir__get_filter_pixel_margin(info.horizontal_filter, info.horizontal_scale);
    info.vertical_filter_pixel_margin   = stbir__get_filter_pixel_margin(info.vertical_filter  , info.vertical_scale  );

    info.ring_buffer_length_bytes = info.output_w * info.channels * cast(int)(float.sizeof);
    info.decode_buffer_pixels = info.input_w + info.horizontal_filter_pixel_margin * 2;

    static newtype* STBIR__NEXT_MEMPTR(newtype)(void* current, size_t current_size)
    {
        return cast(newtype*)( (cast(ubyte*)current) + current_size );
    }

    info.horizontal_contributors = cast(stbir__contributors *) tempmem;
    info.horizontal_coefficients = STBIR__NEXT_MEMPTR!float (info.horizontal_contributors, info.horizontal_contributors_size);
    info.vertical_contributors = STBIR__NEXT_MEMPTR!stbir__contributors(info.horizontal_coefficients, info.horizontal_coefficients_size);
    info.vertical_coefficients = STBIR__NEXT_MEMPTR!float (info.vertical_contributors, info.vertical_contributors_size);
    info.decode_buffer = STBIR__NEXT_MEMPTR!float (info.vertical_coefficients, info.vertical_coefficients_size);

    if (stbir__use_height_upsampling(info))
    {
        info.horizontal_buffer = null;
        info.ring_buffer = STBIR__NEXT_MEMPTR!float (info.decode_buffer, info.decode_buffer_size);
        info.encode_buffer = STBIR__NEXT_MEMPTR!float (info.ring_buffer, info.ring_buffer_size);

        assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.encode_buffer, info.encode_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
    }
    else
    {
        info.horizontal_buffer = STBIR__NEXT_MEMPTR!float (info.decode_buffer, info.decode_buffer_size);
        info.ring_buffer = STBIR__NEXT_MEMPTR!float (info.horizontal_buffer, info.horizontal_buffer_size);
        info.encode_buffer = null;

        assert(cast(size_t)STBIR__NEXT_MEMPTR!ubyte(info.ring_buffer, info.ring_buffer_size) == cast(size_t)tempmem + tempmem_size_in_bytes);
    }

    // This signals that the ring buffer is empty
    info.ring_buffer_begin_index = -1;

    stbir__calculate_filters(info.horizontal_contributors, info.horizontal_coefficients, info.horizontal_filter, info.horizontal_scale, info.horizontal_shift, info.input_w, info.output_w);
    stbir__calculate_filters(info.vertical_contributors, info.vertical_coefficients, info.vertical_filter, info.vertical_scale, info.vertical_shift, info.input_h, info.output_h);

    if (stbir__use_height_upsampling(info))
        stbir__buffer_loop_upsample(info);
    else
        stbir__buffer_loop_downsample(info);

    return 1;
}


static int stbir__resize_arbitrary(
    void *alloc_context,
    const void* input_data, int input_w, int input_h, int input_stride_in_bytes,
    void* output_data, int output_w, int output_h, int output_stride_in_bytes,
    float s0, float t0, float s1, float t1, float *transform,
    int channels, int alpha_channel, uint flags, stbir_datatype type,
    stbir_filter h_filter, stbir_filter v_filter,
    stbir_edge edge_horizontal, stbir_edge edge_vertical, stbir_colorspace colorspace)
{
    stbir__info info;
    int result;
    size_t memory_required;
    void* extra_memory;

    stbir__setup(&info, input_w, input_h, output_w, output_h, channels);
    stbir__calculate_transform(&info, s0,t0,s1,t1,transform);
    stbir__choose_filter(&info, h_filter, v_filter);
    memory_required = stbir__calculate_memory(&info);
    extra_memory = STBIR_MALLOC(memory_required, alloc_context);

    if (!extra_memory)
        return 0;

    result = stbir__resize_allocated(&info, input_data, input_stride_in_bytes,
                                     output_data, output_stride_in_bytes,
                                     alpha_channel, flags, type,
                                     edge_horizontal, edge_vertical,
                                     colorspace, extra_memory, memory_required);

    STBIR_FREE(extra_memory, alloc_context);

    return result;
}



int stbir_resize_uint8_srgb_edgemode(const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
                                     ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
                                     int num_channels, int alpha_channel, int flags,
                                     stbir_edge edge_wrap_mode)
{
    return stbir__resize_arbitrary(null, input_pixels, input_w, input_h, input_stride_in_bytes,
        output_pixels, output_w, output_h, output_stride_in_bytes,
        0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
        edge_wrap_mode, edge_wrap_mode, STBIR_COLORSPACE_SRGB);
}
2498
// 8-bit resize of the whole image with a caller-chosen edge mode, filter and
// colorspace. The same edge mode and filter are applied to both axes.
2499 int stbir_resize_uint8_generic( const(ubyte)*input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2500 ubyte*output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2501 int num_channels, int alpha_channel, int flags,
2502 stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
2503 void *alloc_context)
2504 {
2505 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2506 output_pixels, output_w, output_h, output_stride_in_bytes,
2507 0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT8, filter, filter,
2508 edge_wrap_mode, edge_wrap_mode, space);
2509 }
2510
// 16-bit counterpart of stbir_resize_uint8_generic.
2511 int stbir_resize_uint16_generic(const ushort *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2512 ushort *output_pixels , int output_w, int output_h, int output_stride_in_bytes,
2513 int num_channels, int alpha_channel, int flags,
2514 stbir_edge edge_wrap_mode, stbir_filter filter, stbir_colorspace space,
2515 void *alloc_context)
2516 {
2517 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2518 output_pixels, output_w, output_h, output_stride_in_bytes,
2519 0,0,1,1,null,num_channels,alpha_channel,flags, STBIR_TYPE_UINT16, filter, filter,
2520 edge_wrap_mode, edge_wrap_mode, space);
2521 }
2522
2523
// General whole-image resize: the caller picks the datatype and can set the
// edge mode and the filter independently for each axis.
2524 int stbir_resize( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2525 void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2526 stbir_datatype datatype,
2527 int num_channels, int alpha_channel, int flags,
2528 stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2529 stbir_filter filter_horizontal, stbir_filter filter_vertical,
2530 stbir_colorspace space, void *alloc_context)
2531 {
2532 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2533 output_pixels, output_w, output_h, output_stride_in_bytes,
2534 0,0,1,1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2535 edge_mode_horizontal, edge_mode_vertical, space);
2536 }
2537
2538
// Like stbir_resize, but the scale and offset are supplied directly by the
// caller and handed to stbir__resize_arbitrary as a four-element transform.
2539 int stbir_resize_subpixel(const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2540 void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2541 stbir_datatype datatype,
2542 int num_channels, int alpha_channel, int flags,
2543 stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2544 stbir_filter filter_horizontal, stbir_filter filter_vertical,
2545 stbir_colorspace space, void *alloc_context,
2546 float x_scale, float y_scale,
2547 float x_offset, float y_offset)
2548 {
    // Packed in the order: x scale, y scale, x offset, y offset.
2549 float[4] transform;
2550 transform[0] = x_scale;
2551 transform[1] = y_scale;
2552 transform[2] = x_offset;
2553 transform[3] = y_offset;
2554 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2555 output_pixels, output_w, output_h, output_stride_in_bytes,
2556 0,0,1,1,transform.ptr,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2557 edge_mode_horizontal, edge_mode_vertical, space);
2558 }
2559
// Like stbir_resize, but samples only the region of the input whose corners
// are (s0, t0) and (s1, t1), given in [0, 1] x [0, 1] coordinates. The other
// entry points pass 0,0,1,1, which selects the whole image.
2560 int stbir_resize_region( const void *input_pixels , int input_w , int input_h , int input_stride_in_bytes,
2561 void *output_pixels, int output_w, int output_h, int output_stride_in_bytes,
2562 stbir_datatype datatype,
2563 int num_channels, int alpha_channel, int flags,
2564 stbir_edge edge_mode_horizontal, stbir_edge edge_mode_vertical,
2565 stbir_filter filter_horizontal, stbir_filter filter_vertical,
2566 stbir_colorspace space, void *alloc_context,
2567 float s0, float t0, float s1, float t1)
2568 {
2569 return stbir__resize_arbitrary(alloc_context, input_pixels, input_w, input_h, input_stride_in_bytes,
2570 output_pixels, output_w, output_h, output_stride_in_bytes,
2571 s0,t0,s1,t1,null,num_channels,alpha_channel,flags, datatype, filter_horizontal, filter_vertical,
2572 edge_mode_horizontal, edge_mode_vertical, space);
2573 }
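
/*
   USAGE SKETCH (illustrative only, not part of the original library)

   A minimal sketch of how the entry points above might be called from D.
   The image sizes and the buffer names src and dst are assumptions made up
   for this example. It also assumes that a stride of 0 means tightly packed
   rows, as in the QUICKSTART at the top of this file, and that alpha lives
   in channel 3 of an RGBA image.

      // 64x64 RGBA source, 32x32 RGBA destination (hypothetical buffers).
      auto src = new ubyte[](64 * 64 * 4);
      auto dst = new ubyte[](32 * 32 * 4);

      // Whole image, sRGB, default filters, clamped edges.
      int ok = stbir_resize_uint8_srgb_edgemode(src.ptr, 64, 64, 0,
                                                dst.ptr, 32, 32, 0,
                                                4, 3, 0, STBIR_EDGE_CLAMP);

      // Only the top-left quadrant of the source:
      // (s0, t0) = (0, 0) and (s1, t1) = (0.5, 0.5).
      if (ok)
          ok = stbir_resize_region(src.ptr, 64, 64, 0,
                                   dst.ptr, 32, 32, 0,
                                   STBIR_TYPE_UINT8, 4, 3, 0,
                                   STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP,
                                   STBIR_FILTER_DEFAULT, STBIR_FILTER_DEFAULT,
                                   STBIR_COLORSPACE_SRGB, null,
                                   0.0f, 0.0f, 0.5f, 0.5f);

   Both calls return 0 on failure, so checking the result before reading dst
   is the safer habit. Passing null as alloc_context mirrors what
   stbir_resize_uint8_srgb_edgemode does internally.
*/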
2574
2575 /*
2576 ------------------------------------------------------------------------------
2577 This software is available under 2 licenses -- choose whichever you prefer.
2578 ------------------------------------------------------------------------------
2579 ALTERNATIVE A - MIT License
2580 Copyright (c) 2017 Sean Barrett
2581 Permission is hereby granted, free of charge, to any person obtaining a copy of
2582 this software and associated documentation files (the "Software"), to deal in
2583 the Software without restriction, including without limitation the rights to
2584 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
2585 of the Software, and to permit persons to whom the Software is furnished to do
2586 so, subject to the following conditions:
2587 The above copyright notice and this permission notice shall be included in all
2588 copies or substantial portions of the Software.
2589 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2590 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2591 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2592 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
2593 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
2594 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2595 SOFTWARE.
2596 ------------------------------------------------------------------------------
2597 ALTERNATIVE B - Public Domain (www.unlicense.org)
2598 This is free and unencumbered software released into the public domain.
2599 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
2600 software, either in source code form or as a compiled binary, for any purpose,
2601 commercial or non-commercial, and by any means.
2602 In jurisdictions that recognize copyright laws, the author or authors of this
2603 software dedicate any and all copyright interest in the software to the public
2604 domain. We make this dedication for the benefit of the public at large and to
2605 the detriment of our heirs and successors. We intend this dedication to be an
2606 overt act of relinquishment in perpetuity of all present and future rights to
2607 this software under copyright law.
2608 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2609 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
2610 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
2611 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
2612 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
2613 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2614 ------------------------------------------------------------------------------
2615 */